Replication Fork Polarity Gradients Revealed by Megabase-Sized U-Shaped Replication Timing Domains in Human Cell Lines

In higher eukaryotes, replication program specification in different cell types remains to be fully understood. We show for seven human cell lines that about half of the genome is divided in domains that display a characteristic U-shaped replication timing profile with early initiation zones at borders and late replication at centers. Significant overlap is observed between U-domains of different cell lines and also with germline replication domains exhibiting a N-shaped nucleotide compositional skew. From the demonstration that the average fork polarity is directly reflected by both the compositional skew and the derivative of the replication timing profile, we argue that the fact that this derivative displays a N-shape in U-domains sustains the existence of large-scale gradients of replication fork polarity in somatic and germline cells. Analysis of chromatin interaction (Hi-C) and chromatin marker data reveals that U-domains correspond to high-order chromatin structural units. We discuss possible models for replication origin activation within U/N-domains. The compartmentalization of the genome into replication U/N-domains provides new insights on the organization of the replication program in the human genome.


Introduction
Comprehensive knowledge of genetic inheritance at different development stages relies on elucidating the mechanisms that regulate the DNA spatio-temporal replication program and its possible conservation during evolution [1]. In multi-cellular organisms, there is no clear consensus sequence where initiation may occur [2,3]. Instead epigenetic mechanisms may take part in the spatial and temporal control of replication initiation in higher eukaryotes in relation with gene expression [4][5][6][7][8][9]. For many years, understanding the determinants that specify replication origins has been hampered by the small number (approximately 30) of wellestablished replication origins in the human genome and more generally in mammalian genomes [1,7,10]. Recently, nascent DNA strands synthesized at origins were purified by various methods [11][12][13][14] to map a few hundreds putative origins in 1% of the human genome. For unclear reasons, the concordance between the different studies is very low (from v5% to v25%) [12][13][14][15]. In a completely different approach to map replication origins, previous in silico analyses of the nucleotide compositional skew S~(T{A)=(TzA)z(G{C)=(GzC) of the human genome showed that the sign of S abruptly changed from ({) to (z) when crossing known replication initiation sites. This allowed us to predict putative origins at more than a thousand sites of S sign inversion (S-jumps) along the human genome [16,17]. Further analyses of S patterns identified 663 megabase-sized Ndomains whose skew profile displays a N-like shape (Fig. 1A), with two abrupt S-jumps bordering a DNA segment whose skew linearly decreases between the two jumps [16][17][18][19][20][21]. Skew Ndomains have a mean length of 1:2+0:6 Mb and cover 29.2% of the human genome. The initiation zones predicted at N-domains borders would be specified by an open chromatin structure favorable to early replication initiation and permissive to transcription [21,22]. The determination of HeLa replication timing profile [23] and the analysis of available timing profiles in several human cell lines [24][25][26] allowed us to confirm that significant numbers of N-domains borders harbor early initiation zones active in germline as well as in somatic cell types [18,27].
Recent studies have shown that replication induces different mutation rates on the leading and lagging replicating strands [27]. This asymmetry of rates acting during evolution has generated the skew upward jumps that result from inversion of replication fork polarity at N-domain extremities. The skew profile along Ndomains would result from superimposed effects of transcription and of replication [19,20,[28][29][30][31]. Accordingly, the linear decrease of the skew (Fig. 1A) may reflect a decrease in the proportion of replicating forks propagating from the left (59) to the right (39) Ndomain extremity. This organization of replication in a large proportion of the genome contrasts with the previously proposed segmentation of mammalian chromosomes in regions replicated either by multiple synchronous origins with equal proportion of forks coming from both directions (0.2-2.0 Mb Constant Timing Regions) or by unidirectional replication forks (0.1-0.6 Mb Transition Timing Regions) [25,[32][33][34].
Here, to determine the existence of a new type of replication domains presenting gradients of replication fork polarity, we establish (i) that the replication fork polarity and the compositional skew are proportional to each other, (ii) that the replication fork polarity can be directly extracted from the derivative of the replication timing profile. Taking advantage of replication timing profiles in several human cell types [23,26], we show that the derivative of the replication timing profile of N-domains is shaped as a N. The corresponding U-shape of the replication timing profile is not specific to the germline but is generally observed in all replication timing profiles examined, thus establishing these ''Udomains'' as a new type of replication domains, consistent with the recent experimental observation of multiple replication initiations in most Transition Timing Regions in several human cell lines [35]. As observed with the early initiation zones bordering Ndomain extremities, those specific to the U-domains are significantly enriched in open chromatin markers as well as insulator-binding proteins CTCF [36,37] and are prone to gene activity. Analysis of recent Hi-C data [38] reveals that U-domains correspond to self-interacting structural chromatin units. These data make a compelling case that the ''islands'' of open chromatin observed at U-domains borders are at the heart of a compartmentalization of chromosomes into chromatin units of independent replication and of coordinated gene transcription.

Results/Discussion
Linking replication fork polarity to nucleotide compositional skew profile and replication timing To establish the existence of replication domains associated with replication fork polarity gradients, we first demonstrate the relations between replication fork polarity, nucleotide compositional skew and derivative of the replication timing profile. Under appropriate hypotheses, the skew S resulting from mutational asymmetries associated with replication is proportional to the fork polarity p(x)~p (z) (x){p ({) (x) at position x on the sequence (Material and Methods): where p (z) (x) (resp. p ({) (x)) is the proportion of forks replicating in the 5'?3' (resp. 3'?5') direction on the Watson strand. The linear decrease of S in N-domains from positive (5' end) to negative (3' end) values thus likely reflects a linear decrease of the replication fork polarity with a change of sign in the middle of the N-domains. This result strongly supports the interpretation of Ndomains ( Fig. 1A-C) as the signature of a higher-order organization of replication origins in germline cells. The replication fork polarity can also be directly deduced from replication timing data under the central hypotheses that the replication fork velocity v is constant and that replication is bidirectional from each origin. Note that recent DNA combing experiments in HeLa cells have shown that replication fork velocity does not significantly vary during S phase which strongly supports the former hypothesis [35]. We demonstrate that the replication fork polarity p(x) is the product of the derivative of the mean replication timing (MRT) and the replication fork velocity v (Material and Methods): The fork polarity should therefore provide a direct link between the skew S and the derivative of the replication timing profile in germline cells. To test this relationship, we used a substitute for germline MRT, the replication timing profiles of seven somatic cell lines (one embryonic stem cell, three lymphoblastoid, a fibroblast, an erythroid and HeLa cell lines) (Material and Methods). We first correlated the skew S with dMRT=dx, in the BG02 embryonic stem cells, over the 22 human autosomes (Fig. 1D) (Table 1). These correlations are as strong as those obtained between the dMRT=dx profiles in different cell lines (Supplementary Table  S1), as well as those previously reported between the replication timing data themselves [26,34,39]. The correlations between S and dMRT=dx are even stronger when focusing on the 663 skew N-domains (Table 1). The correlations obtained in intergenic regions (R~0:45+0:06) are recovered to a large extent in genic regions (R~0:34+0:03) where the transcription-associated skew S T was hypothesized to superimpose to the replication-associated skew S R [18][19][20]. Further evidence of this link between S and dMRT=dx was obtained when averaging, for the different cell

Author Summary
DNA replication in human cells requires the parallel progression along the genome of thousands of replication machineries. Comprehensive knowledge of genetic inheritance at different development stages relies on elucidating the mechanisms that regulate the location and progression of these machineries throughout the duration of the DNA synthetic phase of the cell cycle. Here, we determine in multiple human cell types the existence of a new type of megabase-sized replication domains across which the average orientation of the replication machinery changes in a linear manner. These domains are revealed in 7 somatic cell types by a U-shaped pattern in the replication timing profiles as well as by N-shaped patterns in the DNA compositional asymmetry profile reflecting the existence of a replication-associated mutational asymmetry in the germline. These domains therefore correspond to a robust mode of replication across cell types and during evolution. Using genome-wide data on the frequency of interaction of distant chromatin segments in two cell lines, we find that these U/N-replication domains remarkably correspond to self-interacting folding units of the chromatin fiber.
lines, the dMRT=dx profiles inside the 663 skew N-domains after rescaling their length to unity (Fig. 1E). These mean profiles are shaped as a N, suggesting that some properties of the germline replication program associated with the pattern of replication fork polarity are shared by somatic cells.
Replication timing U-domains are robustly observed in human cell lines According to Equations (1) and (2), the integration of the skew S is expected to generate a profile rather similar to the replication timing profile. In segments of linearly changing skew, the integrated S function is thus expected to show a parabolic profile. The integrated S function when estimated by the cumulative skew S (Fig. 1B) along N-domains of a 11.4 Mb long fragment of human chromosome 10, indeed displays a U-shaped (parabolic) profile likely corresponding the replication timing profile in the germline. Remarkably, the 6 N-domains effectively correspond to successive genome regions where the MRT in the BG02 embryonic stem cells is U-shaped (Fig. 1C). The 7 putative initiation zones (O 1 to O 7 ) corresponding to upward S-jumps (Fig. 1A), co-locate (up to the *100 kb resolution) with MRT local extrema which supports that they are highly active in BG02. These initiation zones can present cell specificity as exemplified by the putative replication origin O 5 which is inactive (or late) in both the K562 erythroid and GM06990 lymphoblastoid cell lines (Fig. 1C) resulting in domain ''consolidation'' [40]. Two neighboring U-domains (½O 4 ,O 5 and ½O 5 ,O 6 ) in BG02 merged into a larger U-domain in the K562 and GM06990 cell lines. Note that the other 3 N-domains (½O 1 ,O 2 ,½O 2 ,O 3 , and ½O 6 ,O 7 ) are replication timing U-domains common to BG02, K562 and GM06990. To detect U-domains in replication timing profiles at genome scale, we developed a wavelet-based method (Material and Methods, and Supplementary Text S1) which allowed us to identify in the 7 human cell lines from 664 (TL010) up to 1534 (BG02) U-domains of mean size ranging from 0.966 Mb (HeLa R2) up to 1.62 Mb (TL010) and covering from 39.6% (TL010) to 61.9% (BG02) of the genome ( Table 2). For each cell line, the average MRT profile of U-domains has an expected parabolic shape ( Fig. 2A) representative of individual U-domains ( Fig. 2C and Supplementary Figs. S1A-S9A). Inside the U-domains, the derivative dMRT=dx is N-shaped ( Fig. 2D and Supplementary Figs. S1B-S9B) like the skew profile inside N-domains (Supplementary Figs. S1F-S9F). When rescaling the size of each Udomains to unity for a given cell line, these profiles superimpose onto a common N-shaped curve well approximated by the average dMRT=dx profile (Fig. 2B).
To determine the amounts of U-domains conserved in different cell types, we computed for each cell type pair the mutual covering of the corresponding sets of U-domains (two U-domains are shared by two different cell lines if each domain covers more than 80% of the other domain (Table 3)). Taking as reference the matching obtained for the two BJ (68.6% and 74.3%) and HeLa (51.8% and 54.6%) cell replicates, the matchings between the other cell lines were statistically significant and comparable (from 40% to 65% for the mutual covering of lymphoblastoid cell lines). The number of U-domain shared by cell type pairs were all significantly larger than the number expected by chance (Pv10 {3 , Supplementary  Table S3). This corresponds to a significant proportion (*20%) of the U-domains of the individual cell lines (Table 3), as compared to the matchings ( * ; 5%) expected by chance (Supplementary Table S4). A significant percentage of N-domains correspond to U-domains (e.g. from 12.5% in BJ R1 up to 23.7% in BG02). This explains that when representing the MRT profile of BG02 instead of the skew S, along the set of N-domains ordered according to their size, we can recognize the edges of many N-domains (Supplementary Figs. S1D-S9D). The same observation can be made when comparing the dMRT=dx profiles (Supplementary Figs. S1E-S9E) to the corresponding skew profiles ( Supplementary Fig. S1F). Note that the N-domains match only 7{14% of the U-domains of various cell lines due to the very stringent N-domain selection criteria [19,20] that yielded only 663 N-domains (29.2% of the genome) as compared to much larger U-domain numbers (*50% of the genome; Table 2). Replication timing U-domains are robustly  ). This is also true for the skew N-domains (50.2%) that likely correspond to replication timing U-domains in the germline. However about half of the genome that is covered by U-domains corresponds to regions of high replication timing plasticity where replication domains may (i) reorganize according to the so-called ''consolidation'' scenario (merging of two U-domains into a larger one) ( Fig. 1C), (ii) experience some boundary shift and (iii) emerge in a late replicating region as previously observed in the mouse genome during differentiation [40].

Replication timing U-domains borders are enriched in open chromatin markers
Genome-wide investigation of chromatin architecture has revealed that, at large scales (from 100 kb to 1 Mb), regions enriched in open chromatin fibers correlate with regions of high gene density [41]. Moreover there is a growing body of evidence that transcription factors are regulators of origin activation (reviewed in Kohzaki and Murakami 2005). We ask whether the remarkable genome organization observed around N-domain borders [19] is maintained around replication timing U-domain borders and to what extent it is mediated by a particular chromatin structure favorable to early replication origin specification [22].
When mapping DNase I sensitivity data (Material and Methods) [42] on the U-domains, we observed that the mean coverage is maximal at U-domain extremities and decreases significantly from the extremities to the center that is rather insensitive to DNase I cleavage ( Fig. 3A and Supplementary Fig. S10). This decrease, from values significantly higher than the genome-wide average value, extends over *150 kb, whatever the size of the replication timing U-domain ( Supplementary Fig. S11A-C) suggesting that, for all examined cell lines, early replicating U-domains borders are at the center of *300 kb wide open chromatin regions. We observed a significant anti-correlation between DNase I cleavage sensitivity data and replication timing data in BG02 (DNase H1-hESC: R~{0:55, Pv10 {16 ), K562 (R~{0:63, Pv10 {16 ) and GM06990 (R~{0:57, Pv10 {16 ) cell lines as well as in the other four cell lines (data not shown; note that this was still observed when controlling for the GC content). This is further supported by open over input chromatin ratio data obtained from human lymphoblastoid cells [41]. We observed that the regions presenting an open/input ratio w1:5 also decreased significantly (3-fold) from U-domain borders to centers (Fig. 3B).
Cytosine DNA methylation is a mediator of gene silencing in repressed heterochromatic regions, while in potentially active open chromatin regions, DNA is essentially unmethylated [43].
DNA methylation is continuously distributed over mammalian chromosomes with the notable exception of CpG islands (CGIs) and in turn of certain CpG rich promoters and transcription start sites (TSSs). Along the observation that the hypomethylation level of CGIs extends to about 1 kb in flanking regions, we used 1 kbenlarged CGI coverage as an hypomethylation marker (Material and Methods) [22]. When averaging over the U-domains detected in BG02, we robustly observed a maximum of CGI coverage at U-domain borders as the signature of hypomethylation and a decrease over a characteristic distance of *150 kb (Fig. 3C), similar to what we found for DNase I sensitivity coverage (Fig. 3A). This contrasts with the GC-content profile that strongly depends on the U-domain size and decreases very slowly toward the U-domain center without exhibiting any characteristic scale ( Supplementary Fig. S11D-F). These observations are consistent with the hypothesis that early replication origins at U-domain borders are associated with CGIs that are possibly protected from methylation by colocalization with replication origins [44].
Open chromatin markers have been associated with genes. For example 16% of all DNase I hypersensitive sites (HS) are in the first exon or at the TSS of a gene and 42% are found inside a gene [45]. Also, more than 90% of broadly expressed housekeeping genes have a CpG-rich promoter [46]. Remarkably, the mean profiles of Pol II binding Chip-Seq tag density (Material and Methods) along U-domains detected in BG02, K562 and GM06990 cell lines strongly decay over *150 kb away from Udomain borders (Fig. 3D). This indicates that, whatever the cell line, the open chromatin regions around replication U-domains are prone to transcription whereas U-domain central regions appear, on average, transcriptionally silent. Importantly, we have reproduced the analyses of open chromatin markers near U-domain borders that do not match with a N-domain border (at 100 kb resolution) and confirmed that the results reported in Fig. 3 apply to the initiation zones at Udomains borders of every cell line ( Supplementary Fig. S12).

Replication timing U-domains are insulated compartments of genome-wide chromatin interactions (Hi-C)
It is widely recognized that the 3D chromatin tertiary structure provides some understanding to the experimental observation of the so-called replicon and replication foci [2,47]. In particular, replicon size, which is dictated by the spacing between active origins, correlates with the length of chromatin loops [8,47,48]. The chromosome conformation capture technique [38] has provided access to long-range chromatin interactions as a footprint of the different levels of chromatin folding in relation with gene activity and the functional state of the cell. From a comparative analysis of   . We found that these four U-domains remarkably correspond to four matrix squareblocks of enriched interactions (Fig. 4A). Hence, we recover that early replicating zones that border a U-domain (e.g. O 4 and O 6 separated by 3.9 Mb), have a high contact probability as the signature of 3D spatial proximity. However, we also observe a high contact probability of the two early replicating borders with the late replicating U-domain center and interactions appear sparse for loci in separate U-domains (e.g. O 1 and O 3 separated by 3.6 Mb). Further examination of the average behavior of intrachromosomal contact probability as a function of genomic distance for the complete genome corroborates these observations. We found that the mean number of interactions between two 100 kb loci of the same U-domain decays when increasing their distance as observed genome-wide (Fig. 4B). Importantly, the mean number of pairwise interactions is significantly higher inside the U-domains than genome-wide and this seems to depend on the U-domain length. In particular, we found that the smaller the domain, the higher the mean number of interactions which is probably a signature of a more open chromatin structure. When comparing the contact probability between two loci inside a U-domain or lying in neighboring U-domains (Fig. 4C), we observed that the latter is higher than the former for distances smaller than the characteristic size (*300 kb) of the open chromatin structure at U-domain borders (Fig. 3). Above this characteristic distance, the tendency is reversed and the ratio increases up to 2 for distances *1:8 Mb (Fig. 4C). These data suggest that the segmentation of the genome into replication timing U-domains corresponds to some spatial compartmentalization into self-interacting structural chromatin units insulated by two boundaries of open, accessible, actively transcribed chromatin. This conclusion is strengthened by the observation that U-domain borders are significantly enriched in the insulator binding protein CTCF (Fig. 5), that is known to be involved in chromatin loop formation conditioning communication between transcriptional regulatory elements [36,37,49,50]. Quantitatively similar results were obtained for the lymphoblastoid GM06990 cell line for which both replication timing and Hi-C data were available (Supplementary Fig. S13).

Perspectives
The mapping of open chromatin marks along U-domains revealed that they are bordered by early replication initiation zones likely specified by a *300 kb wide region of accessible, open chromatin permissive to transcription. Such a strong gradient of open chromatin environment was not observed around a large fraction of the 283 replication origins identified in ENCODE regions [12]; only 29% overlap a DNase I hypersensitivity site and half of them do not present open chromatin marks and are not associated with active transcription [22]. Furthermore, the typical inter-origin distance in human cells is 50-100 kb [12,48], a much smaller value than the mean U-domain size (1-1.5 Mb). These data can be reconciled in a model [51,52] where replication origins fire independently and their properties (intrinsic firing time probability, efficiency) are specified by the chromatin state: efficient early replicating origins in euchromatic regions (U-domains borders) and late replicating or less efficient origins in heterochromatic regions (U-domains centers). A more dynamical model can also be proposed in which replication first initiates at Udomain borders followed by a chromatin gradient-mediated succession of secondary origin activations. These origins may be remotely activated by the approach of a center-oriented fork that may stimulate initiation due to changes in DNA supercoiling in front of the fork or to association of chromatin remodelers or origin triggering factors with replication fork proteins [35]. This ''domino'' model could explain why replication progresses from U-domain borders much faster (3-5 times) than the known speed of single fork [8,35,48]. Indeed the U-shape of the replication timing profile indicates that the replication wave accelerates (effective velocity equals the inverse of the replication timing derivative, Equation (2)) as the signature of an increasing origin firing frequency during the S-phase [53]. It will be essential to determine to what extent the chromatin state influences fork progression and origins activations and whether outside of U-domains, the genome replicates according to a similar or completely different scenario.

Linking nucleotide compositional skew to replication fork polarity
We use the formalism of Markov processes to prove that replicationassociated asymmetries between the substitution rates of the two DNA strands induce, in the limit of small asymmetries, a nucleotide compositional skew proportional to the replication fork polarity (the average direction of a locus' replication). Models of DNA composition evolution are usually written in the form of an autonomous and homogeneous system of first-order differential equations [54]: where X (t)~t(T(t),A(t),G(t),C(t)) is the vector which represents the state of the system, i.e. for i[fT,A,G,Cg, X i (t) is the frequency of i at time t, and for i,j[fT,A,G,Cg, M ij is the substitution rate of j?i. A general and well-known property of a Markov process like Equation (3) is that X (t) tends exponentially towards the equilibrium value According to our previous studies of the skew S in mammalian genomes [16][17][18][19][20][29][30][31], we can reasonably suppose that replication and transcription are the main mechanisms responsible for where M 0 is a substitution rate matrix satisfying the no-strand bias conditions (M 0~ M M 0 ), M R is the substitution rate matrix associated with replication and p (z) (resp. p ({) ) the proportion of forks replicating the region of interest in the 5'?3' (resp. 3'?5') direction. M can be easily decomposed into a symmetric part: and an antisymmetric part: which turns out to be proportional to the fork polarity: Under the assumption that M a R is significantly smaller than M 0 zM s R , namely we can use perturbation theory to solve Equation (3) and to show that if the compositional skews: are initially null (S TA (t~0)~S GC (t~0)~0), then the total skew will be proportional to the fork polarity p at all times t up to terms of order e 2 (Equation (8)): where s(t) is a function that depends only on M 0 and M R . Using the mean nucleotide substitution rate matrix M R computed in the intergenic regions on each side (300 kb windows) of the S-upward jumps [27], the coefficients of M a R were found to be much smaller than those of M 0 zM s R with e*10 {1 {10 {2 (Supplementary Text S1). Thus, according to Equation (10), the observed linear decrease of the skew S in N-domains from positive (5' end) to negative (3' end) values likely reflects the progressive linear decrease of the replication fork polarity with a change of sign in the middle of the skew N-domains. These results provide strong support to the interpretation of skew N-domains (Fig. 1A) as independent replication units in germline cells.
Determining the replication fork polarity from replication timing data As previously pointed out in [52], the derivative of the replication timing profile does not provide a direct estimator of the replication fork velocity as it also depends on the fork polarity. Here, we demonstrate that the replication fork polarity can be directly K562 replication U-domains only, for four U-domain size categories: Lv0.8 Mb, 0.8 MbvLv1.2 Mb, 1.2 MbvLv1.8 Mb and 1.8 MbvLv3 Mb (from light to dark red). (C) Ratio of the number of interactions between two 100 kb loci inside the same U-domain at equal distance from its center and the number of interactions between loci on opposite sides and equal distance from a U-domain border, versus the distance between them; colors as in (B). doi:10.1371/journal.pcbi.1002443.g004 deduced from replication timing data under the central hypothesis that the replication fork speed v is constant and that replication is bidirectional from each origin. For a given cell cycle, let n be the number of activated origins, x 1 v Á Á Á vx n their positions along the genome and t 1 , Á Á Á ,t n their initiation times. Then the configuration C~O 1 Á Á Á O n~( x 1 ,t 1 ) Á Á Á (x n ,t n ) (where and when the origins of replication fire during the S-phase) completely specifies the spatiotemporal replication program (Fig. 6) [51,52]. If we denote O i \O iz1 the event ''the fork coming form O i meets the fork coming from O iz1 '' whose space-time coordinates are: then the replication timing and fork orientation (p C (x)~+1) at spatial position x[½x i{1\i ,x i\iz1 are given by (Fig. 6): We clearly see that since d(Dx{x i D)=dx~sign(x{x i ) then the fork orientation is equal to v times the derivative of the replication timing: Under the hypothesis of constant fork velocity v, this relationship holds in whole generality in each cell cycle and at every locus x without any specific asumption on the distribution of initiation events. By definition, the replication fork polarity is the population average over cell cycles of the fork orientation: p(x)~Sp C (x)T Cells . Hence, when averaging over cell cycles, Equation (13) yields: where we have used the fact that the spatial derivative commutes with the population average and that by definition MRT(x)S t C (x)T Cells . The replication fork polarity therefore provides a direct link between the skew S and the derivative of the MRT (Equations (10) and (14)) in germline cells.

Sequence and annotation data
Sequence and annotation data were retrieved from the Genome Browsers of the University of California Santa Cruz (UCSC) [60].  (11). The replication timing and fork orientation at the spatial position x[½x i{1\i ,x i\iz1 are given by Equation (12) from which we deduce the relationship p C (x)~vdt C (x)=dx and in turn Equation (14) for the replication fork polarity and the derivative of the MRT. In this picture of the spatio-temporal replication program, the replication fork velocity v is assumed to be constant and replication is bidirectional from each origin. doi:10.1371/journal.pcbi.1002443.g006 Analyses were performed using the human genome assembly of March 2006 (NCBI36 or hg18). As human gene coordinates, we used the UCSC Known Genes table. When several genes presenting the same orientation overlapped, they were merged into one gene whose coordinates corresponded to the union of all the overlapping gene coordinates, resulting in 23818 distinct genes. We used CpG islands (CGIs) annotation provided in UCSC table ''cpgIslandExt''.

Replication N-domains
The coordinates of the 678 human replication N-domains for assembly NCBI35/hg17 were obtained from the authors [19] and mapped using LiftOver to hg18 coordinates; we kept only the 663 N-domains that had the same size after conversion.

Determining mean replication timing profiles
We determined the mean replication timing profiles along the complete human genome using Repli-Seq data [23,26] (Supplementary Text S1, and Supplementary Fig. S14). For embryonic stem cell line (BG02), three lymphoblastoid cell lines (GM06990, H0287, TL010), a fibroblast cell line (BJ, replicates R1 and R2), and erythroid K562 cell line, Repli-Seq tags for 6 FACS fractions were downloaded from the NCBI SRA website (Studies accession: SPR0013933) [26]. For the HeLa cell line we computed the mean replication timing (MRT) instead of computing the S50 (median replication timing) as in [23].

Detection of U-domains along mean replication timing profiles
We developed a segmentation method of the MRT profile into Udomains based on the continuous wavelet transform. This method amounts to perform objective (U-) pattern recognition in 1D signals where the U-motif is picked out from the background signal variations (Supplementary Text S1, and Supplementary Fig. S15).

Correlation analysis
For the analysis of correlations, we reported the Pearson's product moment correlation coefficient R and the associated Pvalue for no association (R~0). All statistical computations were performed using the R software (http://www.r-project.org/).
We plotted the coverage by DNase Hypersentive Sites (DHSs) identified as signal peaks at a false discovery rate threshold of 0.5% within hypersensitive zones delineated using the HotSpot algorithm (''wgEncodeUwDnaseSeqPeaks'' tables). When several replicates were available, data were merged.

Genome-wide maps of Pol II and CTCF binding
We used ChIP-seq data using antibody for Pol II and CTCF from Release 3 (Mar 2010) of the ENCODE Open Chromatin track [11,61]. Data were downloaded from the UCSC FTP site: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/ wgEncodeChromatinMap.
We plotted coverage by regions of enriched signal in ChIP experiments, called based on signals created using F-Seq [62] (''wgEncodeUtaChIPseqPeaks'' tables). Significant regions were determined at an approximately 95% sensitivity level. We always used the most recent version of data.

Whole genome chromatin conformation data
We used the spatial proximity maps of the human genome generated using Hi-C method [38]. We downloaded 100 kb resolution maps for GM06990 and K562 cell lines from the GEO web site (GSE18199_binned_heatmaps): http://www.ncbi.nlm. nih.gov/geo/query/acc.cgi?acc = GSE18199.

Chromatin fiber density data
Open over input chromatin ratio data from human lymphobastoid cells were obtained from the authors [41].

Data availability
Coordinates of N-domains and U-domains in the investigated 7 cell lines can be downloaded from: http://perso.enslyon.fr/benjamin.audit/ReplicationDomainsPLoSComputBiol 2012/. (Equation (S7)) values are color coded using green (resp. orange) shades for negative (resp. positive) curvature (note that MRT axis is going downwards). Horizontal dashed line marks scale 300 kb used to detect regions of preferential replication initiation (vertical lines). Pairs of horizontal bars delineate the scale range where strong negative curvature is expected for parabolic U-shaped MRT profile. Regions delineated by two successive regions of preferential replication initiation are kept as U-domain if T MRT g (2) ƒ{0:04 at their midpoint for some scale value in this range. (PDF)

Table S2
Number of matchings between replication timing Udomains in different pairs of cell lines including skew N-domains in the germline. A U-domain in a given cell line (column) was considered as matching a U-domain in another cell line (row) if more than 80% nucleotides of each of these U-domains were common to the two domains. (PDF)

Table S3
Number of matchings between randomly re-positioned replication timing U-domains in different pairs of cell lines including skew N-domains in the germline (1000 simulations were used to obtain the mean values). A U-domain in a given cell line (column) was considered as matching a U-domain in another cell line (row) if more than 80% nucleotides of each of these Udomains were common to the two domains. (PDF)

Table S4
Percentage of matchings between randomly repositioned replication timing U-domains in different pairs of cell lines including skew N-domains in the germline (1000 simulations were used to obtain the mean values). A U-domain in a given cell line (column) was considered as matching a U-domain in another cell line (row) if more than 80% nucleotides of each of these Udomains were common to the two domains. (PDF) Text S1 Supplementary methods: (i) Substitution rate matrix associated to replication (ii) Determination of mean replication timing profiles from experimental data and (iii) Detection of Udomains along mean replication timing profiles. (PDF)