The authors have declared that no competing interests exist.
Conceived and designed the experiments: DJG GM YG JKP. Performed the experiments: AAP YNF-M KM NL. Analyzed the data: GM DJG. Wrote the paper: GM DJG JKP YG. Developed the paired-end MNase-seq protocol: JW YNF-M. Provided valuable advice at the onset of the project: JW. Supervised the project: YG JKP.
Nucleosomes are important for gene regulation because their arrangement on the genome can control which proteins bind to DNA. Currently, few human nucleosomes are thought to be consistently positioned across cells; however, this has been difficult to assess due to the limited resolution of existing data. We performed paired-end sequencing of micrococcal nuclease-digested chromatin (MNase-seq) from seven lymphoblastoid cell lines and mapped over 3.6 billion MNase-seq fragments to the human genome to create the highest-resolution map of nucleosome occupancy to date in a human cell type. In contrast to previous results, we find that most nucleosomes have more consistent positioning than expected by chance and a substantial fraction (8.7%) of nucleosomes have moderate to strong positioning. In aggregate, nucleosome sequences have 10 bp periodic patterns in dinucleotide frequency and DNase I sensitivity, and, across cells, nucleosomes frequently have translational offsets that are multiples of 10 bp. We estimate that almost half of the genome contains regularly spaced arrays of nucleosomes, which are enriched in active chromatin domains. Single nucleotide polymorphisms that reduce DNase I sensitivity can disrupt the phasing of nucleosome arrays, which indicates that these arrays often result from positioning against a barrier formed by other proteins. However, nucleosome arrays can also be created by DNA sequence alone. The most striking example is an array of over 400 nucleosomes on chromosome 12 that is created by tandem repetition of sequences with strong positioning properties. In summary, a large fraction of nucleosomes are consistently positioned: in some regions because they adopt favored sequence positions, and in other regions because they are forced into specific arrangements by chromatin remodeling or DNA binding proteins.
Within the nucleus of the cell, the genome of eukaryotic organisms is tightly packaged into chromatin. Chromatin is composed of a repeating series of bead-like nucleosomes, each of which is encircled 1.7 times by a string of DNA. The organization of nucleosomes on the genome is fundamentally important because they can prevent other proteins from accessing the DNA. Previous studies of human nucleosomes concluded that most nucleosomes have fuzzy positioning and tend to occupy different locations in different cells. This interpretation, however, may be a consequence of the low resolution of existing data. Here we revisit the question of nucleosome positioning by generating the most precise map of nucleosome positions that has ever been created for a human cell line. We find that 8.7% of nucleosomes have very consistent positioning, and most nucleosomes are more consistently positioned than expected by chance. Additionally, we estimate that almost half of the genome contains regularly spaced arrays of nucleosomes. Much of this positioning is due to the intrinsic preference of nucleosomes for some DNA sequences over others; but in some regions of the genome, the sequence preferences of nucleosomes are overridden by proteins that out-compete them for binding or displace them using energy from ATP.
In eukaryotes, the genome is organized into a compact protein-DNA complex known as chromatin which, at the most fundamental level, consists of a repeating series of nucleosome core “beads” separated by linker DNA “strings”
Nucleosome organization is described by the translational and rotational positions of nucleosomes
The positioning of nucleosomes is at least partly encoded by the genome, because some DNA sequences energetically favor nucleosome formation more than others
Nucleosome organization is also influenced by barriers that exclude nucleosomes from stretches of sequence
The importance of sequence preferences for nucleosome positioning is controversial, and some authors have argued that the nucleosomes with the strongest positioning are usually directed by cellular machinery such as RNA polymerase, chromatin remodelers, or transcription factors
To overcome these limitations and assess the strength of nucleosome positioning in the human genome, we performed paired-end MNase-seq on seven human lymphoblastoid cell lines (LCLs) derived from HapMap individuals. LCLs are an ideal model system for this problem because they have been extensively characterized by ENCODE
To determine the genomic positions of nucleosomes in seven human lymphoblastoid cell lines (LCLs), we combined micrococcal nuclease (MNase) digestion of chromatin with high-throughput paired-end and single-end DNA sequencing (MNase-seq). MNase preferentially cuts within linker DNA
To study the rotational and fine-scale translational positioning of human nucleosome sequences, we restricted ourselves to 134 million fragments of length 147 bp. Fragments that are substantially longer or shorter than this may result from under- or over-digestion of nucleosomal DNA, respectively, and provide less precise estimates of individual nucleosome positions. A major advantage of paired-end sequencing is that large and small fragments can be filtered out, so that the location of nucleosome dyads can be determined much more precisely.
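The fragment-length filter and dyad estimate described above can be illustrated with a minimal Python sketch. It assumes that each mapped read pair has already been reduced to the 1-based, inclusive coordinates of its leftmost and rightmost base; this input format and the function name `fragment_midpoints` are hypothetical and are not taken from the authors' pipeline.

```python
def fragment_midpoints(pairs, min_len=147, max_len=147):
    """Return midpoints of fragments whose length lies in [min_len, max_len].

    `pairs` is an iterable of (start, end) tuples giving the 1-based,
    inclusive coordinates of the leftmost and rightmost mapped base of a
    read pair (an assumed input format, for illustration only).
    """
    midpoints = []
    for start, end in pairs:
        length = end - start + 1
        if min_len <= length <= max_len:
            # For an odd length such as 147 bp the midpoint is a single
            # base; for even lengths this takes the left-central base.
            midpoints.append(start + (length - 1) // 2)
    return midpoints
```

With single-end reads the fragment length is unknown, so the midpoint would have to be guessed from an assumed fragment size; with paired-end reads it follows directly from the two mapped ends.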
We examined the nucleotide composition of the 147 bp fragments, and found clear 10 base pair periodicities in all 16 dinucleotides (
A. Frequencies of AA/AT/TA/TT and CC/CG/GC/GG dinucleotides across nucleosome sequences normalized by expected dinucleotide frequencies (log2 ratio). Expected frequencies were taken from a set of simulated fragments, which were sampled such that they had the same MNase cutting bias as the observed fragments. B. DNase I cut rates across nucleosome sequences normalized by the expected cut rates (log2 ratio). Expected DNase I cut frequencies were estimated from the composition of all observed DNase I cut sites in the human genome. C. MNase-seq fragment midpoints from 3 cell lines. Expected midpoint frequencies were estimated from the same simulated fragments used in A.
We investigated the sensitivity of nucleosome sequences to nicking by the DNase I enzyme using 3.0 billion experimentally identified DNase I nick sites
Unexpectedly, the 10 bp periodicity in DNase I sensitivity extends beyond the putative nucleosome core region into the adjacent linker sequences. This suggests that nucleosomes in different cells often have translational offsets that are multiples of 10 bp. This would maintain their rotational positioning and result in a longer periodic pattern of DNase I sensitivity in aggregate. To look for further evidence of 10 bp offsets in translational positioning, we examined how other 147 bp MNase fragment midpoints are distributed around observed fragment midpoints. To avoid artifacts introduced by the MNase-seq protocol (such as duplicate sequences introduced by amplification), we ascertained midpoints using four cell lines, and examined the distribution of midpoints in the surrounding region using three other cell lines. This procedure reveals a striking periodic pattern, in which nucleosomes are much more likely to be positioned at “in-phase” distances that are multiples of 10 bp from the ascertained dyad (
To quantify the translational positioning of nucleosomes we calculated positioning scores for one million randomly sampled 200 bp regions. We define the positioning score for a particular site as the fraction of nearby midpoints (within 100 bp) that are within 15 bp of the site. The score for a given region is then the maximum score across all sites in the region. For the same regions we also calculated positioning scores using simulated midpoints (
A. Distribution of nucleosome positioning scores from a random sample of one million 200 bp regions (smoothed using a Gaussian kernel with bandwidth 0.01). Scores were also calculated in the same regions using midpoints from non-duplicate read pairs and from simulated read pairs. B. Distribution of nucleosome array log likelihood ratios (LLRs) for 23,763 randomly sampled 1 kb regions (smoothed using a Gaussian kernel with bandwidth 1.0). LLRs were also calculated using midpoints from simulated reads and using permuted versions of the same regions. C. Heatmap of MNase midpoints in the randomly sampled regions from B, prior to their alignment. D. Heatmap of MNase midpoints from panel C, after their alignment. Regions were aligned according to the most likely position of the central nucleosome. E. Heatmap of aligned MNase midpoints for permuted regions. Heatmaps in C, D, and E are ordered by the LLR of the observed midpoints.
While most nucleosomes have non-random positioning, the majority of translational positioning is weak. Using arbitrary score thresholds, we estimate that 81% of human nucleosomes have weak positioning (score between 0.3 and 0.5), 8.4% have moderately strong positioning (score between 0.5 and 0.7), and 0.3% have very strong positioning (score>0.7). This is a large enrichment over simulated midpoints where only 27%, 0.06% and 0.03% of regions meet these criteria (
We next sought to establish how much of the genome contains consistently positioned arrays of nucleosomes. We randomly sampled 100,000 genomic segments each 1 kb long and removed those where fewer than 80% of the sites were uniquely mappable. We split the remaining 47,528 regions into training and test data sets and estimated parameters for a probabilistic nucleosome array model from the training data. The array model specifies the probability of observing an MNase fragment midpoint at each position of an 879 bp “template” spanning 5 nucleosomes (
As our LLR statistic compares a template model to a uniform model, some fraction of the significant regions may have non-uniform patterns that do not closely resemble nucleosome arrays. It is visually clear, however, that many of the high scoring regions contain evenly spaced arrays of nucleosomes, even those with an LLR close to the 1% FDR threshold (
To better understand what types of genomic regions contain consistently positioned nucleosome arrays we estimated the overlap of the significant arrays (LLR>27.8; FDR<1%) with chromatin states identified from histone marks in human LCLs
To investigate the contribution of DNA sequence to nucleosome organization within each of these chromatin states, we compared observed nucleosome occupancies to those predicted by an
We also performed a scan for regions containing extremely strongly positioned arrays by sliding the nucleosome array template across the genome in 5 bp steps. Windows with few mapped fragments or low mappability were removed, and overlapping windows with likelihood ratios greater than 50 were merged. This genome-wide scan revealed many striking examples of regularly spaced, consistently positioned nucleosomes (
A. MNase midpoint density (smoothed using a 30 bp sliding window) across a 76 kb region near the chromosome 12 centromere. This region contains an array of ∼400 nucleosomes with regular, consistent positioning. B. A small 10 kb subsection of the larger nucleosome array. Predicted nucleosome occupancy from the
We observed a particularly extreme example of sequence-directed nucleosome positioning near the centromere of chromosome 12. This region spans ∼76 kb and contains over 400 consistently positioned nucleosomes in a single array (
We examined the positioning of nucleosomes around transcription factor binding sites, using publicly available chromatin immunoprecipitation followed by sequencing (ChIP-seq) data for 35 different transcription factors in LCLs
ChIP-seq peaks with strongly positioned flanking arrays are more sensitive to DNase I digestion and have far more pronounced DNase I footprints (
A. Heatmaps of MNase midpoints (columns 1–2) and DNase I cuts (column 3) surrounding 1000 randomly sampled ChIP-seq peaks for CTCF, NF-kB, Irf4, GABP and C-fos. Heatmap rows are ordered from top to bottom by the nucleosome array log likelihood ratio (LLR). Columns 2 and 3 are aligned according to the most likely location of the upstream and downstream arrays of positioned nucleosomes. B. Aggregate MNase midpoint and DNase I cutsite depths across all regions and for the subset of regions with LLR>500.
A. Mean MNase midpoint depth around ChIP-seq peak summits, aggregated across 5 transcription factors (CTCF, NF-kB, Irf4, C-fos and GABP). Regions are aligned such that the estimated locations of the +1 nucleosome, the −1 nucleosome and the midpoint between the nucleosomes are at the same position. Segments that have data from less than 50% of the ChIP-seq peaks (because of the variable spacing between nucleosomes) are omitted. Regions are stratified into ChIP-seq read depth quintiles (higher quintiles indicate higher transcription factor occupancy). B. Predicted nucleosome occupancy from an
In the regions surrounding transcription factor binding sites, predicted occupancy is correlated with observed nucleosome occupancy (
The above results argue that nucleosome positions are guided by sequence preferences, which are frequently overridden by barriers that are sensitive to DNase I digestion. As a direct test of this hypothesis, we asked whether DNA sequence differences that affect DNase I sensitivity also affect nucleosome positions, using a set of 7088 DNase I sensitivity quantitative trait loci (dsQTLs)
For each dsQTL we classified each cell line as homozygous sensitive, heterozygous or homozygous insensitive, using the genotype of the associated SNP. We then examined the nucleosome organization of each genotype class by aggregating MNase-seq midpoints across dsQTLs. Regions that are homozygous for the sensitive genotype are flanked by arrays of positioned nucleosomes, consistent with those around ChIP-seq peaks (
Data are aggregated across dsQTLs and are scaled by the total number of sequenced reads. The DNase-seq data are from 70 individuals and the MNase-seq data are from 7 individuals. This plot was created using a subset of dsQTLs (n = 1101) that have a narrow region of DNase I sensitivity (below the median) and a large difference in sensitivity between genotypes (above the median). The complete set of filtered dsQTLs shows the same trend (
Previous studies have found little evidence for consistent positioning of human nucleosomes
At a fine scale, nucleosomes are often found at alternate “minor” translational positions that are multiples of 10 bp away from their most frequent “major” position. These alternate positions preserve the rotational positioning of the nucleosome on the DNA and are likely to be energetically favored because they retain phase with the periodic nucleosome sequence preferences. Similar offsets in nucleosome positions have been observed in 5S rDNA
At a broad scale, nucleosomes are often found in consistently positioned, regularly spaced arrays, which are enriched in insulators, promoters, and enhancers. These arrays frequently flank transcription factor binding sites and are strongest, in aggregate, when DNase I sensitivity and transcription factor occupancy are highest. Additionally, the consistent phasing of nucleosomes flanking DNase I sensitive sites is disrupted by single nucleotide polymorphisms that reduce DNase I sensitivity. This is strong evidence that the positioning of nucleosomes in these regions depends upon a barrier that is created by the binding of non-histone proteins to the DNA. A single nucleotide difference is unlikely to substantially change the affinity of a single nucleosome for a sequence, let alone shift the positions of multiple nucleosomes across a region spanning several thousand bases.
An interesting question is whether a barrier alone is sufficient to create arrays of regularly spaced nucleosomes. Recent results in yeast suggest that this may not be the case. While a minimal
While the sequence preferences of nucleosomes are often overridden by other factors in functional regions of the genome, the abundance of nucleosomes with consistent rotational and translational positioning suggests that sequence preferences may play an important role in gene regulation. In particular, sequence-directed organization of nucleosomes may determine whether pioneer transcription factors that recruit chromatin remodelers can bind in the first place.
We studied
A 1 ml aliquot of nuclei was digested with 7 µl of 50 U/µl MNase at 37°C for 12 minutes. The reaction was stopped by addition of EDTA, SDS and NaCl to an end concentration of 0.01 M, 2% and 0.2 M respectively. Reactions were digested with RNaseA (0.1 mg) for 1 hr at 42°C and further treated with ProteinaseK at 37°C for one hour. DNA was extracted using phenol-chloroform extraction and concentrated by ethanol precipitation. DNA was then run on a 3.3% Nusieve agarose gel at 75 V for 5 hours. 147 bp fragments representing the mononucleosomes were excised from the gel and DNA was extracted from the gel by crushing the gel and soaking in soak buffer (300 mM Sodium Acetate, 1 mM EDTA, 0.1% SDS). The resulting DNA fragments in solution were then purified using a Qiagen PCR purification kit.
Our seven libraries of nucleosome fragments were prepared for paired-end sequencing using the standard Illumina protocol. Libraries were sequenced for a total of 36 or 50 cycles (18 bp or 25 bp for each end of the fragment) on either an Illumina Genome Analyzer II or Illumina HiSeq machine. For one flow cell only 25 bp single-end reads were generated due to a problem with the adaptor sequences. We retained the single-end reads for analysis, but also re-sequenced the two affected libraries to obtain a complete complement of paired-end data (
We mapped reads to the hg18 assembly of the human genome using BWA (with default arguments)
To estimate the number of mapped MNase-seq fragments per nucleosome, we assumed a genome size of 3 billion bases with one nucleosome every 200 bp.
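As a worked example of this estimate, the following Python sketch applies the stated assumptions (a 3 billion base genome with one nucleosome every 200 bp) to a fragment count. The count of 3.6 billion is the approximate total quoted in the abstract ("over 3.6 billion") and is used here only for illustration; the per-library counts after filtering would give different values.

```python
GENOME_SIZE_BP = 3_000_000_000   # assumed genome size stated in the text
NUCLEOSOME_SPACING_BP = 200      # assumed one nucleosome every 200 bp


def fragments_per_nucleosome(n_fragments,
                             genome_size=GENOME_SIZE_BP,
                             spacing=NUCLEOSOME_SPACING_BP):
    """Mean number of mapped fragments per nucleosome under the assumptions above."""
    n_nucleosomes = genome_size / spacing  # 15 million nucleosomes by default
    return n_fragments / n_nucleosomes


# Illustrative input: roughly 3.6 billion mapped fragments in total.
coverage = fragments_per_nucleosome(3_600_000_000)
```

Under these assumptions the genome holds about 15 million nucleosomes, so 3.6 billion fragments correspond to roughly 240 fragments per nucleosome.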
A complete summary of the sequenced libraries is provided in
To examine the fine scale properties of nucleosomes, we restricted our analysis to the 134 million mapped MNase-seq fragments of size 147 bp. To generate
MNase has a strong sequence specificity that could bias the positions of nucleosomes inferred using MNase-seq
To correct for MNase cutting bias in
To correct for bias in DNase I nicking, we counted occurrences of 6-mers at observed DNase I nick sites. The 6-mers were extracted from positions −3 to +3 around each nick, which have the strongest compositional bias (
To quantify the consistency of nucleosome positioning, we calculated positioning scores for a sample of one million 200 bp genomic regions. We only sampled from regions where at least 80% of the bases are uniquely mappable (defined by the wgEncodeDukeUniqueness24bp track from the UCSC genome browser) and excluded regions that overlapped segments with excessive 1000 Genomes read depth
We defined the positioning score,
As a control, we calculated positioning scores for midpoints from the simulated set of 142–152 bp fragments. To avoid amplification artifacts we also computed scores after conservatively removing duplicate read pairs from each MNase-seq library.
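The positioning score can be sketched in Python using the definition given in the Results: the score of a site is the fraction of midpoints within 100 bp of the site that lie within 15 bp of it, and the score of a region is the maximum over its sites. The function names, the inclusive treatment of the distance bounds, and the score of 0 for sites with no nearby midpoints are assumptions made for this illustration.

```python
from bisect import bisect_left, bisect_right


def site_score(sorted_midpoints, site, inner=15, outer=100):
    """Fraction of midpoints within `outer` bp of `site` that lie within `inner` bp."""
    def count_within(radius):
        lo = bisect_left(sorted_midpoints, site - radius)
        hi = bisect_right(sorted_midpoints, site + radius)
        return hi - lo

    nearby = count_within(outer)
    if nearby == 0:
        return 0.0  # assumption: sites with no nearby midpoints score 0
    return count_within(inner) / nearby


def region_score(midpoints, region_start, region_end, inner=15, outer=100):
    """Maximum site score over all sites in [region_start, region_end]."""
    sorted_midpoints = sorted(midpoints)
    return max(site_score(sorted_midpoints, s, inner, outer)
               for s in range(region_start, region_end + 1))
```

With these inclusive bounds, midpoints spread uniformly at one per base give a score of 31/201 (about 0.15), whereas a region in which every midpoint falls on the same base scores 1.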
To calculate a FDR for positioning scores, we assigned each sampled region an empirical
We searched for well-ordered nucleosome arrays using a sliding window approach. In each window, we performed a likelihood ratio test that compared a model of a uniform distribution of MNase midpoints in the window (expected under no positioning) to an array model, where MNase midpoints are highly ordered into successive peaks and troughs. We modeled the spatial distribution of midpoints in the window as a multinomial distribution, such that the likelihood of the midpoints in a window of
To estimate the fraction of the genome that contained nucleosome arrays, we selected 1 kb windows at random from the genome. For each window we slid a symmetric template over the central 200 bp of the window in 1 bp increments, computing an LLR at each step. Our procedure started with the template midpoint at a position 100 bp upstream of the window midpoint and ended with the template midpoint 100 bp downstream of the window midpoint. For each template we used all data in the 1 kb region to compute the LLR.
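A minimal Python sketch of this kind of template scan is given below. It is an illustration under stated assumptions rather than the authors' implementation: the template is any list of strictly positive probabilities, the multinomial coefficient is dropped because it cancels in the ratio, and, where the template extends past the window, both models are restricted to the overlapping sites and the template is renormalized over them. How the original method treats sites outside the template is not specified here, so that handling is hypothetical.

```python
import math


def array_llr(counts, template, center):
    """Log likelihood ratio of a template model versus a uniform model.

    counts[i] is the number of midpoints at window position i, and
    `template` holds strictly positive per-site probabilities. The
    template is placed with its middle element at `center`.
    """
    half = len(template) // 2
    # Keep only template sites that fall inside the window (assumption).
    sites = [(center - half + j, p) for j, p in enumerate(template)
             if 0 <= center - half + j < len(counts)]
    total_p = sum(p for _, p in sites)
    uniform_logp = -math.log(len(sites))
    llr = 0.0
    for i, p in sites:
        if counts[i]:
            llr += counts[i] * (math.log(p / total_p) - uniform_logp)
    return llr


def best_alignment(counts, template, max_shift=100):
    """Slide the template over the central sites in 1 bp steps; return (center, LLR)."""
    mid = len(counts) // 2
    centers = range(mid - max_shift, mid + max_shift + 1)
    best = max(centers, key=lambda c: array_llr(counts, template, c))
    return best, array_llr(counts, template, best)
```

In this sketch, a positive LLR means the midpoints concentrate where the template places high probability, and evenly spread midpoints give a negative LLR.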
To align MNase data flanking ChIP-seq peaks we extracted MNase data from 2 kb windows flanking each ChIP-seq peak summit. To identify arrays of nucleosomes upstream and downstream of the peak summit we used two 1 kb nucleosome array templates separated by a nucleosome free region with a size of up to 200 bp. The probability of observing nucleosome midpoints in the nucleosome free region was assumed to be uniform with a rate equal to the mean number of midpoints per site in the region. For each window we estimated the size of the central nucleosome free region to be the one that gave the maximum likelihood. We then aligned all of the regions on the edges of their nucleosome free regions (
We investigated the performance of our array searching method in two ways. First, to assess the effect of mappability and MNase digestion biases we searched for nucleosome arrays in the simulated read data set described above. We extracted simulated reads for each of the 23,763 random 1 kb regions and performed the same array search procedure as for the real data. The distribution of LLRs from simulated read data is shown in
To examine aggregate MNase midpoints and DNase I sensitivity in the chromosome 12 array region, we first identified locations of nucleosomes using the following procedure. We identified contiguous regions with nucleosome positioning scores>0.4 as ‘peak regions’ and labeled the position with the maximum score in each region as the peak. We discarded peaks where the score at the peak was less than 0.5, and when multiple positions tied we chose the one closest to the midpoint of the region. Using this method we identified 403 putative nucleosomes within the nucleosome array region (chr12:34,376,000–34,452,000), and used these to construct the aggregate plot shown in
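The peak-calling procedure described above can be sketched in Python as follows. The input is assumed to be a list of per-site positioning scores for the region of interest. The function name is hypothetical, and where two tied positions are equally close to the region midpoint the sketch arbitrarily keeps the leftmost, a case the text does not address.

```python
def call_nucleosome_peaks(scores, region_threshold=0.4, peak_threshold=0.5):
    """Return indices of putative nucleosome peaks from per-site scores.

    Contiguous runs of sites with score > region_threshold form 'peak
    regions'. The peak is the site with the maximum score in the run;
    ties are resolved in favor of the site closest to the run midpoint.
    Runs whose maximum score is below peak_threshold are discarded.
    """
    peaks = []
    n = len(scores)
    i = 0
    while i < n:
        if scores[i] > region_threshold:
            j = i
            while j + 1 < n and scores[j + 1] > region_threshold:
                j += 1
            best = max(scores[i:j + 1])
            if best >= peak_threshold:
                center = (i + j) / 2
                tied = [k for k in range(i, j + 1) if scores[k] == best]
                peaks.append(min(tied, key=lambda k: abs(k - center)))
            i = j + 1
        else:
            i += 1
    return peaks
```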
We downloaded publicly available ChIP-seq data for 40 transcription factors that were generated for the ENCODE consortium by the Bernstein, Myers and Snyder groups
We obtained a list of 7088 DNase I sensitivity quantitative trait loci (dsQTLs) from
To more precisely identify the DNase I sensitive region within each dsQTL, we combined DNase I nick counts from homozygous sensitive and heterozygous cell lines and smoothed them with a 101 bp sliding window. We used the smoothed values to define a “peak” and “sensitive region”. We defined the “peak” as the site with the maximum value within 200 bp of the dsQTL's midpoint and defined the “sensitive region” as the block of contiguous sites around each peak where values exceeded 1/2 the peak value.
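The smoothing and peak definition above can be illustrated with a short Python sketch. It assumes that `nicks` is a list of combined DNase I nick counts per site and that `midpoint` is the index of the dsQTL midpoint within that list; the function names and the truncation of the sliding window at the ends of the list are assumptions made for this example.

```python
def sliding_mean(values, window=101):
    """Centered sliding-window mean; windows are truncated at the list ends."""
    half = window // 2
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


def peak_and_sensitive_region(nicks, midpoint, window=101, search=200):
    """Return (peak, (left, right)) for the DNase I sensitive region.

    The peak is the site with the maximum smoothed value within `search`
    bp of `midpoint`. The sensitive region is the contiguous block around
    the peak where smoothed values exceed half the peak value.
    """
    smoothed = sliding_mean(nicks, window)
    lo = max(0, midpoint - search)
    hi = min(len(nicks) - 1, midpoint + search)
    peak = max(range(lo, hi + 1), key=lambda i: smoothed[i])
    half_max = smoothed[peak] / 2
    left = peak
    while left > 0 and smoothed[left - 1] > half_max:
        left -= 1
    right = peak
    while right < len(smoothed) - 1 and smoothed[right + 1] > half_max:
        right += 1
    return peak, (left, right)
```

The returned edges can then be compared with the dsQTL midpoint, for example to check how far the sensitive region extends from it.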
We then filtered the dsQTLs as follows. First, we discarded dsQTLs where the edge of the sensitive region was more than 100 bp from the dsQTL's midpoint (n = 396). Second, we discarded dsQTLs where the sensitive region overlapped one from another nearby dsQTL (n = 1463). Finally, we discarded dsQTLs where the DNase I values in the peak region were inconsistent with the original, broader dsQTL region. We considered the DNase I values to be inconsistent with the dsQTL if the mean value for the heterozygote was greater than that of the homozygous sensitive genotype, or if the mean value for the homozygous insensitive genotype was greater than that of either the heterozygous or the homozygous sensitive genotype. In total, 911 dsQTLs were discarded by this last criterion, leaving 4318 of the original 7088 for analysis.
To examine the nucleosome organization in the remaining dsQTLs, each region was centered on the midpoint of the identified sensitive region and MNase midpoints were aggregated across regions separately for each of the three possible genotype classes.
All MNase-seq data are deposited in GEO under accession number GSE36979 and are available at
Distribution of MNase-seq fragment sizes. Fragment sizes for each paired-end sequencing library were inferred from the separation of read pairs, which were mapped to the human genome using BWA.
(TIF)
Dinucleotide frequencies for all 16 dinucleotides computed using 147 bp fragments from paired-end MNase-seq. Unlike
(TIF)
Dinucleotide frequencies for all 16 dinucleotides computed using 147 bp fragments from a single MNase-seq library for cell line GM19193. This cell line was under-digested compared to the other cell lines (
(TIF)
Dinucleotide frequencies computed from single-end MNase-seq reads. While 10 bp periodic patterns are still visible in the dinucleotide composition, they are greatly attenuated compared to those obtained from paired-end reads as shown in Figures S2 and S3.
(TIF)
Scatterplot showing the relationship between positioning scores and the stringency metric of Valouev
(TIF)
Distribution of stringency values for 200 bp windows. Stringency values were calculated for 805,477 randomly sampled regions using the method described by Valouev
(TIF)
Template models for nucleosome arrays. The master template was initially derived from CTCF binding sites and then re-trained on a set of random sequence regions. The re-trained template was used to derive the 879 bp template for genome-wide searching, and the templates used to discover nucleosome arrays flanking ChIP-seq binding sites.
(TIF)
Examples of nucleosome arrays for different log-likelihood ratio (LLR) ranges. LLRs were calculated for a set of 23,763 “test” regions, as described in the main text. Each row of panels shows four 2 kb regions that were randomly sampled from test regions, within a specified LLR range (e.g., the regions in the top row have LLRs between 0 and 25). The LLRs were computed using the central 1000 bp of each region. For each region the observed MNase-seq fragment midpoints (smoothed with a 50 bp sliding window) and the predicted nucleosome occupancy from the
(TIF)
Nucleosome array LLR distributions after removing the most highly-scoring nucleosome. A. The distribution of LLRs calculated for the set of 23,763 “test” regions and in permuted data. In each region we removed the most strongly positioned nucleosome prior to LLR computation in both the real and permuted data. B. Correlation between the full array LLRs and the LLRs computed after dropping the highest scoring nucleosome.
(TIF)
Distributions of nucleosome array LLRs after permuting midpoints from all but two of the positioned nucleosomes. For each of the 23,763 “test” regions we computed a LLR for the real data and for two permuted versions of the same region. For the “random” permutation, midpoint counts were randomly shuffled over the entire region. For the “2 nucleosomes” permutation, two nucleosomes were randomly selected and excluded from the permutation, while midpoint counts were randomly permuted between the remaining sites in the region.
(TIF)
Nucleosome array enrichment by chromatin state. Chromatin states for lymphoblastoid cell lines were obtained from Ernst
(TIF)
Correlation between observed and predicted nucleosome occupancy by chromatin state. Predicted nucleosome occupancy was obtained from Kaplan
(TIF)
Aggregate MNase and DNase I as a function of distance from estimated nucleosome dyad positions in the chromosome 12 array region. Data are aggregated over 403 putative nucleosome dyads in the region spanning chr12:34,376,000–34,452,000 and smoothed with a 10 bp sliding window. Both MNase and DNase I rates were normalized by expected rates estimated from simulated midpoints and nucleotide composition.
(TIF)
Estimates of nucleosome repeat lengths. We estimated nucleosome repeat lengths using distances between MNase midpoint peaks in aggregate plots for the chromosome 12 array region (
(TIF)
Distribution of nucleosome array log likelihood ratios (LLRs) around ChIP-seq peaks. LLRs around ChIP-seq peaks were calculated using two 1000 bp regions, flanking a central region of variable width, as described in
(TIF)
Nucleosome organization in regions with an association between DNase I sensitivity and genotype (dsQTLs). This figure is as described in
(TIF)
MNase digestion bias. This figure shows the frequency of nucleotides around the ends of MNase-seq fragments that mapped to chromosome 1. K-mers from the first and last four bases of each fragment (+1 to +4) were used to correct for MNase digestion bias, as described in
(TIF)
DNaseI cutting bias. This figure shows the frequency of nucleotides around the genomic locations of 3.0 billion DNase I nick sites. K-mers spanning positions −3 to +3 were used to correct for bias in DNase I nicking, as described in
(TIF)
Distributions of nucleosome positioning scores calculated from 150 bp regions. Scores were computed from the same sample of one million regions used in
(TIF)
Summary of MNase-seq data for this study. MNase-seq reads for 7 Yoruba lymphoblastoid cell lines were generated using either an Illumina Genome Analyzer IIx or Illumina HiSeq 2000 sequencer. Read lengths were either 18 bp or 25 bp, and both single-end and paired-end reads were generated. This table summarizes the number of MNase-seq fragments that were retained following mapping and filtering for each sequencing library. The raw sequencing reads have been deposited in GEO under accession number GSE36979.
(XLS)
Percentage of nucleosomes with weak, moderate, or strong translational positioning. Positioning scores were calculated for 805,477 randomly sampled regions of 200 bp that had at least 50 midpoints. The percentages indicate the number of regions that meet the specified scoring criteria for observed midpoints, observed midpoints from non-duplicate read pairs, and simulated midpoints.
(XLS)
We would like to thank Arend Sidow, Anton Valouev, and an anonymous reviewer for their helpful comments and the members of the Pritchard, Przeworski, Stephens, and Gilad laboratories for valuable discussions.