Oligonucleotide Sequence Motifs as Nucleosome Positioning Signals

To gain a better understanding of the sequence patterns that characterize positioned nucleosomes, we first performed an analysis of the periodicities of the 256 tetranucleotides in a yeast genome-wide library of nucleosomal DNA sequences that was prepared by in vitro reconstitution. The approach entailed the identification and analysis of 24 unique tetranucleotides that were defined by 8 consensus sequences. These consensus sequences were shown to be responsible for most if not all of the tetranucleotide and dinucleotide periodicities displayed by the entire library, demonstrating that the periodicities of dinucleotides that characterize the yeast genome are, in actuality, due primarily to the 8 consensus sequences. A novel combination of experimental and bioinformatic approaches was then used to show that these tetranucleotides are important for preferred formation of nucleosomes at specific sites along DNA in vitro. These results were then compared to tetranucleotide patterns in genome-wide in vivo libraries from yeast and C. elegans in order to assess the contributions of DNA sequence in the control of nucleosome residency in the cell. These comparisons revealed striking similarities in the tetranucleotide occurrence profiles that are likely to be involved in nucleosome positioning in both in vitro and in vivo libraries, suggesting that DNA sequence is an important factor in the control of nucleosome placement in vivo. However, the strengths of the tetranucleotide periodicities were 3–4 fold higher in the in vitro as compared to the in vivo libraries, which implies that DNA sequence plays less of a role in dictating nucleosome positions in vivo. 
The results of this study have important implications for models of sequence-dependent positioning since they suggest that a defined subset of tetranucleotides is involved in preferred nucleosome occupancy and that these tetranucleotides are the major source of the dinucleotide periodicities that are characteristic of positioned nucleosomes.


Introduction
The fundamental building block of the eukaryotic chromosome is the nucleosome, which consists of 147 bp of DNA wrapped 1.65 times around an octamer of core histone proteins [reviewed in [1][2][3]]. The histone octamer has been highly conserved throughout evolution and is composed of two copies of each histone (H2A, H2B, H3, and H4). Arginine and lysine residues on the surface of the octamer interact strongly with the negatively charged phosphate backbone of DNA, ensuring that essentially any DNA sequence can be packaged into a nucleosome. However, during the past five years, large-scale sequencing approaches and microarray hybridization technology have permitted the localization of the majority of nucleosomes in the genomes of yeast, worms, flies and humans, and these genome-wide studies have revealed that a surprisingly large fraction of nucleosomes are well-ordered with respect to their positions along the chromosomes [3][4][5][6][7][8][9][10][11][12][13][14]. These results are in agreement with a large body of earlier work, which has shown that nucleosomes are distributed in a nonrandom fashion along the eukaryotic chromosome [1]. The nonrandom positioning of nucleosomes along DNA in chromatin is thought to control access to regulatory proteins and is thus considered to be of fundamental importance in the regulation of the eukaryotic genome [for reviews, see 1,15]. Consequently, elucidation of the factors that govern nucleosome positioning is required for a better understanding of genome regulation.
The positioning of nucleosomes depends on two fundamental factors. The first is DNA sequence preference, although it is uncertain what fraction of nucleosomes is positioned by sequence alone in vivo [12,13,16,17]. The second comprises epigenetic factors, including ATP-dependent chromatin remodeling factors, DNA methylation, posttranslational modification of histones and DNA-bound regulatory proteins [18][19][20][21]. It is likely that DNA sequence dictates the ground state for the ordering of nucleosome positioning, and that epigenetic factors are superimposed over this state for determination of the final architecture and function of chromatin in the cell [14,22]. The observation that the chromatin structures of most, but not all, promoters in yeast are maintained throughout the cell cycle seems to point to the importance of both the primary DNA sequence and epigenetic factors in the control of gene regulation [23].
DNA sequence is thought to direct the positioning of nucleosomes by two distinct mechanisms: the inhibition of nucleosome formation and the preferential assembly of the core particle. Homopolymeric stretches of dA:dT that are >10-20 bp in length represent an important genomic feature that serves to inhibit nucleosome formation. Early studies demonstrated that these tracts are stiff and resistant to bending forces, and this property has long been associated with their ability to inhibit nucleosome formation in vitro and in vivo [24,25]. These sequences produce gaps between nucleosomes, and the nucleosome-free regions (NFRs) are hypersensitive to nuclease probes such as DNase I. These gaps are frequently found in constitutively active promoters where they have been viewed as entry sites for the transcriptional machinery. Chromatin gaps that arise from these sequence elements have also been implicated in the control of replication, initiation, and transcription termination. In fact, the long homopolymeric dA:dT tracts have been considered to represent major determinants of nucleosome organization in all eukaryotes [reviewed in 26].
The second mechanism by which DNA directs the organization of nucleosomes involves sequences that facilitate nucleosome formation and stability and promote positioning of the histone octamer at single genomic sites. These positioning sequences were originally obtained from a wide range of eukaryotes and their viruses, and it is now suspected that they are more widespread in the eukaryotic genome than was originally anticipated [1,3]. These positioning elements frequently occur in the vicinity of promoters and enhancers, and a variety of direct functional studies provide strong emerging evidence that they directly regulate transcriptional initiation and other genomic functions as well. Recent studies have suggested that these sequences are preferentially associated with variable promoters rather than constitutively active ones, and it has been suggested that they render these promoters susceptible to epigenetic regulation [27]. However, the precise mechanism by which these elements facilitate nucleosome positioning and stability is not known.
During nucleosome formation, a relatively stiff DNA molecule is tightly wound around the histone octamer, resulting in a DNA conformation that is highly strained. According to one widely accepted view, certain dinucleotide sequence patterns along the length of nucleosomal DNA can best relieve this strain by enhancing anisotropic flexibility, and these sequences should therefore be preferentially packaged into nucleosomes. This concept was originally advanced by Trifonov and Sussman [28] and has been incorporated into a large number of models for predicting nucleosome positioning from AA/TT/TA sequence periodicities [7,16,29,30]. However, to our knowledge, there is no direct experimental evidence that dinucleotide periodicities per se are involved in dictating preferred nucleosome occupancy, and it is becoming increasingly apparent that sequence-dependent structures of DNA in solution and in the nucleosome cannot be adequately described at the level of the dinucleotide. For example, curved DNA that arises from oligonucleotide-length A-tracts arranged in a ~10 bp periodicity is preferentially packaged into nucleosomes [31][32][33][34]. It was also pointed out long ago that dinucleotide analysis represents an oversimplification of the problem since the AA/TT repeat pattern in 177 chicken erythrocyte nucleosome fragments is largely due to AAA/TTT [35]. Recent studies with synthetic DNA fragments have also shown that certain oligonucleotide sequences containing TA steps function in conferring high nucleosome affinity and positioning activity in vitro, and the two major experimentally identified core elements in this group are the tetranucleotide CTAG and the related sequence TTAA [36,37]. These sequences have been identified at the same locations in a few natural nucleosome-positioning sequences [36][37][38], but there have been no systematic computational studies aimed at describing these sequences in genomic nucleosomal DNA.
Taken together, these results seem most consistent with an oligonucleotide model for nucleosome positioning.
Proteins that bind in the DNA major groove most often utilize a direct readout strategy for the recognition of nucleotide sequences that involves hydrogen bonding between DNA bases and amino acid residues. In contrast, proteins like histones that interact in the minor groove often utilize indirect modes of recognition, which are dependent on the intrinsic shapes and mechanical properties of the DNA [39,40]. The informational content of DNA that is used for assessment of indirect readout mechanisms depends on the sequence length. The relative frequencies of A-T vs. G-C bp provide information on DNA stability, which has been used for characterization of DNA in solution and in the nucleosome [41]. However, dinucleotides of the same composition can display markedly different characteristics, as exemplified by the divergent properties of AA, AT and TA [42][43][44]. The smallest units of DNA that contain sequence information are the 16 dinucleotides, and characterizations of DNA structure at this level have yielded important insights into the factors that affect the conformational properties of DNA and its packaging into nucleosomes. However, dinucleotide steps are sensitive to their immediate sequence context, as suggested by the observation that the conformational properties of the dinucleotide YR in the tetranucleotide XYRZ are dependent on the identity of X and Z [45,46]. Several other sequences that have well-defined conformational properties at the dinucleotide level are so strongly affected by their neighbors that they behave completely differently at the tetranucleotide level [47]. For this reason, the 256 tetranucleotides have most recently been the subject of investigation, and the characterization of these units in terms of flexibility, stability and minor groove widths has now been reported [47,48].
In this study, we characterized tetranucleotides in positioned nucleosomes in genome-wide libraries from yeast and C. elegans. The overall aims were to identify tetranucleotide periodicities that are likely to be associated with the sequence-dependent positioning of nucleosomes and to compare the importance of these tetranucleotides with that of dinucleotide periodicities in the control of nucleosome placement. The results suggested that a defined subset of tetranucleotides is involved in preferred nucleosome occupancy and that these tetranucleotides are the major source of the dinucleotide periodicities that are characteristic of positioned nucleosomes.

Analysis of Tetranucleotides in the Yeast In Vitro Nucleosome Library
The yeast genome-wide library of nucleosome sequences described by Kaplan et al. [12] was used to provide a description of the tetranucleotide sequence patterns in in vitro positioned nucleosomes. The library was prepared by high-salt reconstitution methods using purified histones from chicken erythrocytes and naked high molecular weight yeast DNA at a DNA:histone mass ratio of 2.5:1. Reconstituted chromatin was then digested with micrococcal nuclease (MNase) and the nucleosome core particle DNAs were sequenced using Illumina Solexa technology. The DNA excess should select for preferred histone octamer binding sequences in the absence of epigenetic factors and minimize the possibility that a nucleosome positioned by sequence serves to position adjacent nucleosomes through a sequence-independent boundary effect.
To our knowledge, there have been no systematic characterizations of the DNA sequence periodicities in positioned nucleosomes at the tetranucleotide level. As a first step toward addressing this problem, the frequency profiles of the 256 tetranucleotides were examined in the yeast library. Fourier transform analysis of these frequency data was used to construct Table S1, which gives the periodicities, fractional variations of occurrence (FVOs), and phase angles for each tetranucleotide. The FVO represents the strength of the periodic oscillations in a frequency profile relative to the frequency average. The phase angle is used to indicate whether the minor groove of a tetranucleotide faces inward toward or away from the histone surface. In order to compare the characteristics of tetranucleotides with those of dinucleotides, Table S2 was constructed, which gives the corresponding dinucleotide parameters.
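The quantities described above can be illustrated with a minimal sketch. It assumes that the FVO is the amplitude of the sinusoidal component at a given period divided by the mean occurrence, and that the phase is measured relative to the center of the profile (taken as the dyad); the function name `periodic_component`, the amplitude normalization, and the sign convention of the phase are assumptions made for illustration and may differ from the procedure used to build Table S1.

```python
import numpy as np

def periodic_component(profile, period):
    """Return (fvo, phase_deg) for one period of an occurrence profile.

    Assumed definition: FVO = amplitude of the sinusoid at `period`
    divided by the mean occurrence of the profile.
    """
    y = np.asarray(profile, dtype=float)
    n = y.size
    # Positions are measured from the profile center (assumed to be the dyad).
    x = np.arange(n) - (n - 1) / 2.0
    mean = y.mean()
    # Project the mean-subtracted profile onto a complex exponential.
    coeff = np.sum((y - mean) * np.exp(-2j * np.pi * x / period))
    amplitude = 2.0 * np.abs(coeff) / n
    # Phase 0 means a peak at the center; +/-180 means a trough there.
    phase_deg = np.degrees(np.angle(coeff))
    return amplitude / mean, phase_deg

# Synthetic profile: mean 100, oscillation amplitude 20, period 10.2 bp.
positions = np.arange(147) - 73.0
synthetic = 100.0 + 20.0 * np.cos(2 * np.pi * positions / 10.2)
fvo, phase = periodic_component(synthetic, 10.2)
print(round(fvo, 2), round(abs(phase), 1))
```

Because a 147 bp profile does not contain a whole number of 10.2 bp periods, the recovered amplitude is only approximately equal to the true one; scanning `period` over a range of values gives an FVO-versus-period curve of the kind plotted in the figures.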
A four-step procedure was performed in order to simplify the presentation of the tetranucleotide data in Table S1. First, approximately 30% of the tetranucleotides were omitted because they displayed weak ~10 bp periodicities and consequently low FVOs (Figure S1). Second, only those tetranucleotides that had minor grooves facing the histone octamer (phase angles <−135° or >+135°) or away from the octamer (phase angles between −45° and +45°) were considered for further study. These angles were chosen because we assume that sequence-dependent bending, bendability, and kinking would most likely depend on sequences with these rotational orientations. Third, only unique tetranucleotides were considered for the analysis. This simplification was justified because unique tetranucleotides shared identical FVOs and opposite phase angles with their reverse complements (Table S1). A total of 63 tetranucleotides satisfied these three criteria and are displayed in Figure 1. A final distinction was made according to the relative frequency along the nucleosomal DNA. The tetranucleotides were classified as peripherally located, centrally located or uniformly distributed along the sequence. File S1 shows the frequency profiles of the tetranucleotides, providing examples of these distributions.
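The notion of unique tetranucleotides used in the third step can be made concrete with a short sketch that treats each tetranucleotide and its reverse complement as one sequence. The helper names and the choice of the alphabetically smaller string as the representative are illustrative assumptions; only the counting itself follows from the definition.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string written in A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical(seq):
    """One representative for a sequence and its reverse complement
    (here, arbitrarily, the alphabetically smaller of the two)."""
    return min(seq, reverse_complement(seq))

all_tetra = ["".join(p) for p in product("ACGT", repeat=4)]
unique_tetra = sorted({canonical(t) for t in all_tetra})
palindromes = [t for t in all_tetra if t == reverse_complement(t)]

print(len(all_tetra), len(unique_tetra), len(palindromes))  # prints: 256 136 16
```

The 16 self-complementary tetranucleotides (for example CTAG and TTAA) have no distinct partner, which is why the 256 sequences reduce to 136 rather than 128.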
Characteristics of the 63 tetranucleotides that satisfy the above criteria are presented in Figure 1. Tetranucleotides are grouped according to preferential distribution along nucleosome DNA (peripheral, central, or uniform) and rotational orientation of the DNA minor groove (In vs. Out). The numbers beside the tetranucleotides are FVOs. In each column, the tetranucleotides are grouped according to similarities in sequence. Consensus sequences derived from the 8 groups are presented in bold face type below the tetranucleotides. Each tetranucleotide in a group shared a common dinucleotide and at least one common 5′ or 3′ flanking base. If the common dinucleotide was in the center of the tetranucleotide, both flanking bases were required to match the consensus sequence. The permissible 1 bp staggers in the alignment procedure roughly correspond to the assigned phase angle ranges of ±45 degrees. The map at the top of the table depicts the general positions of the consensus sequences and their rotational orientation. Seven of the 63 tetranucleotides could not be described by a consensus sequence. Figure 2 (left panels) shows the occurrences of the consensus tetranucleotide sequences derived from the studies in Figure 1. For comparison, the profiles of the corresponding central dinucleotide sequences of the tetranucleotides are shown in the right panels to illustrate the importance of the bases flanking the central dinucleotide. The numbers in parentheses adjacent to the sequence designations are the FVOs. The results with the tetranucleotides are in complete agreement with those in Figure 1 in terms of peripheral inward localization of AnTm, the central inward localization of YTAR and YCAG/CTGR, the uniform inward localization of WTAW, the uniform outward localization of RACY/RGTY, and the central outward positioning of RCAY/RTGY, RCGY and RGCY.
The strengths of 10.2 bp periodicities, as measured by the FVO values, are, on average, 1.8-fold higher for the tetranucleotide consensus sequences than the corresponding dinucleotide sequences.
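The consensus sequences are written with standard IUPAC ambiguity codes (R = A or G, Y = C or T, W = A or T). The sketch below shows how such a consensus expands into its member tetranucleotides; the function names are illustrative, and AnTm is left out because it is not a fixed-length IUPAC pattern.

```python
from itertools import product

# Standard IUPAC codes needed for the consensus sequences in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "W": "AT", "S": "CG"}

def expand(consensus):
    """List every concrete sequence matched by an IUPAC consensus."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in consensus))]

def matches(seq, consensus):
    """True if `seq` is one of the sequences described by `consensus`."""
    return len(seq) == len(consensus) and all(
        base in IUPAC[code] for base, code in zip(seq, consensus))

print(expand("YTAR"))            # ['CTAA', 'CTAG', 'TTAA', 'TTAG']
print(matches("GCGC", "RCGY"))   # True
```

Pairs such as RACY/RGTY and RCAY/RTGY are reverse complements of one another, so expanding one member of the pair and taking reverse complements recovers the other.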
There are a total of 136 unique tetranucleotides, which include the 24 tetranucleotides that make up the 8 consensus sequences. The remaining 112 unique tetranucleotides were not analyzed in Figure 2 because they had low FVOs, intermediate phase angles and/or because they did not align to the consensus sequences. The analysis in Figure 3A-C was carried out to ascertain the relative contribution of the 24 consensus tetranucleotides to the strength of the ~10 bp tetranucleotide periodicities in the entire library. In this analysis, we compared the average strength of the periodicities of all tetranucleotides in the in vitro library to the strength of the periodicities of tetranucleotides in a modified library that lacked the tetranucleotide consensus sequences and to another library that contained only isolated and overlapping consensus tetranucleotides. The original spacing of the tetranucleotides in the two modified libraries was retained by using the procedure described in the Materials and Methods Section. In panel A, the strengths of the periodicities in the three sequence sets were examined as a function of increasing sequence reads since nucleosome sequences that are characterized by a higher number of reads presumably correspond to DNA sequences with higher affinity for the histone octamer and/or higher positioning activity. Panels B and C illustrate the nature of the periodicities in the three libraries for all reads and for sequences with greater than 6 reads, respectively. The strength of the 10.2 bp periodic signal increased with increasing numbers of reads with the unmodified library. This increase was substantially greater with the library consisting of only consensus sequence tetranucleotides, which is expected of sequence elements that are responsible for nucleosome positioning.
In contrast, the periodic signal was greatly reduced (>80%) with the library containing only the non-consensus sequence tetranucleotides, which shows that the consensus sequence tetranucleotides are the major contributors to the periodic pattern of the entire library.
A similar approach was used to assess the contribution of the 24 consensus tetranucleotides to the average of the dinucleotide periodicities in the in vitro library (Figure 3D-F). The results show that there is a near complete loss of dinucleotide periodicity in the absence of the tetranucleotides that comprise the consensus sequences. In addition, there was essentially a complete loss of the periodic patterns displayed by each of the ten unique dinucleotides in the library that lacked the 24 consensus sequence tetranucleotides, as shown in Figure S2. These observations demonstrate that the periodicity of dinucleotides in the in vitro library is, in actuality, due primarily or exclusively to the tetranucleotides (or longer oligonucleotides) that comprise the 8 consensus sequences.
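One way to picture the two modified libraries is as masked copies of each nucleosome sequence in which bases are replaced by a placeholder so that the surviving bases keep their original positions. The sketch below is only an illustration of that idea: masking with `N`, the function name `split_by_motifs`, and the two-member motif set are assumptions, and the actual procedure is the one given in the Materials and Methods section, which is not reproduced here.

```python
def split_by_motifs(seq, motifs, k=4):
    """Return (motif_only, motif_free) copies of `seq`.

    Bases covered by any k-mer in `motifs` (isolated or overlapping) are
    kept in `motif_only` and masked as 'N' in `motif_free`, and vice versa,
    so the spacing of the original sequence is preserved in both copies.
    """
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in motifs:
            for j in range(i, i + k):
                covered[j] = True
    motif_only = "".join(b if c else "N" for b, c in zip(seq, covered))
    motif_free = "".join("N" if c else b for b, c in zip(seq, covered))
    return motif_only, motif_free

# Hypothetical motif set containing two YTAR members named in the text.
example_motifs = {"CTAG", "TTAA"}
print(split_by_motifs("GGCTAGCC", example_motifs))  # ('NNCTAGNN', 'GGNNNNCC')
```

A full analysis would pass all consensus tetranucleotides together with their reverse complements and would skip any k-mer containing `N` when the periodicities are recomputed.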

Relationships between Tetranucleotides and Nucleosome Stability and Positioning
Synthetic DNA fragments that display high affinity for the histone octamer in vitro were used to study the relationships between the consensus tetranucleotide sequences in Figures 1 and 2 and nucleosome stability and positioning activity. The first sequence set was prepared by a SELEX approach starting with a large pool of chemically synthesized random DNA molecules [49,50]. These fragments display the highest reported affinities for the histone octamer, and this characteristic likely arises from multiple sequence determinants. The 73 bp central regions of these sequences, which contain all information needed for high nucleosome affinity, were used to derive the conserved sequence that is shown in Figure 4 [50]. Indicated along this sequence are tetranucleotide sequences that are represented by four of the tetranucleotide consensus sequences. These 13 tetranucleotides occupy over 60% of the length of the sequence and display the same rotational orientation as the corresponding tetranucleotide consensus sequences in the yeast genome-wide library. The positions of the 13 tetranucleotide sequences along this sequence also closely coincide with regions of high frequencies in the occurrence profiles shown in Figure 2. For example, the high occurrence peaks of YTAR at positions ±15 and ±25 in the database sequences correspond to CTAG and TTAA at ±15 and ±25 in the synthetic sequence. Similar positional correspondence is seen with GTGC and GCAC (RCAY+RTGY), AGCT (RGCY), and GCGC (RCGY). The high frequency of these tetranucleotides in the synthetic fragment, as well as the similarities in rotational and translational positions within the nucleosome, argue for a functional role of these sequences in the control of nucleosome occupancy in vitro.
A more detailed functional analysis of the single consensus sequence YTAR is given in Figure 5. A synthetic 223 bp DNA fragment known as 67 displays a high affinity for the histone octamer and positions a nucleosome at a single translational frame [36,37]. The nucleosome that assembles onto Fragment 67 also contains a single site that is hypersensitive to KMnO4. The hyperreactive T residue is contained within a TA step, which is located 15 bp upstream from the dyad at a site that is highly distorted in the nucleosome. Mutational analysis revealed that both the TA step and its flanking bases are required for high affinity octamer binding and translational positioning. Figure 5 (top panel) shows an analysis of the occurrences of the nucleosome sequences in the yeast database that contain the 8 bp sequence CTCTAGAG that surrounds the hyperreactive T residue in Fragment 67. Also shown in this panel are occurrence profiles when the central 6 bp and 4 bp of this sequence were used in the analysis. The results revealed characteristic patterns of occurrences that are consistent with experimental data in that the frequencies of each sequence are highest within the central turns of the nucleosome, with the most prominent peaks at positions ±15 and ±25 from the dyad. The FVO value of CTCTAGAG was also >2 SD above the mean of all octamers in the yeast database (see legend) and its enrichment at the ±15 bp region relative to the frequency average was >4 SD above the mean (data not shown). There was a reduction in the strength of this pattern when the central TA step was changed to TG/CA, and a near loss of the periodicity when the TA flanking bases were exchanged from C and G to G and C (middle and bottom panels). Corresponding reductions in nucleosome stability and positioning activity were seen when these mutations were made in Fragment 67 [37].
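An occurrence profile of the kind shown in Figure 5 can be sketched as a count of motif start positions across a set of equal-length, dyad-aligned nucleosome sequences. The function names and the toy input below are hypothetical; the mean (0.112) and SD (0.059) of the octamer FVOs are the values quoted in the Figure 5 legend, and the sketch makes no assumption about how strands or read counts were handled in the original analysis.

```python
def occurrence_profile(sequences, motif):
    """Count motif start positions across equal-length, aligned sequences.

    Overlapping hits are counted. Index 0 is the first base of the
    alignment; shifting indices to dyad-centered coordinates is left
    to the caller.
    """
    length = len(sequences[0])
    counts = [0] * (length - len(motif) + 1)
    for seq in sequences:
        if len(seq) != length:
            raise ValueError("all sequences must have the same length")
        start = seq.find(motif)
        while start != -1:
            counts[start] += 1
            start = seq.find(motif, start + 1)
    return counts

def sd_above_mean(value, mean, sd):
    """Number of standard deviations by which `value` exceeds `mean`."""
    return (value - mean) / sd

# Toy input (hypothetical sequences, for illustration only).
print(occurrence_profile(["AACTAGAA", "CTAGCTAG"], "CTAG"))  # [1, 0, 1, 0, 1]

# With the legend values, an octamer FVO above this threshold is >2 SD.
print(round(0.112 + 2 * 0.059, 3))  # 0.23
```

The same profile function works for the 8 bp, 6 bp and 4 bp versions of a motif, which is how the three curves in a single panel could be compared.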
The analysis in Figure 5 was carried out for 16 mutations that were made in Fragment 67. These fragments were tested previously for stabilities (ΔG) and positioning activities relative to wild-type 67 (% 67) in in vitro nucleosome reconstitution assays [37]. The results of these published experiments are given in Table 1. The FVOs of the 67 derivative sequences were determined from frequency profiles like those in Figure 5 (File S3). Only the octamer FVOs are shown in Table 1, while the FVOs of the hexamers and tetramers are provided in File S3. Changes in the TA step in Fragment 67 resulted in reductions in nucleosome stability and positioning activity, and these changes were accompanied by corresponding reductions in FVOs ([37], Table 1A). The order for obtaining stable nucleosomes, positioning activity and FVO values was TA > TG > TT ≥ TC ≈ GG ≈ GA ≈ AT. Likewise, there were reductions in nucleosome stabilities and positioning activities with corresponding decreases in FVOs when the TA flanking bases were altered (Table 1B). KMnO4 hypersensitive TAs were also observed in other sequences that position nucleosomes at single translational sites, including the synthetic fragment 601 and the 5S rDNA sequence from sea urchin [36,37]. The 10 bp sequences surrounding these TA steps conferred high stability and positioning activity when they replaced the −10 to −20 bp region in Fragment 67 [36,37]. The major core sequences in this set were TTAA and CTAG with the consensus YTAR. Most of these sequences were located at approximately ±5, ±15 and ±25 from the dyad in their native fragments with their minor grooves facing inward toward the histone surface. Table 1C shows that the octamer sequences centered within these 10 bp insertions generally have high FVOs and confer high stability and positioning activity when tested experimentally in Fragment 67.
High frequency peaks in the occurrence profiles of the database sequences were also found at ±15, ±25 and ±35 in most of these sequences (File S3), in agreement with their locations in the native fragments. Table 1D shows that the FVO values derived from the frequency profiles of the core tetramers, hexamers, and octamers within the sequences listed in Table 1A-C are highly correlated with nucleosome stabilities and positioning activities. In contrast, the FVOs of central dinucleotides were not significantly correlated with these parameters (data not shown). These correlations indicate that sequence motifs known to be important for nucleosome positioning in vitro exhibit strong periodicities in genomic nucleosome libraries and that the intranucleosomal patterns of occurrence in these libraries are related to their activities in in vitro reconstitutions.

Analysis of Tetranucleotides in In Vivo Libraries
A central question is whether the results obtained with nucleosomes reconstituted in vitro reflect the nucleosome sequence patterns found in cells. To address this question, tetranucleotide sequence profiles in 4 genome-wide in vivo libraries were characterized (File S2) and compared to those in the library generated by in vitro assembly. Three of the libraries were from yeast and one from C. elegans [8,11,12,14] (Table 2). The in vivo libraries differ from the in vitro library in three aspects. First, there was no selection for high affinity octamer binding sequences, as there was for the in vitro sequences. Second, the in vivo nucleosome positions are subject to nucleosome boundary effects where a nucleosome positioned by sequence can phase an adjacent nucleosome in a sequence-independent manner [8,51]. Third, transcription-dependent processes have been shown to alter the positions of nucleosomes in yeast in vivo relative to the in vitro preferred positions, which are dictated by DNA sequence alone [13,14,22]. Table 2 displays global properties of all tetranucleotide sequences in the libraries, while Figure 6 and Table S3 give characteristics of the 8 consensus sequences. The sequence features that reflect the patterns of occurrence and rotational orientations within nucleosomal DNA in the in vitro library are conserved in all in vivo libraries. The average periodicities of tetranucleotide sequences varied little among the libraries, with an overall mean and median periodicity of 10.25 and 10.11 bp, respectively, for those tetranucleotides that displayed significant FVOs. These periodicities are similar to those reported from the analysis of dinucleotides in chicken erythrocyte nucleosome DNA (10.15-10.26 bp) [35], and hydroxyl radical footprinting studies (10.18 bp) [52].
There was also a high correspondence between the phase angles in the in vitro and in vivo libraries, as evidenced by the high correlation coefficients in Table 2 and in the correlation plots between the in vitro yeast data and the in vivo data from yeast and C. elegans (Figure S3). This correspondence can also be seen with all consensus tetranucleotides by the coincidence of peak frequency positions in the 8 consensus profiles in the in vitro and in vivo libraries (Table S3 and Figure S4).
The similarities between the in vitro and in vivo libraries extend beyond features that relate to rotational orientation of nucleosome DNA. The relative strengths of the tetranucleotide periodicities are also similar in the in vitro and in vivo libraries, as revealed by the similarities in relative FVOs seen in Figure 6. These results suggest that the utilization of the consensus sequence tetranucleotides is similar in vitro and in vivo in both yeast and C. elegans. This is also seen from the qualitative similarities in occurrence profiles of tetranucleotides in the in vitro and in vivo libraries (Figures 2 and S4). These profiles also illustrate a consistent difference between the in vitro and all in vivo libraries. The relative peak heights of all consensus sequences in the nucleosome periphery tended to be greater in the in vivo libraries as compared to the in vitro library. This difference was least pronounced with AnTm and most pronounced with YTAR.
The major difference between the in vitro and in vivo libraries was the strengths of the tetranucleotide periodicities, as quantified by FVOs. The FVOs of each tetranucleotide consensus sequence in each in vivo library, as well as the average tetranucleotide FVOs, are 3-4 fold lower than those displayed by the in vitro nucleosome sequences (Figures 2 & S3 and Table S4). It was also noted in previous studies that the strength of the AA/TT periodicity in in vitro libraries was greater than in in vivo libraries [12,13]. Perhaps the most straightforward explanation for these results is that a smaller fraction of nucleosomes are positioned by DNA sequence in vivo [12,13].

Oligo A/T Tracts and Nucleosome Positioning and Stability
Models attempting to explain nucleosome occupancy from nucleotide sequence are frequently based on 10 bp periodicities of dinucleotides, in particular AA/TT/TA. These studies most often quantify frequencies of AA and TT steps rather than individual di-, tri-, tetra-, and penta-A and T-containing nucleotide motifs [7,16,29,30]. A limitation of this approach is the uncertainty of the source of the signal since, for example, a single A4 tetranucleotide is counted as three AA dinucleotides. The importance of AA/TT dinucleotides in the in vitro yeast library was re-evaluated in Figure 7 by separating the signal qualities derived from isolated AA/TT dinucleotides and isolated A/T tracts of varying lengths. The sequence elements were isolated by G and C (S) rather than by T and A in order to eliminate tetranucleotides such as AATT, TTAA, and ATTA, which exhibit strong 10.2 bp periodicities (Table S1). The normalized occurrences are given in the top panel of the figure, and the corresponding FVOs as a function of period are graphed for these motifs in the bottom panel. Strong 10.2 bp periodicities are exhibited by the isolated oligonucleotide tracts, following the order A5/T5 > A4/T4 > A3/T3, while no significant patterns were detected with the isolated AA/TT dinucleotides.
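The counting issue and the idea of isolation by G or C can be illustrated with regular expressions. This is a minimal sketch: the function names are illustrative, tracts at the very ends of a sequence (which lack a flanking base) are not counted, and the example sequence is hypothetical.

```python
import re

def count_overlapping(seq, motif):
    """Count all occurrences of `motif`, including overlapping ones."""
    return len(re.findall(f"(?={re.escape(motif)})", seq))

def isolated_tracts(seq, base, n):
    """Start positions of runs of exactly `n` copies of `base` that are
    flanked on both sides by G or C (the S-isolation described above)."""
    pattern = re.compile(f"(?<=[GC]){base}{{{n}}}(?=[GC])")
    return [m.start() for m in pattern.finditer(seq)]

# A single A4 tract contributes three overlapping AA steps.
print(count_overlapping("AAAA", "AA"))   # 3

example = "GAAGCAAAAC"                   # hypothetical sequence
print(isolated_tracts(example, "A", 2))  # [1]  isolated AA
print(isolated_tracts(example, "A", 4))  # [5]  isolated A4
print(count_overlapping(example, "AA"))  # 4   total AA steps
```

In the example, three of the four AA steps come from the single A4 tract, which is the ambiguity that separating isolated dinucleotides from longer tracts is meant to remove.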
In order to provide additional evidence for the importance of oligo A/T tracts, the occurrences of isolated AA/TT, non-isolated Figure 3. Contribution of the consensus tetranucleotides to the average tetranucleotide and dinucleotide periodicities. In order to evaluate the significance of the consensus tetranucleotides, the Kaplan et al. 2009 in vitro replicate 1 library [12] (Unmodified Library) was modified as described in the Methods Section to yield the Consensus Only and No Consensus libraries. The average FVO of the tetranucleotides for a 10.2 bp periodicity in the three libraries as a function of reads is given in Panel A. Panels B and C present graphs of the average tetranucleotide FVO versus period for all reads and for sequences with greater than 6 reads, respectively, for each of the three libraries. An analogous study was performed on the dinucleotides, which is represented by Panels D-F. doi:10.1371/journal.pone.0010933.g003 AA/TT steps and oligo A 3 -A 5 /T 3 -T 5 , tracts were computed in the in vitro yeast library, the three in vivo yeast libraries and the library from C. elegans (Figure 8). The results revealed strong patterns for the oligo A 3 -A 5 /T 3 -T 5 , tracts, weaker patterns for non-isolated AA/TT steps and no significant patterns for the isolated AA/TT dinucleotides in each library. These occurrence profiles are reflected quantitatively in Table S4, which shows that the FVOs for oligo A 3 -A 5 /T 3 -T 5 , tracts were consistently ,2fold greater than those for the non-isolated AA/TT steps while the patterns for isolated AA/TTs are weak and not significant in all libraries. These results show that inclusion of the dinucleotides data in the total AA/TT step analysis detracts from the strength of the ,10 bp relationship.   Table 1 that are non-palindromic. The average FVO 10.2 value for all 65,536 octamer sequences was 0.112 with a SD of 0.059; therefore, the FVO of CTCTAGAG is more than 2 SD above the mean ( Table 1). 
The same analysis was carried out for 67-m1 and 67-m9 where the central TA step was changed to TG/CA (middle panel) and flanking bases were exchanged from C and G to G and C (bottom panel).
Several studies have suggested that oligo A/T tracts in a 10 bp period can facilitate nucleosome formation, but the sequence features that are responsible for this effect have not been clearly defined [31-34]. Since the studies in Figures 7 and 8 show that isolated AA/TT dinucleotides are not periodic in genomic nucleosome sequences, it was of interest to examine the effects of A-tract length on promoting nucleosome assembly. The synthetic nucleosome positioning sequence Fragment 67 was used to address this question (Figure 9). The fragment was modeled after natural nucleosome positioning sequences and contains two regions of curvature that reside on opposite sides of the dyad [53,54]. As noted above, the fragment also contains a single KMnO4 hypersensitive site that is located at a TA step at −15 bp from the dyad, which is required for high nucleosome stability and unique positioning [36,37]. The four A tracts depicted in the figure alternate with GC-rich segments, and their minor grooves face the histone surface. The region containing these tracts is responsible for establishing the rotational orientation of the entire fragment [54]. In order to characterize the effects of A-tract length on nucleosome properties, the four A5 (AAAAA) tracts in Fragment 67 were replaced by A3 (AGAAA), A2 (AGAAG) and A1 (AGAGA) sequences. Electrophoretic analysis of the four 223 bp fragments on the PA bending gel in Figure 9B revealed that the electrophoretic anomaly displayed by the wild type A5 (67) fragment was reduced by approximately 50% and 80% upon conversion to A3 and A2, respectively.
The electrophoretic mobility of the A2 fragment is essentially the same as fragment A1, illustrating the importance of at least 3 continuous As in generating electrophoretic anomaly, in agreement with previous reports [55].
The four fragments were reconstituted into nucleosomes at 25°C and 37°C using the histone exchange method, and the nucleosomes were analyzed to determine whether the fragments could promote high nucleosome stability and translational positioning activity (Figure 9C). Translational positioning activity was first analyzed by native PAGE, where the slow-migrating nucleosomes are located on the positioning sequence in the center of the fragment. The non-positioned nucleosomes assemble at multiple sites along the sequence, and consequently, most migrate faster than the centrally positioned nucleosomes. Representative samples of these native gels are shown in the figure. Positioning activity was also monitored by restriction endonuclease accessibilities using Hae III and Msp I as detailed previously [36,37]. The data are summarized below the gel as the means (± S.E.M.) from at least 4 independent experiments. The results demonstrated that there was a modest decline in stability and positioning activity in the A3 construct relative to the wild type A5 (67) sequence but a dramatic decrease at both temperatures in the A2 and A1 fragments. The high translational positioning activity associated with the A5 and A3 fragments was also evident from exonuclease III digestion patterns in Figure 9D. Digests of nucleosomes reconstituted onto Fragments A5 (67) and A3 revealed major persistent pauses that mapped to the borders of the positioned nucleosome as reported previously for Fragment 67 [36,37]. In contrast, the A2 and A1 nucleosomes displayed a ladder pattern, indicative of multiple nucleosome positions on these fragments. The ladder pattern is essentially identical to that seen with the negative controls, Fragments AT and 61. Fragment AT has an AT step in place of the TA step at the −15 site of Fragment 67, while Fragment 61 is a 6 bp deletion of Fragment 67.
The rotational orientation of the TA step at the −15 site is important for the high nucleosome positioning activity and stability of Fragment 67 since reductions in these functions were seen when the TA step was translocated by as little as 1 bp in either direction [37]. This observation, and the observation that the four A5 tracts shown in the figure likely dictate the rotational orientation of the entire fragment [54], provide a plausible mechanism by which the A5 tracts control positioning since these tracts should dictate the rotational orientation of the TA step. To investigate this possibility further, reconstituted nucleosomes were subjected to hydroxyl radical cleavage analysis (Figure 9E), and the relative strengths of the hydroxyl radical cutting sites are indicated by the vertical lines in Figure 9A. In Fragments A5 (67) and A3, the minor grooves of the downstream A tracts face the histone surface, as does the minor groove of the upstream TA step at −15 bp. However, the rotational orientation displayed by Fragments A2 and A1 was shifted by 2-3 bp, producing an altered rotational orientation of the TA step at −15 from an inward to a more outward facing position in relation to the histone octamer (Figure 9A). This altered rotational orientation likely plays a role in the near loss of the KMnO4 hypersensitivity of the reactive T at the TA step in fragments A2 and A1 (Figure 8E), as well as the low positioning activities of these fragments (Figure 9C,D).
… [14], and E) the Valouev et al. 2008 [11] (C. elegans) library were computed. The blue lines show the occurrences of oligo A3-A5/T3-T5 tracts, the red lines show the occurrences of all AA and TT dinucleotides, and the green lines show the occurrences of AA and TT dinucleotides isolated by C or G. All profiles displayed were normalized by their average frequencies, and the FVOs, frequencies and periods are given in Table S4. doi:10.1371/journal.pone.0010933.g008
These results point to the importance of oligo A tracts in the establishment of rotational orientation and illustrate how rotational orientation can be directly linked to translational positioning. The phased A-tracts may also contribute to the stability of Fragment 67 by an effect independent of the action on translational positioning since curved sequences without translational positioning signals are preferentially packaged into nucleosomes [32,33].

Tetranucleotides vs. Dinucleotides
Perhaps the most distinguishing sequence characteristic of positioned nucleosomes is the periodic occurrence of certain dinucleotides, and this feature forms the basis of many models that have been used for predicting nucleosome occupancy from nucleotide sequence [7,16,28,29,35]. Some of the more recent models also incorporate non-periodic and position-independent sequence characteristics including oligonucleotides, G+C content, and long A tracts as nucleosome exclusion elements [11,12,29,56-60]. In this report, we have taken an approach for describing positioned nucleosomes that exploits sequence information derived from the periodic occurrences of the 256 tetranucleotides in nucleosomal DNA. The approach entailed the identification and analysis of 24 unique tetranucleotides that were defined by 8 consensus sequences (Figures 1 and 2). The periodicities of these 24 tetranucleotides are responsible for most of the strength of the tetranucleotide periodicity displayed by the entire in vitro library, and consequently the 8 consensus sequences are the major source of the periodic signals in positioned nucleosomes. In addition, the signal strength displayed by the consensus tetranucleotides increased dramatically with increasing sequence reads, which is expected of sequence elements that are responsible for nucleosome positioning and/or histone binding affinities. The consensus tetranucleotides are also responsible for essentially all of the dinucleotide periodicities displayed by the library, as seen in Figures 3 and S2, which points to the fundamental importance of these tetranucleotides, in contrast to dinucleotides, as distinguishing features of positioned nucleosomes.
The analysis of the effect of A-tract length on nucleosome positioning also provided strong evidence that oligonucleotide-length sequences rather than dinucleotides give a more accurate and complete description of the sequence features that are involved in nucleosome positioning. The studies in Figure 9 demonstrated that an A3-containing DNA fragment arranged in a 10 bp period displayed approximately half the nucleosome positioning activity and electrophoretic anomaly of an A5 fragment, while A2- and A1-containing fragments displayed near normal gel mobility and low affinity for the histone octamer, and failed to position nucleosomes at single translational sites in vitro. These results are in full agreement with the studies in Figures 7 and 8 and Table S4, which show that the periodic signal seen in the analysis of all AA/TT steps is due to oligo A3-A5/T3-T5 tracts, and that isolated AA/TT dinucleotides are not periodic in genomic nucleosome sequences from yeast and C. elegans. In fact, the present results clearly show that inclusion of isolated AA/TT dinucleotides detracts from the strength of the periodicities when all AA/TT steps are computed in both in vitro and in vivo libraries, which raises the question as to whether isolated AA/TT dinucleotides should even be included in predictive models for nucleosome occupancy.
The nucleotides that flank central dinucleotides in a tetranucleotide can have a profound influence on the properties of the tetramer [42-47]. For example, each AA dinucleotide embedded in an oligo A-tract of 3 bp or longer has a highly unusual structure that confers on the tract enhanced stiffness and resistance to bending forces. In contrast, the structure and properties of AA dinucleotides flanked by G or C are characteristic of normal B-DNA [26,48,55,61,62]. Similarly, the ability of the TA step to facilitate nucleosome assembly and positioning in regions of high curvature demand in the nucleosome is highly dependent on the nature of the bases flanking the TA step [36,37, Figure 5 and Table 1]. Flanking bases of central dinucleotides also often play a significant role in dictating the phase angle of a tetranucleotide and consequently whether the minor groove of the tetranucleotide faces toward or away from the histone surface. This effect was seen with each of the 10 unique central dinucleotides as detailed in File S2. For example, the dinucleotide CA/TG displays a weak 10.2 bp periodicity; however, the tetranucleotide analysis in Figure 2 resolved two distinct patterns for CA/TG (RCAY+RTGY and YCAG+CTGR), which display opposite rotational orientations, causing a cancellation of signal strength in the dinucleotide profile. TA and TG/CA steps are the most intrinsically variable and hence most flexible of the ten unique dinucleotides in terms of roll, twist and slide. The TG/CA step is also the most variable in terms of bending into the minor and major grooves as revealed by analysis of crystal structures of oligonucleotides and in vitro studies with positioned nucleosomes [42-44,49,63]. The results presented in this study suggest that this distinction is related to the bases flanking the TG/CA step in the nucleosome.
Studies with natural and synthetic nucleosome positioning sequences have suggested that the major determinants for translational positioning are located in the central regions of nucleosomal DNA [1]. The center turns of the nucleosomal DNA at positions 0 to ±30 are tightly associated with the H3/H4 tetramer, and the sharpest bends in the nucleosome occur in this region at ±15 bp from the dyad [64-67]. Previous experimental studies with synthetic fragments and the 5S rDNA sequence from sea urchin have shown that the nucleosome positioning sequences TTAA and CTAG with the consensus YTAR are located at sites of maximal curvature in the nucleosome at positions ±5, ±15, ±25 and ±35 bp from the dyad [36-38].
… tracts, which corresponded to A3 (AGAAA), A2 (AGAAG), and A1 (AGAGA). Hydroxyl radical cleavage efficiency is indicated at each base by vertical lines. B. Fragments were separated on a 9% native PA gel at 5°C in order to study DNA bending. M is a marker. C. Fragments were reconstituted into nucleosomes at 25°C using the histone exchange procedure, and energies of reconstitution were determined as described previously using chicken DNA as competitor. Translational positioning was determined on native PA gels, and a sample gel is shown in the figure. Over 95% of each fragment was assembled into nucleosomes, and only the nucleosome region of the gel is shown in the figure. The % of radioactivity in the top positioning band is given below the figure, as are the results of restriction nuclease accessibility measurements for positioning activity (M, Msp I; H, Hae III). D. Fragments were end labeled on either the top or bottom strand and assembled into nucleosomes. Reconstituted fragments were then digested with Exo III for 5 minutes. The arrows indicate the nucleosome boundary of Fragments A5 (67) …
There is a preference for TA-containing motifs over TG motifs in these central turns as seen in nucleosome occupancy profiles, which is consistent with the observation that nucleosome positioning activity followed the order CTAG > CTGG > CNNG when these sequences were placed at the −15 bp region in a synthetic nucleosome positioning sequence [37]. Richmond and Davey [68] demonstrated that DNA kinking occurred at TG steps at positions ±35, ±45 and ±55. The inward-facing CTGR+YCAG elements that overlap with these regions are consistent with this view. The sequences RCAY+RTGY, RACY+RGTY, RCGY and RGCY that are centered 5 bp away from the YTAR and YCAG+CTGR elements could play a role in facilitating DNA bending into the major groove in the central turns of nucleosomal DNA (Figures 2, S4). All of these outward-facing sequences have relatively wide minor grooves (6.8-7.7 Å), which favor the deflection of the helical axis toward the histone surface. This arrangement was also observed in the central region of the nucleosome positioning sequences analyzed in Figure 4, which contain an unusually high density of both inward- and outward-facing consensus tetranucleotides. These results seem consistent with the mini-kink model for DNA bending in the nucleosome, where DNA bends sharply into the minor and major grooves at 5 bp intervals by a mechanism that involves lateral slide displacements [69].

In Vitro vs. In Vivo Libraries
Yeast genome-wide studies have established that most nucleosomes are positioned at the same chromosome location in the majority of the cells in the population [4-6,8,12-14]. However, the fraction of nucleosomes that are positioned by DNA sequence in the cell remains an open question. While it is clear that long dA:dT tracts in NFR regions are important factors in promoting nucleosome exclusion both in vitro and in vivo [26], the prevalence of nucleosome-favoring sequences in the genome remains uncertain. One common approach used to address this question has entailed the comparison of in vivo and in vitro nucleosome occupancy maps [12-14]. Although these studies have often led to controversial results and interpretations, most of the recent analyses have suggested that the fraction of nucleosomes positioned by sequence in vivo is small, and that epigenetic factors play more influential roles in nucleosome organization. Recent observations have also shown that depletion of a chromatin remodeling factor [22] and RNA polymerase [14] resulted in nucleosome repositioning to a state that is more similar to the positions dictated by DNA sequence as detected by in vitro reconstitution. These results are consistent with the emerging view that the ground state of nucleosome organization is dictated by DNA sequence and that epigenetic factors are superimposed on this state for the final organization of nucleosomes in the cell.
These considerations raise questions concerning the results of the studies described in this report. The results of this study revealed that the nucleotide sequence patterns of nucleosomes positioned in vivo are strikingly similar to those of nucleosomes assembled in vitro from purified components. These similarities include rotational orientations and relative FVOs of tetranucleotides, tetranucleotide periodicities, and the profiles of occurrence of the consensus tetranucleotides, which are likely to play important roles in nucleosome positioning (Figures 2, 6, S2, S4). In addition, an analysis of the yeast in vivo library by the procedures described in Figure 3 revealed that the periodicity of dinucleotides was due primarily to the tetranucleotides that comprise the 8 consensus sequences, as was seen with the in vitro library (data not shown).
The major difference between the in vitro and in vivo libraries was the strength of the tetranucleotide periodicities, as quantified by FVOs. The FVOs of the tetranucleotide consensus sequences in each in vivo library, as well as the average tetranucleotide FVOs, are 3-4 fold lower than those displayed by the in vitro nucleosome sequences (Table 2 and Table S3). These results imply that the frequency of positioning determinants on a sequence basis is lower in the in vivo datasets or, more likely, that a relatively large fraction of the sequences in the in vivo libraries lack sequence information for DNA-directed nucleosome positioning. It follows that the small subset of sequences in the in vivo libraries that contain positioning signals may be derived from those nucleosomes that have not been subjected to repositioning by transcription, epigenetic mechanisms or chromatin boundary effects. These residual nucleosomes might be expected to represent a relatively small fraction of the yeast genomic sequences since at least half of the yeast genome is transcribed at least once during the cell cycle. These results should not be taken to imply that DNA sequence-directed nucleosome positioning is not of biological relevance, since nucleosomes positioned by DNA sequence may be important for the initial repositioning processes. For example, nucleosomes positioned by DNA sequence can control the initial direction of translocation, the translocation distance, and the new positions adopted by nucleosomes in response to chromatin remodeling machines [70].
Trifonov and Sussman [28] identified 10 bp sinusoidal patterns of AA/TT dinucleotide sequence preference along eukaryotic DNA nearly 30 years ago and suggested that these patterns facilitate the packaging of DNA into the nucleosome. This interpretation has been used extensively as evidence for the relevance of periodic sequence patterns in genome-wide nucleosome libraries. A more direct way of assessing the biological significance of these patterns is based on the results in Figures 5 and 6 and Table 1, which show that sequence features known to be important for nucleosome positioning in vitro were enriched in the genome-wide libraries, and that their rotational orientations and distributions along nucleosomal DNA were correlated with their activities in in vitro reconstitution reactions. Likewise, the effects of A-tract length on nucleosome stability and positioning activity as revealed by in vitro assembly were highly correlated with the strength of the periodic patterns of these A sites in the in vitro and in vivo sequence libraries (Figures 7, 8, 9 and Table S4). The high correspondence between the strengths of the patterns in genome-wide libraries and positioning properties in vitro points to a basic strategy that could be used for the development of novel predictive models for identifying nucleosome positions from nucleotide sequence and for evaluating sequence heterogeneity in nucleosome libraries in a meaningful fashion. Such a strategy could be used for fractionating nucleosome libraries into sequence subsets with different positioning determinants, and for assessing the number, arrangement, and linkage of positioning motifs in specific subsets of nucleosomal DNA sequences.

Materials and Methods
The yeast nucleosomal DNA sequence libraries analyzed in this study were derived from nucleosome occupancy experiments performed by Mavrich … The in vitro replicate 1 library (GSM351491) and the YPEtOH non-crosslinked in vivo library (GSM351494) from the Kaplan et al. 2009 data were used in this study. The information from these sources provided the yeast genome coordinates as well as the number of reads for each procured sequence. The Kaplan and Weiner data provided the 5′ ends of the reads with directionality, while the Mavrich data provided the nucleosome midpoints. These coordinates were used to extract nucleosomal DNA sequences from the May 2006 build of the Saccharomyces Genome Database. All sequences were made to be 147 bp in length, and the reverse complements of these sequences were also analyzed. When computing the frequency profiles of a given motif, each nucleosomal DNA sequence was weighted by its corresponding number of reads (Figure S5). Thus, if a certain sequence in one of these libraries had four reads, a given motif at any given position along the nucleosomal DNA would be counted four times instead of just once.
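The read weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, a motif is recorded at its 0-based start index (the study used its own position convention), and 8 bp toy sequences stand in for the 147 bp nucleosomal sequences.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def weighted_profile(library, motif, length):
    """Read-weighted occurrences of `motif` at each start position.

    library: iterable of (sequence, reads) pairs.
    Each sequence and its reverse complement are both scanned, and every
    occurrence adds `reads` (not 1) to the count at that position.
    """
    profile = [0] * length
    k = len(motif)
    for seq, reads in library:
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                if strand[i:i + k] == motif:
                    profile[i] += reads
    return profile

# Toy library (hypothetical): one sequence with four reads, one with a single read.
toy_library = [("ACTAGGTA", 4), ("CCTAGAAA", 1)]
ctag_profile = weighted_profile(toy_library, "CTAG", length=8)
```

With this weighting, a motif in the four-read sequence contributes four counts at its position, so highly represented sequences dominate the frequency profile.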
Five of the 13 in vivo libraries prepared by Kaplan displayed tetranucleotide profiles similar to those of the YPEtOH non-crosslinked library (YPEtOH crosslinked replicates 1 and 2, YPGal crosslinked replicate 1, and YPGal non-crosslinked replicates 1 and 2). The YPEtOH non-crosslinked replicate 1 library was chosen at random from this group of six for presentation in this report. The remaining seven in vivo libraries were out of phase by 5 bp as judged by the phase angles of AAAA and by a variety of other criteria. This may have resulted from slight over-trimming or under-trimming by MNase, but other explanations cannot be excluded. These libraries were rendered in phase by adjusting the phase angles of AAAA to ±180 degrees through shifting the sequences by 5 bp. When these adjusted libraries and the six unadjusted libraries were analyzed as a group of 13, the tetranucleotide profiles were nearly indistinguishable from those of the YPEtOH non-crosslinked library (data not shown).
The C. elegans nucleosome data were derived from studies conducted by Valouev et al. 2008 [11] and acquired from the Short Read Archive at NCBI under accession number SRA001023 (SRX000425). These short reads were mapped using the Bowtie software and pre-built indexes for the most recent assembly of the C. elegans genome [71]. The default two-mismatch threshold was applied along with the "-m 1" reporting mode to ensure that only unique, confidently mapped reads were utilized. The color space option was used for these SOLiD reads. To report the 5′ end coordinates of the reverse reads instead of the 3′ ends, the following post-processing code was added to the command line: … With these inputs, 33% of ~110 million C. elegans reads were reported. All nucleosomal DNA sequences from C. elegans were analyzed with their reverse complements and were made to be 147 bp in length.

Generation of the Modified In Vitro Libraries
The "consensus only" and "no consensus" libraries were derived from the Kaplan et al. 2009 in vitro replicate 1 library. Nucleotides within the sequences of the "consensus only" library were replaced with an "X" unless they occurred within a consensus tetranucleotide. The consensus tetranucleotides were allowed to overlap each other. For example, no part of the sequence ACGTGT would be converted to an "X" because it is an overlap of the consensus tetranucleotides RCGY and RCAY+RTGY. Because the consensus tetranucleotides were allowed to overlap, non-consensus tetranucleotides could also be counted in this library (such as CGTG, which is contained within the example sequence ACGTGT). However, the occurrences of non-consensus tetranucleotides in this library were considerably fewer than the occurrences of consensus tetranucleotides, as expected (data not shown). Replacing nucleotides with "X" in the "consensus only" library allowed one to count only tetranucleotides and dinucleotides that occurred within isolated or overlapping consensus tetranucleotides. ~51% of the nucleotides in the "consensus only" library were replaced with an "X". The "no consensus" library was the exact opposite of the "consensus only" library, as all nucleotides within isolated and overlapping consensus tetranucleotides were replaced with an "X". In the "no consensus" library, the 40 (or 24 unique) consensus tetranucleotides had zero occurrences. ~49% of the nucleotides in the "no consensus" library were replaced with an "X".
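The masking procedure can be illustrated with a short sketch. The function names are hypothetical, and the consensus list used here (RCGY, RCAY, RTGY) is only the subset named in the example above, not the full set of 8 consensus sequences; this is an illustration of the idea rather than the code used in the study.

```python
# IUPAC codes needed for the consensus sequences in the example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}

def matches(consensus, tetra):
    """True if the tetranucleotide fits the IUPAC consensus."""
    return all(base in IUPAC[code] for code, base in zip(consensus, tetra))

def consensus_cover(seq, consensuses):
    """Mark every base that lies inside at least one consensus tetranucleotide.
    Overlapping consensus matches are allowed."""
    covered = [False] * len(seq)
    for i in range(len(seq) - 3):
        if any(matches(c, seq[i:i + 4]) for c in consensuses):
            for j in range(i, i + 4):
                covered[j] = True
    return covered

def consensus_only(seq, consensuses):
    """Replace bases outside consensus tetranucleotides with X."""
    cover = consensus_cover(seq, consensuses)
    return "".join(b if keep else "X" for b, keep in zip(seq, cover))

def no_consensus(seq, consensuses):
    """Replace bases inside consensus tetranucleotides with X."""
    cover = consensus_cover(seq, consensuses)
    return "".join("X" if hit else b for b, hit in zip(seq, cover))

subset = ["RCGY", "RCAY", "RTGY"]   # illustrative subset only
example = "TTACGTGTCC"              # contains the ACGTGT overlap from the text
```

Because the two masks are complementary, every base of a sequence survives in exactly one of the two modified libraries.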

Counting Occurrences of Motifs with Different Lengths
In order to analyze the periodicities of motifs of various lengths, a standard method for counting their occurrences was developed. The following formula was used to determine the position at which a particular motif would be counted: … Therefore, the dinucleotide centers of even-length motifs are counted at the same position. For example, if a TA step was counted at −15 from the dyad and its 5′ and 3′ flanking bases were C and G, respectively, the corresponding tetranucleotide CTAG would also be counted at −15. Additionally, by this method, the centers of all odd-length motifs are counted at the same position. All frequency profiles displayed were generated after being subjected to a three-bond averaging procedure. To calculate the normalized occurrences for a frequency profile, the number of occurrences at each position along the nucleosomal DNA was divided by the average number of occurrences from all positions along the nucleosomal DNA.
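The normalization step can be sketched in a few lines. The smoothing function below is an assumption: it reads the three-bond averaging as a running mean over each position and its two immediate neighbours, with shorter windows at the ends, which may differ from the procedure actually used. Both function names are hypothetical.

```python
def three_point_average(counts):
    """Running mean over each position and its immediate neighbours.
    End positions use the neighbours that exist (an assumed convention)."""
    smoothed = []
    for i in range(len(counts)):
        window = counts[max(0, i - 1):i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed

def normalized_occurrences(counts):
    """Divide the count at each position by the mean count over all positions."""
    mean = sum(counts) / len(counts)
    return [c / mean for c in counts]

# Toy profile (hypothetical counts at six positions).
raw = [2, 4, 6, 4, 2, 6]
normalized = normalized_occurrences(raw)
```

After normalization the profile averages to 1, so motifs with very different absolute frequencies can be plotted on a common scale.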

Fourier-Transform Analysis
To measure the periodicities of the motif occurrences, Fourier-transform analysis was carried out on the raw frequency data as implemented previously [35,72]. The frequency-domain complex coefficients (C_h = A_h + iB_h) are given by … where f̄ is the average frequency, f_x is the frequency at position x, and c/h is the period. To generate the Fourier-transform spectra, the amplitude |C_h| was calculated over a range of periods by incrementing h from 250 to 150, corresponding to periods of 8 to ~13.5 bp, respectively, with c = 2000. The values of c and h were selected in order to control the bp intervals in the Fourier-transform spectra.
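A minimal sketch of such a spectrum is given below. It assumes the conventional discrete Fourier sum over mean-subtracted frequencies, with cosine and sine terms of argument 2πhx/c; the exact expression and any normalization used in the study are not reproduced here, so the amplitudes are on an arbitrary scale. The function names and the synthetic profile are hypothetical.

```python
import math

def fourier_amplitude(freqs, h, c=2000, first_position=1):
    """Amplitude |C_h| of the mean-subtracted profile at period c/h.
    Assumes A_h and B_h are cosine and sine sums over positions x."""
    mean = sum(freqs) / len(freqs)
    a = b = 0.0
    for x, f in enumerate(freqs, start=first_position):
        angle = 2.0 * math.pi * h * x / c
        a += (f - mean) * math.cos(angle)
        b += (f - mean) * math.sin(angle)
    return math.hypot(a, b)

def amplitude_spectrum(freqs, c=2000, h_min=150, h_max=250):
    """Map each period c/h (in bp) to its amplitude for h_min <= h <= h_max."""
    return {c / h: fourier_amplitude(freqs, h, c) for h in range(h_min, h_max + 1)}

# Synthetic 147-position profile with a built-in period of 2000/196 (about 10.2 bp).
synthetic = [10 + 3 * math.cos(2 * math.pi * 196 * x / 2000) for x in range(1, 148)]
spectrum = amplitude_spectrum(synthetic)
best_period = max(spectrum, key=spectrum.get)
```

Stepping h in integer units with c = 2000 samples the period axis at intervals of roughly 0.03 to 0.09 bp over the 8 to ~13.5 bp range, which is the practical effect of the choice of c and h described above.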

Evaluation of Motif Periodicities
The fractional variation of occurrence (FVO) was used to determine and compare the strengths of the periodicities among the various motifs [35]. The FVOs were calculated from Fourier-transform (FT) spectra at either the maximal amplitude (FVO_MAX) or a 10.2 bp period (FVO_10.2). A period of 10.2 bp was selected since the maximum-amplitude periods of the majority of tetranucleotides were near 10.2 bp (Figure S6). The equation used to determine the FVO for a 10.2 bp periodicity is given below … where |C_h| at h = 196 represents the amplitude in the FT spectrum at a period of 10.2 bp and 142/10.2 represents the number of periods between positions x = 3 and 144.
The periodicities were also assessed by a technique that involved calculating areas under the Fourier-transform spectra. The value %FTS_10.2 was developed and is defined as the percent area under the Fourier-transform spectrum (%FTS) from 9.8 to 10.6 bp relative to the area under the entire FT spectrum (8 to ~13.5 bp). These %FTS_10.2 values for many motifs were compared to their corresponding FT spectra, and this relationship was used to establish the significance of a given motif's 10.2 bp periodicity.
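The area ratio can be sketched as follows. This is an illustration under assumptions: the spectrum is supplied as (period, amplitude) pairs, areas are computed with the trapezoid rule along the period axis, and only segments lying entirely inside the 9.8 to 10.6 bp window are counted. The integration scheme used in the study is not specified, and the function names are hypothetical.

```python
def area_under(points, lo, hi):
    """Trapezoid-rule area over segments whose endpoints both lie in [lo, hi].
    points: list of (period, amplitude) pairs sorted by period."""
    area = 0.0
    for (p0, a0), (p1, a1) in zip(points, points[1:]):
        if p0 >= lo and p1 <= hi:
            area += 0.5 * (a0 + a1) * (p1 - p0)
    return area

def percent_fts(points, lo=9.8, hi=10.6):
    """Percent of the total spectral area that falls between lo and hi bp."""
    pts = sorted(points)
    total = area_under(pts, pts[0][0], pts[-1][0])
    return 100.0 * area_under(pts, lo, hi) / total

# Toy spectra on a 0.1 bp grid from 8.0 to 13.5 bp (hypothetical values).
periods = [i / 10 for i in range(80, 136)]
flat = [(p, 1.0) for p in periods]
peaked = [(p, 5.0 if 9.8 <= p <= 10.6 else 1.0) for p in periods]
flat_pct = percent_fts(flat)
peaked_pct = percent_fts(peaked)
```

A featureless spectrum gives the baseline share expected from the window width alone, so a motif with a genuine ~10.2 bp periodicity should score well above that baseline.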

Computation of Phase Angles
The phase angles were based on 10.2 bp periodicities and were calculated relative to a reference point at position 2 (−72 from the dyad) using the equation …

Table 2 Calculations
For determining the average and median tetranucleotide periods, tetranucleotide periods in the individual libraries were excluded if their corresponding FVO_MAX values fell more than one-half SD below the mean FVO_MAX value of all tetranucleotides. For determining the correlation coefficients of the phase angles between the in vitro and in vivo libraries, tetranucleotide phase angles in the individual libraries were excluded if their corresponding FVO_10.2 values fell more than one-half SD below the mean FVO_10.2 value of all tetranucleotides.

Figure 8
All procedures used in these studies have been described previously [36,37,54]. The three derivative fragments were produced by insertion of synthetic oligonucleotide duplexes into the PshA I and Hind III sites of fragment 67. The calculated R_L values (apparent length/real length) of the fragments on the PA gel shown in Figure 8B were 1.8, 1.4, 1.2, and 1.2 for fragments A5 (67), A3, A2 and A1, respectively. The residual electrophoretic retardation seen in Fragments A2 and A1 is due to the four tracts upstream of the dyad, which are common to all fragments. Reconstitution procedures were also carried out at 37°C, and the energies of reconstitution were 242 ± 40, 202 ± 202, 879 ± 111, and 721 ± 75 cal/mol for the A5, A3, A2 and A1 fragments, respectively. The corresponding percentages of radioactivity in the positioning bands on native gels were 97, 80, 28 and 38. Note that the −8 site in A and E has the same core sequence (CTAG) that is found at −15. The T at −8 bp on the bottom strand in Fragments A2 and A1 becomes slightly more sensitive to permanganate because of the altered rotational orientation in these fragments.
Figure S1 Elimination of tetranucleotides with weak periodicities. Only tetranucleotides that displayed significant ~10 bp periodicities in the Kaplan et al. 2009 in vitro library were used for the development of the tetranucleotide consensus sequences. This was determined from the %FTS_10.2 values, which represent the percent area under the Fourier-transform spectra (%FTS) from 9.8 to 10.6 bp relative to the area under the entire FT spectra of 8 to ~13.5 bp. Tetranucleotides were included if they had %FTS_10 … Figure 1 because its frequency profile was periodic in the central region (File S1). Found at: doi:10.1371/journal.pone.0010933.s001 (0.27 MB TIF)
Figure S2 Contribution of the consensus tetranucleotides to the periodicities of the 10 unique dinucleotides. The Kaplan et al. 2009 in vitro replicate 1 library [12] was modified as described in the Methods Section in two different ways in order to evaluate the significance of the consensus tetranucleotides. Panels A, B, and C present graphs of FVO versus period for the 10 unique dinucleotides for the Unmodified, Consensus Only, and No Consensus libraries, respectively, for sequences with greater than six reads.
… In Vitro R1). The frequency profiles of tetranucleotides from all the sequences in the in vitro replicate 1, the EtOH non-crosslinked replicate 1, and the C. elegans nucleosomal DNA sequence libraries were examined in order to calculate the phase angle for each tetranucleotide. The phase angles of the two in vivo libraries, EtOH non-crosslinked replicate 1 and C. elegans, were plotted against those of the in vitro replicate 1 library, yielding Pearson correlation coefficients of 0.99 and 0.98, respectively. For this phase angle correlation, approximately 30% of the phase angles from each of these three libraries were omitted because their corresponding FVO_10.2 values fell more than one-half standard deviation below the mean.
… determined for nucleotide sequence motifs that ranged in length from 1-6 nucleotides in order to study the relationship between sequence length and enrichment of 10.2 bp periodic sequences in sub-libraries with increasing numbers of reads. The mean FVO_10.2 values for each sequence length are plotted against the number of reads in the sub-libraries. The SD for each point ranged from ±40-60% of the means. The results show that the mean FVO_10.2 for each sequence length increased as a function of the number of reads, and that the longer sequences increased to a greater extent than the shorter ones. Due to this observation, all nucleosomal DNA sequences were weighted by the number of reads in this study. Randomized subsets of the total library did not increase the FVO_10 …
… 10.2 values, and the phase angles are displayed below for the tetranucleotide consensus sequences in the four in vivo libraries as well as the in vitro library listed in Table 2. It is important to note that because the members of a reverse complement pair of dinucleotides or tetranucleotides possess opposite phase angles, the sum of the frequency profiles of a reverse complement pair will always possess a phase angle of 0 or ±180 degrees. If the phase angles of the dinucleotides or tetranucleotides within a reverse complement pair are far from 0 or ±180 degrees, the corresponding FVO of the reverse complement pair will decrease relative to the FVOs of the single components. On the other hand, if the phase angles of the dinucleotides or tetranucleotides within a reverse complement pair are close to 0 or ±180 degrees, the corresponding FVO of the reverse complement pair will reflect the FVOs of the single components. If a perfect reference point had been utilized in calculating the phase angles, the table would display 0's and ±180's instead of −177.5's and 2.5's.
… Table S1 and Table S2 were generated for each library listed in Table 2 and are located in File S2. In these tables, the 256 tetranucleotides are arranged according to the 10 unique dinucleotide steps positioned in the center of the tetranucleotides. Reverse complements are also paired. For all dinucleotides and tetranucleotides, the maximum amplitude periodicities, the FVO_MAX and FVO_10 …
… Table 1, which are 8 bp in length, graphs of the frequency profiles from the in vitro library are provided. Additionally, within each of these graphs, the frequency profiles of the hexamers, tetramers, and dimers centered within each octamer sequence are given. All of the profiles were normalized by their average frequencies. The occurrences of the reverse complements of all of the sequences that are non-palindromic in Table 1 were added to the frequency profiles.