A comparison of nucleosome organization in Drosophila cell lines

Changes in the distribution of nucleosomes along the genome influence chromatin structure and impact gene expression by modulating the accessibility of DNA to transcriptional machinery. However, the role of genome-wide nucleosome positioning in gene expression and in maintaining differentiated cell states remains poorly understood. Drosophila melanogaster cell lines represent distinct tissue types and exhibit cell-type specific gene expression profiles. They thus could provide a useful tool for investigating cell-type specific nucleosome organization of an organism’s genome. To evaluate this possibility, we compared genome-wide nucleosome positioning and occupancy in five different Drosophila tissue-specific cell lines, and in reconstituted chromatin, and then tested for correlations between nucleosome positioning, transcription factor binding motifs, and gene expression. Nucleosomes in all cell lines were positioned in accordance with previously known DNA-nucleosome interactions, with helically repeating A/T di-nucleotide pairs arranged within nucleosomal DNAs and AT-rich pentamers generally excluded from nucleosomal DNA. Nucleosome organization in all cell lines differed markedly from in vitro reconstituted chromatin, with highly expressed genes showing strong nucleosome organization around transcriptional start sites. Importantly, comparative analysis identified genomic regions that exhibited cell line-specific nucleosome enrichment or depletion. Further analysis of these regions identified 91 out of 16,384 possible heptamer sequences that showed differential nucleosomal occupation between cell lines, and 49 of the heptamers matched one or more known transcription factor binding sites. These results demonstrate that there is differential nucleosome positioning between these Drosophila cell lines and therefore identify a system that could be used to investigate the functional significance of differential nucleosomal positioning in cell type specification.

Introduction
Over 75% of eukaryotic DNA within a nucleus is compacted into chromatin fibers that contain long repeating arrays of nucleosomes. In each nucleosome unit, a segment of DNA is wrapped around a histone protein core [1]. An essential role of chromatin is to compact the large amount of genomic DNA into the confines of the eukaryotic nucleus, but nucleosomes also physically occlude DNA from interactions with other DNA binding proteins [2][3][4]. Thus, the nucleosome structure is considered to be repressive to gene expression [5,6]. Indeed, depleting nucleosomes in yeast activates previously repressed genes even in the absence of activating transcription factors [7]. Controlled changes in nucleosome placement along the DNA are predicted to have regulatory roles in gene transcription [8][9][10]. Furthermore, the competition between nucleosomes and transcription factors for binding to the DNA strand can be considered an additional layer of epigenetic regulation of gene expression [11][12][13][14]. Because transcription factor concentration and access to genetic information change with growth, cell differentiation and in response to environmental stimuli, the chromatin organization and nucleosome positioning must also change rapidly and precisely.
Positioning of nucleosomes is directed by two major factors: intrinsic DNA-histone interactions, and positioning of nucleosomes by remodeling complexes [15][16][17][18][19][20][21][22]. Each nucleosome is typically a discrete unit consisting of 147 base pairs (bp) of DNA wrapped around a histone octamer that contains two copies each of histones H2A, H2B, H3 and H4 [23]. Previous work demonstrated that DNA sequences wrapped around a nucleosome exhibit predictable patterns that influence nucleosome occupancy [24][25][26][27]. In particular, the histone octamer prefers placement along DNAs containing 10 base pair repeats of AA/AT/TT dinucleotides out of phase with CG dinucleotide repeats [28][29][30]. The phased helical repeats of A/T dinucleotides every 10 base pairs allow for flexion of nucleosomal DNA around the histone octamer. Furthermore, poly-A kmers are generally excluded from nucleosomal DNA. Acting on top of the biochemical interactions that drive nucleosome positioning, chromatin remodeling complexes can alter the positions of nucleosomes [31,32]. These factors should therefore direct the landscape of nucleosome occupancy that characterizes a specific cell state following differentiation.
Previously, cell differentiation was considered to be driven solely by controlled expression of transcription factors (TFs) [33][34][35][36][37][38]. However, it is now recognized that cell fate depends not only on the expression of TFs, but also on the accessibility of target sites within the genome [4,11,39,40]. During differentiation, access to promoters of genes involved in cell-type specific transcription requires rearrangement of nucleosomes over and around particular transcription factor binding sites (TFBS) [13]. Recent studies have described physical changes to chromatin, including epigenetic changes, in specific loci that mark cell fate [11,41,42]. However, to fully understand the role of nucleosome positioning in cell-type determination, it is essential to conduct genome-wide analyses of nucleosome occupancy in different cell types. Genome-wide studies have been performed, but predominantly in whole multicellular, multistage organisms [43]. Because the concentration of histones, and therefore the number, positioning and occupancy of nucleosomes, differs between cell types and across developmental stages, use of whole organisms may obscure underlying patterns of organization. We therefore decided to examine nucleosome positioning and occupancy in different tissue lineages represented by the standard Drosophila S2 cell line and four distinct Drosophila L3 imaginal disc cell lines: leg, eye, antennal and haltere [44].
Drosophila melanogaster is an attractive model to use because the relatively small genome of the organism allows for reasonable coverage of mapped reads during parallel sequencing. The various cultured Drosophila cell lines are extensively characterized and therefore provide a powerful model for understanding cell-type specification. Several studies have characterized the unique differential expression profiles for many of the available Drosophila cell lines [33,45]. While each cell line necessarily possesses the same genome, each line maintains a distinct transcriptional profile that represents its tissue source and the concentration of factors driving expression [33,34].
In this work, we compare nucleosome positioning over key genomic regions and DNA sequences in distinct Drosophila cell lines. We report in vivo nucleosome positioning maps for the standard S2 cell line that is of embryonic hemocyte origin, and for the antennal, eye, haltere and leg cell lines that are derived from imaginal discs. We characterized patterns of differential coverage by examining nucleosome occupancy and positioning throughout the genome. By comparing nucleosome maps to each other and to the map from in vitro reconstituted Drosophila chromatin, we uncovered differences from intrinsic nucleosome organization that correlate with possible binding sites for in vivo factors that may direct cell type specification.

Drosophila cell culture
All D. melanogaster cell lines were obtained from the Drosophila Genomics Resource Center (DGRC; https://dgrc.cgb.indiana.edu/). The five cell lines used in this study were: S2 (late embryonic cell line); Cme-L1 (leg disc imaginal cell line); ML-DmD11 (eye-antennal disc cell line); ML-DmD20 (antennal disc cell line); and ML-DmD17 (haltere disc cell line). Cells were cultured in the DGRC recommended culture media at 24˚C. S2 cells were cultured in Schneider's Drosophila medium (Invitrogen) supplemented with 10% FCS (Hyclone); Cme-L1 cells were maintained in M3 (Sigma-Aldrich), supplemented with 2% FCS, 5 μg/ml insulin (Sigma), and 2.5% fly extract, while ML-DmD11, ML-DmD20 and ML-DmD17 cells were maintained in M3+BPYE supplemented with 10% FCS and 10 μg/ml insulin. For the experiments, cells were replated on 60 mm plastic dishes at a density of 0.5-1×10^6 cells/ml, and allowed to proliferate for 3-4 days until they reached ~85% confluency. Cell harvest required only gentle agitation to dislodge the semi-adherent cells, which were pelleted by centrifugation at 1200 x g then washed three times with PBS.
In vivo mononucleosome purification
Approximately 100 million cells were collected from healthy cell cultures, pelleted and washed with ice-cold PBS. The cell lines were cultured in parallel in identical conditions and digested samples were combined after bar coding, but before sequencing. Cells were resuspended in NP-40 lysis buffer (10 mM Tris-Cl, pH 7.4; 10 mM NaCl; 3 mM MgCl2; 0.5% NP-40; 0.15 mM spermine; 0.5 mM spermidine). PMSF and BZA (Sigma) were added to final concentrations of 1 mM and 0.4 mM respectively. Cells were lysed by a 5-minute incubation on ice, the nuclei were pelleted and then washed once with PBS. After gentle resuspension in MNase digestion buffer (10 mM Tris-Cl, pH 7.4; 15 mM NaCl; 60 mM KCl; 0.15 mM spermine; 0.5 mM spermidine; 1 mM CaCl2), chromatin was digested with Micrococcal nuclease (Sigma N3755) for 10 minutes at room temperature. Digestion was stopped with MNase stop solution (0.25 M EDTA, 5% SDS added to a final ratio of 1:10 buffer volume) and 5 M NaCl (added to a ratio of 1:5 buffer volume). MNase-digested DNA was isolated from histones and other DNA binding proteins by phenol/chloroform extraction and ethanol precipitation. RNase (10 mg/mL) was added and the purified DNA was incubated for 30 minutes at 37˚C to remove any residual RNA.
Digested DNA was sized by running on a 3% agarose gel (NuSieve Lonza). Nucleosomal DNA bands were visualized by UV illumination and mononucleosomal DNA (mnDNA) corresponding in size to 150 bp was excised from the gel. mnDNA was recovered by a mild "crush and soak" protocol [17]. Briefly, excised gel slices were covered in crush and soak buffer (300 mM NaOAc and 1 mM EDTA, pH 8.0), and crushed with a microtube pestle inside the centrifuge tube. The gel and buffer slurry was then incubated at room temperature for 48 hours on a bench rocker to allow DNA to passively diffuse into the buffer. Solubilized DNA was separated from the agarose using spin-filters (Amicon Ultrafree-CL filter), centrifuged at 5000 x g for 3 minutes and purified (QIAquick PCR purification kit, Qiagen 28104). The DNA was then prepared for ABI SOLiD sequencing following the standard ABI protocols [43].

Genomic DNA purification and in vitro reconstitution of chromatin
Histone octamers for in vitro reconstitution were prepared from chicken erythrocytes as described previously [25]. Briefly, histone octamer and purified genomic DNA from S2 cells were mixed at a 0.8:1 molar ratio in reconstitution buffer (2 M NaCl; 5 mM Tris; 1 mM benzolamide; 0.5 mM PMSF; 0.5 mM EDTA) and loaded into 12-14 kDa, 10 mm diameter dialysis tubing, which was then placed into a larger 6-8 kDa, 100 mm dialysis bag filled with 100 mL of reconstitution buffer. This assembly was then dialyzed against 4 liters of low salt dialysis buffer (5 mM Tris; 1 mM benzolamide; 0.5 mM PMSF; 0.5 mM EDTA) at 4˚C for a minimum of 24 hours. After 24 hours the 4 liters of cold dialysis buffer were replaced and dialysis continued for an additional 24 hours, and the process was repeated for a total of 5 dialysis incubations. Reconstituted chromatin was then digested with MNase as described and prepared for ABI SOLiD sequencing, generating 27,542,643 unique read pairs.

SOLiD sequencing, read mapping and analysis
For sequencing, nucleosomal DNA fragments were gel-extracted, end-repaired (End-it-DNA End-Repair kit; Epicentre) and ligated to adaptors using the recommended ABI SOLiD Fragment Library reagents and protocol (Applied Biosystems PN 4464412). The DNA fragments were amplified by PCR for 10 cycles or fewer prior to ABI SOLiD sequencing. PCR fragments were purified and loaded on a SOLiD flow cell for cluster generation. Nucleosomal reads were separated into separate library files based on their barcodes, and mapped to the Drosophila dm3 reference genome using the ABI BioScope™ software (Applied Biosystems). SOLiD sequencing generated 4 million to 12 million uniquely mapped reads for each sample. From the aligned reads, only unique, paired DNA fragments sized between 101 and 191 bp were retained for use in the analysis dataset. Nucleosome fragment length was estimated as the distance between paired reads and the midpoint of each mapped fragment was considered the nucleosome midpoint. To generate AA/AT/TA/TT and CC/CG/GC/GG frequency plots, we extracted dinucleotide counts surrounding every nucleosome midpoint. We then computed the frequency of d:A/d:T and d:C/d:G dinucleotides at each distance from the nucleosome midpoint. One sample from each cell line or in vitro chromatin reconstitution was prepared and sequenced. The samples were processed in parallel, and a high degree of similarity in nucleosome occupancy was observed between cell lines (R values > 0.99 for heptamer coverage in each cell line compared to mean combined rate and R value = 0.91 for in vitro compared to mean combined rate, as described in Results) and in the nucleosome profiles shown in S2 Fig. The gene sets and annotations used in these analyses were from FlyBase BDGP Release 5. RNA-seq reads from S2 cells were obtained from modENCODE [33]. The number of RNA-seq reads that overlapped with annotated exons in each transcript was counted and normalized by transcript length to obtain fragments per kilobase per million mapped reads (FPKM). Analyses used the log10 FPKM value as the expression measurement.
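The FPKM normalization described above can be illustrated with a minimal Python sketch. The function names and the pseudocount added before taking the logarithm are assumptions made for illustration; the text does not state how transcripts with zero reads were handled.

```python
import math


def fpkm(exon_read_count, transcript_length_bp, total_mapped_reads):
    """Fragments per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and read total must be positive")
    # count / (length in kb) / (total reads in millions)
    return exon_read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def log10_fpkm(value, pseudocount=1.0):
    """log10 expression measure; the pseudocount is an assumption, not from the text."""
    return math.log10(value + pseudocount)


# Example: 500 exon-overlapping reads on a 2 kb transcript, 10 million mapped reads
example = fpkm(500, 2000, 10_000_000)
```

With these inputs the transcript receives an FPKM of 25, because 500 reads over 2 kb is 250 reads per kb, divided by 10 million-read units.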

Canonical nucleosome positioning sequence features are maintained in all cell lines
Since each Drosophila cell line in our study (Table 1) contains the same genomic DNA, we first determined the extent to which the positions of nucleosomes in each cell line are defined by expected nucleosome positioning signals. Previous studies have demonstrated that the positioning of nucleosomes is influenced by the genome sequence [2]. The underlying DNA can influence both the translational position, where the nucleosome 'sits' along a stretch of DNA sequence, as well as the rotational position of the DNA around the histone octamer. In the latter case, repeating AA/TA/TT dinucleotide pairs, positioned every 10 bp, or one helical turn, coupled with an out-of-phase 5 bp GG/GC/CC/CG pattern, present highly favorable locations for nucleosome occupancy [3,29,30,46,47]. It is thought that these DNA sequences have an increased flexibility that allows wrapping around the histone octamer. In contrast, long stretches of adenosine nucleotides, poly-A kmers, resist DNA bending and create unfavorable landscapes for nucleosome positioning, thus influencing the nucleosome translational position [48,49].
To determine if the C/G and A/T nucleosome-positioning signals are present in the cell lines used in this study, we collected nucleosomal fragments from them and sequenced them using ABI's SOLiD paired-end sequencing technique. Deep sequencing produced 4-12 million reads for each cell line (Table 1). We retained only read pairs that mapped uniquely to the Drosophila reference genome, with a separation of between 101 bp and 191 bp. The fragments retained and used for analysis correspond well to the expected lengths for mononucleosomes, with mean and median values close to 147 bp (S1 Fig). We used the midpoint between the mapped reads as an estimate of the nucleosome midpoint (i.e. dyad) position. As detailed below, the nucleosome profiles for each cell line correlate well with one another, and with previously published data. In addition, nucleosome plots of arbitrary genomic regions show typical occupancy profiles (S2 Fig).
We examined the frequency of dinucleotides along the 147 bp surrounding nucleosome midpoints in aggregate, and found that sequenced reads from each cell line exhibit the helically repeating AA/TA/TT pattern (Fig 1A), as has been observed in Drosophila [23,34]. Further, nucleosome-disfavoring poly(dA:dT) tracts tend to be excluded from nucleosomal DNA (Fig 1B). Our data demonstrate that each cell type retains the expected larger organizational nucleosome-positioning signals that influence rotational and translational placement.
For each pentamer, log2(P/P_nucleosome) was computed, where P is the frequency of the pentamer in the genome, and P_nucleosome is the frequency of the pentamer in nucleosomal DNA. Negative values indicate that a pentamer is more frequent within nucleosomal DNA than expected given the frequency of the pentamer in the genome. Separate distributions of log2(P/P_nucleosome) are plotted for the 32 pentamers that contain only A and T (blue); the 32 pentamers that contain only G and C (red); and the complete set of all 1024 pentamers (green). Example pentamer sequences are noted in each plot. In all cell lines, A- and/or T-only pentamers (blue) are excluded from nucleosomal DNA whereas C- and/or G-only pentamers (red) are found preferentially within nucleosomal DNA.
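The pentamer statistic described in the legend above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, frequencies are taken from overlapping windows on the given strand, and pentamers absent from either sequence set are skipped, none of which is specified in the text.

```python
import math
from collections import Counter


def kmer_frequencies(sequences, k=5):
    """Relative frequency of every k-mer over all overlapping windows."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # ignore windows containing N, etc.
                counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def log2_genome_over_nucleosome(genome_seqs, nucleosome_seqs, k=5):
    """log2(P / P_nucleosome); negative means enriched in nucleosomal DNA."""
    p_genome = kmer_frequencies(genome_seqs, k)
    p_nuc = kmer_frequencies(nucleosome_seqs, k)
    return {
        kmer: math.log2(p_genome[kmer] / p_nuc[kmer])
        for kmer in p_genome
        if kmer in p_nuc  # assumption: skip k-mers never seen in nucleosomal DNA
    }


def kmer_class(kmer):
    """Group a k-mer as in the legend: A/T-only, G/C-only, or mixed."""
    letters = set(kmer)
    if letters <= {"A", "T"}:
        return "AT-only"
    if letters <= {"G", "C"}:
        return "GC-only"
    return "mixed"
```

For pentamers there are 2^5 = 32 A/T-only sequences, 32 G/C-only sequences, and 4^5 = 1024 sequences in total, which matches the three distributions named in the legend.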

Nucleosome organization surrounding transcription start sites is correlated with levels of gene expression
Nucleosomes have a well-defined configuration in promoter regions, which has been observed in many organisms [18,19,27,46,50,51]. This configuration consists of a nucleosome-depleted region (NDR) upstream of a strongly positioned +1 nucleosome. Establishment of the NDR at the transcription start site (TSS) is important for regulation of gene expression [32,51-53]. The +1 nucleosome is followed by an array of nucleosomes downstream that become less well positioned as distance from the TSS increases. Furthermore, analysis of chromatin from several organisms, including Drosophila, reveals that phasing of the nucleosome array downstream of the TSS corresponds with gene expression and that genes with high expression have more regularly spaced nucleosome arrays than low expression genes [19,22,54-56]. During increased transcriptional activity, rapid dynamic rearrangement of this pattern occurs [13,57].
We asked to what extent this promoter organization is maintained and reproducible between the cell lines used in this study. We aggregated nucleosome midpoints across all annotated TSSs and found that all cell lines exhibit the expected nucleosome configuration around TSSs (Fig 2A).
We next asked how nucleosome organization correlates with gene expression in these cell lines by partitioning genes into low, medium and high expression groups (bottom 25%, middle 50% and top 25%, respectively). Genes with medium and high expression show a well-positioned nucleosome configuration around the TSS (Fig 2B and 2C). In contrast, genes with low expression do not show a pattern of well-positioned nucleosomes (Fig 2D). These results are consistent with the nucleosome maps previously observed in whole embryos [56], but our results extend these observations to differentiated homogeneous cell lines. They also agree with the lack of consistent nucleosome organization reported for low expression genes in both lower and higher eukaryotes [27,56,58,59].
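The expression partition used here (bottom 25%, middle 50%, top 25%) can be sketched as below. The function name and the handling of ties and of gene counts not divisible by four are assumptions for illustration.

```python
def partition_by_expression(expression):
    """Split genes into low (bottom 25%), medium (middle 50%), high (top 25%).

    `expression` maps a gene identifier to its expression value (e.g. log10 FPKM).
    Ties are broken by sort order, which is an assumption of this sketch.
    """
    ranked = sorted(expression, key=expression.get)
    n = len(ranked)
    quarter = n // 4
    return {
        "low": ranked[:quarter],
        "medium": ranked[quarter:n - quarter],
        "high": ranked[n - quarter:],
    }
```

For example, eight genes would be split into two low, four medium and two high expression genes.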
In yeast, worms, flies and humans the NDR has been observed even in the absence of DNA binding proteins, and therefore could be attributed to the underlying DNA sequence [2,8,9,32,43,56,59-61]. To examine if the NDR is maintained in Drosophila in the absence of binding proteins, we reconstituted chromatin in vitro using purified genomic DNA from Drosophila S2 cells and purified histone octamers from chicken erythrocytes [30]. We generated, sequenced and analyzed in vitro nucleosome maps as previously described, capturing over 25 million unique read pairs [36,44]. Overall, nucleosome positioning around TSSs is much weaker in the in vitro reconstituted chromatin than in the in vivo chromatin (Fig 2A-2D), which suggests that much of the nucleosome organization around promoters requires dynamic regulation by DNA binding proteins. However, the in vitro map does show some positioning of the +1 nucleosomes in highly expressed genes, suggesting that the DNA sequence plays a role in positioning this nucleosome (Fig 2B, arrowhead). In addition, the in vitro data show evidence of a positioned nucleosome over the nucleosome-depleted region at the TSS in highly expressed genes (Fig 2B, arrow). This suggests that preferential positioning of a nucleosome in the NDR is overridden in some actively transcribed genes. Higher expression levels strongly correlate with a more defined NDR, stronger positioning of the +1 nucleosome and more uniform nucleosome organization, demonstrating that chromatin structure can reflect gene regulation. Taken together, our data indicate that while a large part of the global nucleosome organization in each cell line results from sequence-directed nucleosome positioning preferences, the positioning of nucleosomes near genes is strongly correlated with gene expression.
(B-D) The fragments per kilobase per million mapped reads (FPKM) of each nucleosome was plotted relative to the TSS in high-expression genes (B, highest 25% of genes), in medium-expression genes (C, central 50%), and low-expression genes (D, lowest 25%). RNA-seq data were obtained from modENCODE [33]. Each plot shows MNase midpoints from fragments in the range of 101-191 bp, smoothed with a 20 bp sliding window. In addition, data for in vitro reconstitution of Drosophila chromatin are shown, which to some extent mimic some of the features of the cell-line nucleosome positioning data.
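The TSS-centered plots described in the legend (midpoints of 101-191 bp fragments, smoothed with a 20 bp sliding window) can be approximated with the Python sketch below. It is not the pipeline used in the study: the function names, the 0-based half-open fragment coordinates, the single-chromosome input, and the centered, edge-truncated moving average are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def fragment_midpoints(fragments, min_len=101, max_len=191):
    """Midpoints of (start, end) fragments whose length is within the size range."""
    mids = []
    for start, end in fragments:
        length = end - start
        if min_len <= length <= max_len:
            mids.append(start + length // 2)
    return mids


def tss_profile(midpoints, tss_sites, flank=1000):
    """Count midpoints at each offset from the TSS, oriented by gene strand.

    `tss_sites` is a list of (position, strand) pairs; index `flank` is the TSS.
    """
    mids = sorted(midpoints)
    profile = [0] * (2 * flank + 1)
    for pos, strand in tss_sites:
        lo = bisect_left(mids, pos - flank)
        hi = bisect_right(mids, pos + flank)
        for m in mids[lo:hi]:
            offset = m - pos if strand == "+" else pos - m
            profile[offset + flank] += 1
    return profile


def smooth(values, window=20):
    """Sliding-window mean; windows are truncated at the ends of the profile."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):min(len(values), i - half + window)]
        out.append(sum(chunk) / len(chunk))
    return out
```

Orienting offsets by strand keeps the +1 nucleosome on the same side of the TSS for genes on either strand, which is what allows positions to be aggregated across all annotated genes.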

Nucleosomal occupancy in different functional regions of the genome is similar between all cell lines
We next asked if nucleosome occupancy in the cell lines agrees across different genomic regions that are important in gene regulation. Here we consider intergenic, intronic and exonic genomic regions. Regions were categorized using FlyBase gene annotations, with regions within 500 bp of an annotated transcription start site (TSS) being defined as promoters. The number of nucleosome midpoints within each region was counted and normalized against the total number of sequenced aligned reads from each experiment to determine nucleosome enrichment in that region (Fig 3). Nucleosome occupancy was much higher in exons than in introns in all cell lines. This agrees with nucleosomal DNA sequence preferences, since exon DNA sequences generally have a higher G+C content than intron DNA sequences and therefore are less likely to contain the nucleosome-disfavoring poly-A kmers [18,20,32,56]. Overall, the relative abundance of nucleosomes in each region agrees with previous studies [62,63] and demonstrates that global nucleosome organization is not markedly different between cell types, and therefore that small-scale changes are likely to be important for cell type specification.
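The region-level normalization described above can be sketched as follows. The function name, the half-open region coordinates, and the scaling to counts per million aligned reads are assumptions for illustration; the text states only that counts were normalized against the total number of aligned reads.

```python
from bisect import bisect_left


def region_enrichment(midpoints, regions, total_reads, per=1_000_000):
    """Midpoint counts per annotation class, scaled by sequencing depth.

    `regions` is a list of (label, start, end) tuples with half-open coordinates
    on a single chromosome; labels such as "exon" or "intron" are summed.
    """
    mids = sorted(midpoints)
    counts = {}
    for label, start, end in regions:
        n = bisect_left(mids, end) - bisect_left(mids, start)
        counts[label] = counts.get(label, 0) + n
    return {label: n * per / total_reads for label, n in counts.items()}
```

Dividing by the total read count makes values comparable between libraries that were sequenced to different depths, such as the 4 to 12 million reads reported per sample.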

Specific sequence motifs have differential nucleosome occupancy in cell lines and in vitro reconstituted chromatin
Given that the inherent nucleosome organization is broadly similar in each cell line, we hypothesized that changes in chromatin structure associated with cell-type specific expression occur locally, within smaller regulatory regions. To investigate this possibility, we divided the genome into non-overlapping 200 bp regions and compared the nucleosome coverage of each base pair in each cell line to the coverage in the S2 cells. S2 cells are derived from embryonic hemocyte (macrophage-like) cells, and thus provide a comparison for the four imaginal disc cell lines derived from later stage larval epithelial tissue. Although nucleosome occupancy within the different cell lines is generally similar to that of the S2 cell line, a subset of regions are markedly different (Fig 4), with many regions differing between 2 and 10-fold, and some regions differing by as much as 100-fold. These regions differ in that some are enriched for nucleosomes and some are depleted compared to the same region in the S2 cells.
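The 200 bp window comparison against S2 cells can be illustrated with the sketch below. It simplifies the analysis by counting nucleosome midpoints per window rather than per-base coverage, and it adds a pseudocount to avoid division by zero; both choices, and the function names, are assumptions rather than details from the study.

```python
import math


def binned_counts(midpoints, chrom_length, bin_size=200):
    """Number of nucleosome midpoints in each non-overlapping window."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for m in midpoints:
        if 0 <= m < chrom_length:
            counts[m // bin_size] += 1
    return counts


def log2_fold_change(test_counts, ref_counts, test_total, ref_total, pseudocount=1.0):
    """Depth-normalized log2 ratio of each window relative to the reference line."""
    ratios = []
    for t, r in zip(test_counts, ref_counts):
        t_norm = (t + pseudocount) / test_total
        r_norm = (r + pseudocount) / ref_total
        ratios.append(math.log2(t_norm / r_norm))
    return ratios
```

On this scale, a 2-fold difference corresponds to a value of 1 (or -1 for depletion) and a 100-fold difference to roughly 6.6.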
To further resolve small differences between the four tissue-specific cell lines, we examined the nucleosomal occupancy over short kmers for each cell type. We used 7 bp kmers (i.e. heptamers) for analysis, reasoning that some differences between cell lines are likely to be at cell-type specific TF binding sites (TFBSs). TFBSs are short degenerate sequences, generally 7-11 bp, that occur throughout the genome [64,65]. The context of any TFBS is important for regulatory function; TFBSs found within sequences that are highly favorable to nucleosome binding may be inaccessible to TFs and therefore may not be active [39,66]. While many TFBSs have been annotated, we wanted to examine all possible 7 bp kmers to undertake an unbiased investigation into whether specific kmers might correlate with differential nucleosome occupancies in differentiated cell lines. We expected to identify heptamers corresponding to the more than 700 TFBS motifs that have been discovered and annotated in the Drosophila genome [33], but we also hoped to identify previously unannotated sequences that are correlated with differential nucleosome occupancy.
We examined the extent to which nucleosome occupancy differs between cell lines over all 16,384 possible heptamers by dividing the genome into 200 bp regions and calculating the average nucleosomal read depth in each 200 bp region surrounding every occurrence of a heptamer. The genome-wide average rate for each heptamer was calculated and normalized to the total number of nucleosomal reads sequenced in each lineage. We performed pairwise comparisons of the rate for each heptamer across the following cell lines and conditions: the four imaginal disc cell lines (antenna, eye, haltere and leg), the mean of all four cell lines, and the in vitro reconstituted chromatin. In total we performed 15 pairwise comparisons, and for each comparison, we considered the 20 kmers with the largest absolute residuals from the regression line to be "outliers". In total, there were 91 unique outlier heptamers that had the greatest differences in at least one pairwise comparison (Table 2). In general, nucleosome occupancy over heptamers was highly correlated across cell lines, as seen in Fig 5A (R values > 0.99). Furthermore, this correlation was maintained even when compared to the in vitro reconstituted chromatin (Fig 5B) (R value = 0.91), demonstrating that genomic sequence plays a key role in global nucleosome positioning, directly through DNA-histone interactions. However, multiple outliers were observed that were either more or less occupied by nucleosomes relative to their coverage in other cell lines (Table 2, Fig 5, outliers annotated with red text). These findings suggest that while nucleosome placement is generally guided by thermodynamics and the underlying DNA sequence, there are differences in nucleosome occupancy for specific kmers between datasets that are likely caused by energetically driven processes.
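The outlier selection described above can be sketched in Python as follows. The text does not state which regression model was fitted, so ordinary least squares is assumed here, and all function names are hypothetical. Six datasets (four cell lines, their mean, and the in vitro chromatin) give 6 × 5 / 2 = 15 pairwise comparisons, and 4^7 = 16,384 heptamers are possible.

```python
from itertools import combinations


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def top_outliers(rates_a, rates_b, n_top=20):
    """Heptamers with the largest absolute residuals from the regression line."""
    kmers = sorted(set(rates_a) & set(rates_b))
    xs = [rates_a[k] for k in kmers]
    ys = [rates_b[k] for k in kmers]
    slope, intercept = linear_fit(xs, ys)
    residuals = {k: y - (slope * x + intercept) for k, x, y in zip(kmers, xs, ys)}
    return sorted(residuals, key=lambda k: abs(residuals[k]), reverse=True)[:n_top]


def unique_outliers(datasets, n_top=20):
    """Union of the top outliers over every pairwise comparison of datasets."""
    found = set()
    for name_a, name_b in combinations(sorted(datasets), 2):
        found.update(top_outliers(datasets[name_a], datasets[name_b], n_top))
    return found
```

Because the same heptamer can be an outlier in several comparisons, the union is smaller than 15 × 20 = 300, which is consistent with the 91 unique heptamers reported.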
To determine if any of the differentially occupied heptamer sequences correlated with positioned nucleosomes, we visualized the nucleosome occupancy surrounding specific heptamers by aggregating nucleosome midpoints across occurrences of the heptamer and plotting the mean midpoint density in 400 bp regions centered on the heptamer. Interestingly, nucleosome occupancy varied considerably around different heptamers. For some heptamers, there was a visible reduction or increase in nucleosomal occupancy surrounding the heptamer in all cell lines and in the in vitro chromatin (e.g. Fig 6A and 6B, respectively). For other heptamers, nucleosome occupancy surrounding the heptamer site showed no discernible pattern (e.g. Fig 6C). In multiple cases, we observed differential nucleosome coverage between cell lines and the in vitro chromatin, with either the cell line or the in vitro chromatin having greater nucleosome coverage (Fig 6D and 6F; arrows indicate occupancy in cell lines, arrowheads indicate occupancy in in vitro chromatin). Notably, in some cases, the region of differential nucleosome occupancy was tightly centered on the heptamer sequences but phasing of nucleosomes extended to broader genomic regions (e.g. Fig 6E, asterisks indicate periodic peaks). We also identified several cases where nucleosome occupancy around specific heptamers differed in only one of the cell lines (Fig 6F-6H). For example, the heptamer AATAATA has reduced nucleosome occupancy in the leg, antenna and haltere lines (Fig 6G, arrow), but is distinctly more occupied in the eye cell line (Fig 6G, yellow line indicated by arrowhead). Conversely, the CAACAGC heptamer is slightly over-occupied in eye, haltere, and antennal cell lines (Fig 6H arrow), but is visibly more occupied in the leg cell line (Fig 6H, (Fig 6I arrow).
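The heptamer-centered aggregation described above can be sketched as follows. This is an illustration under stated assumptions: occurrences are found on the forward strand only (the text does not say how reverse complements were treated), the window is centered on the middle base of the heptamer, and the function names are hypothetical.

```python
import re
from bisect import bisect_left, bisect_right


def kmer_positions(sequence, kmer):
    """Start positions of all occurrences of kmer, including overlapping ones."""
    pattern = re.compile("(?=" + re.escape(kmer) + ")")
    return [match.start() for match in pattern.finditer(sequence)]


def mean_midpoint_density(sequence, midpoints, kmer, window=400):
    """Mean number of nucleosome midpoints at each offset around the k-mer.

    Returns window + 1 values; index window // 2 is the k-mer center.
    """
    half = window // 2
    centers = [p + len(kmer) // 2 for p in kmer_positions(sequence, kmer)]
    profile = [0] * (window + 1)
    if not centers:
        return [0.0] * (window + 1)
    mids = sorted(midpoints)
    for center in centers:
        lo = bisect_left(mids, center - half)
        hi = bisect_right(mids, center + half)
        for m in mids[lo:hi]:
            profile[m - center + half] += 1
    return [count / len(centers) for count in profile]
```

Dividing by the number of occurrences gives a per-site mean, so that frequent and rare heptamers can be compared on the same scale.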
Together, these results demonstrate that, over heptamers, nucleosome organization is driven to a large extent by DNA sequence, but there are nonetheless clear differences between in vivo and in vitro nucleosome organization, as well as cell line-specific differences.

Some differentially occupied heptamer sequences correspond to annotated transcription factor binding sites
We next asked whether any of the heptamers with differential occupancy between the cell lines match known regulatory sequences. Using the Tomtom motif comparison tool of the MEME suite of tools (www.meme-suite.org), we compared the differential heptamers to those in databases of known Drosophila transcription factor binding sites. Of the 91 differentially occupied heptamers identified, 49 matched one or more known Drosophila TFBS consensus sequences (Table 3). The transcription factors for these TFBSs have a wide array of biological functions, but several stand out as being important in cell type specification. The poly (dC:dG) heptamers CGCCGCC and CCCCCCC match the predicted binding sites of Buttonhead (Btd) and Brinker (Brk), two transcription factors involved in imaginal disc antennal and wing morphogenesis respectively. Notably, both of these heptamers are associated with regions of differential nucleosome occupancy (Fig 6D-6F). Conversely, another transcription factor involved in imaginal disc development, Rotund (Rn), binds to homopolymeric A/T sequences [67], which also show differential nucleosome occupancy (Fig 6A). Bric à brac 1 (Bab1) is a TF that is needed in appendage formation [68]. Interestingly, the three cell lines derived from tissues that normally form appendages (antennal, leg, and haltere) all have open chromatin structure over the AATAATA motif that matches the Bab1 binding sequence, whereas this site shows higher nucleosomal occupancy in the eye cell lineage (Fig 6F). These results suggest that some of the 42 heptamers that do not correspond to known TFBS might in fact interact with binding factors to influence nucleosome positioning and/or gene expression.

Discussion
Previous studies have revealed canonical patterns of nucleosome organization in the genomes of many different organisms [21,22,54-56,69]. However, few studies have examined nucleosome organization in the context of differences between distinct cell lines. Our goal in this study was to analyze nucleosome positioning in five cultured cell lines from a model organism, Drosophila melanogaster, and to determine if there is evidence that short sequences can influence nucleosome positioning and occupancy. Such evidence would serve as a basis for future causal investigations into the relationship between nucleosome positioning and cell-type specification, and for possibly identifying factors that bind these short sequences.
The presented results show that while underlying sequence does play a role in nucleosome occupancy, there are notable differences in nucleosome occupancy between the cell lines examined and in vitro reconstituted chromatin. We also identified cell type-specific differences that are distinct from the DNA sequences expected to favor or disfavor nucleosome positioning. These results are in line with other studies showing differential nucleosome occupancy during cell-type regulation [11,41], and suggest that changes in nucleosome positioning could be involved in cell fate specification and maintenance. Importantly, our studies extend previous results by identifying 91 heptamers that show cell type-specific nucleosome occupancy. In some cases, strong nucleosome-positioning patterns extend in excess of 1,000 base pairs into the region surrounding the heptamer. While 49 of these heptamers correspond to binding sites of known transcription factors, 42 heptamers do not correspond to known binding factors. We speculate that these novel heptamers identify binding sites for transcription or chromatin remodeling factors that have important roles in establishing specific cell fate in the studied lines.
The possibility that these 42 heptamers could be functional binding sites for transcription factors or chromatin remodeling factors is supported by our finding that 49 of the differentially occupied heptamers corresponded to binding sites for known transcription factors. A particularly notable example is Bab1 (bric à brac), a transcription factor required for appendage development [68]. The heptamer that matches the Bab1 binding sequence, AATAATA, has an open chromatin motif in the three cell lines derived from tissues that normally form appendages (antennal, leg, and haltere), but this heptamer shows higher nucleosome occupancy in the eye-derived cell line. Further studies will be necessary to determine whether any of the 42 novel heptamers in fact bind trans-acting factors, and whether they causally affect nucleosome positioning. However, if binding of factors to these sites does influence nucleosome positioning, we would also expect, as detailed below, that the corresponding heptamer could influence gene expression.
What is the relationship between nucleosome organization and gene expression in these tissue-specific cell lines? Our results show that, as in embryos and other organisms, highly expressed genes in these cell lines show a specific organization of nucleosomes, with an NDR at the TSS and phased nucleosomes distal to the TSS [27,55,56,58,59,70]. As has been observed in other species, genes with low expression did not have an organized nucleosome pattern [27,56]. This correlation, together with work in multiple organisms [43,51,56,58,59], suggests that binding of a transcription factor or chromatin remodeling factor to a heptamer sequence could alter nucleosome positioning, which could in turn alter gene transcription and thus cell fate specification.
Alternatively, since the canonical nucleosome occupancy pattern observed in highly expressed genes likely creates a chromatin structure best poised for RNA polymerase or TF binding [49,50,56,71], an open chromatin environment created by upstream signaling events could allow a specific TF to bind and thus contribute to cell fate specification or maintenance. Further work is needed to establish whether binding of factors to heptamers alters nucleosome organization, or whether altered nucleosome organization allows access and binding of regulatory factors.
In summary, our data demonstrate that a large part of the in vivo global nucleosome organization in each cell line results from nucleosome-positioning preferences, favorable and unfavorable, encoded in the DNA. Genomic encoding of nucleosome preference is an integral component of gene regulation. However, the effect of the underlying sequence is overridden by cell-type specific nucleosome organization that is mediated by other factors such as TFs and chromatin remodelers [11,31,41]. Our data contribute useful datasets of genome-wide nucleosome positioning in distinct Drosophila cell lines and identify heptamers that are differentially occupied in different cell lines. While 49 of these heptamers match binding sites of known TFs, 42 have no current match, and thus define possible binding sites for novel cell fate specification factors. Together, these data provide tools for examining the effect of sequence, and the functional relationships between transcription factor activity and nucleosome location, in gene regulation and cell fate specification.