Global Chromatin Domain Organization of the Drosophila Genome

In eukaryotes, neighboring genes can be packaged together in specific chromatin structures that ensure their coordinated expression. Examples of such multi-gene chromatin domains are well-documented, but a global view of the chromatin organization of eukaryotic genomes is lacking. To systematically identify multi-gene chromatin domains, we constructed a compendium of genome-scale binding maps for a broad panel of chromatin-associated proteins in Drosophila melanogaster. Next, we computationally analyzed this compendium for evidence of multi-gene chromatin domains using a novel statistical segmentation algorithm. We find that at least 50% of all fly genes are organized into chromatin domains, which often consist of dozens of genes. The domains are characterized by various known and novel combinations of chromatin proteins. The genes in many of the domains are coregulated during development and tend to have similar biological functions. Furthermore, during evolution fewer chromosomal rearrangements occur inside chromatin domains than outside domains. Our results indicate that a substantial portion of the Drosophila genome is packaged into functionally coherent, multi-gene chromatin domains. This has broad mechanistic implications for gene regulation and genome evolution.


Choosing the bias factor γ
We estimated the false discovery rate (FDR) of BRICK identification by determining the number of windows of size w > 1 identified in a random permutation of the genome. In a randomly permuted genome, all identified domains are by definition false. Low values of γ favor parses consisting of w = 1 windows. We seek an optimum such that the number of falsely identified BRICKs remains low. To estimate the number of false-positive BRICKs, we calculated the optimal path through 1,000 randomly permuted genomes. We performed these analyses for various values of γ (in decreasing order: 0.01, 0.005, 0.001, 0.0005 and 0.0001) and scored the number of identified BRICKs, as well as the number of genes located within identified BRICKs. This was done for both the randomized datasets and the compendium of 30 protein binding maps. Figure SM1 shows histograms of the identified BRICK sizes for various values of γ. The average number of BRICKs or probes is the average number identified per binding profile or per genome permutation. We settled on γ = 1 × 10^-4, which yields FDR_BRICKs = 2.5% and FDR_genes = 1.6%. Both FDR values are well below 5%, representing a stringent cutoff.
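The permutation-based FDR estimate can be illustrated with a minimal Python sketch. It assumes that the FDR is the average number of BRICKs (or genes in BRICKs) found per permuted genome divided by the average number found per real binding profile; the text does not give the exact formula, so this ratio is an assumption. The names `permuted_counts`, `estimate_fdr` and the caller-supplied `count_bricks` function are hypothetical stand-ins, and `count_bricks` is a placeholder for the actual BRICK identification step.

```python
import random
from statistics import mean


def permuted_counts(scores, count_bricks, n_permutations=1000, seed=0):
    """Count BRICKs in randomly permuted copies of one binding profile.

    `count_bricks` is a hypothetical callable that takes a list of scores
    in genomic order and returns the number of BRICKs it identifies.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_permutations):
        shuffled = list(scores)
        rng.shuffle(shuffled)  # destroys any true domain structure
        counts.append(count_bricks(shuffled))
    return counts


def estimate_fdr(false_counts, observed_counts):
    """Assumed definition: mean count in permuted genomes / mean count in real data."""
    observed = mean(observed_counts)
    if observed == 0:
        raise ValueError("no BRICKs identified in the real data")
    return mean(false_counts) / observed
```

Running both functions once per candidate value of γ gives one FDR estimate per γ, which can then be compared against the desired cutoff.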

Performance of the BRICK identification algorithm on synthetic data
We further tested the reliability of the BRICK identification algorithm on a synthetic dataset that emulates a binding map of a hypothetical protein with a number of pre-defined chromosomal domains of various sizes and with various levels of binding. The datasets that we used for the Q_i-transformed real protein binding maps contain ~8,000 probes. Therefore we began with a set of 8,000 uniformly distributed quantile scores. The top 25% of these quantile scores were divided into 3 categories: the top 1% (strong binding), top 1%-10% (medium binding) and top 10%-25% (weak binding). We then constructed a synthetic chromosome arm of 1,200 genes. On this chromosome arm we placed seven domains consisting of 5-100 neighboring genes that were assigned quantile scores selected from one of these categories, i.e., domains consisted of either "strong", "medium" or "weak" genes. Genes between these domains were randomly assigned a value from the remainder of the quantile scores. Thus, we created a model of a chromosome arm with several somewhat "noisy" domains in an otherwise unstructured "noisy" background.
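The construction of the synthetic chromosome arm can be sketched as follows. This is an illustration under stated assumptions, not the original simulation code: the specific domain sizes, the random choice of category per domain, the random placement of domains, and sampling scores with replacement are all assumptions, and the function name `make_synthetic_arm` is hypothetical.

```python
import random


def make_synthetic_arm(n_scores=8000, n_genes=1200,
                       domain_sizes=(5, 10, 20, 30, 50, 75, 100), seed=0):
    """Return (scores, domains); domains are (start, end, category), end exclusive."""
    rng = random.Random(seed)
    # Evenly spaced quantile scores in (0, 1), standing in for a uniform distribution.
    scores = [(i + 0.5) / n_scores for i in range(n_scores)]
    pools = {
        "strong": [s for s in scores if s >= 0.99],         # top 1%
        "medium": [s for s in scores if 0.90 <= s < 0.99],  # top 1%-10%
        "weak": [s for s in scores if 0.75 <= s < 0.90],    # top 10%-25%
    }
    background = [s for s in scores if s < 0.75]            # remainder

    # Split the non-domain genes into gaps before, between and after the domains,
    # keeping at least one background gene in every gap.
    n_gaps = len(domain_sizes) + 1
    free = n_genes - sum(domain_sizes) - n_gaps
    cuts = sorted(rng.randint(0, free) for _ in range(n_gaps - 1))
    gaps = [b - a + 1 for a, b in zip([0] + cuts, cuts + [free])]

    sizes = list(domain_sizes)
    rng.shuffle(sizes)
    arm, domains = [], []
    for gap, size in zip(gaps, sizes + [0]):  # the trailing 0 closes the last gap
        arm.extend(rng.choices(background, k=gap))  # assumption: with replacement
        if size:
            category = rng.choice(sorted(pools))
            domains.append((len(arm), len(arm) + size, category))
            arm.extend(rng.choices(pools[category], k=size))
    return arm, domains
```

Calling `make_synthetic_arm(seed=k)` for k = 0, ..., 99 would produce 100 independent synthetic arms on which a domain caller can be evaluated against the known domain coordinates.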
We then tested whether the domains on the synthetic chromosome could be identified by the BRICK identification algorithm. This was repeated in 100 independent simulation runs. Figures S3A and S3B show, respectively, an example of a domainogram and the actual quantile scores in one simulation. Figure S3C

Co-expression analysis
Developmental expression data were taken from ref. [1]. On this array, every exon in the genome is represented by a probe. Probe intensity values were log-transformed. For genes with multiple exon probes, a mean intensity per gene was calculated. To avoid biases in correlation, the data for every developmental stage were variance-normalized (mean of 0 and unit variance). Using this dataset, we calculated the average pairwise Pearson correlation (APC) across all possible pairs of genes in a BRICK. For visualization purposes, however, we need to correct the APC, since it decreases with the size of the window. For a window of n genes, we scaled the APC using a scaling factor S_n, which was determined as follows. From the total set of genes, we select n random genes and calculate the APC for this subset. We repeat this selection 1,000 times for a given n. S_n is the standard deviation over these 1,000 APCs.
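The APC and the scaling factor for a window of n genes can be sketched in Python as follows. The expression data are assumed to be a list of per-gene rows with one value per developmental stage. The function names are hypothetical, and the use of the population standard deviation for the per-stage normalization is an assumption. How the scaling factor is applied is not stated explicitly; dividing the APC of a window by the scaling factor for its size is one plausible reading.

```python
import random
from itertools import combinations
from statistics import mean, pstdev, stdev


def normalize_stages(expr):
    """Scale every developmental stage (column) to mean 0 and unit variance."""
    columns = []
    for col in zip(*expr):
        mu, sd = mean(col), pstdev(col)
        columns.append([(x - mu) / sd for x in col])
    return [list(row) for row in zip(*columns)]


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def apc(expr):
    """Average pairwise Pearson correlation over all gene pairs (rows) in expr."""
    return mean(pearson(x, y) for x, y in combinations(expr, 2))


def scaling_factor(expr, n, n_draws=1000, seed=0):
    """Standard deviation of the APC over n_draws random sets of n genes."""
    rng = random.Random(seed)
    return stdev(apc(rng.sample(expr, n)) for _ in range(n_draws))
```

Because the spread of the APC of random gene sets shrinks as n grows, scaling by this factor puts windows of different sizes on a comparable footing.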
Neighboring genes frequently show coregulation [1,2]. Because BRICKs obviously encompass neighboring genes, we want to determine whether, on a genomic scale, BRICKs are enriched for coregulated genes. We do this by comparing the expression patterns of the genes in BRICKs to the expression patterns of the genes in all other same-sized windows in the genome.
However, for this analysis it is important to keep in mind that the BRICKs show hierarchical and overlapping organization (outlined in Fig. SM2A). To guarantee that the APCs are independent observations, we include a specific correlation between two genes only once in our analysis. This is achieved in the following manner. In figure [...] BRICK is transformed to a quantile score (q_BRICK) by dividing by the number of windows. Figure S5 shows cumulative distribution plots of q_BRICK for all binding profiles. Under the null hypothesis that chromatin domains are not enriched for coregulated genes, one would expect the distribution of q_BRICK to resemble a uniform distribution between 0 and 1. Deviation from the uniform distribution (gray dashed diagonal in Figure S5) represents enrichment of coregulated genes in the domains of a given protein. We test for this using a Kolmogorov-Smirnov test for deviation from a uniform distribution.
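The quantile transformation and the test against uniformity can be sketched as follows. Two assumptions are made here. First, the quantile score of a BRICK is taken to be the rank of its APC among the APCs of all same-sized windows, divided by the number of windows. Second, the p-value is computed with the asymptotic Kolmogorov distribution and a small-sample correction, which is an approximation; in practice a statistics library would normally be used. The function names `quantile_score` and `ks_uniform` are hypothetical.

```python
import math


def quantile_score(brick_apc, window_apcs):
    """Assumed definition: rank of the BRICK's APC among same-sized windows / number of windows."""
    rank = sum(1 for a in window_apcs if a <= brick_apc)
    return rank / len(window_apcs)


def ks_uniform(q_scores, terms=100):
    """One-sample KS test against Uniform(0, 1); returns (D, approximate p-value)."""
    q = sorted(q_scores)
    n = len(q)
    # Largest gap between the empirical CDF and the diagonal, on either side.
    d_plus = max((i + 1) / n - x for i, x in enumerate(q))
    d_minus = max(x - i / n for i, x in enumerate(q))
    d = max(d_plus, d_minus)
    # Asymptotic Kolmogorov distribution with a small-sample correction.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return d, 1.0  # the series below does not converge; p is effectively 1
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return d, min(max(p, 0.0), 1.0)
```

With this rank convention, quantile scores piled up near 1 pull the empirical distribution away from the diagonal, which corresponds to enrichment of coregulated genes and gives a small p-value.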