Modeling and Experimental Methods to Probe the Link between Global Transcription and Spatial Organization of Chromosomes

Genomes are spatially assembled into chromosome territories (CT) within the nucleus of living cells. Recent evidences have suggested associations between three-dimensional organization of CTs and the active gene clusters within neighboring CTs. These gene clusters are part of signaling networks sharing similar transcription factor or other downstream transcription machineries. Hence, presence of such gene clusters of active signaling networks in a cell type may regulate the spatial organization of chromosomes in the nucleus. However, given the probabilistic nature of chromosome positions and complex transcription factor networks (TFNs), quantitative methods to establish their correlation is lacking. In this paper, we use chromosome positions and gene expression profiles in interphase fibroblasts and describe methods to capture the correspondence between their spatial position and expression. In addition, numerical simulations designed to incorporate the interacting TFNs, reveal that the chromosome positions are also optimized for the activity of these networks. These methods were validated for specific chromosome pairs mapped in two distinct transcriptional states of T-Cells (naïve and activated). Taken together, our methods highlight the functional coupling between topology of chromosomes and their respective gene expression patterns.


Introduction
The genetic material (chromatin) in eukaryotic cells has a multiscale three dimensional organization within the nucleus [1]. DNA is packaged around histone and non-histone proteins to form the 30 nm chromatin fibre [2]. This 30 nm fibre is further hypothesized to be organized into relatively open euchromatin and condensed heterochromatin structures based on post translational modifications of histone [3]. Imaging methods using whole chromosome probes (FISH) reveal the spatial dimension to genome organization in eukaryotic cells. These methods have suggested that chromatin is organized into well-defined chromosome territories (CT), in a tissue specific non-random manner [4][5][6][7]. These chromosome positions remain largely conserved during the interphase in proliferating cells [8][9][10]. In addition, whole genome chromosome conformation capture assays have shown intermingling of neighbouring CTs [11] as well as a model of the yeast genome organization [12]. Further on a smaller scale, these methods have demonstrated that the genes from neighbouring CTs loop out and are found to co-cluster with transcription machinery to form three dimensional interactions called active transcription hubs [13]. The intermingling of nearby CTs vary in concert with transcription and cellular differentiation [14,15], demonstrating the role of chromosome topology in genome regulation [16]. Individual gene labeling methods suggest that candidate gene clusters are spatially co-localized [17] and are coregulated for their specific transcriptional control [18][19][20][21][22][23][24]. Using 2D matrices of chromosome distances at prometaphase stage, the correspondence between co-regulated genes and chromosome positioning has been observed during differentiation [19]. However, methods to describe the correlations between threedimensional architecture of chromosome positions [25,26] and global gene expression as well as TFNs is largely unexplored.
In this paper, we present a quantitative approach to test the correlation between chromosome organization and transcriptional output of the cell. Inter-chromosome Physical Distance (IPD) matrix computed from chromosome centroids in interphase human male fibroblasts [27] revealed non random chromosome organization. Inter-chromosome Activity Distance matrix, constructed from the microarray data obtained for human fibroblast [28], suggested that chromosomes with similar gene activity were spatially clustered in a tissue specific manner. We formulate an energy optimization function, 'H' to elucidate the correspondence between the annotated TFNs [29] and spatial positioning of chromosomes. Numerical simulations of the H function, that relates the activity of genes of specific networks to their corresponding chromosomal positions, suggest the sensitivity in network topology. The prediction from our numerical methods were experimentally validated by correlating chromosome distances for specific pairs with their respective activity distances in two distinct transcriptional states of murine T-Cells (naïve and activated). Taken together these numerical modeling and experimental methods provide an important platform to probe the functional coupling between spatial organization of chromosomes and their epigenetic states.

Results
Methods to probe the correlation between the organization of chromosomes and their transcriptional activity 3D Chromosome FISH was used to map chromosome positions in two cell phases: interphase and prometaphase [27,30]. Based on these observations we extracted the coordinates of all chromosome centroids in human fibroblasts measured for 54 nuclei, as reported by Bolzer et al. [27], which is the only available full map of all chromosome positions. Inter-chromosome Physical Distance (IPD) matrices were constructed by the mean distances between centroid positions of 22 pairs of autosomes ( Figure 1A) as: where r i = (x i , y i , z i ) and r j = (x j , y j , z j ) are the coordinates of chromosome i and chromosome j, respectively and , . denotes averaging over the 54 nuclei. Figure 1A shows Inter-chromsome physical distance between ith and jth chromosome in the nucleus, which represents the (i,j) th element in the IPD matrix. The IPD matrices were constructed for interphase (IPD fib , Figure 1B), prometaphase (IPD prometaphase , Figure 1C) and randomized nucleus (IPD rand , Figure 1D). The values of diagonal elements of all the matrices, which represent mean distance between homologues, are kept minimal and are not considered for any further correlation analysis. Further, regions of low IPD values and high IPD values are observed in the IPD matrices for interphase and prometaphase ( Figure 1B and 1C), suggesting contribution of chromosome size (in total number of base pairs) which decreases from chromosome 1 to 22 ( Figure S1A). The volume of a given chromosome changes dramatically in interphase [31] due to changes in epigenetic modification and subsequent transcriptional states. Hence IPD is an average of chromosome centroids in all such conditions showing spatial clustering of chromosomes. Such clustering was not observed in the randomized nucleus, where these matrices were generated by randomly swapping the rows and columns of IPD matrix multiple times and hence permuting the identities of the chromosomes in the interphase IPD matrix (Methods). Pearson Correlation Coefficient (PCC) estimation between initial IPD and progressive randomization showed significant decrease in PCC values after 30 such permutations ( Figure S2). Interphase and prometaphase IPDs were found to be positively correlated ( Figure 1E) with PCC of 0.904 ( Figure 1G), whereas IPD fib was uncorrelated with the randomized position matrix (mean PCC of 0.17 with standard deviation of 0.13, computed over 10,000 randomized matrices) ( Figure 1F -a representative scatter plot, 1G and Figure S3), confirming the nonrandom organization of the chromosomes in interphase cell nucleus.
To probe the possible correlation between the chromosome positions and their gene activity (Figure 2A), we generated an Inter-chromosomal Activity Distance (IAD) matrix for fibroblast from the microarray data ( Figure 2B) obtained from Goetze et al [28]. Figure 1A shows the schematic of gene activities being classified to i th and j th chromosomes, which is then further used to compute the (i,j) th value in the IAD matrix. From the microarray data, genes were grouped into their respective chromosomes and the mean logarithmic chromosomal activities (Achr) were obtained (Methods). The density of genes on a chromosome does not correlate with the length of the chromosomes ( Figure S1A). For instance, chromosome 18 is larger than chromosome 19, but the former has smaller number of genes as compared to the latter ( Figure S1B). Considering this, chromosomal activity was computed by normalizing total activity of all the genes by the annotated number of genes and not by the chromosome size. Interchromosome Activity Distance (IAD) was then computed as: Use of logarithmic scale captures expression levels over several orders of magnitude. Lower IAD values, shown by cooler colors in the heat map, represent -pairs with similar transcriptional activity, whereas warmer colors represent higher IAD or dissimilar chromosomal activities, as seen in Figure 2B. The correlations between IAD fib and IPD fib matrices at interphase ( Figure 2C), revealed a positive slope (1.23) and PCC (0.58, Figure 2E) with a small false discovery rate (FDR) 0f 0.11 ( Figure S4 and Table S1). To probe the specificity of this correlation, we used the IPD at prometaphase (IPD prometaphase ) as a negative control. Indeed, we obtained a lower slope (0.44) and PCC (0.27) when IAD fib (for interphase) was correlated with IPD prometaphase (Figure S3 B), with a larger FDR ,0.29 (Table S1 & Figure S4C) suggesting that IPD at interphase is more correlated with IAD at interphase. Correlation with randomized matrix exhibited negative slope (20.37, Figure 2D) for a typical randomized matrix ( Figure 1D) and even smaller average PCC of 0.12 ( Figure 2E & Figure S3D) further indicating the non-randomness in the correlation. To probe the effect of chromosome size on the observed correlation between IPD and IAD, we generated a matrix of chromosome basepair length differences (Interchromosome Basepair Distance, IBD) (Methods and Figure S1C). This matrix showed some degree of similarity to IPD and correlated well with the IPD matrix (PCC ,0.54) (Figure S1D & S1F). But when IBD was correlated with the IAD matrix, a very weak correlation of PCC ,0.15 was observed ( Figure S1E and S1F). This suggested that though chromosome size contribute to the observed pattern in IPD, the correlation between IPD and IBD were not influenced by the chromosome sizes. The Interchromosomal Activity Distance matrix used in the correlations was generated by computing the mean of the genes present in the chromosome. This takes into account all the genes irrespective of their activity level. To probe the correlation due to a small subset of genes on the chromosome, which will correspond to smaller active regions of chromosomes (as the active genes are not uniformly distributed throughout the length of chromosome), we selected genes (,25% of the genome) which are highly expressing in each chromosome by applying a threshold (more than 40% of the mean chromosome activity) in the gene expression. We generated the IAD matrix ( Figure S5A) from these selected genes (IAD select ) and calculated the correlations. The correlation obtained after selection of genes was similar to the correlation when all genes in the chromosome were used ( Figure S5B & S5C), suggesting that the correlation is not due to whole chromosome averaging. These results suggests that the mean distances between chromosomes are more correlated with gene activity distances in fibroblasts at interphase as compared to prometaphase and are uncorrelated with random organization.
Methods to identify cell-type specific gene expression profiles and its correlation to chromosome positions Different cell types in an organism are characterized by their distinct transcriptomes. Correlation of gene expression to chromosome organization implies that cell types will differ in the positions of the chromosomes, such that the spatial organization of a given cell type exhibits larger correlation with its own expression pattern. In the presence of cell type specific correlation between IPD and IAD, the correlation should be smaller when IPD fib is correlated with IAD of other cell types. To further extend our approach to test such cell type specific correspondence, we correlated IPD fib of fibroblast (for interphase) and IADs of fibroblast, lung endothelial cells, oocyte and Human Umbilical Vascular Endothelial Cells-HUVECs. As the IAD fib of fibroblast at interphase correlated the most with the IPD of fibroblast at interphase, IPD of fibroblast with interphase was further used for testing the cell type specific correlation.
From the transcriptome of different cell types, cell type specific genes were selected by excluding similarly expressing genes in pair wise comparison with fibroblasts to generate IADs ( Figure 3A). Two activity matrices were generated for each pair of cell type compared: (a) IAD other-fib -computed from the activity of genes in the other cell type which are differentially expressed in comparison to fibroblasts (Methods) and (b) IAD fib-other -computed from activity of the same genes selected above, in fibroblast. Such activity matrices were computed for each of the three pairs, fibroblast-lung ( Figure 3B), fibroblast-oocyte and fibroblast-HUVEC (Methods , Table S2, and Figure S6A & C). Figure 3C depicts the difference matrix of IAD fib-lung and IAD lung-fib for the differentially expressed genes of fibroblast and lung cells. Figure  S6B & D shows difference matrices for other cell type pairs. The PCCs were higher when IPD fib was correlated with IAD fib-other , whereas the PCCs were comparatively smaller when IPD fib was correlated with IAD other-fib ( Figure 3D, Figure S6, Figure S7, Table S1).
These observations suggest that the association between the chromosome topologies and transcription maps is indeed cell type-  Figure S8).
Numerical simulation to probe the coupling between chromosome positions and transcription factor networks Genome-wide chromatin interaction experiments have suggested preferential association of genes co-regulated by similar transcription factors [32]. Such cis-(same chromosome) and trans-(different chromosome) associations have also been shown for coregulated genes at post transcriptional level at other nuclear bodies [33,34], suggesting spatial association of genomic elements to facilitate function. In order to probe such associations we devised an energy optimization function and numerical simulation technique which linked the chromosome positions (IPD) to the co-regulated TFNs. In particular, we examined to what extent two chromosomes participate in the same transcription network, tend to be close by. For this purpose, we constructed a function H, which measures if nearby chromosomes contain co-regulated genes which belong to a particular functional transcription factor network ( Figure 4A). This approach eliminates the whole chromosome averaging that we performed while computing the correlations between IPD and IAD, and considers the activities of only genes which are regulated by a particular transcription factor. However, the position information of genomic elements currently available are at the resolution of chromosomes, leading to coarse graining of this energy optimization function at similar length scales. H takes into account both spatial arrangement of chromosomes and the activity of the 87 known annotated TFNs [29], and quantifies how well they correspond to each other. The spatial part of H is represented in terms of an adjacency matrix ( Figure 4B), The parameter l, is the distance parameter used to scale the distances to the length scale of chromosomes. The part of H which involves the contribution from transcription factor networks is introduced as a network matrix ( Figure 4C) which is defined as, where I cell if is the integrated microarray intensity of genes present in the i-th chromosome that participate in network ''f'' in cell-type ''cell'' of the four cell types. Similar to definition of the IAD, logarithmic scale captures the different orders of magnitudes of gene expression. The TF networks which form the network matrix vary from very small networks (,10 genes) to large networks (.300 genes, Figure S9A). To characterize the TF networks for variability in their sizes, we computed the occupancy of chromosomes for each TF network ( Figure S9B), which is defined as the fraction of total number chromosomes which have at least one gene from the TF network. Large TFNs have occupancy ,1 suggesting that the target genes of these TFNs are scattered throughout the genome, whereas, smaller TFNs (,50 genes) have occupancy ,0.5, suggesting that their target genes are clustered on a few chromosomes. But this clustering of genes of a TFN are not biased by chromosome size, i.e the genes are present on smaller as well as larger chromosomes ( Figure S9A). Further larger chromosomes and gene rich chromosomes were observed to be associated with a larger number of TFNs ( Figure S10 and Table  S3). The function H, which has contributions from spatial organization of chromosomes and the activity of transcription factor networks, is given by, H is obtained by summing over all networks f for all possible pairs i-j of chromosomes, weighted according to the proximity of the chromosomes provided by the adjacency matrix, A fib ij . The distance parameter l, weights the IPD values, such that smaller IPD values attain larger adjacency and vice versa. Moreover l makes a sharp distinction between nearby and distant chromosomes. For each pair i-j of chromosomes, we examine the similarity in the expression levels of genes that belong to a certain network f, by summing their squared difference W cell if -W cell jf 2 (this ensures the contribution from each pair is positive) and tends to zero for similar activity. The matrices are defined such that when the organization of the chromosomes is correlated with the activity  We used numerical simulations to test the above hypothesis. Before performing the actual simulations, we estimated the optimal value of the distance parameter, l to be ,7% of the nuclear radius, and it provided the largest increase in the value of H ( Figure S11A). We used this distance parameter for all the numerical solutions. To probe the optimality of the value of H, we simulated different configurations of chromosome organization by randomizing the adjacency matrix. Large deviations indicate that the actual configuration is optimal.
Following this procedure, we find that DH fib (DH cell for fibroblast) for fibroblast is 1.97, implying a p-value of 0.05 ( Figure 5B, inset). This indicates a rather small probability of obtaining a superior configuration through random reorganization of chromosomes in fibroblasts. The network matrix of a given cell type represents its characteristic transcriptional program and the resulting transcriptome. One therefore expects that the coupling between the physical organization and transcription networks will be cell type-specific. Since the spatial part of H is taken from fibroblasts, namely the adjacencyA fib ij , it should exhibit better fit to fibroblast network activity,W fib if , than to the activity of the other cell types. In accord, the deviations for the other three cell types were lower than that of fibroblasts, the values being DH lung = 0.92, DH HUVEC = 1.02 and DH oocyte = 0.71. These correspond to pvalues of 0.35, 0.31 and 0.48, respectively, indicating that H values obtained for different cell types are not significantly different from the values obtained for a random configuration of chromosomes. Further, the obtained p-values were independent of the mode of simulation; exclusion of homologues from the simulation and non cumulative randomization resulted in similar p values ( Figure S12 and Methods). These results suggest that IPD fib of fibroblast fits better to its own transcriptomes than to those of other cell types.
To analyze the sensitivity of individual TFNs, H values were computed by randomization of the adjacency matrix for the chosen network. The evolution of the H value for the first 200 iterations was plotted for each network. ,70% of networks showed an increase in the H value ( Figure 5C) whereas remaining 30% showed a small decrease ( Figure 5D). This indicates the differential contribution of the networks towards optimization of the coupling between TFNs and chromosomal organization. Larger increase in H value for 70% of the networks is in accordance with increase in the value of H when all networks are considered. Figure 5E shows a list of networks that exhibited maximal changes in H values, indicating its sensitivity to perturbations in chromosomal positions. To probe the contribution of TF networks towards differential increase in the value of H, we correlated the change in H value, DH/H 0 with the number of genes in the TF network considered for simulation ( Figure S13A). It was observed that DH/H 0 and the number of target genes of a TF network were inversely correlated with decreasing degree of correlation with increase in the distance parameter l ( Figure  S13B). As previously observed, the occupancy of the chromosomes has an exponential dependence on the number of target genes of a TF network. This indicates that a large increase in H results from small TFNs, with target genes clustered over a small number of chromosomes. These results indicate that large TFNs which have genes present on all the chromosomes probably regulate housekeeping genes and hence do not contribute strongly towards cell type specific responses.

Experimental validation
Our numerical approaches suggested that spatial arrangements of chromosomes in a given cell type (human fibroblasts) is optimized to its expression pattern better than it fits to the expression patterns in other cell types. To further validate the results from proposed numerical approaches, we experimentally tested the correlation between chromosome positioning and gene expression in another cell type of a different mammal, murine T cells, in two distinct transcription states of naïve and in vitro activated T cells, where the global mRNA levels increase by ,5 fold [35]. We generated IAD for both the naïve and activated T cells (IAD Naive and IAD Activated ) from genome-wide microarray data (GEO accession number GSE30196) obtained from our experiments ( Figure 6A & B). The microarray was done in duplicates. IPD was estimated for candidate pairs of chromosomes 1-3, 1-4, 1-6, 3-17, 4-17 and 13-17 (which harbor 30% of the differentially expressed genes identified in the microarray) by 3D FISH performed in naïve and activated T cells (IPD Naive and IPD Activated ) ( Figure 6D). The cells used for estimation of IPD were obtained from different batches of cell purification, using similar methods as for IAD estimation. The homogeneity between the cells isolated from two different mice was quantified by comparing differences in number of differentially regulated genes at similar conditions from two biological replicates. Figure S14 shows that the number of differentially regulated genes are more than ten folds higher in between different states of T Cell (e.g. naïve NC1 and activated NC2) isolated in same batch when compared to the number of differentially regulated genes, between biological replicates of same cell type (e.g. naïve NC1 and naïve NC2). IAD matrices computed from the biological replicates also showed very small variations (measured as matrix of standard deviation between IAD values generated from two replicates) indicating that different batches of cells does not significantly alter the IAD matrix ( Fig S15). The IPD which we used in earlier correlations, was generated from the centroid positions of the chromosomes obtained from 3D chromosome FISH (Fig 6D and Fig S16). Inter-centroid distances are biased by the size of the chromosomes. Two large chromosomes will tend to have their centroids farther apart as compared to small chromosomes, even though the distance between the chromosome surfaces may be same. To overcome this drawback, we generated the IPDs for specific pairs of T Cells by measuring the distance between the chromosome surfaces. The IPDs were generated using minimum interface distance among all the four possible interface distances between the pair. ( Figure 6E) The IPD of Naïve and Activated T cells were correlated with their respective IADs and IAD Muscle of murine muscle cells ( Figure 6C). The obtained PCC values using IPD and IAD (Table S4) of both naïve (0.28) and activated states (0.46) were higher in contrast to PCC computed between IAD Muscle and IPD of T cells (0.002-IPD Naïve , 0.27-IPD Activated ). Interestingly, the differences in the correlation coefficients are similar (0.2820.002 = 0.278 and 0.4620.27 = 0.19) for both naïve and activated T cells when compared to muscle cells. These results indicate that correlation between chromosome organization and their transcriptional output is a general phenomenon which can be observed in multiple cell types.

Discussion
Random loop polymer models have been extensively used to understand the internal architecture of chromosomes. These studies suggest a gene expression based looping probability of the chromatin fibre which leads to formation of functional DNA domains and confinement of chromatin to chromosome [36][37][38]. Transcriptional activity based chromosome intermingling has also been used to explain the frequent juxtaposition of certain pairs of chromosomes and the resulting chromosomal translocation [14]. But very few methods are present to quantitatively measure the correlation between the physical proximity of chromosomes and transcriptional activity [16]. In this work we propose methods to probe the correspondence between the chromosome positions and global gene expression program. While chromosomes have been found to be radially distributed from the nuclear centroid according to their gene density [39][40][41], our methods were able to assess a further layer of three dimensional organization, in which the relative chromosome positions correlated with gene expression. Previous studies suggest both random [42,43] and non random [27,44,45] chromosome positions, whereas our results and analysis revealed the non-random organization within the fibroblast nucleus [27]. The IADs computed from the microarray data showed correlation between relative chromosome activities and their respective positions. These correlations support the coclustering of genes for transcriptional control [21] by a fewer number of observed transcription factories [13,15]. The observed correlations may also have contribution from noise due to coarse graining and population averaging of the chromosome positions and their activities. Further, noise in chromosome activity measurements could also be contributed by the matured mRNA which does not exactly represent the stochastic nature of short lived nascent mRNA transcripts produced at the sites of transcription at single cell level [46,47]. The correlations can be improved by extending the methodology described here to build a more detailed IPD for smaller continuous regions of the chromosome and their corresponding IADs. Our evaluation methods in different cell types suggested that arrangement of coclustered [48] genes must be cell type specific as we find a lack of correlation between chromosome positions of one cell type with the gene expression program of another cell type . The cell type specific transcriptional programs are usually turned on by cell type specific TF networks [49], suggesting their involvement in modulating inter-chromosome interactions. Previous simulation and modeling work on the role of transcription factors in organization of genome in E. coli suggested formation of DNA regulatory domains of co-regulated TF target genes [50]. Similarly in yeast, target genes of TFs were shown to be preferentially coclustered on the same chromosome [51]. In this work we have taken this idea further to suggest role of TF networks in elucidating relative chromosome proximities in nucleus. Numerical simulations shown here suggest that the activity of TFNs was correlated with relative positions of chromosomes. An optimality measure H was devised to quantitatively understand the coupling between 3D chromosome positions and TFNs. Our predictions of the correspondence between chromosome positions and global gene expression were experimentally validated in naïve and activated states of mouse T-cells. These results evidenced correlations between the IPD and IAD of T cells whereas smaller correlations were observed between IPD of T cells and IAD of muscle cells.
Taken together, our methodologies were able to quantify correspondence between global gene expression program and three-dimensional architecture of chromosome positions. While co-clustered genes have been shown to be co-regulated [21,52,53], methods proposed here take these findings to the large-scale organization of the nucleus where transcription dependent intermingling of proximal chromosome territories may become feasible. Interestingly, these correlations are found both at the scale of transcriptome and at the scale of separate transcriptional networks [54]. Our findings suggest that the observed correlations between relative chromosome positions and transcriptional output are specific to a given cell type. The measured correlations are at steady state and with time averaged expression profiles, which smears the time resolved correspondence between chromosome positions and transcriptional activity. However, mechanistic insights into the origin of such correlations could be gained if such correlations are observed during the process of differentiation. Such refinements in IPD and IAD at single cell resolutions can in future yield better insight about contribution of transcription in relative chromosome organization. In addition, the chromosome position in the nucleus is a result of integration of many functional and spatial organizational cues like epigenetic modifications [55], transcription machinery density, post transcriptional or replication requirements. The methodology presented here can be easily adapted to further investigate the contribution of these factors by quantifying them at similar resolution as IPD for chromosomes. The general mechanisms of chromosome topology [11] and their functional links will become apparent as one simultaneously probes the temporal evolution of these correlations through the process of cellular differentiation and its maintenance through cell cycle.
Current methods to evaluate chromosome positions and its impact on gene expression have remained empirical. The introduction of 2D matrix for chromosome positions enabled an analysis of transcriptional changes through cellular differentiation [16]. Our methods further establish the coupling between chromosome positions optimized for a given cell type on a quantitative framework. By implementing comparative analysis methods between chromosome position matrix and activity matrix we were able to evaluate the coupling between TFNs and chromosome organization. Our proposal of a phenomenological analytical function (H), allow a systematic numerical simulation of correlations relating the TFNs and chromosome topology. The function H could be modified to include epigenetic modifications or active RNA polymerase interactions to construct an activity distance matrix of these parameters. This method could be adapted to chromosome sub-domains by painting smaller regions of chromosomes or using contact probabilities from chromosome capture assay [11] and correlating them with the corresponding activity distance matrix of these parameters. These matrices may further provide correlations at a finer resolution of gene clusters and their correspondence with transcription. We suggest that our methods describing interfaces of CTs in conjunction with chromosome capture assays may also facilitate identifying cell type specific functional gene clusters. The methods described in this work could also be useful in establishing correlations between three dimensional organizations of chromosome positions with other functional networks like signaling networks and chromatin remodeling networks.

Ethics Statement
All experiments involving animals were performed with the approval of the Institutional Animal Ethics Committee at National Centre for Biological Sciences, Bangalore headed by Prof. Mathew with committee members Professors Upinder Bhalla, Sumantra Chatterjee, MM Panicker and R. Sowdhamini. Approval ID for the project is AS-5/1/2008.

Inter-chromosome distance (IPD)
The Inter-chromosomal Physical Distance (IPD fib ) for fibroblast was obtained from the chromosomal distances as: Where r i and r j are the chromosomal distances from the nuclear centroid (in units of nuclear radius) in case of interphase and from centre of the prometaphase ring in case of the prometaphase chromosomes, obtained from Bolzer et al. [27]. Each element in the matrix is calculated from the mean of four possible distances between two pairs of homologous chromosomes (as the current experiments cannot distinguish between two different homologues of same chromosome), and further averaged over the 54 nuclei. Similar IPD matrices were constructed using the MDS distances (IPD MDS ) and minimum distances between the four possible values between two pairs of homologous chromosomes (IPD Min ).

IPD Randomization procedure
The interchromosomal physical distance, IPD rand for a random configuration of the nucleus was obtained from iterative swapping of the chromosomes in the fibroblast nucleus, by shuffling the rows and columns of the IPD fib matrix for 200 iterations ( Figure S1), which was sufficient for complete randomization, i.e. loss of chromosome position information from the initial configuration. The randomization process was designed to obey the triangle inequality, a basic property of Cartesian metric space, as no new spatial coordinates are created other than the actual r i already present. Rather, the rows and columns of the IPD matrix were interchanged in a cumulative fashion, with each shuffling performed on the previously shuffled matrix.

Inter-chromosome Basepair length Differences (IBD) Matrix
Differences in chromosome basepair length size was represented as Inter-chromosome Basepair length difference (IBD) matrix defined as Where Chrbp(j) and Chrbp(j) are the basepair lengths of chromosome i and j respectively.

Inter-chromosome Activity (IAD) Matrices
The Inter-chromosome Activity Distance (IAD) was created from the microarray data of fibroblast, obtained from Goetze et al. [28] (GEO accession no.-GSM157869). In the microarray there are multiple probes for a single gene. The activity of a gene in arbitrary units was obtained by calculating the mean intensity of the multiple probes for a given gene. The genes were further grouped into individual chromosomes and the activities were integrated over the whole chromosome. The total number of genes in each chromosome was obtained (http://vega.sanger.ac.uk/ Homo_sapiens/index.html, as on Nov 11, 2008) and used to estimate the mean chromosome activity as: Where x ik denotes the activity of k th gene in the microarray for chromosome i and N is the total number of annotated genes in a chromosome with summation done over all genes in the microarray for a particular chromosome. Logarithmic activities were obtained for each chromosome to account for the large dynamic range of the gene expression data. The IAD matrix was generated from the logarithmic chromosome activity as: To generate the IAD matrices for additional cell types the microarray data for lung, oocytes and HUVEC were obtained from Gene Expression Omnibus website (http://www.ncbi.nlm. nih.gov/geo). The accession number for the datasets are GSM101102 (lungs cells) [56], GSM288812 (oocytes) [57] and GSM215557 (HUVECs) [58]. The chosen microarray data was performed on Affymetrix GeneChip and MAS5 algorithm was used to calculate probe intensity, identical to that of fibroblast microarray data. In order to normalize the intensity variation between different cell types the microarray data was normalized to the mean of the probes in the array.
where (x_norm) i is the normalized activity of the probe and x x is the average activity of the probes in the array. The activity of a gene was calculated by taking the mean of the normalized probe intensities. To evaluate cell-type specific activities, cell types were considered as pairs (fibroblast-lung, fibroblast-HUVEC and fibroblast-oocyte). For example, if fibroblast and lung is considered as a pair, the differentially expressed genes between the two cell types were selected based on the expression level differences of corresponding genes in the two cell types. Genes with expression level difference of more than 1FWHM (Full Width at Half Maxima, calculated from the difference histogram) were selected. Hence, the activities of same genes from both cell types were compared. Further, these genes were partitioned into their respective chromosomes and the IAD matrix for each cell type was computed with their respective logarithmic activities (similar to IAD fib ). The two resulting matrices were named as IAD fib-lung , if activity is computed from fibroblast microarray data and IAD lungfib , if the activity is computed from microarray of lung cells. This ensures that the number of genes selected for a given chromosome is same in both the cell types

Adjacency and network matrix
The physical space of chromosomes was represented by the IPD fib matrix. To enhance the sensitivity of the chromosome positions, an adjacency matrix was generated with weights for each inter-chromosome distance as: Where, the l in the exponential is the distance parameter which was varied from 2-80% of typical nuclear radius, to find the optimum l. Upon variation of l, the increase in H-value was computed ( Figure S7) and the l which showed maximum increase was selected as the optimal l for the simulation. This functional form of adjacency matrix with a steep slope is sensitive to change in chromosome positions. The exponential form was used to detect small deviations in chromosome organization from the optimum configuration. The network space was represented by a network matrix W cell if , consisting of 87 annotated transcription factor (TF) networks obtained from Transcription Regulatory Element Database database (TRED), Jiang et al [56]. The genes in the TF network were identified in the microarray data for human fibroblast, lung, oocyte and HUVEC cell types and grouped into chromosomes. Chromosomal activity of genes involved in a particular TF network was obtained by calculating the natural logarithm of the integrated activities of all the identified genes in a chromosome. The chromosomes in which no genes were identified, a value of zero was assigned for that chromosome in the network matrix. A column vector was created for individual networks, where column f represents single TF network. The 87 column vectors were aligned to obtain the network matrix for the TF network, where f is a network and i a chromosome. All abbreviations used are listed in Table S5.

Mouse T-cell Microarray and Analysis
For T-cell experiments, cells were isolated from spleens of C57/ Bl6 mice. All animals were bred and maintained in the NCBS animal house facility. Experiments were performed with the approval of the Institutional Animal Ethics Committee at NCBS, Bangalore, India. CD4 + naïve T-cells were isolated from spleen of 8-10 week old mice using MagCellect isolation kit (R&D Systems, MN, USA) activated for 36 hours using aCD3-aCD28 coated beads (Invitrogen, CA, USA). This method consistently produces 90-98% pure populations of T cells (according to manufacturer's protocol). Microarray experiments on cells, purified in our laboratory at NCBS, Bangalore, were carried out at Genotypic (Bangalore, India). Duplicate experiments were done for both naïve and activated T cells. RNA extraction was done using RNeasy Minikit (Qiagen, UK), concentration and purity were determined using NanodropH ND-1000 spectrophotometer (Na-noDrop Technologies, Wilmington, DE, USA). The integrity of RNA was verified on an Agilent 2100 Bioanalyzer using the RNA 6000 Nano LabChip (Agilent Technologies, CA, USA). Equal amounts of RNA was labeled using Agilent dye Cy3 CTP and hybridized to Mus musculus GeneExpression Array 4X44K (AMADID -014868). The slides were scanned using Agilent Microarray Scanner G2505 version C at 2 mm resolution, and data was extracted using Feature Extraction software v 9.5 of Agilent. Though the microarray was performed on two batches of T cells obtained from two different mice, the gene expression across replicates was very similar as compared to the gene expression between naïve and activated T Cells, and hence it does not introduce significant noise in the analysis.To further minimize the noise in estimation of genome wide expression profile, mean of probe intensity for two duplicates was computed. Mean of intensities of probes for the same gene was calculated to obtain the activity per gene. Further, genes were grouped into chromosomes and IAD matrix was computed as explained earlier.
The microarray data is MIAME compliant and accessible on GEO website (http://www.ncbi.nlm.nih.gov/geo/; accession number GSE30196).Microarray data of murine muscle cells was obtained from Gene Expression Omnibus (GEO accession number GSM247205) and used as a negative control.

Chromosome painting and image analysis
For chromosome painting experiments, cells were stuck on PDL coated slides followed by fixation in 4% PFA for 10 minutes. PFA was neutralized with 0.1M Tris-HCl and then the cells were washed and permeabilized with 0.5% Triton X-100 for 8 minutes. This was followed by incubation in 20% glycerol for 1 hour and then 5 or 6 freeze-thaw cycles in liquid nitrogen. After this, cells were treated with 0.1N HCl for 10 minutes, washed and equilibrated in 50% Formamide/2X SSC overnight at 4uC. Hybridization was set up the following day. Cells were denatured in 70% formamide/2X SSC at 85uC for 2 minutes and then incubated with the fluorescently labeled mouse whole chromosome FISH probes (Cambio, Cambridge, UK) for 2-3 days in a moist chamber at 37uC with shaking. At the end of the incubation period, slides were washed thrice each in 50% Formamide/2X SSC at 45uC and 0.1X SSC at 60uC. Cells were counterstained with Hoechst 33342 (Sigma, USA) and then mounted with Vectashield (Vector Laboratories, CA, USA), sealed with coverslip and imaged on a Zeiss 510-Meta confocal microscope.
Inter-chromosome interface distances were computed using a custom written program in LabVIEW (National Instruments, TX, USA). Confocal Z sections for each chromosome were thresholded using (mean+standard deviation) of the fluorescence intensity of the Z-stack. Edge detection algorithm was used on each thresholded confocal section to obtain coordinates of chromosomal edge at each z plane. The three dimensional distance between the edges of two chromosomes were computed. The interface distance between two chromosomes was estimated by calculating the minimum distance, out of all the distances computed between the edge coordinates of the two chromosomes. Since two pairs of chromosomes are labeled in each nucleus, four interface distances were obtained. Average of the four interface distances for a given pair of chromosomes was then used as IPD mean for the given pair of chromosomes.

Statistical analysis
False discovery rate (FDR) was employed as a statistical measure to test the significance of the correlations obtained (Figures S3 &  S5 and Table S2). The actual pearson correlation (PCC) value in each case was denoted as PCC 0 . To compute the FDR, 10 5 randomized matrices (from either IPD fib or IPD prometaphase ) were generated (each randomized matrix was computed by permuting its rows and columns) and each randomized matrix was correlated with matrix under consideration. A histogram of all the PCC value for the correlation between the matrices was generated and the instances with PCC.PCC 0 . FDR was estimated as the fraction of instances with PCC.PCC 0