Spatial clustering and common regulatory elements correlate with coordinated gene expression

  • Jingyu Zhang,

    Contributed equally to this work with: Jingyu Zhang, Hengyu Chen

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft

    Affiliations Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China, Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America

  • Hengyu Chen,

    Contributed equally to this work with: Jingyu Zhang, Hengyu Chen

    Roles Data curation, Formal analysis, Methodology, Resources, Validation, Writing – review & editing

    Affiliation Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China

  • Ruoyan Li,

    Roles Data curation, Formal analysis, Methodology, Resources, Validation, Writing – review & editing

    Affiliation Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China

  • David A. Taft,

    Roles Writing – review & editing

    Affiliation Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America

  • Guang Yao,

    Roles Funding acquisition, Resources, Writing – review & editing

    Affiliation Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, United States of America

  • Fan Bai,

    Roles Conceptualization, Funding acquisition, Resources, Supervision, Writing – original draft, Writing – review & editing

    fbai@pku.edu.cn (FB); xing1@pitt.edu (JX)

    Affiliation Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, China

  • Jianhua Xing

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Writing – original draft, Writing – review & editing

    Affiliations Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America, UPMC-Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, United States of America

Abstract

Many cellular responses to surrounding cues require temporally concerted transcriptional regulation of multiple genes. In prokaryotic cells, a single-input-module motif with one transcription factor regulating multiple target genes can generate coordinated gene expression. In eukaryotic cells, transcriptional activity of a gene is affected by not only transcription factors but also the epigenetic modifications and three-dimensional chromosome structure of the gene. To examine how local gene environment and transcription factor regulation are coupled, we performed a combined analysis of time-course RNA-seq data of TGF-β treated MCF10A cells and related epigenomic and Hi-C data. Using Dynamic Regulatory Events Miner (DREM), we clustered differentially expressed genes based on gene expression profiles and associated transcription factors. Genes in each class have similar temporal gene expression patterns and share common transcription factors. Next, we defined a set of linear and radial distribution functions, as used in statistical physics, to measure the distributions of genes within a class both spatially and linearly along the genomic sequence. Remarkably, genes within the same class show a significantly higher tendency to be spatially close, despite sometimes being separated by tens of million bases (Mb) along the genomic sequence, than those belonging to different classes do. Analyses extended to the process of mouse nervous system development arrived at similar conclusions. Future studies will be able to test whether this spatial organization of chromosomes contributes to concerted gene expression.

Author summary

Cellular responses to environmental stimulation are often accompanied by changes in gene expression patterns. Genes are linearly arranged along chromosomal DNA, which folds into a three-dimensional structure. The chromosome structure affects gene expression activities and is regulated by multiple events such as histone modifications and DNA binding of transcription factors. A basic question is how these mechanisms work together to regulate gene expression. In this study, we analyzed temporal gene expression patterns in the context of chromosome structure both in a human cell line under TGF-β treatment and during mouse nervous system development. In both cases, we observed that genes regulated by common transcription factors have an enhanced tendency to be spatially close. Our analysis suggests that spatial co-localization of genes may facilitate the concerted gene expression.

Introduction

A cell continuously receives signals from its local environment and accordingly adjusts cellular programs, such as cell proliferation, motility and metabolism [1]. Typically, regulation of a cellular process requires changes in the expression of a group of genes in a temporally coordinated manner [2]. How such coordinated regulation is achieved is a central question that remains poorly addressed.

One mechanism of such regulation is through specific interaction network structures of transcription factors (TFs). TFs bind to certain DNA sites and regulate transcriptional activities of their targeted genes. A TF can regulate multiple target genes to form a so-called single-input-module (SIM, or fan-out) [3]. This SIM network motif appears at high frequency to coordinate the expression of genes with related functions such as those in bacterial metabolic pathways [4]. Gene regulation in eukaryotic cells is more complex since the three-dimensional structure of DNA has a more profound impact on gene transcription than in prokaryotic cells. For instance, a nucleosome structure with a high packing level limits gene accessibility [5]. Furthermore, epigenetic modifications can strongly influence gene transcription [6]. It is not fully understood how these different regulation mechanisms collectively control the expression of a group of genes.

To examine how multiple levels of regulation lead to concerted expression of gene groups, we analyzed the temporal gene expression profiles of TGF-β treated human mammary epithelial MCF10A cells in the context of histone modification patterns and chromosome structures derived from Hi-C data. The TGF-β family is crucial for regulating a complex signal transduction network in embryonic and fetal development, and is also involved in multiple physiological and pathological processes such as wound healing and cancer progression [7]. Its signaling event starts from membrane embedded TGF-β receptors, which bind active TGF-β molecules from the extracellular environment [8]. The TGF-β signal is then transmitted into the cell through a signal transduction network and triggers a cascade of cellular responses. The latter is achieved through temporally coordinated expression changes of groups of genes with related functions such as cell proliferation, metabolism, and motility [9]. TGF-β also induces a global reprogramming of cell epigenome [10], which reinforces cellular responses for committed cell phenotype transition. We also analyzed temporal gene expression together with histone modifications and chromosome structures during mouse neural differentiation, another well-defined model for studying cell phenotype transition [11, 12]. Specifically, we analyzed a recently published dataset that combined Hi-C, RNA-seq, and ChIP-seq studies on the differentiation process from mouse embryonic stem cells (ESCs) to neural progenitor cells (NPCs) then to cortical neurons (CNs) [13]. In both the TGF-β response and neural differentiation systems, our analyses reveal that genes co-regulated by a common TF(s) have the tendency to be spatially close, even if they are distant along the linear genome sequence.

Materials and methods

Cell culture

MCF10A cells were purchased from the American Type Culture Collection (ATCC) and were cultured in the DMEM/F12 (1:1) medium (Gibco) with 5% horse serum (Gibco), 100 μg/ml of human epidermal growth factor (PeproTech), 10 mg/ml of insulin (Sigma), 10 mg/ml of hydrocortisone (Sigma), 0.5 mg/ml of cholera toxin (Sigma), and 1x penicillin-streptomycin (Gibco). Cells were cultured at 37°C with 5% CO2, with a medium change every other day. We induced the cells with 4 ng/ml human recombinant TGF-β1 (Cell Signaling).

RNA extraction and library preparation

Total RNA was isolated from the cell pellets with an RNA extraction kit (Qiagen, Cat No. 74104). All RNA extracts were confirmed to be of high quality (RQN score = 10.0) using the Fragment Analyzer™ platform (Advanced Analytical Technologies, Inc). Libraries were prepared using the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, Cat No. E7530L) according to the manufacturer's instructions. Briefly, mRNA was first isolated from total RNA with oligo d(T)25 beads (NEB, Cat No. E7490S; all volumes were halved except for washing steps). Next, purified mRNA was denatured and fragmented, and subjected to random priming and extension for reverse transcription. After that, double-stranded cDNA was end-repaired, dA-tailed, adaptor-ligated, and amplified with 12 PCR cycles. Constructed libraries were subjected to purification and quality control; the final quality-checked libraries were pooled and sequenced on an Illumina HiSeq 4000 instrument for 150 bp paired-end sequencing.

RNA-seq data processing

Paired-end cleaned reads were aligned to the human reference genome hg19 (UCSC) using TopHat (v 2.1.1) with default parameters. The BAM files of mapped reads were used to annotate transcripts and calculate the FPKM values using the Cufflinks, Cuffquant, Cuffnorm suite [14]. Differentially expressed (DE) genes were identified between any two time points with the criteria: fold change >2 or < 0.5 and FDR < 0.05. The FPKM values of genes from the RNA-seq dataset were further cleaned up using custom R scripts. Hierarchical clustering of genes was performed using an R package (pheatmap). Gene expression and TF regulation based Hidden Markov Model (HMM) clustering was performed with the DREM2 software [15]. RNA-seq results of ESC, NPC and neuron cells were downloaded from the GEO database under the accession number GSE96107.
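The differential-expression criteria stated above can be sketched as a simple filter. This is a minimal illustration, not the authors' custom R scripts: the function names and the input layout (one record per gene and time-point pair, with a precomputed fold change and FDR) are assumptions made for the example.

```python
def passes_de_criteria(fold_change, fdr, fc_cutoff=2.0, fdr_cutoff=0.05):
    """Return True if one pairwise comparison meets the stated thresholds:
    fold change > 2 or < 0.5, and FDR < 0.05."""
    changed = fold_change > fc_cutoff or fold_change < 1.0 / fc_cutoff
    return changed and fdr < fdr_cutoff


def select_de_genes(comparisons):
    """comparisons: iterable of (gene, fold_change, fdr), one entry per
    time-point pair (hypothetical layout). A gene is kept if any of its
    pairwise comparisons passes the criteria."""
    return {gene for gene, fc, fdr in comparisons if passes_de_criteria(fc, fdr)}


example = [
    ("A", 2.5, 0.01),   # up-regulated and significant
    ("A", 1.1, 0.50),   # another time-point pair for the same gene
    ("B", 0.4, 0.20),   # large change but not significant
    ("C", 0.3, 0.001),  # down-regulated and significant
    ("D", 2.0, 0.01),   # exactly 2-fold does not exceed the cutoff
]
print(sorted(select_de_genes(example)))
```

A gene is retained as soon as one pair of time points passes, which mirrors the "between any two time points" wording.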

Chromosome structure analyses

Hi-C data were downloaded from the GEO database (MCF10A, GEO:GSE66733; mouse nervous system, GEO:GSE96107). Chromosome structures were constructed using an R package (igraph). Clustering of bins was obtained with the fast-greedy algorithm [16]. Physical distances between bins were estimated with MATLAB code provided by Lesne et al. [17]. This code implements the ShRec3D algorithm, which first relates the Hi-C contact frequency between every two genomic sites to a spatial distance, then approximates the actual distance between the two sites by their shortest-path distance on a contact graph. This algorithm alleviates the uncertainty of reconstructing the spatial distance between two distal sites from their own contact frequency alone.
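The shortest-path step described above can be sketched as follows. This is an illustrative sketch only, not the MATLAB code of Lesne et al.: the inverse power-law conversion from contact frequency to edge length (with a hypothetical exponent `alpha`) is an assumption for the example, and the Floyd-Warshall algorithm is used here simply as one standard way to compute all-pairs shortest paths.

```python
import math


def shortest_path_distances(contacts, alpha=1.0):
    """contacts: symmetric matrix (list of lists) of Hi-C contact frequencies.
    Each nonzero contact becomes an edge of length 1 / frequency**alpha
    (assumed conversion); the returned matrix holds shortest-path distances."""
    n = len(contacts)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        for j in range(n):
            if i != j and contacts[i][j] > 0:
                dist[i][j] = 1.0 / contacts[i][j] ** alpha
    # Floyd-Warshall: allow paths through each intermediate bin k in turn.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


# Bins 0 and 2 rarely contact each other directly but both contact bin 1 often,
# so the path through bin 1 is shorter than the direct edge.
toy_contacts = [
    [0, 10, 1],
    [10, 0, 10],
    [1, 10, 0],
]
toy_dist = shortest_path_distances(toy_contacts)
```

In the toy matrix the direct edge between bins 0 and 2 has length 1.0, while the route through bin 1 has length 0.2, which is why a weak direct contact does not by itself imply a large distance.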

Distribution function calculation

Linear distribution function.

For a tagged gene in HMM class α, we divided the flanking sequences into bins with a size of Δl base pairs; the i-th pair of bins is [−(i+1)Δl, −iΔl] and [iΔl, (i+1)Δl], with i = 0, 1, etc. (S1A Fig). In the i-th pair of bins, there are $n_i^{\alpha}$ genes belonging to the same HMM class as the tagged gene. For the 0-th pair of bins, the count excludes the tagged gene. The linear correlation was calculated as

$$\sigma_L^{\alpha}(i) = \left\langle \frac{n_i^{\alpha}}{N_{\alpha}} \right\rangle_{\alpha},$$

where i = 0, 1, 2, etc., $N_{\alpha}$ is the total number of genes belonging to class α, and the average $\langle \cdot \rangle_{\alpha}$ is taken over every HMM class α gene as the tagged gene.

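As controls, we calculated

$$\sigma_L^{\mathrm{all}}(i) = \left\langle \frac{n_i}{N} \right\rangle_{\alpha}, \qquad \sigma_L^{D}(i) = \left\langle \frac{n_i^{D}}{N_D} \right\rangle_{\alpha},$$

where $n_i$ is the total number of genes in the i-th pair of bins and $N$ is the total number of human genes, and where $n_i^{D}$ is the total number of DE genes in the i-th pair of bins and $N_D$ is the total number of DE genes.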
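The linear distribution function and its controls can be sketched in a few lines. This is a minimal sketch under stated assumptions: the density in each pair of bins is taken as the count divided by the size of the target gene set, genes on other chromosomes are ignored, and the function name and the (gene id, chromosome, TSS position) tuple layout are hypothetical.

```python
def linear_distribution(tagged, targets, bin_size, n_bins):
    """tagged, targets: lists of (gene_id, chromosome, tss_position).
    For each tagged gene, count target genes whose distance falls in the
    i-th pair of flanking bins, divide by the number of targets, and
    average over all tagged genes. Passing the same class for both
    arguments gives the class-specific function; passing all genes or
    all DE genes as targets gives the controls."""
    sigma = [0.0] * n_bins
    for gene_id, chrom, pos in tagged:
        for other_id, other_chrom, other_pos in targets:
            # Skip the tagged gene itself and genes on other chromosomes.
            if other_id == gene_id or other_chrom != chrom:
                continue
            i = abs(other_pos - pos) // bin_size
            if i < n_bins:
                sigma[i] += 1.0 / len(targets)
    return [s / len(tagged) for s in sigma]


toy_class = [
    ("g1", "chr1", 0),
    ("g2", "chr1", 100),
    ("g3", "chr1", 1000),
    ("g4", "chr2", 50),
]
toy_sigma = linear_distribution(toy_class, toy_class, bin_size=500, n_bins=3)
```

Using the absolute distance folds the upstream and downstream bins of each pair into a single index, which matches the symmetric bin pairs in the definition.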

Spatial distribution function.

The idea of a radial distribution function from statistical mechanics was implemented (S1B Fig) [18]. Each chromosome was divided into sequence bins with a size of 250 kb. A tagged gene from HMM class α resides in a bin that we refer to as the tagged bin. The spatial distance in 3D physical space between the tagged bin and another bin containing a specific HMM class β gene was obtained using the ShRec3D algorithm [17], which converts the Hi-C contact frequency between two bins to a spatial distance. The sphere centered at the tagged bin was divided into shells with a thickness Δr. In our analysis, Δr ≈ 60 nm based on the estimated conversion in [17].

Next, an average spatial correlation function between a class-α-gene-containing bin at the origin and class-β-gene-containing bins within the i-th shell was defined as

$$\sigma_S^{\alpha\beta}(i) = \left\langle \frac{n_i^{\beta}}{N_{\beta}\,\Delta V_i / V} \right\rangle_{\alpha}, \qquad \Delta V_i = \frac{4\pi}{3}\left[(i+1)^3 - i^3\right]\Delta r^3,$$

where $n_i^{\beta}$ is the number of HMM class β genes within the spherical shell (iΔr, (i+1)Δr), excluding the tagged class α gene within the 0-th shell in the case β = α; $N_{\beta}$ is the total number of class β genes, again excluding the tagged class α gene in the case β = α; $\Delta V_i$ is the volume of the i-th shell; $V$ is the volume of the nucleus, with units chosen so that V = 1; and the average over α is again taken over all genes belonging to class α as the tagged gene.

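Similarly, the controls were defined as

$$\sigma_S^{\alpha,\mathrm{all}}(i) = \left\langle \frac{n_i}{N\,\Delta V_i / V} \right\rangle_{\alpha}, \qquad \sigma_S^{\alpha D}(i) = \left\langle \frac{n_i^{D}}{N_D\,\Delta V_i / V} \right\rangle_{\alpha},$$

where $n_i$ is the number of all genes within the i-th shell; $N$ is the total number of human genes; $n_i^{D}$ is the number of DE genes within the i-th shell; $N_D$ is the total number of DE genes; and $\Delta V_i$ is the volume of the i-th shell. Again, the tagged gene was excluded when counting $n_0$ and $n_0^{D}$.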
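The radial distribution function can be sketched along the same lines. This is a minimal sketch, not the authors' implementation: it assumes a standard radial-distribution normalization (shell count divided by the number of target genes times the shell volume over the nuclear volume), and the function name, the (gene id, bin index) layout, and the precomputed bin-to-bin distance matrix are hypothetical inputs.

```python
import math


def radial_distribution(tagged, targets, bin_distance, dr, n_shells, volume=1.0):
    """tagged, targets: lists of (gene_id, bin_index).
    bin_distance[a][b]: spatial distance between bins a and b.
    Returns the average density of target genes in each spherical shell
    of thickness dr around the tagged genes, relative to a uniform
    distribution over a nucleus of the given volume (assumed normalization)."""
    sigma = [0.0] * n_shells
    for gene_id, bin_a in tagged:
        # Exclude the tagged gene from both the counts and the total.
        others = [(g, b) for g, b in targets if g != gene_id]
        if not others:
            continue
        counts = [0] * n_shells
        for _, bin_b in others:
            d = bin_distance[bin_a][bin_b]
            if not math.isfinite(d):
                continue  # unreachable bins contribute to no shell
            shell = int(d // dr)
            if shell < n_shells:
                counts[shell] += 1
        for i in range(n_shells):
            shell_volume = 4.0 / 3.0 * math.pi * ((i + 1) ** 3 - i ** 3) * dr ** 3
            sigma[i] += counts[i] / (len(others) * shell_volume / volume)
    return [s / len(tagged) for s in sigma]


toy_genes = [("a", 0), ("b", 0), ("c", 1)]   # genes a and b share bin 0
toy_bin_distance = [[0.0, 1.5], [1.5, 0.0]]
toy_rdf = radial_distribution(toy_genes, toy_genes, toy_bin_distance,
                              dr=1.0, n_shells=2)
```

When ratios between a class-specific function and a control are taken in the same shell, the shell-volume factor cancels, so the comparison does not depend on this particular normalization.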

Results

Changes in gene expression reflect cell phenotype transition in response to TGF-β

We used MCF10A cells, a non-tumorigenic human mammary epithelial cell line, as the main in vitro model in this work. This cell line has been widely used to study the TGF-β induced epithelial-to-mesenchymal transition (EMT) [1, 19] (Fig 1A). Cells were treated with 4 ng/ml TGF-β for 12 hours or for 2, 3, 5, 8, 12, or 21 days (Fig 1B). Untreated MCF10A cells showed typical epithelial morphology with tight cell-to-cell adherence. With TGF-β treatment, we observed progressive morphological changes indicating the transformation from epithelial to mesenchymal phenotype. From day 2 to day 5, cells started to show loosened intercellular adherence. After day 5, some cells appeared with expanded cell size and an extended long cell axis. With further TGF-β treatment, more cells acquired a spindle-like shape. On day 21, only a small fraction of cells still maintained epithelial morphology and most cells had undergone EMT.

Fig 1. MCF10A cell responses to TGF-β treatment.

(A) Schematic diagram of phenotypic transition from epithelial cells to mesenchymal cells in response to TGF-β treatment. (B) MCF10A cells undergo morphological changes in response to 4 ng/ml TGF-β. (C) PCA clustering of TGF-β treated MCF10A cells reveals distinct gene expression patterns over time.

https://doi.org/10.1371/journal.pcbi.1006786.g001

Next, we performed RNA-seq studies to uncover changes of gene expression accompanying EMT. At each time point, we harvested cell samples and extracted RNA. The RNA-seq results revealed that about 33% of human genes were differentially expressed upon TGF-β treatment. Principal component analysis (PCA) over these ~ 7000 DE genes showed an expected larger separation between gene expression profiles of samples from different time points than those of replicate samples from the same time point (Fig 1C). The global transcriptome change over time reflected in the PCA space was consistent with the gradual morphological change of cells over time and the previous report that TGF-β-induced EMT proceeded through intermediate states [19].

Gene classes sharing similar expression patterns and upstream regulators exhibit similar functional characteristics

To further examine the temporal patterns and functions of the DE genes, we performed hierarchical clustering (HC) analysis. The analysis divided the DE genes into seven HC classes based on similar expression patterns in each (Fig 2A) [20]. Among the seven HC classes, class I with ~1,700 genes exhibits a monotonically decreasing pattern, and class II with ~2,000 genes exhibits a monotonically increasing pattern. Another two classes, III and IV, show transient up and transient down dynamics, respectively. The remaining three classes, V-VII, display wavy dynamic patterns to varying degrees. Gene Ontology (GO) analysis (S2 Fig) revealed that genes in each class are typically involved in multiple cellular processes. For example, genes in the decreasing class (class I) are related to RNA polymerase I activity and snoRNA binding. These two functions are related to the RNA metabolic process, including ribosomal RNA production, modification, and binding to regulatory factors. The observation that these genes are down-regulated is consistent with previous reports that under TGF-β treatment cells are under growth arrest until they finish EMT [21].

Fig 2. TGF-β induced gene expression changes show distinct temporal patterns.

(A) Hierarchical clustering of genes based on temporal gene expression patterns only. (B) Violin plots of distributions of indicated histone modification levels sampled through genes belonging to individual hierarchical clusters. Numbers 1–7 on the x-axis follow the order of gene clusters in panel A. The control group ‘A’ was sampled through all genes.

https://doi.org/10.1371/journal.pcbi.1006786.g002

Histone modifications can also affect gene expression [22]. To investigate the relationship between histone modification and gene expression, we integrated genome-wide H3K4me3 and H3K4ac profiles obtained by Messier et al. [23] with our RNA-seq data. Both H3K4me3 and H3K4ac are histone modification marks that are associated with active or poised genes [24]. We used H3K4me3 and H3K4ac profiles of all human genes as a control, and examined the marks in each HC class. The results in Fig 2B show that all HC classes have elevated H3K4me3 and H3K4ac compared to the control, and there is no apparent difference between different classes. Each HC class also has a broad bimodal distribution. That is, genes within an HC class do not share common histone modification patterns. Given that histone modification patterns correlate with local chromosome structures [25], these results suggest that genes from the same HC class have heterogeneous local chromosome environments.

Next, we adopted a different clustering scheme, the Dynamic Regulatory Events Miner (DREM), which clusters genes by combining gene expression time series with additional pre-established transcriptional networks [26]. Fig 3A shows the clustering results analyzed with DREM2 based on a Hidden Markov Model (HMM) [15]. At each conjunction node, genes are assigned to different branches based on their expression trend and upstream regulators (transcription factors on this node). Genes from an upstream branch can become key regulators at subsequent nodes [15, 26]. The resulting map reveals a hierarchy of gene regulation during the process of TGF-β-induced phenotype change. With DREM2 the DE genes were clustered into 46 branches with 19 nodes at the conjunction sites and 25 end classes. For clarity, we call the latter HMM classes to distinguish them from the HC classes that are based on expression only.

Fig 3. Genes clustered based on both expression patterns and key transcription factors show a correlation between patterns of expression and histone modification.

(A) Dynamic regulatory map obtained through the DREM2 analysis. (B) Distribution of indicated histone modification levels sampled through genes belonging to individual HMM classes. Group ‘A’ represents the control group that includes all genes.

https://doi.org/10.1371/journal.pcbi.1006786.g003

Compared to the HC classes, HMM classes showed finer dynamic patterns and GO enrichment information (S1 Table). For example, genes in the first seven HMM classes all had increased expression, but differed in their detailed temporal profiles. Genes in class C1 had already increased their expression to high levels on day 2. Genes related to metalloendopeptidase activity were enriched in this class by over 17-fold with respect to the reference genes. Four of the matrix metalloproteinases (MMPs), mmp2/7/11/13, are also in this class. These four MMPs are known to degrade extracellular matrix components such as gelatin, fibronectin, and laminin, and mediate biological activities including migration, mammary epithelial cell apoptosis, and EMT [27]. Heparin binding genes were another type of highly enriched genes. These genes, such as periostin (postn) and fibronectin (fn1), are also known to be related to matrix or cell membrane formation and thus affect cell migration and adhesion [28]. Another class of early activation genes, class C2, was also enriched with genes related to cell matrix and membrane structure. Among them, five of the pcdh family members, including pcdh7/a4/b9/b10/b13, are integral membrane proteins that are involved in cell-cell recognition and adhesion [29]. In general, genes within each HMM class had narrower distributions and thus higher similarity of histone modification patterns (Fig 3B) than genes within the HC classes (Fig 2B). Therefore, genes clustered through the DREM2 analysis based on common TFs and similar dynamic profiles tend to have closely related functions.

Genes sharing common regulators have an enhanced tendency to be spatially close

As mentioned above, the local chromosomal DNA environment affects gene transcriptional activity. We wondered whether genes sharing similar expression patterns and common regulatory factors, as in an HMM cluster identified by the DREM2 analysis, are also spatially close and share a similar local DNA environment. To test this hypothesis, we first examined gene arrangement along the linear genome sequence. We divided the whole human genome into bins with a resolution of 1 Mb, a typical size of a topologically associated domain (TAD). Then we matched all genes to the relevant bins based on their genomic positions. Statistical analysis of all the genes along the chromosomes showed that genes are not evenly distributed along the DNA sequence (Fig 4A). Most bins have fewer than ten genes, and globally one third of the bins are gene-free. By contrast, ~3% of the bins (fewer than 100 bins in total) contained 17% of all human genes. This uneven distribution was slightly more pronounced for the DE genes under TGF-β treatment: DE genes resided in fewer than half of the bins, and 17% of DE genes were concentrated in only 2.5% of the bins.
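The binning and the unevenness statistic described above can be sketched as follows. This is an illustrative sketch under assumptions: the function names and the (chromosome, position) input layout are hypothetical, and the fraction of genes held by the densest bins is computed here simply by sorting per-bin counts.

```python
from collections import Counter


def genes_per_bin(genes, bin_size=1_000_000):
    """genes: iterable of (chromosome, position).
    Returns a Counter keyed by (chromosome, bin index) with gene counts."""
    return Counter((chrom, pos // bin_size) for chrom, pos in genes)


def fraction_in_densest_bins(counts, n_total_bins, top_fraction):
    """Fraction of all genes that fall in the densest `top_fraction` of bins.
    n_total_bins must include empty bins, which do not appear in `counts`."""
    n_top = max(1, int(n_total_bins * top_fraction))
    densest = sorted(counts.values(), reverse=True)[:n_top]
    return sum(densest) / sum(counts.values())


toy_positions = [
    ("chr1", 10),
    ("chr1", 999_999),    # same 1-Mb bin as the gene above
    ("chr1", 1_000_000),  # first position of the next bin
    ("chr2", 5),
]
toy_counts = genes_per_bin(toy_positions)
toy_fraction = fraction_in_densest_bins(toy_counts, n_total_bins=10, top_fraction=0.1)
```

Running the same two functions on all genes and on the DE genes separately gives the two distributions that are compared in the text.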

Fig 4. Genes with similar expression patterns and controlled by the same up-stream regulators show an enhanced tendency to co-localize spatially in the 3D chromosome structure.

(A) Heat map shows the numbers of 1-Mb bins containing a given number of genes and TGF-β responding genes. The orange line highlights the bins in which all genes responded to TGF-β treatment. (B) Linear and radial distribution functions of TGF-β-responding genes within two representative HMM classes. We calculated the distribution of genes by sampling three types of gene groups: all available genes (All genes), genes within an indicated HMM group (HMM genes), and genes that showed differential expression during TGF-β treatment (DE genes). Spatial distances are binned into shells of width Δr ≈ 60 nm. (C) Pseudo-spatial arrangement of genes belonging to two representative HMM classes, respectively. Each circle indicates a 1-Mb bin. The gray level in a circle scales with the number of genes in the bin that belong to the indicated HMM class. The two-dimensional spatial arrangement of bins within one chromosome was calculated by a fast-greedy algorithm based on the contact frequencies between each pair of bins from Hi-C data. The line width between two circles is proportional to the contact frequency between the two corresponding bins. Genes within each red box are within ~300 nm in space. (D) Relative gene densities of all HMM classes within the first shell of the radial distributions, normalized by the average density of all genes around the targeted genes in the first shell.

https://doi.org/10.1371/journal.pcbi.1006786.g004

To further examine the gene distribution along chromosomes, we defined an averaged linear distribution function σL (see Materials and Methods for details). It measures how the chromosomal density of a group of genes of interest changes with distance from the transcription start site (TSS) of a given gene. For a given gene x belonging to an HMM class α as a tagged gene, we divided the DNA sequences along both sides flanking the TSS of x into bins with a size of 125 kb (S1A Fig, r = 125 kb), and counted the fraction of HMM class α genes in each bin. We then repeated this process by choosing every gene in the HMM class α as the tagged gene, and calculated the average density of HMM class α genes (the class-specific σL) in the i-th bin with respect to the tagged gene. For statistical comparison, we also calculated a similar σL for all human genes and for all DE genes with respect to the tagged HMM class α genes as controls. If there were no class-specific gene clustering along the genomic sequence, one would expect the class-specific σL to equal both control σL values within statistical errors (see Materials and Methods for explanation of terms). Indeed, the σL values of more than half of the HMM classes were not significantly higher than those of the DE gene and all human gene controls. The upper left panel of Fig 4B shows HMM class C23 as an example. Only five HMM classes showed statistically significant increases of σL values over controls (although the increases are small) within the first pair of bins (≤ 125 kb), indicating relative accumulations of genes from the same HMM class; one of them (HMM class C24) is shown in the upper right panel of Fig 4B.

Next, we investigated the spatial arrangement of the DE genes using a set of available Hi-C data from MCF10A cells [30]. Following an approach used in statistical mechanics [18], we defined a set of radial distribution functions (σS) that measure the average radial density (rd) of HMM class β genes residing inside the i-th evenly divided spherical shell relative to a tagged class α gene, averaged over all class α genes (S1B Fig). For comparison we also defined two control functions, in which the class β genes were replaced by all human genes and all DE genes, respectively. If there were no HMM class-specific gene spatial clustering, one would expect the class-specific function and the two controls to agree within statistical errors. According to this metric, however, genes in the classes C23 and C24, as discussed above, exhibited substantial spatial clustering. Genes in class C24 tended to be spatially close (Fig 4B bottom right), likely owing to their proximity along the linear sequence. Notably, genes in class C23 also showed significantly enhanced spatial co-localization. With respect to a tagged C23 gene, the rd value of C23 genes within the first shell was more than double that of all genes, which means that even some C23 genes that are not close along the linear sequence come close spatially. To visualize such spatial clustering of genes from an HMM class, we generated a two-dimensional plot of 1-Mb bins on chromosome 1 based on bin-bin contact frequencies obtained from the Hi-C data (Fig 4C). The red boxes show spatial aggregation of genes on chromosome 1 that belong to the two classes, respectively. Further analysis revealed significant gene spatial clustering in the first shell for all HMM classes compared to that of the controls (Fig 4D), and showed that spatial clustering mainly takes place within each HMM class (S3A Fig). That is, genes sharing a common upstream regulator have an enhanced tendency to be spatially close.

We also examined how the genes within the first shell of a tagged gene are distributed along the chromosome sequence (S3B Fig). While a large contribution to the average radial gene density (σS) came from genes that were already close along the chromosome sequence, some gene elements as far as ~50 Mb apart resided spatially close.

Genes of similar functions tend to cluster spatially during embryonic development

Next, we asked whether the observed spatial clustering of genes with related function extends beyond the TGF-β induction of MCF10A cells. To this end, we performed similar DREM2 and linear/spatial gene density analyses on the differentiation of mouse ESCs into NPCs and then CNs (Fig 5A), for which both RNA-seq and Hi-C data for the three developmental stages were reported by Bonev et al. [13]. A DREM2/HMM analysis clustered ~20,000 mouse genes into seven classes based on both their expression patterns during neuron cell differentiation and TF regulation (Fig 5B). For both ESC and CN cells, radial distributions (Fig 5C and 5D and S4A Fig) show that genes within the same HMM class have a slightly enhanced tendency to cluster spatially in the first shell compared to the control groups. We observed a similar tendency for NPC cells but to a lesser extent.

Fig 5. Mouse ESC-CN system shows a similar enhanced tendency of physical proximity for co-regulated genes with similar expression patterns.

(A) Schematic diagram of the development from ESC to CN cells. (B) Dynamic regulatory map obtained through the DREM2 analysis. (C) Heat map of intra- (diagonal) and inter- (off-diagonal) HMM class gene densities within the first shell of radial distribution relative to the corresponding densities of all genes (as control) in ESC cells or CN cells. (D) Gene densities of all HMM classes within the first shell of radial distribution relative to the corresponding densities of all genes in ESC cells or CN cells.

https://doi.org/10.1371/journal.pcbi.1006786.g005

Compared to the MCF10A cell data, the ESC-CN system showed less enhanced spatial clustering within individual HMM classes relative to that of the control. We reasoned that DREM2 clustering was more coarse-grained in the ESC-CN system due to the limited number of time points in the available RNA-seq datasets. Each HMM class is thus likely composed of multiple sub-classes regulated by different TFs. The expected effect of spatial clustering within each sub-class (σS between genes of the same sub-class μ) is then reduced by their spatial relation to other sub-classes (σS between genes of different sub-classes μ and ν), where μ and ν refer to two sub-classes within one HMM class. In the limiting case where there is only one HMM class, the ratio of the within-class density to the all-gene control reaches an asymptotic value of one. This reduction due to unresolved class mixtures was less severe for MCF10A cells, for which the DREM2 clustering was finer. To support this hypothesis, we reanalyzed the MCF10A RNA-seq data assuming that one can only identify nine HMM classes branched on day 2 (S4B Fig), eight of which are mixtures of the finer classes obtained from analyzing RNA-seq data at all time points (as shown in Fig 3A). As expected, S4C Fig shows that the extent of spatial clustering of genes within each class is reduced as compared to that shown in Fig 4D.

Discussion

Recent studies on chromosome conformations have revealed the existence of structural units such as promoter-enhancer hubs, topologically associated domains (TADs), and meta-TADs and demonstrated that these structural units play important roles in gene regulation [31–34]. Several studies focusing on specific genomic regions have shown correlation between gene expression and local chromosome structures [35, 36]. In this work, we provide a genome-wide perspective on the relationship between chromosome structure and gene regulation by integrating the RNA-seq and Hi-C data. We first used only the expression data and grouped together genes that share similar temporal expression patterns and are co-regulated by common TFs. We found that genes within each group display a significantly enhanced tendency to be clustered spatially in the three-dimensional chromosome structure, regardless of whether these genes are close (< 1 Mb) along the genome sequence or separated by as far as tens of Mb. This observation further suggests that the three-dimensional chromosome structure is part of a multi-layer gene regulation program.

Our analysis reveals two related mechanisms that achieve spatial clustering of genes subject to common regulators. Some genes lie close together along the chromosome sequence and are consequently close in space. By contrast, some genes that are far apart along the chromosome sequence can also become spatially adjacent by forming three-dimensional structures. TFs may actively orchestrate such chromosome structure organization [37, 38]. Alternatively, other DNA-binding factors such as long non-coding RNAs and transcription initiation complexes can drag associated chromosomal regions together to form enhancer-promoter hub structures. These hub structures may facilitate TF binding and related cooperative regulation such as phase-separated molecular assemblies [39].

Functionally, spatial co-localization may contribute to temporally coordinated regulation of a group of genes in eukaryotic cells. This co-localization can be viewed as a further refinement of the SIM network motif first noticed in prokaryotic cells. Spatial co-localization may facilitate simultaneous regulation of the local chromosomal environment of these genes, such as DNA methylation, histone modification, and chromosome compaction, all of which affect gene expression activities. Indeed, a recent study on Drosophila embryos shows that a group of genes separated by genomic distance but pulled together by an enhancer element exhibit similar expression fluctuation patterns [40].

Our analysis of the MCF10A data, however, has a number of limitations. While we performed RNA-seq analysis of MCF10A cells at a number of time points during TGF-β treatment, the lack of simultaneous time-course Hi-C and epigenomic data prevented us from analyzing how spatial clustering may change dynamically as gene expression status changes. In addition, the RNA-seq, Hi-C and epigenomic datasets were obtained from different labs, which raises a concern of potential cell line drift during culture. It is desirable to have an integrated set of parallel RNA-seq, epigenomic and Hi-C measurements from the same batch of cells, similar to how ESC differentiation was studied by Bonev et al. [13] but at more time points. Together with gene regulatory network analysis, such datasets would permit finer clustering, the identification of gene groups that each contain multiple spatially clustered, co-regulated and functionally related genes, and an examination of the extent to which these units are cell-type specific or conserved among different cell types.

In summary, based on an integrated analysis of transcriptome, epigenome, and chromosome 3D structural information, we propose a mechanism for concerted regulation of gene groups that can be further evaluated with more systematically measured datasets. That is, concerted gene regulation can be achieved through one or more common trans regulators and the spatial co-localization of target genes. This observation further suggests that genes may be spatially organized into functional units, consistent with the hierarchical patterns and long-range interactions revealed by chromosome structure studies [36, 41]. The relationship between gene expression and chromosome structure can be better understood by grouping genes into finer HMM classes based on their expression patterns and regulatory elements.

Supporting information

S1 Fig. Schematic illustration of the linear (A) and radial (B) distribution functions.

https://doi.org/10.1371/journal.pcbi.1006786.s001

(TIF)

S2 Fig. Gene ontology (GO) analysis of hierarchical-clustering classes.

https://doi.org/10.1371/journal.pcbi.1006786.s002

(TIF)

S3 Fig. Genes from one HMM class have enhanced tendency to cluster spatially.

(A) Radial distribution matrix of genes belonging to various HMM classes as normalized by the corresponding average density of all genes around the targeted gene in the first shell (i.e., ). (B) Distribution of linear genomic distances between a tagged gene and genes from the same HMM class in the first shell.

https://doi.org/10.1371/journal.pcbi.1006786.s003

(TIF)

S4 Fig. Enhanced tendency of spatial clustering among genes within one HMM class is concealed by unresolved HMM class mixtures.

(A) Gene radial distributions, and for the mouse nervous system development, with the shell width Δr ≈ 60 nm. (B) Eight of the nine HMM classes of TGF-β-treated MCF10A cells identified on day 2 are mixtures of the finer HMM classes shown in Fig 3A. (C) Relative gene densities of the HMM classes shown in panel B within the first shell of the radial distribution, normalized as fold changes with respect to the corresponding average densities of all genes around the targeted genes (the red dashed line), i.e., and .

https://doi.org/10.1371/journal.pcbi.1006786.s004

(TIF)

S1 Table. Gene ontology of DREM2 classes.

https://doi.org/10.1371/journal.pcbi.1006786.s005

(DOCX)

Fig 1. MCF10A cell responses to TGF-β treatment.

(A) Schematic diagram of phenotypic transition from epithelial cells to mesenchymal cells in response to TGF-β treatment. (B) MCF10A cells undergo morphological changes in response to 4 ng/ml TGF-β. (C) PCA clustering of TGF-β-treated MCF10A cells reveals distinct gene expression patterns over time.

Fig 2. TGF-β-induced gene expression changes show distinct temporal patterns.

(A) Hierarchical clustering of genes based on temporal gene expression patterns only. (B) Violin plots of distributions of indicated histone modification levels sampled through genes belonging to individual hierarchical clusters. Numbers 1–7 on the x-axis follow the order of gene clusters in panel A. The control group ‘A’ was sampled through all genes.

Fig 3. Genes clustered based on both expression patterns and key transcription factors show a correlation between patterns of expression and histone modification.

(A) Dynamic regulatory map obtained through the DREM2 analysis. (B) Distribution of indicated histone modification levels sampled through genes belonging to individual HMM classes. Group ‘A’ represents the control group that includes all genes.

Fig 4. Genes with similar expression patterns and controlled by the same upstream regulators show an enhanced tendency to co-localize spatially in the 3D chromosome structure.

(A) Heat map showing the numbers of 1-Mb bins containing a given number of genes and TGF-β-responding genes. The orange line highlights the bins in which all genes responded to TGF-β treatment. (B) Linear and radial distribution functions of TGF-β-responding genes within two representative HMM classes. We calculated the distribution of genes by sampling three types of gene groups: all available genes (All genes, and ), genes within an indicated HMM group (HMM genes, and ), and genes that showed differential expression during TGF-β treatment (DE genes, and ). For spatial distance, with the shell width Δr ≈ 60 nm. (C) Pseudo-spatial arrangement of genes belonging to two representative HMM classes, respectively. Each circle indicates a 1-Mb bin. The gray level in a circle scales with the number of genes in the bin that belong to the indicated HMM class. The two-dimensional spatial arrangement of bins within one chromosome was calculated by a fast-greedy algorithm based on the contact frequencies between each pair of bins from Hi-C data. The line width between two circles is proportional to the contact frequency between the two corresponding bins. Genes within each red box are within ~300 nm in space. (D) Relative gene densities of all HMM classes within the first shell of the radial distributions normalized by the average density of all genes around the targeted genes in the first shell (i.e., and ).

Fig 5. Mouse ESC-CN system shows a similar enhanced tendency of physical proximity for co-regulated genes with similar expression patterns.

(A) Schematic diagram of the development from ESC to CN cells. (B) Dynamic regulatory map obtained through the DREM2 analysis. (C) Heat map of intra- (diagonal) and inter- (off-diagonal) HMM class gene densities within the first shell of the radial distribution relative to the corresponding densities of all genes (as control) in ESC cells or CN cells (i.e., ). (D) Gene densities of all HMM classes within the first shell of the radial distribution relative to the corresponding densities of all genes (i.e., ) in ESC cells or CN cells.

References

  1. Thiery JP, Acloque H, Huang RYJ, Nieto MA. Epithelial-mesenchymal transitions in development and disease. Cell. 2009;139(5):871–90. pmid:19945376
  2. Hanahan D, Weinberg RA. Hallmarks of cancer: The next generation. Cell. 2011;144(5):646–74. pmid:21376230
  3. Alon U. Network motifs: theory and experimental approaches. Nat Rev Genet. 2007;8(6):450–61. pmid:17510665
  4. Zaslaver A, Mayo AE, Rosenberg R, Bashkin P, Sberro H, Tsalyuk M, et al. Just-in-time transcription program in metabolic pathways. Nat Genet. 2004;36(5):486–91. pmid:15107854
  5. Phillips T. Regulation of transcription and gene expression in eukaryotes. Nat Educat. 2008;1(1):199.
  6. Atlasi Y, Stunnenberg HG. The interplay of epigenetic marks during stem cell differentiation and development. Nat Rev Genet. 2017;18(11):643–58. pmid:28804139
  7. Massague J. TGFβ signalling in context. Nat Rev Mol Cell Biol. 2012;13:616–30. pmid:22992590
  8. Ruscetti FW, Akel S, Bartelmez SH. Autocrine transforming growth factor-β regulation of hematopoiesis: many outcomes that depend on the context. Oncogene. 2005;24(37):5751–63. pmid:16123808
  9. Strasen J, Sarma U, Jentsch M, Bohn S, Sheng C, Horbelt D, et al. Cell-specific responses to the cytokine TGFβ are determined by variability in protein levels. Mol Syst Biol. 2018;14(1):e7733. pmid:29371237
  10. Meng X-m, Nikolic-Paterson DJ, Lan HY. TGF-β: the master regulator of fibrosis. Nat Rev Nephrol. 2016;12(6):325–38.
  11. Gaspard N, Bouschet T, Hourez R, Dimidschstein J, Naeije G, Van den Ameele J, et al. An intrinsic mechanism of corticogenesis from embryonic stem cells. Nature. 2008;455(7211):351–57. pmid:18716623
  12. Evans MJ, Kaufman MH. Establishment in culture of pluripotential cells from mouse embryos. Nature. 1981;292(5819):154–56. pmid:7242681
  13. Bonev B, Mendelson Cohen N, Szabo Q, Fritsch L, Papadopoulos GL, Lubling Y, et al. Multiscale 3D genome rewiring during mouse neural development. Cell. 2017;171(3):557–72.e24. pmid:29053968
  14. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7(3):562–78. pmid:22383036
  15. Schulz MH, Devanny WE, Gitter A, Zhong S, Ernst J, Bar-Joseph Z. DREM 2.0: Improved reconstruction of dynamic regulatory networks from time-series expression data. BMC Syst Biol. 2012;6(1):104.
  16. Clauset A, Newman ME, Moore C. Finding community structure in very large networks. Phys Rev E. 2004;70(6):066111.
  17. Lesne A, Riposo J, Roger P, Cournac A, Mozziconacci J. 3D genome reconstruction from chromosomal contacts. Nat Methods. 2014;11(11):1141–3. pmid:25240436
  18. Chandler D. Introduction to modern statistical mechanics. Oxford University Press; 1987.
  19. Zhang J, Tian X-J, Zhang H, Teng Y, Li R, Bai F, et al. TGF-β–induced epithelial-to-mesenchymal transition proceeds through stepwise activation of multiple feedback loops. Sci Signal. 2014;7(345):ra91. pmid:25270257
  20. Treutlein B, Brownfield DG, Wu AR, Neff NF, Mantalas GL, Espinoza FH, et al. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature. 2014;509(7500):371–5. pmid:24739965
  21. Heldin C-H, Landström M, Moustakas A. Mechanism of TGF-β signaling to growth arrest, apoptosis, and epithelial–mesenchymal transition. Curr Opin Cell Biol. 2009;21(2):166–76. pmid:19237272
  22. Ke X-S, Qu Y, Cheng Y, Li W-C, Rotter V, Øyan AM, et al. Global profiling of histone and DNA methylation reveals epigenetic-based regulation of gene expression during epithelial to mesenchymal transition in prostate cells. BMC Genomics. 2010;11(1):669.
  23. Messier TL, Gordon JAR, Boyd JR, Tye CE, Browne G, Stein JL, et al. Histone H3 lysine 4 acetylation and methylation dynamics define breast cancer subtypes. Oncotarget. 2016;7(5):5094–109. pmid:26783963
  24. Kimura H. Histone modifications for human epigenome analysis. J Hum Genet. 2013;58(7):439–45. pmid:23739122
  25. Boland MJ, Nazor KL, Loring JF. Epigenetic regulation of pluripotency and differentiation. Circ Res. 2014;115(2):311–24. pmid:24989490
  26. Bar-Joseph Z, Gitter A, Simon I. Studying and modelling dynamic biological processes using time-series gene expression data. Nat Rev Genet. 2012;13(8):552–64. pmid:22805708
  27. Nagase H, Visse R, Murphy G. Structure and function of matrix metalloproteinases and TIMPs. Cardiovasc Res. 2006;69(3):562–73.
  28. Soikkeli J, Podlasz P, Yin M, Nummela P, Jahkola T, Virolainen S, et al. Metastatic outgrowth encompasses COL-I, FN1, and POSTN up-regulation and assembly to fibrillar networks regulating cell adhesion, migration, and growth. Am J Pathol. 2010;177(1):387–403. pmid:20489157
  29. Hulpiau P, Van Roy F. Molecular evolution of the cadherin superfamily. Int J Biochem Cell Biol. 2009;41(2):349–69. pmid:18848899
  30. Barutcu AR, Lajoie BR, McCord RP, Tye CE, Hong D, Messier TL, et al. Chromatin interaction analysis reveals changes in small chromosome and telomere clustering between epithelial and breast cancer cells. Genome Biol. 2015;16:214. pmid:26415882
  31. Phanstiel DH, Van Bortle K, Spacek D, Hess GT, Shamim MS, Machol I, et al. Static and dynamic DNA loops form AP-1-bound activation hubs during macrophage development. Mol Cell. 2017;67(6):1037–48. pmid:28890333
  32. Weintraub AS, Li CH, Zamudio AV, Sigova AA, Hannett NM, Day DS, et al. YY1 Is a structural regulator of enhancer-promoter loops. Cell. 2017;171(7):1573–88. pmid:29224777
  33. Fraser J, Ferrai C, Chiariello AM, Schueler M, Rito T, Laudanno G, et al. Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation. Mol Syst Biol. 2015;11(12):852. pmid:26700852
  34. Lupiáñez DG, Kraft K, Heinrich V, Krawitz P, Brancati F, Klopocki E, et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell. 2015;161(5):1012–25. pmid:25959774
  35. Hnisz D, Abraham BJ, Lee TI, Lau A, Saint-André V, Sigova AA, et al. Super-enhancers in the control of cell identity and disease. Cell. 2013;155(4):934–47. pmid:24119843
  36. Sack LM, Davoli T, Li MZ, Li Y, Xu Q, Naxerova K, et al. Profound tissue specificity in proliferation control underlies cancer drivers and aneuploidy patterns. Cell. 2018;173(2):499–514. pmid:29576454
  37. Ong C-T, Corces VG. Enhancer function: new insights into the regulation of tissue-specific gene expression. Nat Rev Genet. 2011;12(4):283–93. pmid:21358745
  38. Krijger PHL, Di Stefano B, de Wit E, Limone F, Van Oevelen C, De Laat W, et al. Cell-of-origin-specific 3D genome structure acquired during somatic cell reprogramming. Cell Stem Cell. 2016;18(5):597–610. pmid:26971819
  39. Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, et al. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013;153(2):307–19. pmid:23582322
  40. Zoller B, Little SC, Gregor T. Diverse Spatial Expression Patterns Emerge from Unified Kinetics of Transcriptional Bursting. Cell. 2018;175(3):835–47. pmid:30340044
  41. Hnisz D, Schuijers J, Lin CY, Weintraub AS, Abraham BJ, Lee TI, et al. Convergence of developmental and oncogenic signaling pathways at transcriptional super-enhancers. Mol Cell. 2015;58(2):362–70. pmid:25801169