Identification of gene-specific cis-regulatory elements during differentiation of mouse embryonic stem cells: An integrative approach using high-throughput datasets
Fig 1
Gene-specific predictive models.
(A) Schematic representation of the methodology for developing gene-specific predictive models. 1. DNaseI-seq and H3K27ac data are integrated to quantify the chromatin activity profile (CAP) of each candidate cis-regulatory element (CRE). TF ChIP-seq data are used to generate the transcription factor binding profile (TFBP), which quantifies the community effect of the candidate CREs mapped to a specific gene. 2. Gene-wise expression values are obtained as RPKMs to form gene expression profiles (GEPs). 3. CAPs, TFBPs and GEPs are generated for all regions and genes in the analysis. 4. CAPs and TFBPs are integrated to generate gene-specific CRE networks. Greedy community detection is performed to identify communities of CREs (coCREs) in these networks. A new set of CAPs, comprising the aggregate CAPs of the coCREs together with the individual CAPs of singleton CREs, is then used to predict the GEP of a specific gene. (B) Histogram showing the distribution of the number of candidate CREs per gene within 100 kb of the transcription start site, over all genes in the study. (C) Cross-validated mean squared error (MSE) as a function of increasing λ for a predictive model of Runx1 gene expression. The two vertical dotted lines mark the two cutoffs, λmin and λ1se. The number of CREs with non-zero coefficients (β) at each λ is shown above the plot.
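Two steps in this caption can be made concrete with a short sketch: building the reduced predictor set from coCREs and singleton CREs (step 4 of panel A), and choosing the λmin and λ1se cutoffs from a cross-validated MSE curve (panel C). The λmin/λ1se naming matches the convention of the R package glmnet (cv.glmnet reports lambda.min and lambda.1se): λmin is the λ with the lowest mean CV error, and λ1se is the largest λ whose mean CV error lies within one standard error of that minimum. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the communities have already been detected, and it assumes a coCRE's aggregate CAP is the per-sample mean of its members' CAPs, since the caption does not state how CAPs are aggregated. The function names aggregate_caps and select_lambdas are hypothetical.

```python
from statistics import mean


def aggregate_caps(caps, communities):
    """Build the reduced set of predictors for one gene (hypothetical helper).

    caps: dict mapping CRE id -> list of chromatin activity values, one per sample.
    communities: list of sets of CRE ids (coCREs); CREs in no set are singletons.
    ASSUMPTION: a coCRE's aggregate CAP is the per-sample mean of its members.
    """
    assigned = set().union(*communities)
    features = {}
    for i, members in enumerate(communities):
        profiles = [caps[cre] for cre in sorted(members)]
        # zip(*profiles) groups the members' values sample by sample
        features[f"coCRE_{i}"] = [mean(values) for values in zip(*profiles)]
    for cre, profile in caps.items():
        if cre not in assigned:
            # Singleton CREs keep their individual CAP unchanged
            features[cre] = list(profile)
    return features


def select_lambdas(lambdas, cv_mse, cv_se):
    """Return (lambda_min, lambda_1se) from a cross-validated error curve.

    lambda_min: the lambda with the smallest mean CV error (largest lambda on ties).
    lambda_1se: the largest lambda whose mean CV error is at most the minimum
    error plus the standard error at lambda_min (glmnet-style one-SE rule).
    """
    if not lambdas or not (len(lambdas) == len(cv_mse) == len(cv_se)):
        raise ValueError("lambdas, cv_mse and cv_se must be non-empty and equal length")
    best = min(cv_mse)
    # Among lambdas reaching the minimum error, prefer the most regularized one
    i_min = max((i for i, err in enumerate(cv_mse) if err == best),
                key=lambda i: lambdas[i])
    threshold = cv_mse[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mse) if err <= threshold)
    return lambdas[i_min], lambda_1se
```

For example, select_lambdas([1.0, 0.5, 0.25, 0.125], [2.0, 1.15, 1.0, 1.1], [0.3, 0.25, 0.25, 0.2]) returns (0.25, 0.5): the lowest error occurs at λ = 0.25, and λ = 0.5 is the largest λ whose error stays within one standard error of it. Choosing λ1se over λmin trades a small increase in CV error for a sparser model, that is, fewer CREs with non-zero β.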