Differential contribution to gene expression prediction of histone modifications at enhancers or promoters

doi:10.1371/journal.pcbi.1009368

Fig 1.

Identification of repressed and active functional regions in ESCs.

(A) State definition of the chromatin segmentation model in ESCs. The values represent the probability (from 0 to 1) of finding each histone modification (vertical) in genomic segments of states 1 to 9 (horizontal). Each cell of the matrix is colored according to its probability value. Red, states with histone modifications associated with activation (active, 1–4); dark yellow, H3K4me1-only state (intermediate, 5); grey, states in which H3K27me3 was present (repressed, 6–8); dark grey, poised states, in which H3K27me3 colocalized with H3K4me3 and/or H3K4me1 (states 6 and 7); light grey, H3K27me3-only regions (state 8); and white, unmarked state (9). (B) Example of a genomic region containing two expressed genes (Skap2 and Halr1), which are covered by active states (in red), and a cluster of repressed genes (HoxA), which is covered by repressed states (in grey). Active chromatin segments integrate the signal of H3K27ac, H3K4me3, and H3K4me1 and lack H3K27me3. Repressed chromatin segments integrate the signal of H3K27me3, H3K4me3, and H3K4me1 and lack H3K27ac. Expression of Skap2 and Halr1, and silencing of the HoxA genes, were confirmed by the RNA-seq profiles [23]. The y-axis represents read counts normalized by total reads. The screenshot was taken from the UCSC Genome Browser [62]. (C) Enrichment of state transitions (i.e., the number of observed transitions divided by the number of transitions expected by chance) from the segments of one state (vertical) to the segments of another state (horizontal) along the linear chromatin. Each cell of the matrix is colored according to its enrichment value. (D) Expression of genes associated with active promoters (AP; 10,786 genes) or bivalent promoters (BP; 3,459 genes). The dotted line represents 1 FPKM. (E) Top GO biological process (2018 categories) for each list of genes in D.
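The enrichment in panel C is a ratio of observed to expected transition counts. The caption does not say how the expected count is derived, so the sketch below assumes independence between the source and target states: expected = (transitions leaving the source state) × (transitions entering the target state) / (total transitions). The function name `transition_enrichment` and the toy list of state labels are illustrative and are not taken from the study.

```python
from collections import Counter

def transition_enrichment(states):
    """Observed/expected ratio for each (from_state, to_state) pair of adjacent segments.

    Assumption (not stated in the caption): the count expected by chance is
    out_total[from] * in_total[to] / total_transitions.
    """
    pairs = list(zip(states, states[1:]))       # adjacent segments along the chromosome
    observed = Counter(pairs)
    out_totals = Counter(a for a, _ in pairs)   # transitions leaving each state
    in_totals = Counter(b for _, b in pairs)    # transitions entering each state
    total = len(pairs)
    return {
        (a, b): observed[(a, b)] / (out_totals[a] * in_totals[b] / total)
        for (a, b) in observed
    }

# Toy segmentation: state labels of consecutive segments (made up for illustration).
segments = [1, 2, 1, 2, 8, 9, 8, 9, 1, 2]
enrichment = transition_enrichment(segments)
for (src, dst), value in sorted(enrichment.items()):
    print(f"state {src} -> state {dst}: {value:.2f}")
```

Values above 1 indicate transitions that occur more often than this chance model would predict; pairs that are never observed are simply absent from the returned dictionary.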

Fig 2.

Performance and variable importance of enhancer and promoter Hi-C–top predictive models in ESCs.

Predicted expression of the test subset of genes calculated by the models versus their expression measured by RNA-seq. Model performance is represented by the Pearson’s correlation (r) between predicted and measured expression values. (A) Left, the model trained on the promoter regions associated with at least one enhancer using the top significant interactions of Hi-C (Hi-C–top promoter model). Right, the performance of the same model after randomizing the expression of the training subset of genes. The color bar represents the density of dots. (B) Left, the model trained on the enhancer regions associated with at least one promoter using the top significant interactions of Hi-C (Hi-C–top enhancer model). Right, the performance of the same model after randomizing the expression of the training subset of genes. The color bar represents the density of dots. (C) Importance of each histone modification used to train the Hi-C–top promoter predictive model. Importance is defined as the contribution of each variable to the linear regression predictive model and corresponds to the absolute value of the t-statistic for each model parameter. (D) As for C, but for the Hi-C–top enhancer predictive model. (E) As for B, but the model is trained without H3K27me3 as a predictive variable. (F) As for D, but the model is trained without H3K27me3 as a predictive variable.
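As a minimal sketch of the two quantities used in this figure, the code below fits an ordinary least-squares model, takes the absolute t-statistic of each coefficient as the variable importance, and scores held-out predictions with Pearson's r. Everything in it is an assumption made for illustration: the data are synthetic, the four mark names are only those mentioned in Fig 1, and the helpers `fit_ols_with_t` and `predict` are hypothetical. It is not the authors' pipeline.

```python
import numpy as np

def fit_ols_with_t(X, y):
    """Fit OLS with an intercept; return coefficients and their t-statistics."""
    X1 = np.column_stack([np.ones(len(X)), X])        # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = X1.shape[0] - X1.shape[1]                   # n - number of parameters
    sigma2 = resid @ resid / dof                      # residual variance estimate
    cov = sigma2 * np.linalg.inv(X1.T @ X1)           # coefficient covariance
    return beta, beta / np.sqrt(np.diag(cov))

def predict(beta, X):
    return beta[0] + X @ beta[1:]

# Synthetic stand-in for histone-mark signal and expression (not real data).
rng = np.random.default_rng(0)
marks = ["H3K27ac", "H3K4me3", "H3K4me1", "H3K27me3"]
X = rng.normal(size=(500, len(marks)))
y = X @ np.array([1.0, 0.5, 0.2, -0.8]) + rng.normal(scale=0.5, size=500)

# Train on one subset of genes, evaluate on the held-out subset.
X_train, X_test, y_train, y_test = X[:400], X[400:], y[:400], y[400:]
beta, t = fit_ols_with_t(X_train, y_train)

importance = dict(zip(marks, np.abs(t[1:])))          # |t|, intercept excluded
r = np.corrcoef(predict(beta, X_test), y_test)[0, 1]  # Pearson's r on the test set
print(importance, r)
```

Taking the absolute value discards the sign of the effect, so a strongly repressive mark and a strongly activating mark can both rank as highly important.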

Fig 3.

RNA-seq and ChIP-seq data before and after LOESS normalization.

(A) Raw and normalized expression of 3,277 housekeeping genes along cardiac and neural differentiation from ESCs. (B) Raw and normalized expression of 3,459 bivalent genes at the same time points as in A. (C) Raw and normalized H3K4me3 ChIP-seq signal levels at all 2-Kb bins of the genome. (D) Raw and normalized H3K4me3 ChIP-seq signal levels at 3,344 BPs. (E, F) Same as C and D, respectively, but for H3K27me3. CM, cardiomyocytes; CN, cortical neurons; CP, cardio precursors; MES, mesoderm; NPC, neural precursors.

Fig 4.

PE models trained in differentiation time points.

(A) Performance of each differentiation enhancer model on the rest of the differentiation time points compared with the performance of the random models. Performance is represented as the Pearson’s correlation (r) between predicted expression and measured expression. Significance was assessed using a paired Student’s t-test between the performance of the models and the performance of the random models, paired by the differentiation test set (****p < 0.0001, ***p < 0.001, **p < 0.01, *p < 0.05). CM, cardiomyocytes; CN, cortical neurons; CP, cardio precursors; MES, mesoderm; NPC, neural precursors. (B) Importance of the histone modifications for each differentiation enhancer model. Importance is defined as the contribution of each variable to the linear regression predictive model and corresponds to the absolute value of the t-statistic for each model parameter.
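The paired comparison in panel A can be sketched as follows: each test set contributes one correlation from the trained model and one from its randomized counterpart, and the test is run on the per-test-set differences. The correlation values below are invented placeholders, not results from the study, and `paired_t_statistic` is a hypothetical helper that returns only the statistic and its degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """Paired Student's t: mean of the differences over its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))   # stdev uses n - 1
    return t_stat, n - 1                              # statistic, degrees of freedom

# Placeholder Pearson correlations, one pair per test set (made up for illustration).
model_r = [0.62, 0.58, 0.65, 0.60, 0.57]
random_r = [0.05, -0.02, 0.03, 0.01, 0.04]

t_stat, dof = paired_t_statistic(model_r, random_r)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

Converting the statistic to a p-value requires the Student's t distribution; if SciPy is available, `scipy.stats.ttest_rel(model_r, random_r)` returns both the statistic and the two-sided p-value.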

Table 1.

Performance of each PE differentiation model at every differentiation time point.

Table 2.

Performance of each BP differentiation model at every differentiation time point.

Fig 5.

PE models trained using developmental stages.

(A) Performance of each differentiation PE model on the rest of the developmental stages compared with the performance of the random models. Performance is represented as the Pearson’s correlation (r) between predicted expression and measured expression. Significance was assessed using a paired Student’s t-test between the performance of the models and the performance of the random models, paired by differentiation test set (****p < 0.0001, ***p < 0.001, **p < 0.01, *p < 0.05). (B) Importance of histone modifications for each development PE model. Importance is defined as the contribution of each variable to the linear regression predictive model and corresponds to the absolute value of the t-statistic for each model parameter. Heart10.5, heart tissue from embryonic day 10.5; Kidney14.5, kidney tissue from embryonic day 14.5; Liver11.5, liver tissue from embryonic day 11.5; Lung15.5, lung tissue from embryonic day 15.5; NeuralTube12.5, neural tube tissue from embryonic day 12.5.

Table 3.

Comparative analysis of other methodologies to predict gene expression in the Hi-C-top dataset.

Table 4.

Information on ChIP-seq experiments produced in this study.
