Global Quantitative Modeling of Chromatin Factor Interactions

doi:10.1371/journal.pcbi.1003525

Figure 1.

Schematic overview of chromatin factor interaction maximum entropy model.

Chromatin factor patterns were extracted from ChIP data by binning and thresholding algorithms (Methods). We then learned a maximum entropy model that estimates the distribution of chromatin factor patterns from low-order (pairwise, or pairwise and triplet) interactions. The model was then applied to predicting chromatin factor interactions and to context-based prediction of chromatin profiles.
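To make the idea of a pairwise maximum entropy model over binary chromatin factor patterns concrete, the sketch below assumes the common Ising-like form P(x) ∝ exp(Σ h_i x_i + Σ_{i<j} J_ij x_i x_j) for x ∈ {0,1}^n and computes exact pattern probabilities by brute-force enumeration. The parameter names `h` and `J`, the function names, and the toy values are hypothetical illustrations; the article's own fitting procedure (regularization, fine-tuning, triplet terms) is described in its Methods and is not reproduced here.

```python
import itertools
import numpy as np

def log_weight(x, h, J):
    """Unnormalized log-probability of binary pattern x under an assumed pairwise model.

    h: length-n vector of single-factor fields (hypothetical parameters).
    J: n x n symmetric coupling matrix; only the i < j upper triangle is used.
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    pairwise = np.sum(np.triu(J, k=1) * np.outer(x, x))
    return float(h @ x + pairwise)

def pattern_distribution(h, J):
    """Exact P(x) for every binary pattern x, by enumerating all 2^n patterns."""
    n = len(h)
    patterns = list(itertools.product([0, 1], repeat=n))
    scores = np.array([log_weight(p, h, J) for p in patterns])
    # Subtract the maximum before exponentiating for numerical stability.
    weights = np.exp(scores - scores.max())
    probs = weights / weights.sum()
    return dict(zip(patterns, probs))

# Toy example: factors 0 and 1 interact positively, factor 2 is independent.
h = np.array([-1.0, -1.0, -2.0])
J = np.zeros((3, 3))
J[0, 1] = J[1, 0] = 2.0
dist = pattern_distribution(h, J)
```

Enumeration is only feasible for a handful of factors (73 factors would mean 2^73 patterns), so this toy serves only to show the functional form, in which a positive J_ij raises the probability of patterns where factors i and j co-occur.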

Figure 2.

Maximum entropy model accurately predicts high-order chromatin factor pattern frequencies in the data.

For evaluation, we used the model to predict frequencies of chromatin factor combinatorial binding patterns, each involving either 10 randomly selected chromatin factors (A) or all 73 chromatin factors (B), and compared them against observed frequencies in test-set data. Red dots represent estimates from the maximum entropy model with up to 3rd-order interactions learned with regularization and fine-tuning; gray dots show estimates from the independent Bernoulli model, which assumes that occurrences of different chromatin factors are independent. The diagonal line is the identity line.
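The independent Bernoulli baseline can be sketched as follows, assuming the data are a 0/1 matrix with one row per genomic bin and one column per chromatin factor. Each factor's marginal occupancy is estimated as its column mean, and a pattern's probability is the product of per-factor terms. The function names are hypothetical.

```python
import numpy as np

def bernoulli_pattern_probability(data, pattern):
    """Probability of a binary pattern under the independent Bernoulli model.

    data: (n_bins, n_factors) 0/1 array; factor i's marginal p_i is its column mean.
    pattern: length-n_factors 0/1 vector.
    """
    data = np.asarray(data, dtype=float)
    pattern = np.asarray(pattern)
    p = data.mean(axis=0)
    # Multiply p_i where the factor is present and (1 - p_i) where it is absent.
    return float(np.prod(np.where(pattern == 1, p, 1.0 - p)))

def observed_pattern_frequency(data, pattern):
    """Fraction of bins whose full binary profile equals `pattern`."""
    data = np.asarray(data)
    return float(np.mean(np.all(data == np.asarray(pattern), axis=1)))
```

For two factors that always co-occur in half the bins, the Bernoulli model gives the joint pattern probability 0.25 while the observed frequency is 0.5, illustrating how ignoring interactions underestimates co-occurrence.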

Figure 3.

The model accurately predicts experimentally validated chromatin factor interactions.

(A) Heatmap of maximum entropy model pairwise interaction energy scores (upper right) compared with correlation z-scores (lower left). The heatmap is ordered to place positive interactions close to the diagonal, so positively interacting factors tend to be adjacent to each other. H3K23ac-, H1-, H3-, and H4- denote depletion of these factors. For comparison with interaction scores, correlations were transformed to z-scores by the Fisher transformation and rescaled so that their standard deviation equals that of the pairwise interaction energy scores. The legend shows the correlation (left) and interaction energy score (right) corresponding to each z-score level. The interaction energy score prediction is robust to changes in bin size during data processing (Figure S2). (B) Precision-recall curves for predicting known interactions, showing the performance of each score as a classifier of interactions across all thresholds. Interaction energy scores from the L1-regularized pairwise maximum entropy model (red, solid) are compared with interaction energy scores from the unregularized pairwise maximum entropy model (black, solid), Bayesian network bootstrap scores (black, dashed), Pearson correlation coefficients (gray, dashed), partial correlation (gray, solid), and mutual information (gray, dotted). Maximum entropy models, the Bayesian network model, and mutual information were computed on discretized data, while correlation and partial correlation were computed on continuous data without discretization.
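The correlation rescaling described in panel (A) can be sketched as below: apply the Fisher transformation z = arctanh(r) to each off-diagonal correlation, then multiply by a single factor so the z-scores have the same standard deviation as the interaction energy scores. The legend correspondence then inverts the mapping with r = tanh(z / scale). This assumes only the upper-triangle pairs are used, no re-centering (the caption mentions only rescaling), and NumPy's default population standard deviation; the function names are hypothetical.

```python
import numpy as np

def rescaled_fisher_z(corr, energy):
    """Fisher-transform off-diagonal correlations and rescale them so their
    standard deviation matches that of the interaction energy scores.

    corr, energy: (n, n) symmetric matrices; only i < j entries are used.
    Returns (rescaled z-scores for the i < j pairs, the scale factor used).
    """
    corr = np.asarray(corr, dtype=float)
    energy = np.asarray(energy, dtype=float)
    iu = np.triu_indices_from(corr, k=1)
    z = np.arctanh(corr[iu])              # Fisher transformation
    scale = energy[iu].std() / z.std()    # match standard deviations
    return z * scale, scale

def corr_for_z_level(z_level, scale):
    """Correlation that maps to a given rescaled z-score level (legend values)."""
    return float(np.tanh(z_level / scale))
```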

Table 1.

Comparison of different models on hold-out data.

Figure 4.

Pair-wise interaction network organization structure of chromatin factors.

Each node represents a chromatin factor and each edge represents a pairwise interaction. Edge color indicates the sign and strength of the interaction energy score (red, positive interaction; blue, negative interaction). Only interactions with interaction energy score are shown.
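A network like this can be assembled from a symmetric matrix of pairwise interaction energy scores by keeping factor pairs whose score magnitude exceeds a cutoff and coloring each edge by sign. The cutoff used for the figure is not recoverable from this caption, so `min_abs_score` below is a hypothetical parameter, as are the function name and toy values.

```python
import numpy as np

def interaction_edges(energy, names, min_abs_score):
    """Return (factor_a, factor_b, score, color) for each pair i < j whose
    interaction energy score magnitude exceeds min_abs_score (hypothetical cutoff).
    Positive scores are colored red and negative scores blue."""
    energy = np.asarray(energy, dtype=float)
    edges = []
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):
            score = float(energy[i, j])
            if abs(score) > min_abs_score:
                color = "red" if score > 0 else "blue"
                edges.append((names[i], names[j], score, color))
    return edges
```

The resulting edge list can be passed to any graph-drawing tool; sorting positive edges by score would likewise give a ranked list of top predicted interactions.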

Table 2.

Top 20 predicted positive pairwise interactions based on pairwise interaction model with regularization.

Figure 5.

Context-based intra- and inter-cell type chromatin factor profile predictions achieve high overall performance.

(A) Prediction performance on hold-out chromatin factor profiles based on partial data and the chromatin model. Predicted chromatin factor profiles are compared with observed profiles using receiver operating characteristic (ROC) curves, which plot true positive rate (y-axis) against false positive rate (x-axis) over the full range of prediction thresholds. The dashed diagonal line shows the expected performance of a random classifier. The histogram shows the frequency distribution of the area under the ROC curve (AUC). (B) Comparison of predicted and observed S2 cell H3K18ac chromatin profiles. The ChIP profile is visualized as a space-filling Hilbert curve as in [11], so adjacent genomic locations are also close to each other in this 2D representation. The profile predicted by the BG3 cell model is colored yellow, with darker color indicating higher probability; the observed binarized profile is colored blue; overlap between predicted and observed profiles therefore appears green. H3K18ac is an example of a chromatin factor that cannot be accurately inferred from any other single chromatin profile (its highest correlation coefficient with any other profile is 0.37). (C, D) Comparison of inter-cell type versus intra-cell type chromatin profile prediction performance, measured by AUC. ‘->’ indicates the cell line the model was trained on and the cell line it was tested on; e.g. S2->BG3 denotes predicting BG3 cell data with a model trained on S2 data.
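Context-based prediction of a held-out factor can be sketched under the assumption of a purely pairwise model P(x) ∝ exp(Σ h_i x_i + Σ_{i<j} J_ij x_i x_j). Taking the ratio of the two patterns that differ only at factor k gives P(x_k = 1 | all other factors) = σ(h_k + Σ_{j≠k} J_kj x_j), where σ is the logistic function. The article's model may also include triplet terms, so this is an illustration rather than its exact procedure. The AUC summarized in the histogram is computed here through its rank interpretation (the probability that a random positive bin scores above a random negative bin, ties counting one half). Function names are hypothetical.

```python
import numpy as np

def conditional_prob(x, k, h, J):
    """P(x_k = 1 | other factors) under an assumed pairwise maximum entropy model.

    x: length-n 0/1 vector of observed factors (entry k is ignored).
    h: length-n fields; J: symmetric n x n couplings.
    """
    x = np.asarray(x, dtype=float).copy()
    x[k] = 0.0                              # exclude the held-out factor itself
    h = np.asarray(h, dtype=float)
    J = np.asarray(J, dtype=float)
    field = h[k] + J[k] @ x                 # h_k + sum_{j != k} J_kj x_j
    return float(1.0 / (1.0 + np.exp(-field)))

def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking (O(P * N) memory)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))
```

For inter-cell type prediction, the same conditional probability would be evaluated with `h` and `J` learned in one cell type and `x` taken from the other cell type's data.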

Figure 6.

The extended chromatin model predicts actively transcribed genomic regions well.

Predictions are evaluated with a precision-recall curve (A) and a receiver operating characteristic (ROC) curve (B) over the full range of prediction thresholds.
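A precision-recall curve over the full range of thresholds can be sketched by sorting predictions in decreasing order and accumulating true and false positives, treating every distinct score as a threshold (a bin is called positive if its score is at least the threshold). Tied scores share one threshold. The function below is a hypothetical illustration; scikit-learn's `sklearn.metrics.precision_recall_curve` provides a library implementation. It assumes at least one positive label.

```python
import numpy as np

def precision_recall_points(scores, labels):
    """Precision, recall, and threshold at every distinct score value."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores, kind="mergesort")   # highest score first
    s, y = scores[order], labels[order]
    tp = np.cumsum(y)                               # true positives so far
    fp = np.cumsum(~y)                              # false positives so far
    # Keep the last index of each run of tied scores so ties share a threshold.
    last = np.r_[np.nonzero(np.diff(s))[0], len(s) - 1]
    precision = tp[last] / (tp[last] + fp[last])
    recall = tp[last] / y.sum()
    return precision, recall, s[last]
```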
