Genome-wide prediction of topoisomerase IIβ binding by architectural factors and chromatin accessibility

doi:10.1371/journal.pcbi.1007814

Fig 1.

Chromatin landscape of TOP2B.

A. Genome browser view for TOP2B ChIP-seq signal and other relevant chromatin features in mouse liver. B. Average read enrichment of selected chromatin features within ±4 kb of TOP2B binding center (solid lines) and random regions (dashed lines) in mouse liver (blue) and MEFs (red). Signals were smoothed using nucleR [50]. Left-hand and right-hand ordinate axes correspond to mouse liver and MEFs, respectively. C. Overlap frequencies of chromatin features at TOP2B binding sites as compared to random regions in mouse liver and MEFs. D. Top panel shows relative frequency of several dinucleotides within ±500 bp of TOP2B binding center and random regions. Bottom panel shows average of 3D DNA shape features within ±1 kb of TOP2B binding center and random regions. Profiles corresponding to MEFs and liver are colored as in C.

Fig 2.

Machine learning schema for the prediction of TOP2B binding.

TOP2B binding sites and random regions were first identified. Then, 15 high-throughput sequencing experiments, together with DNA sequence and shape features, were scored around these regions, yielding a data matrix with rows representing TOP2B/random sites and columns representing the scored features. Finally, binary classifiers were trained and tested using 5-fold cross-validation, and feature selection was applied to identify the most informative features.
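
The train/test step of this schema can be illustrated with a minimal sketch. It is not the authors' pipeline: the data matrix is synthetic, the two feature columns are placeholders, and the Gaussian Naive Bayes classifier and the helper names (`k_fold_indices`, `fit_gaussian_nb`, `predict_gaussian_nb`, `cross_validate`) are assumptions introduced only to show how 5-fold cross-validation partitions rows into held-out folds.

```python
import math
import random


def k_fold_indices(n_rows, k=5, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_gaussian_nb(X, y):
    """Estimate the prior, feature means and feature variances per class."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(y), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for one row."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))


def cross_validate(X, y, k=5, seed=0):
    """Return the accuracy obtained on each of the k held-out folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train_idx],
                                [y[i] for i in train_idx])
        hits = sum(predict_gaussian_nb(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return accuracies


# Synthetic data matrix (hypothetical): rows are sites, columns are features.
rng = random.Random(42)
X = [[rng.gauss(2.0, 1.0), rng.gauss(1.0, 1.0)] for _ in range(100)]     # class 1
X += [[rng.gauss(-2.0, 1.0), rng.gauss(-1.0, 1.0)] for _ in range(100)]  # class 0
y = [1] * 100 + [0] * 100

fold_accuracies = cross_validate(X, y, k=5)
print(fold_accuracies)
```

Each row is used for testing exactly once, so the spread of the five fold accuracies gives a rough error estimate for the classifier.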

Fig 3.

Chromatin features predict TOP2B binding.

ROC curves and AUC values for Support Vector Machine models trained on the indicated sets of features (for Naive Bayes models, see S4 Fig).
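
For readers unfamiliar with the metric, the AUC reported alongside each ROC curve equals the probability that a randomly chosen positive site receives a higher classifier score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not values from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    positives = [s for s, lab in zip(scores, labels) if lab == 1]
    negatives = [s for s, lab in zip(scores, labels) if lab == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(positives) * len(negatives))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(toy_labels, toy_scores))  # → 0.75
```

The pairwise form is quadratic in the number of sites; for genome-scale data a rank-based formulation would be the usual choice.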

Table 1.

Performance of SVM and NB classifiers in MEF.

Table 2.

Performance of SVM and NB classifiers in mouse liver.

Table 3.

Performance of SVM and NB classifiers using the features selected by FCBF and SS strategies.

Fig 4.

Three features accurately predict TOP2B.

A. Top features selected by Fast Correlation Based Filter and Scatter Search algorithms. For each histogram and feature, the white bar height indicates the frequency of selection and the black bar height is the Symmetrical Uncertainty (SU) value with respect to the class (TOP2B). Indices of DNA shape parameters indicate position within the corresponding parameter vector associated with the 300 bp width of modeled TOP2B binding sites (see Materials and methods). B. Summary of the features most frequently selected by both algorithms. Top and bottom aligned dots indicate selection of a given feature by the corresponding selection algorithm in liver and MEFs, respectively. In the middle, the SU of each feature is displayed. Only the top fifteen features according to their SU are shown, which happen to match in both systems. C. ROC curves and AUC values for Naive Bayes models trained on either MEF, liver or activated B cells and applied to the three systems (for Support Vector Machine and Random Forests models, see S4 Fig). Only DNase-seq, RAD21 and CTCF binding data were used for training.
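
Symmetrical Uncertainty is conventionally defined as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), where H is Shannon entropy and I is mutual information; it ranges from 0 (independent) to 1 (each variable fully determines the other). The following sketch computes it for discrete variables. The function names are illustrative, and it assumes features have already been discretized, which continuous signals such as ChIP-seq scores would require beforehand.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant; define SU as 0
    h_xy = entropy(list(zip(x, y)))        # joint entropy H(X, Y)
    mutual_information = h_x + h_y - h_xy  # I(X; Y)
    return 2.0 * mutual_information / (h_x + h_y)


# A feature identical to the class label is maximally informative.
print(symmetrical_uncertainty([0, 0, 1, 1], [0, 0, 1, 1]))  # → 1.0
# A feature independent of the class label carries no information.
print(symmetrical_uncertainty([0, 0, 1, 1], [0, 1, 0, 1]))  # → 0.0
```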

Table 4.

Performance of cross system predictions using models trained with DNase-seq, RAD21 and CTCF.

Errors are only shown for models trained and tested with the same system due to test data partition.

Fig 5.

TOP2B predictive and experimental tracks in mouse liver, MEFs and activated B cells (training systems).

From top to bottom, genome browser view of TOP2B ChIP-seq, TOP2B virtual track, CTCF ChIP-seq, RAD21 ChIP-seq and DNase-seq are displayed for each system.

Fig 6.

Validation of TOP2B model in mouse thymocytes and human MCF7.

A. TOP2B predictive and ChIP-seq tracks of mouse thymocytes in a selected region of the mouse genome. Identified peaks are displayed above the signals. B. ROC curves and AUC values for the prediction of TOP2B peaks detected using Novus and Santa Cruz antibodies. C. Venn diagram showing the overlaps between predicted TOP2B peaks and the two sets of experimental peaks. D. Heatmap representations of TOP2B ChIP-seq read enrichment within ±8 kb of several sets of peaks: predicted and confirmed by HOMER calls of both antibodies (A1-A2-P), predicted and confirmed only by Novus antibody (A1-P) or Santa Cruz (A2-P), only predicted (P-only), only detected by Novus (A1-only) and only detected by Santa Cruz (A2-only). For illustration purposes, the same number of randomly selected peaks is represented in all the heatmaps. E. TOP2B predictive and ChIP-seq tracks of MCF7 in a selected region of the human genome. Identified peaks are displayed above the signals. F. ROC curves and AUC values for the prediction of TOP2B peaks in two replicates performed in MCF7. G. Venn diagram showing the overlaps between predicted TOP2B peaks and the two sets of experimental peaks. H. Heatmap representations of TOP2B ChIP-seq read enrichment within ±8 kb of several sets of peaks: predicted and confirmed by HOMER calls of both replicates (R1-R2-P), predicted and confirmed only by the first (R1-P) or the second replicate (R2-P), only predicted (P-only), only detected by the first replicate (R1-only) and only detected by the second replicate (R2-only). As in D, the same number of randomly selected peaks is represented in all the heatmaps.
