Transcription Factor Binding Profiles Reveal Cyclic Expression of Human Protein-coding Genes and Non-coding RNAs

doi:10.1371/journal.pcbi.1003132

Figure 1.

Schematic diagram of our analysis for predicting human cell cycle genes.

The predictive model integrates three types of data: microarray experiments, ChIP-seq experiments, and computational TF binding motif analysis.

Figure 2.

Regulatory scores of TFs on genes can discriminate cell cycle (CC) versus non-cell cycle (non-CC) genes.

(A) Distributions of regulatory scores for CMYC and E2F1 are significantly different between CC and non-CC genes (P = 2e-55 and P = 1e-50, respectively). (B) The average signals of CMYC and E2F1 show similar distributions between CC and non-CC genes (P = 0.03 and P = 0.05, respectively). (C) The t-scores for CC versus non-CC genes, calculated by comparing regulatory scores and average signals of TFs. SYDH, UTA and HAIB are the lab IDs of the datasets.
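As an illustration of the comparison in panel (C), the sketch below computes a t-score between two groups of per-gene scores. The caption does not state which t-test variant was used, so Welch's unequal-variance form is an assumption here, and the function name `welch_t` and the score values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(cc_scores, non_cc_scores):
    """Welch's t-statistic for two samples with possibly unequal variances.

    Assumption: the caption does not specify the t-test variant; Welch's
    form is used here only as an illustration.
    """
    n1, n2 = len(cc_scores), len(non_cc_scores)
    # Standard error of the difference in means (sample variances, n - 1).
    se = sqrt(variance(cc_scores) / n1 + variance(non_cc_scores) / n2)
    return (mean(cc_scores) - mean(non_cc_scores)) / se


# Hypothetical regulatory scores of one TF (illustrative values only).
cc = [2.0, 3.0, 4.0, 5.0]
non_cc = [1.0, 1.0, 2.0, 2.0]
t_score = welch_t(cc, non_cc)
```

A larger positive t-score means the feature takes higher values on CC genes than on non-CC genes, which is the sense in which one feature type can be said to discriminate better than another.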

Figure 3.

Statistical models for predicting cell cycle genes using the Random Forest method.

(A) The ROC curves for three classification models that use TF-only features, motif-only features, or a combination of both as predictors. (B) The relative importance (measured as MDG, Mean Decrease in Gini coefficient) of TF features in the combined model (TF+Motif). (C) The relative importance of motif features in the combined model. (D) The change in prediction accuracy (measured as AUC scores) when the most important predictors are removed from the full model one by one. Note that the cell cycle genes in the training data are from Hela cells, and thus we use only TF binding data from the same cell line in our model.
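The AUC scores used in panels (A) and (D) can be read as the probability that a randomly chosen cell cycle gene receives a higher predicted score than a randomly chosen non-cell cycle gene. The sketch below computes that quantity directly from labels and scores. It is not the authors' code: the function name `auc` and the example values are hypothetical, and any classifier's predicted probabilities could stand in for `scores`.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: 1 for a positive (e.g. cell cycle gene), 0 for a negative.
    scores: predicted probabilities or any ranking score.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities (illustrative only).
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
example_auc = auc(example_labels, example_scores)
```

The pairwise form is quadratic in the number of genes, so it suits small illustrations; a rank-based computation gives the same value more efficiently on large gene sets.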

Figure 4.

Tissue specificity of cell cycle predictive models.

(A) The ROC curves when TF binding data from different cell lines are used as predictors in the combined model. (B) Similar to (A), but results are from the TF-only model. (C) The regulatory scores of E2F4 on Hela cell cycle genes in HelaS3 versus K562 cells. Note that a small subset of genes shows strong E2F4 binding only in Hela cells. (D) E2F4 regulates overlapping but different target genes in HelaS3 versus K562. (C) and (D) are based on ENCODE ChIP-seq data.

Figure 5.

Prediction of phase-specific cell cycle genes.

(A) ROC curves of models that classify cell cycle genes at a specific phase against non-cell cycle genes. (B) The relative importance of different TF features in the 5 phase-specific models.

Figure 6.

Predicted cell cycle genes are more likely to interact with cell cycle partners in the protein-protein interaction network.

(A) The average number of partners; (B) the average number of cell cycle partners; (C) the average percentage of cell cycle partners. Note that all known cell cycle genes are excluded from the predicted cell cycle gene set. The P-values for the difference in numbers of partners or cell cycle partners between the two gene classes are calculated by Chi-squared test.

Figure 7.

Prediction of cell cycle related promoters.

The model is applied to ∼138,000 GENCODE annotated promoters to identify novel cell cycle genes of different types. (A) The number of cell cycle related genes identified by the model when different thresholds are used. The precision (1-FDR) is shown as the increasing grey line. (B) The percentage of different types of genes that are predicted to be cell cycle related at a threshold of 0.7 (Prob>0.7). FDR: false discovery rate.
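The trade-off in panel (A), where a stricter threshold yields fewer predictions at higher precision, can be sketched as follows. The caption does not say how the FDR was estimated, so this example assumes a labeled set of promoters is available. The function name `summarize_threshold` and all values are hypothetical.

```python
def summarize_threshold(probs, is_cell_cycle, threshold):
    """Count calls above a probability threshold and their precision.

    Assumption: precision (1 - FDR) is estimated on promoters with known
    labels; the caption does not describe the authors' FDR procedure.
    Returns (number of calls, precision), with precision None if no calls.
    """
    called = [y for p, y in zip(probs, is_cell_cycle) if p > threshold]
    if not called:
        return 0, None
    precision = sum(called) / len(called)  # true calls / all calls = 1 - FDR
    return len(called), precision


# Hypothetical predicted probabilities and known labels (illustrative only).
probs = [0.95, 0.80, 0.75, 0.60, 0.30]
labels = [1, 1, 0, 1, 0]
n_at_07, precision_at_07 = summarize_threshold(probs, labels, 0.7)
n_at_09, precision_at_09 = summarize_threshold(probs, labels, 0.9)
```

Raising the threshold from 0.7 to 0.9 in this toy input reduces the number of calls while raising the precision, which mirrors the pattern the caption describes.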
