Assessing Computational Methods for Transcription Factor Target Gene Identification Based on ChIP-seq Data

doi:10.1371/journal.pcbi.1003342

Figure 1.

Overview of the scoring procedure.

Target gene scoring consists of three steps: (1) peak-to-gene assignment, (2) peak scoring, and (3) integration of the individual peak scores. The black arrow indicates the transcription start site (TSS) of the gene being scored. The grey arrow indicates the TSS of another gene that is not currently being scored. Red peaks are assigned to the evaluated (black) gene; grey peaks are not assigned to this TSS by the given peak-to-gene assignment method. Blue and yellow peaks are peaks of other TFs that may be used to score the functionality of binding sites. See Table 1 for details on the alternative scoring options.
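The three steps can be sketched in Python as follows. This is a minimal illustration only: the fixed-window assignment, the use of peak height as the peak score, and summation as the integration rule are assumptions chosen for brevity, and the `Peak` class and all function names are hypothetical. The methods actually compared in the article are the ones characterized in Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    chrom: str
    position: int   # peak summit coordinate
    height: float   # signal at the summit (assumed peak-level score)


def assign_peaks(peaks, chrom, tss, window=50_000):
    """Step 1: peak-to-gene assignment (assumed: fixed window around the TSS)."""
    return [p for p in peaks
            if p.chrom == chrom and abs(p.position - tss) <= window]


def score_peak(peak):
    """Step 2: peak scoring (assumed: raw peak height)."""
    return peak.height


def score_gene(peaks, chrom, tss, window=50_000):
    """Step 3: integrate the individual peak scores (assumed: plain sum)."""
    return sum(score_peak(p) for p in assign_peaks(peaks, chrom, tss, window))
```

Swapping any one of the three functions for an alternative (for example, a nearest-gene assignment or a distance-weighted sum) changes the scoring method while leaving the overall procedure intact.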

Table 1.

Characterization of TF-target prediction methods.

Figure 2.

Evaluation of target scoring methods using genomic expression data.

Overlap of the top 500 targets with the top 500 genes differentially expressed in (A) HemoChIP and (B) ESChIP TF perturbation experiments. Overlap of the top 500 targets with the top 500 genes differentially expressed (C) between erythroid and myeloid cells or (D) between undifferentiated (ES) and differentiated (MEF) cells. Distributions of the normalized overlap values are shown.
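The raw overlap underlying each panel could be computed roughly as below. This is a sketch under assumptions: both inputs are taken to be dictionaries mapping gene identifiers to a score where larger means "more target-like" or "more differentially expressed", and the function names are hypothetical. The normalization mentioned in the caption is not specified here, so it is not implemented.

```python
def top_n(scores, n=500):
    """Return the n highest-scoring genes as a set."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def top_overlap(target_scores, expression_scores, n=500):
    """Count genes shared by the top-n targets and the top-n expressed genes."""
    return len(top_n(target_scores, n) & top_n(expression_scores, n))
```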

Figure 3.

Functional homogeneity of targets.

Number of significantly enriched GO terms specific for a given cellular system and specific for the opposite cellular system for HemoChIP (A) and ESChIP (B). The specific terms are hematopoiesis or embryonic development related GO terms for HemoChIP and ESChIP, respectively.
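The caption does not state which enrichment test was used. As an illustration only, a one-sided hypergeometric test is a common way to decide whether a GO term is over-represented among a set of target genes; the function name and argument names below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_genome, n_annotated, n_targets, n_hits):
    """P(X >= n_hits) when n_targets genes are drawn without replacement
    from n_genome genes, n_annotated of which carry the GO term."""
    upper = min(n_annotated, n_targets)
    total = comb(n_genome, n_targets)
    tail = sum(
        comb(n_annotated, k) * comb(n_genome - n_annotated, n_targets - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / total
```

Counting the terms whose p-value falls below a chosen threshold, after whatever multiple-testing correction is applied, would give the kind of per-experiment tally shown in the figure.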

Figure 4.

Consistency of target gene predictions.

Independent ChIP-seq experiments are available for some of the factors measured in the hematopoietic system. Consistency of target predictions is quantified as the overlap between the top 500 target genes. Results are summarized by intersecting the targets from pairs of ChIP-seq experiments that measure the same transcription factor (‘same TF’), that measure different factors in the same system (hematopoietic cells, ‘HemoChIP’), or that come from a different system (ES cells, ‘ESChIP’). Numbers on the right are rounded p-values measuring the significance of the difference between the overlaps (t-test). P-values for the comparison ‘same TF’ versus ‘HemoChIP’ are generally less significant than those for ‘HemoChIP’ versus ‘ESChIP’ because the number of comparisons (observations) is smaller.
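The comparison can be sketched as follows, assuming each experiment has already been reduced to a set of its top 500 target genes. The function names are hypothetical, and the choice of Welch's unequal-variance form of the t statistic is an assumption, since the caption says only "t-test". Converting the statistic to a p-value requires a t-distribution CDF, which is left out here.

```python
from itertools import combinations, product
from math import sqrt
from statistics import mean, variance


def pairwise_overlaps(top_sets):
    """Overlap sizes for every pair of experiments within one group."""
    return [len(a & b) for a, b in combinations(top_sets, 2)]


def cross_overlaps(group_a, group_b):
    """Overlap sizes for every pair drawn from two different groups."""
    return [len(a & b) for a, b in product(group_a, group_b)]


def welch_t(x, y):
    """Welch's t statistic for two samples of overlap sizes."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    return (mean(x) - mean(y)) / sqrt(vx + vy)
```

With only a handful of same-TF pairs, `x` contains few observations, so its contribution to the standard error is large; this is the effect the caption describes when it notes that the ‘same TF’ comparisons are less significant.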

Figure 5.

Gene density in target regions.

Gene-dense regions tend to contain more binding events (peaks) than gene-sparse regions of the genome. The figure shows the gene density (number of genes inside 1 Mb regions around the target gene's TSS) of the regions harboring the top 500 genes across the studies in the (A) HemoChIP and (B) ESChIP datasets.
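A gene density of this kind could be computed as in the sketch below. It assumes that the 1 Mb region is centered on the target TSS (500 kb on each side), that genes are counted by the position of their TSS, and that all positions lie on the same chromosome; the caption does not settle these details, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def gene_density(tss_positions, target_tss, region_size=1_000_000):
    """Count TSSs on one chromosome that fall inside a region of
    region_size bp centered on target_tss (boundaries inclusive)."""
    half = region_size // 2
    positions = sorted(tss_positions)
    left = bisect_left(positions, target_tss - half)
    right = bisect_right(positions, target_tss + half)
    return right - left
```

If the target gene's own TSS is in `tss_positions`, it is included in the count.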
