Skip to main content
Advertisement

< Back to Article

Ab Initio Prediction of Transcription Factor Targets Using Structural Knowledge

Figure 5

Validation of DNA-Recognition Preferences

(A) The predicted binding site model of human Sp1 protein is compared to its known site (matrix V$SP1_Q6 from TRANSFAC [2], based on 108 aligned binding sites). To prevent bias by known Sp1 sites in our training data, the set of DNA-recognition preferences was estimated from the TRANSFAC data after removing all Sp1 sites.

(B) Scanning the 300-bp-long promoter of human dihydrofolate reductase (DHFR) by the predicted Sp1 binding model. The p-value of each potential binding site is shown (y-axis). Four positions achieved a significant p-value (higher than the horizontal red line), out of which three are known Sp1 binding sites [41] (arrows).

(C) A summary of in silico binding experiments for 21 pairs of Zinc Finger transcription factors and their target promoters. Shown is the tradeoff between false positive rate (x-axis) and true positive rate (y-axis) as the significance threshold for putative binding sites is changed. For every threshold point, our set of recognition preferences (EM) achieves better accuracy than the preferences of Mandel-Gutfreund et al. [5] (M-G) and Benos et al. [15] (SAMIE). Interestingly, when the DNA-recognition preferences were estimated from training data expanded to include TRANSFAC's artificial sequences, inferior results were obtained (dotted red line).

(D) Cumulative distribution of Sp1 scores among the sequences of targets/non-targets of unbiased chromatin immunoprecipitation scans of human Chromosomes 21 and 22 [16]. The predicted Sp1 motif appears in 45% of the experimentally bound sequences but in only 5% of the control sequences.

Figure 5

doi: https://doi.org/10.1371/journal.pcbi.0010001.g005