A Novel Bayesian DNA Motif Comparison Method for Clustering and Retrieval
Figure 1
Overview of the challenges in DNA motif analysis.
(A) Identifying DNA binding motifs: Applying motif discovery algorithms to a group of related DNA sequences identifies putative transcription factor binding sites. These algorithms output a set of DNA motifs, which are frequently redundant. To infer the correct transcription regulation map from the discovered motif set, it is crucial to reduce this redundancy and to relate the discovered motifs to known ones. (B) Reducing redundancy by clustering and merging motifs: A redundant set of DNA motifs can be reduced by clustering the motifs into groups of related motifs and merging the motifs within each cluster. In this example, a redundant set of 16 DNA motifs (a partial output of several motif search algorithms) is clustered and merged into a final set of three DNA motifs. (C) Relating motifs to known factors: The transcription factors that bind the newly discovered DNA motifs can be inferred from similarity to previously defined motifs. In this example, comparison of a newly discovered motif to four known motifs reveals high similarity to the Gcn4 binding motif. From this comparison, the transcription factor that binds the motif is identified with high probability.
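The merging step in panel B and the retrieval step in panel C can be illustrated with a minimal sketch. It is not the Bayesian similarity score this paper introduces: as a stand-in it uses the mean per-column Jensen-Shannon divergence, and it assumes motifs of equal length that are already aligned. The helper names, the `peak` parameter, and the two `hypothetical_*` motifs are invented for illustration; only the Gcn4 consensus TGACTCA reflects a known binding site.

```python
import math

BASES = "ACGT"

def motif_from_consensus(consensus, peak=0.85):
    """Toy motif: each column puts `peak` mass on the consensus base (assumption)."""
    rest = (1.0 - peak) / 3.0
    return [[peak if b == c else rest for b in BASES] for c in consensus]

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions over ACGT."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def motif_distance(m1, m2):
    """Mean per-column divergence; assumes equal-length, pre-aligned motifs."""
    if len(m1) != len(m2):
        raise ValueError("this sketch only compares motifs of equal length")
    return sum(js_divergence(a, b) for a, b in zip(m1, m2)) / len(m1)

def merge(motifs):
    """Merge a cluster of motifs by averaging them column by column (panel B)."""
    n = len(motifs)
    return [[sum(m[i][k] for m in motifs) / n for k in range(len(BASES))]
            for i in range(len(motifs[0]))]

def best_match(query, known):
    """Return (name, distance) of the known motif closest to the query (panel C)."""
    return min(((name, motif_distance(query, m)) for name, m in known.items()),
               key=lambda pair: pair[1])

# Toy library of known motifs; the hypothetical_* entries are made up.
known = {
    "Gcn4": motif_from_consensus("TGACTCA"),
    "hypothetical_A": motif_from_consensus("CACGTGA"),
    "hypothetical_B": motif_from_consensus("AAGGGGA"),
}

# Two redundant discovered motifs are merged, then compared to the library.
query = merge([motif_from_consensus("TGACTCA", peak=0.7),
               motif_from_consensus("TGAGTCA", peak=0.7)])
name, dist = best_match(query, known)
print(name, round(dist, 3))
```

A realistic comparison would also need to handle motifs of different lengths, alignment offsets, and reverse complements, none of which this sketch attempts.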