A Discriminative Approach for Unsupervised Clustering of DNA Sequence Motifs

Figure 1

Intra-class alignments cover a higher fraction of motif information than inter-class alignments.

(A) Example alignments illustrate the information coverage (IC) criterion. Depicted are m2match outputs of an intra-class alignment for two TFs of the BHLH class, E47 and MyoD (top), and an inter-class alignment for the E47 motif and the PFM of the MADS transcription factor RSRF (bottom). (B) Histograms of IC values observed in intra-class and inter-class alignments. Alignments were selected using the Euclidean distance (ED) score, and information coverage was calculated using the sqr formula (Material and Methods). In total, there were 436,080 inter-class and 64,420 intra-class alignments. Intra-class alignments tended to have higher IC than inter-class alignments and, specifically, exhibited a pronounced peak at high IC values that is absent from the inter-class distribution.
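To make the idea of "fraction of motif information covered" concrete, the following minimal sketch computes such a fraction for a toy motif. It is an illustration only: the sqr formula used for the figure is defined in Material and Methods and is not reproduced here. As a stand-in, the sketch assumes that coverage is the share of a motif's total per-column information content (2 bits minus Shannon entropy, for DNA) that falls inside the aligned columns. The function names `column_information` and `information_coverage` and the example PFM are hypothetical.

```python
import math


def column_information(col):
    """Information content (bits) of one DNA motif column: 2 - Shannon entropy."""
    total = sum(col)
    entropy = 0.0
    for count in col:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return 2.0 - entropy


def information_coverage(pfm, start, end):
    """Fraction of the motif's total information lying in columns [start, end).

    Assumption: a plain ratio of summed column information, used here as a
    stand-in for the paper's sqr formula, which is not shown in this caption.
    """
    per_column = [column_information(col) for col in pfm]
    total = sum(per_column)
    if total == 0:
        return 0.0
    return sum(per_column[start:end]) / total


# Hypothetical 4-position PFM; each entry holds the A, C, G, T counts of one position.
pfm = [
    [10, 0, 0, 0],  # fully conserved position: 2 bits
    [0, 10, 0, 0],  # fully conserved position: 2 bits
    [5, 5, 0, 0],   # two equally likely bases: 1 bit
    [5, 5, 5, 5],   # uniform position: 0 bits
]

# An alignment overlapping only the first two positions covers 4 of 5 bits.
ic_core = information_coverage(pfm, 0, 2)
print(ic_core)
```

Under this assumed definition, an alignment that overlaps only the conserved core of a motif still scores high, whereas one that overlaps mostly low-information flanking positions scores low, which matches the contrast the two example alignments in panel (A) are meant to show.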

