Figure 1.
Illustration of Familial Binding Profile Construction
In this example, the binding motifs for four bZIP–CREB transcription factors are aligned in a multiple-motif alignment. The generalized familial binding profiles correspond to the weighted average of the individual profiles.
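The averaging step can be sketched in Python. This is a minimal sketch under several assumptions. The multiple-motif alignment has already fixed a starting column (`offsets`) for each motif. Each motif carries a per-motif weight; the weighting scheme is hypothetical, and the number of binding sites is one possible choice. Any column that no motif covers is filled with a uniform background. The function name `familial_profile` is illustrative only.

```python
import numpy as np

def familial_profile(motifs, offsets, weights):
    """Weighted average of aligned PSSMs (sketch; weighting scheme assumed).

    motifs  : list of (L_k x 4) base-frequency arrays, columns A, C, G, T
    offsets : non-negative start column of each motif in the alignment
    weights : hypothetical per-motif weights (e.g. number of sites)
    """
    length = max(o + len(m) for m, o in zip(motifs, offsets))
    acc = np.zeros((length, 4))   # weighted sum of frequencies per column
    wsum = np.zeros(length)       # total weight contributing to each column
    for m, o, w in zip(motifs, offsets, weights):
        m = np.asarray(m, float)
        acc[o:o + len(m)] += w * m
        wsum[o:o + len(m)] += w
    covered = wsum > 0
    # Assumption: columns covered by no motif fall back to a uniform background.
    profile = np.full((length, 4), 0.25)
    profile[covered] = acc[covered] / wsum[covered][:, None]
    return profile
```

In this sketch, a column is averaged only over the motifs that actually cover it. A motif that overhangs the alignment therefore does not dilute the columns it does not reach.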
Table 1.
The Six Similarity Metrics Used in This Study for PSSM Column Similarity and Motif Alignments
Figure 2.
Distribution of the Observed Scores of Column-to-Column Comparisons for the Five Main Similarity Metrics
Columns were taken from the TRANSFAC database [17]. The ALLR_LL distribution is identical to ALLR for every point ≥ −2 (unpublished data). Comparison of the JASPAR motif columns yielded similar results.
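The ALLR column score can be sketched in Python, following the usual average log-likelihood ratio form for comparing count columns X and Y. In that form, each column's counts are scored against the other column's frequencies relative to the background, and the sum is normalized by the total count. Several details are assumptions. Frequencies use background-weighted pseudocounts (`pseudocount=1.0`) and the natural logarithm. ALLR_LL is taken to be ALLR floored at a lower limit of −2. The function names are illustrative.

```python
import numpy as np

def allr(counts_x, counts_y, background=(0.25, 0.25, 0.25, 0.25), pseudocount=1.0):
    """Average log-likelihood ratio between two PSSM count columns (sketch)."""
    nx = np.asarray(counts_x, float)
    ny = np.asarray(counts_y, float)
    q = np.asarray(background, float)
    # Assumed smoothing: background-weighted pseudocounts keep every log finite.
    px = (nx + pseudocount * q) / (nx.sum() + pseudocount)
    py = (ny + pseudocount * q) / (ny.sum() + pseudocount)
    # Counts of each column scored under the other column's frequencies.
    num = np.sum(ny * np.log(px / q)) + np.sum(nx * np.log(py / q))
    return float(num / (nx.sum() + ny.sum()))

def allr_ll(counts_x, counts_y, lower=-2.0, **kwargs):
    """ALLR with a lower limit (assumed here to be -2)."""
    return max(lower, allr(counts_x, counts_y, **kwargs))
```

Flooring the score limits how strongly a single strongly dissimilar column pair can penalize an otherwise good motif alignment.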
Figure 3.
Performance of the Five Main Similarity Metrics in Discriminating between Columns Sampled from Dirichlet Distributions around Information Content I and a Background Distribution
The plot shows the positive predictive rate at a false discovery rate (FDR) of 1% as a function of information content.
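The two ingredients of this experiment can be sketched in Python: the information content of a column relative to a background distribution, and random columns drawn from a Dirichlet distribution. The study's exact sampling scheme is not described in this caption. The parameterization below (`concentration` times a centre column) is therefore a hypothetical choice, and the function names are illustrative.

```python
import numpy as np

def information_content(col, background=(0.25, 0.25, 0.25, 0.25)):
    """Relative entropy (bits) of a PSSM column against a background."""
    p = np.asarray(col, float)
    q = np.asarray(background, float)
    nz = p > 0  # treat 0 * log(0) as 0
    return float(np.sum(p[nz] * np.log2(p[nz] / q[nz])))

def sample_columns(center, concentration, n, rng=None):
    """Draw n columns from a Dirichlet distribution centred on `center`.

    Hypothetical parameterization: alpha = concentration * center, so larger
    concentrations keep samples closer to `center`. Every entry of `center`
    must be > 0 for the Dirichlet to be defined.
    """
    rng = np.random.default_rng(rng)
    alpha = concentration * np.asarray(center, float)
    return rng.dirichlet(alpha, size=n)
```

Binning sampled columns by `information_content` gives sets of columns at a given information content I. These can then be scored against background columns with each similarity metric.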
Table 2.
The Top 15 and Bottom 15 Performing Alignment Strategies
Table 3.
Performance of TF Structural Family Classification Based on DNA-Binding Preferences in the Six Largest Motif Families in TRANSFAC
Figure 4.
Average Homogeneity of Families Represented at Each Tree Node as a Function of Tree Growth
Six scoring metrics and two different tree-building methods are tested with ungapped Smith–Waterman alignments.
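An ungapped Smith–Waterman alignment of two motifs can be sketched in Python using the Pearson correlation coefficient (PCC) as the column score. The method scans every diagonal (relative offset) and keeps the best-scoring contiguous run of columns, resetting at zero as Smith–Waterman does. This is a minimal sketch. It compares only the forward strand; trying the reverse complement as well would be a straightforward extension. The function names are illustrative.

```python
import numpy as np

def pcc(col_a, col_b):
    """Pearson correlation between two PSSM columns (A, C, G, T)."""
    a = np.asarray(col_a, float) - np.mean(col_a)
    b = np.asarray(col_b, float) - np.mean(col_b)
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def ungapped_local_align(m1, m2, score=pcc):
    """Best ungapped local alignment between motifs m1 and m2 (rows = positions).

    Returns (best_score, (start_i, end_i, offset)) with offset = i - j,
    or (0.0, None) if no positive-scoring run exists.
    """
    best, best_span = 0.0, None
    n1, n2 = len(m1), len(m2)
    for offset in range(-(n2 - 1), n1):          # one pass per diagonal
        run, start = 0.0, None
        for i in range(max(0, offset), min(n1, n2 + offset)):
            s = score(m1[i], m2[i - offset])
            if run <= 0:                         # Smith-Waterman reset at zero
                run, start = s, i
            else:
                run += s
            if run > best:
                best, best_span = run, (start, i, offset)
    return best, best_span
```

Because PCC scores lie in [−1, 1], poorly matching flanks drive the running score down and are trimmed away. This is what makes the alignment local.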
Figure 5.
The Tree Resulting from a UPGMA Tree Construction of Ten JASPAR Families (71 Motifs Total) Using the PCC Scoring Metric and Smith–Waterman (Ungapped) Alignment Method
The red line marks the level at which the CHlog metric places the optimal number of clusters in the tree.
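UPGMA is average-linkage hierarchical clustering, and it can be sketched in plain Python and NumPy. The sketch works on a symmetric matrix of motif-to-motif distances. How alignment similarity scores are turned into distances is an assumption here; one option is subtracting each score from the maximum score. The helper `cut` recovers a partition into g clusters by replaying all but the last g − 1 merges. Both function names are hypothetical.

```python
import numpy as np

def upgma(dist):
    """UPGMA clustering; returns merges as (id_a, id_b, height, members)."""
    d = np.asarray(dist, float)
    n = len(d)
    clusters = {i: frozenset([i]) for i in range(n)}
    dd = {(i, j): d[i, j] for i in range(n) for j in range(i + 1, n)}
    next_id, merges = n, []
    while len(clusters) > 1:
        (a, b), h = min(dd.items(), key=lambda kv: kv[1])  # closest pair
        na, nb = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) | clusters.pop(b)
        # Size-weighted average distance from the new cluster to the others.
        new = {c: (na * dd[(min(a, c), max(a, c))] + nb * dd[(min(b, c), max(b, c))])
                  / (na + nb) for c in clusters}
        dd = {k: v for k, v in dd.items() if a not in k and b not in k}
        for c, v in new.items():
            dd[(c, next_id)] = v  # c < next_id always holds
        clusters[next_id] = merged
        merges.append((a, b, h, merged))
        next_id += 1
    return merges

def cut(merges, n, g):
    """Partition n items into g clusters using the first n - g merges."""
    clusters = {i: frozenset([i]) for i in range(n)}
    for k, (a, b, _, members) in enumerate(merges[: n - g]):
        del clusters[a], clusters[b]
        clusters[n + k] = members
    return sorted(sorted(c) for c in clusters.values())
```

Cluster identifiers follow the usual convention that the k-th merge creates cluster n + k. This lets `cut` replay the merges without storing the tree separately.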
Table 4.
The Four Statistics Tested for Automatic Clustering of the 71 JASPAR Motifs
Figure 6.
The Behaviour of the Calinski and Harabasz–Based Log-Metric (CHlog) for the Tree in Figure 5 as the Number of Clusters (g) Is Varied
CHlog reaches its global minimum at g = 17.
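The exact CHlog transformation is not given in this caption, so it is not reproduced here. The sketch below computes only the standard Calinski and Harabasz index, which is maximized rather than minimized. Its inputs are a distance matrix and a set of cluster labels, such as a cut of the tree at g clusters. It relies on an identity that holds for Euclidean distances: a cluster's within-cluster sum of squares equals the sum of its squared pairwise distances divided by 2n_k. Applying the identity to motif-alignment distances is therefore an approximation.

```python
import numpy as np

def calinski_harabasz(dist, labels):
    """Standard Calinski-Harabasz index from a distance matrix (sketch).

    CH = (B / (g - 1)) / (W / (n - g)), with the within (W) and total sums of
    squares derived from squared pairwise distances. Undefined when W == 0.
    """
    d2 = np.asarray(dist, float) ** 2
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    g = len(groups)
    if not 1 < g < n:
        raise ValueError("need 1 < g < n clusters")
    total = d2.sum() / (2 * n)                   # total sum of squares
    within = 0.0
    for k in groups:
        idx = np.flatnonzero(labels == k)
        within += d2[np.ix_(idx, idx)].sum() / (2 * len(idx))
    between = total - within
    return (between / (g - 1)) / (within / (n - g))
```

Evaluating such an index for each candidate g along the tree, and picking the extremum, is the general pattern behind choosing the red cut-off level.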
Figure 7.
The Tree Resulting from a UPGMA Tree Construction of 12 JASPAR Families (79 Motifs Total) Using the PCC Scoring Metric and Smith–Waterman (Ungapped) Alignment Method
This tree includes the two zinc-finger families (GATA and DOF).
Figure 8.
Optimal Number of Clusters of the 71 JASPAR Motifs, According to Our Method
PCC with Smith–Waterman ungapped alignment was used as a scoring function. Examples of protein–DNA complexes are provided for comparison.
Figure 9.
Similarity between the HMG and Forkhead Motifs
These families are grouped together in the HMG/Forkhead Group I cluster (Figure 8).