Figure 1.

Data set used for validation of domain-level TF-DNA specificities.

The top portion contains gene names, UniPROBE identifiers, and truncated position weight matrices for domain-identical transcription factor pairs (test set). The bottom portion contains completely-identical transcription factor pairs with replicate PBM data (control set). PID is the percent identity between the insert sequences of the transcription factor pairs used in the PBM experiments. Sequence logos were created using WebLogo [21].

Figure 2.

The number of position weight matrices for select organisms before and after homology mapping.

The number of matrices that are initially associated with each organism is compared to the number following mapping of transcription factors with completely-identical sequences, as well as the increase following identical DNA binding domain-level mapping for the (A) JASPAR, (B) TRANSFAC, and (C) JASPAR & TRANSFAC databases. The JASPAR and TRANSFAC databases initially contained PWMs from 124 different species, compared to 1578 species following domain-level homology mapping. In particular, significantly increased PWM coverage is possible through domain-level mappings for the open-access JASPAR database.

Figure 3.

The number of unique transcription factors with position weight matrices (PWMs) resulting from domain-level homology mappings that did not previously have any associated PWMs.

The number of unique factors resulting from mapping between completely-identical sequences is compared to the number of factors resulting from identical DNA binding domain-level mapping for the (A) JASPAR, (B) TRANSFAC, and (C) JASPAR & TRANSFAC databases. The number in parentheses above each bar is the percentage increase over the initial annotated total number of unique transcription factors with PWMs. Significantly increased species-associated transcription factor coverage is enabled by domain-level mappings rather than the typical restriction to complete sequence matches.

Figure 4.

Spearman correlation coefficients for position weight matrix (PWM) scanning of transcription factor pairs and their accompanying experimental protein binding microarray (PBM) fluorescence intensities.

Transcription factor pair groupings, as in Figure 1, were cross scans of completely-identical pairs (CCI), cross scans of domain-identical pairs (CDI), self scans of completely-identical pairs (SCI), and self scans of domain-identical pairs (SDI). Each point represents a PWM:PBM pairing as described in the Methods. The transcription factor Elf3 (UniPROBE identifiers UP00090 and UP00407) was an outlier with the lowest correlation coefficients. The lower correlation coefficients for these identifiers are likely due to the transcription factor Elf3 having two different DNA binding domains.
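The comparison in this figure pairs a PWM-based score with a measured PBM intensity for each probe and then rank-correlates the two lists. The following is a minimal sketch of that idea, not the scoring procedure from the Methods: the best-window sum used as the PWM score, the dictionary-based PWM layout, and the function names are all assumptions made for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def best_window_score(pwm, seq):
    """Highest sum of per-position PWM weights over all windows of seq.

    pwm is a list of dicts mapping a base to its weight (assumed layout).
    """
    width = len(pwm)
    return max(
        sum(pwm[k][seq[start + k]] for k in range(width))
        for start in range(len(seq) - width + 1)
    )
```

In this sketch, a "self" scan would score a factor's probes with its own PWM and a "cross" scan with its partner's PWM, then pass the scores and the measured intensities to `spearman`.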

Figure 5.

Self and cross Spearman correlation coefficients between position weight matrix-based scores and experimental PBM fluorescence intensities.

The blue points are the completely-identical and domain-identical transcription factor pairs of Figure 1. The alignment of blue points along the gray diagonal line demonstrates the comparable performance of PWMs derived from completely-identical and domain-identical transcription factor pairs, whereas the magnitude of the correlation coefficient indicates how well the PWM captures the DNA binding properties of the transcription factor. As a point of comparison, the correlation coefficients for all other pairwise sets of transcription factors were calculated. The green points below the gray diagonal indicate PWMs from other transcription factors that failed to capture the DNA binding properties in the PBM data. Green points near the diagonal resulted from other transcription factors within the same domain family (e.g., homeodomain) that have similar PWMs and, therefore, similar DNA binding properties. UniPROBE identifiers UP00017 and UP00389 were significantly outperformed by other PWMs in the data set (see text for details).

Figure 6.

The distribution of Spearman correlation coefficients for the domain-identical PWM and all other PWMs from the same homeodomain family for each TF from the test set in Figure 1.

In each case, the correlation coefficient for the domain-identical PWM either clearly outperforms or is in the cluster of top performing PWMs, demonstrating that domain-identical PWMs capture the DNA sequence affinity and specificity of transcription factors better than considering the TF family alone.

Figure 7.

Average precision curves, calculated as the number of top n position weight matrix-based scores and experimental PBM fluorescence intensities in common.

Precision curves were generated for cross scoring of completely-identical pairs (CCI), cross scoring of domain-identical pairs (CDI), self scoring of completely-identical pairs (SCI), and self scoring of domain-identical pairs (SDI) listed in Figure 1. The average precision curves nearly coincide for CCI and SCI, as well as for CDI and SDI, owing to the ability of self and cross PWM scans to equivalently capture the DNA binding properties in the PBM data. As with the Spearman correlation coefficients in Figure 4, the average precision for the domain-identical data set outperformed that of the completely-identical transcription factor pair scoring, reflecting the more challenging nature of the completely-identical data set (see text for details).
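The top-n overlap described in the caption can be sketched as follows. This is an illustrative reading, not the authors' implementation: dividing the overlap count by n to obtain a precision value, the handling of ties by list order, and the function names are assumptions.

```python
def top_n_indices(values, n):
    """Indices of the n largest values (ties keep their original order)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(order[:n])


def precision_at_n(scores, intensities, n):
    """Fraction of the top-n probes by PWM score that are also in the
    top-n probes by PBM intensity (normalizing by n is an assumption)."""
    shared = top_n_indices(scores, n) & top_n_indices(intensities, n)
    return len(shared) / n


def precision_curve(scores, intensities, max_n):
    """Precision at every cutoff n = 1 .. max_n."""
    return [precision_at_n(scores, intensities, n) for n in range(1, max_n + 1)]
```

Averaging such curves over the PWM:PBM pairings in each group (CCI, CDI, SCI, SDI) would give one curve per group, which is one plausible way to read "average precision" here.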
