Table 1.

Summary of predicted/validated non-globular segments and supporting evidence for the 18 SMART version 6 domains.

Figure 1.

Cumulative plots of SMART version 6 and Pfam release 23 problematic domains.

In SMART version 6, the total number of domains with predicted SP/TM segments peaks at 18, which makes up 2.2% of the 809 SMART domains (top panel). Red triangles mark the years 1998, 2002 and 2009, when the total number of domain models was 86, 600 and 809 respectively. In Pfam, the total number of problematic domains peaks at 1214, which makes up 11.8% of the 10340 Pfam domains (bottom panel). Likewise, red triangles mark the years 1999, 2002 and 2008, with 1465, 3360 and 10340 Pfam entries respectively.

Figure 2.

Histograms of average log probability per predicted transmembrane helix and per predicted signal peptide in Pfam release 23.

The top part shows the histogram of average log probability per predicted transmembrane helix; the bottom part shows the same per predicted signal peptide. The log probability shown on the x-axis is calculated with equations 5 and 6. At the TMcutoff of ≥−12 (false-positive rate 4.67%) and SPcutoff of ≥−1 (false-positive rate 4.02%), the numbers of predicted TM helices and signal peptides are 3849 and 164 respectively.

Figure 3.

Average log probability plot of transmembrane helix and signal peptide predictions per domain.

The top part shows the average log probability per predicted transmembrane helix calculated per domain; the bottom part shows the same per predicted signal peptide. The y-axis shows the log probability in accordance with equation 6, applied over all predicted segments for a given domain; the x-axis shows their cumulative length. At the TMcutoff of ≥−12 and SPcutoff of ≥−1 (horizontal dashed lines), the numbers of problematic TM and SP domains are 1079 and 164 respectively. The total number of problematic domains is 1214 (1050 TM only, 135 SP only and 29 with concurrent TM and SP).
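
The counts quoted in this caption overlap, because 29 domains carry both a TM and an SP prediction. As a minimal sketch (the variable names are illustrative and not taken from the original work), the tallies can be reconciled as follows:

```python
# Counts quoted in the caption.
tm_only, sp_only, tm_and_sp = 1050, 135, 29

tm_domains = tm_only + tm_and_sp         # domains with a predicted TM helix
sp_domains = sp_only + tm_and_sp         # domains with a predicted signal peptide
total = tm_only + sp_only + tm_and_sp    # each problematic domain counted once

print(tm_domains, sp_domains, total)     # 1079 164 1214
```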

Figure 4.

Examples of domain architectures of false-positive HMM hits caused by TM helices in the fragment-mode search.

We show illustrative examples for six Pfam release 23 models: Herpes_glycop_D (PF01537.9), CDC50 (PF03381.7), Cation_ATPase_N (PF00690.18), GSPII_F (PF00482.11), PAP2 (PF01569.13) and HCV_NS4b (PF01001.11). The black boxes denote the problematic domain annotations in the respective sequences. Additional material such as hmmpfam outputs and alignments is available at the associated BII WWW site for this work. Domain architecture illustrations were created with DOG 1.5 [98].

Figure 5.

Examples of domain architectures of false-positive HMM hits caused by TM helices/signal peptides in the global-mode search.

Findings are shown for nine Pfam release 23 models: Pig-P (PF08510.4), PAP2 (PF01569.13), EMP24_GP25L (PF01105.15), PTPLA (PF04387.6), Lamp (PF01299.9), MttA_Hcf106 (PF02416.8), HAMP (PF00672.17), Nodulin_late (PF07127.3) and GRP (PF07172.3). The black boxes denote the problematic domain annotations in the respective sequences. Additional material such as hmmpfam outputs and alignments is available at the associated BII WWW site for this work. Domain architecture illustrations were created with DOG 1.5 [98].

Table 2.

Percentage of unjustified annotations for validated problematic domains in Protein Information Resource (PIR) iProClass v3.74 (global-mode search).

Figure 6.

Relationship between the gathering score and the corresponding E-value threshold for Pfam domain library release 23.

The y-axis shows the gathering score threshold (GA) for the global-mode search; the x-axis shows the corresponding E-value threshold (in decimal log scale), calculated for this score with the domain-specific extreme-value function using the parameters provided in the corresponding HMM file (for an NR database size of 7365651 sequences). The upper plot shows the distribution for the 9126 domains without a detected SP/TM region; the middle plot shows the same for the 1214 domains with SP/TM problems. There is no clear correlation between gathering score and E-value threshold. If E-values close to 0.1 are considered significant, all dots should lie close to the “−1” line (horizontal dashed lines) in this graph and, indeed, there is some agglomeration of data points in that area; yet there are numerous outliers. Note that the E-values are computed using the equation $E = N\,[1 - \exp(-\exp(-\lambda(x - \mu)))]$, where $x$ is the score (here, the GA), $N$ is the database size, and $\lambda$ and $\mu$ are the extreme value distribution (EVD) parameters of the domain model. The bottom plot depicts the histogram for the 10340 domains in Pfam release 23. The median of all log E-values corresponding to the domain-specific GAs is −1.16, which translates to an E-value of 0.07.
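
The conversion from a gathering score to an E-value threshold can be sketched in a few lines of Python. This is an illustration only: it assumes the Gumbel-type EVD tail commonly used for HMMER2-style models, P(S ≥ x) = 1 − exp(−exp(−λ(x − μ))), and the function name `evd_evalue` and the values of `MU`, `LAMBDA` and `GA` are hypothetical rather than taken from any real Pfam model. Only the database size and the median log E-value come from the caption.

```python
import math

def evd_evalue(score: float, mu: float, lam: float, db_size: int) -> float:
    """E-value of a score under a Gumbel-type EVD (assumed functional form)."""
    # P(S >= score) = 1 - exp(-exp(-lam * (score - mu)))
    # -expm1(-t) evaluates 1 - exp(-t) without cancellation for small t.
    p_value = -math.expm1(-math.exp(-lam * (score - mu)))
    return db_size * p_value

# Hypothetical EVD parameters and gathering score, for illustration only.
MU, LAMBDA, GA = -120.0, 0.15, -15.0
NR_SIZE = 7365651  # NR database size quoted in the caption

log10_e = math.log10(evd_evalue(GA, MU, LAMBDA, NR_SIZE))
print(f"log10(E) at the GA threshold: {log10_e:.2f}")

# Converting the reported median log E-value back to an E-value.
median_e = 10 ** -1.16
print(f"median E-value: {median_e:.2f}")  # 0.07
```

Because the E-value depends on the model-specific μ and λ as well as on the score, two models with the same GA can map to very different E-value thresholds, which is consistent with the scatter described in the caption.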

Figure 7.

Histograms of average log probability per predicted transmembrane helix for SCOP v1.75 α-proteins class and membrane protein class.

The top histogram (SCOP v1.75 α-proteins class) and the bottom histogram (SCOP v1.75 membrane protein class) show the average log probability per predicted transmembrane helix and represent the false-positive and true-positive distributions for TM predictions respectively. The total numbers of predicted structural and membrane helices are 2293 and 5592 respectively.

Table 3.

FP and FN rates of TM predictions based on different TM cutoffs.

Figure 8.

Histograms of average log probability per predicted signal peptide for SCOP v1.75 α- and membrane protein class and SMART version 6.

The top histogram (SCOP v1.75 α- and membrane protein class) and the bottom histogram (SMART version 6) show the average log probability per predicted signal peptide and represent the false-positive and true-positive distributions for SP predictions respectively. The total numbers of predicted signal peptides for SCOP α- and membrane proteins are 193 and 379 respectively, while the total number for SMART is 45. All except SM00817 Amelin (no available structure) were validated against their respective PDB entries.

Table 4.

FP and FN rates of SP predictions based on different SP cutoffs.
