Figure 1.
The structural hierarchy of WD40.
(A) Two-dimensional schematic of a WD40 domain. WD40 blades and WD40 repeats are defined differently: a WD40 blade, highlighted in blue, is Sa-Sb-Sc-Sd, whereas a WD40 repeat, highlighted in red, is Sd-Sa-Sb-Sc. (B) The classical tertiary structure of a WD40 domain. (C) Topology and structural features of a WD40 blade. The top, bottom and side surfaces and the inner core are drawn in different colors. The residues and corresponding dashed lines highlighted in red are involved in the DHSW tetrad hydrogen-bonded network. The residues in blue are involved in β-bulges. Normally, two β-bulges (WDb–a and WDc–d) exist in one WD40 blade.
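The difference between blades and repeats in panel (A) can be illustrated with a minimal sketch. It assumes that each repeat contributes its strands in the sequence order Sd, Sa, Sb, Sc, and that the Sd strand of the first repeat closes the ring by pairing with Sa-Sb-Sc of the last repeat; the ring-closure rule and the function name `repeats_and_blades` are assumptions made for illustration, not part of WDSP.

```python
def repeats_and_blades(n_repeats):
    """Group strand labels into repeats (Sd-Sa-Sb-Sc) and blades (Sa-Sb-Sc-Sd).

    Strand labels carry the index of the repeat they come from,
    e.g. "Sa3" is strand a of the third repeat in sequence order.
    """
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")

    # A repeat is a contiguous sequence unit: Sd, Sa, Sb, Sc.
    repeats = [[f"S{s}{i}" for s in "dabc"] for i in range(1, n_repeats + 1)]

    # A blade is a structural unit: Sa, Sb, Sc of one repeat plus
    # Sd of the following repeat. Assumption: the last blade borrows
    # Sd from the first repeat, which closes the propeller ring.
    blades = []
    for i in range(n_repeats):
        following = (i + 1) % n_repeats
        blades.append(repeats[i][1:] + [repeats[following][0]])
    return repeats, blades


repeats, blades = repeats_and_blades(7)
print(repeats[0])  # strands of the first repeat
print(blades[0])   # strands of the first blade
print(blades[-1])  # last blade, closed by Sd of the first repeat
```

Under these assumptions, every blade except the last spans two consecutive repeats, which is why the two definitions cannot be used interchangeably when assigning residues.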
Figure 2.
The scoring functions, the searching/optimization engines and the evaluation criteria are developed independently. The scoring functions and criteria are used in the later optimization procedures (dashed arrows). The criteria values are optimized based on the results and the performance of the engines (blue solid arrows).
Figure 3.
Secondary structure assignment of WD40 repeats based on the structural features.
The residues in β-bulges and the DHSW tetrad are shown in blue and red, respectively. These residues are aligned with higher priority. The numbered blocks are assigned as residues in β-strands.
Figure 4.
Sequence logo of the WD40 repeat, in which the heights of the letters show the conservation of the residues at each position.
The total height of the letters represents the information entropy of the position. The secondary structure is depicted below. The positions marked by red asterisks are potential hotspot positions on the top face that are involved in protein-protein interactions. The blue asterisks indicate relatively conserved positions in the loops, which are included in Saa in equation (3). The detailed residue frequencies in the sequence logo are listed in Table S4.
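How letter heights in a sequence logo follow from residue frequencies can be sketched as below. The sketch assumes the common logo convention, in which the total height of a column is log2(20) minus the Shannon entropy of that column and each letter takes a share proportional to its frequency; no small-sample correction is applied. The function names are hypothetical, and the frequencies in the usage lines are made up rather than taken from Table S4.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_information(freqs):
    """Return the information content (bits) of one alignment column.

    freqs maps residue letters to counts or frequencies; they are
    normalized here, so they need not sum to 1.
    """
    total = sum(freqs.values())
    if total <= 0:
        raise ValueError("column has no residues")
    entropy = 0.0
    for count in freqs.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    # Maximum entropy for 20 amino acids minus the observed entropy.
    return math.log2(len(AMINO_ACIDS)) - entropy


def letter_heights(freqs):
    """Split the column height among letters in proportion to frequency."""
    info = column_information(freqs)
    total = sum(freqs.values())
    return {aa: info * c / total for aa, c in freqs.items() if c > 0}


# Made-up column dominated by W, with some Y.
column = {"W": 3, "Y": 1}
print(column_information(column))
print(letter_heights(column))
```

A fully conserved column reaches the maximum height of log2(20) bits, while a column with all 20 residues equally frequent has zero height.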
Figure 5.
Curve of R(Nrep), which regulates the repeat number in the generated domain.
Figure 6.
Flowchart of the WDSP program.
Figure 7.
True positive rate (TPR) and false positive rate (FPR), as percentages, plotted against the average score of repeats.
TPR-FPR is the difference between the true positive rate and the false positive rate; it reaches its highest value when the average score of repeats is above 48.
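The cutoff selection described in this legend can be sketched as a scan over candidate thresholds that keeps the one maximizing TPR minus FPR. The sketch assumes that a protein is called positive when its average repeat score is at or above the threshold; the function names and the toy scores and labels are hypothetical and do not reproduce the WDSP data.

```python
def tpr_minus_fpr(scores, labels, threshold):
    """Return TPR - FPR when calling score >= threshold a positive."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative")
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    return tp / positives - fp / negatives


def best_threshold(scores, labels, thresholds):
    """Return the candidate threshold with the largest TPR - FPR."""
    return max(thresholds, key=lambda t: tpr_minus_fpr(scores, labels, t))


# Toy data: average repeat scores and whether each protein is a true WD40.
toy_scores = [30, 40, 45, 50, 55, 60]
toy_labels = [0, 0, 1, 0, 1, 1]
print(best_threshold(toy_scores, toy_labels, [35, 48, 52]))
```

Maximizing TPR minus FPR weights sensitivity and specificity equally; a different trade-off would call for a different objective.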
Figure 8.
Accuracy of WD40 repeat detection by PROSITE, Pfam, SMART and UniProt, and the jack-knife results of WDSP, under the loose and tight criteria.
The red bars represent the loose criterion (containing only Sa, Sb and Sc), whereas the blue bars represent the tight criterion (including all four strands).
Figure 9.
Comparison of GOR4, PHD, PROF, SSpro, PSIPRED and WDSP in predicting the secondary structures of the 33 WD40s.
The secondary structure assignments by structural elements, DSSP and Stride are used as references.
Figure 10.
Jack-knife results versus the reproduction results for the 33 PDB structures.
Table 1.
Evaluation of WDSP in predicting unknown proteins.
Table 2.
Comparison of five methods in detecting WD40 repeats/domains/proteins in Swiss-Prot database entries with sequence lengths of less than 2000 residues.
Figure 11.
Repeat number distributions of WD40 proteins identified by PROSITE, Pfam, SMART, UniProt and WDSP from 271,654 proteins.
Figure 12.
The comparisons of predictions by WDSP.
(A) Comparison of WDSP with four methods in WD40 detection. (B) Comparison of WDSP with the combination of PROSITE, Pfam, SMART and UniProt in WD40 detection.
Figure 13.
Hydrogen-bonded triad formed by D488-H463-T484 in tau91.
In typical WD40s, the position of F494 is always occupied by W or Y. Such a triad is a special structural feature of the WD40 protein family.
Figure 14.
Secondary structure prediction and repeat detection for the LRRK2 protein by various methods.
Red, blue and yellow bars indicate the predicted α-helices, β-strands and WD40 repeats, respectively. For WDSP, each predicted β-strand is annotated with its strand ID. Among the competing repeat detection methods, only SMART gives one positive result.
Figure 15.
Secondary structure of the LRRK2 protein predicted by WDSP.
The strand IDs are depicted below. Residues in the yellow boxes are detected repeats. The starting and ending positions of each repeat are shown on the left.