Figure 1.
Descriptions of the singlish-date, singlish-party, multi-date, and multi-party classes.
Each type of hub is described below. The rows of the table represent the singlish-interface and multi-interface hub proteins. The columns represent the date and party hubs. Each cell, at the intersection of a row and a column, shows a picture illustrating the type and number of interfaces involved for that class.
Table 1.
Properties of singlish and multiple-interface yeast protein hubs.
Table 2.
Properties of date and party yeast protein hubs.
Figure 2.
Three-phase method to predict protein-binding proteins, hub proteins, singlish-interface/multiple-interface hubs (SIH/MIH), and date/party hubs.
Phase I predicts whether a protein physically binds with other proteins (protein-binding (PB) versus non-protein-binding (NPB)). If a protein is predicted to be a PB protein in Phase I, that protein is further classified in Phase II and Phase III. Phase II uses sequence similarity to determine the potential number of interaction sites for the input sequence and whether that protein is likely to be a hub protein. Phase III applies methods for predicting both structural (singlish vs. multiple) and kinetic (date vs. party) classifications of hub proteins. All methods in each of the three phases make predictions from sequence alone.
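The control flow of the three phases can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class `HubPrediction`, the function `predict_three_phase`, and the four predictor callables are hypothetical names introduced here, and each callable stands in for a trained sequence-based classifier.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HubPrediction:
    """Hypothetical container for the outputs of the three phases."""
    protein_binding: bool
    is_hub: Optional[bool] = None
    interface_class: Optional[str] = None  # "singlish" or "multi"
    kinetic_class: Optional[str] = None    # "date" or "party"


def predict_three_phase(
    sequence: str,
    phase1_binds: Callable[[str], bool],      # assumed PB vs. NPB predictor
    phase2_is_hub: Callable[[str], bool],     # assumed similarity-based hub predictor
    phase3_interface: Callable[[str], str],   # assumed singlish vs. multi predictor
    phase3_kinetic: Callable[[str], str],     # assumed date vs. party predictor
) -> HubPrediction:
    # Phase I: stop early for non-protein-binding proteins.
    if not phase1_binds(sequence):
        return HubPrediction(protein_binding=False)
    # Phases II and III: applied only to predicted PB proteins,
    # as stated in the caption.
    return HubPrediction(
        protein_binding=True,
        is_hub=phase2_is_hub(sequence),
        interface_class=phase3_interface(sequence),
        kinetic_class=phase3_kinetic(sequence),
    )


# Toy stand-ins for trained models (illustrative only).
example = predict_three_phase(
    "MKTAYIAKQR",
    phase1_binds=lambda s: True,
    phase2_is_hub=lambda s: True,
    phase3_interface=lambda s: "singlish",
    phase3_kinetic=lambda s: "date",
)
non_binder = predict_three_phase(
    "MKTAYIAKQR",
    phase1_binds=lambda s: False,
    phase2_is_hub=lambda s: True,
    phase3_interface=lambda s: "multi",
    phase3_kinetic=lambda s: "party",
)
```

Passing the predictors as callables keeps the sketch independent of any particular learning method; only the gating on the Phase I result is taken from the caption.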
Figure 3.
HybSVM is a two-stage machine learning method. The first step of the algorithm converts sequence data into composition-based data representations (monomer, dimer, trimer, and tetramer). These four representations are used as inputs to seven machine learning algorithms based on the NB(k) and NB k-gram approaches (Stage 1). An eighth method, based on PSI-BLAST, is applied to the original sequence data. The outputs of the eight methods are combined into a binary vector of length 8. The resulting vector is used as input to an SVM to produce the final output (Stage 2).
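The data flow of Stage 1 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the HybSVM code: `kmer_composition` and `stage1_vector` are hypothetical helpers, the base classifiers are passed in as plain callables returning a boolean, and the caption does not say how the seven composition-based classifiers are distributed over the four representations, so the (k, classifier) pairing below is an assumption.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def kmer_composition(sequence: str, k: int) -> Dict[str, float]:
    """Relative frequency of each overlapping k-mer (k=1 monomer ... k=4 tetramer)."""
    total = len(sequence) - k + 1
    if total <= 0:
        return {}
    counts = Counter(sequence[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


def stage1_vector(
    sequence: str,
    composition_classifiers: Sequence[Tuple[int, Callable[[Dict[str, float]], bool]]],
    sequence_classifier: Callable[[str], bool],
) -> List[int]:
    """Collect one binary vote per base method.

    composition_classifiers: (k, classifier) pairs; seven of them in the caption.
    sequence_classifier: stands in for the PSI-BLAST-based method, which sees
    the raw sequence rather than a composition vector.
    """
    votes = [int(bool(clf(kmer_composition(sequence, k))))
             for k, clf in composition_classifiers]
    votes.append(int(bool(sequence_classifier(sequence))))
    return votes


# Toy base classifiers (illustrative only, not trained models).
toy_composition = [(k, lambda comp: len(comp) > 2) for k in (1, 2, 3, 4, 2, 3, 4)]
vector = stage1_vector("MKTAYIAKQR", toy_composition, lambda seq: seq.startswith("M"))
```

In Stage 2 the length-8 vector would be passed to a trained SVM; any standard SVM implementation could fill that role, and it is omitted here to keep the sketch free of external dependencies.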
Figure 4.
Receiver operating characteristic (ROC) curves for Datasets 1, 3, and 4.
The curve describes the tradeoff between sensitivity and specificity at different thresholds for various predictors. A simple domain-based method is included as a baseline for comparison. The figure includes ROC curves for protein-binding (PB) versus non-protein-binding (NPB), singlish-interface versus multi-interface hub proteins, and date versus party hub proteins.
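For readers unfamiliar with how such a curve is built, the sketch below computes ROC points from classifier scores by sweeping the decision threshold. It is a generic illustration with made-up scores and labels, not the evaluation code or data behind the figure; `roc_points` and `trapezoid_auc` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    The true positive rate is the sensitivity; the false positive rate is
    1 - specificity. Labels are 1 for the positive class and 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    # Lowering the threshold can only add predicted positives.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up example: four proteins with predicted scores and true labels.
points = roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])
area = trapezoid_auc(points)
```

A predictor whose curve lies closer to the upper-left corner attains higher sensitivity at the same specificity, which is how the predictors and the domain-based baseline can be compared.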
Table 3.
Dataset 1 (protein-binding vs. non-protein-binding, i.e. PB vs. NPB) prediction results from classifiers trained using machine learning methods.
Table 4.
Dataset 3 (SIH vs. MIH) prediction results from classifiers trained using machine learning methods.
Table 5.
Details for misclassified proteins in Dataset 3 using HybSVM.
Table 6.
Dataset 4 (Date vs. Party hubs) predictions from classifiers trained using machine learning methods.
Figure 5.
Venn diagram for Dataset 3 and Dataset 4.
Each of the 272 hub proteins belongs to one or more of the following classes: singlish, multi, date, party. Dataset 3 consists of 35 singlish hub proteins and 120 multi hub proteins (yellow circles). Dataset 4 consists of 91 date hub proteins and 108 party hub proteins (blue circles). Please see text for more details about the datasets.
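The counts in this caption imply the size of the overlap between the two datasets by inclusion-exclusion. The short calculation below is a sketch under three assumptions that the caption suggests but does not state outright: singlish and multi are mutually exclusive, date and party are mutually exclusive, and the 272 hub proteins are exactly the union of Dataset 3 and Dataset 4.

```python
# Counts taken from the caption.
singlish, multi = 35, 120
date, party = 91, 108
total_hubs = 272

dataset3 = singlish + multi  # structural labels (SIH vs. MIH)
dataset4 = date + party      # kinetic labels (date vs. party)

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
# (assumes the union of the two datasets is the full set of 272 hubs).
in_both = dataset3 + dataset4 - total_hubs
only_dataset3 = dataset3 - in_both
only_dataset4 = dataset4 - in_both

print(dataset3, dataset4, in_both, only_dataset3, only_dataset4)
```

Under these assumptions, the hubs in the intersection are the ones that carry both a structural and a kinetic label.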
Figure 6.
Examples of a singlish-interface date hub protein and a multi-interface party hub protein.
Images A and B show the quaternary structure for the singlish-date protein Rab GDP dissociation inhibitor alpha (GDI1, YER136W) binding with two different proteins. Image C shows the quaternary structure for the yeast protein beta 6 subunit of the 20S proteasome (PRE7, YBL041W) binding with multiple proteins at the same time. A: GDI1 (green) binding with GTP-binding protein YPT31/YPT8 (purple). PDB ID of the complex: 3cpj [79], [80]. B: GDI1 (green) binding with GTP-binding protein YPT1 (yellow). PDB ID of the complex: 1ukv [80], [81]. The protein binds at one location (singlish-interface) with one partner at a time (date). C: PRE7 (green) binds with PUP1 (orange), PUP3 (red), C5 (pink), and PRE4 (purple). PDB ID of the complex: 3bdm [80], [82]. The protein binds at multiple locations (multi-interface) with many partners at the same time (party).