Classification of Protein Kinases on the Basis of Both Kinase and Non-Kinase Regions

doi:10.1371/journal.pone.0012460

Table 1.

Composition of the validation data set.

Table 2.

Examples of domain swapping in protein kinases.

Figure 1.

Sequence alignment of selected protein pairs.

A: Sequences ENSP00000266970 and ENSP00000293215. B: Sequences ENSP00000281821 and ENSP00000350896. Identities are indicated by a black background. Pfam domains are indicated by colored boxes: red = catalytic domains, magenta = domains detected in both proteins, blue = domains detected in only one protein. Abbreviations used: Ephrin_lbd = Ephrin receptor ligand binding domain, GCC = GCC2 and GCC3 domain, fn3 = fibronectin type III domain, Pkinase = protein kinase domain, SAM = sterile alpha motif domain (type SAM_1 is detected in ENSP00000350896 and type SAM_2 is detected in ENSP00000281821). Global sequence alignments were obtained using the Needleman-Wunsch algorithm. For each pair of sequences, the different distances are indicated at the bottom of the alignment. Image generated using ESPript software [52].
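
As an illustration of the global alignment step named in this caption, the following is a minimal sketch of the Needleman-Wunsch algorithm together with an identity-based distance. The scoring scheme (match = 1, mismatch = -1, linear gap = -1) and the definition of the distance as one minus the fraction of identical aligned columns are assumptions made for this example; the caption does not state the parameters actually used, and the function names are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b (assumed toy scoring scheme)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Dynamic programming table; first row and column hold pure-gap scores.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Traceback from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def identity_distance(aln_a, aln_b):
    """1 - (identical columns / alignment length); an assumed definition."""
    identical = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 1 - identical / len(aln_a)


aln_a, aln_b, total = needleman_wunsch("ACGT", "AGT")
distance = identity_distance(aln_a, aln_b)
```

On real protein sequences a substitution matrix such as BLOSUM and affine gap penalties would normally replace the toy scores used here.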

Figure 2.

Comparison between the different distances computed between protein sequences of the validation data set.

LMScat: LMS distances between catalytic domains, LMSfull: LMS distances between full-length sequences, IDcat: identity distances between catalytic domains, BLOSUMcat: BLOSUM distances between catalytic domains, BLOSUMfull: BLOSUM distances between full-length sequences. The lower panel reports the Spearman rank correlation coefficients between different distances.
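
To make the comparison in the lower panel concrete, here is a small sketch of the Spearman rank correlation between two sets of pairwise distances. It is written in plain Python rather than with a statistics library; the distance values are invented for illustration and do not come from the validation data set, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Hypothetical distances for the same four sequence pairs under two measures.
toy_cat = [0.10, 0.40, 0.90, 0.70]
toy_full = [1.0, 5.0, 50.0, 10.0]
rho = spearman(toy_cat, toy_full)
```

Because the coefficient depends only on ranks, two distance measures on very different scales can still be perfectly correlated, as in the toy data above where both orderings agree.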

Figure 3.

Assessment of different distances to detect homogeneous clusters in the validation data set.

A: each distance matrix is used as input to hierarchical clustering; clusters are extracted from the resulting trees and assessed by the biological homogeneity index (BHI). B: evolution of BHI according to the number of clusters. LMScat: LMS distances computed from catalytic domains, LMSfull: LMS distances computed from full-length sequences, BLOSUMcat: BLOSUM distances computed from catalytic domains, BLOSUMfull: BLOSUM distances computed from full-length sequences, IDcat: identity distances computed from catalytic domains. The horizontal and vertical lines indicate BHI = 1 and a number of clusters equal to 17, respectively.
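
The assessment step in panel A can be sketched as follows, assuming a common form of the biological homogeneity index: for each cluster, the fraction of sequence pairs that share the same class label, averaged over clusters. Two simplifications are assumptions of this example and are not taken from the caption: every sequence carries exactly one class label, and clusters with a single member are skipped. The function name and the toy labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def biological_homogeneity_index(cluster_ids, class_labels):
    """Mean, over clusters, of the fraction of same-class member pairs."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster].append(label)

    per_cluster = []
    for labels in members.values():
        if len(labels) < 2:
            continue  # assumption: singleton clusters carry no pair information
        pairs = list(combinations(labels, 2))
        same = sum(a == b for a, b in pairs)
        per_cluster.append(same / len(pairs))
    return sum(per_cluster) / len(per_cluster) if per_cluster else 0.0


# Toy example: cluster assignments as they might come from cutting a tree.
clusters = [0, 0, 0, 1, 1]
labels = ["TK", "TK", "AGC", "AGC", "AGC"]
bhi = biological_homogeneity_index(clusters, labels)
```

Under this definition a value of 1 means every cluster contains sequences of a single class, which matches the reference line BHI = 1 in the figure.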

Figure 4.

AUC distributions obtained on the human kinome.

A: AUC obtained using the iterative procedure starting from BLOSUM full-length distances, B: AUC obtained using the iterative procedure starting from LMS full-length distances. The vertical red line indicates the cut-off for the detection of hybrid kinases.

Figure 5.

Computation of Local Matching Score (LMS) between two sequences without alignment.

Figure 6.

Detection of outliers in a pre-classified data set.

1: the distance matrix and initial weights are used to compute AUC values for each sequence using equation 6; 2: sequence weights are updated using equation 7; 3: the procedure is iterated until convergence; 4: the final AUC values are used to compute a histogram; 5: the histogram shape is used to detect outliers.

Figure 7.

Examples of classification curves.

A: a putatively well classified sequence, B: a putatively misclassified sequence. AUC denotes the area under the curve.
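
The caption does not define how a classification curve is built, so the sketch below rests on one plausible reading that should be treated as an assumption: for a query sequence, the other sequences are ranked by increasing distance, those annotated with the query's class count as positives, and the AUC is the area under the resulting ROC-style curve. That area equals the fraction of (same-class, other-class) pairs in which the same-class sequence is closer to the query, with ties counted as one half. It is unweighted and is not a reconstruction of the article's equation 6; the function name and data are hypothetical.

```python
def classification_auc(distances, same_class):
    """AUC for one query: chance that a same-class sequence ranks closer.

    distances  -- distance from the query to each other sequence
    same_class -- True where that sequence shares the query's class
    """
    positives = [d for d, s in zip(distances, same_class) if s]
    negatives = [d for d, s in zip(distances, same_class) if not s]
    if not positives or not negatives:
        raise ValueError("need at least one sequence inside and outside the class")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p < q:
                wins += 1.0   # same-class sequence is closer to the query
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy queries: one whose nearest neighbours share its class, one mixed.
well_classified = classification_auc([0.1, 0.2, 0.8, 0.9], [True, True, False, False])
mixed = classification_auc([0.1, 0.2, 0.3, 0.4], [True, False, True, False])
```

Under this reading, a well-classified sequence has an AUC near 1, while a low AUC indicates that sequences from other classes are closer to the query than members of its own class.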
