Fig 1.
Example of a multiple sequence alignment (MSA) for the pocket of a given reference structure.
Table 1.
Data sources and corresponding identifiers for the protein similarity matrices employed in this work.
Table 2.
Structures used as templates for modeling the family sequences.
Table 3.
Structures used as templates for modeling the SFLD superfamily sequences.
Table 4.
Comparison of Mutual Information (MI) values for the clusterings obtained by each technique for the studied protein families.
Table 5.
Data combinations which yielded the best results for the nucleotidyl cyclases in five runs of the GP system.
Fig 2.
Nucleotidyl cyclase division into two clusters by the GP system.
Subfigure (a) shows the active site logo for the adenylate cyclase cluster, while (b) shows that for the guanylate cyclase cluster.
Table 6.
Most important residues for the two nucleotidyl cyclase clusters produced by the GP system.
Fig 3.
DUF849 division into seven clusters, produced in [1] by manually altering ASMC’s hierarchical clustering.
Subfigures (a) through (g) show the active site logos for clusters G1 through G7, respectively.
Table 7.
Substrate nature in each group.
Fig 4.
DUF849 division into seven clusters by the GP system.
Subfigures (a) through (g) show the active site logos for clusters I through VII, respectively.
Table 8.
Data combinations which yielded the best results for the protein kinases in five runs of the GP system.
Fig 5.
Protein kinase division into two clusters by the GP system.
Subfigure (a) shows the active site logo for the cluster consisting mainly of Ser/Thr kinases, while (b) shows the logo for the cluster of Tyr kinases combined with the EGFR subcluster.
Table 9.
Most important residues for the two protein kinase clusters produced by the GP system.
Table 10.
Data combinations which yielded the best results for the serine proteases in five runs of the GP system.
Table 11.
Distribution of the crotonases among families.
Table 12.
Distribution of families among the twelve crotonase superfamily clusters produced by the GP system.
Table 13.
Distribution of the enolases among subgroups and families.
Table 14.
Distribution of families among the twelve enolase superfamily clusters produced by the GP system.