Predicting Protein Ligand Binding Sites by Combining Evolutionary Sequence Conservation and 3D Structure
Figure 6
Ligand-binding site identification performance by number of chains in structure.
(A) The average area under the precision-recall curve (PR-AUC) for predicting ligand binding residues on each set of structures. (B) The average PR-AUC for ligand binding pocket identification. (C) The average Jaccard coefficient of the overlap of the predicted pockets with bound ligands. As the number of chains in the protein increases, methods based on structure alone find it increasingly difficult to distinguish ligand-binding pockets from non-ligand-binding gaps between chains. This trend is clear in each evaluation. The performance of Conservation does not exhibit this effect (A); in fact, Conservation outperforms Structure on proteins with five or more chains. The integration of sequence conservation and pocket prediction in ConCavity improves performance in each chain-based partition in each evaluation, and ConCavity shows only a modest decrease in performance on proteins with multiple chains. Conservation alone could not be included in (B) and (C) because it does not make pocket predictions. Note that the y-axes of the panels do not all have the same scale. The number of structures per chain group: 1 chain: 143, 2 chains: 112, 3 chains: 18, 4 chains: 35, 5 or more chains: 24.
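For readers unfamiliar with the overlap measure in (C), the Jaccard coefficient of two sets A and B is |A ∩ B| / |A ∪ B|. The following is a minimal sketch, not the authors' implementation: it assumes that a predicted pocket and a bound ligand are each represented as a set of occupied grid points, and the function name `jaccard` and the example coordinates are hypothetical.

```python
def jaccard(predicted, ligand):
    """Return |A & B| / |A | B| for two collections of grid points."""
    predicted, ligand = set(predicted), set(ligand)
    union = predicted | ligand
    if not union:
        # Convention chosen for this sketch: two empty sets score 0.
        return 0.0
    return len(predicted & ligand) / len(union)


# Hypothetical (x, y, z) grid points; illustrative values, not data from the paper.
predicted_pocket = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)}
ligand_volume = {(0, 0, 1), (0, 1, 1), (2, 2, 2)}

# Two shared points out of five distinct points in total.
score = jaccard(predicted_pocket, ligand_volume)
print(f"Jaccard coefficient: {score:.2f}")
```

A score of 1 means the predicted pocket and the ligand occupy exactly the same points, while a score of 0 means they do not overlap at all. Because the union appears in the denominator, a prediction that is much larger than the ligand is penalized even when it covers the ligand completely.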