Fig 1.
Cartoon representation of 3D-domain swapping.
The region that connects the swapped domain to the non-swapped part of the protein is known as the “hinge region” and is marked in dark red. A new interface is formed between the non-swapped regions of the two monomeric units. The domain-swapped interface is present in both the monomer and the domain-swapped molecule.
Table 1.
Prediction performance of the Random Forest (RF) and Support Vector Machine (SVM) models on the test dataset.
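Classifier performance on a test set is usually summarized from a confusion matrix. The sketch below shows how common metrics (sensitivity, specificity, accuracy and Matthews correlation coefficient) are derived from true/false positive and negative counts; it assumes these are the kinds of metrics reported, and the actual columns of Table 1 may differ. The counts in the usage line are hypothetical.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    # Guard each ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    accuracy = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for one model on a test set.
print(classification_metrics(tp=40, fp=10, tn=40, fn=10))
```

MCC is included because, unlike accuracy, it stays informative when positive and negative classes are unbalanced.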
Fig 2.
Workflow of 3D-domain swapping prediction and analysis of aggregation-related sequences from the human genome.
Of the 136 aggregation-related sequences, 99 were predicted to undergo domain swapping; their distribution across Pfam domain families is shown in the pie chart. All positively predicted sequences were searched for structural homologues.
Table 2.
3D-domain swapping prediction results for different genomes.
Fig 3.
Functional annotation, at three levels, of human genome sequences predicted to undergo domain swapping.
(A) Pfam protein families containing the largest numbers of sequences predicted to undergo domain swapping in the human genome. (B) Biological pathways containing the largest numbers of these sequences. (C) Distribution of these sequences across protein families.
Fig 4.
Preferred Gene Ontology (GO) terms among positively predicted sequences from the human genome.
The human genome was used as the reference, and the plot was generated with the WEGO tool. The GO terms (cellular component and biological function) corresponding to the X-axis labels are listed in S1 File.
Table 3.
Disease associations of human genome protein sequences predicted to undergo 3D-domain swapping.
Z scores were normalized against the proportion of domain-swapped entries in the whole human genome (8945/20247).
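The caption does not give the exact Z-score formula. One common choice, shown here only as an assumption, treats each disease or pathway category as a binomial sample and compares the observed count of domain-swapped entries with the count expected from the genome-wide background proportion p = 8945/20247. The function name `binomial_z` and the example counts are hypothetical.

```python
import math

# Background proportion of domain-swapped entries in the whole human genome,
# taken from the table footnote (8945 of 20247 entries).
BACKGROUND_P = 8945 / 20247


def binomial_z(k: int, n: int, p: float = BACKGROUND_P) -> float:
    """Z score for observing k domain-swapped entries among n genes in a
    category, assuming a binomial null with background proportion p."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    expected = n * p                      # mean count under the null
    sd = math.sqrt(n * p * (1.0 - p))     # binomial standard deviation
    return (k - expected) / sd


# Hypothetical category: 30 of its 50 genes are predicted to domain swap.
print(round(binomial_z(30, 50), 2))
```

A positive Z score means the category holds more domain-swapped entries than the genome-wide background would predict; a negative score means fewer.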
Table 4.
Distribution of positively predicted sequences from the human proteome across biological pathways.
Z scores were normalized against the proportion of domain-swapped entries in the whole human genome (8945/20247).
Fig 5.
Case study on three different proteins of known structures.
Blue circles mark experimentally known hinge regions (shown in red on the left) that agree with our predictions. (A) Seminal ribonuclease (PDB code: 11BA), (B) promyelocytic leukemia zinc finger protein PLZF (PDB code: 1BUO), and (C) SH3 domain (PDB code: 1AOJ). The complete list is provided in S3 Table.
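Agreement between a predicted and an experimentally known hinge can be scored by residue-range overlap. The sketch below assumes hinges are given as inclusive (start, end) residue ranges and counts a prediction as agreeing if it overlaps the known hinge, optionally widened by a tolerance. This criterion and the residue numbers in the example are illustrative assumptions, not the study's stated procedure.

```python
def overlap_residues(pred: tuple[int, int], known: tuple[int, int]) -> int:
    """Number of residues shared by two inclusive residue ranges."""
    start = max(pred[0], known[0])
    end = min(pred[1], known[1])
    return max(0, end - start + 1)


def hinge_agrees(pred: tuple[int, int], known: tuple[int, int],
                 tolerance: int = 0) -> bool:
    """True if the predicted hinge overlaps the known hinge after widening
    the known range by `tolerance` residues on each side (an assumption)."""
    widened = (known[0] - tolerance, known[1] + tolerance)
    return overlap_residues(pred, widened) > 0


# Hypothetical residue ranges.
print(overlap_residues((10, 15), (13, 20)))       # shared residues 13, 14, 15
print(hinge_agrees((1, 5), (8, 10)))              # no overlap
print(hinge_agrees((1, 5), (8, 10), tolerance=3)) # overlaps after widening
```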
Fig 6.
Workflow for generating the negative dataset from monomeric structures in the Protein Data Bank (PDB).
The BRP approach was used to find sequences from Pfam families that have no known examples of domain swapping (see Methods for details). DIAL was used to predict domain swapping in these sequences.
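The family-level filter behind the negative dataset can be sketched as a set operation: keep monomeric entries whose Pfam families share no member with the set of families known to contain domain-swapped examples. This is a minimal sketch; the `Entry` record, its fields and the family labels are hypothetical, and neither the BRP approach nor the DIAL step is reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical record for one PDB chain and its Pfam annotations."""
    pdb_id: str
    pfam_families: frozenset[str]
    is_monomer: bool


def negative_candidates(entries: list[Entry],
                        swapped_families: set[str]) -> list[Entry]:
    """Keep monomeric entries with no Pfam family known to domain swap."""
    return [
        e for e in entries
        if e.is_monomer and not (e.pfam_families & swapped_families)
    ]


# Hypothetical entries and family labels.
entries = [
    Entry("e1", frozenset({"famA"}), True),   # family has swapped examples
    Entry("e2", frozenset({"famB"}), True),   # candidate negative
    Entry("e3", frozenset({"famC"}), False),  # not monomeric
]
print([e.pdb_id for e in negative_candidates(entries, {"famA"})])
```

Excluding whole families, rather than individual entries, avoids placing close homologues of known domain-swapping proteins in the negative set.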
Table 5.
Features used to build the domain-swapping prediction models.
Fig 7.
Overall workflow of the study, including creation of the positive and negative datasets.
Features were selected with WEKA, and prediction models were built with Random Forest and Support Vector Machine classifiers. Sequences from the human genome predicted to undergo domain swapping were then analyzed in a genome-wide association study.