Determining the interaction status and evolutionary fate of duplicated homomeric proteins
Fig 1
The potential evolutionary fates of duplicated homomeric proteins and the analysis pipeline for identifying them.
(A) Duplication of a gene encoding a homomeric protein, followed by the emergence of the first mutation(s), leads to a statistical mixture of homo- and heteromeric complexes (i). Upon further divergence, three outcomes may arise: two distinct homomeric complexes (ii), a heteromeric complex involving both paralogs (iii), or loss of the homomeric interaction in one copy and gain of new interacting partners in the other paralog (iv). (B) Our analysis aimed to identify these four evolutionary fates. It comprised three steps: (1) The genomes of E. coli and S. cerevisiae were each scanned to identify all possible paralogous protein pairs. These pairs were classified into three categories with increasing confidence of paralog assignment (note that all categories in our analysis are inclusive, i.e., the low-confidence paralogs include the medium-confidence pairs, and the medium-confidence paralogs include the high-confidence pairs). (2) Interactions of these paralogs were identified and classified as homo- or heteromeric. Macromolecular complexes were collected from the Protein Data Bank (PDB complexes; inter-subunit interactions were obtained from crystal structure data) and from the Complex Portal database (CS and C complexes; inter-subunit interactions were predicted from the PPI data). The S. cerevisiae PPI data were extracted from seven databases, and the E. coli data from eight databases. The raw PPI data were filtered using various criteria to exclude potential false positives. (3) Finally, based on the identified interactions, the paralogous pairs were assigned to one of the four potential fates (i-iv, panel A) using either a flexible or a stringent criterion.
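Step (3) can be pictured as a simple decision rule over three observations per paralogous pair. The sketch below is illustrative only and is not the criteria used in the analysis: the names `Fate` and `assign_fate`, the boolean inputs, and the exact mapping to fates (i)-(iv) are assumptions made for this example. In particular, it does not distinguish the flexible from the stringent criterion, and it does not check for the gain of new interacting partners that defines fate (iv).

```python
from enum import Enum


class Fate(Enum):
    """Hypothetical labels for the four fates in panel A."""
    MIXTURE = "i"          # both homomers and the heteromer are observed
    TWO_HOMOMERS = "ii"    # both homomers, no heteromer
    HETEROMER = "iii"      # heteromer only
    HOMOMER_LOST = "iv"    # exactly one homomer, no heteromer
    UNASSIGNED = "unassigned"


def assign_fate(homo_a: bool, homo_b: bool, hetero: bool) -> Fate:
    """Map interaction evidence for paralogs A and B to a fate.

    homo_a / homo_b: a homomeric interaction was detected for A / B.
    hetero: an A-B heteromeric interaction was detected.
    The mapping is an assumed simplification, not the published rule.
    """
    if homo_a and homo_b:
        return Fate.MIXTURE if hetero else Fate.TWO_HOMOMERS
    if hetero and not (homo_a or homo_b):
        return Fate.HETEROMER
    if not hetero and (homo_a != homo_b):
        return Fate.HOMOMER_LOST
    # One homomer plus a heteromer, or no interactions at all, is left
    # unassigned here rather than forced into a category.
    return Fate.UNASSIGNED


if __name__ == "__main__":
    print(assign_fate(True, True, False).value)
```

Leaving ambiguous evidence unassigned, rather than forcing it into a category, is a design choice of this sketch; a flexible criterion would presumably assign more of these pairs than a stringent one.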