Fig 1.
The potential evolutionary fates of duplicated homomeric proteins and the analysis pipeline for identifying them.
(A) Duplication of a gene encoding a homomeric protein, followed by the emergence of the first mutation(s), leads to a statistical mixture of homo- and heteromeric complexes (i). Upon further divergence, three outcomes may arise: two distinct homomeric complexes (ii), a heteromeric complex involving both paralogs (iii), or loss of the homomeric interaction in one copy and gain of new interacting partners in the other paralog (iv). (B) Our analysis aimed to identify these four evolutionary fates in three steps. (1) The genomes of E. coli and S. cerevisiae were each scanned to identify all possible paralogous protein pairs. These pairs were classified into three categories of increasing confidence of paralog assignment (note that all categories in our analysis are inclusive, i.e., low-confidence paralogs include the medium-confidence ones, and medium-confidence paralogs include the high-confidence pairs). (2) Interactions of these paralogs were identified and classified into homo- and heteromeric ones. Macromolecular complexes were collected from the Protein Data Bank (PDB complexes, for which inter-subunit interactions were obtained from crystal structure data) and from the Complex Portal database (CS and C complexes, for which inter-subunit interactions were predicted from the PPI data). The S. cerevisiae PPI data were extracted from seven databases, and the E. coli data from eight databases. The raw PPI data were filtered using various criteria to exclude potential false positives. (3) Finally, based on the identified interactions, the paralogous pairs were assigned to one of the four potential fates (i-iv, panel A) using either a flexible or a stringent criterion.
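A minimal Python sketch of steps (1) and (3) may help make the logic concrete. It assumes that each paralog pair carries a single confidence label (LC, MC or HC) and that interactions are undirected protein pairs, with (x, x) denoting a homomeric interaction. The function names `pairs_at_level` and `divergence_mode`, and the decision rules themselves, are hypothetical illustrations rather than the flexible or stringent criteria used in the analysis. In particular, the rule for fate (iv) (exactly one paralog is homomeric, the paralogs do not interact, and at least one non-paralog partner is present) is one possible reading of the caption.

```python
from typing import Iterable, Optional

# Hypothetical confidence ranks: a pair labelled "HC" also counts as MC and LC,
# mirroring the inclusive categories described in the caption.
CONFIDENCE_RANK = {"LC": 0, "MC": 1, "HC": 2}


def pairs_at_level(pairs: dict[tuple[str, str], str], level: str) -> set[tuple[str, str]]:
    """Return every paralog pair whose confidence is at least `level`."""
    threshold = CONFIDENCE_RANK[level]
    return {pair for pair, conf in pairs.items() if CONFIDENCE_RANK[conf] >= threshold}


def divergence_mode(a: str, b: str, interactions: Iterable[tuple[str, str]]) -> Optional[str]:
    """Assign the paralog pair (a, b) to one of the four fates of Fig 1A.

    `interactions` holds undirected protein pairs; (x, x) is a homomeric
    interaction. The decision rules are an illustrative assumption only.
    """
    # frozenset(("x", "x")) == frozenset({"x"}), so homomers become 1-element sets.
    edges = {frozenset(e) for e in interactions}
    homo_a = frozenset((a,)) in edges
    homo_b = frozenset((b,)) in edges
    hetero = frozenset((a, b)) in edges
    # Partners of either paralog, excluding the two paralogs themselves.
    others = {p for e in edges if e & {a, b} for p in e} - {a, b}

    if homo_a and homo_b:
        return "mixed" if hetero else "obligatory-homo"  # fates (i) and (ii)
    if hetero and not homo_a and not homo_b:
        return "obligatory-hetero"  # fate (iii)
    if homo_a != homo_b and not hetero and others:
        return "hetero-others"  # fate (iv), under the assumed reading
    return None  # not assignable under these simplified rules
```

For example, `divergence_mode("A", "B", [("A", "A"), ("B", "B"), ("A", "B")])` returns `"mixed"`, while dropping the `("A", "B")` interaction yields `"obligatory-homo"`. Pairs that fit none of the rules are left unassigned rather than forced into a category.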
Fig 2.
The distribution of divergence modes of S. cerevisiae and E. coli paralogous pairs.
The four divergence modes (obligatory-homo, obligatory-hetero, mixed and hetero-others) are described in Fig 1A. (A) The distribution of S. cerevisiae paralogous pairs in PPI data (right panel) and in curated complexes (left panel). The distributions are presented for different stringencies of analysis along its three steps (Fig 1B). Step-1, paralog assignment, is presented in columns shaded in green, from low-confidence (pale green) to high-confidence paralogs (dark green). Step-2, identifying interactions, is also presented in columns, from white (raw PPI data) to dark grey (filter-3). Step-3, the divergence mode, is presented in rows: the top set of rows represents the flexible criterion (shaded in yellow), and the bottom set the stringent criterion (dark yellow). The dominant divergence modes, or fates, are highlighted in darker shades of red. (B) The distribution of E. coli paralogous pairs.
Fig 3.
The distribution of complexes comprising homo- and heteromeric paralogs in S. cerevisiae and in E. coli.
This analysis was based on the curated complex databases. The column annotations and color shades are the same as in Fig 2. (A) The numbers of unique S. cerevisiae complexes comprising paralogs assigned to the different homo/hetero divergence modes. Note that the different confidence levels for paralog assignment (LC, MC, HC) show the same trend as in Fig 2B, curated complex panel. (B) The same for E. coli.
Fig 4.
Different modes of prokaryotic homomer to eukaryotic heteromer transition.
Gene duplication of an ancestral non-ring-like homomer may produce a heteromeric complex that either retains (i) or does not retain (ii) the ancestral oligomeric order (i.e., the total number of subunits in the complex). After the first gene duplication and the subsequent emergence of a heteromeric interaction, multiple rounds of duplication may follow in which the descendant paralogs retain the heteromeric interaction (iii). For ring-like complexes, multiple rounds of intra-ring gene duplication result in heteromeric rings that either keep (iv) or change (v) the ancestral oligomeric order. An example is provided for each mode of transition.