Questioning the Ubiquity of Neofunctionalization

doi:10.1371/journal.pcbi.1000252

Figure 1.

Duplication of self-interacting proteins.

(A) An interaction between a protein and a self-interacting protein. (B) When the self-interacting protein duplicates, the duplicates interact.

Figure 2.

He and Zhang [6] illustrate the presence of neofunctionalization through interaction data analysis.

(A) Paralogous proteins 1 and 2 initially share all 3 interacting partners. (B) In the absence of neofunctionalization, the number of interacting partners should remain at 3 as redundant interactions are lost over time. He and Zhang show that the number of interacting partners increases as the age of paralogs increases. (C) The increase in interacting partners is attributed to neofunctionalization (i.e., the de novo gain of interactions).

Figure 3.

Shortcomings of the yeast two-hybrid assay.

(A) The traditional view of the yeast two-hybrid assay. A bait protein is hybridized with the GAL4 binding domain, which binds to the upstream activation sequence for galactose (UAS_G). A prey protein hybridized with the GAL4 activation domain interacts with the bait protein. The complex forms a functional transcriptional activator and the downstream reporter gene is expressed. (B) A more accurate view of the yeast two-hybrid assay. The GAL4 binding domain actually binds to UAS_G as a dimer. (C) If the GAL4 binding domain is hybridized to a self-interacting protein, dimerization between the self-interacting baits would reduce the probability of bait-prey interactions.

Figure 4.

Neofunctionalization vs. concurrent gene duplication and subfunctionalization.

(A1) Gene duplication. Shown also are two additional proteins elsewhere in the network. (A2) According to He and Zhang (2005), additional interactions gained by paralogous pairs over time are explained by the formation of de novo interactions. (A3) The resulting network. (B1) Gene duplication. (B2) An interacting partner duplicates and loses a redundant interaction. (B3) Another partner duplicates and loses a redundant interaction. (B4) The resulting network is indistinguishable from that postulated for neofunctionalization.

Figure 5.

A fungal phylogenetic tree showing ancestral species nodes into which Saccharomyces cerevisiae duplicates are grouped (T₀–T₃).

Groupings were generated from gene trees reported in reference [28]. Ancient duplications occurred in ancestral node T₃ and the most recent duplications occurred in T₀.

Figure 6.

Change in the number of interacting partners (protein connectivity) over time.

Proteins are aligned with the phylogenetic period from Figure 5 in which they were born (see Methods). Red circles identify the connectivity of gene duplicates born at the indicated phylogenetic timepoint: T₁, T₂, and T₃. The red trend line indicates that the connectivities of gene duplicates increase over time. Black triangles identify the same proteins after removing interactions with more recent duplicates. The black trend line indicates that once subsequent duplications are accounted for, the connectivities of paralogous genes remain largely unchanged. This is consistent with the alternative explanation proposed in Figure 4B. (A) The combined interaction datasets [29],[30] used by He and Zhang [6]. (B) Physical interactions from BioGRID [31].
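
The correction behind the black triangles can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `connectivity`, the `AGE` ranking, and the toy proteins are all hypothetical, and it assumes every partner has a known birth period (T₀ most recent, T₃ most ancient).

```python
# Rank of each phylogenetic period; larger means more ancient (assumed encoding).
AGE = {"T0": 0, "T1": 1, "T2": 2, "T3": 3}

def connectivity(protein, interactions, birth, drop_younger=False):
    """Number of interacting partners of `protein`.

    With drop_younger=True, partners born in a more recent period than
    `protein` are ignored, mimicking the removal of interactions with
    more recent duplicates.
    """
    partners = interactions[protein]
    if drop_younger:
        own_age = AGE[birth[protein]]
        partners = {q for q in partners if AGE[birth[q]] >= own_age}
    return len(partners)

# Hypothetical toy data: P is an ancient duplicate whose partner B later
# duplicated (B2), and which also interacts with a younger protein C.
birth = {"P": "T3", "A": "T3", "B": "T3", "B2": "T1", "C": "T2"}
interactions = {"P": {"A", "B", "B2", "C"}}

raw = connectivity("P", interactions, birth)
adjusted = connectivity("P", interactions, birth, drop_younger=True)
print(raw, adjusted)
```

In this toy case the raw connectivity of P is inflated by partners that arose after P did; the adjusted count keeps only partners at least as old as P.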

Table 1.

Network measures, including C, the clustering coefficient of Saccharomyces cerevisiae protein interaction networks.

Figure 7.

Triangles and connected triples in gene duplication.

(A) The network has T = 1 triangle and Γ = 5 connected triples. (B) Simple duplication adds a duplicate of the progenitor's single triangle to the network. There are γ_p = k_p(k_p−1)/2 = 3 connected triples centered around the progeny, and an additional Σk_g = 5 connected triples centered on the neighbors. (C) If the progenitor is self-interacting, an additional edge between the progenitor and progeny is formed, thus increasing the simple duplication counts by k_p = 3 additional triangles (extruded for clarity) and 2k_p additional connected triples (the progenitor and progeny are both centered on k_p additional connected triples due to the dimerizing interaction).
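
The counts in this caption can be reproduced with a short sketch. The topology below is an assumption: it is one four-node network consistent with T = 1, Γ = 5, and k_p = 3, and may differ from the one drawn in the figure. The helper names are hypothetical, and the clustering coefficient is assumed to be C = 3T/Γ.

```python
from itertools import combinations

def count_triangles(adj):
    """Count unordered node triples that are pairwise connected."""
    return sum(1 for a, b, c in combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

def count_triples(adj):
    """Count connected triples: k(k-1)/2 centered on each node of degree k."""
    return sum(len(n) * (len(n) - 1) // 2 for n in adj.values())

def duplicate(adj, progenitor, progeny, homomeric=False):
    """Return a copy of the network in which `progeny` inherits every
    interaction of `progenitor`; if homomeric, the two also interact."""
    new = {node: set(nbrs) for node, nbrs in adj.items()}
    new[progeny] = set(adj[progenitor])
    for g in adj[progenitor]:
        new[g].add(progeny)
    if homomeric:
        new[progenitor].add(progeny)
        new[progeny].add(progenitor)
    return new

def clustering(adj):
    """C = 3T / Gamma (assumed definition)."""
    triples = count_triples(adj)
    return 3 * count_triangles(adj) / triples if triples else 0.0

# Assumed topology for panel A: progenitor p with neighbors a, b, c,
# plus one edge a-b that closes the single triangle.
network = {"p": {"a", "b", "c"}, "a": {"p", "b"}, "b": {"p", "a"}, "c": {"p"}}
simple = duplicate(network, "p", "q")
homomeric = duplicate(network, "p", "q", homomeric=True)

for name, g in [("A", network), ("B", simple), ("C", homomeric)]:
    print(name, count_triangles(g), count_triples(g), round(clustering(g), 3))
```

Under these assumptions, simple duplication adds 1 triangle and 3 + 5 = 8 connected triples, while the homomeric edge adds a further k_p = 3 triangles and 2k_p = 6 connected triples, so C falls in panel B and rises in panel C.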

Figure 8.

The effect of gene duplication on the clustering coefficient.

Every connected network containing three to nine nodes was enumerated, producing 273,191 networks containing 2,445,434 nodes. (A) Changes to the clustering coefficient resulting from simple duplication and homomeric duplication. Each of the 2,445,434 nodes was duplicated twice, once as self-interacting (homomeric) and once as non-self-interacting (simple). Shown is the change in clustering coefficient for each duplication, ordered by magnitude. The enumerated networks serve as possible subnetworks of larger protein interaction networks. The magnitude of the vertical axis is determined by the size of the network, but the shape of the curves around zero remains unchanged. (B) The severe effect subfunctionalization has on the clustering coefficient. The vertical axis represents the fraction of the 2,445,434 gene duplications in the enumerated networks that result in a decrease in the clustering coefficient. Probability of Loss is the probability the gene duplicate (progeny) loses each of its interactions due to subfunctionalization. Even without losses suffered due to subfunctionalization, simple duplications reduce the clustering coefficient in over 76% of examined duplications. By contrast, clustering coefficients produced via homomeric duplication are far more likely to increase even in the face of interaction losses caused by subfunctionalization. (C) The effect of subfunctionalization on aggregate ΔC. The change in clustering coefficient aggregated for all 2,445,434 duplications at each loss probability. While aggregate ΔC of simple duplication is below zero for all loss probabilities, homomeric duplications remain above zero until the Probability of Loss ≈ 0.62.
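
The totals quoted above agree with the known numbers of connected unlabelled graphs on three to nine nodes, and the per-duplication ΔC can be sketched for a single small network. This is an illustration under stated assumptions, not the authors' code: C is taken to be 3T/Γ, each inherited interaction of the progeny is lost independently with probability `p_loss`, the homomeric progenitor-progeny edge is assumed never to be lost, and the function names and toy network are hypothetical.

```python
import random

# Number of connected unlabelled graphs on n nodes, n = 3..9.
CONNECTED_GRAPH_COUNTS = {3: 2, 4: 6, 5: 21, 6: 112, 7: 853, 8: 11117, 9: 261080}
total_networks = sum(CONNECTED_GRAPH_COUNTS.values())
total_nodes = sum(n * c for n, c in CONNECTED_GRAPH_COUNTS.items())

def clustering(adj):
    """C = 3T / Gamma (assumed definition)."""
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    # Closed triples: each triangle is counted once from each of its corners.
    closed = sum(1 for v in adj for a in adj[v] for b in adj[v]
                 if a < b and b in adj[a])
    return closed / triples if triples else 0.0

def delta_c(adj, progenitor, homomeric, p_loss, rng):
    """Change in C after duplicating `progenitor`, with each inherited
    interaction of the progeny lost with probability p_loss."""
    new = {v: set(nbrs) for v, nbrs in adj.items()}
    progeny = progenitor + "'"
    new[progeny] = set()
    for g in adj[progenitor]:
        if rng.random() >= p_loss:  # this inherited interaction survives
            new[progeny].add(g)
            new[g].add(progeny)
    if homomeric:  # assumed to be exempt from loss
        new[progenitor].add(progeny)
        new[progeny].add(progenitor)
    return clustering(new) - clustering(adj)

# Hypothetical four-node network standing in for one enumerated network.
network = {"p": {"a", "b", "c"}, "a": {"p", "b"}, "b": {"p", "a"}, "c": {"p"}}
rng = random.Random(0)
print(total_networks, total_nodes)
for p_loss in (0.0, 0.5, 1.0):
    print(p_loss,
          delta_c(network, "p", False, p_loss, rng),
          delta_c(network, "p", True, p_loss, rng))
```

Repeating `delta_c` over every node of every enumerated network, and over a grid of loss probabilities, would give curves of the kind described in panels B and C; the enumeration itself is omitted here.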

Table 2.

ΔC as the number of nodes in the enumerated networks increases.

Figure 9.

The clustering coefficient of networks featuring simple duplication, neofunctionalization, subfunctionalization, and homomeric duplication.

Each plot shows the clustering coefficient for different probabilities of a gene duplicate losing a redundant interaction (i.e., different levels of subfunctionalization). Lines are grouped into pairs by color. A solid line is a model with a specific parameter, and a dashed line of the same color is the model's random equivalent (see Methods). The black line pairs represent simple duplication and subfunctionalization (i.e., no neofunctionalization or homomeric duplication) and are therefore identical in both plots. (A) The Solé et al. model which includes neofunctionalization [7]. (B) Homomeric duplication as found in the Vázquez et al. model [32].

Figure 10.

Underestimating the interaction conservation rate (equivalently, overestimating the interaction loss rate).

The conservation rate is the number of shared interacting partners divided by the total number of partners. (A) Gene 1 is duplicated to create paralogous pair 1 & 2. The true conservation rate is . (B) A neighbor of the paralogous pair duplicates and loses a redundant interaction. (C) The network as observed. The paralogous conservation rate of 1 & 2 is erroneously underestimated to be . Equivalently, the true loss rate of is overestimated to be .
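
The bias described here can be illustrated numerically. The partner sets below are hypothetical and are not the values shown in the figure; the sketch also assumes that "total number of partners" means the union of the two paralogs' partner sets and that the loss rate is one minus the conservation rate.

```python
from fractions import Fraction

def conservation_rate(partners_1, partners_2):
    """Shared partners divided by all partners of the paralogous pair
    (union of both partner sets; an assumed reading of the definition)."""
    shared = partners_1 & partners_2
    total = partners_1 | partners_2
    return Fraction(len(shared), len(total))

# Hypothetical state after gene 1 duplicates and paralog 2 loses one interaction.
true_rate = conservation_rate({"A", "B", "C"}, {"A", "B"})

# Partner A then duplicates into A2. A2 inherits interactions with both
# paralogs, then loses the redundant one with paralog 2.
observed_rate = conservation_rate({"A", "A2", "B", "C"}, {"A", "B"})

print(true_rate, observed_rate)          # conservation: true vs observed
print(1 - true_rate, 1 - observed_rate)  # loss: true vs observed
```

In this toy case the later duplication of a neighbor adds an unshared partner, so the observed conservation rate is lower than the true rate even though the paralogous pair itself lost nothing further.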
