The author has declared that no competing interests exist.

Scale-free networks are generically defined by a power-law distribution of node connectivities. Vastly different graph topologies fit this law, ranging from the assortative, with frequent similar-degree node connections, to a modular structure. Using a metric to determine the extent of modularity, we examined the yeast protein network and found it to be significantly self-dissimilar. By orthologous node categorization, we established the evolutionary trend in the network, from an “emerging” assortative network to a present-day modular topology. The evolving topology fits a generic connectivity distribution but with a progressive enrichment in intramodule hubs that avoid each other. Primeval tolerance to random node failure is shown to evolve toward resilience to hub failure, thus removing the fragility often ascribed to scale-free networks. This trend is algorithmically reproduced by adopting a connectivity accretion law that disfavors like-degree connections for large-degree nodes. The selective advantage of this trend relates to the need to prevent a failed hub from inducing failure in an adjacent hub. The molecular basis for the evolutionary trend is likely rooted in the high-entropy penalty entailed in the association of two intramodular hubs.

Scale-free networks have been proposed as universal models to describe diverse complex systems such as the Internet, social interactions, and metabolic and proteomic networks […]. They are generically defined by a power-law connectivity distribution, A(n) ∝ n^{−γ}, where A(n) is the abundance of n-degree nodes and γ is the scaling exponent.
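As a rough illustration of the power-law relation A(n) ∝ n^{−γ}, the sketch below tabulates the abundance of n-degree nodes and estimates γ from the slope of a log–log least-squares fit. The function names are hypothetical, and the fitting procedure is an assumption made for illustration; the text does not state how the scaling exponent was estimated.

```python
from collections import Counter

import numpy as np


def degree_abundance(degrees):
    """Return the distinct positive degrees n and their abundances A(n)."""
    counts = Counter(d for d in degrees if d > 0)
    n = np.array(sorted(counts), dtype=float)
    abundance = np.array([counts[int(k)] for k in n], dtype=float)
    return n, abundance


def fit_gamma(n, abundance):
    """Estimate gamma as minus the slope of log A(n) versus log n.

    Assumption: a plain least-squares fit in log-log scale, used here
    only as an illustrative estimator.
    """
    slope, _intercept = np.polyfit(np.log(n), np.log(abundance), 1)
    return -slope


# Synthetic check: an exact power law with gamma = 2.5 is recovered.
n_demo = np.arange(1, 50, dtype=float)
gamma_demo = fit_gamma(n_demo, 1000.0 * n_demo ** -2.5)
```

Least squares on binned log–log data is simple but can be biased for noisy tails, so the recovered exponent should be read as indicative only.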

To determine the graph topology of the yeast protein network [

We found that the present-day network is actually a self-dissimilar graph, most often linking nodes of dissimilar degrees, thus revealing a marked avoidance of intramodular hub connections in accordance with previous observations [

The robustness of the present-day network is found to differ from typical scale-free attributes, since it minimizes its vulnerability to hub failure and not to random node failure [

The metric S(G) (0 ≤ S(G) ≤ 1) for a graph G with scale-free degree distribution is defined by […] S(G) = s(G)/s_{max}(G), with s(G) = Σ_{(i,j)∈E(G)} X_{i}X_{j}, where E(G) is the set of edges of G, X_{i}, X_{j} are the respective node degrees (connectivities), and s_{max}(G) is the maximum over all s(H)-values, where H is a graph with the same connectivity distribution as G obtained by connectivity rewiring. This distribution-preserving rewiring is constructed following [

For a given scaling degree distribution, the metric is informative of the graph structure, reaching its maximum value (S(G) = 1) in the case where edges are most frequently connecting similar-degree nodes and decreases as the frequency of dissimilar-degree connections increases [
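The behavior of the metric can be checked on a toy example. The sketch below assumes the usual form of the unnormalized metric, s(G) = Σ X_{i}X_{j} summed over the edges (i, j) of G, which is consistent with the description of X_{i}, X_{j} and s_{max}(G) in the text but is stated here as an assumption. The two toy graphs and all names are hypothetical; they share the degree sequence (3, 3, 1, 1, 1, 1, 1, 1) and are its only simple realizations, so s_{max} can be read off directly.

```python
from collections import Counter


def degrees_from_edges(edges):
    """Count the degree X_i of every node appearing in an edge list."""
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return degree


def s_value(edges):
    """Unnormalized metric: sum of X_i * X_j over all edges (i, j)."""
    degree = degrees_from_edges(edges)
    return sum(degree[i] * degree[j] for i, j in edges)


# Two hubs (degree 3) that are directly connected ...
hub_linked = [("h1", "h2"), ("h1", "a"), ("h1", "b"),
              ("h2", "c"), ("h2", "d"), ("e", "f")]
# ... versus the same degree sequence with the hubs avoiding each other.
hub_avoiding = [("h1", "a"), ("h1", "b"), ("h1", "e"),
                ("h2", "c"), ("h2", "d"), ("h2", "f")]

s_max = max(s_value(hub_linked), s_value(hub_avoiding))
S_linked = s_value(hub_linked) / s_max      # like-degree hub connection
S_avoiding = s_value(hub_avoiding) / s_max  # hubs avoid each other
```

Linking the two hubs gives the largest s-value, and separating them lowers the normalized metric, which is the direction of change the text associates with dissimilar-degree connections. For realistic networks s_{max} cannot be enumerated this way and has to be obtained by the rewiring construction the text refers to.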

Using this metric, we determined the modularity along the natural evolution of the yeast protein interaction network. Node ancestry classes are defined through orthologous representativity in other genomes informative of the yeast evolution (

The trimming of the present-day network following the schedule imposed by ancestry is based on the assumption that a gene arising at a certain point in evolutionary time in an ancestral organism will be detectable in all species diverging thereafter. The ancestry of a yeast protein is thus defined by the number of orthologous ORFs [
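The trimming schedule can be sketched as follows, assuming each node carries one of the five ancestry labels used in the figure legends (00001 for the youngest class through 11111 for the ancient class) and that an ancestral network keeps only the edges whose two endpoints both belong to the retained classes. The function name, the input layout, and the protein identifiers are hypothetical.

```python
# Ancestry classes ordered from youngest to most ancient, as in the legends.
ANCESTRY_ORDER = ["00001", "00011", "00111", "01111", "11111"]


def ancestral_networks(edges, node_class):
    """Return, for each network label, the edges that survive trimming.

    The network labeled ANCESTRY_ORDER[k] keeps the nodes of that class
    and of all older classes; younger classes are removed.
    """
    networks = {}
    for k, label in enumerate(ANCESTRY_ORDER):
        kept = set(ANCESTRY_ORDER[k:])
        networks[label] = [
            (i, j) for i, j in edges
            if node_class[i] in kept and node_class[j] in kept
        ]
    return networks


# Hypothetical toy interactome with four proteins.
toy_class = {"p1": "11111", "p2": "11111", "p3": "00011", "p4": "00001"}
toy_edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p4")]
toy_networks = ancestral_networks(toy_edges, toy_class)
```

In this toy case the present-day network (00001) keeps all three edges, while the ancient network (11111) retains only the p1–p2 interaction.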

The present-day and ancestral networks all fit the scale-free connectivity scaling (

(A) Scale-free generic law fitting the present-day and ancestral yeast protein interaction network. Abundance of nodes as a function of connectivity (node degree) in log–log scale for the present-day yeast network 00001 (filled diamonds); the fungal ancestral network 00011 (open squares); the eukaryotic ancestral network 00111 (filled circles); the eubacterial ancestor 01111 (open triangles); and the ancient network 11111 (filled squares).

(B) Scale-free metric S(G) (blue line plot) indicating the actual graph modularity of the present-day and ancestral networks. Present-day data was cross-validated with the APID database and filtered through iPfam representativity (

(C) Percentage removal of nodes with each orthologous trimming iteration. Nodes are grouped in present-day connectivity classes. Node removal is indicated for removal of class (00001) (black), (00011) (light blue), (00111) (red), and (01111) (lilac—light purple). The nodes retained after the final iteration amount to 3.5% of the present-day proteome size.

(D) Evolutionary trend toward higher modularity in yeast network (blue line) contrasted with topological evolution of randomly rewired versions of the present-day network (magenta plots). Random rewiring is of two types: degree-preserving and fully random (thick line).

(E) Topological evolution of the yeast network characterized by Newman's modularity parameter Q.

There are 319 nodes with a present-day degree X > 8 incorporated along the evolution of the network that starts at the ancient network (cf. [

We tested the sensitivity of the results to persistent noise in interactomic data (see

The dynamics of node removal associated to the evolutionary regression is indicated in

The trend toward increasing modularity associated with evolutionary change was further validated by disproving the null hypothesis that this trend holds irrespective of network topology. Thus, in several computer experiments (cf. [
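A degree-preserving randomization of the kind used as a null model can be sketched with repeated double-edge swaps. This is a generic construction offered under stated assumptions, not necessarily the rewiring procedure used in the computer experiments; the function name and parameters are hypothetical, and the input is assumed to be a simple graph (no self-loops, no duplicate edges).

```python
import random


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomize a simple undirected graph while keeping every node degree.

    Each accepted move replaces edges (a, b) and (c, d) by (a, d) and (c, b),
    rejecting moves that would create a self-loop or a duplicate edge.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = 0
    attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i1, i2 = rng.sample(range(len(edges)), 2)
        a, b = edges[i1]
        c, d = edges[i2]
        if rng.random() < 0.5:  # pick one of the two possible re-pairings
            c, d = d, c
        if a == d or c == b:  # would create a self-loop
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present or new1 == new2:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i1], edges[i2] = (a, d), (c, b)
        done += 1
    return edges


ring_with_chords = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3), (1, 4)]
rewired = degree_preserving_rewire(ring_with_chords, n_swaps=20, seed=1)
```

Because every swap removes and adds exactly one edge end at each of the four nodes involved, the connectivity distribution of the rewired graph is identical to that of the input.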

An alternative indicator of modularity put forth by Newman […] is based on the dominant eigenvalue, β_{1}, of the symmetric matrix B with entries B_{ij} = A_{ij} − X_{i}X_{j}/2m, where A is the adjacency matrix (A_{ij} = 1 if nodes i and j are connected, A_{ij} = 0 otherwise) and m = ½Σ_{j}X_{j} is the total number of edges in the network. The dominant module M℘ is univocally defined by the characteristic function χ_{M℘}(j) = ½(s_{j}(u_{1})+1), where u_{1} is the eigenvector of β_{1}, and s_{j}(u_{1}) = 1 if the j-th coordinate of u_{1} is positive and s_{j}(u_{1}) = −1 otherwise. In set-theory notation: χ_{M℘}^{−1}({1}) = M℘. This constructive procedure reveals the most densely connected group of nodes with only sparser connections to the rest of the graph and may be further iterated on G\M℘, etc., until a full modular partition of G is achieved. A similar definition of the module is provided in [

A modularity parameter Q is then defined as an indicator of the number of nodes falling within modules minus the expected number for a random rewiring of the network, normalized to the total number of nodes in the network. Thus, Q is given by:
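Q = (1/4m) Σ_{n} (u_{n}^{T}·s)^{2} β_{n}, where u_{n}^{T} is the transposed eigenvector of β_{n}, the n-th eigenvalue of Newman's modularity matrix, and s = (s_{j}(u_{1})) is the vector of signs of the coordinates of the dominant eigenvector u_{1}.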
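The spectral construction can be sketched numerically. The code below assumes the standard form of Newman's modularity matrix, B_{ij} = A_{ij} − X_{i}X_{j}/2m, and the spectral expression Q = (1/4m) Σ_{n} (u_{n}^{T}·s)^{2} β_{n}; the symbols B, β and u, as well as the function name, are labels chosen for this illustration. The toy graph (two triangles joined by one edge) is hypothetical.

```python
import numpy as np


def leading_eigenvector_split(adjacency):
    """Dominant module and modularity Q from the leading eigenvector of B."""
    A = np.asarray(adjacency, dtype=float)
    k = A.sum(axis=1)                   # node degrees X_j
    m = k.sum() / 2.0                   # total number of edges
    B = A - np.outer(k, k) / (2.0 * m)  # assumed modularity matrix
    beta, U = np.linalg.eigh(B)         # ascending eigenvalues, eigenvectors in columns
    u1 = U[:, -1]                       # eigenvector of the dominant eigenvalue
    s = np.where(u1 > 0, 1.0, -1.0)     # s_j = +1 for positive coordinates, else -1
    module = np.flatnonzero(s > 0)      # nodes with characteristic function equal to 1
    q = float(np.sum((U.T @ s) ** 2 * beta) / (4.0 * m))
    return module, q


# Two triangles (0, 1, 2) and (3, 4, 5) bridged by the edge 2-3.
A_demo = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A_demo[i, j] = A_demo[j, i] = 1.0

module_demo, q_demo = leading_eigenvector_split(A_demo)
```

The overall sign of an eigenvector is arbitrary, so which of the two triangles is reported as the dominant module depends on the linear-algebra backend; the value of Q is unaffected by that choice.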

The trend toward increasing modularity associated with evolutionary change in the yeast network evolution is then verified adopting the Q-measure, as shown in

The topological differentiation resulting from connectivity accretion concurrent with progressive incorporation of node classes in the order (11111) → (01111) → (00111) → (00011) → (00001) may be algorithmically reproduced. Thus, the primeval network of ancient nodes–proteins may be abstractly developed, i.e., without reference to concrete molecular features of the node, in a manner entirely consistent with the S(G) behavior shown in

The algorithmic behavior of network evolution is determined by the probability P(X_{n}) = G(n)p(X_{n}) that node n with degree X_{n} would acquire a new connection. The p-factor is associated with the rate of connectivity development, while G penalizes like-degree connections that would increase assortativity. The p-factor relates to a preferential attachment law [

Two accretion laws have been investigated. While heuristic in nature, their accurate reproduction of the evolving network topology makes them worthy of examination:

Both laws have optimized parameters (

Natural evolution (black) of the protein network compared with algorithmic network developments (red, blue, green) starting with the network of ancient proteins (node class (11111)). Three algorithmic network developments were computed, following preferential attachment (blue), and laws of connectivity accretion (I in red, II in green) subject to penalization for connection accretion within node kinships.

To prevent similar-degree node connections, nodes are “tagged for kinship” at every stage of network propagation taking into account the order assigned at that stage. This order is obtained by preserving the order arbitrarily assigned in the primeval network while incorporating new nodes in consecutive order.

To define the accretion rules algorithmically, let n_{1} < n_{2} < … be an ordered set of nodes at a specific time in the network development; let G_{n} denote the n-centered subgraph, that is, a subgraph containing node n, all nodes connected to n, and the connecting edges; let C(n) = {nodes connected to n}; and let {G_{n}} be a minimal covering of G satisfying G = ∪_{n}G_{n}. Then, we may define ξ_{n} = Minimum_{n′∈C(n)} |X_{n} − X_{n′}|. Node n is “tagged for kinship” with probability exp(−ξ_{n}) provided no node n′ ∈ C(n) with n′ < n has been tagged for kinship. A node n tagged for kinship at a particular stage of network development is assigned the kinship penalty factor

In case of close kinship (ξ_{n} = 0), we get G(n) = 0. The creation of an internal connection linking node n with another node already tagged to develop a connection is governed by probability
where L_{n} = Maximum_{n′∈A(G)} |X_{n} − X_{n′}|, and A(G) is the set of nodes tagged to develop a connection at the particular stage of network development. If node n is tagged to develop a connection, and an internal connection develops, then the new edge connects n to an existing node n*, with the latter satisfying n* ∈ A(G) and |X_{n} − X_{n*}| = L_{n}.
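The tagging and partner-selection rules above can be sketched as follows. The sketch covers only the parts that are fully specified in the text: ξ_{n}, tagging with probability exp(−ξ_{n}) subject to the ordering constraint, and the choice of the partner n* that realizes L_{n}. The penalty factor and the connection probability are not reproduced. Function names and data layout (dictionaries keyed by node) are hypothetical.

```python
import math
import random


def kinship_distance(node, neighbors, degree):
    """xi_n: smallest degree gap between node n and the nodes in C(n)."""
    gaps = (abs(degree[node] - degree[v]) for v in neighbors[node])
    return min(gaps, default=math.inf)  # isolated node: never tagged


def tag_for_kinship(order, neighbors, degree, rng=None):
    """Tag nodes for kinship, visiting them in their assigned order."""
    rng = rng or random.Random(0)
    tagged = set()
    for n in order:
        # Nodes are visited in order, so any neighbor already tagged
        # necessarily precedes n, which is the blocking condition.
        if any(v in tagged for v in neighbors[n]):
            continue
        xi = kinship_distance(n, neighbors, degree)
        if rng.random() < math.exp(-xi):
            tagged.add(n)
    return tagged


def most_dissimilar_partner(node, active, degree):
    """Return (n*, L_n): the node of A(G) with the largest degree gap to n."""
    candidates = [v for v in active if v != node]
    n_star = max(candidates, key=lambda v: abs(degree[node] - degree[v]))
    return n_star, abs(degree[node] - degree[n_star])


# Triangle: all degrees equal, so xi = 0 and the first node is tagged for certain.
tri_neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
tri_degree = {0: 2, 1: 2, 2: 2}
tri_tagged = tag_for_kinship([0, 1, 2], tri_neighbors, tri_degree)
```

In the triangle every node has a like-degree neighbor, so node 0 is tagged with probability 1 and then blocks nodes 1 and 2, illustrating how the ordering prevents adjacent nodes from both being tagged.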

The algorithmic network development that best fits natural evolution (

What sort of selective advantage is associated with evolving toward higher self-dissimilarity or dis-assortativity? We shall show that this trend increases resilience to node failure which is not random, contrary to general assumption [

Soluble proteins with high levels of backbone exposure are prone to aberrant aggregation [

(A) Extent of backbone exposure in yeast proteins from present-day class (11111) correlates with node connectivity. Backbone exposure is given as percentage of full contour length of the protein (

(B) Distribution probability of connections between yeast proteins in present-day class (11111) either reported in PDB or natively ordered (

(C) Connections between yeast PDB proteins in present-day class (11111) with backbone exposure levels Y and Y′. Each connection is represented as a point in the Y-Y′ plane, revealing that backbone exposures are significantly anticorrelated across protein–protein interactions.
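One simple way to quantify the anticorrelation seen in the Y-Y′ plane is a Pearson coefficient over all connections, counting each undirected connection in both orientations so that the result does not depend on which partner is called Y. This is an illustrative assumption; the statistic used in the original analysis is not stated in the text. The function name, protein labels, and exposure values below are hypothetical.

```python
import numpy as np


def exposure_correlation(edges, exposure):
    """Pearson correlation of backbone exposures (Y, Y') across connections.

    Each undirected connection contributes the points (Y_i, Y_j) and (Y_j, Y_i).
    """
    y = [exposure[i] for i, j in edges] + [exposure[j] for i, j in edges]
    y_prime = [exposure[j] for i, j in edges] + [exposure[i] for i, j in edges]
    return float(np.corrcoef(y, y_prime)[0, 1])


# Hypothetical exposures (% of contour length): highly exposed proteins
# bind only weakly exposed partners.
toy_exposure = {"a": 10.0, "b": 60.0, "c": 15.0, "d": 55.0}
toy_connections = [("a", "b"), ("c", "d"), ("a", "d"), ("c", "b")]
r_demo = exposure_correlation(toy_connections, toy_exposure)
```

A negative coefficient indicates that highly exposed proteins tend to bind weakly exposed partners.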

This finding prompts us to ask the question: Why would the avoidance of hub–hub connections bring about resilience to hub failure? Since hubs are characterized by their extent of backbone exposure, they are highly reliant on binding partnerships to preserve their structural integrity [

Thus, we showed that, unlike robustness to random failure, present-day resilience to hub failure is a non-emergent evolutionary trend achieved by enhancing the dis-assortativity of the graph under the generic scale-free degree distribution (

The lower level of connectivity among nodes of similar degree in the present-day network [

Induced fit entails a considerable entropic cost associated with the structural adaptation, decreasing the stability of the protein complexes [

To extend the validity of the anticorrelation to the full class (11111), we also adopted a sequence-based predictor of backbone exposure, taking advantage of a tight correlation [

Connections from the full yeast interactome plotted in the Y-Y′ plane of sequence-based predicted backbone exposures (

Using a metric to quantify the extent of modularity, we examined the evolution of the yeast protein network and found significant topological differences along evolutionary time that reflect a considerable increase in modularity concurrent with evolutionary change. Thus, aided by orthologous node categorization to trace network evolution [

The molecular basis for the evolutionary trend toward higher modularity is rooted in the high entropic cost of the reciprocal induced fits arising from the association of any two intramodular hubs, an event likely to entail structural adaptation in both proteins. Thus, the avoidance of connections between like-degree nodes of high connectivity is directly related to the extent of backbone exposure and conformational plasticity of hubs, which makes it entropically costly for them to adapt to binding partners.

This molecular justification of modularity may be complemented by an evolutionary observation. As shown in [

In an alternative molecular approach [

Lacking expression, localization, and developmental coordinates, the protein interaction network provides an incomplete large-scale description of protein–protein associations. A complete description would likely require integration of the interactome and the transcriptome. Thus, the avoidance of like-degree hub connections shown in this work may often materialize in a lack of spatial or temporal correlation between the nodes, a subject of forthcoming work.

Ancestors of the present-day yeast network were obtained by progressive trimming realized through exclusion of node ancestry classes [

Orthologous classification and grouping of the annotated yeast ORFs (

Backbone exposure for node n, denoted Y_{n}, is given as a percentage of contour length of the protein corresponding to under-protected residues, as defined below. The data were obtained from 488 yeast proteins (out of 6,199) reported in PDB complexes and four natively disordered yeast proteins [

We adopt an established relationship between backbone exposure, η, and a structural parameter, λ_{D}, that can be reliably determined from sequence: the propensity for inherent structural disorder in a region of a protein domain […]. A value λ_{D} (0 ≤ λ_{D} ≤ 1) is assigned to each residue within a sliding window. This value represents the predicted propensity of the residue to be in a disordered region (λ_{D} = 1 indicates full certainty). Only 6% of >1,100 nonhomologous PDB proteins give false positive predictions of disorder […] λ_{D} > 0.35. The correlation implies that the propensity to adopt a natively disordered state becomes pronounced for proteins that, because of their chain composition, cannot fulfill a minimal protection of their backbone hydrogen bonds.

Supplementary results.

(121 KB PDF)

The SwissProt (

ORF, open reading frame