Mean Field Analysis of Algorithms for Scale-free Networks in Molecular Biology

The sampling of scale-free networks in Molecular Biology is usually achieved by growing networks from a seed using recursive algorithms with elementary moves which include the addition and deletion of nodes and bonds. These algorithms include the Barabasi-Albert algorithm. Later algorithms, such as the Duplication-Divergence algorithm, the Sol\'e algorithm and the iSite algorithm, were inspired by biological processes underlying the evolution of protein networks, and the networks they produce differ essentially from networks grown by the Barabasi-Albert algorithm. In this paper the mean field analysis of these algorithms is reconsidered, and extended to variant and modified implementations of the algorithms. The degree sequences of scale-free networks decay according to a powerlaw distribution, namely $P(k) \sim k^{-\gamma}$, where $\gamma$ is a scaling exponent. We derive mean field expressions for $\gamma$, and test these by numerical simulations. Generally, good agreement is obtained. We also found that some algorithms do not produce scale-free networks (for example some variant Barabasi-Albert and Sol\'e networks).


Introduction
Many systems in nature and society are described by means of complex networks [9]. Some of these systems include the cell [18], chemical reactions [16], the world wide web [6], social interactions [7], etc. It is generally found that many system, though different in nature, produce networks which are scale-free and exhibit similar properties [2,3].
The main property of scale-free networks is that their degree distribution decays as a power law [2,4] -this shows that there is no characteristic scale for the degrees, which is why the networks are called scale-free. The average degree of a scale-free network offers little insight into the real topology of the network [3] since most nodes have degrees which are far away from the average degree of the network. Nodes of high degree are called hubs and though small in number for realistic networks, they are over-represented compared to the number of hubs in random networks. These hubs play an important role in dynamical processes which occur in scale-free networks.
Scale-free networks also exhibit an unexpected degree of robustness -this is the property that such networks maintain their dynamic properties even when many nodes and bonds fail to transmit signals (suffer high failure rates) [9]. However, these networks remain vulnerable to failure of hub nodes, since these nodes play a significant role in maintaining the network's connectivity.
In this paper the mean field approach to the analysis of algorithms for sampling scale-free networks inspired by processes in molecular biology is presented. In addition, numerical testing and, in some cases, verification, of the mean field approach will be examined. The focus will be on four widely used and discussed algorithms in the literature, nameley, the Barabasi-Albert algorithm [4,5], the Duplication-Divergence algorithm [26,27], the Solé algorithm [24] and the iSite algorithm [14,15].
The Duplication-Divergence, Solé and iSite algorithms are inspired by modelling networks in biological models of protein-protein interaction evolution, and all these algorithms are based in one way or another on two ideas: growth by preferential attachment [11], and growth and changes (mutations) in networks induced by the duplication, deletion or replacement of nodes or bonds (these are elementary moves which mutate the network by adding, deleting or moving some of its bonds or nodes).
Growth by preferential attachment is implemented by adding bonds preferentially to nodes of high degree. This increases the probability that a node will grow to be a hub in the network, and the resulting network has an increased probability that it will contain hubs [4]. The Barabasi-Albert algorithm uses preferential attachment to grow scale-free networks by attaching bonds to nodes with a probability which is proportional to the degrees of nodes [2]. A mean field analysis of the Barabasi-Albert algorithm was done in reference [5].
The Duplication-Divergence algorithm [26,27] generates scale-free networks by implementing elementary moves which mutate and grow the network. These are duplication (the duplication of existing nodes and bonds) and divergence (local changes made to existing bonds and nodes) elementary moves. These moves model processes which are thought to underlie the evolutionary mechanisms by which protein interaction networks evolve [23,26,27]: The duplication of genes is a mechanism which generates genes coding for new proteins during evolution and the divergence step is a model for that Solé networks are rich in bonds.
Finally, the iSite algorithm is presented and examined developing a mean field approach to determine its scaling properties. The algorithm is also modified, and the resulting mean field results are tested numerically.
The paper is completed in the conclusion section, where our main results are briefly considered and reviewed.

Scale-free networks
Scale-free networks of order n are characterised by degree sequences {d k } which follow a power law distribution (where d k is the number of nodes of degree k and 1 n d k is the fraction of nodes of degree k).
If d k is the average degree distribution, then 1 n d k is proportional to the probability P (k) that a node has degree k. In scale-free networks, the probability P (k) decays like a powerlaw with exponent γ: Here, γ is the scale-free network exponent. The constant C o is a normalisation constant given by As n → ∞, it is necessary that γ > 1 for P (k) to be summable (and C o < ∞). In this case C o converges to a constant as n → ∞. Thus, if γ > 1 then the network is said to be integrable with scaling exponent γ (in this event equation (1) is the scaling of the limiting degree distribution with C o > 0 finite and P (k) → 0 as k → ∞). The case that γ = 1 gives rise to a logarithmic correction. Since n k=1 k −1 ∼ log n, this gives the distribution P (k) ∼ 1 log n k −1 for networks of (large) order n. This network is said to be not integrable, but for asymptotic values and fixed values of n the decay of P (k) will appear to be proportional to k −1 .
Since P (k) is the probability that a node in a network has degree k, the average degree sequence { d k n } over randomly generated networks of order n is given approximately by d k ∼ n P (k), for n large. It is not known that the degree sequence is self-averaging (that is, that the degree sequence {d k } has distribution d k ∼ nP (k) as n → ∞ for a single randomly generated scale-free network).
This powerlaw decay of degree sequences shows that nodes of large degree (that is, for large k) are more common in scale-free networks (compared to randomly generated networks, where they are exponentially rare). These nodes of large degree are called hubs. A precise definition of a hub in a network is somewhat arbitrary, but for the purpose of this paper, a "hub" in a network of order n is defined as a node of degree exceeding √ n .
The exponent γ can be estimated from numerical data by computing the average degree sequence { d k } and then plotting log P (k)/ log k against 1/ log k (for networks of order n k). Extrapolating the data to k = ∞ using a linear or a quadratic regression gives the value of γ as the y -intercept of the graph. This method works well if P (k) scales with k as in equation (1). However, strong corrections to the powerlaw behaviour may make the extrapolation difficult or inaccurate.
A second method to estimate γ is to note that if γ > 1, then for a fixed value of α > 0, Experimentation with numerical data shows that by plotting ζ(k) against 1 k log k good results are obtained, and linear or quadratic regressions of ζ(k) against 1 k log k can be used to estimate γ.
If it is assumed that P (k) is well approximated by equation (1) for all k ≥ 1, then the average connectivity of a network of order n with average degree distribution proportional to P (k) = C o n −γ is given by Observe that the asymptotic estimate is very poor if γ ≈ 2, and if n is small. The cases γ = 1 and γ = 2 can also be determined; this gives The coefficient γ−1 γ−2 may be modifed if P (k) is not well approximated by the powerlaw decay for smaller values of k in equation (1). These results, however, do show that the connectivity is a constant independent of n (for large n) if γ > 2.
The expected number of bonds in the network is given by E n = 1 2 n k n . Assuming the powerlaw relation in equation (1), it follows that Of course, if γ < 1, then E n = Θ(n 2 ) and since a complete graph has 1 2 n(n − 1) bonds, this implies that these graphs are dense in the sense that lim inf n→∞ 1 n 2 E n > 0. For all values of γ ≥ 1 the above shows that lim sup n→∞ 1 n 2 E n = 0, and the graphs are sparse. These results are useful in examining numerical data for scale-free networks. For example, γ can be estimated by examining degree sequences averaged over randomly sampled networks (from equation (1)), or alternatively by using equation (4). The connectivity k n approaches a constant if γ > 2 (as in equation (5)) or grows as a powerlaw with n if γ < 2, and with logarithmic corrections if γ = 1 or γ = 2 (as in equation (6)). Alternatively, the average size E n (the number of bonds in a network of order n) can be considered, using the results in equation (7).

Barabasi-Albert networks and the Barabasi-Albert algorithm
The Barabasi-Albert algorithm is a recursive algorithm which grows networks (or clusters of nodes and bonds) from a seed node. This algorithm was introduced in reference [4] and reviewed in 2002 in a seminal paper [2], and its elementary move was inspired by processes underlying the (presumed) evolution of scale-free networks seen in the physical world. The elementary move is a preferential attachment of new nodes (and bonds) to hubs (nodes of high degree) in the network. The algorithm is initiated by a single node, and then new nodes and bonds are recursively attached, with new bonds preferentially attached to existing nodes of large degree.
A Barabasi-Albert network of order N nodes is grown as follows: Barabasi-Albert algorithm: 1. Initiate the network with one node x 0 ; 2. Suppose that the network consists of nodes {x 0 , x 1 , . . . , x n−1 } of degrees {k 0 , k 1 , . . . , k n−1 }; 3. Append a new node x n by executing step (a) or step (b): (a) With probability p: Select x j uniformly and attach x n to it by inserting the bond x j ∼x n ; (b) With default probability 1−p: Attach x n by adding bonds x j ∼x n independently with probability k j j k j ; 4. Repeat step 3 until a network of order N is grown.
Step 3(a) is a random attachment of a node and bond, and step 3(b) attaches a node with bonds preferentially to existing nodes of high degree. The algorithm has a single parameter p. If p = 1 then the algorithm grows acyclic (and connected) networks of order N (these are random trees).
On the other hand, if p = 0, then step 3(b) is executed on each iteration. New bonds are created with probabilities q j = k j j k j for j = 0, 1, . . . , n − 1 when the n-th node is added. This shows that the expected number of bonds added in this step is on average j q j = 1. That is, on average 1 bond is added in each iteration, and the average sum of degrees j k j should be equal to 2n by handshaking after n iterations. This suggests that the algorithm grows a sparse graph with increasing n. However, since bonds are appended preferentially on growing hubs, the largest clusters in the network should be dominated by growing hubs.
For values of p ∈ (0, 1) the algorithm adds either (wih probability p) a single bond randomly, or it adds a collection of bonds (on average one bond) preferentially. This grows simple networks of order N and size N − 1, typically not connected unless acyclic. In figure 1 an example of a Barbasi-Albert network of order 122 with p = 0 is shown (left) and the right is a network of size 380. The appearance of hubs in these networks is clearly seen: In the network on the left there are 5 nodes of degrees exceeding √ 122, the largest of degree 31, and in the network on the right there are 3 hubs of degrees exceeding √ 380, the largest of degree 63.

Modified Barbasi-Albert networks
Barabasi-Albert networks are relatively sparse networks. A modification of the algorithm can be introduced to grow denser networks. For example, one may replace step 3(b) by 3(b). With default probability 1 − p: Attach x n by adding bonds x j ∼x n with probability q j = min{ λ k j +A j k j , 1} (where λ and A are non-negative parameters of the algorithm); Since k j j k j in Barabasi-Albert networks, one may assume that λk j +A ≤ j k j for values of λ and A which are not too large (and so q j ≤ 1). In

Variant Barbasi-Albert networks
A variant Barbasi-Albert algorithm can be introduced by changing step 3(b) in the Barbasi-Albert algorithm to 3(b). With default probability 1 − p: Attach x n by adding bonds x j ∼x n with probability q j = min{ k α j +A j k j , 1}, (where α and A are non-negative parameters of the algorithm); The effect of the parameter α is to increase the probability of adding bonds to the hubs of the network if α > 1, and to decrease this probability if α < 1. In the case that α > 1 networks dominated by a single very large hub are obtained (see figure 3 (right network)), while networks with α < 1 are more sparse and not dominated by a few hubs (see figure 3 (left network)). The left network in figure 3 was grown by putting α = 0.15 and A = 0 and has order 327. None of the nodes in this network has degree which exceeds √ 327, and so none qualify as hubs. A denser network is obtained if α = 1. 15   The network on the left was grown using α = 0.15 and A = 0 to a total to n = 327 nodes. This graph is very sparse, and none of its nodes qualify as hubs. The network on the right was grown to order n = 351 with α = 1.15 and A = 0. This is a dense network with several nodes qualifying as hubs of degrees {22, 24, 26, 42, 43, 116}. The arrangement of nodes and bonds in these networks was created using the prefuse force directed lay-out in Cytoscape 3.4.0 [1].

Mean field theory for Modified Barabasi-Albert networks
Let k j (n) be the degree of node j after n iterations. A mean field calculation of k j (n) is done by assuming that k j (n) is equal to its expected value for each n; that is, k j (n) = k j (n) for each j and n.
The modified Barabasi-Albert algorithm appends bonds to a network of order n as follows: Step 3(a) is executed with probability p, and a bond (and the (n + 1)-th node) is appended with uniform probability on one of the n existing nodes. The probability that node j gets a bond in this way is p n and on average one bond is attached with probability p.
If step 3(b) is done instead, then the expected number of bonds added in the mean field is approximately j λ k j (n)+A j k j (n) = λ + nA j k j (n) . The total number of bonds in the network is by handshaking. Thus, the increment in the number of bonds when the next node is appended is Approximate this by a differential equation This can be solved to obtain where C is a function of (p, λ, A) defined by this expression. Notice that E n grows linearly in n, so that Barabasi-Albert graphs will be necessarily sparse as n → ∞ (and by equation (7) the scaling exponent is γ > 2). With each iteration the mean field value of k j (n) (the degree of the j-th node after n iterations) increments by since 2E n = j k j (n) = 2C n, and since the probabilty of adding a bond to node j is λk j (n)+A j k j (n) . This can again be approximated by a differential equation: Take n → t, a continuous time variable, and let k j (n) → k j (t), the continuous mean field degree of node j. Then The initial condition is to assume that node j is added at time t j . Putting A = 0 and λ = 1 gives C = 1 and the equation (1−p)k j (t) 2t (14) which was also derived in reference [5]. In this event the solution is k j (t) = 1+p 1−p (t/t j ) (1−p)/2 − 2p 1−p (assuming the initial condition k j (t j ) = 1). More generally, equation (13) can be cast in the general form where using again the intial condition k j (t j ) = 1. The mean field degree distribution can be determined from this solution. The probability that node j has degree k j (t) smaller than κ at time t is denoted by this is also the probability P (t j /t) > Q/P +κ 1+Q/P −1/P . If the node t j is chosen uniformly from the n available, then The mean field degree distribution is the derivative of this to κ: For large κ this shows that the modified Barbasi-Albert network is scale-free with exponent Putting A = 0 gives the exponent This is the mean field exponent of a modified Barabasi-Albert network. For small λ < 1 the exponent is large, indicating a network with few nodes (if any) of high degree. For large λ > 1, γ 3 + . This is a lower bound on γ for modified Barabasi-Albert networks. If λ = 1, then the exponent γ is given by In this model one similarly finds that γ ≥ 3, and in fact, if p = 0, then γ = 2 + √ 1 + 2A. The parameter A may be used to tune the exponent γ for any given p.
The connectivity of modified Barabasi-Albert networks is given by (5) gives k n 1 1−P . Inserting the value of P gives the result above as well. In the figure 4 the probability P (k), that the degree of a Barabasi-Albert network is equal to k, is examined by plotting log P (k)/ log(k + 1) against 1/ log(k + 1) where P (k) was estimated for values n ∈ {6250, 12500, 25000, 50000, 100000, 200000} and for p = 0. The curves should intersect the vertical axis at −γ. Least squares fit of the data to quadratic curves gives 6 estimates for γ, which average to γ = 3.026 ± 0.076, very close to the theoretical value γ = 3 from equation (20) (for p = 0 and λ = 1).
Data collected for the same values of n and for p = 0.5 cannot be successfully analysed by regressions with quadratic curves, but cubic curves give the average value γ = 5.161 ± 0.068, which are not equal to but still fairly well approximated by γ = 5 prediced by equation (20) for p = 0.5 and γ = 1.
When p = 0.8 the plots are strongly curved and extrapolation to estimate γ is more difficult. In this case a different approach is needed. Putting α = 1 2 in equation (4) gives so that a plot of ζ(k) = (log P (k)−log P ( 1 2 k))/ log 2 → −γ as k → ∞. That is, plotting ζ(k) against 1 k gives a curve with y -intercept equal to −γ. Better results are obtained when plotting against 1 k log k. In this case a linear extrapolation gives γ = 11.67 ± 0.41 Figure 4: Scaling of Barabasi-Albert networks with p = 0: Data on networks generated by the Barabasi-Albert algorithm with p = 0. In each case 100 networks were grown and the average degree sequence P n (k) computed. The curves above are plots of log P n (k)/ log(k + 1) against 1/ log(k + 1) for n ∈ {6250, 12500, 25000, · · · , 200000}. Least squares fit to the data using a quadratic model gives the y -intercepts which averages to 3.026. This is very close to the value γ = 3 predicted for the scaling exponent in this model by the mean field approach. and a quadratic extrapolation gives γ = 11.6 ± 2.6. These results are close to the mean field prediction γ = 11 for p = 0.8. Incidently, if p = 0.5 then this kind of analysis show that γ = 5.47 ± 0.14 (linear extrapolation) or γ = 4.4 ± 1.0 (quadratic extrapolation), and if p = 0, then the results are γ = 3.088±0.022 (linear extrapolation) and γ = 2.86 ± 0.18 (quadratic extrapolation). If λ = 2 and p = A = 0 then the algorithm grows modified Barabasi-Albert networks with γ = 3 (the mean field estimate given by equation (19)). Estimating γ by plotting ζ(k) against 1 k log k gives the estimate γ = 3.019 ± 0.098 (linear extrapolation) and γ = 2.62 ± 0.33 (quadratic extrapolation).
The connectivity of Modified Barabasi-Albert networks should converge quickly to a constant with increasing n (by equation (5)) since γ > 2. Computing it for Barabasi-Albert networks (with λ = 1 and A = 0) gives k n ≈ 3.16 for p = 0, k n ≈ 2.28 for p = 0.5 and k n ≈ 2.08 for p = 0.8, and for n = 12500. Increasing n does not change these results.

Mean field theory for Variant Barabasi-Albert networks
In this model the increment in the number of bonds when the (n +1)-th node is appended is given by Approximating this with a differential equation gives The right hand side can be approximated as follows: For α > 1 the algorithm should grow dense networks with nodes of high degree. Assuming that k j (n) ≈ k (n) for all shows that If A = p = 0, then the differential equation can be solved directly to obtain E n 2 (α−1)/(2−α) n, provided that α > 1. This shows that E n is linear in n, which may be expected if α is not too much larger than 1. Numerical experimentation shows that E n grows linearly in n for values of α not too much larger than 1. For example, if p = 0.5, A = 1 and α = 1 then 1 n E n → 1.207 . . ., if α = 1.5 then 1 n E n → 1.539 . . ., but if α = 2 then 1 n E n increases slowly with n. Similarly, if p = 0, and A = 1, then, if α = 1, 1 n E n → 1.366 . . ., and if α = 1.5, 1 n E n → 2.399 . . ., but if α = 2 then 1 n E n increases slowly with n and for even larger values of n this growth accelerates.
The recurrence for the degree of the j-th node may be approximated by a differential equation similar to equation (13): Assuming that E n = Dn β , replacing n → t (a continuous time variable), gives the recurrence This can be approximated by the differential equation If α = 1 and β = 1 then the solution of this equation gives the Barabasi-Albert case with γ = 3. Proceed by considering the case A = p = 0 and the initial condition k j (t j ) = 1. Assume that α = 1 + . Then the equation becomes A perturbative approach for small can be done by expanding (k j (t)) = exp( log k j (t)) = 1 + log k j (t) + 1 2 2 log 2 k j (t) + · · ·. Truncating this at O( 2 ) and putting g(t) = log k j (t) gives the differential equation Using the initial condition g(t j ) = log k j (t j ) = 0 the solution of this equation is In the case β > 1 suppose that δ = β − 1 and that δ is small. Then approximate With this approximation the solution for g(t) above can be expanded in and δ to give the first order approximations Proceed by solving the above quadratics for log( t t j ) in terms of g(t). Expand the solution in and δ and keep only the first few terms. In the case that β = 1 this gives Since g(t) = log k j (t), the probability that k j (t) < κ is given by Taking the derivative to κ gives the distribution function in the case that β = 1: These networks are thus not scale-free. For small values of k the log k terms are slowly varying, and the networks will appear to be scale-free with γ = 1 + 2D. However, with increasing k the exponent reduces in value and the connectivity of the network will become dependent on k in the way seen in equation (5) for small values of γ.
Notice that if D = 1 and = 0 (or α = 1), then the above reduces to P (k) ∼ k −3 , as expected for Barabasi-Albert networks.
If β > 1, then a similar approach to the above may be considered. Solving the expression for g(t) above for log( t t j ) and keeping only terms to O( ) and O(δ) gives This shows that This shows that This gives an effective exponent γ k = 1+2D+δ log t j +D(2Dδ − ) log k which decreases in size if 2Dδ − < 0 and increases in size if 2Dδ − > 0. Since δ = β − 1 and = α − 1, and for small α numerical simulations show that β ≈ 1, it is normally the case that 2Dδ − < 0. This means that the networks will first appear scale-free with constant connectivity until k becomes large enough in which case the connectivity will increase with k, as seen above.

Numerical results on Variant Barabasi-Albert networks
In figure 5 data for networks with p = 0 and α = 1.1 and α = 0.5 is shown. Since α = 1.1 is still very close to 1, the results above show that these networks should still appear scale-free, and with connectivity a constant. This is indeed the case. For n = 6250 the In each case 100 networks were grown and the average degree sequence P n (k) computed. The curves above are plots of log P n (k)/ log(k + 1) against 1/ log(k + 1) for n ∈ {6250, 12500, 25000, · · · , 200000}. data gives k n = 3.149, and increasing n to n = 200000 gives k n = 3.176. That is, the connectivity of the networks are insensitive to n over this range. Least squares fits to the curves with quadratic polynomials in order to determinate the value of γ give the average γ = 2.857 ± 0.068. This result is consistent with a constant value of the connectivity of networks of these size ranges. With increasing n, it is expected that γ will decrease in value (that is, the value given here is an effective value), and eventually, the connectivity will start to increase.
Networks generated with p = 0 and α = 0.5 turned out to be sparse with low connectivity. For example, for n = 100000, the connectivity is k n = 1.036 and this decreases even further for n = 200000, where k n = 1.020. Attemps to extract an exponent γ from the data for these networks were not succesful, the regressions did not settle on a value, but are strongly dependent on n. Notice that the mean field analysis above does not apply to networks with α < 1.
Putting α = 2 gives networks with average connectivity which increases with n. For example, if n = 100, then k n = 43, for n = 500, k n = 260 and for n = 1000, k n = 527. On the other hand, if α = 3 2 , then k n = 3.08 if n = 100, k n = 3.27 if n = 500, and k n = 3.31 if n = 1000, and it appears that for small values of n the connectivity does not change quickly with increasing n.

Duplication-Divergence networks
Biological models of protein evolution are usually presented in terms of two processes, namely (1) a duplication event involving a gene sequence in DNA, and (2) a (random) mutation of duplicated genes which then drift from one another in genetic space [8,17,28]. The mutations of duplicated and mutated genes change the proteome and the network of protein interactions: If the protein is self-interacting, then the duplicated proteins interact, and the mutated genes code for proteins with altered interactions (some gained, others weakened or lost) with other proteins.
The Duplication-Divergence algorithm models these processes in order to grow a network, and was used in order to estimate the rates of duplication and mutation in the protein interaction networks [27]. There is a rich and large literature reporting on modeling protein interaction networks using models which include processes of duplication and divergence [13,19,22,25].
Since proteomic networks appear to be scale-free [12,20], it seems likely that duplication and divergence processes should grow scale-free networks and that this should also be seen in computer algorithms which grow networks using duplication and divergence elementary moves. Duplication can be implemented by selecting nodes and duplicating them, and their incident bonds, in a network. Divergence is implemented by altering the bonds incident on particular nodes, namely either by deleting, adding or moving bonds. In the Duplication-Divergence algorithm these moves are implemented by selecting nodes uniformly for duplication to progenitor-progeny pairs, and by deleting bonds incident to either the progenitor node or its progeny. Notice that since nodes of high degree have a larger probability of being adjacent to a node selected for duplication, these nodes have a larger probability of receiving new bonds in the duplication processin this way there are events of preferential attachment in this algorithm [11,23].
The basic elementary move of the Duplication-Divergence algorithm is illustrated in figure 6. The algorithm is implemented as follows. Duplication-Divergence algorithm: 1. Initiate the network with one node x 0 and apply the following steps iteratively; 2. Duplication: Choose a node υ uniformly and duplicate by creating node υ ; 3. For all bonds w ∼υ incident with υ, add the bonds w ∼υ ; 4. With probability p add the bond υ∼υ ; 5. Divergence: delete one bond of the pair { w ∼υ , w ∼υ } incident with υ or with its duplicated node υ with probability q (for each w adjacent to both υ and υ independently); 6. Stop the algorithm when a network of order N is grown.
The algorithm has two parameters (p, q). The parameter p is the probability that the protein corresponding to the progenitor node υ is self-interacting. If it is (with probability p) then the bond υ∼υ is added to the network and it represents the interaction between υ and υ .
The parameter q controls the model of divergence in this algorithm. As υ and υ diverge from one another, one bond in each pair of bonds incident with υ and υ is lost independently, with probability q. The result is that the network mutates as bonds (interactions) are lost (while they are created by the duplication process).
A slightly modified algorithm is found by changing step 5 in the algorithm to find a modification of the Duplication-Divergence algorithm which assumes that one of the duplicated pair mutates, while the other remains stable.

5.
Divergence: Consider all bonds w ∼υ incident with the duplicated node υ and delete these independently with probability q.
The Duplication-Divergence algorithm tends to grow disconnected networks, while the Modifed Duplication-Divergence algorithm is more likely to grow networks with a single component (that it, connected networks).

Mean field theory for Duplication-Divergence networks
Let k j (n) be the degree of node j after n iterations. The algorithm appends nodes by duplicating them (the probability that a node υ is duplicated in a network of order n is 1 n ), adds bonds by inserting a bond between a node and its duplicate with probability p, and remove bonds by selecting one bond between node-duplicate pairs and other nodes independently and deleting it with probability q. Let 2E n = j k(n) be twice the total number of bonds after n iterations. Then, if k j (n) is the degree of node j at time n, and node j is duplicated, the number of bonds in the network E n increases in the mean field by This follows since k j (n) bonds are created in the duplication move in the mean field, and another bond is created between the j-th node and its duplicate with probability p. The number of deleted bonds in the mean field is q k j (n).
Notice that 2E n = j k j (n) = n a n where a n = k j (n) is the average degree. In the mean field approximation one substitutes k j (n) in the recurrence (38) by its network average a n . Then equation (38) can be casted as a recurrance for a n : (n + 1) a n+1 = n a n + 2p + 2(1 − q) a n .
Let n → t, where t is a continuous time variable, and approximate this recurrence by the differential equation The initial condition is a 1 = 1, and this has solution Since E n 1 2 n a n , it follows that Comparison to equation (7) shows that, if q < 1 2 , γ = 1 + 2q.
In this case E n = O(n 2(1−q) ) + O(n) and that while 2(1 − q) > 1, the term O(n) is a strong correction to the growth in E n for even large values of n. In other words, the degree distribution P (k) of the network will be strongly corrected from the powerlaw distribution in equation (1). If q = 1 2 , then by solving equation (40), a t = 1 + 2p log t (so that a 1 = 1). Since E n = 1 2 n a n , this shows that E n = 1 2 n + pn log n, if q = 1 2 .
In this case γ = 2 by equation (7), but notice the subtle domination of the n log n term. In numerical work this will be very hard to see. The case q > 1 2 is considered by noting that a t This shows that γ ≥ 2 by equation (7). Putting the above together gives with a logarithmic correction if q = 1 2 . Comparing the coefficient in equation (7) with equation (45) gives a refined estimate γ = 1+ 2p 1+2p−2q ≥ 2, provided that 2q < 1+2p. For example, if q = 0.75 then p > 0.25. However, numerical work shows this estimate to be too small, and estimating γ in this regime for this model remains an open question. √ 300 and so qualify as hubs. The largest few of these hubs have degrees {43, 45, 47, 47, 50}. The network on the right is similarly a network generated with p = 1 and q = 0.60. It is more extended but has only one node of degree equal to one. Its order is 300, and it has 5 nodes of degrees {18, 18, 19, 20, 23} which qualify as hubs. Networks generated with the Modified Duplication-Divergence algorithm have a similar appearance, with the exception that more nodes of degree 1 are seen. The arrangement of nodes and bonds in these networks was created using the prefuse force directed lay-out in Cytoscape 3.4.0 [1].
The power law decrease in P (k) in equation (1) is only asymptotic for this algorithm; and there should be corrections in particular for q < 1 2 . From the results above the average connectivity can be computed: Since E n = 1 2 n k j (n) , From these results P (k) can be calculated. Since k n n 1 k P (k) dk, it follows that d dn k n = n P (n). Thus, using this approach gives where the case q > 1 2 is unknown since the dependence of the exponent γ on the parameters (p, q) is not known. Notice the change in behaviour at the critical value q = 1 2 ; this was already observed numerically in reference [27]. In each case 100 networks were grown and the average degree sequence P n (k) computed. The curves on the right are plots of log P n (k)/ log(k + 1) against 1/ log(k + 1) for n ∈ {3125, 6250, 12500, · · · , 200000}, while those on the left are plots of (log P (2k) − log P (k))/ log 2 as a function of log(k + 1)/k. The mean field estimate for the exponent γ is marked at −γ = −1.8 on the left hand axis. The strong correction to scaling evident in these curves makes it difficult to extrapolate to the mean field value for γ.
The modified Duplication-Divergence algorithm has the same recurrence (41), and so the values for γ and relations for k n and P (k) remain unchanged for this algorithm. Notice that this implementation preserves the degree of the selected node, and tends to give a duplicated node with lower degree (while the (unmodified) implementation tends to lower the degrees of both the selected and duplicated nodes). As a result, networks generated with the modified algorithm have, on average, more nodes of degree equal to one (and so appear more tree-like).

Numerical results on Duplication-Divergence networks
In figure 7 two networks grown with the Duplication-Divergence algorithm are shown. Both networks were grown with p = 1 and have order 300. The network on the left was grown with divergence parameter q = 0.4, and that on the right, with the higher mutation rate q = 0.6.
In figure 8 data for networks grown with p = 0.75 and q = 0.4 are shown. The curves on the right were obtained by plotting (log P (k))/ log(k +1) averaged over 100 networks of sizes {3125, 6250, 12500, 25000, 50000, 100000, 200000} against 1/ log(k + 1). The mean field value of γ is denoted by the bullet on the left-hand axis. These data show that convergence to this value is very slow -this indicates strong corrections to scaling arising in equation (42). In each case 100 networks were grown and the average degree sequence P n (k) computed. The curves on the right are plots of log P n (k)/ log(k + 1) against 1/ log(k + 1) for n ∈ {3125, 6250, 12500, · · · , 200000}, while those on the left are plots of (log P (2k) − log P (k))/ log 2 as a function of log(k + 1)/k. Each of these curves can be extrapolated by a quadratic least squares fit to obtained estimates of γ. This gives the estimates γ n for n = 3125 × 2 for = 0, 1, 2, . . . , 6. Extrapolating the γ n to n = ∞ by a least squares fit γ n = γ + A/n gives γ ≈ 7.4.
An alternative approach is to estimate γ by plotting ζ(k) = (log P (2k) − log P (k))/ log 2 as a function of log(k + 1)/k (see equation (4) with α = 2). The results are also strongly curved data (left in figure 8), and while the results are not inconsistent with the mean field value γ ≈ 1.9 in this model, however, it seems difficult to extrapolate these curves to a limiting value of γ.
If q = 0.60 > 1 2 then the results in figure 9 are seen. The curves of ζ(k) = (log P (2k)−log P (k))/ log 2 as a function of log(k +1)/k have straightened considerably, and each can be extrapolated by a quadratic least squares to obtain an estimate γ n for each value of n = 3125 × 2 (for = 0, 1, 2, . . . , 6). This gives estimates {9.68, 8.52, 7.99, 7.95, 7.82, 7.58, 7.05} which can be extrapolated by a least squares fit of γ n = γ + A/ log n, giving the estimate γ ≈ 2.87, which is slightly larger than the value predicted by the mean field formula γ = 1 + 2p 1+2p−2q (see the paragraph following equation (46)). This suggests that the approach to limiting behaviour in this model is quite slow, consistent with the remarks after equation (46) in the previous section.
The average connectivity k n is expected to behave according to equation (47). In table 1 k n is listed for p = 0.75 and q = 0.40, q = 0.50 and q = 0.60. If q = 0.4, then equation (47) suggests that k n 8.5 n 0.2 . Computing k n × n −0.2 from the data in table 1 gives {5.18, 5.45, 5.65, 5.91, 5.96, 6.01, 6.12}. Plotting these results against 1/ log n and then linearly exptrapolating as n → ∞ gives 7.98, close to the value of 8.5 predicted in equation (47). Finally, if q = 0.6 then the data appear to approach a constant. Extrapolating these results using the model A + B/ log(n) gives the estimated limiting value 8.72. By equation (5) this indicates that γ = 2.13, a value which is quite close to 2.15, the value predicted by the formula γ = 1 + 2p 1+2p−2q in the paragraph following equation (46).

Solé evolutionary networks
The Solé model [23,24] modifies Duplication-Divergence model by using duplication and network rewiring as the basic elementary moves. As before, the duplication of nodes is an implementation of gene duplication, and the network rewiring is based on the loss and gain of protein interactions in the bulk of the network [6]. Thus, the algorithm grows networks based on a model of gene duplication and the rewiring of protein interactions; both these processes drive the evolution of the interactome. The elementary move of the algorithm is as follows: A node in the network is chosen uniformly and randomly, and duplicated to form a progenitor-progeny pair. The progeny will have the same interactions as the progenitor. This network is updated in the rewiring step which has two parts: Bonds incident with the progeny protein are deleted with probability δ, and new bonds are added in the network between nodes (excluding the progenitor protein) are created with probability α. This implementation differs in two ways from the Duplication-Divergence algorithm. In the Solé model there are no self-interacting nodes, and the formation of new bonds in the rewiring steps only occurs in the Solé model.
The basic iterative step of the Solé algorithm is shown in figure 10 and a Solé evolutionary network of order N nodes is grown as follows: Solé evolutionary algorithm: 1. Initiate the network with one node x 0 and apply the following steps iteratively; 2. Choose a node υ uniformly and duplicate it to a new node υ ; 5. For all nodes u not adjacent to the chosen node υ, create the bond u∼υ with probability α; 6. Stop the algorithm when a network of order N is grown.
The algorithm has two parameters (δ, α). If δ = 0 and α = 1 then the algorithm grows complete simple networks. More generally, if α > 0 then on average roughly αN bonds are added to a network of order N. This shows that the algorithm grows networks of size O(N 2 ) -that is, Solé networks are rich in bonds.

Mean field theory for Solé networks
Let E n be the total number of bonds in a Solé network after n iterations of the algorithm, and let k n be the connectivity of the network (that is, the average degree of nodes) after n iterations (so that 2 E n = n k n ). In the mean field approximation the node in step 2 of the algorithm has degree k n and this number of bonds is added in step 3, while, in a similar way, δ k n bonds are removed in step 4. In step 5 there are n − k n choices in the mean field for the node u not adjacent to υ and each bond u∼υ is added with probability α. This shows that the number of bonds after n + 1 iterations is given by the recurrance relation Since 2 E n = n k n this becomes which is a mean field recurrance relation for E n . Taking n → t, a continuous time variable, and approximating E n by E t , and approximating the finite difference as a derivative, gives the following differential equation for E n : Solving this equation and letting t → n again gives the approximate mean field solution for E n : Equation (52) shows that the number of bonds is proportional to n 2 , so that networks created by this algorithm are dense, except when α = 0. Comparison to equation (7) suggests that γ ≤ 1 in this model. Notice that there is no logarithmic factor in the denominator, and that E n = Θ(n 2 ). This is consistent with a mean field value γ < 1 (and this requires that P n (k) be modified so that it is a normalisable probability distribution). With these results, it is reasonable to expect that, in the mean field, If α = 0 then equation (52) gives E n ∼ n 2−2δ and comparison to equation (7) gives

Numerical results for Solé networks
Similar to Barabasi-Albert and Duplication-Divergence networks, Solé networks can be grown numerically by implementing the algorithm as given above, using sparse matrix routines to efficiently store the adjacency matrix of the network. The larger size of networks makes these more difficult to grow, and our algorithms sampled efficiently to networks of size 51200 bonds. Solé networks are rich in bonds. This is seen, for example, in equation (52), which shows that E n ∝ n 2 if α > 0. In figure 11 two examples of networks generated by the Solé algorithm are shown. If δ < 0.5, then the networks have a dense appearance dominated by a few hubs. If δ > 0.5, then the networks appear more extended, often with no nodes qualifying as hubs under the definition that the degree of a hub in a network of order n is at least √ n . The networks in figure 11 were generated with α = 0.005, and increasing the value of α quickly increases the number of bonds.
The mean field result that γ ≤ 1 has implications for the scaling of Solé networks. In particular, P N (k) in equation (1) is not normalisable for infinite networks if γ ≤ 1 and so is not a valid candidate degree distribution in this model. The degree distribution can be modified to where D(x) is a function of the combined (or scaled) variable x = n −φ k. That is, as n → ∞, k is rescaled by n −φ and k γ P (k) approaches a limiting distribution proportional to D(x). This can be tested numerically by plotting n γ P (k) as a function of x = n −φ k. For the proper choices of γ and φ it is expected that n γ P (k) C o D(x) for a wide range of values of n (that is, the data should approach a limiting curve as n → ∞). The result is shown in figure 12 for (δ = 0.25, α = 0.005) and (δ = 0.75, α = 0.005). These are plots on the same graph for n = 100 × 2 n for n ∈ {6, 7, 8, 9} (other curves at smaller values of N are left away to give a clearer picture).
The data for δ = 0.75 are the cluster of peaks to the left, rescaled by choosing φ = 1 and γ = 1 2 , while the cluster of peaks to the right is for δ = 0.25 with φ = 1 and γ = 2 3 . With increasing n the data appear to approach a single underlying curve if γ = 1 2 in the one instance, and γ = 2 3 in the other instance. Both these values are consistent with the mean field expectation that γ ≤ 1 in this model. Further refinements in this scaling assumption may be necessary, since the curves are still becoming narrower with increasing n. It is not clear that these approach a limiting curve as n → ∞, although the data for δ = 0.75 suggest this to be the case. In these cases the curves are sharply peaked with a mean of about 0.02 if δ = 0.25 and about 0.007 if δ = 0.75.
Since the curve D(x) is sharply peaked at a constant value c o of the rescaled variable x, the connectivity of Solé networks is estimated by treating D(x) as concentrated at c o and then (assuming that φ = 1 and approximating the connectivity) In other words, the connectivity of Solé networks should increase linearly with n φ (and since φ = 1, linearly with n). In table 2 the connectivities of Solé networks for δ = 0.25 and δ = 0.75 (with α = 0.005) are listed. Non-linear least squares fits to the data show that φ = 1.01 when δ = 0.25 and φ = 0.99 when δ = 0.75. That is, these results are consistent with the value φ = 1 seen above. Figure 13: The iSite evolutionary algorithm: The duplication-deletion iterations of the iSite algorithm. A node together with its iSites is duplicated, and some bonds incident with the duplicated iSites are deleted with probability r . New bonds between a self-interacting iSite and its duplicate are inserted with probability p, and iSites are silenced with probability q.

The iSite model of network evolution
Protein interaction networks evolve by mutations in proteins which change the interactions of the proteins in the network. In the Duplication-Divergence algorithm, a mutated protein loses its interactions randomly. This random deletion of interactions is a good first order approximation to the evolution of networks. The iSite model refines this by giving structure to nodes in the network by introducing iSites on nodes as localities of the interaction sites on a protein [14,15]. Subfunctionalization of interaction sites in the iSite model is implemented by silencing iSites, and adding interactions with reduced probability if the iSite is not silenced. The implementation of the iSite algorithm relies in the first place on duplication of nodes, and then subfunctionalization of iSites on the nodes. The subfunctionalization of iSites is implemented by randomly deleting of bonds incident to duplicated iSites, and by the silencing of iSites by turning them off. These processes are models of random mutations which cause the loss of information in the genome (and leave behind noncoding remnants of genes). A process of spontaneously creating new iSites is not in the iSites algorithm, although this is a possible refinement which may be introduced in the algorithm.
The elementary move of the iSite algorithm is illustrated schematically in figure  13. A uniformly chosen node is duplicated into a progenitor-progeny pair (and so also duplicating the iSites of the progenitor onto the progeny). If the duplicated iSite is selfinteracting, then bonds are added between the iSite on the progenitor and the duplicated iSite on the progeny with probability p -this allows for subfunctionalization of the duplicate iSites. Bonds incident with the iSites on the progenitor are duplicated with reduced probability r , and iSites on the progenitor or progeny nodes are silenced with probability q. If an iSite is silenced, then all bonds incident with it are deleted. Notice that subfunctionalization enters in several ways, both in the duplication of self-interacting iSites, in the duplication of bonds, and in the silencing of iSites. iSite evolutionary algorithm: 1. Initiate the network with one node x 0 with I active iSites (each of which is selfinteracting with probability p) and iterate the following steps; 2. Choose a progenitor protein υ uniformly in the network and duplicate it, and its associated iSites A, to a successor protein υ with duplicated iSites A ; (a) A duplicated iSite A ∈ υ is active with probability q if it is duplicated from an active iSite on A ∈ υ, and silenced otherwise; (b) An active duplicated iSite A ∈ υ is self-interacting with probability p if it is duplicated from a self-interacting iSite on A ∈ υ, and not self-interacting otherwise; (c) If a silenced iSite A is duplicated to iSite A , then A is also silenced; 3. Add bonds as follows: (a) If iSite A ∈ υ is self-interacting and A is duplicated to iSite A ∈ υ , then add the bond A∼A if A is not silenced; (b) If A∼B is a bond incident with iSite A on the progenitor υ, and A is duplicated to iSite A on the duplicate υ , then A∼B is duplicated to A ∼B with probability 1 − r provided that A is not silenced; 4. Iterate the algorithm from step (2) and stop the iterations when a network of order N is grown.

Mean field theory for the iSite model
Let nodes in the network correspond to proteins, and let i j (n) be the number of active iSites on node j after n iterations of the algorithm. Denote the degree of node j by k j (n) (that is the total number of bonds with one end-point in node j), and let E n be the number of bonds of the network (this is the size of the network). Then 2E n = j k j (n). The average number of active iSites per node is i (n) = 1 n j i j (n). With each iteration i (n) iSites are created, of which q i (n) are silenced, in the mean field. This gives the following recurrance relation for i (n): (n + 1) i (n + 1) = n i (n) + (1 − q) i (n).
The exact solution of this recurrance is where Γ is the gamma function with the property that Γ(x + 1) = x Γ(x) and Γ(1) = 1. Notice that i (0) = I, where I is the number of iSites on the source node x 0 . For large n the Γ-function and the factorial have well known asymptotics (namely the Stirling approximation [29]), so that This shows that with increasing n the total number of iSites grows proportionally to n 1−q . If q = 0, then this is linear in n since no iSites become silenced, and if q = 1, then the number approaches a constant. The total number of bonds in the network increases after n iterations by the recurrance since there are on average 2 n E n bonds incident to each node, and the probability that each one of them is duplicated is 1 − r , and there are on average i (n) iSites per node, and the probability that each of these is self-interacting is p.
Using the asymptotic solution for i (n) and approximating this recurrence by a differential equation gives This equation can be solved, and using the initial condition E 1 = 0, the result is Thus, the average degree of a node is equal to 2 n E n , so that the connectivity of iSite evolutionary networks is given by in the mean field. This shows that the large n value of k n is dominated by the larger of −q and 1 − 2r . In particular, if r < 1 2 (1 + q), then k n ∼ n 1−2r . By equation (7) one may determine γ for this model: If 2r = 1 + q, then a different solution is obtained, namely This shows that γ = 2 + q in this case as well, but there is also a logarithmic correction to the growth of E(t), and so there is a logarithmic factor in the expression for k n .

Modified iSite evolutionary algorithm
The subfunctionalization of proteins can be refined by introducing in the iSite algorithm the probability of creating new iSites on the progeny node with a probability s. This changes the algorithm as follows.
Modified iSite evolutionary algorithm: Implement the algorithm as above but introduce the parameter s and create new active iSites by replacing step 2 in the iSite evolutionary algorithm by 2. Choose a progenitor node υ uniformly in the network and duplicate it, and its associated iSites A, to a progeny node υ with duplicated iSites A ; (a) A duplicated iSite A ∈ υ is active with probability q if it is duplicated from an active iSite on A ∈ υ, and silenced otherwise; (b) An active duplicated iSite A ∈ υ is self-interacting with probability p if it is duplicated from a self-interacting iSite on A ∈ υ, and not self-interacting otherwise; (c) If a silenced iSite A is duplicated to iSite A , then A is also silenced; (d) With probability s create an active iSite C on the progeny node υ , where C is self-interacting with probability p.
The recurrance for the average number of active iSites per node i (n) (see equation (58)) is modified to in the Modified iSite evolutionary algorithm. The exact solution is obtained by replacing q by q − s in equation (58), and the asymptotic approximation of the solution is given by as seen in equation (59). The total number of bonds in the network, E n , still satisfies equation (60), and so it follows from equations (62), (63) and (64), that for the modified iSite evolutionary algorithm (notice the condition that q < r + s):  This shows that the connectivity of Modified iSite networks is given by The value of the scaling exponent is seen from above to be given by with a correction factor in the expression for k n if 2r = (1 + q − s).

Numerical results for iSite networks
The iSite algorithm was coded and networks were grown to compute averaged statistics. Examples of iSite networks generated by the algorithm are shown in figure 14. The algorithm was then used to sample networks of size up to 200, 000. The connectivity k n of iSite networks for I = 3 iSites per node, and with p = 0.5, q = 0.4 and r = 0.3, is shown in table 3. By equation (5), log k log γ−1 2−γ +(2−γ) log n. Least squares fit to the data in Column 2 gives log γ−1 2−γ ≈ 1.0211, and (2 − γ) = 0.258. Solving for γ gives in the first instance γ = 1.735 and in the second γ = 1.742. Since 2r < 1 + q in this case, the mean field value of γ is γ = 1 + 2r = 1.6, close to these estimated values.
Data for I = 5 and with the same values of (p, q, r ) = (0.5, 0.4, 0.3) are shown in table 3 as well. Changing the value of I (the number of iSites per node) should not change the value of γ, and this appears to be the case here. A least squares fit to the data in Column 3 and determining γ as above gives γ = 1.737 and γ = 0.7498, very close to the values above.
If p = 0.5, q = 0.05 and r = 0.8, then 2r > 1 + q, and in this case γ = 2 + q. If the number of iSites per node is I = 3, then the data in table 3 gives a constant value for k , and for I = 5 a slightly decreasing numerical estimate. The mean field value of γ in these cases is 2.05, and a least squares fit gives γ ≈ 2.009 if I = 3 and γ ≈ 2.022 if I = 5 (where the coefficient of log n in the least squares fit is 2 − γ). These results Figure 15: iSite evolutionary networks with I = 3, p = 0.5, q = 0.4 and r = 0.3: Data on networks generated by the iSite evolutionary algorithm. In each case 500 networks were grown and the average degree sequence P n (k) computed. The curves are plots of log P n (k)/ log(k + 1) against 1/ log(k + 1) for n ∈ {3125, 6250, 12500, · · · , 200000}. As k → ∞, then the curves are expected to pass through −γ on the y -axis, and its mean field value is γ = 1 + 2r = 1.6 -this value is marked on the y -axis.
are consistent with the mean field results obtained above, since it shows that the value of γ is close to 2 + q.

Conclusions
In this paper a number of algorithms used for generating networks in molecular biology were examined. Mean field theory for the algorithms was in some cases reviewed, and in other cases newly presented, and also refined. The algorithms include the Barabasi-Albert [2], Duplication-Divergence [26], Solé [24] and iSite algorithms [14,15], and these were in some cases modified by the introduction of more general elementary moves.
The efficient implementation of these algorithms was also examined, and sparse matrix routines (or, more general, hash-coding; see for example reference [21]) were used to optimize the implementation. This gives computer algorithms which can generate very large networks efficiently, and networks of order 200, 000 nodes were routinely sampled. We also explored even larger networks, up to order 3 million, but did not use those in our data analysis.
The adjacency matrix of a network of size E bonds can be stored (using sparse matrix routines) in an array of size O(E). This means that the implementation of these network growth algorithms has average case space complexity O(E).
Hash coding allows for the efficient implementation of routines which search, insert or delete entries in arrays storing the networks. These routines have average time   table is densely populated). Generally, the time complexity of algorithms should grow as O(E τ ) if networks of size E are grown (where τ is an exponent dependent on the particular algorithm). For example, networks of size E bonds can be generated using O(E) computer memory, and the Duplication-Divergence and iSite algorithms can be implemented with O(n τ ) time complexity to grow networks of order n nodes (and where n ≤ E). An examination of these algorithms (the Duplication-Divergence and iSite algorithms) suggests that an optimal implementation will have τ ≈ 1 (if the size of the hash tables is much larger than n).
The Barabasi-Albert and Solé algorithms (with their modified and variant implementations) should have average time complexity of O(n 2 ) for growing networks of order n nodes. This follows because each iteration of the algorithms has to explore all nodes in the current network for the possible insertion of new bonds.
Data on the time complexity of the algorithms are shown in table 4. The data displayed are the average time T to grow one network of order n. Assuming that T = C 0 n τ and fitting log T to log n, least squares estimates of τ can be obtained. For example, it is expected that τ = 2 for the Barabasi-Albert algorithm, while the estimate obtained in the table is τ ≈ 1.97. This is consistent with the expectation that the time complexity of the algorithm is O(n 2 ) in an optimal implementation. This is similarly seen for the modified and variant implementation of the Barabasi-Albert algorithm, and for the Solé algorithm.
The time complexity of the remaining algorithms is O(n), and this is found consistently, except for the Duplication-Divergence algorithm for q = 1 and q = 0.4 (and also for the modified implementation of this algorithm). In these cases the algorithm samples denser networks (see figure 7) which takes up larger amounts of memory, making the implementation less efficient.
The results in this paper raise some questions about the sampling of scale-free networks by random iterative growth algorithms: Figure 16: Self-averaging of the connectivity of Barabasi-Albert networks: The connectivity of a single network grown with the Barabasi-Albert algorithm with p = 0.6 as a function of the size of the network is given by the noisy red curve as the network is grown to order n = 10000. The blue curve is the average connectivity of Barabasi-Albert networks, plotted as a function of n. Notice that the red data appear to converge, with increasing n to the average, so that the connectivity of a randomly grown Barabasi-Albert network appears to converge to its average.
• In some cases, see for example reference [27], the parameters of the algorithms were set to grow networks with properties similar to that of real protein interaction networks. The values of the parameters are then used to estimate the rate of subfunctionalization (or mutation) in the genome. The results are dependent on the algorithm, and so further refinement of algorithms may be needed before useful estimates can be made.
• The mean field approaches are useful in some models (for example the Barabasi-Albert algorithm, and the iSite algorithm), but are poorer approximations in other models (the variant Barabasi-Albert algorithm, the Duplication-Divergence algorithm and its modification, and the Solé algorithm). Can the mean field approach be improved to give a better approximation to these algorithms?
• Investigation of some numerical properties of the networks (for example the connectivity) suggests that the algorithms may be self-averaging. That is, networks are generated with properties which converge to the statistical averages of these properties over a sample of networks generated by the algorithm. This is, for example, illustrated in figure 16 for the connectivity of Barabasi-Albert networks. As the network is grown, its connectivity appears to approach the average connectivity over a large sample of networks.
• In this paper some algorithms were modified in ways not done before in the literature (this includes the modified Barabasi-Albert, the Duplication-Divergence, the Solé and iSite models). Exploring the properties of these modified algorithms, including their usefulness as models of networks in molecular biology, will be the subject of future investigation.
Lastly, these algorithms grow networks using a probabilistic set of rules to implement an elementary move. Each realised network N n of order n is obtained with some probability p(N n ), so that the function p(N n ) is a probability distribution over networks of order n. Determining p(N n ) for any of the algorithms presented here seems difficult, and general properties of p(N n ) remain unknown (other than averages of network properties over p(N n ) are scale-free if the algorithm grows scale-free networks).