Dense Percolation in Large-Scale Mean-Field Random Networks Is Provably “Explosive”

Recent reports suggest that evolving large-scale networks exhibit “explosive percolation”: a large fraction of nodes suddenly becomes connected when sufficiently many links have formed in a network. This phase transition has been shown to be continuous (second-order) for most random network formation processes, including classical mean-field random networks and their modifications. We study a related yet different phenomenon referred to as dense percolation, which occurs when a network is already connected, but a large group of nodes must be dense enough, i.e., have at least a certain minimum required percentage of possible links, to form a “highly connected” cluster. Such clusters have been considered in various contexts, including the recently introduced network modularity principle in biological networks. We prove that, contrary to the traditionally defined percolation transition, dense percolation transition is discontinuous (first-order) under the classical mean-field network formation process (with no modifications); therefore, there is not only quantitative, but also qualitative difference between regular and dense percolation transitions. Moreover, the size of the largest dense (highly connected) cluster in a mean-field random network is explicitly characterized by rigorously proven tight asymptotic bounds, which turn out to naturally extend the previously derived formula for the size of the largest clique (a cluster with all possible links) in such a network. We also briefly discuss possible implications of the obtained mathematical results on studying first-order phase transitions in real-world linked systems.


Introduction
In recent years, "explosive percolation" in large-scale random networks has received substantial attention. This phenomenon essentially means that as new links are randomly added step by step to a network with n nodes (assuming a very large n) and the number (or percentage) of links reaches a certain critical threshold (the phase transition point), the order of magnitude of the largest connected component changes abruptly from logarithmic in n to linear in n, which is often referred to as the emergence of a giant connected component.
Achlioptas et al. [1] recently reported an interesting computational study suggesting that although the percolation transition in the classical Erdös-Rényi (mean-field) random graph model (i.e., links being added to the network independently and with the same probability) is continuous, this transition can be made discontinuous (or "explosive") by applying modified network formation rules referred to as Achlioptas processes. Several related studies also addressed explosive percolation in various models [2][3][4][5][6]. However, other studies showed that explosive percolation is still continuous in most settings [7][8][9]. Recently, Riordan and Warnke [10,11] proved that explosive percolation is continuous for all Achlioptas processes. As follows from these recent studies, the current state of knowledge on the classical percolation transition phenomena is that these phase transitions are continuous (second-order) under most network formation rules, but they can be discontinuous (first-order) if a network formation process is substantially different from the Erdös-Rényi model.
However, if one looks at the percolation transition phenomenon from the perspective of system robustness, percolation by itself does not necessarily ensure that a connected network, which has just undergone a percolation transition, is stable and resilient with respect to natural or man-made (possibly adversarial) impacts, which may disrupt multiple links and violate the overall integrity of the underlying system. In other words, the connectivity of the whole network or a large fraction of nodes (giant connected component) may not be sufficient for maintaining the "stability" of the linked system. For instance, consider a group of particles linked by bonds: if the relative number of bonds is too small, this connected group of nodes would not form a stable structure (i.e., a crystal), since the destruction of just a few bonds may disconnect the system. Note that a number of links linear in n ($n-1$ being the exact minimum) is sufficient to fully connect a system of n nodes. On the contrary, it is much harder to

Dense Clusters in Networks
To facilitate the description of the main results, we first introduce basic definitions and briefly discuss the relationship of dense connected clusters and "regular" connected components in networks. A dense connected component (cluster) in a network is defined in terms of its edge density parameter r, which is equal to the ratio of the actual number of links (edges) to the maximum possible number of links within the cluster (in other words, the relative percentage of links within the cluster; see Fig. 1 for an illustration).
In the basic graph-theoretic model, the minimum required edge density for a cluster to be dense enough ("stable") does not depend on the size of the considered cluster and is set to a fixed parameter $c \in (0,1]$, which may be chosen depending on application-specific considerations (that is, a cluster is considered dense enough if $r \ge c$). Note that although the minimum required edge density c is fixed, the number of links required to form a dense cluster on q nodes does depend on the size of the cluster q and is equal to $c\,q(q-1)/2$. The concept of dense clusters was also recently addressed in the context of the network modularity principle described by Hartwell et al. [12] and further analyzed by Spirin and Mirny [13], who considered "highly connected subgraphs (clusters)" (or "modules") defined using the aforementioned edge density parameter, which represented meaningful functional units in molecular networks. The significance of dense clusters in other research areas will be addressed in the next section.
In graph theory, dense ("highly connected") clusters described above are referred to as c-quasi-cliques [14,15], with the limiting case of $c = 1$ corresponding to the well-known clique structure, where there is a link between every two nodes. Clearly, cliques are very cohesive and resilient to node and link failures; however, large-scale real-world systems rarely form cliques, as it is often unrealistic to connect every two nodes by a link. Therefore, quasi-cliques provide a reasonable tradeoff between regular connected components, which do not necessarily reflect network modularity and may be "not robust enough" (as indicated above), and cliques, which are "too robust". The assumption that the required threshold value of c for a cluster with q nodes to form a quasi-clique ("stable"/"highly connected" cluster) is constant and does not depend on q may not always be fully realistic; however, it does make theoretical and practical sense: in the large-scale (asymptotic) case with $q \to \infty$, the function c(q) should not vanish as q increases, since otherwise this would represent an arbitrarily low edge density of a cluster, which would effectively "downgrade" a dense connected cluster to a regular connected component.
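The density test described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `edge_density` and `is_quasi_clique`, the edge-set representation, and the toy network are hypothetical and do not come from the paper.

```python
from itertools import combinations

def edge_density(nodes, edges):
    """Ratio of links present among `nodes` to the maximum possible number q(q-1)/2."""
    nodes = set(nodes)
    q = len(nodes)
    if q < 2:
        raise ValueError("edge density needs at least two nodes")
    present = sum(
        1
        for u, v in combinations(sorted(nodes), 2)
        if (u, v) in edges or (v, u) in edges
    )
    return present / (q * (q - 1) / 2)

def is_quasi_clique(nodes, edges, c):
    """True if the cluster on `nodes` has edge density r >= c."""
    return edge_density(nodes, edges) >= c

# Toy network (hypothetical): 4 nodes carrying 5 of the 6 possible links.
edges = {(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)}
density = edge_density({0, 1, 2, 3}, edges)
print(density, is_quasi_clique({0, 1, 2, 3}, edges, 0.7))
```

With the arbitrary threshold c = 0.7 the four-node cluster qualifies as dense, whereas with c = 0.9 it does not, even though the cluster is connected in both cases.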
In related work, the concept of a k-core (a connected subgraph where each node has a degree of at least k, that is, at least k neighbors) has been analyzed in mean-field random networks [16][17][18], and D'Souza [19] discussed the notion of "dense k-cores" as clusters that may possess extra robustness properties in addition to connectivity. However, if k is fixed and $n \to \infty$, there exist k-cores with edge density arbitrarily close to zero. Therefore, a more meaningful type of a "robust cluster", which preserves the required edge density in addition to the minimum required degree of each node, is a slightly modified version of a k-core, where k is defined in terms of the percentage of all possible neighbors within a cluster, i.e., $k = c(q-1)$ [20][21][22]; that is, the number of neighbors for each node within the cluster depends on the size of this cluster. It is easy to observe that any $c(q-1)$-core is also a c-quasi-clique, but the reverse is not always true. Therefore, $c(q-1)$-cores represent a slightly more restrictive type of "highly connected" clusters than c-quasi-cliques.

Figure 1. (A) Assuming that the minimum required percentage of links for a group of nodes to form a dense connected ("highly connected") cluster is c = 70% (this value is chosen arbitrarily for illustrative purposes only), the largest dense cluster has 4 nodes, and all other dense clusters have only 2 nodes each. (B) The network from (A) after 10 more links have formed (for instance, this may be the result of increasing the value of p if one assumes the G(n,p) model). Dense clusters with the largest number of nodes are highlighted, with their size still significantly smaller than the size of the whole network. It turns out that if n is very large, a further increase of p in G(n,p) will only produce dense clusters scaling as $\log n$ (no clusters scaling linearly with n) until p reaches the critical point $p^* = c$, after which the whole network abruptly becomes a dense cluster. doi:10.1371/journal.pone.0051883.g001

In addition, $c(q-1)$-cores possess certain guaranteed robustness properties if $c > 1/2$: as it straightforwardly follows from [23], a $c(q-1)$-core of size q with $c > 1/2$ has a diameter of at most 2 (that is, any two nodes are connected through at most one intermediary), and it would stay connected after the removal of up to $(2c-1)(q-1)$ links.
Under the aforementioned assumption of the fixed required value of c, we posed a natural question: how does the size of the largest dense connected cluster (c-quasi-clique or $c(q-1)$-core) grow as more and more links are added to an evolving large-scale network? In particular, can one rigorously prove the emergence of a "giant dense connected component" and investigate the order of this phase transition (i.e., whether it is continuous or discontinuous)? It turns out that in the context of the classical G(n,p) (Erdös-Rényi) model, these questions can be answered unambiguously.

First-Order Dense Percolation Transition in Erdös-Rényi Networks
Formally, an Erdös-Rényi random graph G(n,p) contains n nodes, and each pair of nodes is connected by a link independently with probability p. The process of evolution of such a network can be represented simply by a gradual increase of the parameter p, with larger values of p representing more links in the network.
Although this is a somewhat idealized classical model, the issues of its appropriateness in certain contexts will be addressed in the next section.
One may intuitively assume that the size of the largest dense connected component in G(n,p) would first grow logarithmically (while p is much smaller than c) and then linearly right after some critical point $p_c < c$, and it would continue to grow until it eventually reaches n, similarly to the asymptotic behavior of the giant connected component in an Erdös-Rényi network; however, our rigorous mathematical arguments show that this is not the case. As our results show, for $n \to \infty$, the largest dense connected cluster scales as $\log n$ as long as $p < c$ (even if p is very close to c), whereas when p reaches c, the whole network abruptly becomes a dense connected cluster. Essentially, for very large values of n, the jump from the logarithmic order of magnitude of the largest dense cluster to the largest dense cluster of size n occurs at one point $p^* = c$, with no "linear growth" phase in between. Therefore, for any fixed c, there is no emerging giant dense connected component as p gradually increases, until a discontinuous jump at the point $p^*$ produces a dense connected component representing the entire network. In other words, the size of the largest dense connected component in a large-scale Erdös-Rényi random network exhibits a first-order phase transition; moreover, the existence of this phase transition has been proven by fully rigorous analytical arguments.
The complete detailed proofs are presented in the Materials and Methods section, and here we briefly summarize the obtained formal results. The following proven facts characterize the size of the largest dense connected component (c-quasi-clique) in G(n,p) for sufficiently large n.
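If $M_n^c$ is the size of the maximum c-quasi-clique in G(n,p) for some fixed $c \in (0,1)$, then for any $p < c$ it holds almost surely (a.s.) that

$$\frac{2\ln n}{\ln\frac{1}{p}} \;\le\; M_n^c \;\le\; \frac{2\ln n}{\ln\left[\left(\frac{c}{p}\right)^{c}\left(\frac{1-c}{1-p}\right)^{1-c}\right]} \qquad (1)$$

In this context, a property $Q_n$ is said to hold almost surely (a.s.) if with probability 1 there exists $n_0$ such that $Q_n$ holds for all $n \ge n_0$.
Formula (1) is also valid for the size of the largest $c(q-1)$-core (in the context of the above description) in G(n,p). Moreover, with high probability (w.h.p.)

$$\frac{M_n^c}{n} \to 0 \quad \text{if } p < c, \qquad (2a)$$

$$\frac{M_n^c}{n} \to 1 \quad \text{if } p > c, \qquad (2b)$$

where a property $Q_n$ is said to be observed with high probability (w.h.p.) if the probability of $Q_n$ being observed converges to 1 as $n \to \infty$.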
Expression (1) entails that the size of the largest c-quasi-clique does scale logarithmically with n for any $p < c$. Note that it provides not only the order of magnitude, but also asymptotically precise upper and lower bounds on the size of the largest c-quasi-clique. An interesting observation here is the fact that the upper bound converges to the lower bound as $c \to 1$, whereby (1) becomes the classical asymptotic estimate for the maximum clique size in G(n,p) described by Bollobás and Erdös [24] and Grimmett and McDiarmid [25].

Table 1. Maximum c-quasi-clique sizes in G(n,p) for n = 100 and c = 0.85, 0.90.

Formulas (2a) and (2b) formally express the existence of the first-order (discontinuous) jump of the size of the largest dense connected component ($M_n^c$) compared to the size of the whole network (n): if $p < c$ and p approaches c from below (that is, the edge density of the whole network is just below the required edge density c), the size of the largest cluster that does have the required density c is still negligible compared to the size of the whole network. For $p > c$ and approaching c from above, the whole network forms the dense cluster, and as mentioned above, there is no "gradual" (continuous) change in terms of the size of the largest dense cluster.
As a final remark, the fact that the dense percolation transition is located at the point $p^* = c$ is not surprising and can be intuitively understood, since one would expect that the size of the largest dense connected cluster in G(n,p) would be equal to n if $p > c$ and less than n if $p < c$. However, the fact that this phase transition is first-order in the asymptotic case is not intuitively predictable. This fact has been established via advanced analytical arguments that will be presented below.
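The location of the transition point can be illustrated numerically: the whole graph G(n,p) is itself a c-quasi-clique exactly when its overall edge density is at least c. The following minimal Python sketch samples that density slightly below and slightly above p = c. The values n = 400, c = 0.5, the offset 0.02, the seed, and the helper name `whole_graph_density` are illustrative assumptions, not taken from the paper, and the sketch says nothing about the logarithmic regime below the threshold.

```python
import random

def whole_graph_density(n, p, rng):
    """Sample G(n,p) and return the fraction of the n(n-1)/2 possible links that formed."""
    possible = n * (n - 1) // 2
    # Each possible link forms independently with probability p.
    formed = sum(1 for _ in range(possible) if rng.random() < p)
    return formed / possible

rng = random.Random(0)   # fixed seed so the run is reproducible
c = 0.5                  # required edge density (illustrative)
n = 400                  # network size (illustrative)

below = whole_graph_density(n, c - 0.02, rng)
above = whole_graph_density(n, c + 0.02, rng)

# The whole network is a c-quasi-clique only if its density reaches c.
print("p < c:", below, below >= c)
print("p > c:", above, above >= c)
```

Because the sampled density concentrates around p as n grows, the whole network should fail the density requirement for p below c and satisfy it for p above c, which is the intuition behind the transition point stated above.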

Discussion
In this section, we briefly discuss potential implications of the obtained results on studying network clusters and phase transitions in related research areas. Dense connected clusters (c-quasi-cliques, $c(q-1)$-cores and similar models) have been recently utilized to study large-scale networked systems in a variety of disciplines, including biological networks [12,13,21][26][27][28][29], social networks [30][31][32][33], telecommunication networks [14,15], and financial networks [34][35][36][37]. On the other hand, the mechanisms of phase transitions in large-scale physical systems are not completely understood, with a number of open questions remaining. Although mean-field random networks have certain limitations in modeling real-world systems, G(n,p)-based network models have been employed in the literature to study various physical processes, including NMR sequential assignment [38], Potts glass formation [39], and relation of G(n,p) to quantum theory [40].

Figure 2. (A) Relative size of the maximum c-quasi-clique found numerically in random graphs G(n,p) for c = 0.5 (this particular value is chosen simply for illustrative purposes: the obtained results hold for any fixed $c \in (0,1)$) and n = 500, 1,000, 5,000, 10,000, 20,000. Because finding the maximum quasi-clique in a graph is a computationally challenging NP-hard problem (as opposed to finding the largest "regular" connected component, which can be done in polynomial time), the numerical simulations were carried out using the GRASP heuristic algorithm [14], plotting the largest relative size found after multiple runs of the algorithm for each p. The growth increment of p was chosen as $\Delta p = 0.001$ in the region where p approaches c from below (more details on computational experiments are given in Materials and Methods). (B) Theoretical behavior of the relative size of the maximum c-quasi-clique in G(n,p) as $n \to \infty$, which is simply a step function, as indicated by formulas (2a)-(2b). doi:10.1371/journal.pone.0051883.g002

In general, according to our results, natural systems whose evolution can (to some extent) be described by a mean-field random network model would exhibit a first-order dense percolation transition if the size n of the system is very large (e.g., a mole of liquid contains $n = 6.02 \times 10^{23}$ particles, which for most purposes can be considered $n \to \infty$). For instance, if one assumes that the process of bond formation in a large-scale system of particles has a purely probabilistic nature (i.e., indistinguishable particles moving randomly with respect to each other, allowing one to make a simplifying assumption that the formation of a link/bond between any two particles occurs with the same probability), then a "highly connected" cluster spanning the whole system (with the required edge density c, which may be set depending on characteristics of a particular system) would emerge abruptly when the probability of link formation p reaches c. However, it should be noted that many real-life networks are not dense; therefore, examples of dense percolation transition phenomena with c close to 1 cannot be easily identified in nature.
In addition, formula (1) has important implications in computational studies of the maximum c-quasi-clique problem. As it turns out, identifying the largest c-quasi-clique in a given network is an extremely computationally challenging task. The recent work [41] shows that this problem is NP-hard, which means that no polynomial-time exact algorithm is known for it, and the available exact methods may require a number of operations exponential in n in the worst case. The currently available exact methods allow one to explicitly compute the maximum c-quasi-clique size only in small sparse graphs (n on the order of $10^2$) [41]. Therefore, the performance of any inexact (approximate/heuristic) methods cannot be evaluated against exact solutions for large-scale networks. Thus, the theoretical bounds for the maximum c-quasi-clique size in large-scale mean-field networks are of particular interest in the context of evaluating the performance of new computational algorithms that may be developed for this problem in the future.
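To make the computational difficulty concrete, the following Python sketch finds a maximum c-quasi-clique by exhaustive search over vertex subsets. It is not the exact method of [41] nor the GRASP heuristic of [14]; it is a naive enumeration, written under the assumption that the graph is given as a list of undirected edges on vertices 0, ..., n-1, and the function name and toy graph are hypothetical. Its running time grows exponentially with n, so it is usable only for very small graphs.

```python
from itertools import combinations

def max_quasi_clique_bruteforce(n, edges, c):
    """Return a largest vertex subset with edge density >= c (exhaustive search)."""
    adj = {frozenset(e) for e in edges}
    # Try sizes from largest to smallest; the first hit is a maximum quasi-clique.
    for q in range(n, 1, -1):
        need = c * q * (q - 1) / 2  # minimum number of links for q nodes
        for subset in combinations(range(n), q):
            links = sum(1 for pair in combinations(subset, 2) if frozenset(pair) in adj)
            if links >= need:
                return subset
    # No dense subset with two or more nodes: fall back to a single vertex.
    return (0,) if n > 0 else ()

# Toy graph (hypothetical): nodes 0-3 carry 5 of 6 possible links, nodes 4 and 5 form a tail.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (3, 4), (4, 5)]
best = max_quasi_clique_bruteforce(6, toy_edges, 0.8)
print(best, len(best))
```

The number of subsets examined is on the order of $2^n$, which is why exact results are reported only for small graphs and why analytical bounds are useful as a benchmark for heuristics.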
Overall, the results of this study provide a starting point for rigorous mathematical justifications for the existence of first-order phase transitions in large-scale linked systems. Despite the limitations of classical mean-field random networks, the process of network evolution under this model may capture certain aspects of the evolution of natural systems undergoing phase transitions. Moreover, due to the fact that dense percolation transition is already discontinuous without any changes to the mean-field model (as opposed to ''regular'' percolation that can be made discontinuous only after substantial modifications to the meanfield model), one can hypothesize that dense percolation may in fact be discontinuous for other network formation rules. Therefore, dense percolation transition can potentially be further analyzed in the context of other random network formation processes reflecting the behavior of physical, biological, and social systems.

Materials and Methods
In this section we present graph-theoretic definitions and rigorous proofs of the aforementioned results. Since the nature of this study is theoretical rather than experimental, the main emphasis is put on mathematical proofs, rather than on data collection description.
Our formal arguments establish the existence of the first-order dense percolation transition in random graphs in the asymptotic ($n \to \infty$) sense, i.e., for very large graphs. Note, however, that the mathematical techniques that support this result cannot be employed to draw conclusions on the situation with smaller networks. In view of that, we complement our theoretical findings by computational studies that illustrate the behavior of maximum c-quasi-cliques in small- to moderately-sized graphs. Note also that the computational cost of conducting such experiments quickly becomes prohibitive as the size of the network increases: this is a direct consequence of the aforementioned fact that the maximum c-quasi-clique problem is NP-hard [41].
Since the paper mainly considers the mean-field (Erdös-Rényi) random graph model, the networks utilized in numerical experiments were randomly generated using a straightforward procedure: for each value of p, the link between each pair of nodes was generated randomly and independently with probability p. Note that the case of $c = 0$ would be trivial, as any graph is a 0-quasi-clique. A complete graph is a 1-quasi-clique; hence, the c-quasi-clique represents a density-based relaxation of the clique, as compared to degree- and diameter-based clique relaxations such as k-plex and k-club [42][43][44][45][46][47][48][49][50]. A c-quasi-clique with the largest number of vertices is called the maximum c-quasi-clique.
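The generation procedure described above (each pair of nodes linked independently with probability p) can be sketched as follows. The function name `sample_gnp`, the edge-list output format, and the optional seed parameter are assumptions made for illustration; the paper does not specify an implementation.

```python
import random
from itertools import combinations

def sample_gnp(n, p, seed=None):
    """Return the edge list of one G(n,p) sample on vertices 0..n-1."""
    rng = random.Random(seed)
    # Visit every unordered pair once and keep it with probability p.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

# Usage: one sample per value of p, as in a sweep over the link probability.
for p in (0.25, 0.5, 0.75):
    graph_edges = sample_gnp(50, p, seed=1)
    print(p, len(graph_edges), "of", 50 * 49 // 2, "possible links")
```

This direct loop over all pairs takes time quadratic in n, which is acceptable for the graph sizes where quasi-clique search is feasible at all.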

Graph-Theoretic Notations, Definitions, Related Work
In this work, we investigate the asymptotic behavior of c-quasi-cliques in large-scale random graphs. In particular, we employ the well-known G(n,p) model of random graphs, originated by Erdös [51], which denotes a graph on n vertices such that an edge between any two vertices exists with a probability $p \in (0,1]$, independently from other edges. Since the G(n,p) model yields graph instances with a rather "uniform" structure, as opposed to, for instance, power-law graphs, it is often called a uniform random graph model [52].
Random graphs and related structures, such as maximum cliques in random graphs, have been studied intensively in recent decades [24,53,54]. One of the earliest works on the asymptotic behavior of the maximum clique in uniform random graphs is due to [55], who showed that the maximum clique size has a strong peak around $2\ln n / \ln(1/p)$. Grimmett and McDiarmid [25] proved that as $n \to \infty$ the maximum clique size in a uniform random graph G(n,p) is equal to $2\ln n / \ln(1/p) + O(\ln\ln n)$ with probability one.
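For a sense of scale, the leading term $2\ln n/\ln(1/p)$ of the clique-size estimate quoted above can be evaluated directly. The short Python sketch below does this for a few illustrative values of n with p = 0.5; the function name and the chosen values are assumptions, and the $O(\ln\ln n)$ correction is deliberately ignored, so the numbers are rough guides rather than exact clique sizes.

```python
import math

def clique_size_estimate(n, p):
    """Leading term 2 ln n / ln(1/p) of the maximum clique size in G(n,p)."""
    return 2 * math.log(n) / math.log(1 / p)

for n in (10**2, 10**4, 10**6):
    print(n, round(clique_size_estimate(n, 0.5), 2))
```

Squaring n only doubles the estimate, which illustrates how slowly a logarithmically scaling cluster grows compared with the network itself.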

Main Theorems and Remarks
First, we prove a generalization of the result of Grimmett and McDiarmid [25] for the case of maximum c-quasi-clique size. Furthermore, we demonstrate that the size of the maximum c-quasi-clique in G(n,p) undergoes a phase transition when the value of p is varied in the vicinity of the (fixed) value $c \in (0,1)$, manifested in a sudden and drastic change of the size of the maximum c-quasi-clique in G(n,p) relative to the size of the graph itself. Specifically, in the next subsection we present the proofs of the following two theorems.
Theorem 1 If $0 < p < c \le 1$, then the size $M_n^c$ of the maximum c-quasi-clique in a uniform random graph G(n,p) satisfies

Second, in addition to the computational experiments on relatively large graphs, where large c-quasi-cliques were found using heuristic algorithms (due to NP-hardness of the maximum c-quasi-clique problem for any fixed $c \in (0,1]$, as stated above), we also conducted experiments on smaller graphs using a linear mixed integer programming formulation for the maximum quasi-clique problem, which allowed us to identify exact maximum quasi-cliques in graphs with up to 100 vertices. Interestingly, it turned out that the obtained asymptotic bounds were rather accurate for the considered small-scale uniform random graph instances. The details of these computational experiments are presented further in this section.
Remark 1 Note that while in the context of this work we are interested in the largest dense connected component, the above definition of c-quasi-clique does not require connectivity. This allows for significant simplifications in the arguments that are presented below; however, the obtained results are still valid for the largest connected c-quasi-clique due to the following observations:
1. It can be easily shown that if $p > c$, then the whole graph G(n,p) is w.h.p. a c-quasi-clique. Since under the model assumptions $c \in (0,1]$ is a parameter that does not depend on the size of the graph n, then if $p > c$, the whole graph G(n,p) is automatically connected w.h.p., as a direct application of classical results by Erdös and Rényi [53]; therefore, the connectivity requirement for the largest c-quasi-clique is w.h.p. satisfied in this case.
2. If the largest c-quasi-clique does not coincide with the whole graph G(n,p) (this would correspond to the case $p < c$), asymptotically precise upper and lower bounds (1) that will be proven for the size of the largest c-quasi-clique (in the context of Definition 1) are automatically valid for the largest connected c-quasi-clique. This is due to the simple observation that the size of the largest connected c-quasi-clique does not exceed the size of the largest (not necessarily connected) c-quasi-clique, and it is at least as large as the size of the maximum clique.

Remark 2
The phase transition, a phenomenon of a drastic change in some property of a random structure over a small change in the structure's parameters, is well known in the literature. With respect to random graphs, the limiting probability of a graph's property changing from 0 to 1 or vice versa is well known for monotone and first-order graph properties [56]. A property Q is monotone increasing (respectively, decreasing) if from $A \subseteq B$ (resp., $B \subseteq A$) and $A \in Q$ it follows that $B \in Q$. The first-order graph properties are ones that can be finitely described in a first-order language, i.e., a language consisting of variables that represent graph vertices, equality ($=$) and adjacency ($\sim$) relations, Boolean symbols $\vee$, $\wedge$, $\neg$, and the universal and existential quantifiers $\forall$, $\exists$. Note that first-order properties are not necessarily monotone and vice versa; for instance, the increasing property "graph is connected" cannot be expressed in a first-order language [57]. Then, limiting relations similar to (23) that concern random graphs with first-order properties Q are known as zero-one laws [56,57]:

$$\lim_{n\to\infty} P\{G(n,p) \in Q\} \in \{0, 1\},$$

where the probability is monotone if Q is monotone.
In this context, it is worth noting that the property "graph is a c-quasi-clique" is neither a monotone nor a first-order property; hence, the phase transition in the relative size of the c-quasi-clique in uniform random graphs (23) may not be obtained directly from the general zero-one law relations.

In this work, we adhere to the G(n,p) definition given above. All the presented results consider $p \in (0,1)$ explicitly depending on c rather than on n (that is, for sufficiently large n, the condition $p \gg \ln n / n$ is always true), so the whole graph is with high probability (w.h.p.) connected according to the classical theory [53]. Although the largest connected component already coincides with the whole graph, the largest dense connected component may still be small compared to the size of the whole graph, which turns out to be the case for any $p < c$.
Remark 4 The obtained asymptotic bounds and first-order phase transition results are also valid for the size of the maximum $c(q-1)$-core (k-core with $k = c(q-1)$ mentioned above), or, more generally, for the size of the largest subgraph that is required to have a certain minimum degree $l > 0$ as a percentage of its size, in addition to the minimum edge density c. These subgraphs (clusters) are referred to as (l,c)-quasi-cliques [20]. Formally, a subgraph of $G = (V,E)$ induced by the set of vertices $V' \subseteq V$ is an (l,c)-quasi-clique ($1 \ge c \ge l \ge 0$) if and only if the following two conditions hold:

$$|E'| \ge c\binom{|V'|}{2}, \qquad \deg_{G[V']}(i) \ge l\,(|V'| - 1) \ \text{ for all } i \in V',$$

where $E' = E \cap (V' \times V')$. Let $M_n^{(l,c)}$ be the size of the maximum (l,c)-quasi-clique in the random graph G(n,p). Observe that $M_n^1 \le M_n^{(l,c)} \le M_n^c$, i.e., the maximum size of an (l,c)-quasi-clique is never greater than the maximum size of a c-quasi-clique and is bounded from below by the maximum clique size. This follows from the fact that any clique is an (l,c)-quasi-clique, and any (l,c)-quasi-clique is a c-quasi-clique.
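A membership test for (l,c)-quasi-cliques can be sketched in Python as follows. The two conditions coded here are an assumption based on the verbal description in Remark 4: at least a fraction c of the possible links is present, and every node is linked to at least a fraction l of the other nodes in the cluster. The function name, the edge-list format (each undirected link listed once), and the toy example are hypothetical.

```python
def is_lc_quasi_clique(nodes, edges, lam, c):
    """Check assumed (l,c)-quasi-clique conditions on the subgraph induced by `nodes`."""
    nodes = set(nodes)
    q = len(nodes)
    # Keep only links with both endpoints inside the cluster.
    inside = [(u, v) for u, v in edges if u in nodes and v in nodes]
    degree = {v: 0 for v in nodes}
    for u, v in inside:
        degree[u] += 1
        degree[v] += 1
    dense_enough = len(inside) >= c * q * (q - 1) / 2           # edge density condition
    min_degree_ok = all(d >= lam * (q - 1) for d in degree.values())  # degree condition
    return dense_enough and min_degree_ok

# Toy cluster (hypothetical): 4 nodes with 5 of 6 links; nodes 2 and 3 have degree 2.
cluster_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
print(is_lc_quasi_clique({0, 1, 2, 3}, cluster_edges, 0.5, 0.8))
print(is_lc_quasi_clique({0, 1, 2, 3}, cluster_edges, 0.8, 0.8))
```

Setting l = 0 removes the degree requirement and reduces the test to the plain c-quasi-clique condition, while l = c corresponds to the $c(q-1)$-core requirement discussed earlier.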

Rigorous Proofs of Main Results
Define $N_k^c$ as the number of c-quasi-cliques of size k in G(n,p), and let $I_j^c$ be the indicator of the event that the j-th subgraph of size k is a c-quasi-clique. Obviously, the unconditional probabilities $P\{I_j^c = 1\}$ that any subgraph of size k is a c-quasi-clique are equal, whence the expected number of c-quasi-cliques of size k in G(n,p) is given by

$$E[N_k^c] = \binom{n}{k}\sum_{j=\lceil c\binom{k}{2}\rceil}^{\binom{k}{2}}\binom{\binom{k}{2}}{j}p^j(1-p)^{\binom{k}{2}-j} = \binom{n}{k}\left[1 - \mathrm{Bin}\left(\left\lceil c\tbinom{k}{2}\right\rceil - 1;\ \tbinom{k}{2},\, p\right)\right] \qquad (8)$$

where Bin(k; n,p) is the c.d.f. of the binomial distribution. As it will be seen, the integer $k = k_n^c$ such that $E[N^c_{k_n^c}] = 1$ for large values of n plays a central role in the sequel. The next proposition takes a first step in evaluating $k_n^c$.

Proposition 1 If $p < c$, the integer $k = k_n^c$ that satisfies $E[N^c_{k_n^c}] = 1$ increases with n in such a way that $k_n^c = o(n)$, $n \gg 1$.

Proof. From expression (8) it is evident that $k = k_n^c$ cannot be bounded for large values of n, since in that case the right-hand side of (8) would be asymptotically equal to $O(n^k)$. To verify that $k_n^c = o(n)$, we construct an upper bound on the right-hand side of equation (8). Using Stirling's approximation, the binomial coefficient in (8) can be bounded as

To bound the summation term in (8), we use Chernoff's bound for the tail of the binomial distribution [58]:

where $m \ge np$. In our case $n = \binom{k}{2}$, $m = \lceil c\binom{k}{2}\rceil$ (for simplicity, we use $m = c\binom{k}{2}$), and since $p < c$, then $m \ge np$; thus, the requirement on m is valid. Thus,

Combining the upper bounds in (10) and (11), we have that if $k = k_n^c$ satisfies $E[N_k^c] = 1$, the following must hold for large enough values of n:

Taking the logarithm of the right-hand side of the above inequality and dividing by $k^2$, we obtain

where the constant $c_0$ has the form $c_0 = \left(\frac{1-p}{1-c}\right)^{1-c}\left(\frac{p}{c}\right)^{c}$. It is easy to see from the inequality for arithmetic and geometric means that $c_0 \in (0,1)$ for $0 < p < c < 1$. Then, if k grows with n such that $\ln n / k = o(1)$, the above expression becomes negative for sufficiently large n, thereby contradicting the constructed upper bound (12). This implies that $k_n^c = o(n)$, which proves the proposition.

Proposition 2 If $p < c$, the integer $k = k_n^c$ that satisfies the equality $E[N^c_{k_n^c}] = 1$ is given by

In establishing Proposition 2 we rely on the following result due to [59].
Theorem 3 (McKay [59]). Let $p \in (0,1)$ be fixed, and $pn \le k \le n$ for some $n \ge 1$. Define $x = (k - pn)/\sigma$, where $\sigma = \sqrt{np(1-p)}$. Then

$$\sum_{j=k}^{n}\binom{n}{j}p^j(1-p)^{n-j} = \sigma\binom{n-1}{k-1}p^{k-1}(1-p)^{n-k}\,\frac{1-\Phi(x)}{\varphi(x)}\,\exp\{E(n,k,p)/\sigma\}, \qquad (14)$$

where $0 \le E(n,k,p) \le \min\{\sqrt{\pi/8},\, x^{-1}\}$, and $\Phi(x)$ and $\varphi(x)$ are the cumulative and probability density functions of the standard normal distribution, respectively.

Proof of Proposition 2. Using the notations of Theorem 3, let

where we note that $k = \lceil cn \rceil > pn$ for large enough n. Then the last term in (14) satisfies

$$\exp\{E(n,k,p)/\sigma\} = \exp\{O(n^{-1})\} = 1 + O(n^{-1}), \quad n \to \infty.$$

From the fact that x increases with n (cf. Proposition 1), it follows that

where the well-known expansion was used. Invoking Stirling's expansion for $\Gamma(z)$,

Thus, finally, the tail of the binomial distribution in (8) can be estimated as
where $n = \binom{k}{2}$. Consequently, equation (8) can asymptotically be written as

Taking the logarithm of both sides of the last equality, we obtain

To obtain the main term of the asymptotic approximation of the solution of the last equation, let us restate it in the form

In view of the fact that $k = o(n)$ due to Proposition 1, the above expression can be further rewritten as

To determine the order of the term x(n), we restate the last equation as

Next, we demonstrate that the number $k_n^c$ (given by Equation (13)), which solves the equation $E[N_k^c] = 1$, with probability 1 represents an upper bound on the size of the maximum c-quasi-clique in a uniform random graph when $n \to \infty$. For this, we need the following property of c-quasi-cliques.
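Proposition 3 If a graph $G = (V,E)$, where $|V| = n$, is a $\gamma$-quasi-clique for some fixed $\gamma \in (0,1]$, then for any $s < n$ there exists a $\gamma$-quasi-clique of size $s$ in $G$.

Proof. For $\gamma = 1$, this property is trivial. In the case of $\gamma \in (0,1)$, it suffices to show that the statement of the proposition holds for $s = n-1$. Since $G = (V,E)$ is a $\gamma$-quasi-clique, $|E| \ge \gamma\binom{n}{2}$. Assume that there exists a vertex $i \in V$ with $\deg_G(i) \le \gamma(n-1)$. Let $V_i = V \setminus \{i\}$; then the induced subgraph $G[V_i] = (V_i, E_i)$ is also a $\gamma$-quasi-clique, since

$$|E_i| = |E| - \deg_G(i) \ge \gamma\binom{n}{2} - \gamma(n-1) = \gamma\binom{n-1}{2}.$$

If there is no such vertex, i.e., $\deg_G(i) > \gamma(n-1)$ for all $i \in V$, then let $j = \arg\min_{i \in V} \deg_G(i)$ be the vertex of $G$ with the smallest degree, $\deg_G(j) = m > \gamma(n-1)$. As before, denote $V_j = V \setminus \{j\}$, and observe that the cardinality of the set of edges $E_j$ of the induced subgraph $G[V_j] = (V_j, E_j)$ satisfies

$$|E_j| = |E| - m \ge \frac{nm}{2} - m = \frac{m(n-2)}{2} > \gamma\binom{n-1}{2},$$

where $|E| \ge nm/2$ because every vertex has degree at least $m$.

Proof. First, observe that

$$P\{M_n^{\gamma} \ge k\} = P\{N_k^{\gamma} \ge 1\} \le E[N_k^{\gamma}],$$

where the equality is due to Proposition 3. Define a sequence $k_n = \frac{2}{\ln(1/p^{*})}\ln n$, $n \ge 1$; then, from expression (8) for $E[N^{\gamma}_{k_n}]$, one obtains by following the steps in Proposition 2 that for sufficiently large values of $n$

$$P\{M_n^{\gamma} \ge k_n\} \le \binom{n}{k_n}(p^{*})^{\binom{k_n}{2}}.$$

From the definition of $k_n$ it follows that $(p^{*})^{k_n/2} = n^{-1}$, so the term $n^{k_n-1}(p^{*})^{\binom{k_n}{2}}$ is bounded for large enough $n$, whence the sought probability can be subsequently bounded as

$$P\{M_n^{\gamma} \ge k_n\} \le \frac{n}{\Gamma(k_n+1)} \le n\left(\frac{e}{k_n}\right)^{k_n}.$$

Again recalling the definition of $k_n$, we note that $k_n$ grows as $\ln n$, so the right-hand side decays faster than any fixed power of $n$ and the probabilities $P\{M_n^{\gamma} \ge k_n\}$ are summable over $n$; by the Borel-Cantelli lemma, $M_n^{\gamma} < k_n$ for all sufficiently large $n$ with probability 1.

The next corollary shows that in sufficiently large random graphs $G(n,p)$, the size of the maximum $\gamma$-quasi-clique is almost surely above a certain value of the order of $\ln n$. It uses a well-known fact, established in [25], that the size of the maximum clique in a uniform random graph $G(n,p)$ is almost surely asymptotic to $2\ln n/\ln p^{-1}$. Note that $M_n^{1}$ represents the size of the maximum clique ($\gamma = 1$) in a uniform random graph $G(n,p)$. Observe also that, according to (13),

$$\lim_{\gamma \to 1} k_n^{\gamma} = \frac{2\ln n}{\ln(1/p)} + O(\ln\ln n),$$

which corresponds to the well-known expression for the size of the maximum clique in uniform random graphs [24,25]. This allows us to define $k_n^{1}$ as the limiting value of $k_n^{\gamma}$ above.

Corollary 1 If $0 < p < \gamma \le 1$, then the size $M_n^{\gamma}$ of the maximum $\gamma$-quasi-clique in a uniform random graph $G(n,p)$ is, with probability 1 for large $n$, asymptotically bounded from below by $k_n^{1}$ and from above by $k_n^{\gamma}$, where $k_n^{\gamma}$ is given by (13).

Proof.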
This follows immediately from Proposition 4 and the observation that, for any $\gamma < 1$, the size of the maximum $\gamma$-quasi-clique in $G(n,p)$ is always at least the size of the maximum clique in the same graph, i.e., $M_n^{\gamma} \ge M_n^{1}$.

In such a way, we have established that for any fixed $p < \gamma$ the asymptotic size of the maximum $\gamma$-quasi-clique is of the order of $\ln n$. Intuitively, when $p \ge \gamma$, the entire graph $G(n,p)$ becomes a $\gamma$-quasi-clique, and thus the size of the maximum $\gamma$-quasi-clique has the order of $n$. Therefore, the natural question arising here is what happens when $\gamma$ is fixed and $p$ approaches $\gamma$. We show that there is a first-order phase transition in the asymptotic behavior of the order of magnitude of the maximum $\gamma$-quasi-clique at the point $p = \gamma$.
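Proposition 3 above rests on a single removal step: deleting a vertex of minimum degree from a $\gamma$-quasi-clique leaves a $\gamma$-quasi-clique (a minimum-degree vertex covers both cases of the proof). A minimal sketch of this peeling step follows; the graph representation (a dictionary of neighbor sets), the function names, and the example graph are assumptions made for illustration.

```python
from itertools import combinations

def edge_count(adj, nodes):
    """Number of edges of the subgraph induced by `nodes`."""
    return sum(1 for u, v in combinations(sorted(nodes), 2) if v in adj[u])

def is_quasi_clique(adj, nodes, gamma):
    """True if the induced subgraph has at least gamma * C(k, 2) edges."""
    k = len(nodes)
    return edge_count(adj, nodes) >= gamma * k * (k - 1) / 2 - 1e-12

def peel_once(adj, nodes):
    """Remove one minimum-degree vertex of the induced subgraph."""
    nodes = set(nodes)
    victim = min(sorted(nodes), key=lambda u: len(adj[u] & nodes))
    return nodes - {victim}

def nested_subgraphs(adj, nodes):
    """Yield vertex sets of sizes |nodes|, |nodes| - 1, ..., 1 obtained by peeling."""
    current = set(nodes)
    while current:
        yield frozenset(current)
        current = peel_once(adj, current)

def example_graph():
    """Hypothetical example: complete graph on 6 vertices minus a perfect matching."""
    missing = {(0, 1), (2, 3), (4, 5)}
    adj = {u: set() for u in range(6)}
    for u, v in combinations(range(6), 2):
        if (u, v) not in missing:
            adj[u].add(v)
            adj[v].add(u)
    return adj
```

Starting from the example graph (edge density 0.8) with $\gamma = 0.8$, every set produced by `nested_subgraphs` passes `is_quasi_clique`, which is the content of the proposition for this instance.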
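Proposition 5 If $M_n^{\gamma}$ is the size of the maximum $\gamma$-quasi-clique in a uniform random graph $G(n,p)$ for some fixed $\gamma \in (0,1)$, then with high probability (w.h.p.)

$$\lim_{n \to \infty} \frac{M_n^{\gamma}}{n} = 0, \quad p < \gamma, \tag{23a}$$

$$\lim_{n \to \infty} \frac{M_n^{\gamma}}{n} = 1, \quad p > \gamma. \tag{23b}$$

Proof. The first limiting case follows from Proposition 4, since we proved that for any fixed $\gamma > p$, with probability 1, $M_n^{\gamma}$ is at most of the order of $\ln n$. To prove the equality in (23b), let $X_{ij}$ be a Bernoulli random variable which is equal to 1 if there exists an edge $(i,j)$ in the uniform random graph $G(n,p)$. Then the number of edges of $G(n,p)$ equals $\sum_{i<j} X_{ij}$, and by the law of large numbers its ratio to $\binom{n}{2}$ converges to $p$; for $p > \gamma$ the entire graph is therefore a $\gamma$-quasi-clique w.h.p.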
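The second limiting case can be illustrated by sampling: when $p$ exceeds $\gamma$, the edge density of the whole graph concentrates around $p$, so the full vertex set already meets the $\gamma$-quasi-clique requirement. The sketch below is a small Monte Carlo check under that reading; the function names, the seed, and the sample sizes are arbitrary choices, not values from the paper.

```python
import random

def edge_density(n, p, rng):
    """Fraction of the n(n-1)/2 vertex pairs joined by an edge in one sample of G(n, p)."""
    pairs = n * (n - 1) // 2
    edges = sum(1 for _ in range(pairs) if rng.random() < p)
    return edges / pairs

def fraction_whole_graph_quasi_clique(n, p, gamma, trials, seed=0):
    """Share of sampled graphs whose full vertex set is a gamma-quasi-clique."""
    rng = random.Random(seed)  # fixed seed so that repeated runs agree
    hits = sum(1 for _ in range(trials) if edge_density(n, p, rng) >= gamma)
    return hits / trials
```

With $n = 150$ and $\gamma = 0.4$, calling `fraction_whole_graph_quasi_clique(150, 0.5, 0.4, 20)` and `fraction_whole_graph_quasi_clique(150, 0.3, 0.4, 20)` contrasts the two sides of the threshold; only the edge count matters here, so no graph structure is stored.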

Quasi-Clique Problem
In this subsection we summarize a linear mixed-integer formulation of the maximum $\gamma$-quasi-clique problem [41] that was used for "small-scale" computational experiments as a part of the Computational Experiments section below.
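Consider a graph $G = (V,E)$ with $n$ vertices and an adjacency matrix $A$ (with elements $a_{ij} = 1$ if there is an edge between vertices $i$ and $j$, and $a_{ij} = 0$ otherwise), and suppose that one selects some subgraph $G_s$ of $G$. In order to verify whether $G_s$ is a $\gamma$-quasi-clique, we use the binary vector of variables $x \in \{0,1\}^{n}$, where $x_i = 1$ if vertex $i$ belongs to $G_s$, and $x_i = 0$ otherwise. The subgraph $G_s$ is a $\gamma$-quasi-clique if the cardinality of its set of edges is at least

$$\gamma\binom{\sum_{i=1}^{n} x_i}{2} = \frac{\gamma}{2}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} x_i - 1\right) = \frac{\gamma}{2}\sum_{\substack{i,j=1 \\ i \ne j}}^{n} x_i x_j,$$

where the last equality is due to $x_i^{2} = x_i$. The number of edges in the subgraph $G_s$ can be calculated as

$$\frac{1}{2}x^{\top}Ax = \frac{1}{2}\sum_{\substack{i,j=1 \\ i \ne j}}^{n} a_{ij}x_i x_j.$$

Therefore, the problem of finding the maximum $\gamma$-quasi-clique in the graph $G$ can be formulated as follows:

$$\max \sum_{i=1}^{n} x_i \quad \text{subject to} \quad \sum_{\substack{i,j=1 \\ i \ne j}}^{n} (a_{ij} - \gamma)x_i x_j \ge 0, \quad x \in \{0,1\}^{n}. \tag{24}$$

This is a 0-1 integer programming (IP) problem with a linear objective and a nonconvex quadratic constraint. A linearization of this problem can be performed at the expense of introducing additional variables and constraints.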
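The quadratic feasibility test described above can be written down directly. The following sketch evaluates the sum of $(a_{ij} - \gamma)x_i x_j$ over ordered pairs $i \ne j$ and, for very small graphs only, finds a maximum $\gamma$-quasi-clique by exhaustive enumeration. The enumeration is an illustrative stand-in with exponential running time, not the solution method used in the paper, and the function names are hypothetical.

```python
from itertools import combinations

def quasi_clique_slack(A, x, gamma):
    """Sum over ordered pairs i != j of (a_ij - gamma) * x_i * x_j.

    The selected vertices form a gamma-quasi-clique iff this value is nonnegative.
    """
    n = len(A)
    return sum((A[i][j] - gamma) * x[i] * x[j]
               for i in range(n) for j in range(n) if i != j)

def max_quasi_clique_bruteforce(A, gamma):
    """Largest vertex subset satisfying the quadratic constraint (tiny graphs only)."""
    n = len(A)
    for size in range(n, 0, -1):  # try the largest subsets first
        for subset in combinations(range(n), size):
            x = [1 if i in subset else 0 for i in range(n)]
            if quasi_clique_slack(A, x, gamma) >= -1e-9:
                return list(subset)
    return []
```

For a 4-vertex graph with edges (0,1), (0,2), (1,2), (2,3), the whole graph has 4 of 6 possible edges, so it is feasible for $\gamma = 0.6$ but not for $\gamma = 0.7$, where the triangle on vertices 0, 1, 2 is the best one can do.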
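A mixed-integer linear formulation of (24) with $O(n)$ variables and constraints is given below. Recall that originally we had only one constraint,

$$\sum_{i=1}^{n} x_i \sum_{j \ne i} (a_{ij} - \gamma)x_j \ge 0.$$

Introducing the variables $w_i = x_i \sum_{j \ne i}(a_{ij} - \gamma)x_j$, $i = 1,\dots,n$, this constraint becomes $\sum_{i=1}^{n} w_i \ge 0$. Next, observe that each of the quadratic equalities above is equivalent to four linear inequalities

$$w_i \le n x_i, \qquad w_i \ge -n x_i, \qquad w_i \ge \sum_{j \ne i}(a_{ij} - \gamma)x_j - (1 - x_i)n, \qquad w_i \le \sum_{j \ne i}(a_{ij} - \gamma)x_j + (1 - x_i)n.$$

Therefore, the problem of finding a maximum $\gamma$-quasi-clique can be represented as the following mixed-integer linear programming problem with $2n$ variables ($n$ binary variables and $n$ continuous variables) and $4n+1$ constraints:

$$\max \sum_{i=1}^{n} x_i \quad \text{subject to} \quad \sum_{i=1}^{n} w_i \ge 0, \quad \text{the four inequalities above for } i = 1,\dots,n, \quad x \in \{0,1\}^{n}, \ w \in \mathbb{R}^{n}. \tag{25}$$

This formulation will be used for the computational experiments below.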
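That four linear inequalities pin each auxiliary variable to the intended product can be verified by enumeration on a small graph. The sketch below assumes the inequalities take the big-M form $-n x_i \le w_i \le n x_i$ and $s_i - (1-x_i)n \le w_i \le s_i + (1-x_i)n$ with $s_i = \sum_{j \ne i}(a_{ij}-\gamma)x_j$; this form, the variable name $w_i$, and the function names are assumptions made for illustration.

```python
from itertools import product

def inner_sum(A, x, gamma, i):
    """s_i = sum over j != i of (a_ij - gamma) * x_j."""
    return sum((A[i][j] - gamma) * x[j] for j in range(len(A)) if j != i)

def linearization_bounds(A, x, gamma, i):
    """Interval [lo, hi] allowed for w_i by the four inequalities, with big-M equal to n."""
    n = len(A)
    s = inner_sum(A, x, gamma, i)
    lo = max(-n * x[i], s - (1 - x[i]) * n)
    hi = min(n * x[i], s + (1 - x[i]) * n)
    return lo, hi

def linearization_is_exact(A, gamma):
    """Check, for every binary x, that the interval collapses to the product x_i * s_i."""
    n = len(A)
    for x in product((0, 1), repeat=n):
        for i in range(n):
            lo, hi = linearization_bounds(A, x, gamma, i)
            target = x[i] * inner_sum(A, x, gamma, i)
            if abs(lo - target) > 1e-9 or abs(hi - target) > 1e-9:
                return False
    return True
```

The choice of $n$ as the big-M constant works because $|s_i| < n$ for any binary $x$ and $\gamma \in (0,1)$: when $x_i = 1$ the second pair of inequalities forces $w_i = s_i$, and when $x_i = 0$ the first pair forces $w_i = 0$.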

Computational Experiments
In this subsection we consider exact and heuristic numerical computations of the size of the maximum $\gamma$-quasi-clique in randomly generated graphs. In particular, these computational experiments are aimed at checking the following two aspects: (i) how realistic the asymptotic bounds (21), (22) on the size of the maximum $\gamma$-quasi-clique $M_n^{\gamma}$ and its mean value are for relatively small values of $n$ ($n \sim 10^{2}$), and (ii) whether the approximate behavior of the relative size of the maximum $\gamma$-quasi-clique in moderate-size ($n \sim 10^{4}$) random graphs (as $p$ approaches $\gamma$) exhibits the "step function" pattern, which was proven for $n \to \infty$ in Theorem 2.
According to Remark 5, in large enough random graphs $G(n,p)$ the average size $E[M_n^{\gamma}]$ of the maximum $\gamma$-quasi-clique belongs to the interval specified by the asymptotic bounds (21), (22), provided that $p < \gamma$. Therefore, it was of interest to check the applicability of these bounds for relatively small values of $n$.
To this end, in the first set of computational experiments we generated a number of instances of uniform random graphs $G(n,p)$ with $n = 100$ and $p$ ranging from 0.05 to 0.15; namely, we generated 100 instances of $G(100,p)$ for every $p$. Then, we employed the MIP formulation (25) to find the maximum $\gamma$-quasi-cliques in the generated graphs for the values $\gamma = 0.9$ and $\gamma = 0.85$. Such a choice of parameters is justified by the relatively better numerical tractability of the MIP problem (25) for sparse graphs. We used FICO™ Xpress Optimization Suite 7.1 [60] to solve the resulting instances of problem (25). The resulting average values of $M_{100}^{\gamma}$, as well as the minimum and maximum values of $M_{100}^{\gamma}$ over 100 instances for each $p$, are reported in Table 1.
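The structure of such an experiment (sample many instances of $G(n,p)$, compute a quasi-clique size on each, report minimum, mean, and maximum) can be sketched without a MIP solver. The code below swaps in a simple greedy peeling heuristic, which only yields a lower bound on $M_n^{\gamma}$; it is not the exact procedure based on formulation (25) and Xpress described above. The function names and the default seed are hypothetical.

```python
import random

def random_graph(n, p, rng):
    """One sample of G(n, p) as a dictionary of neighbor sets."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def density_ok(adj, nodes, gamma):
    """True if the subgraph induced by the set `nodes` is a gamma-quasi-clique."""
    k = len(nodes)
    edges = sum(len(adj[u] & nodes) for u in nodes) // 2
    return 2 * edges >= gamma * k * (k - 1) - 1e-9

def greedy_peeling(adj, gamma):
    """Heuristic: drop minimum-degree vertices until the rest is a gamma-quasi-clique."""
    nodes = set(adj)
    while len(nodes) > 1 and not density_ok(adj, nodes, gamma):
        victim = min(sorted(nodes), key=lambda u: len(adj[u] & nodes))
        nodes.remove(victim)
    return nodes

def run_experiment(n, p, gamma, instances, seed=0):
    """Return (min, mean, max) of the heuristic quasi-clique size over sampled graphs."""
    rng = random.Random(seed)
    sizes = [len(greedy_peeling(random_graph(n, p, rng), gamma))
             for _ in range(instances)]
    return min(sizes), sum(sizes) / len(sizes), max(sizes)
```

A call such as `run_experiment(100, 0.1, 0.9, 100)` mirrors the parameter setting of the first experiment; because the heuristic is not exact, its output should not be expected to reproduce the values in Table 1.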