Finding and Testing Network Communities by Lumped Markov Chains

Identifying communities (or clusters), namely groups of nodes with comparatively strong internal connectivity, is a fundamental task for deeply understanding the structure and function of a network. Yet, there is a lack of formal criteria for defining communities and for testing their significance. We propose a sharp definition that is based on a quality threshold. By means of a lumped Markov chain model of a random walker, a quality measure called “persistence probability” is associated with a cluster, which is then defined as an “$\alpha$-community” if such a probability is not smaller than $\alpha$. Consistently, a partition composed of $\alpha$-communities is an “$\alpha$-partition.” These definitions turn out to be very effective for finding and testing communities. If a set of candidate partitions is available, setting the desired $\alpha$-level allows one to immediately select the $\alpha$-partition with the finest decomposition. Simultaneously, the persistence probabilities quantify the quality of each single community. Because it assesses each cluster individually, this approach can also disclose single well-defined communities even in networks that overall do not possess a definite clusterized structure.


Introduction
Complex networks are currently one of the most extensively studied subjects in the field of applied mathematics. In the last fifteen years, a huge number of theoretical results have been put forward, and almost every field of science and technology has benefited from the application of such results to specific problems [1][2][3][4].
One of the most promising but challenging tasks in network science is community analysis, which is aimed at revealing possible partitions of a network into subsets of nodes (communities, or clusters) with dense intra-group but sparse inter-group connections. Finding and analyzing such partitions often provides invaluable help in deeply understanding the structure and function of a network, as widely demonstrated by several case studies in social sciences [5,6], biology [7], ecology [8], economics [9], or information science [10,11], just to name a few.
Despite the abundance of contributions on this subject (see [12] for a thorough survey), the issue of community analysis cannot be considered satisfactorily solved. First of all, finding communities is a computationally hard task, because the "best" partition must be sought in a set whose cardinality grows faster than exponentially with the number of nodes. The exhaustive enumeration of the partitions is thus impossible, and heuristic techniques must be employed. Secondly, and perhaps more importantly, there is no widespread consensus on formal criteria for defining communities and for testing their significance [12]. When can a subnetwork actually be considered to form a community, namely a group of nodes with comparatively strong internal connectivity? Many contributions, mostly coming from social sciences, computer sciences, and physics, have tried to answer this question in various ways over the years (e.g., [13][14][15][16]). Probably the most important attempt was put forward by Newman and coworkers [5,17,18], who defined a quality index called modularity which quantifies, for a given partition of the network into candidate communities, to what extent the distribution of the intra-/inter-community edges is anomalous with respect to a suitably defined random network. Since high modularity values are obtained in the presence of groups of nodes with comparatively large intra-community edge density, maximizing modularity should highlight the "best" partition. This method has proven successful in many circumstances but, on the other hand, it has been widely demonstrated that, due to intrinsic limitations, it does not always yield a significant partition [12,15,19]. And even when it does, it quantifies the quality of a partition but not of each individual community.
For that reason, many other methods for community analysis have been put forward in the last few years, trying to simultaneously find a meaningful network partition and assess its significance (we recall, e.g., [20][21][22]).
This paper introduces a sharp definition of community which is based on a quality threshold. More precisely, once a level $0 < \alpha < 1$ is specified, a node cluster is defined to be an $\alpha$-community if the probability that a random walker, which is currently in one of the cluster's nodes, remains in the cluster in the next step is not smaller than $\alpha$. Such a probability is obtained from an approximate lumped Markov chain model of the random walker (i.e., a reduced-order Markov chain in which the communities of the original network become nodes) which is easily derived from the original (high-order) Markov chain model. Consistently, a partition composed of $\alpha$-communities is defined to be an $\alpha$-partition.
If equipped with an effective method for generating a set of "good" candidate partitions, the notions of $\alpha$-community and $\alpha$-partition provide a framework for simultaneously finding communities and testing their significance. For that, the desired quality level $\alpha$ is first fixed. Then, a family of partitions is derived and each partition is immediately checked to assess whether it is formed by $\alpha$-communities. This allows one to identify the $\alpha$-partitions, and to select one of them. Typically, one searches for communities which are at the same time small (to effectively decompose the network) and significant (with much more internal than external connectivity). For that, a guideline is that of selecting, among the available $\alpha$-partitions, the one with the largest number of communities.
But the notion of $\alpha$-community can also be useful in a partially different way. It may happen that, for a given quality level $\alpha$, no $\alpha$-partitions are found. Yet, one or a few $\alpha$-communities could exist. They correspond to strongly connected groups of nodes, even in a network which, overall, does not possess a definite clusterized structure. Or, finally, one can assess the significance of the results of a single-partition method, such as modularity optimization [5], and obtain an immediate assessment of the $\alpha$-quality of each single community and, consequently, of the entire partition.
In the paper, we first introduce the lumped Markov chain model of the random walker and define the notions of persistence probability, $\alpha$-community, and $\alpha$-partition. Testing the $\alpha$-quality of a given community or partition turns out to be computationally very cheap. Then we analyze the problem of finding communities in a given network. For that, we propose an effective algorithm for deriving a meaningful set of partitions, among which the "best" one will be selected. The algorithm, which applies hierarchical cluster analysis, is again based on the Markov chain model of a random walker and, consequently, it involves a notion of similarity/distance among nodes which is consistent with the quality criterion introduced above. The results of the application of this approach to both synthetic and real-world networks are discussed. We finally compare this approach, which can be applied to fully general networks (i.e., directed and weighted), with other community analysis methods having a similar philosophy.

Networks, $\alpha$-Communities, and $\alpha$-Partitions
Consider a network with nodes $\mathcal{N} = \{1, 2, \ldots, N\}$ and $L$ edges. In the most general case the network is directed and weighted, and we denote by $W = [w_{ij}]$ the $N \times N$ weight matrix, where $w_{ij} \ge 0$ is the weight of the edge $i \to j$. The connectivity matrix $A = [a_{ij}]$ is the $N \times N$ binary matrix where $a_{ij} = 1$ if $w_{ij} > 0$, and $a_{ij} = 0$ otherwise. If the network is actually undirected we have $W = W'$ and $A = A'$, and if it is unweighted we let $W = A$ (i.e., all weights equal to 1). Since connectedness is typically required for communities ([12], p. 84), we naturally assume that the network is strongly connected (e.g., [3]), namely there exists an oriented path from any $i$ to any $j$. If this is not the case, namely the network is disconnected, each strongly connected component must be separately analyzed.
If the network is directed, for each node $i$ we define the (total) degree as $k_i = k_i^{\mathrm{in}} + k_i^{\mathrm{out}} = \sum_j a_{ji} + \sum_j a_{ij}$, whereas $k_i = \sum_j a_{ji} = \sum_j a_{ij}$ for an undirected network. The average degree is given by $\langle k \rangle = \sum_i k_i / N$. Similarly, for a directed network the in-, out-, and total strength of node $i$ are given by $s_i^{\mathrm{in}} = \sum_j w_{ji}$, $s_i^{\mathrm{out}} = \sum_j w_{ij}$, and $s_i = s_i^{\mathrm{in}} + s_i^{\mathrm{out}}$, respectively, and the total network weight by $S = \sum_{ij} w_{ij}$. If the network is undirected we have instead $s_i = s_i^{\mathrm{in}} = s_i^{\mathrm{out}} = \sum_j w_{ji} = \sum_j w_{ij}$ and $S = \sum_{ij} w_{ij} / 2$. An $N$-state Markov chain $\pi_{t+1} = \pi_t P$, with $\pi_t = (\pi_{1,t}\ \pi_{2,t}\ \ldots\ \pi_{N,t})$, can be associated with the $N$-node network by row-normalizing the weight matrix $W$, namely by letting the transition probability from $i$ to $j$ be equal to

$$p_{ij} = \frac{w_{ij}}{\sum_j w_{ij}}. \qquad (1)$$

The quantity $p_{ij}$ is the probability that a random walker which is in node $i$ jumps to node $j$, and $\pi_{i,t}$ is the probability of being in node $i$ at time $t$. The transition matrix $P = [p_{ij}]$ is a row-stochastic (or Markov) matrix ($0 \le p_{ij} \le 1$ for all $i, j$, and $\sum_j p_{ij} = 1$ for all $i$). Furthermore, $P$ is irreducible since the network is strongly connected. This implies that the equation $\pi = \pi P$ has a unique solution $\pi$, which is strictly positive ($\pi_i > 0$ for all $i$) [23] and corresponds to the stationary Markov chain state probability distribution. For undirected networks one can easily check that $\pi = (s_1\ s_2\ \ldots\ s_N) / (2S)$, whereas for directed networks a general closed form does not exist and $\pi$ has to be numerically computed.
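As an illustration of the construction above, the following minimal sketch row-normalizes a weight matrix and computes the stationary distribution numerically as the left eigenvector of the transition matrix for eigenvalue 1. The 4-node weight matrix and the function names are hypothetical, chosen only for illustration; the eigenvector route is one possible numerical method, not necessarily the one used by the author.

```python
import numpy as np

def transition_matrix(W):
    """Row-normalize the weight matrix W into the transition matrix P."""
    W = np.asarray(W, dtype=float)
    out_strength = W.sum(axis=1)
    if np.any(out_strength == 0):
        raise ValueError("every node needs at least one outgoing edge")
    return W / out_strength[:, None]

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()

# Hypothetical 4-node undirected, weighted, connected network (symmetric W).
W = np.array([[0, 2, 1, 0],
              [2, 0, 1, 0],
              [1, 1, 0, 3],
              [0, 0, 3, 0]], dtype=float)
P = transition_matrix(W)
pi = stationary_distribution(P)

# Undirected case: the closed form s_i / (2S), where 2S is the sum of all entries of W.
pi_closed_form = W.sum(axis=1) / W.sum()
```

For a directed network only the numerical route applies, since the closed form relies on the symmetry of the weight matrix.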
We denote by $\mathcal{P}_q$ a partition of $\mathcal{N}$ in $q$ subsets (or subnetworks), namely $\mathcal{P}_q = \{C_1, C_2, \ldots, C_q\}$ with $\bigcup_c C_c = \mathcal{N}$ and $C_c \cap C_d = \emptyset$ for all $c \neq d$. The subnetwork $C_c$ is called a (candidate) community (or cluster). Defining a partition $\mathcal{P}_q$ induces a $q$-state meta-network, where communities become meta-nodes. The rigorous description of the dynamics of the random walker at this scale by a lumped Markov chain, however, is possible only in special cases [24]; actually, the Markovian property is not even preserved in general. Despite this limitation, a $q$-state Markov chain can be defined, which correctly describes the random walker at the aggregate level provided the stochastic process is started at the stationary distribution $\pi$ [25,26]. This lumped Markov chain is defined by the $q \times q$ row-stochastic matrix

$$U = [\operatorname{diag}(\pi H)]^{-1} H' \operatorname{diag}(\pi) P H, \qquad (2)$$

where $H$ (collecting matrix) is an $N \times q$ binary matrix coding the partition $\mathcal{P}_q$, i.e., its entry $h_{ic}$ is 1 if and only if node $i \in C_c$ (see the Supporting Information S1 for the derivation of equation (2)). The lumped Markov chain $\Pi_{t+1} = \Pi_t U$ shares the stationary distribution with the original one (suitably collected), namely $\Pi = \pi H$ satisfies $\Pi = \Pi U$. On the contrary, starting from an arbitrary $\pi_0$, the lumped Markov chain $\Pi_{t+1} = \Pi_t U$ started at $\Pi_0 = \pi_0 H$ provides, in general, only an approximate description of the evolution of $\pi_t H$. The difference between the real and approximate $\Pi_t$, however, tends exponentially to zero if the two chains are regular [23], since they converge, by definition, to the same stationary state. The ability of the lumped Markov chain to describe the random walk dynamics only at stationarity is not a limitation for our purposes, as will be demonstrated by the examples of application. Note that the entry $u_{cd}$ of $U$ is the probability that the random walker is at time $t+1$ in any of the nodes of community $d$, provided it is at time $t$ in any of the nodes of community $c$. We define the persistence probability of community $c$ as the diagonal term $u_{cc}$.
Large values of $u_{cc}$ are expected for meaningful communities. In fact, the expected escape time from $C_c$ is $\tau_c = (1 - u_{cc})^{-1}$: the walker will spend a long time within the same community if the weights of the internal edges are comparatively large with respect to those pointing outside. Given a value $0 < \alpha < 1$, $C_c$ is defined to be an $\alpha$-community if $u_{cc} \ge \alpha$. Thus $\alpha$ acts as a selection parameter, as it sharply qualifies communities with respect to a given quality threshold. Consistently, $\mathcal{P}_q$ is defined to be an $\alpha$-partition if it is composed of $\alpha$-communities, namely $u_{cc} \ge \alpha$ for all $c = 1, 2, \ldots, q$.
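The definitions of this section can be sketched in a few lines of code. The example below builds the lumped matrix for a partition given as a vector of community labels, reads the persistence probabilities off its diagonal, and tests the partition against a threshold. The matrix expression is the standard stationary aggregation (each entry is the stationary flow from one community to another divided by the stationary mass of the source community); the function names and the 6-node test network (two triangles joined by one edge) are hypothetical.

```python
import numpy as np

def lumped_matrix(P, pi, labels):
    """Lumped Markov matrix U = diag(pi H)^-1 H' diag(pi) P H."""
    labels = np.asarray(labels)
    n, q = len(labels), labels.max() + 1
    H = np.zeros((n, q))
    H[np.arange(n), labels] = 1.0            # collecting matrix: h_ic = 1 iff node i is in C_c
    mass = pi @ H                            # stationary probability of each community
    return (H.T @ (pi[:, None] * P) @ H) / mass[:, None]

def is_alpha_partition(U, alpha):
    """True if every persistence probability u_cc is at least alpha."""
    return bool(np.all(np.diag(U) >= alpha))

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
P = A / A.sum(axis=1, keepdims=True)
pi = A.sum(axis=1) / A.sum()                 # closed form valid for undirected networks
U = lumped_matrix(P, pi, [0, 0, 0, 1, 1, 1])
persistence = np.diag(U)                     # u_cc of each candidate community
escape_time = 1.0 / (1.0 - persistence)      # expected escape time from each community
```

In this toy case each triangle keeps 6 of its 7 edge endpoints inside, so both persistence probabilities equal 6/7 and the partition passes a threshold of 0.5 but fails one of 0.9.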

Testing communities
Testing the quality of a given partition is the simplest use of persistence probabilities. The partition can be the outcome of a community detection method (e.g., max-modularity) or instead derive from some a priori division (e.g., countries of the same continent in a financial network, or students of the same class in a school). By computing the $u_{cc}$ values using equation (2), the quality of each community and of the entire partition is readily quantified.
Consider the simple 12-node network of Fig. 1 [27], which is purposely composed of three clusters. Four partitions are considered, corresponding to finer and finer divisions, and the $u_{cc}$ values are computed for each candidate community. As long as the communities coincide with "natural" clusters, or with the union of two of them, all the $u_{cc}$ values remain rather large. But as soon as a natural community is broken, some very low persistence probabilities are found. If, for example, the quality level $\alpha = 0.5$ is fixed (a value having an important interpretation; see below), only the first and second partitions are such that $u_{cc} \ge \alpha$ for all $c$ (i.e., they are $\alpha$-partitions). Even though the third and fourth partitions fail to meet this requirement, some of their clusters can still individually be classified as $\alpha$-communities.
From equation (2), one can derive the explicit expression of the persistence probability $u_{cc}$ of cluster $C_c$ (see also the Supporting Information S1):

$$u_{cc} = \frac{\sum_{i,j \in C_c} \pi_i p_{ij}}{\sum_{i \in C_c} \pi_i}.$$

Kim et al. [28] note that $\sum_{i,j \in C_c} \pi_i p_{ij}$ is the fraction of time that the random walker spends on the links internal to $C_c$. Thus $u_{cc}$ is the ratio between the latter and the fraction of time spent on the nodes of $C_c$. In the case of an undirected network, recalling that $\pi_i = s_i / (2S)$ and $p_{ij} = w_{ij} / s_i$, we obtain

$$u_{cc} = \frac{\sum_{i,j \in C_c} w_{ij}}{\sum_{i \in C_c} s_i} = \frac{W_c}{S_c},$$

having denoted by $W_c = \sum_{i,j \in C_c} w_{ij}$ the total internal weight and by $S_c = \sum_{i \in C_c} s_i$ the total strength of $C_c$. Thus the persistence probability has, in this case, a straightforward interpretation: it is the fraction of the strength of the nodes of $C_c$ that remains within $C_c$. In the even more special case of unweighted networks, this is strictly related, in turn, to the notion of "community in a weak sense" put forward by Radicchi et al. [14], who defined a community as a set $C_c$ of nodes whose edges directed within $C_c$ are more than those directed toward the rest of the network. It can easily be verified that this corresponds to $u_{cc} > 0.5$. Therefore persistence probabilities generalize the above notion of "community in a weak sense" in two directions: first, they extend it to weighted, directed networks; second, they allow a flexible tuning of the "strength" of the communities by fixing the desired minimum acceptable value (not necessarily $0.5$) for $u_{cc}$.
We note that, again restricting attention to undirected, unweighted networks, it can easily be checked that

$$1 - u_{cc} = \frac{\sum_{i \in C_c,\, j \notin C_c} a_{ij}}{\sum_{i \in C_c} k_i}$$

is the normalized cut of cluster $C_c$ (e.g., [12] p. 92), namely the ratio between the number of edges connecting $C_c$ to the rest of the network and the sum of the degrees of the nodes of $C_c$. This observation bridges our dynamical, Markov-chain-based method with traditional graph partitioning techniques. It has already been pointed out that the latter are poorly suited to community detection [5,12], because the number of clusters typically has to be provided a priori whereas, in most instances, it is part of the outcome the network analyst is seeking (see [29] for a relationship between modularity and cut size). Nonetheless, in the next section we shall see how a flexible exploitation of persistence probabilities enables an effective community analysis.
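The undirected, unweighted relations discussed above can be checked numerically on a small example. The sketch below uses a hypothetical 6-node network (two triangles joined by one edge) and one candidate cluster; it computes the persistence probability as the fraction of the cluster's degree that stays inside, the normalized cut, and the weak-sense condition of [14]. Variable names are illustrative.

```python
import numpy as np

# Hypothetical network: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

members = np.array([True, True, True, False, False, False])  # candidate cluster {0,1,2}
k = A.sum(axis=1)                                  # node degrees

internal = A[np.ix_(members, members)].sum()       # internal edge endpoints (each edge counted twice)
cut = A[np.ix_(members, ~members)].sum()           # edges connecting the cluster to the rest
total_degree = k[members].sum()

u_cc = internal / total_degree                     # persistence probability (undirected case)
normalized_cut = cut / total_degree
weak_sense = internal > cut                        # more edge endpoints inside than outside
```

Since every edge endpoint of the cluster is either internal or part of the cut, the two ratios sum to one, and the weak-sense condition holds exactly when the persistence probability exceeds 0.5.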

Finding communities
In the previous section, the persistence probabilities were used for testing given partitions and, individually, their communities. Here, instead, we want to analyze how this tool can be exploited for finding communities, namely for deriving partitions composed of meaningful communities.
The starting point is to define the desired level for the quality parameter $\alpha$. For example, as pointed out above, in the case of undirected, unweighted networks, the constraint $u_{cc} > \alpha = 0.5$ for all $c$ is equivalent to requiring partitions composed of "communities in a weak sense", according to the definition of [14]. But the network analyst can be more or less restrictive, i.e., require a larger ($0.5 < \alpha < 1$) or smaller ($0 < \alpha < 0.5$) significance level.
In general, for any given $\alpha$, a large set of $\alpha$-partitions exists, i.e., partitions such that $u_{cc} \ge \alpha$ for all $c$ (e.g., the trivial partition $\mathcal{P}_1 = \{\mathcal{N}\}$, the entire network, is an $\alpha$-partition for any given $\alpha$). Typically one searches for small (yet significant) communities, to effectively decompose the network. Thus we can rigorously formulate the problem of community detection as follows:

$$\max_{\mathcal{P}_q \in \mathbb{P}} q \quad \text{subject to} \quad u_{cc} \ge \alpha \ \text{ for all } c = 1, 2, \ldots, q, \qquad (5)$$

where $\mathbb{P}$ denotes the set of all partitions. Notice that the admissible set of problem (5) is not empty for any given $\alpha$ (since $\mathcal{P}_1 = \{\mathcal{N}\}$ has $u_{11} = 1$) and that, in general, the optimal solution is not unique (if $q = \bar{q}$ attains the maximum in (5), there can be many $\mathcal{P}_{\bar{q}}$ which are $\alpha$-partitions).
Analyzing the theoretical properties of problem (5) is beyond the scope of this work (see [30] for a discussion on the NP-completeness of some related optimization problems). Instead, a heuristic approach for finding a suboptimal solution to (5) can readily be derived, by restricting the optimization to a (much smaller) subset $\mathbb{P}^* \subset \mathbb{P}$ obtained by any "partitions generator", namely an algorithm that yields a set of partitions $\mathcal{P}', \mathcal{P}'', \ldots$ which are hopefully "good" candidates for community detection. In this way, problem (5) is readily solved by picking the $\alpha$-partitions within $\mathbb{P}^*$, and taking the one(s) with the largest $q$. We will refer to this procedure in the remainder of the paper, but we anticipate that, instead of the "unsupervised" approach just outlined (with $\alpha$ fixed a priori), we will often prefer a "supervised" approach consisting in first generating a set of meaningful partitions, then comparatively assessing their quality, and finally selecting the preferred one, thus implicitly fixing the $\alpha$ value a posteriori. We will illustrate this procedure through many examples.
Several methods have been proposed to derive network partitions which are meaningful in the sense of community analysis (see again [12] for a thorough analysis). All of them can be used in our framework: here we adopt a method for deriving partitions which is based on cluster analysis and is consistent with the random walk modeling introduced above.
Cluster analysis can be used to group "similar nodes" into candidate communities. This requires defining a meaningful similarity/distance between each pair of nodes. Such a definition is by no means obvious: among the many proposals [12], a few exploit random walks to induce a suitable similarity measure (e.g., [31][32][33][34][35]). We follow this line by proposing an approach in which, however, we do not explicitly perform random walks in a Monte Carlo fashion, but derive analytically the global behavior of a large number $M$ of walkers (a "fleet") started from each node $i$.
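The heuristic solution of problem (5) over a restricted candidate set can be sketched as follows: compute the persistence probabilities of every candidate partition, keep those whose smallest value reaches the threshold, and return one with the largest number of communities. The function names, the candidate list, and the 6-node test network are hypothetical; the candidates here are written by hand, whereas in practice they would come from a partitions generator.

```python
import numpy as np

def persistence_probabilities(W, labels):
    """Persistence probability u_cc of each community of the partition `labels`."""
    W = np.asarray(W, dtype=float)
    P = W / W.sum(axis=1, keepdims=True)
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    pi = pi / pi.sum()                              # stationary distribution
    labels = np.asarray(labels)
    flow = pi[:, None] * P                          # stationary flow on each edge
    return np.array([flow[np.ix_(labels == c, labels == c)].sum() / pi[labels == c].sum()
                     for c in np.unique(labels)])

def finest_alpha_partition(W, candidates, alpha):
    """Among the candidates that are alpha-partitions, return one with the largest q."""
    feasible = [lab for lab in candidates
                if persistence_probabilities(W, lab).min() >= alpha]
    if not feasible:
        return None
    return max(feasible, key=lambda lab: len(np.unique(lab)))

# Hypothetical network: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

candidates = [
    [0, 0, 0, 0, 0, 0],      # trivial partition, q = 1
    [0, 0, 0, 1, 1, 1],      # the two triangles, q = 2
    [0, 0, 1, 1, 2, 2],      # breaks both triangles, q = 3
]
chosen = finest_alpha_partition(W, candidates, alpha=0.5)
strict = finest_alpha_partition(W, candidates, alpha=0.9)
```

With a threshold of 0.5 the two-triangle partition is selected; raising the threshold to 0.9 leaves only the trivial partition, which illustrates the trade-off between granularity and significance.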
Consider a large number $M$ of repetitions of a random walk started from $i$. For each repetition, the probability that the walker is in $j$ after $t$ steps is $[P^t]_{ij}$. Thus, if $M$ random walks of length $T$ are performed from $i$, the expected number of visits to $j$ over the time instants $1 \le t \le T$ is $M \sum_{t=1}^{T} [P^t]_{ij}$. By averaging with respect to $M$, we propose a (symmetric) similarity $\sigma_{ij}$ defined by

$$\sigma_{ij} = \sigma_{ji} = \sum_{t=1}^{T} \left( [P^t]_{ij} + [P^t]_{ji} \right).$$

Note that this is conceptually equivalent to an explicit random walk approach, but with an arbitrarily large number $M$ of repetitions from each starting node instead of only one. Most notably, the results do not depend on the actual stochastic realization of the random walks. We finally define the distance $d_{ij} = d_{ji}$ between nodes $(i,j)$ by complementing the similarity and normalizing the results between 0 and 1:

$$d_{ij} = 1 - \frac{\sigma_{ij} - \min_{ij} \sigma_{ij}}{\max_{ij} \sigma_{ij} - \min_{ij} \sigma_{ij}}.$$

The rationale underlying the definition of $\sigma_{ij}$ and $d_{ij}$ is to assign nodes $(i,j)$ a large similarity if a numerous fleet of random walkers started in $i$ (resp. $j$) makes a large number of visits to $j$ (resp. $i$) within a sufficiently small time horizon $T$. The notion of community induced by this metric, therefore, is that of a subnetwork where a random walker has a large probability of circulating for quite a long time, before eventually leaving to reach another group. This is conceptually consistent with the definition of $\alpha$-community introduced above. The choice of the time horizon $T$ is potentially critical. Cluster analysis yields a different hierarchical tree (dendrogram) for each time horizon $T$, whose choice is thus nontrivial. At the two extremes, setting $T = 1$ restricts the pairs of nodes that can have nonzero similarity to neighboring pairs only, whereas larger and larger values of $T$ tend to make any node equally similar to any other. We found that an effective selection of $T$ can be empirically obtained by maximizing the cophenetic correlation coefficient $C$, which is defined as the linear correlation between the distances $d_{ij}$ and the cophenetic distances $c_{ij}$ [36].
The latter are a product of the hierarchical cluster analysis: for any node pair $(i,j)$, the cophenetic distance $c_{ij}$ is the height of the link joining (directly or indirectly) nodes $(i,j)$ in the dendrogram. The value of $C$ is generally used to assess whether the adopted distance $d_{ij}$ induces an effective clusterization (notice that $C$ qualifies the entire dendrogram, and not a network partition), although limitations have been observed in specific applications [37].
The entire procedure for finding communities is summarized in Fig. 2 with reference to the toy network of Fig. 1. Starting from the network description, we apply cluster analysis for each $T$ ranging from 1 to some sufficiently large $T_{\max}$ (of the order of $N$), eventually taking the $T$ value that maximizes $C$. Horizontal top-down cross-sections of the associated dendrogram identify a sequence $\mathcal{P}_2, \mathcal{P}_3, \ldots$ of partitions with increasing number $q$ of candidate communities. For each $\mathcal{P}_q$ we compute the lumped Markov matrix $U$ according to (2), and plot its diagonal terms in the persistence probabilities' diagram. In the case of Fig. 2, the sudden drop of the least $u_{cc}$ for $q$ larger than 3 reveals that a meaningful community has been broken passing from $\mathcal{P}_3$ to $\mathcal{P}_4$. If we set, for instance, the quality threshold at $\alpha = 0.5$, then $\mathcal{P}_2$ and $\mathcal{P}_3$ can be qualified as $\alpha$-partitions, and thus $\mathcal{P}_3$ will be our choice if we seek the finest partition, consistently with problem (5).
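A possible implementation of the partitions generator is sketched below with SciPy's hierarchical clustering routines. Several choices are assumptions made for illustration and are not stated in the text: the similarity sums the expected visits in both directions, the min-max normalization runs over distinct node pairs only, and the dendrogram uses average linkage. The function names and the 6-node test network are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet, fcluster
from scipy.spatial.distance import squareform

def walk_distance(P, T):
    """Distance obtained from the symmetric T-step visit similarity, rescaled to [0, 1]."""
    N = P.shape[0]
    Pt, visits = np.eye(N), np.zeros((N, N))
    for _ in range(T):
        Pt = Pt @ P
        visits += Pt                          # accumulates sum_{t=1..T} [P^t]_ij
    sigma = visits + visits.T                 # assumed symmetrization
    off = ~np.eye(N, dtype=bool)              # assumption: normalize over distinct pairs
    lo, hi = sigma[off].min(), sigma[off].max()
    D = 1.0 - (sigma - lo) / (hi - lo)
    np.fill_diagonal(D, 0.0)
    return D

def best_dendrogram(P, T_max, method="average"):
    """Return (C, T, Z) for the time horizon T maximizing the cophenetic correlation C."""
    best = None
    for T in range(1, T_max + 1):
        d = squareform(walk_distance(P, T), checks=False)   # condensed distance vector
        Z = linkage(d, method=method)
        C, _ = cophenet(Z, d)
        if best is None or C > best[0]:
            best = (C, T, Z)
    return best

# Hypothetical network: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
P = A / A.sum(axis=1, keepdims=True)

C, T, Z = best_dendrogram(P, T_max=6)
labels = fcluster(Z, t=2, criterion="maxclust") - 1   # cross-section with q = 2
```

Cutting the same dendrogram with increasing values of `t` yields the sequence of candidate partitions whose persistence probabilities are then plotted against q.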

Results
The analysis of four networks is now discussed. We will consider two families of synthetic benchmark networks with built-in cluster structure; a real-world network with a rather strong community structure; and another real-world network with weak clustering but with a few well-defined communities. Other examples are discussed in the Supporting Information S1.

LFR benchmarks
Lancichinetti, Fortunato, and Radicchi (LFR) [38] proposed a family of synthetically generated graphs, designed to serve as benchmarks for testing community detection algorithms. They explicitly took into account two properties found in real networks, namely the heterogeneity in the distributions of node degrees and community sizes. Both of the latter are taken as power laws, with given exponents $\gamma$ and $\beta$, respectively. In addition, the network is defined by prescribing the number $N$ of nodes, the average degree $\langle k \rangle$, and a mixing parameter $\mu$ such that each node shares a fraction $1 - \mu$ of its edges with the other nodes of its own community, and a fraction $\mu$ with the rest of the network. The benchmark generating method was later extended to oriented and weighted networks [39] (see the Supporting Information S1); here we consider undirected, unweighted networks with $N = 1000$, $\langle k \rangle = 20$, $\gamma = 2$. We first let $\beta = 1$ and $\mu = 0.25$. Since the generating algorithm is stochastic, we produce 10 different network instances: the number of built-in communities $q^*$ turns out to range from 35 to 43, and the size of each community from 10 to 77 nodes.

Figure 2. Summary of the procedure for community analysis. From the network description (top panel) and a suitable definition of node distance, a hierarchical tree (dendrogram) is derived by cluster analysis (middle panel). Horizontal top-down cross-sections of the dendrogram identify a sequence $\mathcal{P}_q$ of partitions with increasing number $q$ of candidate communities. In the persistence probabilities' diagram (bottom panel), the $q$ diagonal terms $u_{cc}$ of the lumped Markov matrix $U$ are plotted for each partition $\mathcal{P}_q$ (crosses denote the values of the $u_{cc}$, vertical straight lines are only for visual aid). In this example, the sudden drop of the least $u_{cc}$ for $q$ larger than 3 reveals that a meaningful community has been broken passing from $\mathcal{P}_3$ to $\mathcal{P}_4$. doi:10.1371/journal.pone.0027028.g002

Finding Communities by Lumped Markov Chains
PLoS ONE | www.plosone.org
We now fix our desired quality level, for example $\alpha = 0.5$, and solve problem (5) for each of the 10 networks. For that, we use the "partitions generator" described above: in Fig. 3 we show, for illustrative purposes, the cophenetic correlation coefficient $C$ as a function of the random walk time horizon $T$, as obtained by analyzing one of the networks. We find a unimodal dependence, as for almost all the networks studied. We therefore take $T = 12$ in this case, which attains the maximum $C = 0.905$. The related dendrogram is in the same figure.
The persistence probabilities' diagrams obtained for the 10 networks are shown in Fig. 4. In all instances, the diagrams reveal a sharp discontinuity. For $q \le q^*$, all the $u_{cc}$ values are rather large (larger than $0.72$). This indicates that meaningful communities are identified. For $q > q^*$, instead, some significant communities are broken, as revealed by a larger and larger number of small $u_{cc}$ values. In other words, the correct number of built-in communities is systematically revealed, in all instances, by a sudden drop of some of the persistence probabilities. This implies, in turn, that solving problem (5), i.e., taking the largest $q$ such that $\mathcal{P}_q$ is an $\alpha$-partition with $\alpha = 0.5$, yields a solution with $q = \bar{q}$ which exactly recovers the number $q^*$ of communities. Furthermore, such a solution is largely insensitive to the choice of the quality level: for example, any value in the range $0.1 < \alpha < 0.7$ would give the same result.
Obviously, the fact that $\bar{q} = q^*$ does not imply that the two partitions are identical. In order to quantify the ability of the method, we compare the built-in partition with that obtained by solving problem (5), in terms of the normalized mutual information $I$, a reliable and often used measure of partition similarity introduced by [40] to the network research community. The definition of $I$ is reported in the Supporting Information S1: here we only point out that $I = 1$ when the two partitions are identical, whereas $I$ has zero expected value for independent partitions. We obtain an average of $I = 0.997$ over the 10 networks, which compares favorably with the values reported by [38] after extensive tests using modularity optimization ($I \approx 0.975$) and Potts model clustering [41] ($I \approx 0.925$).
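For readers without access to the Supporting Information, the sketch below computes a normalized mutual information between two partitions from their confusion matrix. It assumes the widely used confusion-matrix formulation (twice the mutual information divided by the sum of the two partition entropies); whether this coincides exactly with the definition adopted in the paper is an assumption. The function name and the small test partitions are hypothetical.

```python
import numpy as np

def normalized_mutual_information(a, b):
    """NMI between two partitions given as label vectors (assumed standard definition)."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    C = np.zeros((ia.max() + 1, ib.max() + 1))
    np.add.at(C, (ia, ib), 1)                      # confusion matrix: nodes shared by each pair of clusters
    ra, rb = C.sum(axis=1), C.sum(axis=0)          # cluster sizes in the two partitions
    nz = C > 0
    num = -2.0 * np.sum(C[nz] * np.log(C[nz] * n / np.outer(ra, rb)[nz]))
    den = np.sum(ra * np.log(ra / n)) + np.sum(rb * np.log(rb / n))
    return num / den if den != 0 else 1.0          # both partitions trivial: treat as identical

same = normalized_mutual_information([0, 0, 1, 1, 2, 2], [5, 5, 3, 3, 9, 9])
independent = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

The measure depends only on how nodes are grouped, not on the label values, so relabeled copies of the same partition score 1.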
In [38] it is shown that the performance of community detection algorithms deteriorates when, ceteris paribus, the scale parameter $\beta$ of the power-law community size distribution increases (i.e., communities are less differentiated in size) and/or when the mixing parameter $\mu$ increases (i.e., communities become less isolated from each other). To analyze this situation, we generate another set of 10 benchmark networks by increasing $\beta$ from 1 to 2 and $\mu$ from $0.25$ to $0.6$ (the highest $\mu$ value considered in [38]): the resulting networks turn out to have from 47 to 58 communities, with size ranging from 10 to 61 nodes. Notice that we are generating low-quality clusters, due to the large $\mu$: actually, none of them would meet the requirement of "community in a weak sense" according to [14]. In other words, the cluster structure of the network is extremely weak, and that is obviously the reason for the poor performance of community detection tools.
All of this is captured by the persistence probabilities' diagrams of Fig. 5. All the candidate partitions are characterized by low-quality clusters (with the $u_{cc}$ values accumulating in the range $0.25$ to $0.4$), which is the obvious result of the low quality of the built-in partitions. In this situation, when analyzing one of the networks, the detection procedure of [14], based on the notion of "community in a weak sense", would discard any candidate community; the max-modularity approach would yield a partition as outcome, but with no assessment of the quality of its clusters; and also the unsupervised solution of our problem (5) would lead to poor results: for example, setting a priori the value of $\alpha$ to the "standard" value of $0.5$ would discard all partitions.
It is exactly in such a difficult context that persistence probabilities can be a precious decision support tool. By looking at the diagrams of Fig. 5, the analyst immediately grasps the weak cluster structure of the network under scrutiny, and can consistently fix a posteriori an $\alpha$ value that is not unrealistically restrictive. Alternatively, he/she can rely on the observation of a sudden drop in one or more persistence probabilities as an indication of a (comparatively) good partition. This means selecting $q = \bar{q}$ such that $\min_c u_{cc}$ has the largest variation from $\mathcal{P}_{\bar{q}}$ to $\mathcal{P}_{\bar{q}+1}$. If we systematically apply this strategy to the 10 benchmark networks, we obtain an average mutual information between the built-in and the obtained partition of $I = 0.844$, which is intermediate with respect to the values obtained by [38] with modularity optimization ($I \approx 0.875$) and Potts model clustering ($I \approx 0.825$). But the added value of our approach is, for the selected partition $\mathcal{P}_{\bar{q}}$, the quality measure $u_{cc}$ of each cluster and, consequently, of the entire partition.
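The "largest drop" rule described above is simple to state in code. The sketch below takes, for each number of communities q, the smallest persistence probability of the corresponding partition and returns the q after which that minimum falls the most. The function name is hypothetical and the numerical values are made up purely for illustration; they are not results from the paper.

```python
def q_by_largest_drop(min_u):
    """Return the q maximizing min_u[q] - min_u[q + 1].

    `min_u` maps each q to the smallest persistence probability of partition P_q.
    """
    drops = {q: min_u[q] - min_u[q + 1] for q in sorted(min_u) if q + 1 in min_u}
    return max(drops, key=drops.get)

# Made-up values of min_c u_cc for q = 2..5 (illustration only).
min_u = {2: 0.91, 3: 0.88, 4: 0.41, 5: 0.38}
q_selected = q_by_largest_drop(min_u)
```

Here the minimum falls sharply between q = 3 and q = 4, so q = 3 is selected; the persistence probabilities of that partition then serve as its quality assessment.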

Netscience network
The Netscience network is a weighted, undirected, social network describing the collaborations (up to year 2006) among researchers in network science, the weight of the edge connecting two researchers being proportional to the number of papers they have co-authored [18]. Its giant component has $N = 379$ nodes, and it is generally considered an example of a real network with a rather strong community structure. Many methods for network analysis, including community detection algorithms, have been tested and discussed on this example (e.g., [42][43][44]). If we run our partitions generator algorithm, at $T = 6$ we get the dendrogram attaining the largest $C$: the resulting persistence probabilities' diagram is in Fig. 6. The plot has a less sharp structure than that of the LFR networks of Fig. 4: if we adopt once again the criterion of [14], namely we solve problem (5) in an unsupervised fashion by letting $\alpha = 0.5$, then $\mathcal{P}_{35}$ is the optimal partition (here we have straightforwardly extended the notion of "community in a weak sense" to weighted networks). In a supervised approach, instead, the network analyst will select the proper $q$ as a trade-off between a finer decomposition (large $q$) and a higher significance of the communities (small $q$). For example, setting $\alpha$ as large as $0.9$ yields $\mathcal{P}_{10}$ as the optimal partition, i.e., the $\alpha$-partition with largest $q$.
It is instructive to compare these results with those obtained, on the same case study, by the graph stability approach proposed by Delvenne et al. [42] (a detailed comparison of the two methods is in the next section). By means of the KVV algorithm [45] (a hierarchical, divisive, non-binary, graph clustering method), they obtain a sequence of six partitions, with $q = 2, 3, 5, 15, 17, 21$.
Analyzing and comparing the stability curve (i.e., the autocovariance function of a signal emitted by a random walker) of each of them, the authors suggest their partition with $q = 5$ as the most reliable, as it has the largest stability over a longer time span than any other. Incidentally, this is also a supervised approach that leaves the analyst the choice of the preferred solution among a set of alternatives.
In order to test the six partitions of [42], we created their persistence probabilities' diagram and compared it with our results in the diagram of Fig. 7. The partition with $q = 5$ of [42] is confirmed to be definitely more significant than those with finer decomposition (i.e., $q = 15, 17, 21$) according to the criterion of minimal $u_{cc}$ too. Actually, our $P_5$ partition and theirs share the same minimal $u_{cc} = 0.952$, due to a common 22-node community. They are, however, partially different (their normalized mutual information is $I = 0.886$, with about 6% of differently classified node pairs).
The inspection of Fig. 7 also reveals that, for each given $q$, the partitions obtained with our method are superior to those proposed in [42], provided the criterion put forward in this paper (i.e., minimal $u_{cc}$) is adopted. Actually, while the criterion of [42] ranks partitions by “averaging” among the communities, our approach is a “worst-case” one: by selecting an $\alpha$-partition one guarantees that the “worst” community has a persistence probability not less than $\alpha$. Finally, note that in the gap from $q = 6$ to 15, where no partition is obtained by the KVV divisive algorithm, our partitions generating algorithm provides a set of finer and finer partitions, whose quality only slowly deteriorates as $q$ increases. The network analyst can fruitfully select in this interval a proper trade-off between fine granularity and significance of the partition.
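The worst-case selection rule discussed above can be sketched in a few lines. This is a minimal illustration, not the code used for the experiments: the function name `finest_alpha_partition`, the dictionary layout of the diagram, and the toy values are all hypothetical (the numbers are made up and are not the Netscience data).

```python
def finest_alpha_partition(diagram, alpha):
    """Return (q, worst_u) for the alpha-partition with the largest q.

    `diagram` maps the number of communities q to the list of persistence
    probabilities u_cc of the corresponding candidate partition.
    Returns None when no candidate is an alpha-partition.
    """
    best = None
    for q, u_values in diagram.items():
        worst = min(u_values)  # quality of the "worst" community
        if worst >= alpha and (best is None or q > best[0]):
            best = (q, worst)
    return best


# Hypothetical toy diagram (made-up values, for illustration only).
toy_diagram = {
    2: [0.99, 0.98],
    3: [0.97, 0.95, 0.93],
    4: [0.96, 0.91, 0.90, 0.62],
    5: [0.95, 0.90, 0.58, 0.55, 0.41],
}

strict = finest_alpha_partition(toy_diagram, 0.9)   # demanding quality level
loose = finest_alpha_partition(toy_diagram, 0.5)    # permissive quality level
```

Raising the threshold can only shrink the set of admissible partitions, so a stricter level never returns a finer decomposition than a looser one.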

World trade network
The final example concerns a real-world, directed, weighted network representing the trade flows among countries. This network, denoted as world trade network (or world trade web), has been extensively studied in recent years (e.g., [46][47][48]). The problem of the existence of communities, namely groups of countries with preferential partnerships, has been addressed too, although the results do not seem to be definitive [49,50]. This issue is obviously related to the debate about “globalization versus regionalization” in the world economy.
We consider the network derived from 2008 data, whose largest connected component has $N = 181$ nodes. It does not seem to display a definite community structure: as a matter of fact, the maximum modularity (estimated as in [51]) is rather small, namely $Q = 0.296$, if compared to other examples where $N$ has the same order of magnitude (e.g., $Q = 0.831$ for the Netscience network). In this situation, we show how our method is able to detect well-defined communities (if any) even in a network which overall does not possess a definite clusterized structure. Consider the persistence probabilities' diagram of Fig. 8. With the exception of the cases $q = 2$ and 3, corresponding to rather trivial partitions, no $\alpha$-partition exists with a reasonably large $\alpha$ (say, $\alpha \ge 0.5$). Nonetheless, an $\alpha$-community which meets a restrictive quality standard is stably detected in a rather wide range of $q$, as highlighted in the figure. It is a cluster composed of 62 nodes which shows a rather strong internal connectivity ($u_{cc} = 0.72$). Any other candidate cluster, instead, turns out to have a much smaller $u_{cc}$ value and, therefore, it can hardly be considered to be a significant community. Interestingly, this meaningful cluster includes almost all European countries, plus a number of minor non-European partners.
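For a directed, weighted network such as this one, the persistence probabilities can be computed directly from the weight matrix. The sketch below assumes the usual lumped-chain construction (stationary flow inside a cluster divided by the stationary probability of the cluster) and a strongly connected network in which every node has at least one outgoing edge; the function name `lumped_markov_matrix` and the six-node toy network are hypothetical and unrelated to the trade data.

```python
import numpy as np


def lumped_markov_matrix(W, labels):
    """Return the q x q lumped Markov matrix U of a partition.

    W[i, j] is the weight of the edge i -> j; labels[i] in 0..q-1 is the
    community of node i. The diagonal of U holds the persistence
    probabilities u_cc.
    """
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)      # row-stochastic transition matrix
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    eigvals, eigvecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    pi = pi / pi.sum()
    q = int(labels.max()) + 1
    H = np.zeros((n, q))                       # node-to-community indicator
    H[np.arange(n), labels] = 1.0
    flow = H.T @ (pi[:, None] * P) @ H         # flow[c, d] = sum pi_i p_ij
    return flow / (pi @ H)[:, None]            # divide row c by Pi_c


# Toy network: two directed 3-cycles (weight 10) joined by two weak edges.
W = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]:
    W[a, b] = 10.0
W[2, 3] = 1.0
W[5, 0] = 1.0
labels = np.array([0, 0, 0, 1, 1, 1])

U = lumped_markov_matrix(W, labels)
u = np.diag(U)   # persistence probabilities of the two clusters
```

Each cluster can then be tested on its own against the chosen quality level, independently of how the remaining nodes are partitioned.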

Comparison with other community detection methods
In the section “Methods” we highlighted some important connections between persistence probabilities and other quantities which are standard in graph theory. As we pointed out, for undirected networks $u_{cc}$ reduces to the so-called internal density, namely the ratio between the total internal weight and the total strength of $C_c$. In turn, if the graph is also unweighted, this turns out to be one minus the normalized cut of $C_c$. The definition of “community in the weak sense”, put forward in [14], can also be reinterpreted in terms of persistence probabilities. No straightforward connections, however, can be deduced for directed networks, where nonetheless the tool of persistence probabilities can be fully applied.
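The reduction to the internal density can be retraced in a few steps. The derivation below is a sketch that assumes the usual lumped-chain expression for the persistence probability; the symbols $\pi_i$ (stationary probability of node $i$), $p_{ij}$ (transition probability), $w_{ij}$ (edge weight), $s_i$ (node strength) and $\mathrm{cut}(C_c)$ (total weight of the edges leaving $C_c$) are notational assumptions introduced here, while $W_c$, $S_c$ and $S$ denote the internal weight of $C_c$, its total strength and the total weight of the network.

```latex
\begin{align*}
u_{cc} &= \frac{\sum_{i,j \in C_c} \pi_i \, p_{ij}}{\sum_{i \in C_c} \pi_i},
\qquad \pi_i = \frac{s_i}{2S}, \quad p_{ij} = \frac{w_{ij}}{s_i}
\quad \text{(undirected network)} \\
&= \frac{\sum_{i,j \in C_c} w_{ij}}{\sum_{i \in C_c} s_i}
 = \frac{2 W_c}{S_c} \\
&= \frac{S_c - \mathrm{cut}(C_c)}{S_c}
 = 1 - \frac{\mathrm{cut}(C_c)}{S_c}.
\end{align*}
```

The last line uses $S_c = 2W_c + \mathrm{cut}(C_c)$: every internal edge contributes twice to the total strength of the cluster, every outgoing edge once. For an unweighted graph, $\mathrm{cut}(C_c)$ is simply the number of edges leaving $C_c$.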
An important relationship between random walks and modularity is put forward by Kim et al. [28], who propose their LinkRank modularity $Q^{lr}$ (which we denote by $R$ for clarity), a variation of the standard modularity aimed at obtaining a better performance on directed graphs. In words, $R$ is the difference between the fraction of time spent walking within communities ($R'$) and the expected value of this fraction on a suitable null model ($R''$). Both these terms are additive with respect to communities, and it turns out that (with our notation):
$$R = R' - R'' = \sum_c \left( R'_c - R''_c \right).$$
In the case of undirected networks, simple computations show that $R'_c = W_c/S$ and $R''_c = S_c^2/(4S^2)$, which implies that the LinkRank modularity reduces to the standard one $Q$, which indeed can be written as:
$$Q = \sum_c \left( Q'_c - Q''_c \right) = \sum_c \left[ \frac{W_c}{S} - \frac{S_c^2}{4S^2} \right].$$
The comparison between $Q'_c = W_c/S$ and the persistence probability $u_{cc} = 2W_c/S_c$ reveals obvious analogies but also subtle and important differences. The former is the fraction of time spent in community $C_c$: being proportional to the total internal weight $W_c$, it will be smaller for smaller clusters, ceteris paribus, regardless of their cohesiveness. On the contrary, $u_{cc}$ measures the probability of remaining in $C_c$ given that the walker is currently there, regardless of the size of the cluster, thanks to the normalization by the total cluster strength $S_c$. The result is a superior capability of persistence probabilities in assessing the quality of clusters whatever their size, a precious feature when analyzing multi-scale networks (i.e., composed of communities of different size scales).
This can be demonstrated, for example, by considering an instance of an LFR benchmark network with $\beta = 1$, $\mu = 0.25$ (see the section “Results” for the values of the other parameters) and analyzing the values of $u_{cc}$ and $Q'_c$ ($= R'_c$) for a set of partitions. The network has 38 communities with sizes ranging from 11 to 77 nodes. We use our partitions generator to yield a family of $P_q$ with $q \le 50$. For each partition, we compute the set of persistence probabilities $u_{cc}$ and the set of the fractions of time spent in the community $Q'_c$. The results are shown in the first and second panel of Fig. 9: as long as the considered $P_q$ does not break any of the built-in clusters, all the $u_{cc}$ values remain large and concentrated in a rather narrow range, regardless of the cluster size. Then, some of them abruptly decrease as soon as clusters are broken. The $Q'_c$ values, on the contrary, are quite widespread (in a range from 1% to 6-7%) and vary in a rather smooth manner, since a smooth reduction of cluster sizes yields a corresponding smooth reduction of their internal weight $W_c$. It seems therefore that the fraction of time spent walking is not as indicative of the quality of a cluster as the persistence probability.
The scenario does not change if the relative fraction of time $Q_c = Q'_c - Q''_c$ (or local modularity) is considered, i.e., if the comparison with the performance of a null model is accounted for, as appears from the third panel of Fig. 9. The obvious consequence is a very small sensitivity of the modularity $Q = \sum_c Q_c$, at least within this set of partitions. As shown in the bottom panel of Fig. 9, $Q$ has almost the same value in $20 \le q \le 40$, which makes the reliability of choosing the max-modularity partition ($P_{34}$, in this case) questionable. Further discussion on the use of absolute vs. relative (i.e., compared with a null model) cluster measures is in the Supporting Information S1.
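The different sensitivity to cluster size of the two indicators can be reproduced on a very small undirected example. The sketch below is only illustrative: the helper names `cluster_measures` and `chained_cliques` and the three-clique toy network are hypothetical, self-loops are assumed absent, and the formulas are the undirected expressions $u_{cc} = 2W_c/S_c$, $Q'_c = W_c/S$ and $Q''_c = S_c^2/(4S^2)$ quoted in the text.

```python
import numpy as np


def cluster_measures(W, labels):
    """Return a list of (c, u_cc, Q'_c, Q_c) for an undirected network."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    S = W.sum() / 2.0                      # total weight, edges counted once
    rows = []
    for c in np.unique(labels):
        inside = labels == c
        W_c = W[np.ix_(inside, inside)].sum() / 2.0   # internal weight
        S_c = W[inside].sum()                          # total cluster strength
        u_cc = 2.0 * W_c / S_c                         # persistence probability
        q_time = W_c / S                               # fraction of time Q'_c
        q_null = S_c ** 2 / (4.0 * S ** 2)             # null-model term Q''_c
        rows.append((int(c), u_cc, q_time, q_time - q_null))
    return rows


def chained_cliques(sizes):
    """Cliques of the given sizes, consecutive cliques joined by one edge."""
    n = sum(sizes)
    W = np.zeros((n, n))
    labels = np.zeros(n, dtype=int)
    start = 0
    for c, size in enumerate(sizes):
        block = slice(start, start + size)
        W[block, block] = 1.0
        labels[block] = c
        if start > 0:                      # bridge to the previous clique
            W[start - 1, start] = W[start, start - 1] = 1.0
        start += size
    np.fill_diagonal(W, 0.0)               # no self-loops
    return W, labels


W, labels = chained_cliques([4, 8, 16])
rows = cluster_measures(W, labels)
u_values = [r[1] for r in rows]
time_fractions = [r[2] for r in rows]
```

In this toy case all three cliques are equally well separated, and their persistence probabilities stay within a narrow band close to one, whereas the fractions of time differ by more than an order of magnitude simply because the cliques differ in size.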
The proposed approach also has important connections with two recently published community analysis methods. Delvenne et al. [42] show that the autocorrelation function of a signal emitted by a random walker, with value $c$ as long as the walker is in a node $i \in C_c$, can be expressed in terms of the clustered autocovariance matrix $H$, and they define the stability of the partition $H$ as $r_t^H = \min_{s=0,1,\ldots,t} \mathrm{trace}(R_s)$. Given a set of candidate partitions, the graph stability function $r_t = \max_H r_t^H$ highlights, for each time instant $t$, which is the “optimal” partition. It is suggested in [42] that the most relevant partitions are those which are optimal over long time windows. It is straightforward to check that our matrix $U$ is related to the step-1 autocovariance $R_1$ by $R_1 + \Pi'\Pi = \mathrm{diag}(\Pi)\,U$. The two methods thus rest on the same ground, but our approach has two advantages. First, for each partition $H$ we do not have to compute a long time-dependent sequence such as $R_1, R_2, \ldots, R_{t_{\max}}$ (with $t_{\max}$ of the same order as $N$) of $q \times q$ matrices, but the sole matrix $U$, with an important reduction in the computational burden. Second, the full list of the persistence probabilities $u_{cc}$ allows one to test the quality of each single community, whereas the stability of the clustering $r_t^H$ averages among all the communities.
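The relation between $U$ and the step-1 autocovariance can be checked numerically on a toy network. The sketch below assumes that the clustered autocovariance of [42] takes the form $R_s = H'(\mathrm{diag}(\pi)P^s - \pi'\pi)H$, with $H$ the node-to-community indicator matrix, $\pi$ the stationary distribution (a row vector), $P$ the transition matrix and $\Pi = \pi H$; these definitions, the function names and the six-node example are assumptions made for illustration.

```python
import numpy as np

# Toy undirected network: two triangles joined by the edge 2-3.
W = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[a, b] = W[b, a] = 1.0

strength = W.sum(axis=1)
P = W / strength[:, None]                 # transition matrix of the walker
pi = strength / strength.sum()            # stationary distribution (undirected)

H = np.zeros((6, 2))                      # indicator matrix of the partition
H[:3, 0] = 1.0
H[3:, 1] = 1.0

Pi = pi @ H                               # stationary distribution, lumped chain
U = (H.T @ np.diag(pi) @ P @ H) / Pi[:, None]


def clustered_autocovariance(steps):
    """R_s = H' (diag(pi) P^s - pi' pi) H (assumed form of [42])."""
    P_s = np.linalg.matrix_power(P, steps)
    return H.T @ (np.diag(pi) @ P_s - np.outer(pi, pi)) @ H


def stability(t):
    """Minimum of trace(R_s) over s = 0, 1, ..., t."""
    return min(np.trace(clustered_autocovariance(s)) for s in range(t + 1))


R1 = clustered_autocovariance(1)
lhs = R1 + np.outer(Pi, Pi)               # R_1 + Pi' Pi
rhs = np.diag(Pi) @ U                     # diag(Pi) U
```

Under these assumptions the two sides coincide, and the trace of $R_1$ equals the sum over the communities of $\Pi_c u_{cc} - \Pi_c^2$, which makes explicit why the stability aggregates the communities while the persistence probabilities keep them separate.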
Finally, a work with straightforward connections to ours is that of Weinan et al. [52], who suggest that the best $q$-community partition is that corresponding to the “best” (in a suitable technical sense) $q$-state approximated lumped Markov chain. This boils down to the formulation of a minimization problem, after a metric on the space of stochastic matrices is introduced. A drawback of this approach, however, is that $q$ must be specified a priori, whereas identifying the correct number of communities is often the main goal of the analysis. For the same reason, it can hardly support the discussion of the significance and convenience of choosing one partition instead of another.
Figure 9. A comparison between persistence probabilities and fraction of time spent as indicators of the quality of a community. The test considers an LFR benchmark network with $\beta = 1$, $\mu = 0.25$ (see the text for the other parameters). For each candidate partition $P_q$, the four panels show, from top to bottom: the persistence probabilities $u_{cc}$; the fractions of time $Q'_c$ spent in each community by a random walker; the difference of such fractions with those obtained in a null network model (local modularity $Q_c$); the modularity $Q$ of the partition. Only the set of persistence probabilities shows a definite structural change in correspondence of the correct number of communities ($q = 38$). doi:10.1371/journal.pone.0027028.g009

Concluding remarks
In this paper, we have shown that associating a lumped Markov chain to a given network partition (i.e., a set of communities) provides an effective tool for testing the significance of each single community and, consequently, of the entire partition. As a matter of fact, the diagonal terms (called persistence probabilities) of the lumped Markov matrix can be used as quality measures for each individual community. If a threshold level $0 < \alpha < 1$ is fixed, a sharp criterion for defining a community as “meaningful” is therefore that of requiring that its persistence probability is not less than $\alpha$.
If an effective method for generating a set of “good” partitions is available, the above criterion can be used to rapidly select one of them among those complying with the prescribed $\alpha$-quality, typically the one with the finest network decomposition (i.e., the largest number of communities). We have used a generator of partitions based on hierarchical cluster analysis, where the node distance is again defined on the basis of a Markov chain random walk model. Overall, the method has fair computational requirements, and can be applied to fully general networks (i.e., directed and weighted). Its effectiveness has been demonstrated on several medium-scale examples (see also the Supporting Information S1 for further case studies).
As already pointed out, the tool of persistence probabilities can be used to assess the quality of partitions, or single clusters, obtained with any method (e.g., modularity optimization) or defined a priori (e.g., geographical areas in the world trade network). Along this line, two possible extensions appear to be promising. On one side, several methods have recently been proposed to identify overlapping communities, i.e., clusters with shared nodes [53,54]. In principle, a lumped Markov chain can be associated to a cover as well (i.e., a clusterization with possible overlaps), although this requires a careful treatment of the shared nodes. Another extension concerns time-varying networks, namely networks whose edges (or their weights) vary in time (many examples can be found in social or economic networks). Once a community structure has been identified at a given time instant (i.e., on a “frozen” network), one may be interested in tracking the time evolution of the persistence probabilities, to reveal which communities remain significant in time or, on the contrary, which ones have a decaying cohesion [22,55]. These extensions will be the subject of future research.

Supporting Information
Supporting Information S1
Figure S1.1. The persistence probabilities' diagram of the Erdős-Rényi network.
Figure S1.2. Zachary's karate club network. Above: The dendrogram obtained with $T = 2$. Below: The persistence probabilities' diagram.
Figure S1.3. The persistence probabilities' diagrams of two LFR directed, weighted benchmark networks. Top: $\mu_t = \mu_w = 0.3$ (the number of planted communities is 35). Bottom: $\mu_t = \mu_w = 0.6$ (42 planted communities). See the text for the other parameters.
Figure S1.4. LinkRank benchmark network.
Figure S1.5. The persistence probabilities' diagram of the LinkRank benchmark network.
Figure S1.6. The persistence probabilities' diagram of the neural network.
Figure S1.7. Absolute and relative persistence probabilities' diagrams of a LFR benchmark network. The relative persistence probability $r_{cc} = u_{cc} - E u_{cc}$ compares the absolute one $u_{cc}$ with the persistence probability $E u_{cc}$ of the same cluster in a null model.
(PDF)

Author Contributions
Conceived and designed the experiments: CP. Performed the experiments: CP. Analyzed the data: CP. Contributed reagents/materials/analysis tools: CP. Wrote the paper: CP.