Multi-Objective Community Detection Based on Memetic Algorithm

Community detection has drawn a lot of attention as it can provide invaluable help in understanding the function and visualizing the structure of networks. Since single objective optimization methods have intrinsic drawbacks to identifying multiple significant community structures, some methods formulate the community detection as multi-objective problems and adopt population-based evolutionary algorithms to obtain multiple community structures. Evolutionary algorithms have strong global search ability, but have difficulty in locating local optima efficiently. In this study, in order to identify multiple significant community structures more effectively, a multi-objective memetic algorithm for community detection is proposed by combining multi-objective evolutionary algorithm with a local search procedure. The local search procedure is designed by addressing three issues. Firstly, nondominated solutions generated by evolutionary operations and solutions in dominant population are set as initial individuals for local search procedure. Then, a new direction vector named as pseudonormal vector is proposed to integrate two objective functions together to form a fitness function. Finally, a network specific local search strategy based on label propagation rule is expanded to search the local optimal solutions efficiently. The extensive experiments on both artificial and real-world networks evaluate the proposed method from three aspects. Firstly, experiments on influence of local search procedure demonstrate that the local search procedure can speed up the convergence to better partitions and make the algorithm more stable. Secondly, comparisons with a set of classic community detection methods illustrate the proposed method can find single partitions effectively. Finally, the method is applied to identify hierarchical structures of networks which are beneficial for analyzing networks in multi-resolution levels.


Introduction
Many real-world systems which consist of objects with relationships among them can be efficaciously represented as complex networks [1]. Community structure is one of the most important properties of diverse networks. Generally speaking, a community can be described as a group of nodes that are densely intra-connected, while only sparsely linked with the rest of the network [2]. Community structures in complex networks play important roles in the structurefunction relationship, as they can provide invaluable help in understanding the functions and visualizing the structures of networks [3,4].
Most community detection methods can be roughly classified into heuristic based and optimization based methods. Heuristic based methods derive network partitions by executing some heuristic rules which are usually based on intuitive observations rather than explicitly optimizing global objective functions [5]. Such kind of methods usually lacks of accurate description of the properties of global community structures. Optimization based methods formulate community detection as a combinatorial optimization problem and detect the community structure by optimizing a predefined evaluation criterion which describes a certain property of community, such as modularity [6], normalized cut [7] and the map equation [8], etc. However, single objective optimization methods have two main intrinsic disadvantages, i.e., they may lead to bias on the obtained community partition, and they may not be able to detect multiple potential structures [9]. To overcome above drawbacks, community detection problems have also been formulated as Multi-objective Optimization Problems (MOPs). Multi-objective community detection methods describe multiple structure properties of networks by optimizing multiple conflicting criteria and obtain multiple network partitions which correspond to different tradeoffs among these criteria [10,11].
Traditional community detection methods dealing with single objective function and single community partition can be hardly adapted to multi-objective community detection problems, so Evolutionary Algorithms (EAs) which can handle a population of partitions in a single run have been adopted [9,12,13]. EAs have excellent global search abilities of exploring the entire network partition space and identifying areas with potential high quality partitions. However, they have difficulty in locating the local best partitions around the potential high quality space in a short time. To address such drawbacks, Memetic Algorithms (MAs) which combine EAs with local search procedure have been proposed to deal with single objective community detection problems so far [14][15][16][17]. The integrated local search procedure can search the promising partition space carefully and accelerate the method to find the local optimal partitions. The hybrid search property of MAs should have advantages to deal with multi-objective community detection problems too. Since conventional local search methods only optimize single fitness function and deal with one partition at a time, three problems need to be addressed to adapt them to multi-objective situations, i.e., determining initial partitions, defining appropriate fitness function and designing effective local search strategy.
In this paper, a multi-objective memetic community detection algorithm to identify multiple community structures is presented. The proposed algorithm is termed as MMCD for short. The MMCD adopts a multi-objective immune algorithm as global search mechanism. We mainly focus our effort on developing an effective local search procedure by addressing three problems. Firstly, nondominated solutions generated by evolutionary operations and solutions in dominant population are selected as initial partitions for local search procedure. Then, a new direction vector named as pseudonormal vector is proposed to integrate two objective functions together to form a fitness function. Finally, a network specific local search strategy is expanded to search the local optimal solutions efficiently. To evaluate the effect of local search procedure on MMCD and illustrate the important applications of MMCD, experiments on artificial datasets and real-world networks are carried out from three perspectives, i.e. parameter settings and effects of local search procedure, comparison with a variety of community detection methods, and ability to find multi-resolution structures of networks.
The remainder of this paper is organized as follows. Section 2 presents works related to our work. Section 3 describes the formulation of multi-objective optimization community detection problem. The method is described in section 4. Section 5 presents the experimental results. Finally, Section 6 gives the conclusions.

Related Works
Some multi-objective optimization community detection methods have been proposed based on EAs which are inspired by biological evolution. The Multi-objective Genetic Algorithm for Networks (MOGA-Net) [12] optimizes community score and community fitness simultaneously. It adopts the Nondominated Sorting Genetic Algorithm-II (NSGA-II) [18] as optimization mechanism. The Multi-Objective Community Detection algorithm (MOCD) [13] selects two terms of modularity as objective functions. Since two terms of modularity describe two opposite properties of a community structure and are appropriate for detecting multiple structures, we also adopt them as conflicting objective functions in this paper. However, the optimization algorithm proposed in this paper is rather different from MOCD. MOCD adopts the Pareto Envelope-based Selection Algorithm version 2 (PESA-II) [19] as optimization mechanism and does not integrate a local search procedure, while our method integrates a local search procedure into another competitive multi-objective EA, i.e. Nondominated Neighbor Immune Algorithm (NNIA) [20]. The NNIA is also adopted by Multi-objective Immune algorithm for multi-resolution Community Detection (MICD) [9]. However, MICD adopts two terms of modularity density as objective functions and also does not integrate a local search procedure. What's more, the individual representation scheme and the evolution operators used by MMCD are different from those used by MOCD and MICD.
One of the biggest differences between MMCD and other multi-objective community detection algorithms is that it integrates a local search procedure into the multi-objective EA. Conventional EAs which search the community partition space based on random evolution without much restriction have strong global search ability [9,12,21,22]. However, they have difficulty in locating the local optimal solutions around the promising search space in a short time. On the other hand, local search community detection methods are very good at obtaining the local best partitions [23][24][25], while the salient drawback is that escaping from the local optima to achieve a better solution is not easy for them. Based on advantages and drawbacks mentioned above, it is profitable to combine EAs and local search methods together to formulate the Memetic Algorithms for community detection problems. For example, the Meme-Net community detection algorithm combines GAs with a hill-climbing local search strategy to optimize the modularity density [14]. The community detection method based on Modularity and an Improved Genetic Algorithm (MIGA) takes modularity as objective function and adopts simulated annealing method as local search method [15]. The Memetic algorithm with multi-level Learning strategies (MLCD) also optimizes modularity and proposes multi-level learning methods to accelerate the convergence of genetic algorithm [16]. Above MAs for community detection mainly optimize single objective functions. It is valuable to extend MAs to handle multi-objective community detection problems due to the advantages of multiobjective formulations.
where O = (X 1 , X 2 ,. . ., X N ) is the set of all feasible partitions of a network, and f 1 , f 2 ,. . ., f t denote different objective functions which describe the properties of the partition obtained. To define the set of solutions which are found by the use of Pareto optimality theory [26], the definition of dominance relation is given below. Given any two feasible solutions X 1 and X 2 , solution X 1 is said to dominate solution X 2 , denoted as X 1 1 X 2 , iff If there does not exist any other solution X that dominates X Ã , X Ã is called nondominated (i.e. dominant) or Pareto-optimal solution. Pareto-optimal solutions do not dominate each other, as an improvement in one objective function will result in a degradation of another. The set of Pareto-optimal solutions is called the Pareto-optimal set which is formulated as The functions vector F maps the solution space into the objective function space, and the corresponding image of the Pareto-optimal set in the objective function space is called the Paretooptimal front which is formulated as The goal of the multi-objective community detection methods is to find a Pareto-optimal set or an approximated one. Since communities are usually described from two aspects that they are usually densely intra-connected and sparsely inter-connected, choosing two objective functions to reflect such two properties is reasonable. In this paper, two parts of modularity are selected as two objective functions because they can reflect such two properties to some extent.
Modularity (denoted as Q) proposed by Girvan and Newman [6] can be written as where X is one possible partition of the network, C is a community in partition X, l C is the number of edges in the community C which is defined as l C = (∑ i, j 2 C A ij )/2, k C is the total degree of nodes in the C which is defined as k C = ∑ i 2 C k i , k i is the degree of node i, and m is the total number of edges in the network. The value of Q ranges from -1 to 1 and the larger value corresponds to better network partition. From the Eq (1), modularity can be regarded as a fixed tradeoff between two terms, i.e. P C2X l C m and À To maximize the modularity, both terms should be maximized as possible as they can. Maximizing the first term means as many as possible edges should be included in communities, which will lead to larger communities. While maximizing the second term requires the total degree of nodes in communities should be as small as possible, which will lead to smaller communities. These two complementary terms naturally conflict with each other to some extent and reflect the two aspects of a good partition, i.e., densely intra-connected and sparsely interconnected. Thus we select them as two separate objective functions for our algorithm. The first objective function is The value of this term ranges from 0 to 1. Since the range of the second term of modularity is from -1 to 0, we add a constant 1 to regularize it which will not affect the partition results of the network, i.e., Because such two objective functions have opposite propensity to the size and the number of communities, the maximization of them can find community structures at different resolution levels. The MMCD detects community structures by maximizing such two objective functions which is formulated as where X is a possible partition in network partition space O. Since MMCD integrates a local search procedure into a EA which is proven to solve the above optimization problem successfully, it can solve the above optimization problem by designing an appropriate local search procedure.

Methods
The MMCD optimizes the formulation Eq (4) by both evolutionary global search and local search in each generation. As our main focus is on local search procedure, we adopt modified framework of NNIA [20] as our multi-objective global search mechanism. In fact, our local search procedure can also be integrated into some other multi-objective optimization evolutionary algorithms, such as NSGA-II and PESA-II. Some related terms are stated as follows. Feasible solutions of problems are also called individuals or partitions. In each generation of the algorithm, six populations of individuals evolve in turn. They are dominant population D which is the set of nondominated individuals, active population AP which is the set of individuals selected from dominant population with larger crowding distance, clone population CL which stores clones of active individuals, evolutionary search offspring population EO which contains the result of evolutionary search, local search offspring population LO which contains the result of local search, and combined population B which is responsible for elitism. The main procedure of MMCD is given as Algorithm 1 in Table 1.
Initialization constructs the initial solution population of the algorithm. Nondominated Neighbor-Based Selection and Proportional Cloning are used to keep nondominated individuals more diverse based on crowding distance [18] to prevent individuals from gathering in some local regions and avoid premature convergence to local optimal solutions. Crossover and Mutation are evolutionary global search operations and Local Search Procedure is local search operation on population. Model Selection select partitions from final approximated Pareto-optimal set. The specific criteria to select solutions at each resolution scale should depend on specific applications. As Initialization are the basis of the algorithm, and evolutionary search operations (Crossover and Mutation) and Local Search Procedure are two search operations responsible for population evolution, we will discuss implementation details of them in the followings.

Representation and Initialization
Locus-based adjacency representation [27] and group based representation [28] are two common used representation strategies for community partitions in evolutionary algorithms. In this study, we use the latter one as it is more straightforward and can be conveniently handled in local search procedure. Each individual X p in the population is encoded as . . . ; x n p g, where n is the number of nodes in the network and x i p denotes the community identifier (also named as community label) of node i in partition X p . As at most there are n communities in the network with n nodes, the value of x i p can be chosen from any integer in the range of 1 to n. In the decoding step, the nodes with the same community label are grouped into one community. If there are k different community labels in an individual at the end of algorithm, the community structure corresponding to this individual will automatically have k communities. One prominent property of this algorithm is that the number of community k is unnecessary to set in advance. It is worth noting that there are multiple representations corresponding to one community structure. An illustration of this representation strategy for a toy network is given in Fig 1. The network with 7 nodes is partitioned into two communities. Two possible representations corresponding to the community structure are X 1 = {1,1,1,2,2,2,2} and X 2 = {3,3,3,4,4,4,4}.  3: Nondominated Neighbor-Based Selection: If the size of D g is not larger than SA, let AP g = D g . Otherwise, sort individuals in D g in descending order of crowding-distance, and select the first SA individuals to form AP g . Meanwhile, the original dominant population D g is set as external population to be applied for elitism; 4: Proportional Cloning: Apply proportional cloning to AP g to obtain the clone population CL g ; 5: Crossover and Mutation: Perform crossover and mutation operations on CL g to produce evolutionary search offspring population EO g ; 6: Local Search Procedure: Perform local search operation with parameter MI on external dominant population D g and EO g to obtain two local search offspring populations LO 1 g and LO 2 g , respectively; 7: Combination: Combine LO 1 g and LO 2 g together to form B g+1 . Set g = g+1; 8: Termination: If g < Gmax, return to step 2. Otherwise, go to step 9;  The population initialization procedure is given as Algorithm 2 in Table 2. Initially, each node is put in a different community for all individuals in the initial population. Assume each individual has the same community label assignment (i.e. X p = {1,2,. . ., n},8p = 1,2,. . ., SI). Then a simple heuristic process is employed which is similar to that in [28]. For each individual, randomly select a portion of nodes (e.g., α Á n nodes, α is a parameter and set as α = 0.3 for experiments in this paper) and assign their community labels to all of their neighbors, respectively. This heuristic process enhances the diversity and provides the better quality of the initial solutions, and has a low computational complexity.

Crossover and Mutation
Crossover is one of the evolutionary search operations in NNIA [20]. According to NNIA, the crossover operation O R on the clone population CL can be defined as where crossover(c i , AP) represents crossover operator on clone c i and an active individual randomly selected from population AP. In order to maintain population size, crossover(c i , AP) needs to return one offspring to replace c i . Two-point crossover [13] and uniform crossover [21] are two commonly used crossover operations, but they are not appropriate here because of a property of group based representation strategy. In this representation strategy, the same community label in different individuals may represents different communities, so community labels can't be simply exchanged between different individuals. In this study, we employ oneway crossover operation [28]. The crossover procedure is given as Algorithm 3 in Table 3. Assuming X p and X q are two crossover operation parents. Randomly select one of them as source individual X s and the other one as destination individual X d . Then, randomly choose a node i whose community labels in X s is x i s . Identify the set of nodes with the same community label as x i s in X s and replace the labels of this set of nodes in This procedure is illustrated in Fig 2. Assuming the source individual is X s = {1,1,1,4,5,5,7} and the destination individual is X d = {2,3,4,5,5,5,6}. Node 2 is randomly chosen and its community label in X s is 1. The set of nodes {1,2,3} has the same community label as node 2 in X s . Replacing labels of all nodes in this set with label 1 in X d to produce the offspring. Mutation is the other evolutionary search operation. A neighbor-based mutation operation which reduces the useless search space by taking into account the effective connections among nodes is used [12]. In this operation, the reassigned labels of mutated nodes are restricted to those of their neighbors. For each node i of each individual X p generated by crossover operation, generating a random value r 2 [0, 1]. If r is smaller than the mutation probability p m (p m is set as 0.01 for MMCD in this paper), the community label of this node is replaced with one of its neighbors' label, i.e. x i p x j p ; 9 j 2 fjjA ij ¼ 1g.

Local Search Procedure
Different from memetic algorithms for single objective community detection problems, the cases of multi-objective face some new problems. We summarize these problems into three issues, i.e., determining initial individuals, defining fitness function and designing local search strategy. Initial individuals. In each generation, local search procedure is applied to "good" individuals in two populations to obtain better individuals. One is evolutionary search offspring population EO generated after mutation operation and the other one is external dominant population D used for the purpose of elitism. Since non-dominated individuals are "good" individuals in multi-objective cases, initial individuals for local search procedure are nondominated solutions in population EO and solutions in population D. Table 3. Algorithm 3. One-way crossover procedure.
Input: Two individuals X p and X q .
Output: Offspring of crossover operation X o .
1: Generate a random value r 2 [0, 1]; 2: Randomly choose a node v i ; 3: if r < 0.5 then 4: Identify the set of nodes p i with the same community label as 7: else 8: Identify the set of nodes q i with the same community label as Fitness function. There are two objective functions in MMCD, while local search procedure needs a single fitness function to evaluate the neighbors of an initial individual to decide which neighbor can replace the initial one as a better solution. Dominance relation which is used in multi-objective EAs can be simply used as a fitness evaluation method by which an initial individual can be replaced with one that dominates it or the one that is not dominated by it. As is illustrated in Fig 3(a), the movable area of the former fitness evaluation rule is too small to find satisfied neighbors, while the area is too huge for the latter one which allows to select individuals in dominated regions after several searches. Besides dominance relation, scalarizing function is widely used as the fitness function for local search procedure in MOPs [29,30]. Scalarizing fitness function based on two objective funtions can be expressed as where f 1 (X) = IntraQ, f 2 (X) = InterQ for our method, and ω 1 and ω 2 are nonnegative weights, which satisfy the following constraints Region I-IV are four regions divided with respect to node A. Individuals in region I dominate A and in region I, II,IV are not dominated by A. Region I is too small to search, while individuals in region II and IV may move to Region III which are dominated by A after several generations; (b) When constant weight vector ω = (0.5,0.5) is applied, individual population will suffer from diversity problem after several searches; (c) Assuming weight vector ω = (0.5,0.5) selects X 1 as initial individual according to random weight vector scheme, then the probability to select right side individual is much higher then select left side as there is only one individual on the right side of X 1 ; (d) Pseudoweight vector ω P deviates from normal line vector ω N . Determining the appropriate weights is crucial for effective and efficient local search procedure. As is illustrated in Fig 3(b), if a constant weight vector ω = (ω 1 , ω 2 ) is applied, all initial individuals will perform local search in one fixed direction which will result in serious diversity problem for evolutionary search in the next generation. To maintain population diversity, the weight vector (i.e. local search direction in objective space) should depend on the location of each initial individual. A random weight vector scheme is proposed in [29] which first randomly specifies a weight vector and then select a solution with maximal fitness function value in that direction as initial individual for this weight vector. However, as is illustrated in Fig 3(c), this scheme has a bias selection problem that individuals in a small region may be selected with much higher probability, when the value range distributions of initial individuals in two objective functions vary a lot. What's more, some individuals may be selected more than one time while some others may not be selected any more. Pseudoweight vector [31] is another widely used weight vector for scalarizing fitness function and it is defined as where M is the number of objective functions which is 2 in MMCD algorithm. Pseudoweight vector roughly denotes the priorities of different objective functions at each individual. As is illustrated in Fig 3(d), sometimes pseudoweight vector may obviously deviate from the normal line of nondominated front. Nondominated front here is defined as the images of all nondominated solutions in objective space and normal line is the line perpendicular to the smooth line in nondominated front. In fact, the nondominated front in objective space can be regarded as a generalized isoline, because points in this front are not better or worse than each other. Since the gradient direction of an isoline is its normal line direction, the ideal direction of local search should also be the direction of normal line of the nondominated front.
As it is known to all, normal line is perpendicular to tangent line in two-dimensional space. However, the smooth line in nondominated front is not an inerratic curve and its formula is usually unknown, so the tangent line can't be derived easily. Thus, we propose the pseudonormal vector which approximates the normal line vector for our two objective optimization problem. In fact, tangent line derives from secant line and it is the limiting case of the secant line, so we can apply secant line to approximate tangent line and set pseudonormal line perpendicular to such approximated tangent line. As is illustrated in Fig 4, for an arbitrary individual X which is not one of the boundary individuals in the nondominated front, the slope of tangent line at the location of X is similar to the slope of secant line connecting two individuals X 1 and X 2 that are closest to X on either side. The "closest to X on either side" here means closest to X on either side along either objective function. Actually, two individuals in nondominated front closest to X on either side along one objective function are also closest to X on either side along the other one. To prove this property, we first give another property of nondominated front in two-dimensional objective space.
Property 1 (Reversed order property) In two-dimensional objective space, the ascending order of individuals in nondominated front along one objective function value is the descending order along the other objective function value.
Proof. Individuals in nondominated front do not dominate each other, so one individual greater than another one along one objective function value must be smaller than that one along the other objective function value in nondominated front of two-dimensional objective space. The property is proved by iterating such relation to all pairs of individuals in nondominated front.

Property 2 (Closest relation property)
In two-dimensional objective space, for an arbitrary individual X which is not a boundary individual in nondominated front, two individuals in nondominated front closest to X on either side along one objective function are also closest to X on either side along the other one.
Proof. Assuming X 1 and X 2 are two individuals closest to X along objective function f 1 and the relation of inequality is f 1 (X 1 ) < f 1 (X) < f 1 (X 2 ). According to Property 1, we have f 2 (X 1 ) > f 2 (X) > f 2 (X 2 ). If two other individuals X 3 and X 4 are closest to X along objective function f 2 and the relation of inequality is f 2 (X 3 ) > f 2 (X) > f 2 (X 4 ), then we have the relation of inequality that f 2 (X 1 ) > f 2 (X 3 ) > f 2 (X) > f 2 (X 4 ) > f 2 (X 2 ). However, according to Property 1, it is easy to obtain that f 1 (X 1 ) < f 1 (X 3 ) < f 1 (X) < f 1 (X 4 ) < f 1 (X 2 ) which is contradictory to the assumption that X 1 and X 2 are two individuals closest to X along objective function f 1 . Thus, the property is proved.
The secant line connecting two individuals X 1 and X 2 is Then the pseudonormal vector of X is set perpendicular to such secant and defined as where π is the normalization factor, i.e. π = f 2 (X 1 ) − f 2 (X 2 )+f 1 (X 2 ) − f 1 (X 1 ). For boundary individuals (i.e. X 2 and X 3 in Fig 4) who are in the boundary of nondominated front, only one side of them has closest individuals, so their pseudonormal vectors cannot be calculated by formula Eq (5). In order to maintain diversity and obtain solutions which are optimal on either objective function, the pseudonormal vectors of boundary individuals are set parallel to coordinate axes, i.e. ω PN (X 2 ) = (1,0) and ω PN (X 3 ) = (0,1) in Fig 4. Note that the local search direction specified by the weight vector ω in the objective space is a totally different concept from that in the decision space. The search direction here is used to judge the quality of individuals and guide the local search rather than the exact direction which individuals move along. Local search strategy. By using the pseudonormal vector ω PN , the scalarizing fitness function for local search procedure in MMCD algorithm is By substituting IntraQ and InterQ with formula Eq (2) and Eq (3), respectively, the fitness function can be written as where the meanings of parameters are the same as those in formula Eq (2) and Eq (3) and S(C) is the contribution of community C to fitness function S(X). For a conventional local search procedure, the neighborhood of an initial individual is first defined. Then the fitness values of all neighbors are calculated and the neighbor with maximal fitness value is selected to replace the current individual. However, calculating the fitness value from scratch for all neighbors is one of the most time consuming operations in the algorithm. So inspired by LPAm [24] which local optimizes modularity, we apply an efficient local search strategy here which only calculates the increments of fitness value based on network structure information. Firstly, we define the neighborhood of a network partition (i.e. individual or solution). Intuitively, a neighbor partition should be close to or similar to itself. Some definitions are given as follows.
Definition 1 (Local neighborhood) For a network partition X p , the local neighborhood N i (X p ) respect to a node v i is defined as the set of partitions formed by moving node v i to one of its adjacent communities, i.e., where x AC i p is the community label of community AC i in partition X p , AC i is an adjacent community of v i which is defined as where C j is the community which node v j belongs to and N(v i ) is the set of neighbors of node v i . N(v i ) includes v i itself here, thus community of node itself is also regarded as its adjacent community and it is easy to derive that X p 2 N i (X p ).
Definition 2 (Neighborhood) For a network partition X p , the neighborhood N(X p ) is defined as the union of local neighborhood respect to all nodes, i.e., Next, fitness value increments of partitions in local neighborhood N i (X p ) relative to partition X p are calculated based on the network structure information. The node v i is set as an isolated community by regarding community of itself as its adjacent community. The fitness increment of a local neighbor corresponding to the adjacent community AC i in N i (X p ) relative to X p is calculated as where AC i +v i denotes the community combining AC i with v i and {v i } denotes the isolated community which only contains v i . After calculating all increments corresponding to all adjacent communities AC i of v i , the one with maximal increment is chosen as v i 's new community. If the new community is the same as the original one of v i , it means v i is already in the local best community in partition X p and X p is a local maximal solution in local neighborhood N i (X p ). In order to show how to calculate ΔS i (X p ), the formula Eq (8) is expanded as where l AC i +vi and l AC i are the number of edges in the community AC i +v i and AC i , respectively. l AC i , vi is the number of edges between node v i and the nodes in AC i . k AC i +vi and k AC i are the total degree of nodes in the AC i +v i and AC i respectively and k i is the degree of node v i . As mentioned above, the community labels of node v i and community AC i in X p are x i p and x AC i p , respectively. The formula Eq (9) can be rewritten in the form of adjacency matrix and community labels as where δ is the Kronecker delta function. Finally the new community label x i 0 p of node i should be updated to x AC i p which maximizes the formula (10), i.e.
This fitness function specialized label update rule can be efficiently implemented by an algorithm similar to label propagation algorithm [23]. The local search procedure is illustrated in Algorithm 4 in Table 4. The local optimum around X p in its neighborhood is obtained by getting local optimism in its local neighborhoods respect to all different nodes in a random sequence. The local search procedure will be terminated when a predefined maximum iterations MI is reached or the community label of each node is unchanged.

Analysis of Computational Complexity
In this section, the computational complexity of the proposed algorithm MMCD is analyzed. Given a network with n nodes and m edges and assuming that the maximum size of the dominant population and active population is SD and SA, respectively, the size of clone population is SC, the time complexity of one generation for the algorithm can be calculated as follows.  [24], where MI is the maximum iterations for label propagation. Thus, the total time complexity for local search procedure in one generation is at most O(SC Á m+SC 2 +SC Á MI Á m). Based on the analysis above, according to the operational rules of the symbol O, the overall time complexity of MMCD algorithm with g generations is O(g Á ((SD+SC) Á m+(SD+SC) 2 +SC Á MI Á m)). Maximum iterations MI is usually a very small constant, as 95% nodes or more usually can be partitioned correctly by only 5 iterations in LPA [23]. For large network, m will be much larger than SD+SC, so the time complexity can be further simplified to O(g Á (SD+SC) Á m).

Experimental Results
In this section, we will study the MMCD through experiments on artificial and real-world networks from three aspects. The first experiments will discuss the influence of parameters in MMCD and validate the advantages of local search procedure in MMCD. Then MMCD will be compared with a variety of classic community detection algorithms on artificial and real-world networks to illustrate the performance on finding community structures. Finally, we will apply the MMCD to detect community structures at different resolution levels on artificial and realworld networks. The MMCD algorithm has been written in MATLAB. Unless stated, all the non-deterministic algorithms have been independently run 10 times on each datasets. The experiments are carried out on a 2.80GHz and 3.00G RAM computer running Windows 7.
Evaluating the performance of a community detection algorithm needs to define a criterion to measure the quality of the obtained partition [14]. Modularity (Q) which is used as an objective function for a variety of community detection methods is a natural quality function whose higher value indicates stronger community structure. In fact, maximizing two objective functions of MMCD is also an indirect way to maximize the modularity. On the other hand, if some networks have ground-truth partitions or true partitions, the similarity between true partitions and partitions obtained by algorithms can be calculated to indicate the partition quality. Normalized Mutual Information (NMI) proposed by Danon et al. [32] is such a similarity measure. The value of Normalized Mutual Information NMI(P 1 , P 2 ) of two partitions P 1 and P 2 is between 0 and 1 with higher value indicating more similar to each other. NMI is independent with two objective functions of MMCD because it is based on true partitions of network. In this paper, higher values of both evaluation criteria are regarded as indications of better partitions.
Since multi-objective community detection algorithms can obtain multiple partitions in one run, we use modularity as model selection criterion when it is adopted as quality metric, and use NMI as model selection criterion when NMI is adopted as quality metric for all multi-objective community detection algorithms in this paper.

Influences of Parameters and Local Search Procedure
We will first discuss the influence of parameters of MMCD and validate the advantages of local search procedure in MMCD. In order to analyze methods in details, artificial benchmark network named GN benchmark proposed by Girvan and Newman [1] is adopted. The network consists of 128 nodes which are divided into four communities with 32 nodes each. Edges are randomly placed between nodes independently. Each node has an average degree of 16 and shares a fraction 1 − μ of edges with the nodes of its community, and a fraction μ of edges with the rest of the network. μ is called the mixing parameter. When μ < 0.5, the average number of neighbors of each node inside its community is larger than that of neighbors belonging to the rest communities, in which case the network has strong community structure. With the increase of μ, the community structure will become vaguer and harder to detect by algorithms. 13 different benchmark networks with values of μ ranging from 0.0 to 0.6 with step size 0.05 are generated as GN benchmark datasets in our experiments. Experiments about influences of parameters and local search procedure are carried out on GN artificial datasets.
In MMCD, the parameter Gmax is used to determine the maximum number of generations. The parameter SD indicates the maximum size of dominant population. To simplify the discussion, the maximum size of active population SA and size of clone population SC are set as SA = SD/5 and SC = SD for all experiments in this paper. The parameter MI denotes maximum iterations in local search procedure. Firstly, to test the influence of parameter Gmax on performance of MMCD, the other two parameters are set as SD = 110 and MI = 1. After running the algorithm 10 times with different Gmax, the average results are shown in Fig 5(a). The results at the range of μ 2 [0.45,0.6] are amplified inside the figure. Since the size of benchmark network is not large, the difference among the results is modest. However, it can be still observed that when mixing parameter μ is above 0.4, the results with larger Gmax are generally better than those with smaller Gmax except for a few cases such as Gmax = 8 at μ = 0.45 and at μ = 0.50. Since MMCD is a non-deterministic algorithm, these exceptions may be due to the nondeterminacy of the algorithm. The difference between results with larger Gmax gets smaller. This phenomenon agrees with the general convergence rule that when objective function values are closer to their best values, they will converge more slowly or even stop converging. What's more, we can see that our algorithm converges rather fast. In order to get the balance between performance quality and running time, the Gmax is set to 10 for MMCD on GN benchmark datasets.
Secondly, the influence of parameter SD is studied by letting it change from 30 to 150 with interval 20 and fixing Gmax = 10, MI = 1. The whole average results and the part amplified results at the range of μ 2 [0.5,0.6] are shown in Fig 5(b). It is shown that when μ 0.4, all of them detect the true community structure. With the increase of μ, algorithms with larger SD generally perform slightly better except for a few cases such as results of SD = 50 at μ = 0.5 and of SD = 90 at μ = 0.55. In fact, larger population has bigger probability to include the better The maximum iteration number of local search procedure has been proved to have little influence on performance of MMCD. It is natural to come up with the doubt whether the local search procedure itself has influence on results of MMCD. In order to validate the effect of local search procedure, we develop a multi-objective optimization version of MMCD by removing the local search procedure and name it as MOA. MMCD and MOA with different generation numbers ranging from 1 to 80 with interval 1 are carried out on GN benchmark at μ = 0.2 ten times. The average best results of NMI and Q are displayed in Fig 6(a) and 6(b), respectively. It is obviously shown that MMCD can find the true partition and obtain the largest value of NMI and Q in just two generations. On the other hand, the increases of NMI and Q of MOA are much slower and it can't find the true partition even after 80 generations. What's more, the obvious fluctuation of the results of MOA indicates it is less stable than MMCD. In a word, the convergence results, the convergence speed and the stability of MMCD are all superior to MOA which demonstrates the positive effect of the local search procedure in algorithm.

Comparison with Other Methods
In this section, performance of MMCD on finding single community structure is illustrated by comparing with other related methods. Three single objective optimization community detection methods named as CNM (A1), Louvain (A2), Infomap (A3), a single objective genetic algorithm GA-Net (A4), a single objective memetic algorithm Meme-Net (A5), two multiobjective genetic algorithms, MOGA-Net (A6) and MOCD (A7), and two simplified versions of MMCD named as MOA (A8) and LSA (A9) will be compared with MMCD (A10) on GN benchmark, LFR benchmark and a variety of real-world networks.
CNM [33] is a modularity optimization algorithm for community detection. Louvain [34] also optimize modularity but with a fast multistep greedy technique. Infomap [8] is based on the information theory to minimize a description length of information flow on network. All of three above methods are classic single objective optimization algorithms which are used to be compared with evolution based community detection algorithms.
GA-Net [21] is a representative single objective genetic algorithm which adopts a genetic algorithm to optimize a new objective function called community score. Meme-Net [14] is a single objective memetic algorithm which combines genetic algorithm with hill-climbing local search procedure to optimize the modularity. MOGA-Net [12] is a representative multi-objective evolutionary algorithm for community detection which optimizes two conflicting objectives, community score and community fitness. MOCD [13] is also a multi-objective evolutionary algorithm which optimizes two objectives similar to MMCD, i.e. two terms of modularity. All of four above methods are evolution based community detection algorithms from different perspectives.
As mentioned above, MOA is a simplified version of MMCD by removing the local search procedure from MMCD. It is used to validate the importance of local search procedure. On the other hand, LSA denotes a local search algorithm corresponding to local search part of MMCD. Since LSA can only optimize single objective function, we choose modularity as its objective function. In fact, LSA is the modularity-specialized LPA (LPAm) [24] which is used to test the importance of multi-objective scheme of MMCD.
Firstly, all algorithms are compared on GN benchmark dataset. The experimental parameters of algorithms on GN benchmark dataset are listed in Table 5. Fig 7(a) summarizes the best NMI values averaged over 10 runs for different algorithms on GN benchmark dataset. When the mixing parameter μ is no bigger than 0.05, all algorithms can detect the true community structure. However, with the increase of μ, the performance of GA-Net, MOGA-Net, MOCD and MOA gradually decline. When the μ is no smaller than 0.25 and no bigger than 0.4, only Infomap, Louvain and MMCD can detect the true community structure. As the μ further increases, the network becomes vaguer and all algorithms can't find the true partition. However, from the curves we can see that the MMCD still has the largest NMI values which indicates it is  Since the GN benchmark cannot reflect some important features of real-world networks, another benchmark named as LFR [35] is also adopted to test different algorithms. The LFR benchmark is similar to real-world networks by introducing heterogeneity into degree and community size distributions of a network which are governed by power laws with exponents τ 1 and τ 2 , respectively. Each node shares a fraction 1 − μ of edges with the nodes of its community and a fraction μ of edges with the rest of the network. μ is also called the mixing parameter. Network structure becomes fuzzier as μ's value gets larger. In our experiments, 17 networks with the mixing parameter increasing from 0 to 0.8 with an interval of 0.05 are generated. Each network contains 1000 nodes and the community size ranges from 10 to 50. The averaged degree for each node is 20 and the max node degree is 50. τ 1 = 2 and τ 2 = 1.
The experimental parameters of algorithms for LFR benchmark dataset are listed in Table 5. Fig 7(b) summarizes the best NMI values averaged over 10 runs for different algorithms on LFR benchmark dataset. Infomap can't detect the true community structure at μ = 0, but its NMI value achieves 1 after μ = 0 until μ = 0.55. MMCD and LSA detect the true community structure when the μ is no bigger than 0.2, and their NMI values are quite close to 1 until μ = 0.6. When μ surpasses 0.6, MMCD has the largest NMI value. However, two other multi-objective community detection methods, i.e. MOCD and MOA, have much poorer performance than MMCD when μ is bigger than 0.2. Since the main difference between them and MMCD is that they don't have local search procedure, this illustrates the advantages of local search procedure in MMCD again. Real-world networks may have different properties compared with artificial benchmark datasets, so all algorithms are further compared on a set of real-world networks. The description of each network is as follows.
Zachary's karate club network (N1) [36] describes the friendships between 34 members of a karate club. The network was divided into two groups after a dispute between the administrator and the instructor.
Journal index network (N2) [37] consists of 40 journals as nodes from 4 different fields, i.e. physics, chemistry, biology and ecology. Edges exist between journals if at least one article from one journal cited an article in the other one during 2004.
Dolphin social network (N3) [38] consists of 62 bottlenose dolphins living in Doubtful Sound, Zealand. The ties between dolphin pairs were established by observation of statistically significant frequent association. The network naturally splits into two large groups.
Lesmis network (N4) [39] is a weighted network of coappearances of characters in Victor Hugo's novel "Les Miserables". The network is regarded as unweighted one by set the weight of all edges as 1 in this paper.
Polbooks network (N5) consists 105 nodes representing books about US politics sold by the online book seller Amazon.com. Edges represent frequent co-purchasing of books by the same buyers. Books were divided by Newman [40] into three groups according to their political alignments, i.e. liberal, neutral and conservative.
Word adjacency network (N6) [41] is the network of common words in the novel "David Copperfield" by Charles Dickens. Nodes represent the most commonly occurring adjectives and nouns. Edges connect any pair of words that occur in adjacent position in the text of the book.
American College Football network (N7) [1] represents American football games between Division IA colleges during regular season Fall 2000. Nodes represent teams and edges represent the regular games between two teams. The teams are grouped into 12 conferences.
SFI network (N8) [1] represents collaborations between scientists at the Santa Fe Institute during any part of calendar year 1999 or 2000. Edges connect any pair of scientists if they coauthored one or more articles during the same time period. The biggest component of the SFI graph with 118 nodes is used in the experiment.
Jazz musicians network (N9) [42] includes 198 bands that performed between 1912 and 1940. An edge between two bands is established if they have at least one musician in common. Neural network (N10) [43] represents the neural network of C. Elegans. Metabolic network (N11) [44,45] represents metabolic system of C. Elegans. Email network (N12) [46] represents e-mail interchanges between members of the Univeristy Rovira i Virgili. Netscience network (N13) [41] records coauthorship of scientists working on network theory and experiments. Power network (N14) [43] represents the topology of the Western States Power Grid of the United States.
All the networks above are obtained from Internet [47][48][49]. The basic information of these real-world networks is shown in Table 6. All networks are divided into three groups according to their sizes, i.e., set of networks with nodes number smaller than 150 is group one (G1 = {N1, N2, N3, N4, N5, N6, N7, N8}), set of rest networks with nodes number smaller than 1000 is group two (G2 = {N9, N10, N11}), and the rest is group three (G3 = {N12, N13, N14}). The experimental parameters of algorithms for three network groups are listed in Table 7. Since MMCD converges faster, we adopt much smaller maximum generation number for it.
For each network, we run all algorithms 10 times and record the maximum and average results and standard deviations. The comparison results of modularity are shown in Table 8. As CNM, Louvain and Infomap are deterministic algorithms, their standard deviations are 0 all the time which means they don't have stability problem. Standard deviations of these three algorithms are removed since they don't need to test stability performance. Bold number in each row denotes the best value in corresponding item. It can be seen from the Table 8 that there are 42 items in total for all networks. MMCD acquires the best values in 29 items. Specifically, MMCD has better or equal values in all criteria compared to GA-Net and Meme-Net which are two single objective evolutionary algorithms. Therein, Meme-Net is a single objective memetic algorithm which includes a local search procedure. This illustrates the superiority of multi-objective optimization strategy of MMCD. MMCD also has better or equal values in almost all criteria compared to MOGA-Net except in Q std of N1, and it has better or equal performance on almost all networks compared to MOCD except on N1 and N8. What's more, almost 90 percent of items of MOA are worse than MMCD and no item of MOA is better than Table 7. Parameters of algorithms for real-world datasets.

Algorithm
Pop Gmax p c p m MI   Table 9. Standard deviations of three deterministic algorithms are removed as well. Bold number in each row denotes the best value in corresponding item. From the Table 9 it is obviously shown that the NMI max and NMI avg values of single objective optimization algorithms (i.e. CNM, Louvain, Infomap, GA-Net, Meme-Net and LSA) on most realworld networks are much lower than those of multi-objective optimization methods with only a few exceptions. The reason is that they try to optimize only one objective function (modularity, map equation, modularity density, etc.) and get only single solution. If the objective function can't exactly describe the network properties, the sole solution will not match the true partition. This observation demonstrates the drawbacks of single objective optimization methods mentioned in the previous section. The advantage of local search procedure is further proven by comparing MMCD with MOA. All NMI avg values of MOA except that on Karate network are smaller than those of MMCD and all NMI std values of MOA are larger than those of MMCD which illustrate the wonderful effects of local search procedure on convergence ability and stability of MMCD. Besides, MMCD is comparable with MOGA-Net and MOCD on NMI values.

Hierarchical Community Structures of Network
Not only MMCD can figure out high quality community structures, but also the population nature of its solutions can help find multi-resolution community structures in one run. Firstly, a simple hierarchical extension of the GN benchmark is adopted as hierarchical artificial network with built-in hierarchical community structures [50]. The network consists of 256 nodes which are arranged in 16 communities with 16 nodes each. The 16 communities are further grouped into four supercommunities, each of which has four smaller communities. The structure with 16 communities is regarded as level 1 community structure and the one with four supercommunities is regarded as level 2 community structure. Communities in level 1 and level 2 structures are called level 1 and level 2 communities, respectively. Each node shares average of 10 edges with the nodes in its level 1 community and 5 edges with nodes in three other level 1 communities in its level 2 community. In addition, each node has average of 2 edges with the rest of the network.
The MMCD is carried out on hierarchical GN benchmark once. The nondominated front of final solution population is shown in Fig 8(a). By analyzing the solution population, two community structure levels are both exactly found out. The arrow symbols with numbers of community in Fig 8(a) and 8(b) indicate the solutions corresponding to two community partition levels. Two representative community structures with 16 smaller communities and 4 larger communities found by MMCD are plot in Fig 9. In addition to two representative community structure levels, we further analyze the other solutions obtained by MMCD. The relationships between some objective values and the number of communities are shown in Fig 8(b). NMI − 1 and NMI − 2 denotes NMI values between the partitions found by MMCD and the true level 1 and level 2 community structures, respectively. It is shown that level 2 community structure has the largest modularity (Q) value which indicates that single modularity optimization algorithms may fail to figure out the level 1 community structure. In fact, the modularity of level 1 community structure even is not the second largest one. The modularity values of all partitions with community numbers between 4 and 16 are close to 0.6. These partitions are formed by merging several smaller level 1 communities together, so they are all reasonable partitions besides the level 1 and level 2 community structures. However, for those partitions with low modularity values, some tightly connected level 1 communities are split. They are unreasonable community structures. So modularity can be used to select reasonable partitions from the solution population obtained by MMCD. IntraQ and InterQ are two objective functions of MMCD. From the diagram, we can see that IntraQ decreases monotonically with the number of communities while InterQ increases monotonically. This observation agrees with the property mentioned in the previous section that the maximization of IntraQ and InterQ tend to find communities of opposite sizes. The NMI − 1 value and NIM − 2 value also have an opposite change trends along community numbers within range from 4 to 16 which means higher similarity with level 1 community structure will inevitably leads to lower similarity with level 2 community structure at the range [4,16].
We further illustrate the advantages of MMCD for identifying multi-resolution structures on real-world networks. The MMCD is carried out on each real-world network once. Fig 10(a) shows the nondominated front of MMCD on Karate network. Fig 10(b)-10(d) correspond to three solutions labeled as I-III in nondominated front, respectively. Fig 10(b) divides network into two communities exactly as true partition and its modularity value is 0.3715. The maximum modularity value found by MMCD is 0.4918 whose corresponding community structure is in Fig 10(c). In fact, Fig 10(c) further divides each community in Fig 10(b) into two smaller ones. The partition with the maximum modularity may not correspond to true partition in reality, so algorithms only optimizing modularity may fail to find out the true partition of a network. Fig 10(d) is another structure obtained with three communities whose modularity is 0.4020. Fig 11(a) displays the nondominated front of Journal network. Community structure in Fig  11(b) corresponds to solution I in Fig 11(a) which has the maximal NMI value and maximal modularity value at the same time. Fig 11(c) and 11(d) are two other partitions corresponding to solution II and III, respectively. The upper two communities in Fig 11(b) are mainly about physics and chemistry respectively, and the lower two communities are mainly about biology and ecology respectively. From the perspective of relations between disciplines, the upper two communities have stronger connections and can be merged into a larger community, and it is the same to the lower two communities. Fig 12(a) shows the nondominated front obtained on Dolphins network. Partitions corresponding to maximal NMI value (solution I) and maximal modularity value (solution II) are illustrated in Fig 12(b) and 12(c). Fig 12(c) divides the larger community in Fig 12(b) into four smaller ones. Fig 12(d) is another obtained solution with three communities corresponding to solution III in Fig 12(a) whose modularity value is 0.4932. The structure of Fig 12(d) is just The nondominated front obtained on Football network is shown in Fig 13(a). Groups of nodes gathered together represent true communities of Football network in Fig 13(b)-13(d). MMCD do not obtain the exactly true partition. The maximal NMI value obtained in one run is 0.9304 of solution I and the corresponding partition is shown in Fig 13(b). The number of communities in Fig 13(b) is 12 which is the same as true partition, however it misclassifies several nodes in two small communities in the middle of the network because such two communities are loosely connected inside themselves. Fig 13(c) shows the partition corresponding to solution II with the maximal modularity. Two small communities in the middle of the network disappear and their nodes are included into other communities. Solution III also has relative large modularity value and its community structure is illustrated in Fig 13(d). It has six communities. The main difference between solution III and two previous partitions is that it merges eight communities into four larger ones, each of which roughly consists of two smaller communities. By examining the names of football teams in the network carefully, some interesting phenomena are discovered. It is reasonable to assume that the names of teams reflect the location of them, e.g., Utah in yellow community in Fig 13(c) is in Utah of America. Based on this assumption, it can be observed that teams in each community detected by MMCD are mainly from the states near each other with only a few exceptions. For example, teams named Utah, NevadaLasVegas, NewMexico, ColoradoState, Wyoming, SanDiegoState, NewMexicoState, UtahState and so on in yellow community in Fig 13(c) are mainly from the southwest region of USA according to their names. Teams named WashingtonState, Washington, Oregon, OregonState, UCLA, California, ArizonaState and so on in pink community in Fig 13(c) are mainly from the pacific region of USA. In fact, the pacific region is adjacent to southwest region in America, so it is reasonable that the yellow community and pink community in

Conclusions
In this paper, we formulate community detection as a multi-objective optimization problem which regards the two contradictory parts of modularity function as two objective functions. In order to optimize such two objective functions effectively, a multi-objective memetic algorithm named as MMCD is proposed. MMCD combines multi-objective evolutionary mechanism with a local search procedure which helps algorithm to locate better solutions more effectively. An effective local search procedure is designed by addressing three issues, i.e., determining initial individuals, defining appropriate fitness function and designing effective local search strategy. The extensive experiments on both artificial and real-world networks demonstrate the advantages of MMCD from three aspects. The integrated local search procedure is proven to speed up the convergence to optimal solutions and make the algorithm more stable. What's more, experiments show that MMCD not only can find good community structures effectively, but also can figure out hierarchical structures which are useful to analyze networks in multiresolution levels. Some meaningful extensions can be made to MMCD in the future. There are only two objective functions considered in MMCD. When more objective functions are incorporated, the nondominated front will be much more complicated. How to calculate the normal line direction of nondominated front effectively is still an open question.