A community detection algorithm using network topologies and rule-based hierarchical arc-merging strategies

The authors use four criteria to examine a novel community detection algorithm: (a) effectiveness in terms of producing high values of normalized mutual information (NMI) and modularity, using well-known social networks for testing; (b) examination, meaning the ability to examine mitigating resolution limit problems using NMI values and synthetic networks; (c) correctness, meaning the ability to identify useful community structure results in terms of NMI values and Lancichinetti-Fortunato-Radicchi (LFR) benchmark networks; and (d) scalability, or the ability to produce comparable modularity values with fast execution times when working with large-scale real-world networks. In addition to describing a simple hierarchical arc-merging (HAM) algorithm that uses network topology information, we introduce rule-based arc-merging strategies for identifying community structures. Five well-studied social network datasets and eight sets of LFR benchmark networks were employed to validate the correctness of a ground-truth community, eight large-scale real-world complex networks were used to measure its efficiency, and two synthetic networks were used to determine its susceptibility to two resolution limit problems. Our experimental results indicate that the proposed HAM algorithm exhibited satisfactory performance efficiency, and that HAM-identified and ground-truth communities were comparable in terms of social and LFR benchmark networks, while mitigating resolution limit problems.


Introduction
Many real-world systems can be expressed as networks consisting of nodes connected by edges [1][2][3]. In social networks, nodes represent individuals, and edges are used to mark connections such as friendships and family relations. In scientific collaboration networks, nodes and edges respectively represent scientists and collaborations among scientists for published academic papers. In web graphs, nodes and edges respectively correspond to URLs and hyperlinks. Primary properties exhibited by networks include the small-world effect (indicating a high degree of clustering and low degree of separation) [4], long tails (indicating a power-law degree distribution in which a small number of nodes have stronger connections compared to other network nodes) [5], fractality (indicating combined degree distribution and negative assortativity coefficient slopes) [6][7], and community structure (indicating tight connections between nodes with similar features within groups, and loose connections between nodes across multiple groups) [1][2][8][9][10][11][12][13]. Network community detection, especially community structure, is currently receiving significant attention from researchers ranging from engineers and computer scientists to business and marketing specialists. The primary goal of community detection is identifying densely connected groups of network nodes and/or graph partitions that satisfy specific criteria such as edge connectivity compactness [1,[8][9][10][11]. The community detection problem is a well-studied NP-complete graph partition problem [14][15], and researchers in multiple disciplines have proposed various approaches to approximating community detection problem solutions.
Modularity [1,[10][11]32] (a widely used measure for evaluating community structure quality when a network lacks a ground-truth community) involves evaluations of edge densities within and across communities, with higher modularity values indicating stronger community structures or better network partition quality. Thus, modularity can be used as a fitness function in evolutionary computation approaches, or as an objective optimization function for finding optimum community detection solutions. However, care must be taken to identify and respond to resolution limit problems [41,50] that can arise when a community's small size makes it a likely candidate for absorption by a larger community. Methods that use modularity for fitness or objective functions tend to experience resolution limit problems.
Normalized mutual information (NMI) [51] is a preferred approach for verifying the correctness of algorithm-identified community structures when a network has a ground-truth community partition for calculating similarities between actual and identified partitions. Since the Lancichinetti-Fortunato-Radicchi (LFR) benchmark model [52][53] generates networks with actual partitions, a combined NMI-LFR benchmark network approach can be used to examine an algorithm's identification capabilities. Further, NMI can be used to test whether an algorithm mitigates resolution limit problems according to predefined synthetic networks (e.g., Clique-ring and Clique-pair networks) [41,50]. However, most real-world networks lack partitions, and networks with ground-truth communities are rare.
Theoretical benchmarks and actual network partitions can be used in combination to determine the ability of an algorithm to produce correct community detection results. One algorithm may be better than another for LFR benchmark networks, but not for real networks. In other cases, an algorithm may successfully mitigate resolution limit problems in one kind of synthetic network, but not in another. We believe that community detection algorithms should satisfy four criteria: (a) effectiveness in terms of producing high NMI and modularity values, using well-known social networks for testing; (b) examination, meaning the ability to examine resolution limits using NMI values and synthetic networks; (c) correctness, meaning the ability to identify useful community structure results in terms of NMI values and LFR benchmark networks; and (d) scalability, or the ability to produce comparable modularity values with fast execution times when working with large-scale real-world networks.
In this paper we will apply rule-based strategies to community detection tasks, and offer alternative strategies for identifying network community structures. We will also describe our proposal for a simple hierarchical arc-merging (HAM) algorithm that includes a measure for computing the similarities (weights) of adjacent nodes connected by an edge, and for ranking edges based on these weights. There are at least five advantages to using a rule-based HAM algorithm: strategies are easy to implement because they primarily use if-else code statements; rule structures for tasks such as edge merging are explicit; rule-based programs are efficient because they only need to check rule-defined situations; the HAM algorithms have higher community detection resolution; and they can be extended to node-based methods.
To validate the proposed HAM algorithm according to the four criteria described above, we used five well-studied social network datasets to test community detection (effectiveness criterion), two synthetic networks to examine its susceptibility to resolution limit problems (examination criterion), eight sets of LFR benchmark networks to identify ground-truth community structures (correctness criterion), and eight large-scale real-world complex networks to measure performance (scalability criterion). Our experimental results indicate that the proposed HAM algorithm is capable of producing high NMI and modularity values for identified community structures, and that those structures are similar to ground-truth community structures in social and LFR benchmark networks, thereby reducing the potential for two kinds of resolution limit problems in synthetic networks. At the same time it produces comparable modularity values for identified community structures, and satisfactory performance for large-scale real-world complex networks.

Background
To represent a network, let an undirected and weighted graph G = (V,E,W), where V is the node set, E the edge set, and W the edge weight. |V| denotes the number of nodes, |E| the number of edges, and |W| the sum of all edge weights. Network topology is represented as an adjacency matrix A = {a ij } and a ij 2 R n , where a ij = 1 if an edge e ij exists between nodes i and j, otherwise a ij = 0. w ij = w ji denotes the weight of an edge e ij , where w ij = 1 if nodes i and j in a network are identical and a ij = 1, otherwise w ij = 0.

Similarity measures
To capture network topology information for weighted networks for community detection tasks, similarity measures are generally used to determine edge weights and network characteristics for the purpose of identifying dense structures [54]. The most common approach for determining weight w ij of an edge e ij is to calculate the number of common neighbors-that is, w ij = w ji = S cn (i,j), as in (1). A high weight indicates a high degree of similarity and structural equivalence (i.e., connected nodes sharing large numbers of common neighbors). S cn can be extended to various similarity measures by dividing different denominator forms such as cosine similarity, the Jaccard index, and minimum similarity, respectively defined as S cn ði; jÞ ¼ jGðiÞ \ GðjÞj ð1Þ where Γ(i) is the neighbor set of node i, Γ(j) the neighbor set of node j, |Γ(i)| the neighbor number of node i, |Γ(j)| the neighbor number of node j, and min(x,y) a minimum-value retrieval function.
In complex network research, Girvan-Newman (GN) [2] and Fast-Newman (FN) [32] algorithms were initially applied to common community detection problems using hierarchical clustering and greedy searches. This was followed by several modularity optimization approaches (e.g., CNM [8] and Louvain method [34]) to finding approximate solutions by merging pairs of nodes (or communities) according to the maximum Q of a modularity measure or modularity density [55]. Some researchers then proposed label propagation approaches (e.g., LPA [36], LPAm [37], LPAm+ [38], sub-community integration [39], CenLP [40], LPW [49] and Core-Nodes based LAP [56]) in which node labels are propagated throughout entire networks, with nodes assigned to communities based on the maximum number of neighboring labels, and with community structures identified until a steady level of label propagation is achieved.
Data mining approaches have been adapted to handle non-overlapping (e.g., k-medoids [42]) and overlapping (e.g., fuzzy c-means [43] and rough-fuzzy [44]) community detection problems in which certain nodes belong to multiple communities. Infomap [45], an information theory approach, uses random walk and Huffman coding methods to reveal a network's community structures by minimizing its map equation-that is, its movement entropies between and within modules. Two density-based approaches, DenShrink [47] and ImDS [48], use similarity measures to calculate edge similarities, to extract topology characteristics from a network, and to identify community structures by merging or shrinking pairs of nodes according to degrees of similarity among edges.

Community detection validation
Accurately measuring network partition quality is an important issue in light of the large number of potential partitions. Modularity and NMI measures depend on the presence or absence of a ground-truth community. When none exists, modularity [1,[10][11]32] is often used as a fitness or objective function for evaluating community structure quality. A meaningful network partition contains many intra-community edges, but only a small number of inter-community edges. The term "meaningful" indicates that for an identified community and its randomized version, the number of intra-community connections should exceed the expected value of randomized intra-community connections, with both identified and randomized communities having the same degree sequences or numbers of nodes and edges. Thus, a randomized network is often used as a modularity null model. For any given network with M communities, modularity Q is defined as where ε ii is the fraction of edges connected to endpoints in the same community i, α i the fraction of edges connected to endpoints in community i, l i the number of edges with two endpoints within community i, and d i the summed degree of nodes in community i. A higher modularity value indicates better community structure quality. Unfortunately, resolution limits are a serious problem inherent to modularity [50]. In modularity optimization algorithms, small community size increases the potential of any community being absorbed into a larger community, thereby increasing the potential for overlooking important network substructures. Researchers who use modularity alone to identify communities should therefore consider ways of avoiding resolution limits. For cases where ground-truth communities are present, the NMI [51] and LFR benchmark models [52][53] can be used to measure community structure quality associated with an algorithm-that is, they can be used to calculate levels of similarity between actual A partitions and identified B partitions. Here NMI is defined as where C A is the number of actual communities, C B the number of identified communities, N a confusion matrix, N ij the number of nodes shared in common between communities C A and C B , N i. the sum over row i of matrix N, and N i. the sum over column j of matrix N. The NMI value range is between 0 and 1. If NMI(X,Y) = 1, the two partitions are considered identical, otherwise they are considered independent. A combination of NMI and two kinds of predefined synthetic networks (Clique rings and Clique pairs networks [51]) can be used to determine whether an algorithm suffers from a resolution limit problem. In order to satisfy the four criteria, we developed a four-part process to determine the appropriateness of a community detection algorithm. For the effectiveness criterion, a mix of five social networks, one small-scale LFR benchmark network, and multiple modularity and NMI measures were used to analyze the quality of identified community structures. For the examination criterion, two kinds of synthetic networks and a NMI measure were used to determine the presence of a resolution limit problem. For the correctness criterion, LFR benchmark networks and a NMI measure were used to verify the quality of identified community structures compared to an actual partition. For the scalability criterion, eight large-scale real-world complex networks and a modularity measure were used to analyze community structure quality and performance efficiency (e.g., execution time analysis).

Method
Consisting of an original network phase and a super-node network phase (Fig 1), our proposed HAM algorithm uses network topologies and rule-based arc-merging strategies to identify community structures. In the original network phase, a similarity measure is used to calculate edge weights and to obtain network topology information, after which rule-based strategies are used to identify major communities and to preprocess a super-node network structure. During the super-node network phase, the combination of a proposed modularity optimization equation and rule-based strategies is applied to construct the entire super-node network structure. HAM stops and returns community detection results when network modularity can no longer be improved.

Rule-based arc-merging strategies
The hierarchical arc-merging (edge-or node-merging) mechanism has been widely used for designing community detection algorithms [2,8,32,34,[47][48]. It can be explicitly defined in terms of corresponding rule-based arc-merging strategies-that is, one strategy can be used to identify communities, and another to connect them. We established five arc-merging rules that can be combined to create different strategies. For each edge e ij = (v i ,v j ) 2 E and v i ,v j 2 V, the arc-merging rules are defined as: R1: Create a super-node sn that merges endpoints v i and v j . R2: If endpoint v i is unmerged but endpoint v j is merged with a super-node s j , then merge v i with the super-node sn j (or retain v i as a super-node sn i ).
R3: If endpoint v i is merged with super-node s i but endpoint v j remains unmerged, then merge v j with super-node sn i (or retain v j as a super-node sn j ).
R4: Retain v i and v j as super-nodes sn i and sn j .
R5: Otherwise, do not merge v i and v j .
We used these rules to construct three kinds of strategies: community-creating (T1, which uses R1, R2, R3 and R5), structure maintenance (T2, which uses R2, R3, R4 and R5), and sinkshrinking (T3, which uses R2, R3 and R5). Details regarding the application of rule-based strategies for each HAM phase are presented as S1 File. Original network phase Building on previous community detection studies [34-36, 38, 47], we believe that the characteristics of community structures can be captured by an explicit (deterministic) procedure. Hence, in the original network phase of HAM, we added a procedure consisting of calculating edge weights, classifying edges according to their weights, and merging edge endpoints according to a rule-based strategy for community detection. After calculating the edge weights or similarities of two endpoints and identifying the dense or loose parts of network components, component edges are classified as weighted-edge E W , bridge E B , or sink E S . These three edge classes are defined as: where k i is the degree of node i and k j the degree of node j.
is the weighted-edge node set, V(E B ) the bridge-edge node set, and V(E S ) the sink-edge node set. After classifying edge weights (Fig 2), E W weighted edges are said to have greater similarity and higher summed node degrees, indicating that they are within the denser parts of communities-see, for example, edges (4, 5), (9, 10) and (12,13) in Fig 2. Further, E B bridge edges have higher degrees of either or both endpoints, indicating that they connect different communities-see edges (6,8) and (7,11) in the figure. Otherwise, the edges might be one part of a long bridge consisting of multiple edges. E S sink-edges such as (1,4), (2,4) and (3,4) have only single community connections.
Next, weighted-edge and bridge-edge classes are sorted and arranged in decreasing order according to two indexes: edge weight w ij and the k i + k j summed degree of edge endpoints. An edge with high weight w ij is perceived as having shared endpoints with a large number of neighbors, perhaps serving as the center of a community, group, or clique. However, the size of this community is unknown-it could be large or small. Accordingly, edge sorting involving w ij weights and k i + k j summed degrees in decreasing order represents edge priorities in a small community. High-priority edges in such sequences are considered candidates for community foci.
To give an example, assume that edges e x and e y have identical weights as determined by minimum similarities (w x = 3/min(4,5) = 0.75 and w y = 15/min(20,25) = 0.75), but with different k i + k j values (9 and 45, respectively). Although the two edges have identical proportions of common neighbors, in terms of neighbor endpoint connections edge e y likely captures more community information, and is therefore preferred for merging purposes early in the arcmerging process. Hence, an edge with a higher k i + k j value should be promoted and its priority increased in any sequence that is sorted during the arc-merging process in the original network phase. For the bridge-edge class, any edge with a high k i + k j should be considered an important bridge for connecting two communities, and therefore be given a higher priority during the arc-merging process.
According to this procedure, community structures are constructed from densest-to-loosest according to the order of sorted edges plus three rule-based strategies: 1. For E W edges, the T1 community-creating strategy is used to merge the endpoints of edges into super-nodes for use as seeds (R1), to attract unmerged nodes located close to these seeds (R2 and R3), and to handle all other cases tied to creating edges for constructing network structure (R5). After T1 is completed, a preprocessed high-level network structure consisting of super-nodes is created.
2. For E B edges, the T2 structure maintenance strategy is used to create edges for connecting isolated communities (R4), to attract nearby nodes with one edge endpoint that is already inside a community (R2 and R3), and to handle all other cases (R5). All isolated communities are connected after applying T2.
3. For E S edges, the T3 sink-shrinking strategy is used to handle edges with edge endpoints (either one) of 1 degree (i.e., d i = 1 or d j = 1), and to address all other cases (R5). Although the functionality of T3 is part of strategy T2, we will consider T3 as independent for purposes of describing the HAM rule-based strategy.

Super-node network phase
The procedure for the super-node network phase is similar to that for the original network phase. After edge similarity is measured in terms of the summed weights of all edges between two super nodes, modularity optimization is applied to determine whether any edge endpoint pairs should be merged into a high-level super-node based on a calculation of the ΔQ value of edges contributing to network modularity. The summed weight and ΔQ equations are expressed asŵ where m i is community (or super-node) i,ŵ ij the summed weights of edges between communities m i and m j , ΔQ ij the incremental value of modularity as contributed by edge e ij , Q m ij the partial modularity value after merging communities m i and m j , Q m i the partial modularity value before merging community m i , l ij the number of edges in merged community m ij , l i the number of edges in community m i , d ij the summed degree of nodes in merged community m ij , and d i the summed degree of nodes in community m i . If l ij = l i + l j + |e ij |, then (11) can be simplified as where (according to formulas 11 and 12) network topology information (i.e., l ij , l i , l j , d ij , d i , d j and |e ij |) is used for delta-Q calculations. This information, which is updated during the arcmerging process, can be applied immediately. Weighted network information only uses edge weights w ij in the original network phase and summed weightsŵ ij in the super-node network phase for sorting edges in decreasing order.
After calculating their summed weights and ΔQ values, edges are classified as deltaQ-edge E ΔQ or bypass-edge E P . E ΔQ denotes a set of edges with ΔQ ij values greater than zero-in other words, the merging of two edge endpoints carries the potential to increase the ΔQ of the entire network and improve community structure quality. E P consists of a set of unmerged edges. The two classes are defined as Following edge classification in the super-node network phase, E ΔQ edges have higher weights and ΔQ values, indicating their positions between two dense components and their ability to increase the incremental ΔQ value of the entire network. E P edges are only used to maintain the super-node network structure. Next, E ΔQ -class edges are sorted and arranged in decreasing order according to two indexes: the ΔQ ij value of edges and theŵ ij summed weights of edges. After sorting, the T1 and T2 rule-based strategies are used to create and maintain a high-level super-node network structure. For E ΔQ edges, T1 is used to create super-nodes as seeds for attracting unmerged nodes that are close to the super-node, as well as to handle all other cases. After executing T1, a preprocessed high-level network structure is created. For E B edges, T2 is used to create edges for connecting various communities, to attract nodes that are close to communities, and to handle all other cases. After executing T2, a high-level supernode network is completed.

The proposed algorithm
A HAM flowchart is presented as Fig 3 and details presented as Algorithm 1. For any given network G, a set of neighbors for each node in the original network is created, a similarity measure is used to calculate edge weights, and the original network is appended to the network list. During the original network phase, an empty network H is created as a super-node network for further construction. Next, edges are classified as E W , E B or E S . Three rule-based strategies (Algorithms A1-3 in S1 File) are applied during the original network phase: a strategy for creating communities, a maintenance strategy for connecting communities, and a sink-shrinking strategy for handling the edge endpoints with node degree k i = 1. After applying these strategies, all member-node information for the super-node network is refined. The constructed super-node network is preserved and appended to the network list. As part of the super-node network phase, an empty super-node network is created, ΔQ ij edge values are calculated, and edges are classified as E ΔQ or E P . Two rule-based strategies (Algorithms A4-5 in S1 File) are used to merge super-nodes and to construct a high-level super-node network structure, after which member-node information is refined and used to calculate network modularity values. HAM continues this arc-merging procedure until the ΔQ increment of the entire network is below a threshold, or until there are no more ΔQ edges. Last, community structures are identified. See S2 File for a step-by-step example.

Experimental results
We used two well-studied methods to establish HAM identification accuracy and performance efficiency baselines that fit with the four criteria: the Louvain method, which has a reputation for dealing successfully with a network consisting of 1 billion edges using a PC machine [34], and the Infomap information theory-based method, based on its history of producing optimum NMI results for LFR benchmark networks [45]. We designed experiments to compare HAM with the CNM [8] and Louvain modularity optimization methods, Infomap, and DenShrink (DS, a density-based method) [47]. All four are frequently used in community detection comparison experiments. They were implemented using the C (CNM) or C++ (Louvain, Infomap and DS) programming languages. Our proposed method was created with Python to take advantage of its code readability and package support characteristics, while accepting the disadvantage of slower execution times compared to C and C++. Our HAM python program is downloadable at https://github.com/yuhsiangfu/Hierarchical-Arc-Merging.

Datasets
For the effectiveness criterion we used five well-studied social networks and one small-scale LFR benchmark network with ground-truth communities to verify community detection results in terms of matches between identified and actual communities. For the examination criterion, two synthetic networks were used to identify algorithm-associated resolution limit problems, if any. For the correctness criterion, we used the LFR model [52][53] to generate synthesized networks with different community structure properties, as well as to test the accuracy of algorithm-identified community structures. For the scalability criterion, eight large real-world networks were used to test performance efficiency. The giant connected component (GCC) of the social, synthetic, small-scale benchmark, and large-scale real-world networks used in our experiments is shown in Table 1.
The five well-studied social networks used in this project are also listed in Table 1. The Zachary Karate Club network consists of 34 nodes (club members) and 78 edges (cross-member friendships) [57]. A split occurred due to a disagreement between the club's administrator and instructor; the instructor left, taking one-half of the original members and creating a new club. The Dolphins network consists of 62 bottlenose dolphins living in Doubtful Sound, New Zealand. Based on observations between 1994 and 2001, 159 interactions between dolphin pairs took place, more than would be predicted by chance [58]. This network can be divided into two groups based on the departure of one key individual. The U.S. college football network consists of 115 teams and 613 games played during the 2000 season [2], with nodes representing teams and edges games played between teams. The teams are divided into 12 conferences, and play more games against conference than non-conference opponents. The political book network [59] consists of the purchase histories of customers who bought books on political topics from the Amazon.com website. Nodes indicate books (105) and edges co-purchasing relationships in which users bought more than one book (441). Purchased books were classified as conservative, neutral or liberal. The political blogs network [60] consists of 1,222 blogs about the 2004 American presidential election and 16,714 links among them. The blogs were manually divided into conservative and liberal categories.
The two synthetic networks shown in Table 1 were used to determine the presence of resolution limit problems [51]. The Clique-ring synthetic network consists of a ring of w cliques (with w an even number) connected by a single edge. Each clique is a complete K p graph consisting of p nodes and [p(p − 1)]/2 edges. The Clique-pair synthetic network consists of two K p (part one) and two K q complete graphs (part two), both connected by single edges. Each part one clique is connected to two part two cliques. Clique-ring parameters are p = 5 and r = 30 (150 nodes and 330 edges). Clique-pair parameters are p = 20 and q = 5 (50 nodes and 404 edges).
One assumption of the LFR model is that node degree and community size follow a powerlaw distribution with the following parameters: γ, degree distribution exponent; β, community size distribution exponent; k max and k min , upper and lower node degree boundaries, respectively; z max and z min , community size constraints; mixing parameter u, the proportion of nodes sharing links with the nodes of other communities; and 1 − u, the proportion of nodes sharing links with other nodes in the same community. The LFR parameters used in this study are shown in Table 2 [16,47,53]. The mixing parameter u range was between 0.1 and 0.8 (0.05 increments). The LFR model generated 30 synthesized networks for each u. The small-scale LFR benchmark network described above was generated with 300 nodes and u = 0.05 (Table 1).
The eight large-scale real-world networks [61] can be further divided into the three large (|V| = 1000*10000) and five very large (|V| ! 10000) real network groups shown in Table 1.

Results
Experimental results for the effectiveness and examination criteria are shown in Table 3. The modularity, NMI, and execution time data for each method represent averages for 30 runs.  The first part of the table contains results for five well-studied social networks and one smallscale LFR benchmark network. Modularity and NMI measures were used to verify the correctness of community detection results produced by the various methods. For the ground-truth community NMI results, the HAM algorithm produced the highest NMI values according to the minimum similarity measure among the Karate, Polbooks, Polblogs, and small-scale LFR benchmark networks-that is, the identified community structures were the closest (or identical) to those of the networks' ground-truth communities. Infomap had the highest NMI values for the Dolphins, College Football, and small-scale LFR benchmark networks. For modularity results, when ground-truth communities were removed, the Louvain method had the highest modularity values for the Karate, Football, Polblogs and small-scale LFR benchmark networks. Infomap had the highest modularity values for the Dolphins and Polbooks networks. At best, HAM performance for modularity can only be considered satisfactory. For execution time results, HAM unexpectedly had the fastest performance efficiency for all six small-scale networks, including the Louvain method. Further, we found that different methods produced the highest NMI or modularity values, but whenever a method concurrently produced the highest values for both NMI and modularity (e.g., the Dolphin and small-scale LFR benchmark networks using Infomap), actual community sizes were approximately equal.
Results for two kinds of synthetic networks are presented in the second part of Table 3, with modularity and NMI measures used to determine the presence, if any, of resolution limit problems. HAM and DS had the highest NMI values (equal to 1) for the two networks, indicating that the resolution limit problems had been mitigated to a certain degree, and that community structures were correctly identified. In comparison, the Louvain method produced the highest modularity results for the two synthetic networks, indicating the presence of resolution limit problems. The unstable results produced by Infomap indicate uncertainty regarding their presence.
For the correctness criterion, the LFR model was used to generate 30 benchmark networks for each u (450 networks total). The results shown in Fig 4 represent averages for all 30. The NMI measure was used to verify the correctness of identified community detection results. Regarding HAM similarity settings, we used 50000S and 50000B LFR benchmark networks to determine which similarity measures should be applied in our experiments, and found that all three resulted in similar NMI values, but with markedly different execution times. We observed that the minimum similarity measure resulted in the fastest execution times, that the cosine similarity measure performed as fast as the minimum similarity measure, and that the Jaccard index was the slowest. Based on the NMI and execution time results, we decided to use the cosine similarity measure in our experiments. Similarity comparison data for other LFR benchmark networks are presented in S4 File. NMI and execution time results for LFR benchmark networks are shown in Figs 5 and 6, and detailed NMI and execution time data are presented in S5 and S6 Files. One CNM run required more than one hour for each u value for all 30 networks. Accordingly, generating results for all LFR benchmark network sets would require many hours or days of computing time. Due to a memory allocation error ("std::bad_alloc") during DS execution, CNM and DS results are not shown in Figs 5G, 5H, 6G or 6H. As shown, for each method the overall NMI value decreased when u increased, meaning that the community structures became less distinct as the number of in-between edges increased, making community structures more difficult for algorithms to identify. NMI results from various methods were much closer to each other when u 0.5. NMI values produced by Infomap dropped sharply when u ! 0.6 (e.g., 1000S) or u ! 0.55 (e.g., 1000B), indicating that the network structure information may have been insufficient for random walkers to capture indistinct community structures. Similar decreases have been reported by other researchers [18-19, 28, 39, 53]. In contrast, when network structure information was sufficient (e.g., 5000S/B to 50000S/B), Infomap performed well in cases with u = 0.1*0.8 ranges of distinct/indistinct community structures. In those cases, Infomap NMI results were best for the LFR benchmark networks (6 of 8 sets) (Tables I-M in  S5 File).
Regarding cosine similarity, our NMI results indicate that HAM successfully identified community structures that were close (e.g., u 0.6 in 1000S/B or u 0.5 in 5000S/B) or identical to actual structures (e.g., u 0.7 in 10000S/B to 50000S/B). Compared to those produced by the Louvain method, HAM results were close (e.g., u = 0.6*0.7 for 1000S and 5000B) or better (e.g., u ! 0.1 for 5000S and 10000S/B to 50000S/B). HAM results were significantly better than those produced by CNM and DS in terms of ground-truth community correctness. Combined, the data indicate that HAM produced the second best NMI results for the LFR benchmark networks (6 of 8 sets) (Tables I-M in S5 File).
According to the execution time results shown in Fig 6, the Louvain method had the best performance efficiency among the LFR benchmark networks. Despite being constructed with an interpreted programming language, HAM still outperformed CNM, DS and Infomap. Infomap's performance efficiency results were satisfactory, with best NMI values produced when u 0.6. However, execution times increased sharply when u ! 0.6, meaning that random walkers required more time to find appropriate community structure boundaries. CNM execution time results indicate that the computing time required to identify the shortest paths between all node pairs increased as u increased. The high peaks in the DS execution time results may be due to an excessive number of choices for finding and merging micro-communities when u = 0.5.
Regarding the growth rate of execution time results (i.e., [t − t min ]/t min ), HAM exhibited good stability in performance growth compared to the Louvain method when u = 0.1*0.8. Infomap data indicate rapid growth when u ! 0.6 (Fig 7). According to these findings, the HAM algorithm was not significantly affected by small/large community sizes or distinct/ indistinct community structures. In contrast, the Louvain method and Infomap were affected by increased u values. Although the Louvain method had the best performance efficiency, its execution time growth rate increased quickly, producing execution time results that were close to those produced by the HAM algorithm (e.g., u = 0.8 for the 10000B and 50000B LFR benchmark networks).
Data for the scalability criterion are shown in Table 4. Modularity and execution time results for each method represent averages for 10 runs. Overall, only the Louvain method and HAM could be applied to all of the large-scale real-world networks. Further, the Louvain method produced the best modularity results for large-scale real-world networks such as Email-contacts, Loc-gowalla, Web-google, and Wiki-talk for identifying community structures in the absence of ground-truth communities. According to the cosine similarity measure, HAM produced the best modularity results for the Brightkite, Com-youtube, and Com-amazon networks, and was second best for large-scale real-world networks (S7 File). HAM modularity results were close to or better than those produced by the Louvain method. In terms of execution time, the Louvain method had the best performance efficiency for large-scale realworld networks. In terms of performance efficiency, HAM was second, behind the Louvain method. Its performance was considered satisfactory, despite the drawback that HAM was created with an interpreted language ( Table B in S7 File).
To execute a multi-resolution analysis, we introduced a tunable parameter as suggested by Xiang et al. [68] and Arenas et al. [69]. In the original network phase, a weighted-edge is determined by edge weight w ij > 0. We substituted a weight threshold for the 0 in formula 7-that is, E W = {e ij | w ij > w threshold }. Hence, edges were classified as weighted when their weights exceeded a threshold, otherwise they were classified as bridge-edge or sink-edge.
In addition to using a weight threshold as a tunable parameter [68][69], we introduced several communities and NMI values, and visualized the identified community structures. The weight threshold value was established as w threshold 2 [0,1], in increments of 0.01. Community detection results produced by the HAM algorithm were collected for each threshold value. The results shown in Figs 8 and 9 indicate that different similarity measures affected HAM's community detection capabilities (e.g., different NMI curve trends). For example, the highest NMI value for the Karate network (1.0) involved the minimum similarity measure, indicating that the identified and ground-truth community structures were identical. In contrast, the highest NMI values for the Dolphins network (0.7769) were produced by both the cosine similarity measure and the Jaccard index. The results also indicate that multi-resolution analysis can be used to determine appropriate parameters (e.g., weight thresholds or tunability) for acquiring useable community detection results. Multi-resolution analysis results for small-scale social networks are shown in S8 File.

Summary and discussion
The underlying HAM rationale is based on observations from two kinds of synthetic networks and related studies of resolution limit problems associated with community detection [51]. Since modularity approaches are not capable of identifying communities below certain scales, there is a possibility that such communities are connected by single bridges or small numbers of low-weight edges that can be identified via one or more combinations of a similarity measure and rule-based strategies. Further, a similarity measure can be used to calculate edge weights in order to identify low-weight edges, including those located between communities. The best rule-based strategy for merging nodes into super-nodes or for retaining nodes in their own communities is determined by edge type-weighted, bridge, or sink. We therefore designed a pre-processing original-network phase for partitioning networks into sub-communities based on weighted network structure. Because of the modularity maximization mechanism, these sub-communities are merged.
The rule-based methods described in this paper focus on the use of network structure information to extract important features from community structures. This observation can be used to build corresponding arc-merging strategies. For example, in the original HAM network phase we designed a three-step procedure for partitioning network community structures: edge classification for identifying different edge types, edge sorting for determining the order of applying arc-merging strategies, and arc-merging strategies for merging edge endpoints according to edge type and sorting order. All three steps are based on community structure observations. Other researchers have used similar strategies to determine community structure identification start points [40,47,63].
State-of-the-art methods can be analyzed, simplified and utilized based on their respective advantages to create explicit rule-based strategies for community detection tasks. In cases of overlapping communities [63][64], overlaps can be used to create corresponding rule-based strategies for HAM extensions. The k-cliques of any given network can be merged as supernode seeds during the original network phase, after which edge endpoints are merged into super-nodes. In some instances of overlap, link-pair similarities can be utilized for HAM extensions [65], and merged link-pair endpoints can serve as super-node seeds during initial network phases. Afterwards, single edge endpoints can be merged into super-nodes. Nodes belonging to multiple communities can be handled by a rule-based strategy involving the creation of duplicate nodes in individual communities.
According to our experimental data, there are two possible explanations for the capability of the proposed HAM algorithm to mitigate resolution limit problems: (a) the similarity measure used to calculate edge weights (especially bridge edges), and (b) the strategy of using bridge edges to maintain community structures. For example, in the Clique rings network, an individual clique is connected to two smaller cliques via two individual bridge edges; a single bridge edge also connects the smaller cliques to each other. Based on this example, we think that any two communities (or groups or cliques) connected by bridge edges should be retained, since merging them into new communities might increase overall modularity value.
Our execution time data indicate that the performance efficiency of HAM was satisfactory, raising questions about which modifications could lead to improvement. Possibilities include the addition of a more efficient sorting algorithm (e.g., a distributed sorting algorithm); a similarity measure with lower computation costs such as a minimum similarity denominator (as opposed to the Jaccard index denominator); simplifying defined rule structures in order to reduce computation costs; and using compiled programming languages such as C and C+ + rather than interpreted languages such as Python.

Network
Email -contacts  Brightkite  Com-youtube  Com-amazon   Measure  Q  T-avg  T-std  Q  T-avg  T-std  Q  T-avg  T-std  Q  T-avg  T- Subscript, similarity measure used in HAM.
https://doi.org/10.1371/journal.pone.0187603.t004 Modularity, which is frequently used to evaluate community detection results produced by algorithms, is strongly associated with resolution limit problems. Alternatives include greedy surprise maximization [35], preprocessing [66], edge-reweighting [67][68], multi-resolution [69], and Hamiltonians [70][71], among others. However, each alternative has its own problems, including excessive community splitting. Infomap, a non-modularity method, uses map equations as quality functions for community detection tasks. Our experimental data indicate that Infomap mitigates the resolution limit problem for clique-pair networks but not for clique-ring networks-a unique resolution limit problem.
According to our multi-resolution analysis results (Fig 10), HAM tended to "over-merge" long chains of star-like nodes (communities) connected by bridge edges in Clique line networks [41,50]. We believe this problem is associated with the structure maintenance strategy (Algorithm A2 in S1 File). After handling weighted edges, a long chain consisting of four starlike nodes and bridge edges establishes a connection to the largest community in the Clique line network (i.e., nodes with four bridge edges and multiple sink edges). After applying the structure maintenance strategy, the first bridge edge is merged according to rule R3, since one endpoint is in an identified community. Subsequently, the other three bridge edges merge with the largest R3-based community. The remaining sink edges are handled by applying the sink-shrinking strategy (Algorithm A3 in S1 File). In the end, only one community is identified.
There are at least three ways to address this problem. First, structure maintenance rules (especially R3 and R4) can be modified to retain edges between a community and a star-like node, or between two star-like nodes. Second, each star-like node can be modified so that only single edges are added for purposes of connecting neighbor pairs in a manner that eliminates the need to modify the structure maintenance strategy. A third possibility is to add node-merging strategies to identify star-like and other node types; this idea exceeds the scope of arcmerging strategies addressed in this paper.
According to the study results shown in Table 5, decisions regarding which algorithm to use-Louvain, Infomap, DS or HAM-must be made on a case-by-case basis. If effectiveness involving small-scale social networks is the main concern, the Louvain method is a better choice for modularity optimization, while HAM is a better choice when accuracy or time limitations are priorities. When the primary concern is mitigating resolution limit problems (examination criterion), DS or HAM are the best choices. For the correctness criterion using LFR theoretical networks, the Louvain method may be preferred due to its smaller time requirement, Infomap if accuracy is emphasized, or HAM in scenarios involving combined accuracy and performance efficiency. For the scalability criterion involving large-scale realworld networks, the Louvain method remains the best choice in terms of community structure quality and execution time. However, HAM is a satisfactory alternative in terms of community structure quality or performance efficiency, especially when avoiding potential resolution limit problems is a primary concern.

Conclusion
In this paper we introduced rule-based strategies for community detection tasks, and described a hierarchical arc-merging (HAM) algorithm that uses network topologies and rulebased arc-merging strategies to identify community structures. The HAM architecture consists of similarity measurement and modularity optimization phases, plus rule-based strategies for community detection. We also used four criteria-effectiveness, examination, correctness, and scalability-to determine community detection algorithm appropriateness. Experiments were conducted to examine our proposed HAM algorithm according to these criteria, which we believe all community detection algorithms should satisfy. To test for effectiveness, we used five social networks and one small-scale LFR benchmark network, all with ground-truth communities. Our results indicate that HAM was capable of identifying community structures with satisfactory NMI values, and that the identified communities were similar to ground- truth communities in social and LFR benchmark networks. For the examination criterion, our results (involving two synthetic networks) indicate an absence of HAM-associated resolution limit problems. For the correctness criterion, results from an analysis involving LFR benchmark networks (also with ground-truth communities) with different parameters and community sizes indicate that HAM's NMI values and performance efficiency were as satisfactory as those produced by Infomap. For the scalability criterion, eight large/very large real networks without ground-truth communities were used for separate tests. Results indicate that HAM produced satisfactory modularity values and good performance efficiency, although greater efficiency can likely be achieved if a compiled language is used for implementation.
•, first place or overcame resolution limitation problem.