
A memetic algorithm for finding multiple subgraphs that optimally cover an input network

Abstract

Finding dense subgraphs is a central problem in graph mining, with a variety of real-world application domains including biological analysis, financial market evaluation, and sociological surveys. While a series of studies have been devoted to finding subgraphs with maximum density, the problem of finding multiple subgraphs that best cover an input network has not been systematically explored. The present study discusses a variant of the densest subgraph problem and presents a mathematical model for optimizing the total coverage of an input network by extracting multiple subgraphs. A memetic algorithm that maximizes coverage is proposed and shown to be both effective and efficient. The method is applied to real-world networks. The empirical meaning of the optimal sampling method is discussed.

1. Introduction

Over the past several decades there has been substantial interest in studying social networks beyond the traditional social sciences while maintaining a focus on social structures. Specifically, instead of focusing on demographic attributes of a certain population, an increasing number of studies have focused on the structure of relationships that connect individual behaviors with collective dynamics [1]. One focus of the analysis of network structure has concerned cohesive subgraphs [2]. Notable examples of this work are sociometric cliques [3] and variants such as n-cliques, n-clans, k-plexes, or k-cores [4]. Related work has focused on detecting core/periphery structures [5], rich clubs [6] or communities [7]. Generally, the aim of these studies has been to find one or more subgraphs that maximize some notion of density.

One popular notion of density that has been widely explored in the literature is the average degree (measured by the edge-to-vertex ratio), and the problem of finding a subgraph that maximizes the average degree is called the densest subgraph problem (DSP) [8]. Analysis of the DSP has been applied to DNA analysis [9, 10], financial market evaluation [11], social surveys [12, 13], and theoretical computer science [14, 15]. In the Web domain, Gibson et al. identified link spam by extracting dense subgraphs in large graphs [16]; link spam is one of the greatest challenges in evaluating search engine rankings [17]. In the social context, DSP has been applied to expert team formation [15, 18] as well as party organization [19, 20]. Angel et al. detected real-time stories by searching for dense subgraphs in the entity co-occurrence graph constructed from micro-blogging streams [21]. DSP has also been employed to find teams with higher collaborative compatibility [22].

DSP aims at extracting a single subgraph, but many real-world cases seek a collection of dense subgraphs, such as communities or social stories [23]. There are relatively few studies in this direction; one of them, by Balalau et al., focused on finding a set of m subgraphs that maximizes the total density over the subgraphs (denoted the "multiple-m densest subgraphs problem", MmDSP) [23]. Variants of this model have been proposed subsequently [24, 25]. These studies have solved the problem of how to extract multiple dense subgraphs, but the process of covering the input network by extracting multiple subgraphs has not been addressed. Maximizing the subgraph density and maximizing the covering have different social meanings. In many real-world cases, the density of subgraphs does not have to be large. For example, a collection of network surveys may not focus on how dense each investigated network is, but on how best to cover the whole population, which is a boundary specification problem [26, 27]. In a network survey, self-report of social relationships is commonly used to collect network data. Specifically, given a list of participants, the data are obtained from answers to single-item questions that ask participants to enumerate individuals to whom they are connected by a direct relationship of a specified kind [1, 28]. The main purpose of such a network survey is to best cover the interactional relationships. Besides network surveys, the covering problem can also be applied to influence maximization [29], network tomography [30], or pinning control [31].

The present study addresses the problem of how to find multiple subgraphs that best cover the input network. We call the problem the "multiple-m covering k-subgraphs problem" (MmCkSP), i.e., maximizing the covering of the network edges given m subgraphs of limited size k. Unlike the classic graph partitioning problem and the densest subgraph problem, the present study aims to determine how to extract multiple subgraphs so as to achieve the best coverage of the network relationships. Two illustrations of the difference between MmCkSP and the densest subgraph problem are shown in Fig 1. Given the input network in Fig 1A, standard community detection finds the partitions {1,2,3,4,5} and {6,7,8,9,10}. If we set the number of subgraphs to 3 and the subgraph size to 5, MmDSP may present the best solution as the subgraphs {1,2,3,4,5}, {6,7,8,9,10} and {3,6,8,9,10} in order to maximize the density of each extracted subgraph, while MmCkSP may extract the subgraphs {1,2,3,4,5}, {6,7,8,9,10} and {1,3,5,7,8}, which may cover all the network edges even though the subgraph {1,3,5,7,8} is not dense. MmDSP and MmCkSP also extract different subgraphs in Fig 1B. Here the edges a, b in (a) and c in (b) are ignored by MmDSP, but these omitted edges connecting different communities may sometimes have a useful social interpretation. Covering these edges can help provide a better understanding of the input network structure.

Fig 1. Two illustrations of the difference between maximizing the density and the coverage.

The upper two networks are the input networks. The lower part contains the solution of extracting multiple subgraphs. In the solution part, grey nodes represent the included nodes and white nodes represent the omitted nodes. The dashed lines represent the unobserved ties, while the bold solid lines represent the extracted ties.

https://doi.org/10.1371/journal.pone.0280506.g001

In real-world cases, the subgraph size and the number of subgraphs should be constrained because they are always associated with costs. Taking network surveys as an example, a longer list of nodes places a greater burden on respondents, in which case ties are more likely to be missed because respondents may not be able to recall enough to fully capture the network structure [32]. Here, we formalize MmCkSP as a new optimization problem, which goes beyond the conventional strategy of optimizing network density. An illustration of the optimization is shown in Fig 2. Given an input network consisting of six nodes and nine edges, if we constrain the subgraph size to 4, we can extract subgraphs {1, 2, 3, 4}, {1, 4, 5, 6} and {2, 3, 5, 6}, which cover all ties in the entire population. Solution 2 extracts {1, 2, 3, 6} and {3, 4, 5, 6}, which also include all the edges. Clearly, solution 2 in Fig 2 is more cost-effective than solution 1. Here, we design an algorithm that can find the most cost-effective solution.

Fig 2. An illustration of the optimization problem of finding multiple subgraphs that best cover the input networks.

The left part is the input network which consists of six nodes and nine edges. The right part contains the solutions by extracting multiple subgraphs. Solution 1 requires three subgraphs and solution 2 requires two subgraphs. In the solution part, grey nodes represent the included nodes and white nodes represent the omitted nodes. Dashed lines represent unobserved ties, while bold solid lines represent the extracted ties.

https://doi.org/10.1371/journal.pone.0280506.g002

The present study is organized as follows: section 2 reviews the related background, including the densest subgraph problem, strategies for its multiple-subgraph variants, and the corresponding optimization models. In section 3, we propose a memetic algorithm that optimizes the covering problem for each subgraph. Experiments with the proposed algorithm on computer-generated and real-world networks are described in section 4. Section 5 presents the conclusion and discussion.

2. Background

2.1. The densest subgraph problem and the solution approach

The densest subgraph problem (DSP) refers to how to obtain a list of members with the highest density. Given a graph G(V, E), where V = {vi} denotes the set of nodes and E = {eij} denotes the set of relationships, DSP aims to find a subgraph G′(V′, E′) whose average density, computed as |E′|/|V′|, is largest [8]. The optimization of DSP can then be formulated as (1), below. The DSP has been shown to be solvable in polynomial time [8, 33–35].

(1)  maximize |E′|/|V′|  subject to  G′(V′, E′) ⊆ G(V, E)

The average density of the extracted subgraph in DSP is associated with the subgraph size, and there is a tradeoff between density and size [36]. DSP may extract smaller subgraphs from sparser networks but larger subgraphs from denser networks. However, in real applications there is always an upper bound on the subgraph size, and one may constrain the size of dense subgraphs [36, 37]. If all subgraphs have the same (bounded) size, the problem becomes NP-hard [14, 33]; it has been investigated under various names, including the "k-cluster problem" [38–40], the "k-cardinality subgraph problem" [41], and the "densest k-subgraph problem" (DkSP) [42, 43]. This problem is formulated as (2), below. Some variants of DkSP have been proposed. If the extracted subgraph is required to be connected, the problem is referred to as the densest connected k-subgraph problem (DCkSP) [44]. In weighted networks, finding the subgraph with k nodes that has the highest sum of edge weights is called the "heaviest k-subgraph problem" (HkSP) [45]:

(2)  maximize |E′|  subject to  G′(V′, E′) ⊆ G(V, E),  |V′| = k
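For concreteness, the density measure above can be computed directly. The sketch below evaluates the average-degree density |E′|/|V′| of an induced subgraph on a small hypothetical network (not taken from the paper's figures):

```python
def density(edges, nodes):
    """Average-degree density |E'| / |V'| of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    induced = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return len(induced) / len(nodes)

# Hypothetical 6-node example network (illustrative only).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
print(density(edges, [1, 2, 3]))    # triangle: 3 edges / 3 nodes = 1.0
print(density(edges, range(1, 7)))  # whole graph: 7 edges / 6 nodes
```

DkSP then asks for the k-node set maximizing the number of induced edges; DSP leaves |V′| free.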

DkSP actually has important interpretations in social science. A social problem related to DkSP is called the “boundary specification problem” (BSP), which aims to find a list of samples that best represents the population [26]. When nodes are excluded from the system, the observed network structure differs from the actual one. Simulations have examined features of missing actors and have shown the detrimental impact of incomplete sampling [27, 28, 46, 47]. The similarity between the sample and the complete network declines as more nodes are excluded, and missing nodes substantially affect measures related to the complete network [28, 46, 47].

Solving DkSP can help to solve BSP, as illustrated in Fig 3. The input network consists of eight nodes, and we set the subgraph size k at 7. If we exclude node 5, the network has a ring structure, which is quite different from the original structure. If node 6 is omitted by accident or for convenience, the whole network becomes disconnected. This example illustrates how a minor change in network structure can have a dramatic effect on inference about network properties as a whole [48]. Only in special cases does the sampled network have a structure similar to the complete network [49]; the solution that excludes node 8 is such a case and is also the best solution of DkSP. If we exclude node 8, most of the edges are preserved because of the principle of largest density.

Fig 3. DkSP in solving the boundary specification problem.

Grey nodes represent the included nodes and white nodes represent the omitted nodes. The dashed lines represent unobserved ties. The left network is the input network, while the three networks on the right represent the solution made up of extracted subgraphs. The upper right network excludes node 5 and becomes a ring; the middle right network excludes node 6 and becomes unconnected; the lower right network excludes node 7, and is the best solution of DkSP.

https://doi.org/10.1371/journal.pone.0280506.g003

To solve DkSP, a number of studies have focused on the use of semidefinite programming; that is, the problem is transformed into a semidefinite programming problem at each node of a branch-and-bound tree [39, 40]. Some semidefinite programming relaxations have also been used to approximate DkSP [50, 51]. Other studies formulated DkSP as a problem of rank-constrained cardinality minimization and relaxed it using the nuclear norm [52, 53]. A series of heuristic algorithms has also been employed to solve the problem. Kincaid proposed a simulated annealing algorithm and a tabu search algorithm to solve the NP-hard DkSP [54]. Macambira employed a tabu search algorithm that was shown to outperform greedy search [55]. A variable neighborhood search heuristic proposed by Brimberg et al. was shown to be effective in solving the DkSP [56].

From a sociological view, given that inappropriate boundary specification can have a detrimental effect on estimating the structure of a real population, a number of sampling methods for network surveys have also been proposed. For example, randomly selecting individuals is a common sampling method in social science investigations [27, 45, 57]. Top-down sampling (choosing the top nodes ordered by size) has also been widely used and yields estimates of network properties that are highly consistent with those obtained from whole-network analysis [58, 59].

2.2. Covering problem with multiple graphs

Finding multiple densest subgraphs has recently been discussed [23–25, 60]. Balalau et al. focused on finding a set of m subgraphs that maximizes the total density, subject to an upper bound on the pairwise Jaccard coefficient between the subgraphs' node sets (denoted the "multiple-m densest subgraphs problem", MmDSP) [23]. Nasir et al. proposed a dynamic variant of this problem, in which a collection of m disjoint subgraphs is found in a sliding window [25]. An approach similar to MmDSP was proposed by Galbrun et al., whose objective function takes both the total density and the distance between the subgraphs into account [24]. Dondi et al. addressed the approximability and computational complexity of this problem [60]. An application of MmDSP to dual networks has also been studied [61]. In this paper, we study the multiple-m densest subgraphs problem (MmDSP) proposed by Balalau et al. [23]. MmDSP aims to find a collection of m subgraphs {Gi(Vi, Ei)} for which the sum of the average densities of the subgraphs is maximized [23]. Optimization of MmDSP can be formulated as problem (3) below, where a is the upper bound on the pairwise Jaccard coefficient.

(3)  maximize Σ(i=1..m) |Ei|/|Vi|  subject to  Gi(Vi, Ei) ⊆ G(V, E);  |Vi ∩ Vj|/|Vi ∪ Vj| ≤ a for all i ≠ j

MmDSP focuses on improving the density of each subgraph but ignores the covering of the input network by the extracted subgraphs. Although techniques such as the pairwise Jaccard coefficient or the distance between subgraphs have been invoked to avoid too much overlap between the extracted subgraphs [23, 24], the literature still lacks a focus on the network covering problem. The present study aims to find an optimal method for finding multiple subgraphs that best cover the input network, denoted MmCkSP. Three key elements are associated with the sampling process: the covering of the input network (C), the bound on the subgraph size (k), and the number of subgraphs (m). Given the size of each subgraph |Vi|, practitioners need to assemble the collected ties into a network that best covers the input network (C). The objective function of MmCkSP is then formulated as (4), below. When the number of subgraphs is 1, problem (4) reduces to problem (2).

(4)  maximize C(E1, E2, …, Em)  subject to  Gi(Vi, Ei) ⊆ G(V, E),  |Vi| ≤ k,  i = 1, …, m

Here we use the fraction of extracted edges to measure C, i.e., C(E1, E2, …, Em) = |E1 ∪ E2 ∪ … ∪ Em|/|E|, the share of the input network's edges that appear in at least one extracted subgraph. An illustration of the measurement of C is shown in Fig 4. Compared with problem (3), C subsumes the role of a (the parameter in problem (3) that prevents subgraphs from being too similar). Since the objective is to maximize the total covering of the input network, the extracted subgraphs are driven to differ, and thus a is not needed in problem (4).

Fig 4. Computing the objective function.

The left part is the input network which consists of five nodes and seven edges. The right part is a solution containing three subgraphs, where grey nodes represent the included nodes and white nodes represent the omitted nodes. Dashed lines represent unobserved ties, while bold solid lines represent the extracted ties. The sum of the average density of this solution is 8/3, while the covering of the input network is 6/7, because only six edges (excluding edges between nodes 3 and 4) have been extracted.

https://doi.org/10.1371/journal.pone.0280506.g004
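The coverage objective C can be sketched in code. The example below uses a hypothetical 5-node, 7-edge network (not the exact network of Fig 4) and computes the fraction of input edges appearing in at least one extracted subgraph:

```python
def coverage(edges, subgraph_node_sets):
    """C = |E1 ∪ E2 ∪ ... ∪ Em| / |E|: the fraction of the input network's
    edges induced by at least one of the extracted node sets."""
    full = {tuple(sorted(e)) for e in edges}
    covered = set()
    for nodes in subgraph_node_sets:
        s = set(nodes)
        covered |= {e for e in full if e[0] in s and e[1] in s}
    return len(covered) / len(full)

# Hypothetical 5-node, 7-edge network (illustrative only).
E = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
print(coverage(E, [{1, 2, 3}, {2, 3, 4}, {1, 2, 5}]))  # 5 of the 7 edges covered
```

Note that overlapping subgraphs contribute each covered edge only once, which is what makes an explicit overlap penalty unnecessary.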

The functional relationship between the three elements listed above is non-linear, as can be seen from the simulation of random sampling shown in Fig 5. We find that increasing the subgraph size is more helpful in promoting representativeness than increasing the number of subgraphs, because the gradient of C with respect to k is greater than the gradient of C with respect to m.

Fig 5. Relationship between the three key elements in the sampling process.

X-axis, Y-axis and Z-axis are, respectively, the subgraph size, the number of subgraphs and the covering of the input data. The simulation was performed on an ER random network with 100 nodes and 1,000 edges. The mean value of the results across 10 runs is presented in the figure.

https://doi.org/10.1371/journal.pone.0280506.g005
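A simulation in the spirit of Fig 5 can be sketched as follows; the graph model, random seed, and parameter grid here are illustrative choices rather than the paper's exact setup:

```python
import random
from itertools import combinations

def er_graph(n, m_edges, rng):
    """G(n, M)-style random graph: exactly m_edges distinct edges."""
    return rng.sample(list(combinations(range(n), 2)), m_edges)

def random_sampling_coverage(edges, n, m, k, rng):
    """Coverage achieved by m uniformly random node subsets of size k."""
    covered = set()
    for _ in range(m):
        s = set(rng.sample(range(n), k))
        covered |= {e for e in edges if e[0] in s and e[1] in s}
    return len(covered) / len(edges)

rng = random.Random(0)
E = er_graph(100, 1000, rng)  # 100 nodes, 1,000 edges, as in Fig 5
for k in (10, 20, 40):
    mean_c = sum(random_sampling_coverage(E, 100, 10, k, rng)
                 for _ in range(10)) / 10
    print(f"k = {k}: mean coverage over 10 runs = {mean_c:.3f}")
```

Each subset covers on the order of C(k, 2)/C(100, 2) of the edges in expectation, so coverage grows much faster in k than in m, consistent with the gradient comparison above.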

3. Algorithm

Given a fixed number of subgraphs (m), subgraph size (k) and an entire population of N nodes, the number of possible solutions is C(N, k)^m, i.e., an independent choice of k of the N nodes for each of the m subgraphs.

Traversing all these solutions cannot be done in polynomial time, and thus MmCkSP constitutes an NP-hard problem. Compared with the NP-hard MmDSP proposed by Balalau et al. [23], MmCkSP is more complicated because of the higher time cost of computing the covering in place of the average density, as well as the bound k on the subgraph size. In this section, we introduce a memetic algorithm, MA-MmCkSP, which combines a genetic algorithm with a heuristic local search to find multiple subgraphs that cover the input network. The memetic operation includes both long-distance and short-distance search and has proved effective in solving NP-hard problems [62, 63].
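As a rough sketch of why exhaustive search is hopeless, the following counts candidate solutions, assuming the count is C(N, k)^m, i.e., one independent choice of k distinct nodes per subgraph as in the chromosome encoding of section 3.2 (the original formula is unavailable in this copy):

```python
import math

def num_candidate_solutions(n, m, k):
    """C(N, k) ** m: an independent choice of k of the N nodes for each
    of the m subgraphs, matching the chromosome encoding of section 3.2."""
    return math.comb(n, k) ** m

# Even a modest instance is far beyond exhaustive enumeration.
print(num_candidate_solutions(100, 10, 10))
```

For N = 100, m = 10, k = 10 this exceeds 10^130, which motivates a heuristic search.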

3.1. Framework

The framework of MA-MmCkSP is shown in Algorithm 1. We first input the necessary parameters and the adjacency matrix of the input network. An initial population P is generated that consists of a list of solutions (coded as chromosomes), and then the process is repeated until the maximum number of iterations is reached or the coverage of the input network remains unchanged over 50 iterations. At each iteration, tournament selection is used to select a parent population Pparent with the highest representativeness. Next, a genetic operation on Pparent forms an offspring population Poffspring. The local-search function is then applied to find the local maximum solution for the offspring population, and an updating function constructs a new population P with better solutions. Once the loop terminates, the fittest solution is decoded and output.

Algorithm 1. Framework of MA-MmCkSP.

  1. Input: Population size (Sp), Tournament size (Stour), Mating pool size (Spool), Crossover probability (Pc), Mutation probability (Pm = 1 − Pc), Maximum number of iterations (Mi), Number of nodes (N), Number of subgraphs (m), Subgraph size (k), Adjacency matrix of the network (A).
  2. P ← Initialization (Sp, N, m, k);
  3. Repeat
  4. Pparent ← Selection (P, Stour, Spool);
  5. Poffspring ← Genetic Operation (Pparent, Pc, Pm, N, m, k);
  6. Poffspring ← Local Search (Poffspring, N, m, k);
  7. P ← Update (P, Pparent, Poffspring);
  8. Until Termination (Mi)
  9. Decode (P)
  10. Output: the best solution of the finding multiple subgraphs and its covering.

3.2. Representation and initialization

Each solution is encoded as a chromosome that consists of m substrings X = [X1, X2, …, Xm], where m is the number of subgraphs. Each substring represents the node set of a subgraph and is denoted by a list of genes x ∈ {1, 2, …, n} that specifies which nodes should be included. Fig 6 illustrates the representation for a subgraph size of 5 with the number of subgraphs set to 4, so the chromosome is formed as four substrings of five genes each. If we change the 5th gene of the first substring from 5 to 10, the new solution substitutes node 10 for node 5 in the first subgraph.

Fig 6. Illustration of the representation.

The upper two chains denote two chromosomes consisting of genes. The lower left part is an input network, which consists of ten nodes. The lower right part is two solutions of extracted multiple subgraphs corresponding to the two chromosomes, where grey nodes represent the included nodes and white nodes represent the omitted nodes. The dashed line represents unobserved ties, while the bold solid lines represent the extracted ties.

https://doi.org/10.1371/journal.pone.0280506.g006

For the initialization, we generate a population and randomly select the nodes for each substring in every chromosome.
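The representation and random initialization can be sketched as follows (function and parameter names are illustrative, not the paper's MATLAB code):

```python
import random

def init_population(pop_size, n_nodes, m, k, rng):
    """Each chromosome is m substrings, each a list of k distinct
    node ids drawn uniformly from {1, ..., n_nodes}."""
    return [[rng.sample(range(1, n_nodes + 1), k) for _ in range(m)]
            for _ in range(pop_size)]

rng = random.Random(42)
population = init_population(pop_size=4, n_nodes=10, m=3, k=5, rng=rng)
print(population[0])  # one chromosome: 3 substrings of 5 distinct node ids
```

Sampling without replacement within each substring keeps the k genes of a subgraph distinct, while different substrings may overlap, matching the encoding described above.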

3.3. Genetic operation

The genetic operation includes both crossover and mutation, the primary operations in the genetic algorithm. The algorithm performs the crossover procedure with probability Pc and the mutation procedure with probability Pm = 1 − Pc. To some extent, crossover represents long-term search, while mutation represents short-term search. An appropriate setting of Pc (and hence Pm) therefore balances long-term and short-term search, which helps to increase the efficiency of the genetic algorithm [64, 65].

In the crossover operation, two parental chromosomes are chosen using tournament selection. We first disorganize the order of the substrings for each chromosome to maintain diversity, and then find the genes that differ between the chromosomes in each substring. Given each pair of different genes, we generate a random number γ; if γ< 0.5, the gene remains unchanged; and if γ ≥ 0.5, the corresponding genes are swapped between the two chromosomes. Finally, we add the common genes and form the two offspring chromosomes. The crossover operation is illustrated in Fig 7. After changing the substring disorder, substring 3 in parent 1 and substring 2 in parent 2 are reassigned to the first substring. The genes that differ between parent 1 and parent 2 are grey. Since the generated random numbers are 0.3, 0.6, 0.9, and 0.4, respectively, for the first substring, we swap the second and third different genes between the two parental chromosomes because the corresponding γ ≥ 0.5.

Fig 7. The crossover operation.

Grey elements represent genes that differ between the two parental chromosomes. The substrings of two parent chromosomes are first disorganized, and we check all elements that differ between the two parental chromosomes. If the random number γ < 0.5, the element remains unchanged in the offspring chromosomes, while if γ ≥ 0.5, the corresponding elements are swapped.

https://doi.org/10.1371/journal.pone.0280506.g007
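A sketch of the crossover operation described above, under the assumption that differing genes are paired in the order they appear within each substring (the text does not specify the pairing):

```python
import random

def crossover(p1, p2, rng):
    """Shuffle substring order, keep genes common to both aligned substrings,
    and swap each pair of differing genes with probability 0.5."""
    c1 = [list(s) for s in rng.sample(p1, len(p1))]
    c2 = [list(s) for s in rng.sample(p2, len(p2))]
    for s1, s2 in zip(c1, c2):
        common = set(s1) & set(s2)
        d1 = [g for g in s1 if g not in common]
        d2 = [g for g in s2 if g not in common]
        for i in range(min(len(d1), len(d2))):
            if rng.random() >= 0.5:       # swap this pair of differing genes
                d1[i], d2[i] = d2[i], d1[i]
        s1[:] = sorted(common) + d1       # common genes are always inherited
        s2[:] = sorted(common) + d2
    return c1, c2

rng = random.Random(7)
parent1 = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]
parent2 = [[1, 3, 5, 7, 9], [6, 7, 8, 9, 10]]
child1, child2 = crossover(parent1, parent2, rng)
print(child1, child2)
```

Because a gene that differs in one substring cannot appear anywhere in the other substring of the pair, swapping never creates duplicate genes within a substring.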

In the mutation operation, we randomly select an element xi in each substring and replace it with a randomly chosen node number that does not already appear in that substring.
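A minimal sketch of this mutation step (names illustrative):

```python
import random

def mutate(chrom, n_nodes, rng):
    """In each substring, replace one randomly chosen gene with a node id
    not already present in that substring."""
    out = []
    for sub in chrom:
        sub = list(sub)
        i = rng.randrange(len(sub))
        candidates = [v for v in range(1, n_nodes + 1) if v not in sub]
        if candidates:                    # a substring may already use all nodes
            sub[i] = rng.choice(candidates)
        out.append(sub)
    return out

rng = random.Random(3)
print(mutate([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], n_nodes=10, rng=rng))
```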

3.4. Local search

Local search is effective in reducing inefficient exploration and not only improves the accuracy but also speeds up the convergence [64–66]. Here we employ a hill-climbing technique, presented as Algorithm 2. We check each element in a chromosome and replace the original gene with a node number that increases the objective function (the coverage of the input network) on substitution. The chromosome can then reach a local optimum.

Algorithm 2. Local Search in MA-MmCkSP.

  1. Input: The best offspring chromosome (Coffspring), number of nodes (N), number of subgraphs (m) and subgraph size (k);
  2. For i = 1; i ≤ k × m; i++
  3.  Cnew = Coffspring
  4.  For j = 1; j ≤ N; j++
  5.   Cnew(i) = j;
  6.   If Obj(Cnew) > Obj(Coffspring)
  7.    Coffspring(i) = j;
  8.   End If
  9.  End For
  10. End For
  11. Output: Coffspring
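A runnable sketch of this hill climbing, using edge coverage as the objective Obj; here genes are additionally kept distinct within a substring, an assumption not stated explicitly in the pseudocode:

```python
def edge_cover_count(edges, chrom):
    """Objective: number of input edges covered by the chromosome's subgraphs."""
    covered = set()
    for sub in chrom:
        s = set(sub)
        covered |= {e for e in edges if e[0] in s and e[1] in s}
    return len(covered)

def local_search(chrom, n_nodes, edges):
    """Hill climbing in the spirit of Algorithm 2: for each gene position,
    keep the replacement node id that most improves the coverage objective."""
    chrom = [list(s) for s in chrom]
    for sub in chrom:
        for i in range(len(sub)):
            original = sub[i]
            best, best_obj = original, edge_cover_count(edges, chrom)
            for j in range(1, n_nodes + 1):
                if j in sub:              # keep genes distinct within a substring
                    continue
                sub[i] = j
                obj = edge_cover_count(edges, chrom)
                if obj > best_obj:
                    best, best_obj = j, obj
                sub[i] = original
            sub[i] = best
    return chrom

# Hypothetical toy network: a triangle plus a pendant node.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
print(local_search([[2, 4]], n_nodes=4, edges=edges))
```

Starting from the non-covering pair {2, 4}, the search climbs to a pair that induces an edge; the objective is non-decreasing at every accepted move.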

3.5. Complexity analysis

Given a network with N nodes, number of subgraphs m and subgraph size k, the time complexity of MA-MmCkSP is analyzed as follows. At each iteration, we need to execute the crossover operation Spool/2 times (where Spool is the size of the mating pool) and the mutation operation at most Spool times. Since computing the covering costs O(mk), the time complexity of the genetic operation is O(mkSpool). In the local search procedure, finding the best neighbor for each gene needs O(Nmk), and thus finding the locally optimal chromosome costs O(Nm²k²). Since O(mkSpool) < O(Nm²k²), the total time complexity of the proposed algorithm is O(Nm²k²).

4. Results

In this section, we show the effectiveness and efficiency of MA-MmCkSP on a computer-generated random network. We also run the procedure on various real-world networks and interpret the optimal method in the social context. The experiments were carried out on a computer with a 2.11 GHz CPU and 16 GB of memory running Windows 10, with the procedures implemented in MATLAB. Table 1 shows the parameter values used in the experiments, which gave the best performance for the proposed algorithms.

4.1. Results for computer-generated networks

To assess the effectiveness of MA-MmCkSP, we compare it with random extraction (RE), a big-degree sampling method in which high-degree nodes have a higher probability of being extracted (BD-MmCkSP), a greedy algorithm based on the local-search operation (GR-MmCkSP), and the genetic algorithm without local search (GA-MmCkSP). The five methods were run on an ER random network consisting of 100 nodes and 1,000 edges. Fig 8A and 8B present the maximum and mean values over 10 runs, comparing the covering of the input network for different settings of subgraph size and number of subgraphs. The figures show that MA-MmCkSP performs best, GR-MmCkSP second best, and GA-MmCkSP third, while BD-MmCkSP and RE perform worst. From Fig 8A we see that all of the network's ties can be collected using only 10 subgraphs once the subgraph size reaches 37, which is much smaller than the theoretical maximum. We conclude that MA-MmCkSP is effective in solving the optimization problem of sampling in multiple social surveys. The slopes of the curves in Fig 8A are much steeper than those in Fig 8B, which suggests that the subgraph size matters more than the number of subgraphs.

Fig 8. Covering the input random network with RE, BD-MmCkSP, GR-MmCkSP, GA-MmCkSP and MA-MmCkSP.

(a) shows the result for different subgraph sizes given the number of subgraphs m = 10; (b) shows the result for different numbers of subgraphs given the subgraph size k = 10. The solid line with stars represents the maximum value of the ten-runs experiment, while the dashed-dotted line with the circles represents the mean value.

https://doi.org/10.1371/journal.pone.0280506.g008

We computed the average density of the extracted solutions corresponding to Fig 8. Fig 9A and 9B present the mean values of the average density over 10 runs for different settings of subgraph size and number of subgraphs. The figures show that the subgraphs extracted by MA-MmCkSP are densest, those from GR-MmCkSP second densest, and those from GA-MmCkSP third densest, while the subgraphs from BD-MmCkSP and RE are sparsest. The results suggest that the optimal extracted subgraphs tend to be denser, so the proposed algorithm can provide a new alternative for solving the multiple-m densest subgraphs problem (MmDSP). In addition, we find that the density of the subgraphs increases as the subgraph size increases but decreases as the number of subgraphs increases. There is a tradeoff among the subgraph density, the subgraph size and the number of subgraphs.

Fig 9. Average density of the extracted subgraphs for RE, BD-MmCkSP, GR-MmCkSP, GA-MmCkSP and MA-MmCkSP.

(a) shows the result for different subgraph sizes given the number of subgraphs m = 10; (b) shows the result for different numbers of subgraphs given the subgraph size k = 10.

https://doi.org/10.1371/journal.pone.0280506.g009

We also compared the results obtained using MA-MmCkSP with those of GA-MmCkSP at each iteration, and we see that the memetic operation is more efficient. Fig 10 shows the results for the two methods with different settings of subgraph size and number. MA-MmCkSP performs much better and converges faster than GA-MmCkSP. The difference is especially apparent in Fig 10D, where MA-MmCkSP reaches a covering of 100% at the first iteration, while GA-MmCkSP converges only after the 80th iteration and even then does not reach 100%.

Fig 10. Covering of the input random network with GA-MmCkSP and MA-MmCkSP for each iteration.

Figures (a)-(d) are, respectively, the results for the settings k = 10 and m = 10, k = 10 and m = 30, k = 30 and m = 10, k = 30 and m = 30. The solid line with the stars represents the maximum value of the ten-runs experiment, while the dashed-dotted line with the circles represents the mean value. Grey and black, respectively, represent the results of GA-MmCkSP and MA-MmCkSP.

https://doi.org/10.1371/journal.pone.0280506.g010

To characterize the extracted nodes, we compute the correlation between the number of times each node is selected by MA-MmCkSP and the node's network centrality, as shown in Fig 11. Comparing Figs 8A and 11A, when the network cannot be completely collected (i.e., the subgraph size is smaller than 37), the probability of a node being selected is highly correlated with its centrality. The correlation decreases dramatically once the boundary size surpasses this critical value. Fig 11B shows a similar result, in that central nodes are more likely to be included repeatedly. The results suggest that including central nodes helps to achieve network covering.

Fig 11. Correlation between the number of times each node is selected and the network centrality, corresponding to Fig 8.

(a) shows the result for different subgraph sizes given the number of subgraphs m = 10; (b) shows the result for different number of subgraphs given the subgraph size k = 10. The different curves represent the correlations with degree centrality, betweenness centrality and closeness centrality.

https://doi.org/10.1371/journal.pone.0280506.g011

We conducted a sensitivity analysis of the proposed algorithm on a network with 1,000 nodes and 10,000 edges. The results show that MA-MmCkSP still performs best in maximizing the coverage of this larger network, as shown in Fig 12. We also test the performance of MA-MmCkSP on networks with different average densities and find that, for a fixed number of subgraphs and subgraph size, the extracted subgraphs cover less of the network as the density of the input network increases, as shown in Fig 13. A larger subgraph size is required if we aim to investigate a denser social network.

Fig 12. Covering of the 1000-node input random network acquired by RE, BD-MmCkSP, GR-MmCkSP, GA-MmCkSP and MA-MmCkSP for different subgraph sizes given the number of subgraphs m = 10.

https://doi.org/10.1371/journal.pone.0280506.g012

Fig 13. Covering of the input random 100-node network with different average densities given the number of subgraphs m = 10 and the subgraph size k = 10, 20 and 30 acquired by MA-MmCkSP.

https://doi.org/10.1371/journal.pone.0280506.g013

4.2. Results for real-world networks

In this section, we test RE, BD-MmCkSP, GR-MmCkSP, GA-MmCkSP and MA-MmCkSP on six real-world networks: Zachary’s Karate Club network, the Bottlenose Dolphins network, the American College Football network, and three migrant workers’ networks from the ADS, YDSC and WH companies in Shenzhen, China.

Zachary’s Karate Club network consists of 34 karate-club members and 78 social ties observed by Zachary over two years [67]. The Bottlenose Dolphins network was constructed by Lusseau [68], who observed 62 bottlenose dolphins and their 159 connections over seven years. The American College Football network was constructed from the schedule of Division I games during the year 2000 football season; it consists of 115 nodes that represent teams and 616 edges that represent the regular-season games between the teams they connect [7]. The last three examples are networks of migrant workers in the ADS, YDSC, and WH companies investigated by the New Urbanization and Sustainable Development Group of Xi’an Jiaotong University [65]. These networks were constructed from a single-item question that asked each participant to enumerate the individuals with whom they are often in contact at work. The ADS network consists of 165 nodes and 1196 edges, the YDSC network of 70 nodes and 272 edges, and the WH network of 193 nodes and 887 edges. The survey involved both network-level and individual-level investigations.

For each network, the number of subgraphs is m = 5, 10, or 30, and the subgraph size is chosen from k = 0.1N, 0.2N, 0.3N, 0.4N, or 0.5N, where N is the network size. Table 2 shows the mean and maximum values of the covering over 10 runs produced by RE, BD-MmCkSP, GR-MmCkSP, GA-MmCkSP and MA-MmCkSP with different values of m and k. We find that MA-MmCkSP performs much better than the other algorithms. Moreover, the subgraph size k plays a much more important role in the multiple extractions: a small increase in k can produce a large improvement in covering. Even random extraction is able to cover all the edges when k reaches 0.4N.
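For reference, the covering C reported throughout is the fraction of the input network's edges whose endpoints both fall inside at least one extracted subgraph. A minimal sketch of this measure, assuming networkx (the two size-10 subgraphs are illustrative, not optimized):

```python
import networkx as nx

def covering(G, subgraphs):
    """Fraction of G's edges captured by at least one subgraph.

    An edge (u, v) counts as covered when some subgraph contains
    both u and v, i.e. the edge appears in that induced subgraph.
    """
    covered = set()
    for S in subgraphs:
        S = set(S)
        covered.update((u, v) for u, v in G.edges if u in S and v in S)
    return len(covered) / G.number_of_edges()

G = nx.karate_club_graph()
# Two illustrative (not optimized) subgraphs of size 10 each.
C = covering(G, [range(0, 10), range(24, 34)])
print(f"C = {C:.3f}")
```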

Table 2. Mean and maximum values of the covering for the real-world networks.

https://doi.org/10.1371/journal.pone.0280506.t002

By decoding the best chromosomes generated by the proposed algorithm, we can extract the specific sampling solution for each subgraph. Fig 14 presents one of the best extraction methods for Zachary’s Karate Club network with k = 0.3N ≈ 10 and m = 5. This solution collects all the edges, i.e., the covering C = 100%. In Zachary’s Karate Club network, nodes 1, 2, 3, 33 and 34 are the key individuals with the highest centrality, and we find that at least two central nodes are needed to include as many edges as possible. However, no solution includes all five central nodes within the same subgraph. This is because a pair of central nodes may be disconnected, in which case including both of them contributes no edge between them. For example, investigating only nodes 1, 3 and 34 cannot collect any edges, although these nodes occupy important positions in the network. This suggests that including central nodes is important, but extracting only the central nodes may not yield the best coverage.
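Which edges a candidate subgraph actually collects can be checked by inducing it on the network. A small sketch with networkx, whose built-in karate_club_graph labels the members 0 to 33 rather than 1 to 34:

```python
import networkx as nx

G = nx.karate_club_graph()  # networkx labels members 0..33 (paper uses 1..34)

def collected_edges(G, nodes):
    """Edges of G whose endpoints are both in the chosen node set."""
    return list(G.subgraph(nodes).edges)

# A subgraph built only from the high-centrality nodes still misses
# most of the network's 78 edges.
central = [0, 1, 2, 32, 33]  # members 1, 2, 3, 33, 34 in the paper
print(collected_edges(G, central))
```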

Fig 14. The sampling solution of Zachary’s Karate Club network with k = 0.3N and m = 5.

The upper part is the topology of Zachary’s Karate Club network. The lower part is the solution of extracting multiple subgraphs. In the solution part, grey nodes represent included nodes and white nodes represent omitted nodes. The dashed lines represent the unobserved ties, while the bold solid lines represent the extracted ties.

https://doi.org/10.1371/journal.pone.0280506.g014

We find that the optimal method is associated with the community structure. Zachary’s Karate Club network is a typical network with characteristic community structure [67]: it can be naturally divided into two communities, where edges are denser within the same community and sparser between different communities. Fig 15 presents the optimized solution of Zachary’s Karate Club network with k = 0.2N ≈ 7 and m = 5. This solution cannot collect all the edges (C = 0.769) because of the limited subgraph size. Most of the extracted nodes from the same community are placed in a single subgraph, which suggests that collecting nodes within the same community in each subgraph helps to collect as many edges as possible; we call this the “community collecting method” (CCM). However, edges between different communities can be hard to detect using CCM. Therefore, CCM is appropriate when the subgraph size or number is so limited that the optimized solution cannot collect all the edges (in other words, C < 1). Another limitation of CCM is that it may not work effectively on networks without community structure (modularity Q < 0.3) [7].
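Whether an input network is modular enough for CCM (Q ≥ 0.3) can be checked before extraction. The following is a sketch using networkx's community utilities; the greedy modularity partition is an assumption here, and any community detection method could be substituted:

```python
import networkx as nx
from networkx.algorithms.community import (
    greedy_modularity_communities,
    modularity,
)

# Detect communities, then score the partition's modularity Q.
G = nx.karate_club_graph()
parts = greedy_modularity_communities(G)
Q = modularity(G, parts)
print(f"Q = {Q:.3f}, communities = {len(parts)}")

# Heuristic from the text: CCM is expected to help when Q >= 0.3.
ccm_applicable = Q >= 0.3
```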

Fig 15. The sampling solution of Zachary’s Karate Club network with k = 0.2N and m = 5.

The club members are divided into two communities. Grey nodes represent included nodes, and white nodes represent omitted nodes. The dashed lines represent the unobserved ties, while the bold solid lines represent the extracted ties.

https://doi.org/10.1371/journal.pone.0280506.g015

In order to test the performance of CCM, we ran MA-MmCkSP on the benchmark networks proposed by Lancichinetti et al. [69]. Each network consists of 128 nodes with an average degree of 16, and the nodes are evenly assigned one of four clustering attributes {1, 2, 3, 4}. We introduce a mixing parameter that denotes the fraction of a node’s edges linking to nodes with different clustering attributes; a higher mixing parameter corresponds to a smaller modularity of the input network. We generated nine networks with mixing parameter values ranging from 0 to 0.5. Fig 16 shows the covering results for different mixing parameters given the number of subgraphs m = 10 and the subgraph sizes k = 10, 20 and 30. We find that the extracted subgraphs cover less of the network as the mixing parameter increases. This is because the optimal solution is based on CCM, and CCM performs less efficiently as the community structure of the input network weakens.
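Networks of this kind can be approximated with a planted-partition generator, deriving the within- and between-group edge probabilities from the mixing parameter so that the expected degree stays at 16. This is a sketch assuming networkx's planted_partition_graph, not the generator used in the original experiments:

```python
import networkx as nx

def benchmark(mu, n=128, groups=4, avg_deg=16, seed=0):
    """Planted-partition stand-in for the 128-node benchmark.

    mu is the mixing parameter: the expected fraction of a node's
    edges that leave its own group. Edge probabilities are derived
    so that the expected degree stays at avg_deg.
    """
    size = n // groups           # 32 nodes per group
    k_out = mu * avg_deg         # expected external degree
    k_in = avg_deg - k_out       # expected internal degree
    p_in = k_in / (size - 1)
    p_out = k_out / (n - size)
    return nx.planted_partition_graph(groups, size, p_in, p_out, seed=seed)

G = benchmark(mu=0.2)
print(G.number_of_nodes(), 2 * G.number_of_edges() / G.number_of_nodes())
```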

Fig 16. Covering of the benchmark networks for different mixing parameters given the number of subgraphs m = 10 and the subgraph size k = 10, 20 and 30 acquired by MA-MmCkSP.

https://doi.org/10.1371/journal.pone.0280506.g016

5. Conclusion and discussion

The present study provides a new perspective on the multiple densest subgraph problem. We advance research on this topic by formulating the covering of an input network as an optimization problem and proposing a model that maximizes the covering of the observed network through the extraction of multiple subgraphs. A memetic algorithm that combines a genetic algorithm with local search optimizes the extraction of each independent subgraph.

The proposed algorithm solves the optimization problem effectively. Compared to increasing the number of extractions, increasing the subgraph size is more helpful in improving the coverage of the network. Including nodes with higher centrality is necessary, but investigating only those nodes cannot fully reproduce the input network structure, because the many edges connecting ordinary members (nodes with lower centrality) are easily ignored. When the subgraph size or number is constrained, the community collecting method, which includes nodes within the same community in each subgraph, can effectively enhance the covering. A practical suggestion is to identify the potential community structure of the research objects before conducting the extractions.

From a sociological perspective, previous research has highlighted the effectiveness of random sampling [27, 45, 57], but this method is not effective when surveys are conducted repeatedly, because random sampling across multiple surveys leads to redundancy: an edge may be detected many times. The top-down sampling method (choosing the top nodes ordered by size) is also of limited value in repeated surveys, because edges connecting nodes of different rank sizes cannot be collected. Including central nodes helps to enhance the covering, but including only the representative nodes may not lead to a representative result. Moreover, node size is difficult to estimate precisely in social networks: before acquiring the whole structure of a network, it is difficult to judge whether an individual is a central or marginal member. An illustration is presented in Fig 17, which shows how different methods differ in recognizing core nodes. The network in Fig 17 is the ADS migrant workers’ network. By asking “How many friends or acquaintances do you have in Shenzhen (the city where ADS is located)?” in the individual-level questionnaire, we can divide the company members into “big-size” individuals, who have 30 or more friends or acquaintances, and “small-size” individuals, who have fewer than 30 (see Fig 17A). By applying the core-periphery model [5] to the whole network, we can also identify big-size and small-size individuals, as shown in Fig 17B. This classification is derived using Eq (5), where αij is the relationship between nodes i and j, ci is node i’s attribute (core or periphery), and “●” indicates a missing value; treating the off-diagonal regions of αij as missing data helps maximize density in the core and minimize density in the periphery. The inconsistency between (a) and (b) suggests that top-down sampling may choose some fake big-size nodes, which undermines the accuracy of network estimation.

αij = { 1, if ci = CORE and cj = CORE; ●, if ci ≠ cj; 0, if ci = cj = PERIPHERY }  (5)
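A minimal sketch of the discrete core/periphery fit behind Eq (5), assuming networkx: a candidate partition is scored against the ideal pattern, with core-periphery pairs (the "●" entries) excluded as missing data. The hub list below is illustrative, not a fitted partition:

```python
import networkx as nx

def cp_fit(G, core):
    """Score a core/periphery assignment against the ideal pattern:
    core-core pairs should be tied (1) and periphery-periphery pairs
    untied (0); core-periphery pairs are treated as missing (the
    '●' entries) and do not enter the score."""
    core = set(core)
    hits = total = 0
    nodes = list(G.nodes)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if u in core and v in core:
                total += 1
                hits += G.has_edge(u, v)       # want an edge here
            elif u not in core and v not in core:
                total += 1
                hits += not G.has_edge(u, v)   # want no edge here
    return hits / total

G = nx.karate_club_graph()
hubs = [0, 1, 2, 32, 33]  # high-degree members as a trial core
print(f"fit = {cp_fit(G, hubs):.3f}")
```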
Fig 17. Identification of central and marginal members of the ADS migrant workers’ network using different methods.

(a) uses the number of friends or acquaintances in the individual-level questionnaire; (b) uses the core-periphery model. The dark nodes are central members, while the white nodes are marginal members.

https://doi.org/10.1371/journal.pone.0280506.g017

In most natural settings, practitioners have no prior knowledge of the structure of the actual network. In order to collect all the potential edges, practitioners should assume that the actual network is completely connected, i.e., that every pair of nodes is linked; the proposed algorithm can also be applied to covering a completely connected network. Despite the merits of these new proposals, the present study has some limitations. The algorithm may sometimes be trapped in a local maximum, and we plan to design a more intelligent algorithm in the future. The objective function in this paper (the coverage of the input network) is the number of detected edges divided by the total number of edges, but other indices, such as centrality, might also be employed in the optimization model. A meaningful analysis of social networks requires both individual-level and network-level investigations, so an index for measuring the covering of multiple subgraphs that considers both individual and relational attributes remains to be designed.

References

  1. Marsden PV. Network data and measurement. Annu. Rev. Sociol. 1990; 16(1): 435–463.
  2. Kumar R, Raghavan P, Rajagopalan S, Tomkins A. Trawling the web for emerging cyber-communities. Comput. Netw. 1999; 31(11–16): 1481–1493.
  3. Alba RD. A graph-theoretic definition of a sociometric clique. J. Math. Sociol. 1973; 3: 113–126.
  4. Wasserman S, Faust K. Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press; 1994.
  5. Borgatti SP, Everett MG. Models of core/periphery structures. Soc. Networks 2000; 21(4): 375–395.
  6. Barabási AL, Albert R. Emergence of scaling in random networks. Science 1999; 286(5439): 509–512. pmid:10521342
  7. Girvan M, Newman ME. Community structure in social and biological networks. P. Natl. Acad. Sci. USA 2002; 99(12): 7821–7826. pmid:12060727
  8. Goldberg AV. Finding a maximum density subgraph. Berkeley, University of California; 1984.
  9. Langston MA, Lin L, Peng X, Baldwin NE, Symons CT, Zhang B, et al. A combinatorial approach to the analysis of differential gene expression data. Methods of Microarray Data Analysis, Springer; 2005.
  10. Fratkin E, Naughton BT, Brutlag DL, Batzoglou S. MotifCut: regulatory motifs finding with maximum density subgraphs. Bioinform. 2006; 22(14): 150–157. pmid:16873465
  11. Du X, Jin R, Ding L, Lee VE, Thornton JH. Migration motif: a spatial-temporal pattern mining approach for financial markets. P. 15th ACM SIGKDD Int. Conf. Data. Min. Knowl. Disc. 2009; 1135–1144.
  12. Tang L, Liu H. Graph mining applications to social network analysis. Managing and Mining Graph Data, Springer; 2010.
  13. Lee VE, Ruan N, Jin R, Aggarwal C. A survey of algorithms for dense subgraph discovery. Managing and Mining Graph Data, Springer; 2010.
  14. Feige U, Peleg D, Kortsarz G. The dense k-subgraph problem. Algorithmica 2001; 29(3): 410–421.
  15. Tsourakakis C, Bonchi F, Gionis A, Gullo F, Tsiarli M. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. P. 19th ACM SIGKDD Int. Conf. Knowl. Disc. Data. Min. 2013; 104–112.
  16. Gibson D, Kumar R, Tomkins A. Discovering large dense subgraphs in massive graphs. P. 31st Int. Conf. VLDB 2005; 721–732.
  17. Henzinger MR, Motwani R, Silverstein C. Challenges in web search engines. ACM SIGIR Forum 2002; 36(2): 11–22.
  18. Bonchi F, Gullo F, Kaltenbrunner A, Volkovich Y. Core decomposition of uncertain graphs. P. 20th ACM SIGKDD Int. Conf. Knowl. Disc. Data. Min. 2014; 1316–1325.
  19. Sozio M, Gionis A. The community-search problem and how to plan a successful cocktail party. P. 16th ACM SIGKDD Int. Conf. Knowl. Disc. Data. Min. 2010; 939–948.
  20. Tsourakakis C. The k-clique densest subgraph problem. P. 24th Int. Conf. World Wide Web 2015; 1122–1132.
  21. Angel A, Koudas N, Sarkas N, Srivastava D, Svendsen M, Tirthapura S. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. VLDB J. 2014; 23(2): 175–199.
  22. Gajewar A, Das Sarma A. Multi-skill collaborative teams based on densest subgraph. P. SIAM Int. Conf. Data Min. 2012; 165–176.
  23. Balalau OD, Bonchi F, Chan TH, Gullo F, Sozio M. Finding subgraphs with maximum total density and limited overlap. P. 8th ACM Int. Conf. Web Search Data Min. 2015; 379–388.
  24. Galbrun E, Gionis A, Tatti N. Top-k overlapping densest subgraphs. Data. Min. Knowl. Disc. 2016; 30(5): 1134–1165.
  25. Nasir MAU, Gionis A, Morales GDF, Girdzijauskas S. Fully dynamic algorithm for top-k densest subgraphs. P. ACM Conf. Inform. Knowl. Manage. 2017; 1817–1826.
  26. Laumann EO, Marsden PV, Prensky D. The boundary specification problem in network analysis. Res. Methods Soc. Netw. Anal. 1989; (61).
  27. Borgatti SP, Carley KM, Krackhardt D. On the robustness of centrality measures under conditions of imperfect data. Soc. Networks 2006; 28: 124–136.
  28. Kossinets G. Effects of missing data in social networks. Soc. Networks 2006; 28(3): 247–268.
  29. Kempe D, Kleinberg J, Tardos É. Maximizing the spread of influence through a social network. P. 9th Int. Conf. ACM SIGKDD 2003; 137–146.
  30. Lawrence E, Michailidis G, Nair VN, Xi B. Network tomography: A review and recent developments. Front. Stat. 2006; 345–366.
  31. Cheng Z, Xin Y, Cao J, Yu X, Lu G. Selecting pinning nodes to control complex networked systems. Sci. China Technol. Sci. 2018; 61(10): 1537–1545.
  32. McCarty C, Killworth PD, Rennell J. Impact of methods for reducing respondent burden on personal network structural measures. Soc. Networks 2007; 29(2): 300–315.
  33. Asahiro Y, Hassin R, Iwama K. Complexity of finding dense subgraphs. Discrete Appl. Math. 2002; 121(1–3): 15–26.
  34. Charikar M. Greedy approximation algorithms for finding dense components in a graph. Int. Workshop Approx. Algorithms Comb. Optim. 2000; 84–95.
  35. Kawase Y, Miyauchi A. The densest subgraph problem with a convex/concave size function. Algorithmica 2018; 80(12): 3461–3480.
  36. Wang Z, Chu L, Pei J, Al-Barakati A, Chen E. Tradeoffs between density and size in extracting dense subgraphs: A unified framework. IEEE/ACM Int. Conf. Adv. Soc. Networks Anal. Min. 2016; 41–48.
  37. Dunbar RI. Neocortex size as a constraint on group size in primates. J. Hum. Evol. 1992; 22(6): 469–493.
  38. Corneil DG, Perl Y. Clustering and domination in perfect graphs. Discrete Appl. Math. 1984; 9(1): 27–39.
  39. Malick J, Roupin F. Solving k-cluster problems to optimality with semidefinite programming. Math. Program. 2012; 136(2): 279–300.
  40. Krislock N, Malick J, Roupin F. Computational results of a semidefinite branch-and-bound algorithm for k-cluster. Comput. Oper. Res. 2016; 66: 153–159.
  41. Bruglieri M, Ehrgott M, Hamacher HW, Maffiolia F. An annotated bibliography of combinatorial optimization problems with fixed cardinality constraints. Discrete Appl. Math. 2006; 154(9): 1344–1357.
  42. Dondi R, Hermelin D. Computing the k densest subgraphs of a graph. arXiv preprint arXiv:2002.07695. 2020.
  43. Sotirov R. On solving the densest k-subgraph problem on large graphs. Optim. Method. Softw. 2020; 35(6): 1160–1178.
  44. Chen X, Hu X, Wang C. Finding connected k-subgraphs with high density. Inform. Comput. 2017; 256: 160–173.
  45. Letsios M, Balalau OD, Danisch M, Orsini E, Sozio M. Finding heaviest k-subgraphs and events in social media. IEEE 16th ICDMW 2016; 113–120.
  46. Costenbader E, Valente TW. The stability of centrality measures when networks are sampled. Soc. Networks 2003; 25(4): 283–307.
  47. Smith JA, Moody J. Structural effects of network sampling coverage I: Nodes missing at random. Soc. Networks 2013; 35(4): 652–668.
  48. Krebs VE. Mapping networks of terrorist cells. Connect. 2002; 24(3): 43–52.
  49. Stumpf MP, Wiuf C, May RM. Subnets of scale-free networks are not scale-free: sampling properties of networks. P. Natl. Acad. Sci. USA 2005; 102(12): 4221–4224.
  50. Ye Y, Zhang J. Approximation of dense-n/2-subgraph and the complement of min-bisection. J. Global Optim. 2003; 25(1): 55–73.
  51. Rendl F. Semidefinite relaxations for partitioning, assignment and ordering problems. Ann. Oper. Res. 2016; 240(1): 119–140.
  52. Ames BP. Guaranteed recovery of planted cliques and dense subgraphs by convex relaxation. J. Optimiz. Theory App. 2015; 167(2): 653–675.
  53. Li X, Chen Y, Xu J. Convex relaxation methods for community detection. Stat. Sci. 2021; 36(1): 2–15.
  54. Kincaid RK. Good solutions to discrete noxious location problems via metaheuristics. Ann. Oper. Res. 1992; 40(1): 265–281.
  55. Macambira EM. An application of tabu search heuristic for the maximum edge-weighted subgraph problem. Ann. Oper. Res. 2002; 117(1): 175–190.
  56. Brimberg J, Mladenović N, Urošević D, Ngai E. Variable neighborhood search for the heaviest k-subgraph. Comput. Oper. Res. 2009; 36(11): 2885–2891.
  57. Galaskiewicz J. Estimating point centrality using different network sampling techniques. Soc. Networks 1991; 13(4): 347–386.
  58. Alderson AS, Beckfield J, Sprague-Jones J. Intercity relations and globalisation: the evolution of the global urban hierarchy, 1981–2007. Urban Stud. 2010; 47(9): 1899–1923.
  59. Pažitka V, Wójcik D. The network boundary specification problem in the global and world city research: investigation of the reliability of empirical results from sampled networks. J. Geogr. Syst. 2021; 23(1): 97–114.
  60. Dondi R, Hosseinzadeh MM, Mauri G, Zoppis I. Top-k overlapping densest subgraphs: approximation algorithms and computational complexity. J. Comb. Optim. 2021; 41(1): 80–104.
  61. Dondi R, Guzzi PH, Hosseinzadeh MM. Top-k connected overlapping densest subgraphs in dual networks. Int. Conf. Complex Netw. Appl. 2020; 585–596.
  62. Ong YS, Lim MH, Chen X. Memetic Computation: Past, Present & Future Research Frontier. IEEE Comput. Intell. M. 2010; 5(2): 24.
  63. Neri F, Cotta C. Memetic algorithms and memetic computing optimization: A literature review. Swarm. Evol. Comput. 2012; 2: 1–14.
  64. Du H, He X, Wang J, Feldman MW. Reversing structural balance in signed networks. Physica A 2018; 503: 780–792.
  65. He X, Du H, Xu X, Du W. An energy function for computing structural balance in fully signed network. IEEE T. Computat. Soc. Syst. 2020; 7(3): 696–708.
  66. Wang S, Gong M, Du H, Ma L, Miao Q, Du W. Optimizing dynamical changes of structural balance in signed network based on memetic algorithm. Soc. Networks 2016; 44: 64–73.
  67. Zachary WW. An information flow model for conflict and fission in small groups. J. Anthropol. Res. 1977; 33(4): 452–473.
  68. Lusseau D. The emergent properties of a dolphin social network. P. R. Soc. Lond. B-Biol. Sci. 2003; 270(2): 186–188. pmid:14667378
  69. Lancichinetti A, Fortunato S, Radicchi F. Benchmark graphs for testing community detection algorithms. Phys. Rev. E 2008; 78(4): 046110. pmid:18999496