Figures
Abstract
In recent years, research on community structure for complex networks has received increasing greater attention, and the overlapping community structure is more closely related to the actual social structure than the non-overlapping community structure, so it is necessary to identify and detect the overlapping communities of social networks. In this paper, we propose an overlapping community optimization method (OSFCM) based on network structure characteristics and fuzzy C-means clustering. We first abstract the feature vector matrix of each node from the network structural properties, and then optimize this matrix by a new objective function gradient optimization method, we generate the preliminary community delineation results with FCM method, and finally calibrate the communities to which the nodes belong. Experimental results show that the algorithm exhibits higher delineation accuracy and better algorithmic performance on seven real network datasets and four synthetic networks.
Citation: Wang A, Li M, Gao X, Li B, Su Z (2025) Overlapping community detection based on bridging structural features and fuzzy C-means. PLoS One 20(8): e0328825. https://doi.org/10.1371/journal.pone.0328825
Editor: Kuldeep Singh, University of Delhi, INDIA
Received: March 14, 2025; Accepted: July 8, 2025; Published: August 26, 2025
Copyright: © 2025 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting information files.
Funding: This work was supported by the Collaborative Research Projects of “Chunhui Program”, Ministry of Education of China. (Grant No. RZ2300003745).
Competing interests: The authors have declared that no competing interests exist.
Introduction
In recent years, with the popularization of the Internet and widespread use of mobile devices, social networks play an increasingly important role in people’s lives, work, and society. Simultaneously, with growing user bases and network coverage, the scale of social networks is increasing, and the structure is becoming more and more complex. In general, social networks are characterized by many complex network features, including but not limited to the small-world phenomenon [1], scale-free [2], and community structure [3]. A community represents a collection of nodes with common characteristics in the network, and community segmentation involves analyzing network structure information to discover potential communities in social networks, in order to reveal the group structure and connection characteristics between nodes in the network, so as to help understand the organization and behavioral patterns of social networks [4]. Based on the unique characteristics of “community” in social networks, community discovery algorithms have been widely used in business, politics, academia, healthcare and other fields [5,6].
Formal conceptual and algorithmic research on community segmentation began with the GN algorithm proposed by Girvan and Newman in 2001 [3], and community segmentation algorithms have gained more and more attention from scholars in the following two decades [7–9]. However, many scholars’ work focuses on the detection of non-overlapping communities where network nodes belong to only a single community, such as the CNM algorithm [10], Louvain’s algorithm [11], spectral clustering optimization algorithms, faction filtering algorithms [12], label propagation algorithms [13], and a series of global or local optimization algorithms, such as local edge clustering optimization algorithms [14]. In fact, for most social networks, nodes in the network have multiple community labels, which may simultaneously belong to different communities, e.g., in an interest network, an individual is able to have multiple interests at the same time; in a collaborative network, a writer can work with writers from multiple different fields. Therefore, network detection of overlapping communities can further reveal the complex inner organizational structure and patterns of the social network.
So far, researchers have proposed a variety of overlapping community detection methods by applying different approaches [15–17], and these methods are classified into four categories: based on Clique Percolation Method, based on label propagation, based on local expansion and optimization, based on link division and based on Machine learning. However, these methods are often plagued by certain limitations, such as most of the label optimization algorithms tend to consider only the local information of the nodes and ignore the global importance of the nodes, or there are problems such as inaccurate accuracy of community division and unsatisfactory experimental results. Therefore, we investigate the idea of fuzzy clustering and proposes a fuzzy C-means clustering overlapping community detection method based on the structural characteristics of the network. We first abstract the structural characteristics of each node in the network, use the structural characteristics to construct the feature vector matrix in fuzzy clustering, and then propose a new objective function gradient optimization method to optimize the feature vector matrix, so that the value of the difference of the eigenvectors of the similar nodes becomes smaller, and the value of the difference of the eigenvectors of the non-similar nodes becomes larger, and then according to the eigenvector matrix and the fuzzy C-means clustering analysis method, we generate the preliminary community delineation results, and finally calibrate the community label to which each node belongs to determine the final community to which each node belongs. The main contributions of this paper are as follows:
- We propose an optimized fuzzy C-means method for overlapping community detection, which solves the problem that nodes can only belong to a single community in the classical community detection algorithm, and adapts to the detection needs of overlapping communities.
- We introduce a novel objective function and gradient optimization algorithm that can effectively reduce the differences between feature vectors of similar nodes while increasing the differences between those of dissimilar nodes, allowing the structural properties of the network to potentially represent community characteristics.
- We propose a new community calibration approach to optimize the community detection results by calibrating the community to which a node belongs with the influence of the node’s neighbors both inside and outside the community on the node.
- We validate the stability and superiority of our algorithm on real network datasets and synthetic networks by simulating multi-experiments, and verify that it has higher accuracy and better performance than other methods in terms of NMI and modularity.
The rest of this paper is organized as follows. The relate work section reviews related work and introduces the current research status of overlapping community detection algorithms. The solution for OSFCM section describes the main ideas and detailed processes of our proposed algorithm. The experimental results and analysis section presents the validation process of the algorithm, including evaluation metrics, test datasets, and comparative experimental results. Finally, the conclusion section provides conclusions and perspectives.
Relate work
This chapter introduces some traditional overlapping community detection methods and their optimization algorithms, and briefly describes the advantages and disadvantages of these algorithms.
Based on Clique Percolation Method (CPM). CPM algorithm [18] is a widely used overlapping community detection method, the core of this algorithm is to decompose all fully-connected subgraphs (k-clique) from the network and perform community diffusion through overlapping nodes between different subgraphs (if any), and then identify the community information in the network based on the associations between these subgraphs. CPMd algorithm [19] is an application of CPM algorithm to directed graphs, in which a new k-clique is defined, but its core is still to find fully connected subgraphs in the network and then perform community diffusion. CBLA algorithm [20] is a CPM-based optimization algorithm, the idea of the algorithm is that after finding all the (k-clique) subgraphs in the network, these subgraphs are transformed into meta-nodes, and subsequently, those nodes that are not delineated are moved, and the attribution of these nodes is determined by analyzing the change in the modularity degree resulting from the movement of the nodes. In general, for the CPM algorithm and its optimization algorithms, the choice of the k-value has a more significant impact on the community delineation results, especially when the k-value is lower in some sparse networks, which tends to create an excessive number of small communities.
Based on label propagation methods. CORPA [21] is the first algorithm to use the LPA idea in overlapping community detection, which makes the nodes retain multiple labels during label propagation, and then determines the division of overlapping communities based on these labels, but the algorithm randomly selects nodes for label propagation, which leads to the problem of low accuracy of community division as well as the unstable division results; WLPA [22] avoids the problem of label oscillation caused by synchronization in the label propagation phase of the algorithm by means of “asynchronous + ascending”, and also takes into account the difference in the influence of different nodes in the network, but only refers to the node degree when considering the influence of the nodes, which is unreasonable and cannot effectively reflect the node’s influence. NALPA [23] assigns four types of capabilities to the nodes in the network, and influences the importance of the nodes in the network as well as constructs a new label propagation mechanism using the four abilities, but its high time complexity makes it difficult to deal with large-scale networks efficiently; FLPA [24] adopts graph compression technology to merge sparse nodes in the network, which reduces the network size, and then optimizes the label propagation process based on node influence and label weight to improve the accuracy of community detection, but it is ineffective in dealing with dense networks because it involves the image compression strategy, which limits the application of the algorithm in specific network. INF-CORPA [25] integrates node “degree” and “k-core” theories, assigns influence to nodes, sets thresholds, and controls label propagation to reduce erroneous labels, but the label thresholds may be set differently in different networks, and the ability of the algorithm to deal with label propagation interference needs to be enhanced for some complex community structure.
Based on local expansion and optimization methods. OSLPM [26] is the first algorithm that can simultaneously handle directed bandwidth weighted networks and be able to identify overlapping communities, which expands the subpopulations by means of a kind of local optimization of fitness function and obtains the community structure from the expansion; Whang et al. [27] identifies the seed set by Graclus centers, and then based on the optimized PageRank strategy personalized expansion to detect overlapping communities in the network; Ding X et al [28] overcame the problem of poor fault tolerance by optimizing node affiliation to discard low-quality seeds and communities, but had limited effect in identifying highly aggregated communities and high computational cost in large-scale networks.
Based on link partition-based methods. For this solution, the network is considered as a kind of link graph that can be processed by a clustering algorithm, by processing the links in the network instead of the nodes, and then community detection is performed based on this link graph [29]. Ahn et al. [14] mapped the initial network structure onto a link graph, and then iteratively merged the similar links, and then determined the optimal partition hierarchy, and finally restored the edge graph as a node-with-node clusters. In summary Link Partition based Method can easily detect overlapping nodes, but its reliability has not been properly proven due to its fuzzy definition of community- based [30].
Based on machine learning methods. Berahmand K et al. [31] proposed a semi-supervised deep attribute clustering framework based on dual autoencoders. By integrating structural and attribute information into a dual-view representation and combining it with pairwise constraint optimization, the framework significantly improves clustering performance in attributed networks. DSSC [32] effectively enhances the accuracy and efficiency of community detection in complex networks through the design of a semi-autoencoder and a pairwise constraint matrix. Zhou et al. [33] proposed CDBNE, an unsupervised attributed network embedding framework for community detection. It integrates graph-attentive autoencoders to model network topology and node attributes, combines encoder/decoder outputs with mesoscopic community priors, and employs self-training clustering to optimize node representations. CPGC [34] improves graph convolution operations and incorporates community-oriented similarity by integrating representation learning with clustering. It effectively utilizes both attribute and structural information to detect overlapping and non-overlapping communities. However, there is still room for improvement in terms of computational efficiency and scalability. GEAM [35] is a graph-enhanced attention model designed for multilayer networks. It effectively integrates cross-layer semantic information and improves community detection accuracy through inter-layer contrastive learning, an adaptive fusion mechanism, and an edge density-driven module.
In addition, there are many other various overlapping community detection methods such as fuzzy-based detection methods [36], agent-based methods [37], and modularity optimization-based methods [38]. These methods implement overlapping community detection from various perspectives of the network.
Solution for OSFCM
Fuzzy cluster analysis refers to solving clustering problems using fuzzy mathematical methods. It quantifies uncertainty in sample-category assignments, captures transitional memberships, and reflects real-world complexity more objectively, thus it has become the mainstream of the research of cluster analysis.
Typical steps of the cluster analysis based on fuzzy relationships: data specifications, construction of fuzzy similarity matrices and fuzzy classification. Among the various algorithms for fuzzy classification, the one with higher real-time requirements is the fuzzy clustering method based on the objective function, which reduces the clustering to a nonlinear planning problem with constraints, and obtains the fuzzy division and clustering of the dataset by optimizing the solution. An optimization scheme for selecting generations to minimize this objective function is the well-known hard C-means(HCM) algorithm. An organizational data analysis technique derived from the optimization of the hard clustering objective function is the fuzzy C-means, which can be transformed into an optimization problem and solved with the help of nonlinear programming theory of classical mathematics.
Among the clustering algorithms based on objective function, fuzzy c-means (FCM, fuzzy c-means), also known as ISODATA algorithm, i.e., iterative self-organizing data analysis technique, is constructed to solve the clustering problem with the help of the objective function method, which uses the mean-squared approximation theory to construct a constrained nonlinear programming function. In the following we discuss the core theory of the algorithm developed in the paper, the process of OSFCM is showed in Fig 1.
Matrix construction for fuzzy classification
Given that the set of nodes of the network is , where each node xi has m characteristic indicators,
, if X is to be classified into c classes, each of its classification results corresponds to a Boolean matrix of order
, where n is the number of nodes, and rij is as follows:
For example, given that denotes the set of classified nodes, if the classification result is
,
, {x4}, then the corresponding classification matrix is:
satisfies the following conditions: (1)
; (2)
, that is, each column has and only one element of 1 and the rest of the elements are 0, ensuring that each object can and must belong to only one of the categories; (3)
, that is, the sum of the elements of each row is greater than 0, ensuring that each category is not empty and that there can be more than one object in a category. Any matrix R satisfying the above-mentioned 3 properties corresponds to a classification.
Given that Mc is the whole matrix R satisfying the above condition, then Mc contains all the possible classification results of X being classified into class c. So we call Mc the classification space of the node set X being classified into class c, which is defined as:
Based on the above-mentioned matrices for classification, we construct fuzzy classification matrices. A fuzzy classification takes the objects xi in the classified X as belonging to a certain class with different degrees of affiliation, so each class is considered as a fuzzy subset on X. Thus each fuzzy classification is a fuzzy matrix of size
, which satisfies the following conditions:
, that is, the classification matrix elements take values between 0 and 1, not 0 or 1, indicating that a node can simultaneously belong to multiple communities to varying degrees;
, for each individual node, the sum of its membership degrees across all communities must equal 1;
, that is, all nodes must exhibit non-zero membership in at least one community, ensuring no community is entirely unpopulated.
Any fuzzy matrix R that meets the above conditions corresponds to a fuzzy categorization of X into c classes.
Denote Mc as the whole of the fuzzy matrix R satisfying the above three properties, which is called the fuzzy classification space where X is classified into c classes, then is defined as:
Construct node feature vector matrix
Given that the feature vector matrix consisting of the characteristic metrics of all the nodes in the network is denoted as . For the fuzzy C-means clustering method, the characteristic indexes of the nodes not only determine how the nodes are finally grouped, but also influence each step of the clustering process. Thus it is especially important to choose appropriate characteristic indexes to construct the feature vector matrix. In OSFCM, we select three widely used feature metrics in network analysis: Degree, Degree Centrality, and Betweenness Centrality. These metrics effectively reflect a node’s position and relative importance within the network structure. Their definitions are as follows:
degree centrality:
betweenness centrality of nodes:
where denotes the number of paths passing through node u connecting s and t, and which are shortest paths, and gst denotes the number of shortest paths connecting s and t.
However, directly using these metrics as clustering inputs may result in communities that overly rely on node importance. For example, all nodes with high centrality may be grouped into the same community, ignoring the potential lack of actual connections between them. Therefore, to improve the rationality of community division, we introduce an optimization objective function based on a node similarity matrix. Specifically, we aim to design an objective loss function L and a node similarity matrix S, to achieve that for similar nodes (denoted by larger S[u,v]), the difference of their eigenvectors should be as small as possible, and for dissimilar nodes (denoted by smaller S[u,v]), the difference
of their eigenvectors should be as large as possible, and the objective loss function L is specifically defined as:
The objective is to minimize L, which ensures that nodes with high similarity remain close in the feature space while weakening the connections between nodes with low similarity. The node similarity matrix is defined as:
Where Nu denotes the set of neighboring nodes of node u and S(u,v) reflects the influence of exerted on u by its neighbors. In the formula, the numerator is incremented by 1 because the intersection of the neighboring node sets of nodes u and v does not include node u itself. In real-world social networks, when nodes u and v share no other common neighbors, the value of is 0. However, node u itself can be considered a common neighbor of the two.
To optimize the target loss function, we use the gradient descent method, the gradient of the loss function L with respect to the feature vector U [u] is defined as:
Based on this, the feature vector update formula is defined as:
Where η denotes the learning rate with a value of 0.001. When or the maximum number of iterations t is reached, it indicates that the update of the feature vector has achieved an acceptable result. By performing gradient optimization on the initial feature vector matrix, the node characteristics are optimized so that these characteristics can, to some extent, reflect community information of the nodes, thereby avoiding the inclusion of unrelated nodes in the same community. The construction process of the feature vector matrix U is given in Algorithm 1.
Algorithm 1: Construct node feature vector matrix.
Input: degree centrality:DC
betweenness centrality of nodes:BC
Output: Node Feature Vector Matrix: U
1:
2: for each node u in V do
3: for do
4: S(u,v) similarity index of node u to
according
to Eq 8
5: end for
6: end for
7: repeat:
8: for do
9: Update U[u] according to Eq 10
10: end for
11: until:
12: For any u,
13: or
14: return U
Fuzzy C-means clustering method
Divide the node set V into c classes, and let the c cluster center vectors form the matrix V.
In order to obtain an optimal fuzzy classification, the following clustering criterion is used to select the best fuzzy classification from the fuzzy classification space Mfc. Find the appropriate fuzzy classification matrix R and cluster center matrix V, so that the following objective function J(R,V) reaches the minimum value.
Where fuzzy parameterq = 2, denotes the distance between the feature vector uk of an object xi and the clustering center vector of class i. In addition, we expect the classes to be tightly clustered and not too spread out, and we expect the objects in each class to have the smallest sum of the squared distances to the corresponding cluster center. Obviously, J(R,V) is related to the setting of these c clustering centers
, and the smaller the distance is, the better the clustering effect is.
Generally speaking, solving the objective function J(R,V) is challenging, and its approximate solution is usually obtained by iterative operations. When is given, we use the fuzzy mean algorithm to perform the iteration, and the process is convergent, the specific steps are as follows:
(1) Select the number of classifications c, , take an initial fuzzy classification matrix
, iterate step by step
. For
, compute the cluster center matrix
, and correct the fuzzy classification matrix
, where:
(2) Compare with R(l + 1). If
, for a given precision
, then
and
are the desired results and the iteration stops, otherwise,
and return to step 1, repeat the process. The solving process is given in Algorithm 2.
Algorithm 2: Fuzzy C-means clustering method.
Input: X, c, q, 𝜖
Output: the fuzzy classification matrix
the cluster center matrix
1:
2: repeat:
3: for i = 1 to c do
4: Update according to Eq 13
5: end for
6: for k=1 to n do
7: for i = 1 to c do
8: Update according to Eq 14
9: end for
10: end for
11: if then
12: return ,
13: until: t = tmax
14: return
The fuzzy classification matrix and the cluster center matrix obtained by applying the above algorithm are locally optimal solutions with respect to the number of classifications, the initial fuzzy classification matrix, and the parameter q.
(3) After obtaining the optimal fuzzy classification matrix and the optimal cluster center matrix that meet the requirements, the classification is carried out according to the following discriminative principles:
- Using the optimal fuzzy classification matrix clustering, given the optimal fuzzy classification matrix is
, for any xk, in the k-th column of
, if
, then the object will be attributed to the i-th class, i.e., the object xk is assigned to the class with the higher membership degree.
- Using the optimal cluster center matrix for clustering: let the obtained best clustering center matrix be
. For any xk, if its corresponding feature vector uk meets
, then the object xk is assigned to the class i.
Since this algorithm requires , the selection of the initial fuzzy classification matrix
must comply with the three conditions of the fuzzy classification matrix, but also has the following restrictions: (a) The initial matrix
cannot be a constant matrix where all elements are the same; (b) The initial matrix
cannot be a matrix with the same value of the elements of a row. For class with only one object in the initial matrix, it should be removed before clustering and reinserted after clustering.
Label verification
After the initial community division by FCM, the node feature vector matrix may not fully capture community structure due to its dependence on node-level metrics. Thus, residual errors persist in the detection results. To further improve the accuracy of community detection, this paper introduces a calibration mechanism based on community attraction force, which leverages the influence of neighboring nodes to refine community labels. Specifically, if multiple neighboring nodes of a given node u belong to a particular community ci and these neighbors exert strong influence on u, then community ci is considered to have a significant “attraction" effect on node u. We define the attraction force between node u and community ci as:
Where F(u, ci) denotes the traction of community ci on node u.
The greater the attraction F(u,ci), the stronger the “pull” exerted by the neighbors in community ci on node u.
To determine whether node u belongs to a candidate community ci, we set an attraction threshold. Let Fmax(u) be the maximum attraction among all candidate communities of node u. We use as the threshold for decision-making. When
, node u is considered to belong simultaneously to community ci.
This strategy uses a unified relative threshold to determine node membership in different communities, avoiding over- or under-assignment in community detection. Although the threshold is heuristically set, it has demonstrated good adaptability and stability across multiple datasets. In future work, we plan to introduce an adaptive attraction criterion to make the calibration process more theoretically grounded and flexible.
In summary, the specific workflow of the OSFCM algorithm is shown in Algorithm 3.
Algorithm 3: OSFCM Algorithm.
Input:
Output: Community set C
1: construct matrix for fuzzy classification Mfc
2:
3:
4: for do
5: Classify node
6: end for
7: repeat:
8: for do
9: Validate node labels according to Eq 15
10: end for
11: until: community set C no longer changes.
12: return C
Complexity analysis
Given an undirected unweighted graph G, where is the number of nodes,
is the total number of edges, c is the number of communities, and t is the number of iterations, the time complexity of each step in the OSFCM algorithm is calculated as follows:
Step 1: Create the initial fuzzy partition matrix and time complexity is ;
Step 2: Construct the similarity adjacency matrix and time complexity is ;
Step 3: Construct the node feature vector matrix and time complexity is ;
Step 4: Apply the fuzzy c-means clustering method and time complexity is ;
Step 5: Perform label verification and time complexity is .
Therefore, we get the time complexity of OSFCM algorithm which is
Experimental results and analysis
Test sets and comparison algorithms
To validate the performance of OSFCM for overlapping community detection, it is validated and analyzed with six other overlapping community detection algorithms (COPRA [21], FLPA [24], CPM [18], LINK [29], AOCD [39], CDMG [40]) on several network datasets. In addition, all algorithms were implemented using python and executed on a processor Intel(R) Core(TM) i5- 13400 CPU@2.5 GHz with 32.00 GB of RAM in Windows 10 environment. In order to eliminate randomness, all experiments were repeated 100 times and averaged under the same conditions.
We selected seven in many real network datasets, including Zachary Karate Club dataset [41], dolphin dataset [42], polbooks dataset, football dataset [3], Jazz dataset [43], Email dataset and PGP dataset [44], these datasets node sizes cover both small-scale and large-scale networks. The number of nodes (Nodes), node relationships (edges) of these network datasets are given in Table 1. Where ‘--’ indicates that there is no real network community division on this dataset.
For synthetic networks, we selected the LFR benchmark [45], which is widely used in the field of community detection with various adjustable parameters, including node count, node degree distribution (average node degree , maximum node degree kmax), and community size (minimum community size Cmin and maximum community size Cmax), power distribution index of node degree and power distribution index of community size(
).
In addition, the ambiguity of community boundaries can be affected in LFR networks by adjusting the mixing parameter μ. A higher μ value increases community ambiguity, making detection more challenging. We conduct experiments on four different sizes of LFR networks (LFR1, LFR2, LFR3, LFR4) to verify the effectiveness of the algorithm in OSFCM. The detailed parameter settings of specific LFR networks are shown in Table 2.
Evaluation metrics
Modularity (Q) is a metric defined from the global perspective of the network to quantify the topological characteristics of communities, which is often used to evaluate the merits of the community segmentation results, and is applicable to non-overlapping communities. For the evaluation of overlapping community detection results, the improved modularity evaluation metric EQ is used, and the value range is generally [0,1], the higher the value of EQ indicates that the network segmentation results are better, with the closer connection within the community and the sparser connection between the communities. Specifically defined as:
Where m denotes the number of edges between all the nodes in the network G, C denotes the number of communities, u and v denote two nodes in the community Ci, Ou denotes the number of communities belonging to the node u (number of groups), A denotes the adjacency information matrix, when there is an edge connecting nodes u and v, otherwise
. du denotes the degree of the node u.
Normalized Mutual Information (NMI) is an evaluation index that can effectively assess the accuracy of the community discovery algorithm by comparing it with the real true value of the social network, which takes the value in the range of [0,1], and the higher the value indicates that the more accurate the community division results are, and when NMI=1, it indicates that the division result is exactly the same as the real true value, which is specifically defined as:
where A denotes the actual community segmentation result, B denotes the experimentally measured segmentation result, CA denotes the number of communities of A, and CB denotes the number of communities of B, N denotes the segmentation information matrix, the rows in the matrix denote the actual communities, and columns denote the detected communities, Nij denotes the number of vertices that appear in the community i and are also contained in the detected community j, Ni. denotes the sum of the i-th row of the segmentation information matrix, and N.j is summed up over the jth column.
Experimental parameter analysis
The fuzzy parameter q controls the degree of fuzziness in the membership assignments during the clustering process: smaller values of q produce crisper clustering results, while larger values allow for greater fuzziness. To evaluate the sensitivity of the proposed method to the fuzzy parameter q, we conducted a series of experiments on datasets such as Karate Club and Football. In these experiments, all other parameters were kept constant, and only the value of q was varied within the range from 1.1 to 3. The experimental results are shown in the Fig 2.
As shown in Fig 2, when the fuzzy parameter q varies within the range of [1.1, 1.5], the performance of the algorithm shows an upward trend across all datasets, without exhibiting relatively stable behavior. In contrast, within the range of [1.9, 2.3], the algorithm demonstrates strong parameter robustness and achieves its best performance on all datasets. However, when q becomes too large, the performance slightly declines. This is because a low q value leads to overly rigid community partitions, while an excessively high q causes the membership degrees of nodes to become too vague, thereby affecting detection accuracy. In summary, the algorithm shows good adaptability in practical applications when q is set around 2.1. In this study, we set q = 2.0.
Analysis of experimental results
Analysis of real-world network detection results.
In order to verify the effectiveness of OSFCM in practical applications, we compare OSFCM with several other comparative algorithms on 8 real network datasets. The experimental results are shown in Table 3, which gives the corresponding EQ values of overlapping community modularity.
As can be seen from Table 3, the detection results of the algorithm proposed in OSFCM are better than FLPA, CORPA, CPM and LINK algorithms on all these datasets. For the karate dataset, although the average modularity value of the CPRPA algorithm is lower than that of OSFCM and the FLPA algorithm, the CORPA algorithm can achieve a value of about 0.382 in individual cases during the actual validation process. However, it is shown based on the actual test that when the modularity of the karate dataset is 0.3715, the community division results obtained from the experiment are consistent with the actual community division results (although there are no overlapping community nodes in the actual division), which indicates that OSFCM does not divide the karate community into overlapping communities but achieves the community detection results consistent with the actual division, which verifies that OSFCM is effective in actual application. In addition, we can see from the table that among all the tested algorithms, FLPA algorithm achieves higher EQ value on PGP network, and the CDMG algorithm performs well on the Dolphin network, but the EQ value of OSFCM is not much different from theirs, and the detection results of OSFCM are much better than FLPA algorithm on other test sets. For the FLPA, CORPA, LINK, and AOCD algorithms, their detection results are inferior to those of the proposed algorithm across all datasets. The results in the table show that OSFCM can obtain a better community structure both on small-scale networks and large-scale networks, which verifies the effectiveness of OSFCM.
Analysis of LFR network detection results.
For the LFR network, the initialization of the LFR network is accompanied by the initialization of the community information contained in the nodes, providing not only the LFR network dataset, but also the real LFR network segmentation results. At this point, the NMI evaluation metrics are used to validate the effectiveness of OSFCM in terms of the accuracy of community segmentation.
The experimental results of OSFCM with other comparative algorithms on the LFR1 artificial network dataset are shown in Fig 3. It shows the variation curve of NMI with μ, where μ increases from 0.1 to 0.5 in increments of 0.05 to ensure the accuracy of the test results.
From Fig 3, it can be seen that FLPA has the highest NMI value when the value of μ is 0.0, but in the interval after that, both OSFCM and FLPA can obtain better community detection results, with the NMI values centered around 0.85, and the results of OSFCM are even better than the other algorithms on the interval [0.3,0.5]. For CORPA algorithm, its NMI value can also be maintained around 0.75 on the whole interval, and better community detection results can also be obtained. The CDMG algorithm can achieve an NMI value of around 0.8 when , but as the μ value increases, the NMI value drops rapidly. However, for CPM algorithm and LINK algorithm, their community detection results are poorer and the interval fluctuates more, and most of the NMI values of these two algorithms are less than 0.5. From the figure, it is intuitively observed that the OSFCM is significantly better than the other algorithms although the gap between the detection results and the FLPA algorithm is not large, which indicates that our algorithm can effectively detect meaningful overlapping community structures.
The experimental results of OSFCM and other comparative algorithms on the LFR2 artificial network dataset are shown in Fig 4. The curves of NMI versus μ are demonstrated, where μ is increased from 0.1 to 0.5 in increments of 0.05 to ensure the accuracy of the test results.
From Fig 4, it can be seen that when the value of μ is in the interval [0,0.25], the NMI values of OSFCM and FLPA are approximate, and both of them are maintained around 0.77, which can obtain better community detection results, but in the interval after that, the detection results of OSFCM are better than those of the FLPA algorithm, and the two algorithms are significantly better than the other algorithms on the whole interval. The CDMG algorithm achieves higher NMI than the FLPA and OSFCM algorithms when μ is small, but its NMI becomes lower than both when . For CORPA algorithm, when the value of μ is less than 0.3, the value of NMI can reach 0.7, but with the increase of u, its detection results also begin to drop significantly, and it is difficult to maintain at about 0.7, while LINK algorithm and AOCD algorithm can reach more than 0.6 only at the beginning, and the value of NMI begins to plummet with the increase of u, resulting in poor detection results. Moreover, the LINK algorithm and CPM algorithm are greatly affected by the value of u, and the NMI value of these two fluctuates greatly with the change of u. Therefore, by comparison and analysis, OSFCM is less affected by the value of μ and can achieve better community detection results, and the algorithm is more applicable, which verifies the validity of the algorithm calculated in OSFCM.
The experimental results of OSFCM and other comparative algorithms on the LFR3 artificial network dataset are shown in Fig 5,demonstrating the curve of NMI with μ, where μ increases from 0.1 to 0.5 in increments of 0.05.
As can be seen from the figure, the FLPA algorithm begins to show larger fluctuations in the value of NMI compared to the previous dataset, and the results of NMI as a whole are poorer compared to OSFCM. On the whole interval, FLPA can obtain detection of little difference with OSFCM only when μ obtains individual values, while in other cases, OSFCM is significantly better than the FLPA algorithm. Similar to previous observations, the CDMG algorithm achieves a higher NMI than the OSFCM algorithm when μ is small, but its NMI becomes significantly lower than that of OSFCM when . As for CPM, CORPA, and LINK algorithms, the detection results of the three algorithms are similar, and all of them achieve an NMI value of about 0.55 initially, while the NMI values of these algorithms decrease with the increase of u. Even the NMI values of CPM and LINK algorithms reach 0.2 when μ equals to 0.5. Obviously, the detection results of these algorithms gradually deteriorate as the size of network increases, and the accuracy of community segmentation also begins to gradually decline. In comparison, the stability of the detection results of OSFCM is better than other algorithms in both large-scale and small-scale networks, which verifies the effectiveness of OSFCM.
The experimental results of OSFCM and other comparative algorithms on the LFR4 artificial network dataset are shown in Fig 6. demonstrating the curve of NMI with u, where μ increases from 0.1 to 0.5 in increments of 0.05.
As can be seen from the figure, compared with the LFR1, LFR2, and LFR3 datasets, on the LFR4 dataset, the community detection results of the FLPA algorithm are not as good as before, while OSFCM can still maintain better community detection results, although the value of the NMI only reaches about 0.7, the results are still a significant advantage for other algorithms. Moreover, the results achieved by OSFCM are relatively stable throughout the test interval, while all other algorithms decrease with the increase of μ value. Even when μ is equal to 0.5, the NMI values of the other four algorithms are less than 0.4, which indicates that most community detection algorithms are difficult to achieve better community delineation results as the community results are more ambiguous and the size of community nodes is larger. Therefore, our algorithm can effectively detect meaningful overlapping community structures even in large-scale, more ambiguous community structures compared to other algorithms, which verifies the effectiveness of OSFCM.
Conclusion
Aiming at the complex and diverse community detection tasks, this paper proposes an overlapping community optimization method (OSFCM) based on network structural characteristics and FCM. Firstly, the structural characteristics of the nodes in the network are abstracted into the characteristic indexes of the nodes and the feature vector matrix is constructed, then, the feature vector matrix is optimized based on the proposed gradient optimization method of the objective function, secondly, the preliminary community delineation results are generated based on the optimized feature vector matrix and fuzzy C-means clustering analysis method, and finally, the community membership of each node is verified. Comparing and analyzing OSFCM with several other overlapping community detection algorithms, simulation results on real and synthetic network datasets show that OSFCM algorithm exhibits good modularity advantage on the given dataset and has higher NMI accuracy in both small and medium-large synthetic network.
In the future, we will continue to explore application scenarios for large-scale community detection algorithms and apply our solution to dynamic or weighted networks to maximize its advantages in solving large-scale complex problems.
Supporting information
S3 File. Experimental division results of the Karate network.
https://doi.org/10.1371/journal.pone.0328825.s003
(CSV)
S4 File. Node feature vector of the Karate network.
https://doi.org/10.1371/journal.pone.0328825.s004
(CSV)
References
- 1. Watts DJ, Strogatz SH. Collective dynamics of “small-world” networks. Nature. 1998;393(6684):440–2. pmid:9623998
- 2. Barabási AL, Albert R. Emergence of scaling in random networks. Science. 1999;286(5439):509–12.
- 3. Girvan M, Newman MEJ. Community structure in social and biological networks. Proc Natl Acad Sci U S A. 2002;99(12):7821–6. pmid:12060727
- 4. Lu M, Zhang Z, Qu Z, Kang Y. Lpanni: Overlapping community detection using label propagation in large-scale complex networks. IEEE Transactions on Knowledge and Data Engineering. 2018;31(9):1736–49.
- 5. Chen X, Li J. Community detection in complex networks using edge-deleting with restrictions. Physica A: Statistical Mechanics and its Applications. 2019;519:181–94.
- 6.
Rezvanian A, Moradabadi B, Ghavipour M, Khomami MMD, Meybodi MR. Learning automata approach for social networks. Springer; 2019.
- 7. Ni Q, Guo J, Wu W, Wang H, Wu J. Continuous influence-based community partition for social networks. IEEE Trans Netw Sci Eng. 2022;9(3):1187–97.
- 8. Ni Q, Guo J, Wu W, Wang H. Influence-based community partition with sandwich method for social networks. IEEE Trans Comput Soc Syst. 2023;10(2):819–30.
- 9. Guo J, Wu W. Influence maximization: seeding based on community structure. ACM Transactions on Knowledge Discovery from Data. 2020;14(6):1–22.
- 10. Clauset A, Newman ME, Moore C. Finding community structure in very large networks. Physical Review E—Statistical, Nonlinear, and Soft Matter Physics. 2004;70(6):066111.
- 11. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment. 2008;2008(10):P10008.
- 12. Palla G, Derényi I, Farkas I, Vicsek T. Uncovering the overlapping community structure of complex networks in nature and society. Nature. 2005;435(7043):814–8. pmid:15944704
- 13. Raghavan UN, Albert R, Kumara S. Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2007;76(3 Pt 2):036106. pmid:17930305
- 14. Ahn Y-Y, Bagrow JP, Lehmann S. Link communities reveal multiscale complexity in networks. Nature. 2010;466(7307):761–4. pmid:20562860
- 15. Benati S, Puerto J, Rodríguez-Chía AM, Temprano F. Overlapping communities detection through weighted graph community games. PLoS One. 2023;18(4):e0283857.
- 16. Prokop P, Drazdilova P, Platos J. Overlapping community detection in weighted networks via hierarchical clustering. PLoS One. 2024;19(10):e0312596.
- 17. Ponomarenko A, Pitsoulis L, Shamshetdinov M. Overlapping community detection in networks based on link partitioning and partitioning around medoids. PLoS One. 2021;16(8):e0255717. pmid:34432789
- 18. Derényi I, Palla G, Vicsek T. Clique percolation in random networks. Phys Rev Lett. 2005;94(16):160202. pmid:15904198
- 19. Palla G, Derényi I, Farkas I, Vicsek T. Uncovering the overlapping community structure of complex networks in nature and society. Nature. 2005;435(7043):814–8. pmid:15944704
- 20. Gupta SK, Singh DrDP. CBLA: a clique based louvain algorithm for detecting overlapping community. Procedia Computer Science. 2023;218:2201–9.
- 21. Gregory S. Finding overlapping communities in networks by label propagation. New J Phys. 2010;12(10):103018.
- 22.
Tong C, Niu J, Wen J, Xie Z, Peng F. Weighted label propagation algorithm for overlapping community detection. In: 2015 IEEE International Conference on Communications (ICC). 2015. p. 1238–43.
- 23. Zhang Y, Liu Y, Zhu J, Yang C, Yang W, Zhai S. NALPA: a node ability based label propagation algorithm for community detection. IEEE Access. 2020;8:46642–64.
- 24. Yan R, Yuan W, Su X, Zhang Z. FLPA: a fast label propagation algorithm for detecting overlapping community structure. Expert Systems with Applications. 2023;234:120971.
- 25. Xu H, Ran Y, Xing J, Tao L. An influence-based label propagation algorithm for overlapping community detection. Mathematics. 2023;11(9):2133.
- 26. Lancichinetti A, Radicchi F, Ramasco JJ, Fortunato S. Finding statistically significant communities in networks. PLoS One. 2011;6(4):e18961. pmid:21559480
- 27.
Whang JJ, Gleich DF, Dhillon IS. Overlapping community detection using seed set expansion. In: Proceedings of the 22nd ACM international conference on Information & Knowledge Management. 2013. p. 2099–108. https://doi.org/10.1145/2505515.2505535
- 28. Ding X, Yang H, Zhang J, Yang J, Xiang X. Ceo: identifying overlapping communities via construction, expansion and optimization. Information Sciences. 2022;596:93–118.
- 29.
Wang C, Tang W, Sun B, Fang J, Wang Y. Review on community detection algorithms in social networks. In: 2015 IEEE International Conference on Progress in Informatics and Computing (PIC). 2015. p. 551–5.
- 30. Xie J, Kelley S, Szymanski BK. Overlapping community detection in networks: the state-of-the-art and comparative study. Acm Comput Surv. 2013;45(4):1–35.
- 31. Berahmand K, Bahadori S, Abadeh MN, Li Y, Xu Y. Sdac-da: Semi-supervised deep attributed clustering using dual autoencoder. IEEE Transactions on Knowledge and Data Engineering. 2024.
- 32. Berahmand K, Li Y, Xu Y. A deep semi-supervised community detection based on point-wise mutual information. IEEE Trans Comput Soc Syst. 2024;11(3):3444–56.
- 33. Zhou X, Su L, Li X, Zhao Z, Li C. Community detection based on unsupervised attributed network embedding. Expert Systems with Applications. 2023;213:118937.
- 34. Zhu X, Li X, Zhang S, Xu Z, Yu L, Wang C. Graph PCA hashing for similarity search. IEEE Trans Multimedia. 2017;19(9):2033–44.
- 35. Sun PG. Weighting links based on edge centrality for community detection. Physica A: Statistical Mechanics and Its Applications. 2014;394:346–57.
- 36. Wang J, Ren J, Li M, Wu F-X. Identification of hierarchical and overlapping functional modules in PPI networks. IEEE Trans Nanobioscience. 2012;11(4):386–93. pmid:22955967
- 37. Zhou X, Zhao X, Liu Y, Sun G. A game theoretic algorithm to detect overlapping community structure in networks. Physics Letters A. 2018;382(13):872–9.
- 38. Jie C, Binbin L, Shu Z, Z H A N G Y. Overlapping community detection model based on a modularity-aware graph autoencoder. Journal of Tsinghua University (Science and Technology). 2024;64(8):1319–29.
- 39. Yuan S, Zeng H, Zuo Z, Wang C. Overlapping community detection on complex networks with Graph Convolutional Networks. Computer Communications. 2023;199:62–71.
- 40. Sismanis K, Potikas P, Souliou D, Pagourtzis A. Overlapping community detection using graph attention networks. Future Generation Computer Systems. 2025;163:107529.
- 41. Zachary WW. An information flow model for conflict and fission in small groups. Journal of Anthropological Research. 1977;33(4):452–73.
- 42. Lusseau D, Schneider K, Boisseau OJ, Haase P, Slooten E, Dawson SM. The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology. 2003;54(4):396–405.
- 43. Gleiser PM, Danon L. Community structure in jazz. Advances in Complex Systems. 2003;6(04):565–73.
- 44. Boguna M, Pastor-Satorras R, Dıaz-Guilera A, Arenas A. Models of social networks based on social distance attachment. Physical Review E—Statistical, Nonlinear, and Soft Matter Physics. 2004;70(5):056122.
- 45. Lancichinetti A, Fortunato S. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys Rev E Stat Nonlin Soft Matter Phys. 2009;80(1 Pt 2):016118. pmid:19658785