Abstract
The essence of the influence maximization (IM) problem is to identify a set of seed nodes such that the number of nodes ultimately affected in the network is maximized under a given spreading model. In the field of influence maximization research, the investigation of seed node identification algorithms is a hot yet challenging topic. Although conventional greedy algorithms and heuristic algorithms perform well, their efficiency remains a challenge when applied to large-scale social networks. In recent years, swarm intelligence-based optimization algorithms have seen increasing application to this problem, with notable improvements in performance. However, the efficiency of these swarm intelligence-based algorithms still needs to be improved in large-scale social networks. To address this issue, a parallel discrete crow search algorithm (PDCSA) designed for parallel computing is proposed. Exploiting its evolution characteristics, PDCSA makes full use of the efficiency advantage of parallel computing to reduce the time required to solve IM problems. The results of experiments conducted on six datasets show that PDCSA achieves performance comparable to state-of-the-art algorithms, with the added advantages of high efficiency and robustness.
Citation: Han L, Yang K, Ming Y, Tang J (2025) PDCSA: A parallel discrete crow search algorithm for influence maximization in social networks. PLoS One 20(8): e0329350. https://doi.org/10.1371/journal.pone.0329350
Editor: Kuldeep Singh, University of Delhi, INDIA
Received: November 25, 2024; Accepted: July 15, 2025; Published: August 5, 2025
Copyright: © 2025 Han et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: This work was partially supported by the Gansu Sci&Tech Program under Grant No. 22JR11RA134, Gansu Provincial Fund for Distinguished Young Scholars under Grant No. 23JRRA766, National Social Science Fund of China under Grant No. 21BTJ042, Financial Statistics Research Integration Team of Lanzhou University of Finance and Economics under Grant No.XKKYRHTD202304. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
There are many types of network graphs presently, including supply chain networks [1], software-defined networks [2], social networks [3], etc. Based on social media, social networks like TikTok and Twitter are ubiquitous, as they enable people to follow what they are interested in, share what they find interesting, and get closer to their friends. These networks have become an indispensable part of real social interaction and significantly influence people’s lives [4]. In today’s complex information environment, individuals are increasingly inclined to accept information provided by well-known individuals or reputable organizations. This tendency can be attributed to the desire to reduce the cognitive effort and resources required to independently verify the accuracy of information. Similarly, when a new product is launched, a critical challenge arises: how to strategically select an initial group of users within a social network to maximize the product’s word-of-mouth effect. Real-world problems of this nature can be abstracted into the Influence Maximization (IM) problem in social networks, which has a wide range of practical applications in marketing, information dissemination, and social behavior analysis. Since the initial proposal and formalization of the IM problem by Kempe et al. [5], many scholars have studied this problem and put forward many methods to solve it.
In this work, we propose a parallelized discrete crow search algorithm (PDCSA) to solve the problem of seed node set identification in large-scale social networks, which is well suited to multi-threaded concurrent computing. In summary, the primary contributions of this work are as follows:
- Based on the network structure, the position vector and memory vector of the crow flock are encoded discretely.
- An innovative search strategy is designed that synergizes local search with global exploration through random walk-based diversification.
- The framework of Parallel Discrete Crow Search Algorithm was developed to enable efficient parallel computation for influence maximization in large-scale social networks.
The remainder of this article is structured as follows. The relevant research is reviewed in Related work section, while Preliminaries section depicts the preliminaries and definitions used in this investigation. The framework of PDCSA is presented in Proposed method section. The grid search strategy utilized to determine the optimal parameter values of the PDCSA for the IM problem, and the performance evaluation of PDCSA on six experimental networks, along with the analysis of the results, is presented in Experiments section. The concluding remarks and future work directions are presented in the last section.
Related work
The IM problem is an NP-Hard problem [6]. Addressing it involves tackling two core challenges: the accurate evaluation of node influence and the effective reduction of influence overlap among selected nodes. To solve these problems, researchers have explored diverse approaches and proposed numerous strategies aimed at improving both the effectiveness and efficiency of IM in social networks.
Centrality-based approaches
A common approach to addressing the first challenge involves utilizing various centrality indices to quantify the influence of individual nodes, such as Degree Centrality [7], Betweenness Centrality [8], Closeness Centrality, Eigenvector Centrality, etc. Degree Centrality is determined by the number of edges incident to a node. A higher degree centrality indicates that a node has more connections, and thus potentially exerts greater influence within the network. Betweenness Centrality is based on the role a node plays in the shortest paths across the network. Specifically, it quantifies the proportion of all shortest paths between pairs of nodes that pass through the given node. This measure reflects the node’s influence under the assumption that a higher proportion of such paths indicates a more significant bridging role, and thus greater potential for influence within the network. Many other network centrality measures similar to Degree Centrality and Betweenness Centrality are available for node influence evaluation.
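As a concrete illustration of the simplest of these measures, the following sketch computes normalized degree centrality from an edge list; the graph representation and function name are our own, not from the paper.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Degree centrality of every node in an undirected graph given as an
// edge list: node -> degree / (n - 1), the usual normalization.
std::unordered_map<int, double> degree_centrality(
    const std::vector<std::pair<int, int>>& edges, int n) {
    std::unordered_map<int, double> deg;
    for (int v = 0; v < n; ++v) deg[v] = 0.0;  // include isolated nodes
    for (const auto& e : edges) {
        deg[e.first] += 1.0;   // each undirected edge contributes
        deg[e.second] += 1.0;  // to both endpoints
    }
    if (n > 1)
        for (auto& kv : deg) kv.second /= (n - 1);
    return deg;
}
```

On a star graph the hub attains the maximum value 1.0, matching the intuition above that more connections imply more potential influence.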
Heuristic algorithms
While node centrality-based methods are effective in identifying influential nodes, they generally fail to account for the overlapping influence among seed nodes. To address this limitation, scholars have proposed various optimization approaches aimed at mitigating the overlapping influence between seed nodes. In the early stages of research on the IM problem, Kempe et al. [5] proposed a greedy mechanism-based algorithm that demonstrated strong performance. This approach iteratively selects nodes that maximize the spread of influence by evaluating all nodes in the network. Obviously, the computational complexity of this method becomes high as network size increases, leading to severe efficiency issues. To address this limitation, the CELF [9] and CELF++ [10] algorithms, both based on the greedy mechanism, were proposed to enhance the original greedy algorithm. Experiments demonstrate that the enhanced algorithms yield performance comparable to the original greedy algorithm, while exhibiting significantly improved time efficiency.
The rapid expansion of real-world networks has imposed higher requirements on the scalability and efficiency of algorithms. In recent years, significant research progress has been made in applying swarm intelligence algorithms to solve the IM problem. Various strategies have been developed to address this challenge using swarm intelligence techniques, such as the Artificial Bee Colony (ABC) [11], Ant Lion Optimizer (ALO) [12], Whale Optimization Algorithm (WOA) [13], and Gray Wolf Optimizer (GWO) [14]. A summary of some of the pertinent study findings is given in Table 1. The summary in Table 1 indicates that, despite their effectiveness in identifying influential nodes, swarm intelligence-based approaches often entail high computational complexity when applied to the IM problem under standard propagation models.
Hybrid-based approaches
In recent years, scholars have integrated the fundamental concepts of network characteristics with swarm intelligence algorithms to enhance efficiency and performance. For example, in the DBA algorithm, restricting the local search to the Clique structure improves the convergence and stability of the algorithm [23]. Gong et al. [24] presented a community-based memetic algorithm to address the IM problem, and their experimental results on a real social network showed that its performance was 12.5%, 13.2% and 173.5% higher than that of the Degree Centrality, PageRank and Random algorithms, respectively. Taherinia et al. [25] introduced the LGFIM algorithm as a solution to the IM problem in large-scale social networks. This approach consists of two stages: in the first stage, the search space is reduced using community detection, and in the second stage, three heuristic algorithms refine the optimal community structure. The increasing scale of real-world networks has driven researchers to prioritize not only the performance but also the computational efficiency of algorithms used to address IM problems.
The aforementioned hybrid algorithms, which combine swarm intelligence optimization algorithms with network centrality, significantly enhance effectiveness compared to single methods. However, in large-scale networks, their time complexity increases rather than decreases.
Deep learning-driven approaches
With the rapid advancement and success of deep learning across multiple domains, an increasing number of researchers have turned their attention to applying these techniques to the IM problem. Wang et al. [26] proposed an end-to-end trained dual-coupled graph neural network algorithm called DGN for the selection of seed nodes in the IM problem. Chen et al. [27] introduced the ToupleGDD algorithm, which combines three coupled graph neural networks (GNNs) for network embedding to learn the network nodes, thereby identifying the set of the most influential nodes. Kumar et al. [28] utilized graph embedding and GNNs to transform the IM problem into a pseudo-regression problem and proposed a method called the SGNN algorithm to solve the influence maximization problem in large-scale social networks. Based on the adjacency matrix of the network structure and a convolutional neural network, Yu et al. [29] proposed an effective method, RCNN, to identify the set of key nodes with the strongest propagation in networks. Ou et al. [30] integrated multi-layer structure attributes into the RCNN model, resulting in the Multi-Channel RCNN (M-RCNN) model. The experiments showed that, compared with the RCNN algorithm, M-RCNN achieved an average accuracy improvement of 9.25%. While deep learning-based methods have shown considerable potential in solving the IM problem, they still face certain limitations in comparison with swarm intelligence-based and other conventional optimization methods. For instance, deep learning-based methods require labels for training. However, in real-world networks, such label information is often either unavailable or of low quality. Moreover, most of these approaches neglect the overlapping influence among nodes.
Inspired by the operational principles of convolutional neural networks, this paper proposes a parallel computing-based framework for a swarm intelligence optimization algorithm to address the IM problem. The approach effectively combines the strengths of swarm intelligence with the efficiency gains of parallel computing, achieving both high performance and improved computational speed.
Preliminaries
Definitions.
Definition 1 (Social Network) Given a graph network $G = (V, E)$, V denotes the set of individuals and E denotes the set of relationships between individuals; $|V|$ and $|E|$ are the numbers of nodes and edges in the graph network, respectively.
Definition 2 (Multithreaded parallel computing) The technique of multithreaded parallel computing is extensively utilized in modern multi-threaded operating systems that run on multi-core processor architectures. It allows for the simultaneous execution of multiple threads on a single machine, where each thread can handle a separate task.
Definition 3 (N-hop neighbor node set) The N-hop neighbor nodes of a node are those nodes that can be reached from the node through a shortest path of exactly N edges. The set of nodes with this characteristic forms the N-hop neighbor set of the node.
Definition 4 (Seed node set) A collection of nodes that can both act as the source of some information and output that information to their direct neighbors is known as the seed node set. In social networks, it can be expressed as $S = \{s_1, s_2, \ldots, s_K\} \subseteq V$.
Definition 5 (Influence Maximization) For a given seed node set S and a specific diffusion model, the influence spread P(S) is defined as the expected number of influenced nodes after the completion of the influence propagation process. The goal of influence maximization (IM) is to select a seed set of size K such that P(S) is maximized, i.e., $S^* = \arg\max_{S \subseteq V,\, |S| = K} P(S)$.
Definition 6 (Overlapping Influence) Overlapping influence exists among nodes in real networks. Let the influence of node u be denoted by $I(u)$, that of node v by $I(v)$, and let $I(u, v)$ represent the combined influence of nodes u and v. In real-world networks, there typically holds $I(u, v) < I(u) + I(v)$, reflecting the presence of overlapping influence.
Diffusion model
To effectively analyze and solve the IM problem, it is essential to adopt an appropriate diffusion model that simulates the mechanisms of influence spread across the network. The basic influence propagation models commonly used in social network analysis include the Independent Cascade (IC) model, Weighted Cascade (WC) model, Linear Threshold (LT) model, and Susceptible Infected (SI) model. In each of these models, a node can exist in one of two states: activated (influenced) or inactive (uninfluenced). Many investigations of the IM problem based on the IC and WC models have been conducted [15–18]. In these two models, influence propagates from the seed nodes along a time series. For example, only the seed nodes are active at time T, and these nodes try to activate the inactive nodes in their one-hop neighborhood with probability p. At time T + 1, the set of active nodes is the union of the previously active nodes and the newly activated ones, and the inactive nodes in the one-hop neighborhood of these active nodes are activated in the same way. This activation process continues until no additional nodes are activated in the network. Let the probability that node v is successfully activated by node u be $p_{uv}$; then $p_{uv}$ can be expressed as:

$$p_{uv} = p \quad (1)$$

$$p_{uv} = \frac{1}{d_v} \quad (2)$$

In the IC model, the activation probability is determined by Eq (1) and is an independent constant. In the WC model, the activation probability is determined by Eq (2) and depends on the degree $d_v$ of node v. In this paper, the IC model is used to simulate influence spread.
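A single Monte Carlo run of the IC diffusion described above can be sketched as follows; the adjacency-list representation and function name are ours, and in practice P(S) is estimated by averaging many such runs.

```cpp
#include <queue>
#include <random>
#include <vector>

// One Monte Carlo run of the Independent Cascade model: every newly
// activated node gets a single chance to activate each still-inactive
// neighbor with the same constant probability p (Eq (1)).
// Returns the number of nodes active when the cascade dies out.
int ic_spread_once(const std::vector<std::vector<int>>& adj,
                   const std::vector<int>& seeds, double p,
                   std::mt19937& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::vector<char> active(adj.size(), 0);
    std::queue<int> frontier;
    for (int s : seeds)
        if (!active[s]) { active[s] = 1; frontier.push(s); }
    int spread = static_cast<int>(frontier.size());
    while (!frontier.empty()) {
        int u = frontier.front(); frontier.pop();
        for (int v : adj[u])
            if (!active[v] && coin(rng) < p) {  // one activation attempt
                active[v] = 1; frontier.push(v); ++spread;
            }
    }
    return spread;
}
```

Replacing the constant p with 1/d_v for the target node v would yield the WC variant of Eq (2).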
Fitness function
In treating the IM problem as an optimization task, the influence spread is often approximated by a simplified fitness function, which serves as the target for optimization. Various optimization strategies are employed to identify the seed node set that maximizes this function. Jiang et al. [17] presented an Expected Diffusion Value (EDV) function to approximate the influence of the seed node set in a simulated annealing algorithm; the EDV function estimates this influence from the one-hop neighborhood of the seed node set. Gong et al. [18] introduced the Local Influence Estimation (LIE) objective function, based on the two-hop neighbors of the seed nodes, which has proved a good approximation in many optimization algorithms [19,23]. The LIE function (Eq (3)) is built from the following quantities: $d^{(1)}_u$, the sum of node degrees in node u's one-hop neighborhood; $d^{(2)}_u$, the sum of node degrees in its two-hop neighborhood; $p^*$, the probability that a node successfully activates a neighboring node in each attempt; and $e_u$, the number of edges between the one-hop and two-hop neighbor sets of node u. In this work, the LIE function serves as an approximate evaluation of the local influence of a node.
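Such fitness functions are simple set functions over node neighborhoods and are straightforward to implement. Below is a sketch of the one-hop EDV function of Jiang et al. [17] as it is commonly stated, EDV(S) = |S| + sum over one-hop neighbors v of 1 − (1 − p)^tau(v), where tau(v) counts v's seed neighbors; the adjacency-list representation and function name are ours, and this is not the two-hop LIE expression of Eq (3).

```cpp
#include <cmath>
#include <set>
#include <vector>

// Expected Diffusion Value of a seed set S under activation
// probability p: |S| plus, for each one-hop neighbor v of S outside S,
// the probability 1 - (1 - p)^tau(v) that at least one of its tau(v)
// seed neighbors activates it.
double edv(const std::vector<std::vector<int>>& adj,
           const std::set<int>& seeds, double p) {
    std::vector<int> tau(adj.size(), 0);  // seed neighbors per node
    for (int s : seeds)
        for (int v : adj[s])
            if (!seeds.count(v)) ++tau[v];
    double value = static_cast<double>(seeds.size());
    for (std::size_t v = 0; v < adj.size(); ++v)
        if (tau[v] > 0) value += 1.0 - std::pow(1.0 - p, tau[v]);
    return value;
}
```

The LIE function extends this idea to the two-hop neighborhood using the quantities listed above.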
Basic crow search algorithm
The basic Crow Search Algorithm (CSA) is a bionic intelligence optimization algorithm derived from the behavior of crows when hiding and finding food [31]. The basic CSA models the foraging behavior of a group of N crows in a d-dimensional search space. Each crow is represented by a position vector that denotes its current location, and a memory vector that stores its best-found solution over the iterations. The position vector of crow i in the t-th generation can be expressed as $x^{i,t} = (x^{i,t}_1, x^{i,t}_2, \ldots, x^{i,t}_d)$, $t = 1, 2, \ldots, t_{max}$, where $t_{max}$ is the maximum number of iterations. The memory vector of crow i can be expressed as $m^{i,t}$, the best location where crow i has stored its food. The food hiding locations of all crows in the flock throughout the iterations are stored in a memory matrix M.

The optimization mechanism of basic CSA is inspired by the greedy behavior of crows. The algorithm simulates the process by which crows follow each other to exploit food sources more effectively, guiding the search toward optimal solutions. At iteration t, let $m^{j,t}$ denote the food source location memorized by crow j. Crow i may choose to follow crow j in an attempt to access this potential food source, thereby updating its own position $x^{i,t}$ based on this observation. In this case, the new position vector $x^{i,t+1}$ is determined by whether crow j perceives the tracking behavior of crow i. This tracking-versus-anti-tracking behavior can be expressed as:

$$x^{i,t+1} = \begin{cases} x^{i,t} + r_i \cdot fl^{i,t} \cdot (m^{j,t} - x^{i,t}), & r_j \geq AP^{j,t} \\ \text{a random position}, & \text{otherwise} \end{cases}$$

where $r_i$ and $r_j$ are uniformly distributed random numbers in the range [0, 1], $AP^{j,t}$ represents the perceived probability of crow j, and $fl^{i,t}$ denotes the flight length of crow i in the current iteration. $AP$ and $fl$ are two important parameters that control the search range of the algorithm, where $fl$ represents the search length of the crow; its meaning in the basic CSA algorithm is shown in Fig 1(a).
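The continuous update rule of basic CSA can be sketched directly; this is an illustrative implementation of the standard rule under our own function and parameter names, with a bounded search box [lo, hi] per dimension as an added assumption.

```cpp
#include <random>
#include <vector>

// One position update of basic continuous CSA: crow i either follows
// the memory m_j of a randomly chosen crow j (when r_j >= AP, i.e.
// crow j does not notice being tracked) or flies to a random position
// in [lo, hi]^d.
std::vector<double> csa_update(const std::vector<double>& xi,
                               const std::vector<double>& mj,
                               double fl, double AP, double lo, double hi,
                               std::mt19937& rng) {
    std::uniform_real_distribution<double> U(0.0, 1.0);
    std::vector<double> next(xi.size());
    if (U(rng) >= AP) {  // crow j unaware: follow its memory
        for (std::size_t k = 0; k < xi.size(); ++k)
            next[k] = xi[k] + U(rng) * fl * (mj[k] - xi[k]);
    } else {             // crow j aware: random relocation
        std::uniform_real_distribution<double> R(lo, hi);
        for (auto& v : next) v = R(rng);
    }
    return next;
}
```

PDCSA keeps this tracking-versus-relocation structure but replaces the arithmetic on real vectors with discrete operations on node sets, as described next.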
Proposed method
By discretizing the position and memory vectors of the basic CSA and reconstructing its search mechanism, a Parallel Discrete Crow Search Algorithm (PDCSA) tailored for solving the IM problem is proposed in this paper.
Discretized coding and optimization rules
The implementation of the PDCSA begins with a preprocessing stage in which all nodes in the network are assigned unique identifiers. This is followed by the discrete encoding of the position and memory vectors. Based on this encoding, the algorithm reconstructs its search and update mechanisms to work within the discretized framework.
Network node recoding: Given a network G, let the number of nodes be N, and re-encode each node with a positive integer from 1 to N to ensure the uniqueness of node numbers in the network. The position vector and memory vector are defined and encoded based on this network node recoding.
Position vector encoding: With K designated as the number of seed nodes, the position and memory vectors each consist of K nodes that represent a current or previously found solution. The position vector of crow i can be expressed as $x^{i,t} = (x^{i,t}_1, x^{i,t}_2, \ldots, x^{i,t}_K)$, where $x^{i,t}_k \in \{1, 2, \ldots, N\}$ represents a node code of the network in the PDCSA. The position vectors of the crows in generation t are expressed as:

$$X^t = (x^{1,t}, x^{2,t}, \ldots, x^{M,t})$$

where M represents the number of crows in the crow population.
Memory vector encoding: The memory vector is employed to maintain the optimal solution throughout the search process. If t is the current iteration number, the memory vector saves the food hiding position found by the crow in the previous t−1 generations, which can be expressed as:

$$m^{i,t} = (m^{i,t}_1, m^{i,t}_2, \ldots, m^{i,t}_K)$$

where $m^{i,t}$ represents the best location to hide food found by crow i in the first t−1 generations.
The encoding of these three types of vectors and their relationships are shown in Fig 2(b). As the optimization proceeds, the position and memory vectors of each crow are updated by replacing their constituent nodes to improve solution quality.
Optimization rules reconstruction: Based on the above encoding rules, the discretized optimization process of PDCSA is constructed as Eqs (7) and (8). The symbol $\oplus$ represents a logical intersection operation that determines whether two vectors contain identical nodes: if the node $m^{j,t}_k$ of the memory vector $m^{j,t}$ exists in the position vector $x^{i,t}$, this operation returns 0; otherwise, it returns 1. An example of this crossover operation between the memory vector of crow j and the position vector of crow i is shown in Fig 2. LocalSearch() is a local search mechanism limited by the parameters $r$ and $S$, where $r$ represents the probability of randomly selecting a nearest-neighbor node of a node in the current position vector, and $S$ represents the S-hop neighbor range of a node in the current position vector; its meaning is shown in Fig 1(b). The operator $\odot$ in Eq (7) performs the replacement operation: if the result value at the corresponding position after the $\oplus$ operation is 1, the node is updated and replaced by the $\odot$ operation.
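The intersection test and replacement step described above can be sketched on plain node-ID vectors. This is an illustrative simplification under our own names: here the replacement directly copies the memory node into the position vector, whereas in PDCSA the replacement node is instead chosen by the S-hop local search.

```cpp
#include <unordered_set>
#include <vector>

// Intersection test: mask[k] = 0 if the k-th node of the tracked
// crow's memory vector already occurs anywhere in the follower's
// position vector, 1 otherwise.
std::vector<int> intersect_mask(const std::vector<int>& memory_j,
                                const std::vector<int>& position_i) {
    std::unordered_set<int> have(position_i.begin(), position_i.end());
    std::vector<int> mask(memory_j.size());
    for (std::size_t k = 0; k < memory_j.size(); ++k)
        mask[k] = have.count(memory_j[k]) ? 0 : 1;
    return mask;
}

// Replacement step (simplified): where mask is 1, take the memory
// node; where it is 0, keep the current node.
std::vector<int> replace_nodes(const std::vector<int>& position_i,
                               const std::vector<int>& memory_j,
                               const std::vector<int>& mask) {
    std::vector<int> out(position_i);
    for (std::size_t k = 0; k < out.size(); ++k)
        if (mask[k]) out[k] = memory_j[k];
    return out;
}
```

The mask thus marks exactly the positions where the two candidate solutions disagree, which are the positions worth searching around.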
Framework of PDCSA
Based on the basic CSA algorithm framework, along with the aforementioned discrete encoding strategy and optimization rules, the PDCSA algorithm is structured into five main steps:
Step 1 Initialization. Set the control parameters: crow population size M, seed node set size K, maximum number of iterations $g_{max}$, local search range S, and perception probability AP. Initialize the position and memory vectors of each crow with K nodes.
Step 2 Calculation of the LIE value. Using Eq (3), the algorithm evaluates the LIE value for each node present in the position vector of every crow. This step supports subsequent optimization decisions by identifying nodes with higher local influence potential.
Step 3 Generation of New Position Vectors. The new position vector of Crow i is generated based on either Eq (7) or Eq (8). Subsequently, the LIE values of all nodes in the new vector are evaluated, and the most favorable LIE value is selected to form the updated position vector.
Step 4 Update of the Memory Vector. According to Eq (9), the memory vector of each crow is conditionally updated based on the comparison between the LIE values of the current and previous position vectors. Specifically, if the influence potential of the new position vector exceeds that of the former, the memory vector is replaced with the new one; otherwise, it remains unaltered.
Step 5 Termination Check. The algorithm proceeds to check whether the current iteration count has reached $g_{max}$. If not, the optimization loop (Steps 3–4) continues. Once the maximum iteration limit is attained, the algorithm identifies the globally best-performing seed node set by selecting the memory vector with the highest LIE value as the final solution.
The flowchart depicting the overall structure and execution steps of the proposed PDCSA algorithm is shown in Fig 3.
The implementation of PDCSA
A detailed description of the PDCSA algorithm for solving IM problems, based on the five steps outlined above, is provided in Algorithm 1. It is worth mentioning that when initializing the position vector of the crows, we select nodes with the highest degrees from the network to form the initial seed node set. This strategy helps accelerate the convergence speed by starting the search from more influential candidate solutions. Algorithm 1 consists of two key functions: LocalSearch(), which updates the position vectors based on Eq (7) to enhance solution accuracy, and RandomExploration(), which employs Eq (8) to diversify the search process and escape local optima.
The PDCSA algorithm is designed with a decentralized structure: each crow operates with its own position and memory vectors, as depicted in Fig 2(b). The optimization proceeds by iterating over M loops, where each loop involves only pairwise interactions between crows. As these operations do not require shared computation or synchronization among crows, the algorithm naturally lends itself to parallel execution.
In this study, OpenMP is used to conduct parallel computing experiments. In Algorithm 1, the statement #pragma omp parallel for marks the parallel region and instructs the compiler that the following for loop should be executed in parallel, meaning that the iterations of the loop are distributed across multiple threads for concurrent execution. In an experimental environment with multicore, multithreaded processors, when performing parallel computing in C++, only Step 3 and Step 4 of the PDCSA framework require parallel computation, so only their code needs to be marked with the #pragma omp parallel directive; the compiler then automatically uses multiple threads for the independent iterations. In the PDCSA algorithm framework, after Step 3 and Step 4, the #pragma omp barrier directive is used to wait for all threads to complete their tasks before proceeding with the selection of the local optimum, thereby completing one iteration.
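The shape of this parallel loop can be sketched minimally as follows; evaluate_crow is only a placeholder for the per-crow LocalSearch/RandomExploration work of Step 3, not the actual PDCSA update. Without OpenMP support, the pragma is ignored and the loop simply runs sequentially.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Placeholder for the independent per-crow update of Step 3.
double evaluate_crow(int i) { return static_cast<double>(i) * 0.5; }

// Each iteration writes only fitness[i], so there are no shared
// writes across iterations and the loop parallelizes safely.
std::vector<double> update_flock(int M) {
    std::vector<double> fitness(M);
#pragma omp parallel for
    for (int i = 0; i < M; ++i)
        fitness[i] = evaluate_crow(i);
    // Threads synchronize here before the serial selection step.
    return fitness;
}
```

Note that a parallel-for worksharing construct already ends with an implicit barrier; the explicit #pragma omp barrier in the text makes the synchronization point visible in the framework.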
Algorithm 1. Framework of PDCSA algorithm based on parallel computing
Input: G = (V, E), seed node set size K, maximum number of iterations gmax, crow population size M
Output: The best seed set Snode
1: Initialize iterator g = 0
2: Define the perception probability AP
3: Position vector X ← Select K*M nodes
4: Memory vector M ← Position vector X
5: WHILE g < gmax {
6: #pragma omp parallel for
7: FOR i = 1:M (all crows of the flock) {
8: A crow j is randomly selected as the tracked object
9: A random number rj is generated
10: if rj ≥ AP
11: Xi ← LocalSearch(V): According to Eq. (7)
12: else
13: Xi ← RandomExploration(V): This corresponds to Eq. (8)
14: endif
15: }
16: #pragma omp barrier
17: Evaluate the new position vector of the crows
18: Update the memory vector of crows according to Eq (9)
19: }
20: Snode ← Select the Max LIE value from the memory vectors
Local and global search mechanisms
Local search strategy.
In the PDCSA algorithm, the execution of a local search is determined by the perception probability AP. Specifically, a local search is conducted when the random probability satisfies $r_j \geq AP$. Since the position vector is composed of K nodes, the particular nodes selected for local search within the position vector are determined through the intersection operation $\oplus$ between the memory vector m and the position vector x, as formally defined in Eq (10).
A local search operation is applied exclusively to those nodes $x_i$ for which the intersection result equals 1. The local search procedure is mathematically formulated in Eq (11), where $N_S(x_i)$ denotes the set of S-hop neighboring nodes of node $x_i$, and a uniformly generated random number is used to select a node within this neighborhood.
Based on the aforementioned formulation of the search strategy, the local search rule is defined as Eq (7). Specifically, the algorithm computes the intersection between the current position vector $x^{i,t}$ of crow i and the memory vector $m^{j,t}$ of the tracked crow j. This intersection operation identifies the nodes that differ between the two vectors. The decision to perform a local search around a node in the position vector is determined by the outcome of the intersection: a value of 1 at a specific position indicates that the neighborhood of this node requires a local search; conversely, if the value is 0, the node remains unaltered.
Algorithm 2. Local search strategy based on neighbor node domain
Input: xit, mjt, S
Output: xi_new
1: Vnode← mjt∩ xit
2: FOR i = 1:K DO {
3: IF Vnode(i)==1 THEN
4: NBNodeSETs ←CalculateNBNode(Vnode(i),S)
5: LIEValue = LIECalculate(xit,NBNodeSETs)
6: xi_new ← MaxLIENode(LIEValue)
7: END IF
8: }
Given that crows’ positions are represented as sets of nodes in the network, their flight length is not Euclidean but rather a shift among connected nodes. Therefore, the local search scope is naturally constrained to the nodes included in the current position vector, guiding the search toward locally optimal solutions. Fig 1(b) shows the local search mechanism based on the S-hop neighbor node set. In the S-hop local search mechanism, the value of S is a positive natural number, i.e., $S \in \{1, 2, 3, \ldots\}$. The parameter S determines the depth of the local search within the network. For S=1, the search is confined to the immediate neighbors of each node for which the intersection result in the position vector is 1, as shown by the dotted circle in Fig 1(b). For S=2, the search includes nodes within the two-hop domain, indicated by the solid circle. Higher values of S enlarge the search scope to encompass more distant neighbors. The S-hop-based local search mechanism is described in Algorithm 2.
In Algorithm 2, the function CalculateNBNode() computes the S-hop neighbor node domain of node Vnode(i) and returns its S-hop neighbor node set; its pseudo-code is described in Algorithm 3. The function LIECalculate() calculates the LIE value of the position vector with each candidate node of the NBNodeSETs set substituted in. The function MaxLIENode() then selects the node with the best LIE value from the candidates returned by LIECalculate() and replaces the corresponding node in the position vector $x^{i,t}$; no duplicate nodes remain in the position vector xi_new after the replacement operation.
Algorithm 3. CalculateNBNode(): Search for S-hop neighbor node-set
Input: S, Vnode(i)
Output: The S-hop area node set: Nodeset
1: Neighbors ← DirectNeighbors(Vnode(i))
2: Nodeset ← Neighbors
3: Neighbor ← Ø
4: Step = 1
5: WHILE (Step < S) {
6: FOR node ∈ Neighbors DO
7: Nodeset ← Nodeset ∪ DirectNeighbors(node)
8: Neighbor ← Neighbor ∪ DirectNeighbors(node)
9: END FOR
10: Neighbors ← Neighbor
11: Neighbor ← Ø
12: Step ← Step + 1
13: }
14: Nodeset ← RemoveDuplicates(Nodeset)
In Algorithm 3, the first-order neighbors of node Vnode(i) are stored in the node set Neighbors; this set is then traversed repeatedly to collect the neighbor nodes up to S hops, which are merged into the node set Nodeset. After the S-hop neighbors of each node have been gathered, duplicate nodes in Nodeset are removed to ensure that the returned S-hop neighbor node set contains no duplicates.
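Algorithm 3 can equally be realized as a breadth-first search with distance tracking, which visits each node once and therefore makes the final RemoveDuplicates pass unnecessary. This sketch assumes an adjacency-list graph (our representation) and excludes the start node itself from the result.

```cpp
#include <queue>
#include <vector>

// All nodes reachable from `start` in at most S hops, excluding
// `start` itself. dist[v] == -1 marks unvisited nodes, so no node is
// collected twice.
std::vector<int> s_hop_neighbors(const std::vector<std::vector<int>>& adj,
                                 int start, int S) {
    std::vector<int> dist(adj.size(), -1);
    std::vector<int> result;
    std::queue<int> q;
    dist[start] = 0;
    q.push(start);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        if (dist[u] == S) continue;  // do not expand past S hops
        for (int v : adj[u])
            if (dist[v] == -1) {
                dist[v] = dist[u] + 1;
                result.push_back(v);
                q.push(v);
            }
    }
    return result;
}
```

For an average degree d, the cost of this search grows roughly as O(d^S), which is why small values of S are preferred in the local search.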
Global exploration strategy
Global exploration is carried out when the randomly generated probability value is less than the perception probability AP. To ensure the global exploration capability of the algorithm, the PDCSA algorithm adopts uniformly distributed random numbers to realize random walks. Let the number of network nodes be $n = |V|$, the number of seed nodes be K, and the size of the crow swarm be M. The global search space L is the collection of all possible subsets of size K selected from the candidate nodes, defined as in Eq (12), with its size $|L|$ specified in Eq (13):

$$L = \{ S \subseteq V : |S| = K \} \quad (12)$$

$$|L| = \binom{n}{K} \quad (13)$$

Given a perception probability AP, the probability of triggering a global search in any iteration is AP. Assuming the search process follows a geometric distribution, the expected number of iterations t until a global search occurs can be derived as $E[t] = 1/AP$.
It is evident that the random walk-based global search asymptotically converges to the global optimal solution over the course of the iterative process. The random walk strategy is described in Algorithm 4.
Algorithm 4. Global exploration based on random walk strategy
Input: seed node set size K
Output: Xinew
1: index← 1
2: Xinew ← Ø
3: WHILE index <= K DO {
4: Nodetemp ← RandomSelect(V, index)
5: IF Nodetemp not in Xinew THEN
6: Xinew ← Xinew ∪ Nodetemp
7: index = index + 1
8: END IF
9: }
Computational complexity
The performance of an optimization algorithm in solving IM problems is typically assessed from two aspects: its computational complexity, indicating runtime efficiency, and its effectiveness, which evaluates how well the algorithm identifies influential seed nodes. In this section, we give a theoretical analysis of the computational complexity of the PDCSA algorithm. Let the number of network nodes be V, the maximum number of iterations Gmax, the seed node set size K, the swarm size N, and the average degree of the network ⟨d⟩. Without considering the parallelization operation, the computational complexity of the algorithm is as follows. In the local search of the S-hop, the time complexity of the S-hop neighbor search operation is O(K·⟨d⟩^S) and that of the update of Xi is O(K). The time complexity of the random walk-based global search is O(K), updating the memory vectors is O(K), and calculating the LIE value is O(K·⟨d⟩²). Therefore, the time complexity of a single crow in a single iteration is O(K·⟨d⟩^S + K·⟨d⟩²). According to the operation rules of the symbol O, the overall computational complexity of PDCSA is O(Gmax·N·K·⟨d⟩^S).
Based on the above analysis, it can be observed from Table 1 that the computational complexity of the PDCSA algorithm is comparable to that of state-of-the-art swarm intelligence algorithms: it is on par with DPSO and ELDPSO, though higher than that of the DBA and IM-SSO algorithms. The aforementioned comparison is derived from theoretical analysis. It is noteworthy that the PDCSA algorithm is inherently amenable to parallel implementation on multi-core processors. Consequently, when the speedup from parallelization is considered, its actual efficiency has the potential to surpass that of state-of-the-art swarm intelligence optimization algorithms.
Experiments
In this section, a series of experiments are conducted to evaluate the performance of the PDCSA algorithm. First, we determine the optimal parameter settings for PDCSA across six experimental networks. Next, the LIE values are compared with those of other swarm intelligence-based algorithms. Finally, the performance and the time efficiency are compared with the state-of-the-art algorithms.
The relevant algorithms are implemented in C++, and the experiments are carried out on a PC platform equipped with an Intel(R) Core(TM) i7-8700 CPU running at 4.6 GHz with 32 GB of RAM. The Independent Cascade (IC) propagation model is used for influence spread evaluation, and the maximum size of the seed node set is set to 50 in all experiments.
Datasets and baseline algorithms
To evaluate the effectiveness of the proposed PDCSA algorithm, six large-scale real-world networks are selected for the experiments. Table 2 summarizes the characteristics of the six experimental networks used in this study. Among them, SynRand is a synthetically generated network with a Gaussian degree distribution, consisting of 14,991 nodes and 56,152 edges. PGP is a social network based on the Pretty Good Privacy (PGP) encryption algorithm, modeling trust relationships among 10,680 individuals who exchange encrypted messages. CondMat is a collaboration network of authors of papers published in the condensed matter physics category of arXiv. Slashdot is a social network constructed from user interactions on the technology news website Slashdot. All networks are obtained from the Stanford Network Analysis Project (SNAP) repository.
In the evaluation of LIE values, the algorithms DPSO [18], DBA [16], AMPDE [36], DPSO_NDC [37] and Clique_DBA [23] are chosen as comparative baselines. A distinguishing feature of these algorithms is their shared use of the LIE function as the optimization target. Three further state-of-the-art algorithms serve as baselines to compare the performance of PDCSA: CELF [9], the Greedy algorithm [5] and CLDE [38]. Comparative experiments are carried out under the propagation probabilities p = 0.01 and p = 0.05.
- CELF (Cost-Effective Lazy Forward) is an improved greedy algorithm based on strategy lazy forward-selection, which significantly improves efficiency while maintaining high performance.
- DBA (Discrete Bat Algorithm) is a discrete Bat Algorithm based on the network structure, which mainly simulates the foraging behavior of bat swarms to achieve optimization. Experimental results show that the DBA algorithm has advantages in both performance and efficiency in large-scale networks.
- DPSO (Discrete Particle Swarm Optimization) is a swarm intelligent optimization algorithm based on recoding and reconstructing the evolutionary rules of the basic Particle Swarm Optimization algorithm to maximize the influence of social networks.
- The Greedy algorithm iteratively selects, from all candidate nodes, the node that yields the greatest objective function value at each step.
- CLDE (Competitive Learning-driven Differential Evolution) employs a competitive mechanism in which individuals are randomly paired within the population, and each pair competes based on their fitness values to select superior solutions.
- AMPDE (Adaptive Multiple Probabilistic Differential Evolution algorithm) incorporates an adaptive local search mechanism, designed to enhance the search for the optimal solution through the utilization of structural hole nodes and their neighborhoods.
Parameter configuration
The PDCSA algorithm introduces two primary control parameters when addressing the IM problem in social networks: the nearest neighbor domain S and the awareness probability AP. We determine their optimal values through systematic experimentation on a representative network structure, ensuring balanced exploration and exploitation during optimization.
A systematic grid search was performed using four representative real-world networks, CondMat, SynRand, PGP and Email, to identify the most effective parameter configurations for the PDCSA algorithm. First, the four experimental networks differ significantly in scale. Second, from the perspective of network structure, they include both networks with a Gaussian degree distribution (e.g., SynRand) and those with a power-law degree distribution (e.g., CondMat, PGP). In the experiment, the crow population size N was set to 30, and the number of seed nodes K was fixed at 30. The awareness probability AP was varied incrementally from 0.1 to 0.9, and the nearest neighbor domain S was expanded from 1 to 5 hops. For each combination of parameter settings, 50 independent runs were conducted to ensure statistical reliability. The average influence spread across these 50 runs was then computed and used to generate the three-dimensional bar charts shown in Figs 4 and 5. Fig 4 shows the distribution of LIE values when the propagation probability p = 0.01. As shown in Fig 4, the optimal LIE values across the four networks are observed in the parameter region around AP = 0.6 and S = 3. As shown in Fig 5, which depicts the distribution of LIE values at a propagation probability of p = 0.05, the optimal parameter configuration corresponds well with the results observed in Fig 4. This consistency suggests that the optimal parameter setting is relatively insensitive to variations in propagation probability.
In the PDCSA algorithm, the number of iterations is directly related to the algorithm’s convergence. Under the previously determined optimal parameter settings (AP = 0.6, S = 3, K = 30), we tracked the evolution of the best LIE value across the four experimental networks under a propagation probability of p = 0.01. These observations are graphically presented in Fig 6. It can be observed that the global optimal solution is reached within 200 iterations. Therefore, in the PDCSA algorithm, the maximum number of generations Gmax is set to 200.
Comparison of LIE
To evaluate the effectiveness of the PDCSA algorithm, comparative experiments based on the LIE values were conducted across six real-world networks. The experiments were performed under varying sizes of the seed node set K, with values set to 5, 10, 15, 20, 25, 30, 40, and 50, under a propagation probability of p = 0.01. As baseline algorithms, four swarm intelligence algorithms, DPSO, DBA, AMPDE, and CLDE, as well as two improved variants, DPSO_NDC and Clique_DBA, were selected for comparison. Each algorithm was run 10 times on each of the six real-world networks, and the average LIE value over the 10 runs was reported. Fig 7 presents the LIE value evolution curves of the compared algorithms under a propagation probability of p = 0.01.
Fig 7 indicates that PDCSA achieves LIE values on par with AMPDE and superior to DPSO and DBA in large-scale networks, as illustrated in Figs 7(a), 7(e) and 7(f). Furthermore, in small and medium-sized networks, it also outperforms the other swarm intelligence algorithms, as shown in Figs 7(b) and 7(c).
Performance comparison
To evaluate the performance of the proposed PDCSA algorithm, we conducted comparative experiments using the six state-of-the-art algorithms introduced in Section 5.1. The IC model was employed to assess influence diffusion, and Monte Carlo (MC) simulation was used to perform 1000 independent propagation runs on the identified seed node sets. The average influence spread across these simulations was taken as the performance metric, providing a reliable estimate of each algorithm’s effectiveness under the IC model. Under the conditions that the propagation probability p is set to 0.01, and with seed node set sizes ranging from 5 to 50 (i.e., K = 5,10,15,20,25,30,40, 50), the influence spread was evaluated using MC simulation. The resulting average propagation range curves are presented in Fig 8.
Figs 8(a) and 8(d) show that, at p = 0.01, PDCSA achieves influence spread levels nearly on par with the CELF and Greedy algorithms in the large-scale networks CondMat and Slashdot. This indicates that PDCSA can effectively approximate the performance of traditional greedy methods while maintaining the efficiency and scalability benefits of swarm intelligence-based metaheuristics.
Running time comparison
To validate the computational efficiency of the PDCSA algorithm, we conducted runtime comparisons with other leading algorithms under identical experimental settings (K = 30, p = 0.01). The average processing times across all six experiment networks were recorded and visualized in Fig 9.
As shown in Fig 9, the general Greedy and CELF algorithms exhibit the highest computational cost across all six experimental networks. Among the remaining four metaheuristic algorithms, the execution times are relatively comparable; however, the PDCSA algorithm demonstrates the highest time efficiency, which can be attributed to its parallel computing capabilities. One of the most distinctive advantages of the PDCSA algorithm lies in the independence of its iterative evolution stages, which enables efficient parallel execution across multi-core architectures. As demonstrated in the runtime comparisons, PDCSA achieves the same or better solution quality as leading algorithms, but with significantly reduced computational time, making it particularly well-suited for large-scale IM tasks. The larger the network scale, the more pronounced this time-efficiency advantage becomes.
Conclusions
As social networks grow exponentially in size, traditional algorithms face scalability issues that limit their applicability. One of the most pressing concerns is how to enhance algorithmic efficiency without compromising the quality of results. This challenge motivates the development of novel approaches capable of handling large and complex network structures efficiently. This study introduces a novel metaheuristic approach called Parallel Discrete Crow Search Algorithm (PDCSA), tailored for solving the IM problem in social networks. By integrating parallel computing techniques into its evolution process, PDCSA significantly improves computational efficiency without compromising the effectiveness of the identified seed node set. Table 3 provides a summary of the experimental results for the PDCSA algorithm in comparison with advanced algorithms across four networks (CondMat, SynRand, Slashdot and Email), under the configuration parameters K = 30 and p = 0.01. As can be observed from the data presented in Table 3, while maintaining solution quality on par with greedy algorithms such as CELF, the PDCSA algorithm offers superior time efficiency compared to other advanced IM methods. This combination of effectiveness and efficiency makes it particularly suitable for real-world applications involving hyper-scale networks.
To efficiently address the problem of influence maximization in large-scale network structure, several important research directions warrant further investigation. First, the application of deep learning techniques to influence maximization in ultra-large-scale networks presents a promising avenue for developing more efficient and scalable algorithms. Exploring such data-driven approaches could lead to significant improvements in both solution quality and computational efficiency. Second, an essential prerequisite for any effective optimization algorithm is the accurate and efficient assessment of node local influence. Therefore, investigating novel methods for evaluating local influence, particularly those that balance computational cost with performance, is a crucial direction for future work.
References
- 1. Gao N, Han D, Weng T, Xia B, Li D, Castiglione A, et al. Modeling and analysis of port supply chain system based on Fabric blockchain. Comput Industrial Eng. 2022;172(1):108527.
- 2. Taurshia A, Kathrine GJW, Souri A. Software-defined network aided lightweight group key management for resource-constrained Internet of Things devices. Sustainable Computing: Informatics and Systems. 2022;36:100807.
- 3. Zhang S, Xu H, Zhu G, Chen X, Li K-C. A data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews. Soft Computing. 2022;26(1).
- 4. Li P, Lin Z, Li K, Bhalla S. Hot topics with decaying attention in social networks: Modeling and analysis of message spreading. Physica A: Statistical Mechanics and its Applications. 2023;625:129006.
- 5. Kempe D, Kleinberg J, Tardos É. Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. 2003. 137–46. https://doi.org/10.1145/956750.956769
- 6. Chen W, Wang C, Wang Y. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. 2010. 1029–38. https://doi.org/10.1145/1835804.1835934
- 7. Freeman LC. Centrality in social networks: conceptual clarification. Social Networks. 1979;1(3):215–39.
- 8. Freeman LC. A set of measures of centrality based on betweenness. Sociometry. 1977;40(1):35–41.
- 9. Leskovec J, Krause A, Guestrin C, Faloutsos C, VanBriesen J, Glance N. Cost-effective outbreak detection in networks. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. 2007. 420–9. https://doi.org/10.1145/1281192.1281239
- 10. Goyal A, Lu W, Lakshmanan LVS. CELF++: optimizing the greedy algorithm for influence maximization in social networks. In: Proceedings of the 20th international conference companion on World Wide Web. 2011. 47–8. https://doi.org/10.1145/1963192.1963217
- 11. Karaboga D, Basturk B. A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. J Glob Optim. 2007;39(3):459–71.
- 12. Mirjalili S. The Ant Lion Optimizer. Adv Eng Software. 2015;83:80–98.
- 13. Mirjalili S, Lewis A. The Whale Optimization Algorithm. Adv Eng Software. 2016;95:51–67.
- 14. Mirjalili S, Mirjalili SM, Lewis A. Grey Wolf Optimizer. Adv Eng Software. 2014;69:46–61.
- 15. Tang J, Zhang R, Wang P, Zhao Z, Fan L, Liu X. A discrete shuffled frog-leaping algorithm to identify influential nodes for influence maximization in social networks. Knowledge-Based Systems. 2020;187:104833.
- 16. Tang J, Zhang R, Yao Y, Zhao Z, Wang P, Li H, et al. Maximizing the spread of influence via the collective intelligence of discrete bat algorithm. Knowledge-Based Systems. 2018;160:88–103.
- 17. Jiang Q, Song G, Cong G. Simulated Annealing Based Influence Maximization in Social Networks. In: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence. San Francisco, California, USA; 2011.
- 18. Gong M, Yan J, Shen B, Ma L, Cai Q. Influence maximization in social networks based on discrete particle swarm optimization. Information Sci. 2016;367–368:600–14.
- 19. Tang J, Zhang R, Yao Y, Yang F, Zhao Z, Hu R, et al. Identification of top-k influential nodes based on enhanced discrete particle swarm optimization for influence maximization. Physica A: Statistical Mechan Applicat. 2019;513:477–96.
- 20. Wang L, Ma L, Wang C, Xie N-G, Koh JM, Cheong KH. Identifying Influential Spreaders in Social Networks through discrete moth-flame optimization. IEEE Trans Evol Computat. 2021;25(6):1091–102.
- 21. Singh SS, Kumar A, Singh K, Biswas B. IM‐SSO: Maximizing influence in social networks using social spider optimization. Concurrency Computat. 2019;32(2).
- 22. Singh SS, Singh K, Kumar A, Biswas B. ACO-IM: maximizing influence in social networks using ant colony optimization. Soft Comput. 2019;24(13):10181–203.
- 23. Han L, Li K-C, Castiglione A, Tang J, Huang H, Zhou Q. A clique-based discrete bat algorithm for influence maximization in identifying top-k influential nodes of social networks. Soft Comput. 2021;25(13):8223–40.
- 24. Gong M, Song C, Duan C, Ma L, Shen B. An Efficient Memetic Algorithm for Influence Maximization in Social Networks. IEEE Comput Intell Mag. 2016;11(3):22–33.
- 25. Taherinia M, Esmaeili M, Minaei-Bidgoli B. A high-performance algorithm for finding influential nodes in large-scale social networks. J Supercomput. 2022;78(14):15905–52.
- 26. Wang J, Cao Z, Xie C. DGN: influence maximization based on deep reinforcement learning. The J Supercomputing. 2025;81(1):1–26.
- 27. Chen T, Yan S, Guo J. ToupleGDD: A Fine-Designed Solution of Influence Maximization by Deep Reinforcement Learning. IEEE Transactions on Computational Social Systems. 2024(2):11.
- 28. Kumar S, Mallik A, Khetarpal A. Influence maximization in social networks using graph embedding and graph neural network. Information Sciences: An International J. 2022;607:1617–36.
- 29. Yu E, Wang Y, Fu Y. Identifying critical nodes in complex networks via graph convolutional network. Knowledge-Based Syst. 2020;198:105893.
- 30. Ou Y, Guo Q, Xing J. Identification of spreading influence nodes via multi-level structural attributes based on the graph convolutional network. Expert Syst Appl. 2022;203:117515.
- 31. Askarzadeh A. A novel metaheuristic method for solving constrained engineering optimization problems: crow search algorithm. Comput Struct. 2016;169:1–12.
- 32. Guimerà R, Danon L, Díaz-Guilera A, Giralt F, Arenas A. Self-similar community structure in a network of human interactions. Phys Rev E Stat Nonlin Soft Matter Phys. 2003;68(6 Pt 2):065103. pmid:14754250
- 33. Lu F, Zhang W, Shao L, Jiang X, Xu P, Jin H. Scalable influence maximization under independent cascade model. Journal of Network and Computer Applications. 2017;86:15–23.
- 34. Gregory S. Finding Overlapping Communities Using Disjoint Community Detection Algorithms. Studies in Computational Intelligence. Springer Berlin Heidelberg. 2009. p. 47–61.
- 35. Leskovec J, Kleinberg J, Faloutsos C. Graph evolution. ACM Trans Knowl Discov Data. 2007;1(1):2.
- 36. Tang J, Du Q. An adaptive differential evolution algorithm driven by multiple probabilistic mutation strategies for influence maximization in social networks. Int J Mod Phys C. 2024;36(06).
- 37. Han L, Zhou Q, Tang J, Yang X, Huang H. Identifying Top-k Influential Nodes Based on Discrete Particle Swarm Optimization With Local Neighborhood Degree Centrality. IEEE Access. 2021;9:21345–56.
- 38. Chai B, Zhang R, Li X. CLDE: a competitive learning-driven differential evolution optimization for the influence maximization problem in social networks. J Supercomputing. 2025;81(5).