Abstract
This paper investigates the asymptotic cluster synchronization of Boolean control networks (BCNs) under denial-of-service (DoS) attacks, where each state node in the network experiences random data loss following a Bernoulli distribution. First, the algebraic representation of BCNs under DoS attacks is established using the semi-tensor product (STP) of matrices. Using matrix-based methods, some necessary and sufficient algebraic conditions for BCNs to achieve asymptotic cluster synchronization under DoS attacks are derived. For both model-based and model-free cases, appropriate state feedback controllers guaranteeing asymptotic cluster synchronization of BCNs are obtained through set-iteration and double-deep Q-network (DDQN) methods, respectively. Besides, a double reinforcement learning algorithm is designed to identify suitable state feedback controllers. Finally, a numerical example is provided to demonstrate the effectiveness of the proposed approach.
Citation: Deng W, Huang C, Shuai Q (2025) Double reinforcement learning for cluster synchronization of Boolean control networks under denial of service attacks. PLoS One 20(7): e0327252. https://doi.org/10.1371/journal.pone.0327252
Editor: Claudio Zandron, University of Milano–Bicocca: Universita degli Studi di Milano-Bicocca, ITALY
Received: April 4, 2025; Accepted: June 11, 2025; Published: July 3, 2025
Copyright: © 2025 Deng et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting information files.
Funding: This work was supported by the Sichuan Science and Technology Program (https://kjt.sc.gov.cn) under Grants 2024NSFSC0527 and 2024ZYD0183 (both received by Chi Huang). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Boolean networks (BNs), first introduced by Kauffman in 1969 as fundamental computational models for gene regulatory networks [1], have evolved into a versatile paradigm for analyzing complex dynamical systems. By abstracting system components as binary-state nodes ("on"/"off") governed by logical interaction rules, BNs achieve a remarkable balance between computational tractability and biological plausibility [2,3]. This characteristic has fueled their widespread adoption across diverse domains, from cellular differentiation modeling [4,5] to power grid stability analysis [6]. The subsequent development of Boolean control networks (BCNs) by Akutsu et al. [7] introduced external control inputs, creating a powerful framework for studying targeted intervention strategies in networked systems. Recent advances in semi-tensor product (STP) theory [8] have further propelled BCN research by enabling rigorous algebraic treatment of logical dynamics [9–11], as evidenced by emerging applications in smart grids [6], filter design [12], and multi-agent systems [13].
Owing to their simple yet efficient modeling of gene regulatory networks [1,7], BNs and BCNs have been widely applied in areas such as smart healthcare, intelligent home automation, smart transportation, and robotics. In particular, therapeutic interventions for Parkinson’s disease [14], the mathematical formulation of context-aware systems [15], optimal control design for urban traffic flow management [16], and robotic control architectures [17] have all been developed using BN- and/or BCN-based methods. Moreover, in recent years, the financial domain has emerged as a particularly compelling application scenario for BN and BCN modeling [18]. Modern financial ecosystems comprise intricately interconnected entities (banks, investment firms, clearinghouses, and digital platforms) that exhibit nonlinear interdependencies akin to biological networks [19]. Network-based analyses have successfully captured phenomena like risk contagion in interbank markets [20] and crisis propagation dynamics [21]. However, current approaches predominantly employ continuous-variable models that may obscure essential discrete decision-making processes. This gap motivates our investigation of BCNs as a novel modeling paradigm for financial networks, particularly given their proven capacity to capture threshold-driven behaviors and abrupt state transitions characteristic of financial crises [22].
Denial-of-service (DoS) attacks represent a persistent and challenging threat in signal transmission processes. It is well-established that insufficient bit rates in communication channels degrade the stability of networked control systems [23], including BNs [24]. DoS attacks exploit network vulnerabilities to disrupt service availability by exhausting computational or bandwidth resources, leading to unauthorized or accidental data alteration, destruction, or loss [25]. These attacks are pervasive across many critical domains, including power grids [26], transportation networks [27], and financial networks [28]. Particularly in financial networks, the risks are exacerbated by their reliance on time-sensitive operations and by the cascading effects of failures in synchronization, a phenomenon in which clustered entities adopt coordinated strategies essential for maintaining market stability [29,30]. The disruption of such synchronization mechanisms during DoS events can trigger systemic failures through misaligned risk assessments and liquidity mismatches [31].
This paper addresses two fundamental challenges in securing financial BCNs under DoS attacks: (1) ensuring asymptotic cluster synchronization with known network topologies, and (2) achieving equivalent synchronization guarantees when node interaction rules are partially observable. While STP-based methods have demonstrated effectiveness for structured BCN analysis [32], the opacity of real-world financial networks necessitates innovative data-driven approaches. Recent breakthroughs in reinforcement learning (RL), particularly double deep Q-networks (DDQN) [33], offer promising solutions for control synthesis in partially observable environments [34]. However, existing RL applications to BCNs [35] have not adequately addressed the unique temporal constraints and attack resilience requirements of financial systems.
Our principal contributions are threefold:
- (1) We establish a novel STP-based representation for BCNs under DoS attacks, explicitly characterizing state transitions through attack-dependent matrix operations. This formalism extends conventional BCN models by incorporating time-varying attack impact matrices.
- (2) For systems with known topologies, we derive necessary and sufficient matrix conditions for asymptotic cluster synchronization using set-iteration methods. The proposed state feedback controller guarantees synchronization within finite time steps, even under intermittent DoS disruptions.
- (3) Addressing scenarios with unknown node interactions, we develop a dual RL architecture combining model-based policy iteration with DDQN-based exploration. This hybrid approach efficiently discovers stabilizing controllers without requiring prior knowledge of logical rules, significantly expanding BCN applicability to opaque financial networks.
The remainder of this paper is organized as follows: Section II reviews STP fundamentals and formulates the synchronization problem. Section III details our controller design methodologies for known and unknown network structures, respectively. Section IV validates the framework through financial network simulations, followed by concluding remarks in Section V.
Notations: .
(
) is the set of non-negative (positive) integers.
is the set of all integers l satisfying
. The set of
real (column stochastic) matrices is represented by
(
).
denotes the ith column (row) of the matrix M. For any
, L is an
logical matrix if
for all
.
is the set of
logical matrices. In represents the
identity matrix.
, where
denotes the ith column of the identity matrix In. A matrix M =
can be simply denoted by
. [M]ij represents the (i,j)th entry of the matrix M.
and
called the dummy matrices.
is the
swap matrix, where
is Kronecker product.
is the power-reducing matrix.
represents the cardinal number of the set
.
is the probability of an event A;
is the conditional probability of an event A given that an event B has occurred. For two sets
and
, define
.
Preliminaries and problem formulations
In this section, some necessary preliminaries are presented, including an introduction to the STP method, formulations of a BCN under DoS attacks, and a formal statement of asymptotic cluster synchronization.
STP preliminaries
First, let , the Kronecker product [36] of A and B is defined as follows:
Besides, for two matrices and
, the Khatri-Rao product of A and B is defined as
, where
and
,
.
Then, the definition of the STP of matrices is presented as follows:
Definition 1 ([8]). Consider two real matrices and
, their STP
is defined as follows:
where is the least common multiple of n and p, and
is the Kronecker product.
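As a hedged numerical sketch of Definition 1, the snippet below implements the STP with NumPy, assuming the standard form from [8]: for A with n columns and B with p rows, the product is (A ⊗ I_{t/n})(B ⊗ I_{t/p}) with t the least common multiple of n and p. The function name `stp` and the test matrices are illustrative choices and are not taken from the paper.

```python
from math import gcd

import numpy as np


def stp(A, B):
    """Semi-tensor product of 2-D arrays A (m x n) and B (p x q)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n, p = A.shape[1], B.shape[0]
    t = n * p // gcd(n, p)  # least common multiple of n and p
    return np.kron(A, np.eye(t // n)) @ np.kron(B, np.eye(t // p))


# Matching inner dimensions: the STP coincides with the ordinary product.
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
same_as_matmul = np.allclose(stp(A, B), A @ B)

# Two Boolean vectors (2 x 1 columns): the STP is their Kronecker product.
x = np.array([[1], [0]])  # delta_2^1
y = np.array([[0], [1]])  # delta_2^2
xy = stp(x, y)            # 4 x 1 column equal to delta_4^2
```

The second case is the one used throughout BCN analysis: multiplying the vector forms of several logical variables yields a single canonical vector that indexes the joint state.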
Note that if q = n in Definition (1), where
is the conventional matrix multiplication. Due to this dimensional compatibility, the symbol
will be omitted in subsequent discussions for notational simplicity when no ambiguity arises. By removing the dimension-matching requirement of conventional matrix multiplication, the STP enables an equivalent algebraic form of logical functions, as demonstrated by the following lemma.
Lemma 1 ([8]). Given a logical function , there is a unique matrix, called the structure matrix,
such that
where are the vector form of logic variables
, respectively,
,
.
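To make Lemma 1 concrete, the following sketch builds the structure matrix of a logical function by enumerating its truth table. It assumes the usual STP convention that the Boolean value 1 corresponds to the first column of I_2 and 0 to the second; the helper names `to_vec`, `structure_matrix`, and `product_state` are hypothetical and chosen only for this illustration.

```python
import itertools

import numpy as np


def to_vec(b):
    """Vector form of a Boolean value: 1 -> delta_2^1, 0 -> delta_2^2."""
    return np.array([1, 0]) if b else np.array([0, 1])


def structure_matrix(f, n):
    """2 x 2^n logical matrix M_f such that f(x_1..x_n) ~ M_f * (x_1 x ... x x_n)."""
    # Column j corresponds to the j-th argument tuple, starting from (1,...,1).
    cols = [to_vec(f(*bits)) for bits in itertools.product([1, 0], repeat=n)]
    return np.column_stack(cols)


def product_state(bits):
    """STP of the vector forms of `bits` (equals their Kronecker product)."""
    v = np.array([1])
    for b in bits:
        v = np.kron(v, to_vec(b))
    return v


# Conjunction: the structure matrix is delta_2[1, 2, 2, 2].
M_and = structure_matrix(lambda a, b: a and b, 2)
consistent = all(
    np.array_equal(M_and @ product_state(bits), to_vec(bits[0] and bits[1]))
    for bits in itertools.product([1, 0], repeat=2)
)
```

The check at the end confirms that multiplying the structure matrix by the joint state vector reproduces the logical function on every input.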
Model descriptions
A BCN with DoS attacks, comprising n state nodes and m control inputs, is mathematically described as follows:
Here, logical variables and
represent the received data and actual data of node i at time
, respectively.
is a logical function,
. Besides, the logical variable
denotes the control input at time t, where its logical relationship with the states is expressed as follows:
Here, is a logical function,
. Besides, it should be noted that yi(−1) is a predetermined value. Since this paper focuses on the analysis of global cluster synchronization, yi(−1) is arbitrarily chosen from the set
for all
.
To characterize the impact of DoS attacks on the data received by system (1), the Bernoulli distribution is used. The data transmission process for each node of (1) can be described as follows:
where and
. Given a subset
, called a constrained set of the system (1), and for all
, the sequence
is modeled as a Bernoulli distributed random variable with the following probability distribution:
where and satisfies
, and the random variables
are assumed to be mutually independent for all
. One can obtain that the data of xi has been successfully transmitted at time t when
. Otherwise,
indicates that its data has been lost, and the latest received data will be used as a substitution. In addition, due to limited resources and the design of defensive measures, DoS attacks do not target all nodes in the network [37–40]. Therefore, the following set is defined as
, and for all
, the sequence
satisfies
This indicates that the nodes of (1) in are not affected by DoS attacks.
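The transmission model described above can be sketched as follows. The snippet assumes only what the text states: each node's packet arrives with some success probability, and on a loss the latest received value is reused. The function name `received` and the sample values are illustrative.

```python
import numpy as np


def received(x, y_prev, success_prob, rng):
    """Received data y(t): x(t) where transmission succeeds, else y(t-1)."""
    x = np.asarray(x)
    y_prev = np.asarray(y_prev)
    ok = rng.random(x.shape) < np.asarray(success_prob)  # Bernoulli draws
    return np.where(ok, x, y_prev)


rng = np.random.default_rng(0)
x_now = np.array([1, 0, 1])
y_old = np.array([0, 0, 0])

# Probability 1 models a node outside the attacked set: nothing is lost.
y_safe = received(x_now, y_old, [1.0, 1.0, 1.0], rng)
# Probability 0 is the degenerate case where every packet is dropped.
y_lost = received(x_now, y_old, [0.0, 0.0, 0.0], rng)
```

Intermediate probabilities interpolate between these two extremes, with each node drawn independently, matching the mutual independence assumed in the text.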
Identify the binary values 1 and 0 with the vectors and
, respectively. Then, consider system (1), let
,
, and
denote the vector forms of
, and ui(t) at time t, respectively. In addition, let
,
, and
. By utilizing Lemma 1, system (1) can be converted into the following equivalent algebraic form,
where , and * is the Khatri-Rao product. Here,
is the structure matrix of the logical function fi in system (1),
. In addition, control (2) can be converted into the following equivalent algebraic form,
It is the state feedback control of system (6), where is the state feedback gain matrix of (6). Here,
is the structure matrix of the logical function gi in control (2),
.
Problem formulations
Consider system (6), the transition of is made to depend stochastically on Y(t), thus forming a non-iterative system. Therefore, in order to convert system (6) to iterative form, it is necessary to analyse the properties of Y(t). First, assume that constrained set
, where
if
, then based on the construction of swap matrix, one has
. Therefore, for simplicity, let constrained set
in (6), s < n, then Y(t) can be expressed as follows:
where ,
,
=
,
,
, and
. For the sake of convenience, the symbol c is utilized to represent
in the rest of this paper, without repetition. It is easy to see that the function c has a range from 1 to
.
To address asymptotic cluster synchronization in BCNs under DoS attacks, define , then the following augmented system is obtained. The relationship between this augmented system and the original system (6) will be discussed in Lemma 2 and Section III.
where ,
, and
is defined in (8). The following result will establish the equivalence between the asymptotic stability of systems (6) and (9).
Lemma 2. Consider system (6) with a given target set , the following two statements are equivalent.
- (i) For any initial states
in system (6), there exists a control law
, which is given by
, such that
(10)
- (ii) For any initial state
in system (9), there exists a control law
, which is given by
, such that
(11)
Here,.
Proof: (Necessity) Consider a given target set , for any initial states
, the condition
implies that for any fixed
, there always exists an integer T such that for any t>T,
holds. Let
, and assume that there exists an initial state
, such that for any
, one can find a real number
satisfying the following condition,
Then, one has that
Furthermore, for the fixed real number , one can find an integer
, such that
. Since
Therefore, one has that
holds for all . However, let
Based on , one has that
Then, one can obtain that if
. This contradicts (12). Therefore, for any state
, and any fixed
, there exists an integer
and a control law
, such that
. Thus, (11) holds.
(Sufficiency) Since is equivalent to
and
, it implies that
. It follows from
and the squeeze theorem that
.
Next, the definition of asymptotic cluster synchronization with probability one for system (1) is expressed below.
Definition 2 ([41]). (CSPO) Consider system (1), whose state nodes can be divided into p clusters , such that
and
for
. System (1) is said to achieve asymptotic cluster synchronization with probability one, if there exists a state feedback control law
, which is equivalent to
, such that for each cluster
,
, one has
In addition, let .
Main results
In this section, a set-iteration method is first proposed to design the state feedback control (7) for achieving asymptotic cluster synchronization in system (1). For scenarios where the logical relationships between nodes are unknown in system (1), a DDQN algorithm is further developed to obtain the required state feedback control (7).
Set-iteration method
First, a set is constructed as follows, which serves as the target state set for achieving asymptotic cluster synchronization in system (1) [41],
Then, in order to obtain equivalent algebraic conditions for asymptotic cluster synchronization of system (1), the following notions of control invariant subsets and largest control invariant subsets are given.
Definition 3. Consider system (9), and given a set , a subset
is a control invariant subset of
, if for any state
, there exists a control input
, such that
. In addition, the largest control invariant subset of
is the union of all control invariant subsets contained in
, and denoted as
.
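A minimal sketch of how a largest control invariant subset might be computed by fixed-point pruning is given below. It assumes that the condition in Definition 3 is that the successor state stays in the subset with probability one, and it represents the dynamics by a hypothetical array `P[u, j, i]` holding the probability of moving from state i to state j under control u (column-stochastic for each u). The toy chain is not the paper's example.

```python
import numpy as np


def largest_control_invariant(P, target):
    """Largest subset S of `target` such that every z in S has a control u
    keeping the next state inside S with probability one.

    P[u, j, i] = Prob(next state j | current state i, control u)."""
    S = set(target)
    while True:
        idx = np.array(sorted(S), dtype=int)
        keep = {
            z for z in S
            if any(np.isclose(P[u, idx, z].sum(), 1.0) for u in range(P.shape[0]))
        }
        if keep == S:
            return S
        S = keep  # drop states that cannot be held inside S and repeat


# Toy chain with 3 states and 2 controls (each column sums to one).
P = np.zeros((2, 3, 3))
P[0, 1, 0] = 1.0               # u=0: 0 -> 1
P[0, 2, 1] = 1.0               # u=0: 1 -> 2
P[0, 2, 2] = 1.0               # u=0: 2 -> 2
P[1, 0, 0] = P[1, 2, 0] = 0.5  # u=1: 0 -> 0 or 2
P[1, 0, 1] = 1.0               # u=1: 1 -> 0
P[1, 2, 2] = 1.0               # u=1: 2 -> 2

inv_01 = largest_control_invariant(P, {0, 1})
inv_0 = largest_control_invariant(P, {0})
```

In the toy chain, states 0 and 1 can hold each other inside {0, 1}, whereas state 0 alone cannot be kept in {0} with certainty, so the second call returns the empty set.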
Theorem 1. System (1) achieves asymptotic cluster synchronization with probability one if and only if the following matrix equation has a solution,
Here, , matrix
is the state feedback gain matrix of system (9), which is unknown, and F is defined in (9).
, where
is defined in (14),
, and
denotes the sign function.
Proof: According to the equivalent algebraic form (6) of system (1), for any initial states and feedback control law
, one has that
holds for all
and
. Therefore, system (1) achieves asymptotic cluster synchronization with probability one if and only if there exists a state feedback control law
, such that for each cluster
,
,
holds for system (6).
(Necessity) First, we prove that system (6) satisfies condition (10) with if (16) holds. We argue by contradiction. Assume that there exists an initial state of system (6)
, such that for any state feedback control law
and any time T1, there exists t > T1 such that
. Then, there exists
, such that
. According to the definition of largest control invariant subset, one can find a real number
such that
. Let
. Based on the construction of the set
, there exist
, and
, such that
, which contradicts (16). Therefore, system (6) satisfies condition (10) with
. Based on Lemma 2, system (6) satisfies condition (11) with
. Consider the state feedback control law
in system (6), which satisfies condition (11). Then, according to [42],
is a solution to (17).
(Sufficiency) Assume that (17) has a solution, denoted by . Then, according to [42], system (6) satisfies condition (11) with
. Furthermore, based on Lemma 2, system (6) satisfies condition (10) with
. According to the construction of the set
, system (1) achieves asymptotic cluster synchronization with probability one. The detailed proof can be found in [41] and is omitted here.
Let , one has the following result, whose proof is similar to that of [42] and is omitted here.
Theorem 2. System (1) achieves asymptotic cluster synchronization with probability one if and only if the following matrix equation has a solution,
Here, , matrix
is the state feedback gain matrix of system (9), which is unknown, and F is defined in (9).
, where
is defined in (14), and
denotes the sign function.
Based on Theorem 2, the following easily verifiable criterion can be derived to rule out asymptotic cluster synchronization with probability one for system (1).
Corollary 1. System (1) cannot achieve asymptotic cluster synchronization with probability one if (17) does not hold with respect to .
Proof: Assume that (17) does not hold with respect to , but system (1) achieves asymptotic cluster synchronization with probability one. Based on Theorem 2, there exists a feedback gain matrix
such that for any initial state
, one has that
. Thus,
, where
. Therefore,
, where
. This demonstrates the validity of condition (17) with respect to
, which necessarily induces a contradiction.
However, solving equation (17) directly is challenging and often incurs high computational complexity. Therefore, a set-iteration method is proposed to find a suitable matrix that satisfies equation (17).
Let , where
is defined in (14), and define
as follows:
The set ,
, constructed through the iterative process defined by (18), represents the states reachable from
in at most k steps. Based on this construction, the following result holds for all
.
Theorem 3. System (1) achieves asymptotic cluster synchronization with probability one if and only if the following conditions hold for system (9),
- (i) set
is nonempty;
- (ii) there exists a positive integer
, such that
.
Proof: (Necessity) As systems (1) and (6) are equivalent, it follows that system (6) achieves asymptotic cluster synchronization with probability one. Then, condition (i) is obvious by the construction of the set . According to Theorem 2, equation (17) has a solution, and denoted by K. Furthermore, due to the existence of K of (17), for any initial state
of system (9), there exists a trajectory of Z0 given by
Here, for any state ,
, there always exists a control
such that
, where
. Thus, one has that
.
Next, we argue by contradiction to show that for all initial states Z0. Assume that there exists an initial state
such that
, where
. In addition, construct the following trajectory, which starts from
to
,
Since , there exist two integers
in the trajectory (19), such that
, where
. Furthermore, one has that
. Then, a contradiction arises since
. Therefore, there exists an integer
, such that
.
(Sufficiency) Assume that for any , there exists an integer
such that
. In addition, let the trajectory from Z0 to
be as follows:
Based on (18), for any state ,
, there always exists a control
such that
. Then, let
,
. Due to the arbitrariness of the initial state Z0, the logical matrix
must satisfy equation (17). Therefore, based on Theorem 2, system (1) achieves asymptotic cluster synchronization with probability one.
It is worth noting that according to Theorem 3 and its proof, using the sets ,
, one can obtain the state feedback gain matrix K in system (6) that guarantees asymptotic cluster synchronization with probability one of (6). In addition, the design of matrix K is discussed below.
First, for all , let
where .
Then, according to the construction of sets ,
, and Theorem 3, the following result can be obtained.
Corollary 2. System (1) achieves asymptotic cluster synchronization with probability one if and only if the following conditions hold for system (9),
- (i) set
is nonempty;
- (ii) there exists a positive integer
, such that
.
Suppose that the sets that satisfy the conditions in Corollary 2 have been obtained. Then, the state feedback gain matrix K for system (6) can be constructed as follows:
Based on Corollary 2, one can obtain that the system (1) achieves asymptotic cluster synchronization with probability one, if and only if there exists a state feedback controller of the form given by (21).
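The set-iteration idea and the resulting feedback law can be illustrated with the sketch below. It is written under two assumptions that are not confirmed by the surviving text: that each iteration adds the states from which some control reaches the current set with positive probability (consistent with the use of the sign function in Theorems 1 and 2), and that the dynamics are stored in a hypothetical array `P[u, j, i]`. The returned dictionary plays the role of the gain matrix K, and the toy chain is illustrative only.

```python
import numpy as np


def set_iteration_controller(P, invariant):
    """Return (policy, covers_all). `invariant` must be control invariant;
    otherwise the `next(...)` call below raises StopIteration."""
    n_u, n, _ = P.shape
    inv_idx = np.array(sorted(invariant), dtype=int)
    # Inside the invariant set: pick a control that stays there surely.
    policy = {
        z: next(u for u in range(n_u) if np.isclose(P[u, inv_idx, z].sum(), 1.0))
        for z in invariant
    }
    reached = set(invariant)
    while True:
        idx = np.array(sorted(reached), dtype=int)
        new = {}
        for z in set(range(n)) - reached:
            for u in range(n_u):
                if P[u, idx, z].sum() > 0:  # positive chance to move closer
                    new[z] = u
                    break
        if not new:
            break
        policy.update(new)
        reached |= set(new)
    return policy, len(reached) == n


# Toy chain: 3 states, 2 controls, P[u, j, i] = Prob(i -> j | u).
P = np.zeros((2, 3, 3))
P[0, 0, 0] = 1.0               # u=0: 0 -> 0
P[0, 0, 1] = P[0, 1, 1] = 0.5  # u=0: 1 -> 0 or 1
P[0, 2, 2] = 1.0               # u=0: 2 -> 2
P[1, 2, 0] = 1.0               # u=1: 0 -> 2
P[1, 1, 1] = 1.0               # u=1: 1 -> 1
P[1, 1, 2] = P[1, 2, 2] = 0.5  # u=1: 2 -> 1 or 2

policy, covers_all = set_iteration_controller(P, {0})
```

Under the returned policy every state outside the invariant set has a positive-probability path into it, so the closed-loop chain is absorbed there with probability one; when `covers_all` is false, some state can never reach the set.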
Remark 1. The state feedback control design methodology in (21), derived from Corollary 2, can be extended to address finite-time cluster synchronization of system (1) by modifying the iterative procedure (18) into the following form: , for all
, with the initial condition
. This formulation ensures that for every state
,
, there exists a feedback control
guaranteeing that state X converges to set
with probability one. Furthermore, while the proposed control strategy (21) guarantees minimal time cluster synchronization for arbitrary initial states in the finite-time case, it does not optimize the instantaneous synchronization probability at each time t for asymptotic synchronization scenarios.
Remark 2. State-flipping control, as an effective control methodology, has been widely applied in various fields, including BNs [43–45]. State-flipping control enables the modification of individual node states, thereby altering the state transition dynamics of BNs to achieve systems’ synchronization and stabilization. However, our work focuses on designing state feedback controllers that determine the inherent control input rules of the system without disrupting the internal topological structure or dynamic transition relationships between network nodes. For future studies, if the proposed equivalent algebraic conditions for cluster synchronization under DoS attacks cannot be satisfied, adopting state-flipping control to achieve cluster synchronization under DoS attacks would represent a feasible and meaningful research direction.
Double reinforcement learning method
In this subsection, we aim to propose a model-free RL algorithm to compute the state feedback gain matrix K in system (6), which ensures that system (6) achieves asymptotic cluster synchronization with probability one.
Next, we outline how the DDQN method can be applied to address the cluster synchronization problem of system (6). In light of Lemma 2 and Theorem 2, we will proceed to design an algorithm for system (9). System (9) is a Markov decision process (MDP) model. An MDP is a mathematical framework for sequential decision-making, represented by the tuple . Here, set S represents the set of all possible states of the environment, and
in system (9). Set A denotes the set of actions, and
in system (9). The transition probability matrix P governs the rules for moving from the current state to the next state. In system (9), matrix
is a column-stochastic matrix representing transition probabilities. Specifically, each entry
corresponds to the probability of transitioning to state
from state
under an action
selected from the action set
. This is formally expressed as
. It is worth noting that, during the RL process, the transition matrix P is unknown to the agent. The immediate reward received after executing action
in state Xt is denoted by
, where rt + 1 represents the immediate reward.
RL aims to select an optimal policy that maximizes the expected sum of future rewards. The state-value function
and the action-value function
are defined as follows:
where is the discount factor,
is the state, and
is the action. To find an optimal policy, the optimal action-value function q*(x,a) is defined as
Furthermore, the optimal policy can be obtained by
The optimal action-value function q*(x,a) is estimated as the following Q-function, which is updated using the Bellman equation,
Here, is the expected reward for taking action at in state Xt, rt + 1 and Xt + 1 are the reward and state obtained after taking action at,
is the learning rate, and
is the discount factor. Additionally,
is the TD target, and
is the TD error.
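The update just described can be sketched in tabular form. This is the standard Q-learning rule with a TD target and TD error; the table size, the function name `q_update`, and the sample numbers are illustrative assumptions rather than settings from the paper.

```python
import numpy as np


def q_update(Q, x, a, r, x_next, alpha, gamma):
    """One tabular Q-learning step; returns (TD target, TD error)."""
    td_target = r + gamma * Q[x_next].max()
    td_error = td_target - Q[x, a]
    Q[x, a] += alpha * td_error  # move the estimate toward the target
    return td_target, td_error


Q = np.zeros((2, 2))          # 2 states, 2 actions
Q[1] = [1.0, 3.0]             # pretend state 1 already has estimates
target, error = q_update(Q, x=0, a=1, r=1.0, x_next=1, alpha=0.5, gamma=0.9)
```

With these numbers the TD target is 1 + 0.9 × 3 = 3.7, and the entry Q[0, 1] moves halfway toward it.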
The double deep Q-network (DDQN) is trained by minimizing the loss function defined as follows:
The parameter is updated using stochastic gradient descent,
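For illustration, the difference between the DQN and DDQN targets can be sketched with plain NumPy arrays standing in for network outputs. The sketch assumes the usual double Q-learning construction of [33], in which the online network selects the next action and the target network evaluates it; the function names and the mini-batch values are hypothetical.

```python
import numpy as np


def dqn_targets(q_target_next, rewards, gamma):
    """Standard DQN: the target network both selects and evaluates."""
    return rewards + gamma * q_target_next.max(axis=1)


def ddqn_targets(q_online_next, q_target_next, rewards, gamma):
    """Double DQN: the online network selects, the target network evaluates."""
    best = q_online_next.argmax(axis=1)
    return rewards + gamma * q_target_next[np.arange(len(best)), best]


def td_loss(q_taken, targets):
    """Mean squared TD error over a mini-batch."""
    return float(np.mean((targets - q_taken) ** 2))


# Mini-batch of two transitions with two actions each.
q_online_next = np.array([[1.0, 2.0], [5.0, 0.0]])
q_target_next = np.array([[10.0, 20.0], [3.0, 40.0]])
rewards = np.array([1.0, 0.0])

y_dqn = dqn_targets(q_target_next, rewards, gamma=0.5)
y_ddqn = ddqn_targets(q_online_next, q_target_next, rewards, gamma=0.5)
loss = td_loss(np.array([10.0, 1.5]), y_ddqn)
```

In the second transition the target network alone would pick its own overestimated value 40, whereas the double construction evaluates the action preferred by the online network and obtains 3, which is the mechanism that reduces overestimation.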
To enhance the performance of DDQN, we employ the Prioritized Experience Replay (PER) technique [46] as the sampling method. Specifically, in the replay buffer , the probability of sampling any tuple
is defined as follows:
where ,
is a small constant, and
controls the degree of prioritization.
However, prioritized replay may introduce bias. To address this, importance sampling weights are assigned as where
is the size of the replay buffer,
is the probability of sampling the i-th experience, and
is a hyperparameter that gradually adjusts the degree of importance sampling during training to reduce bias while maintaining stability.
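The sampling rule and the bias correction described above can be sketched as follows, assuming the proportional variant of PER from [46]: priorities are the absolute TD errors plus a small constant, raised to the prioritization exponent, and the weights are the inverse of buffer size times sampling probability, raised to the correction exponent. The function names and default values are illustrative.

```python
import numpy as np


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Sampling probabilities proportional to (|TD error| + eps) ** alpha."""
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights (N * P(i)) ** (-beta) for a buffer of size N."""
    return (len(probs) * probs) ** (-beta)


td_errors = np.array([1.0, -1.0, 2.0])
probs = per_probabilities(td_errors, alpha=1.0, eps=0.0)
weights = importance_weights(probs, beta=1.0)

# Draw a mini-batch of indices according to the priorities.
rng = np.random.default_rng(0)
batch = rng.choice(len(probs), size=2, p=probs)
```

Setting the prioritization exponent to zero recovers uniform sampling, and a correction exponent of one fully compensates for the non-uniform sampling; in practice the weights are often also divided by their maximum for numerical stability.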
The state and action at time t are defined as Xt = Z(t) and at = U(t) for system (9), respectively, and a designed reward function is given by
where is defined in (14).
Next, Algorithm 1 describes the setup and training of the DDQN algorithm incorporating PER.
Algorithm 1. Cluster synchronization with probability one using DDQN.
Simulations
We consider a financial network comprising three investment institutions (A, B, and C) with coupled strategies in the renewable energy sector. Institution A operates as a subsidiary of institution B. The strategic alignment requirement between hierarchical entities necessitates that the investment strategies of A and B achieve asymptotic synchronization, while institution C, though organizationally independent from A and B, maintains bidirectional strategic interactions with them.
The investment decisions of these institutions are influenced by two external factors: government policy (u1), where u1 = 1 and u1 = 0 indicate favorable and unfavorable renewable energy policies, respectively; and bank loan interest rates (u2), where u2 = 1 and u2 = 0 represent interest rate reductions and increases, respectively.
To model the aforementioned financial network as a BCN, let ,
, denote the investment strategy of institutions A, B, and C at time
, respectively. Specifically, xi(t) = 1 and xi(t) = 0,
, represent the decisions of institutions A, B, and C to invest and not invest in the renewable energy sector, respectively.
However, the information exchange between institutions A and B is vulnerable to external adversarial attacks, such as DoS attacks, which can disrupt the exchange. This leads to non-real-time data reception between institutions, resulting in data loss. Therefore, let ,
, denote the observed strategies of institutions A, B, and C at time t, respectively. The data loss phenomenon follows a Bernoulli distribution, and the state observation of each institution at time t is expressed as
Here, the sequence is modeled as a Bernoulli distributed random variable with the following probability distribution,
That is, 0.9, 0.8, and 1 are the probabilities of successful information transmission for institutions A, B, and C, respectively.
Furthermore, assume that the strategic interactions evolve according to the following rules.
- Institution A withholds investment (x1(t + 1) = 0) if and only if both institutions B and C are observed in non-investment states (
).
- Institution B initiates investment (x2(t + 1) = 1) if both of the following hold:
- a positive policy signal is present (u1(t) = 1), and
- its subsidiary A is observed to be investing (y1(t) = 1).
- Institution C initiates investment (x3(t + 1) = 1) if either:
- a positive policy incentive exists (u1(t) = 1), or
- both of the following are true: interest rates are favorable (u2(t) = 1), and institution A is investing (y1(t) = 1).
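The rules above can be sketched as a one-step simulator. The sketch reads the conditions for institutions B and C as "if and only if" (the text states only "if"), and takes the transmission success probabilities 0.9, 0.8, and 1 from the description of the example; the function names are illustrative.

```python
import numpy as np

# Success probabilities for institutions A, B, and C, as stated in the text.
SUCCESS_PROB = np.array([0.9, 0.8, 1.0])


def next_state(y, u):
    """Next strategies (x1, x2, x3) from observed strategies y and inputs u."""
    y1, y2, y3 = y
    u1, u2 = u
    x1 = y2 | y3         # A withholds only if B and C are both observed at 0
    x2 = u1 & y1         # B invests if policy is favorable and A is observed investing
    x3 = u1 | (u2 & y1)  # C invests on favorable policy, or low rates with A investing
    return np.array([x1, x2, x3])


def step(x, y_prev, u, rng):
    """One step under DoS: lost packets are replaced by the last received value."""
    ok = rng.random(3) < SUCCESS_PROB
    y = np.where(ok, x, y_prev)
    return next_state(y, u), y


rng = np.random.default_rng(1)
x_next, y_now = step(np.array([1, 0, 0]), np.array([0, 0, 0]), (0, 1), rng)
```

Because institution C's data are always received, only the first two entries of the observation vector can differ from the true strategies.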
Then, the strategic interactions can be modeled as a BCN under DoS attacks, described as follows:
In addition, the network structure of system (22) is illustrated in Fig 1.
Here, the directed edges colored in red represent information transmission processes between nodes that are subject to DoS attacks, resulting in probabilistic data loss, while the black directed edges indicate information transmission processes that are not subject to DoS attacks.
In order to obtain the state feedback control (7) that synchronizes the investment strategies between A (x1) and B (x2), let ,
, and
,
in (22). Furthermore, let
and
, then one has the following equivalent algebraic form of (22),
Here, state transition matrix is expressed as
, where
,
, and
.
Since all nodes can be divided into two clusters and
, set
defined in (14) can be obtained as
. Furthermore, the largest control invariant set of
is
.
Then, let , based on (23) and the construction of system (9), one can obtain that
Here, the state transition matrix of (24), , is constructed in Fig 2. Besides,
.
Based on the definition of sets ,
, in (20), one can obtain that
Furthermore, one has that , where
Since , based on Corollary 2, system (22) achieves asymptotic cluster synchronization with the state feedback gain matrix K defined by (26).
For instance, according to (26), for any time , if institution A adopts an investment strategy (x1(t) = 1) while institutions B and C choose not to invest (
), and the observed strategies of all three institutions at time t–1 were non-investment (
), we may implement the following external intervention measures at time t:
- No need for favorable renewable energy policies (u1(t) = 0),
- Reduced bank loan interest rates (u2(t) = 1).
Furthermore, let ,
, N = 1000,
,
, T = 5,
, and
in Algorithm 1. Then, Figs 3 and 4 show the TD error and success rate for reaching set
, averaged over 500 experimental episodes, respectively, demonstrating the effectiveness of our proposed method. Then, the state feedback gain matrix K of system (23) can be obtained below using the DDQN method,
In order to demonstrate the advantages of the DDQN method, let ,
, N = 1000,
,
, T = 5,
, and
in Algorithm 1. Then, Figs 5 and 6 show the TD error and variance of Q value for reaching set
of the DDQN and deep Q-network (DQN) methods, respectively. Under this parameter configuration, the DDQN method converges markedly faster than the DQN method and exhibits lower Q-value variance, which demonstrates its stability and effectiveness.
For instance, according to (27), for any time , if institutions A, B, and C all adopt an investment strategy (xi(t) = 1, i = 1,2,3), and the observed strategies of all three institutions at time t–1 were investment (
), we may implement the following external intervention measures at time t:
- Implement favorable renewable energy policies (u1(t) = 1),
- Reduce bank loan interest rates (u2(t) = 1).
Conclusion
This paper has investigated the asymptotic cluster synchronization of BCNs under DoS attacks, where each state node is subject to random data loss governed by a Bernoulli distribution. The algebraic representation of BCNs under DoS attacks has been established through the STP method, providing a systematic framework for analysis. Necessary and sufficient algebraic conditions for achieving asymptotic cluster synchronization under DoS attacks have been derived. State feedback controllers ensuring asymptotic cluster synchronization have been designed for both known and unknown system models: the set-iteration method addresses the model-based case, while a DDQN algorithm identifies an appropriate state feedback control policy in the model-free case. The efficacy of these methods has been validated through a numerical example.
References
- 1. Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol. 1969;22(3):437–67. pmid:5803332
- 2. Huang C, Wang W, Lu J, Kurths J. Asymptotic stability of boolean networks with multiple missing data. IEEE Trans Automat Contr. 2021;66(12):6093–9.
- 3. Wang Y, Li B, Pan Q, Zhong J, Li N. Asymptotic synchronization in coupled Boolean and probabilistic Boolean networks with delays. Nonl Anal: Hybrid Syst. 2025;55:101552.
- 4. Liu F, Sun Y, Zhang C, Xu L, Zhang H. Set stability and synchronization of generalized asynchronous probabilistic Boolean networks with impulsive effects. PLoS One. 2025;20(2):e0318038. pmid:39937801
- 5. Eduati F, Corradin A, Di Camillo B, Toffolo G. A Boolean approach to linear prediction for signaling network modeling. PLoS One. 2010;5(9):e12789. pmid:20862273
- 6. Rivera-Torres P, Llanes Santiago O. Fault detection and isolation in smart grid devices using probabilistic boolean networks. Computational intelligence in emerging technologies for engineering applications. 2020. p. 165–85.
- 7. Akutsu T, Hayashida M, Ching W-K, Ng MK. Control of Boolean networks: hardness results and algorithms for tree structured networks. J Theor Biol. 2007;244(4):670–9. pmid:17069859
- 8. Cheng D, Qi H, Li Z. Analysis and control of Boolean networks: a semi-tensor product approach. Springer. 2010.
- 9. Wang Y, Zhong J, Pan Q, Li N. Minimal pinning control for set stability of Boolean networks. Appl Math Comput. 2024;465:128433.
- 10. Chen H, Wang Z, Shen B, Liang J. Model evaluation of the stochastic boolean control networks. IEEE Trans Automat Contr. 2022;67(8):4146–53.
- 11. Chen H, Wang Z, Liang J, Li M. State estimation for stochastic time-varying boolean networks. IEEE Trans Automat Contr. 2020;65(12):5480–7.
- 12. Chen H, Wang Z, Shen B, Liang J. Distributed recursive filtering over sensor networks with nonlogarithmic sensor resolution. IEEE Trans Automat Contr. 2022;67(10):5408–15.
- 13. Li Y, Li H, Ding X. Set stability of switched delayed logical networks with application to finite-field consensus. Automatica. 2020;113:108768.
- 14. Ma Z, Wang ZJ, McKeown MJ. Probabilistic Boolean network analysis of brain connectivity in Parkinson’s disease. IEEE J Sel Top Signal Process. 2008;2(6):975–85.
- 15. Kabir MH, Hoque MR, Koo B-J, Yang S-H. Mathematical modelling of a context-aware system based on Boolean control networks for smart home. In: The 18th IEEE International Symposium on Consumer Electronics (ISCE 2014). 2014. p. 1–2. https://doi.org/10.1109/isce.2014.6884406
- 16. Diveev AI, Sofronova EA. Synthesis of intelligent control of traffic flows in urban roads based on the logical network operator method. In: 2013 European Control Conference (ECC). 2013. p. 3512–7. https://doi.org/10.23919/ecc.2013.6669696
- 17. Roli A, Villani M, Serra R, Benedettini S, Pinciroli C, Birattari M. Dynamical properties of artificially evolved Boolean network robots. In: AI*IA 2015 Advances in Artificial Intelligence. 2015. p. 45–57.
- 18. Makarov SI, Boldyrev MA. Application of bool variables in analysis of risks in the bond market. Digital Technologies in the New Socio-Economic Reality. 2022. p. 479–88.
- 19. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang D. Complex networks: structure and dynamics. Phys Rep. 2006;424(4–5):175–308.
- 20. Allen F, Gale D. Optimal currency crises. Carnegie-Rochester Conf Ser Publ Policy. 2000;53(1):177–230.
- 21. Xu X, Zeng Z, Xu J, Zhang M. Fuzzy dynamical system scenario simulation-based cross-border financial contagion analysis: a perspective from international capital flows. IEEE Trans Fuzzy Syst. 2017;25(2):439–59.
- 22. Zhao Z, Chen D, Wang L, Han C. Credit risk diffusion in supply chain finance: a complex networks perspective. Sustainability. 2018;10(12):4608.
- 23. Feng S, Cetinkaya A, Ishii H, Tesi P, Persis CD. Networked control under DoS attacks: tradeoffs between resilience and data rate. IEEE Trans Automat Contr. 2021;66(1):460–7.
- 24. Zhu S, Lu J, Cao J, Lin L, Lam J, Ng M, et al. Undetectable attacks on Boolean networks. In: 2023 62nd IEEE Conference on Decision and Control (CDC). 2023. p. 1698–703. https://doi.org/10.1109/cdc49753.2023.10383321
- 25. Wang Y-W, Zeng Z-H, Liu X-K, Liu Z-W. Input-to-state stability of switched linear systems with unstabilizable modes under DoS attacks. Automatica. 2022;146:110607.
- 26. Mölsä J. Mitigating denial of service attacks: a tutorial. JCS. 2005;13(6):807–37.
- 27. Abdollahi Biron Z, Dey S, Pisu P. Real-time detection and estimation of denial of service attack in connected vehicle systems. IEEE Trans Intell Transport Syst. 2018;19(12):3893–902.
- 28. Falowo OI, Ozer M, Li C, Abdo JB. Evolving malware and DDoS attacks: decadal longitudinal study. IEEE Access. 2024;12:39221–37.
- 29. Kochemazov S, Semenov A. Using synchronous Boolean networks to model several phenomena of collective behavior. PLoS One. 2014;9(12):e115156. pmid:25526612
- 30. Gualdi S, Cimini G, Primicerio K, Di Clemente R, Challet D. Statistically validated network of portfolio overlaps and systemic risk. Sci Rep. 2016;6:39467. pmid:28000764
- 31. Musciotto F, Marotta L, Piilo J, Mantegna RN. Long-term ecology of investors in a financial market. Palgrave Commun. 2018;4(1).
- 32. Mu T, Feng J-E, Wang B, Jia Y. Identification of Boolean control networks with time delay. ISA Trans. 2024;144:113–23. pmid:37865590
- 33. Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. AAAI. 2016;30(1):1–7.
- 34. Acernese A, Yerudkar A, Glielmo L, Vecchio CD. Double deep-Q learning-based output tracking of probabilistic boolean control networks. IEEE Access. 2020;8:199254–65.
- 35. Moschoyiannis S, Chatzaroulas E, Šliogeris V, Wu Y. Deep reinforcement learning for stabilization of large-scale probabilistic Boolean networks. IEEE Trans Control Netw Syst. 2023;10(3):1412–23.
- 36. Brewer J. Kronecker products and matrix calculus in system theory. IEEE Trans Circuits Syst. 1978;25(9):772–81.
- 37. Mehmood S, Amin R, Mustafa J, Hussain M, Alsubaei FS, Zakaria MD. Distributed Denial of Services (DDoS) attack detection in SDN using optimizer-equipped CNN-MLP. PLoS One. 2025;20(1):e0312425. pmid:39869573
- 38. Salim MM, Rathore S, Park JH. Distributed denial of service attacks and its defenses in IoT: a survey. J Supercomput. 2019;76(7):5320–63.
- 39. Patil S, Chaudhari S. DoS attack prevention technique in wireless sensor networks. Procedia Comput Sci. 2016;79:715–21.
- 40. Salem FM, Youssef H, Ali I, Haggag A. A variable-trust threshold-based approach for DDOS attack mitigation in software defined networks. PLoS One. 2022;17(8):e0273681. pmid:36037194
- 41. Ren Y, Lu J, Liu Y, Shi K. Cluster synchronization of Boolean networks under probabilistic function perturbation. IEEE Trans Circuits Syst II. 2022;69(2):504–8.
- 42. Guo Y, Zhou R, Wu Y, Gui W, Yang C. Stability and set stability in distribution of probabilistic Boolean networks. IEEE Trans Automat Control. 2018;64(2):736–42.
- 43. Ni J, Tang Y, Li F. Minimum-cost state-flipped control for reachability of Boolean control networks using reinforcement learning. IEEE Trans Cybern. 2024;54(11):7103–15. pmid:39288054
- 44. Du L, Zhang Z, Xia C. A state-flipped approach to complete synchronization of Boolean networks. Appl Math Comput. 2023;443:127788.
- 45. Du L, Zhang Z, Xia C. A node-pinning and state-flipped approach to partial synchronization of Boolean networks. Nonl Anal: Hybrid Syst. 2024;53:101501.
- 46. Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. International Conference on Learning Representations. 2016. p. 1–23.