
Double reinforcement learning for cluster synchronization of Boolean control networks under denial of service attacks

  • Wanqiu Deng,

    Roles Methodology, Writing – original draft

    Affiliation School of Management Science and Engineering, Southwestern University of Finance and Economics, Chengdu, China

  • Chi Huang,

    Roles Conceptualization, Funding acquisition

    Affiliations School of Computing and Artificial Intelligence, Southwestern University of Finance and Economics, Chengdu, China, Engineering Research Center of Intelligent Finance, Ministry of Education, Southwestern University of Finance and Economics, Chengdu, China

  • Qinghong Shuai

    Roles Supervision

    shuaiqh@swufe.edu.cn

    Affiliation School of Management Science and Engineering, Southwestern University of Finance and Economics, Chengdu, China

Abstract

This paper investigates the asymptotic cluster synchronization of Boolean control networks (BCNs) under denial-of-service (DoS) attacks, where each state node in the network experiences random data loss following a Bernoulli distribution. First, the algebraic representation of BCNs under DoS attacks is established using the semi-tensor product (STP) of matrices. Using matrix-based methods, some necessary and sufficient algebraic conditions for BCNs to achieve asymptotic cluster synchronization under DoS attacks are derived. For both model-based and model-free cases, appropriate state feedback controllers guaranteeing asymptotic cluster synchronization of BCNs are obtained through set-iteration and double-deep Q-network (DDQN) methods, respectively. In addition, a double reinforcement learning algorithm is designed to identify suitable state feedback controllers. Finally, a numerical example is provided to demonstrate the effectiveness of the proposed approach.

Introduction

Boolean Networks (BNs), first introduced by Kauffman in 1969 as fundamental computational models for gene regulatory networks [1], have evolved into a versatile paradigm for analyzing complex dynamical systems. By abstracting system components as binary-state nodes (“on”/“off”) governed by logical interaction rules, BNs achieve a remarkable balance between computational tractability and biological plausibility [2,3]. This unique characteristic has fueled their widespread adoption across diverse domains, from cellular differentiation modeling [4,5] to power grid stability analysis [6]. The subsequent development of Boolean control networks (BCNs) by Akutsu et al. [7] introduced external control inputs, creating a powerful framework for studying targeted intervention strategies in networked systems. Recent advances in semi-tensor product (STP) theory [8] have further propelled BCN research by enabling rigorous algebraic treatment of logical dynamics [9–11], as evidenced by emerging applications in smart grids [6], filter design [12] and multi-agent systems [13].

Due to their simple yet efficient modeling characteristics in gene regulatory networks [1,7], BNs and BCNs have been widely applied in real-world domains such as smart healthcare, intelligent home automation, smart transportation, and robotics. In particular, therapeutic interventions for Parkinson’s disease [14], mathematical formulation of context-aware systems [15], optimal control design for urban traffic flow management [16], and robotic control architectures [17] have been demonstrated based on BN and/or BCN methods. Moreover, in recent years, the financial domain has emerged as a particularly compelling application scenario for BN and BCN modeling [18]. Modern financial ecosystems comprise intricately interconnected entities—banks, investment firms, clearinghouses, and digital platforms—that exhibit nonlinear interdependencies akin to biological networks [19]. Network-based analyses have successfully captured phenomena like risk contagion in interbank markets [20] and crisis propagation dynamics [21]. However, current approaches predominantly employ continuous-variable models that may obscure essential discrete decision-making processes. This gap motivates our investigation of BCNs as a novel modeling paradigm for financial networks, particularly given their proven capacity to capture threshold-driven behaviors and abrupt state transitions characteristic of financial crises [22].

Denial-of-service (DoS) attacks represent a persistent and challenging threat in signal transmission processes. It is well-established that insufficient bit rates in communication channels degrade the stability of networked control systems [23], including BNs [24]. DoS attacks exploit network vulnerabilities to disrupt service availability by exhausting computational or bandwidth resources, leading to unauthorized or accidental data alteration, destruction, or loss [25]. These attacks are pervasive across many critical domains, including power grids [26], transportation networks [27], and financial networks [28]. Particularly in financial networks, the risks are exacerbated by their reliance on time-sensitive operations and the cascading effects of synchronization failures—a phenomenon in which clustered entities adopt coordinated strategies essential for maintaining market stability [29,30]. The disruption of such synchronization mechanisms during DoS events can trigger systemic failures through misaligned risk assessments and liquidity mismatches [31].

This paper addresses two fundamental challenges in securing financial BCNs under DoS attacks: (1) ensuring asymptotic cluster synchronization with known network topologies, and (2) achieving equivalent synchronization guarantees when node interaction rules are partially observable. While STP-based methods have demonstrated effectiveness for structured BCN analysis [32], the opacity of real-world financial networks necessitates innovative data-driven approaches. Recent breakthroughs in reinforcement learning (RL), particularly double deep Q-networks (DDQN) [33], offer promising solutions for control synthesis in partially observable environments [34]. However, existing RL applications to BCNs [35] have not adequately addressed the unique temporal constraints and attack resilience requirements of financial systems.

Our principal contributions are threefold:

  1. We establish a novel STP-based representation for BCNs under DoS attacks, explicitly characterizing state transitions through attack-dependent matrix operations. This formalism extends conventional BCN models by incorporating time-varying attack impact matrices.
  2. For systems with known topologies, we derive necessary and sufficient matrix conditions for asymptotic cluster synchronization using set-iteration methods. The proposed state feedback controller guarantees synchronization within finite time steps, even under intermittent DoS disruptions.
  3. Addressing scenarios with unknown node interactions, we develop a dual RL architecture combining model-based policy iteration with DDQN-based exploration. This hybrid approach efficiently discovers stabilizing controllers without requiring prior knowledge of logical rules, significantly expanding BCN applicability to opaque financial networks.

The remainder of this paper is organized as follows: Section II reviews STP fundamentals and formulates the synchronization problem. Section III details our controller design methodologies for known and unknown network structures. Section IV validates the framework through financial network simulations, followed by concluding remarks in Section V.

Notations: . () is the set of non-negative (positive) integers. is the set of all integers l satisfying . The set of real (column stochastic) matrices is represented by (). denotes the ith column (row) of the matrix M. For any , L is a logical matrix if for all . is the set of logical matrices. In represents the identity matrix. , where denotes the ith column of the identity matrix In. A matrix M = can be simply denoted by . [M]ij represents the (i,j)th entry of the matrix M. and are called the dummy matrices. is the swap matrix, where is the Kronecker product. is the power-reducing matrix. represents the cardinal number of the set . is the probability of an event A; is the conditional probability of an event A given that an event B has occurred. For two sets and , define .

Preliminaries and problem formulations

In this section, some necessary preliminaries are presented, including an introduction to the STP method, formulations of a BCN under DoS attacks, and a formal statement of asymptotic cluster synchronization.

STP preliminaries

First, let , the Kronecker product [36] of A and B is defined as follows:

Besides, for two matrices and , the Khatri-Rao product of A and B is defined as , where and , .

Then, the definition of the STP of matrices is presented as follows:

Definition 1 ([8]). Consider two real matrices and , their STP is defined as follows:

where is the least common multiple of n and p, and is the Kronecker product.

Note that when n = p in Definition 1, the STP reduces to the conventional matrix multiplication. Owing to this compatibility, the STP symbol will be omitted in subsequent discussions for notational simplicity when no ambiguity arises. The STP removes the dimensional restriction of conventional matrix multiplication, thereby enabling the establishment of an equivalent algebraic form of logical functions, as demonstrated by the following lemma.
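To make these operations concrete, the following minimal NumPy sketch computes the STP via Kronecker products with identity matrices, together with the column-wise Khatri-Rao product used later in the algebraic form (6). The helper functions and the small example are our own illustration, not code from the paper.

```python
import numpy as np
from math import lcm

def stp(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Semi-tensor product of A (m x n) and B (p x q): (A kron I_{t/n})(B kron I_{t/p}), t = lcm(n, p)."""
    n, p = A.shape[1], B.shape[0]
    t = lcm(n, p)
    return np.kron(A, np.eye(t // n)) @ np.kron(B, np.eye(t // p))

def khatri_rao(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Column-wise Kronecker (Khatri-Rao) product: the i-th column is Col_i(A) kron Col_i(B)."""
    assert A.shape[1] == B.shape[1]
    return np.column_stack([np.kron(A[:, i], B[:, i]) for i in range(A.shape[1])])

# Vector forms of logical values: 1 ~ delta_2^1, 0 ~ delta_2^2.
d1 = np.array([[1.0], [0.0]])
d0 = np.array([[0.0], [1.0]])
print(stp(d1, d0).ravel())  # delta_4^2 = [0, 1, 0, 0]
```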

Lemma 1 ([8]). Given a logical function , there is a unique matrix which is called the structural matrix such that

where are the vector form of logic variables , respectively, , .
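For instance (our own illustrative example of Lemma 1, using the standard identification 1 ~ δ_2^1 and 0 ~ δ_2^2), the conjunction f(a, b) = a ∧ b has the structural matrix M_∧ = δ_2[1 2 2 2], so that a ∧ b ~ M_∧ ⋉ a ⋉ b; similarly, negation and disjunction have structural matrices M_¬ = δ_2[2 1] and M_∨ = δ_2[1 1 1 2], respectively.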

Model descriptions

A BCN with DoS attacks, comprising n state nodes and m control inputs, is mathematically described as follows:

(1)

Here, logical variables and represent the received data and actual data of node i at time , respectively. is a logical function, . Besides, the logical variable denotes the control input at time t, where its logical relationship with the states is expressed as follows:

(2)

Here, is a logical function, . Besides, it should be noted that yi(−1) is a predetermined value. Since this paper focuses on the analysis of global cluster synchronization, yi(−1) can be arbitrarily chosen from the set for all .

To characterize the impact of DoS attacks on the data received by system (1), the Bernoulli distribution is used. The data transmission process for each node of (1) can be described as follows:

(3)

where and . Given a subset , called a constrained set of the system (1), and for all , the sequence is modeled as a Bernoulli distributed random variable with the following probability distribution:

(4)

where and satisfies , and the random variables are assumed to be mutually independent for all . One can see that the data of xi has been successfully transmitted at time t when . Otherwise, indicates that its data has been lost, and the latest received data will be used as a substitute. In addition, due to limited resources and the design of defensive measures, DoS attacks do not target all nodes in the network [37–40]. Therefore, the following set is defined as , and for all , the sequence satisfies

(5)

This indicates that the nodes of (1) in are not affected by DoS attacks.
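The transmission model (3)-(5) can be summarized in a short simulation sketch: a node in the constrained set keeps its latest received value whenever its Bernoulli indicator is zero, while nodes outside the constrained set always receive fresh data. The node indices and success probabilities below are illustrative placeholders only.

```python
import numpy as np

rng = np.random.default_rng(0)

def received_states(x_t, y_prev, attacked, p_success):
    """One step of the DoS transmission model:
    y_i(t) = x_i(t) if the Bernoulli indicator is 1, else the latest received value y_i(t-1).
    Nodes not in `attacked` always receive fresh data."""
    y_t = np.empty_like(x_t)
    for i, x in enumerate(x_t):
        if i in attacked:
            gamma = rng.random() < p_success[i]   # Bernoulli(p_i) transmission indicator
            y_t[i] = x if gamma else y_prev[i]
        else:
            y_t[i] = x
    return y_t

# Illustrative use: 3 nodes, nodes 0 and 1 under attack with success probabilities 0.9 and 0.8.
x_t    = np.array([1, 0, 1])
y_prev = np.array([0, 0, 0])
print(received_states(x_t, y_prev, attacked={0, 1}, p_success={0: 0.9, 1: 0.8}))
```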

Identify the binary values 1 and 0 with the vectors and , respectively. Then, consider system (1), and let , , and denote the vector forms of , and ui(t) at time t, respectively. In addition, let , , and . By utilizing Lemma 1, system (1) can be converted into the following equivalent algebraic form,

(6)

where , and * is the Khatri-Rao product. Here, is the structure matrix of the logical function fi in system (1), . In addition, control (2) can be converted into the following equivalent algebraic form,

(7)

It is the state feedback control of system (6), where is the state feedback gain matrix of (6). Here, is the structure matrix of the logical function gi in control (2), .

Problem formulations

Consider system (6): the transition of depends stochastically on Y(t), so the system is not in an iterative form. Therefore, in order to convert system (6) into an iterative form, it is necessary to analyze the properties of Y(t). First, assume that the constrained set , where if , then based on the construction of the swap matrix, one has . Therefore, for simplicity, let the constrained set in (6), s < n; then Y(t) can be expressed as follows:

(8)

where , , = , , , and . For convenience, the symbol c is used to represent throughout the rest of this paper. It is easy to see that the function c has a range from 1 to .

To address asymptotic cluster synchronization in BCNs under DoS attacks, define , then the following augmented system is obtained. The relationship between this augmented system and the original system (6) will be discussed in Lemma 2 and Section III.

(9)

where , , and is defined in (8). The following result will establish the equivalence between the asymptotic stability of systems (6) and (9).

Lemma 2. Consider system (6) with a given target set , the following two statements are equivalent.

  (i) For any initial states in system (6), there exists a control law , which is given by , such that (10).
  (ii) For any initial state in system (9), there exists a control law , which is given by , such that (11).
  Here, .

Proof: (Necessity) Consider a given target set , for any initial states , the condition implies that for any fixed , there always exists an integer T such that for any t>T, holds. Let , and assume that there exists an initial state , such that for any , one can find a real number satisfying the following condition,

Then, one has that

Furthermore, for the fixed real number , one can find an integer , such that . Since

Therefore, one has that

(12)

holds for all . However, let

Based on , one has that

Then, one can obtain that if . This contradicts (12). Therefore, for any state , and any fixed , there exists an integer and a control law , such that . Thus, (11) holds.

(Sufficiency) Since is equivalent to and , it implies that . It follows from and the squeeze theorem that .

Next, the definition of asymptotic cluster synchronization with probability one for system (1) is expressed below.

Definition 2 ([41]). (CSPO) Consider system (1), whose state nodes can be divided into p clusters , such that and for . System (1) is said to achieve asymptotic cluster synchronization with probability one, if there exists a state feedback control law , which is equivalent to , such that for each cluster , , one has

(13)

In addition, let .

Main results

In this section, a set-iteration method is first proposed to design the state feedback control (7) for achieving asymptotic cluster synchronization in system (1). For scenarios where the logical relationships between nodes are unknown in system (1), a DDQN algorithm is further developed to obtain the required state feedback control (7).

Set-iteration method

First, a set is constructed as follows, which is the target state set for achieving asymptotic cluster synchronization in system (1) [41],

(14)

Then, in order to obtain equivalent algebraic conditions for asymptotic cluster synchronization of system (1), the following notions of control invariant subsets and largest control invariant subsets are given.

Definition 3. Consider system (9), and given a set , a subset is a control invariant subset of , if for any state , there exists a control input , such that . In addition, the largest control invariant subset of is the union of all control invariant subsets contained in , and denoted as .

Theorem 1. System (1) achieves asymptotic cluster synchronization with probability one if and only if the following matrix equation has a solution,

(15)

Here, , matrix is the state feedback gain matrix of system (9), which is unknown, and F is defined in (9). , where is defined in (14), , and denotes the sign function.

Proof: According to the equivalent algebraic form (6) of system (1), for any initial states and feedback control law , one has that holds for all and . Therefore, system (1) achieves asymptotic cluster synchronization with probability one if and only if there exists a state feedback control law , such that for each cluster , ,

(16)

holds for system (6).

(Necessity) First, we proof that system (6) satisfies condition (10) with if (16) holds. We employ a proof by contradiction to demonstrate this fact. Assume that there exists an initial state of system (6) , such that for any state feedback control law and any time T1, there exists t>T1, one has that . Then, there exists , such that . According to the definition of largest control invariant subset, one can find a real number such that . Let . Based on the construction of the set , there exist , and , such that , which contradicts (16). Therefore, system (6) satisfies condition (10) with . Based on Lemma 2, system (6) satisfies condition (11) with . Consider the state feedback control law in system (6), which satisfies condition (11). Then, according to [42], is a solution to (17).

(Sufficiency) Assume that (15) has a solution, denoted by . Then, according to [42], system (6) satisfies condition (11) with . Furthermore, based on Lemma 2, system (6) satisfies condition (10) with . According to the construction of the set , system (1) achieves asymptotic cluster synchronization with probability one. The detailed proof can be found in [41] and is omitted here.

Let ; one then has the following result, whose proof is similar to that of [42] and is omitted here.

Theorem 2. System (1) achieves asymptotic cluster synchronization with probability one if and only if the following matrix equation has a solution,

(17)

Here, , matrix is the state feedback gain matrix of system (9), which is unknown, and F is defined in (9). , where is defined in (14), and denotes the sign function.

Based on Theorem 2, the following easily verifiable necessary criterion for asymptotic cluster synchronization with probability one of system (1) can be derived.

Corollary 1. System (1) cannot achieve asymptotic cluster synchronization with probability one if (17) does not hold with respect to .

Proof: Assume that (17) does not hold with respect to , but system (1) achieves asymptotic cluster synchronization with probability one. Based on Theorem 2, one has that there exists a feedback gain matrix such that for any initial state , one has that . Thus, , where . Therefore, , where . This demonstrates the validity of condition (17) with respect to , which necessarily induces a contradiction.

However, solving equation (17) is challenging and often results in high computational complexity. Therefore, a set-iteration method is proposed to find a suitable matrix that satisfies equation (17).

Let , where is defined in (14), and define as follows:

(18)

The set , , constructed through the iterative process defined by (18), represents the states reachable from in at most k steps. Based on this construction, the following result holds for all .
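Since the exact formula in (18) is not reproduced here, the following NumPy sketch gives one plausible reading of the iteration: starting from the largest control invariant subset, each pass adds every state of the augmented system (9) that some control drives into the previous set in one step with probability one. The column indexing convention for the transition matrix F (column (j−1)N+i collecting the distribution of Z(t+1) from state δ_N^i under control δ_M^j) is an assumption of this sketch.

```python
import numpy as np

def reach_sets(F, n_states, n_controls, E0):
    """Iteratively enlarge E0 (largest control invariant subset of the target set):
    a state joins the next set if some control sends it into the current set
    with probability one. Returns the whole hierarchy E0, E1, E2, ..."""
    sets = [set(E0)]
    while True:
        prev, nxt = sets[-1], set(sets[-1])
        for i in range(1, n_states + 1):           # state delta_N^i
            for j in range(1, n_controls + 1):     # control delta_M^j
                col = F[:, (j - 1) * n_states + (i - 1)]
                support = {k + 1 for k in np.flatnonzero(col > 0)}
                if support and support <= prev:    # one-step reach with probability one
                    nxt.add(i)
                    break
        if nxt == prev:
            return sets
        sets.append(nxt)
```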

Theorem 3. System (1) achieves asymptotic cluster synchronization with probability one if and only if the following conditions hold for system (9),

  (i) the set is nonempty;
  (ii) there exists a positive integer , such that .

Proof: (Necessity) As systems (1) and (6) are equivalent, it follows that system (6) achieves asymptotic cluster synchronization with probability one. Then, condition (i) is obvious by the construction of the set . According to Theorem 2, equation (17) has a solution, denoted by K. Furthermore, due to the existence of the solution K to (17), for any initial state of system (9), there exists a trajectory of Z0 given by

Here, for any state , , there always exists a control such that , where . Thus, one has that .

Next, we show by contradiction that holds for all initial states Z0. Assume that there exists an initial state such that , where . In addition, construct the following trajectory, which starts from to ,

(19)

Since , there exist two integers in the trajectory (19), such that , where . Furthermore, one has that . Then, a contradiction arises since . Therefore, there exists an integer , such that .

(Sufficiency) Assume that for any , there exists an integer such that . In addition, let the trajectory from Z0 to be as follows:

Based on (18), for any state , , there always exists a control such that . Then, let , . Due to the randomness of the initial state Z0, the logical matrix must satisfy equation (17). Therefore, based on Theorem 2, system (1) achieves asymptotic cluster synchronization with probability one.

It is worth noting that according to Theorem 3 and its proof, using the sets , , one can obtain the state feedback gain matrix K in system (6) that guarantees asymptotic cluster synchronization with probability one for system (6). In addition, the design of the matrix K is discussed below.

First, for all , let

(20)

where .

Then, according to the construction of sets , , and Theorem 3, the following result can be obtained.

Corollary 2. System (1) achieves asymptotic cluster synchronization with probability one if and only if the following conditions hold for system (9),

  (i) the set is nonempty;
  (ii) there exists a positive integer , such that .

Suppose the sets satisfying the conditions in Corollary 2 have been obtained. Then, the detailed construction of the state feedback gain matrix K for system (6) can be provided as follows:

(21)

Based on Corollary 2, one can obtain that the system (1) achieves asymptotic cluster synchronization with probability one, if and only if there exists a state feedback controller of the form given by (21).
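Continuing the sketch above, once the hierarchy of sets is available, one illustrative way to assemble a feedback gain matrix in the spirit of (21) (the explicit formula in (21) is not recoverable here, so this construction is only a hypothetical example) is to assign to each state a control that moves it one level down the hierarchy:

```python
import numpy as np

def build_feedback_gain(F, n_states, n_controls, sets):
    """Assemble a logical gain matrix K: column i of K is delta_M^j, where control j
    drives state delta_N^i into the previous set of the hierarchy (with probability one,
    under the reading used in reach_sets). Assumes every state appears in some set,
    i.e., condition (ii) of Corollary 2 holds."""
    K = np.zeros((n_controls, n_states), dtype=int)
    level = {}
    for k, S in enumerate(sets):
        for s in S:
            level.setdefault(s, k)               # smallest k with state s in E_k
    for i in range(1, n_states + 1):
        target = sets[max(level[i] - 1, 0)]
        for j in range(1, n_controls + 1):
            col = F[:, (j - 1) * n_states + (i - 1)]
            support = {k + 1 for k in np.flatnonzero(col > 0)}
            if support and support <= target:
                K[j - 1, i - 1] = 1              # column i of K equals delta_M^j
                break
    return K
```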

Remark 1. The state feedback control design methodology in (21), derived from Corollary 2, can be extended to address finite-time cluster synchronization of system (1) by modifying the iterative procedure (18) as the following form, , for all , with the initial condition . This formulation ensures that for every state , , there exists a feedback control guaranteeing that state X converges to set with probability one. Furthermore, while the proposed control strategy (21) guarantees minimal time cluster synchronization for arbitrary initial states in the finite-time case, it does not optimize the instantaneous synchronization probability at each time t for asymptotic synchronization scenarios.

Remark 2. State-flipping control, as an effective control methodology, has been widely applied in various fields, including BNs [43–45]. State-flipping control enables the modification of individual node states, thereby altering the state transition dynamics of BNs to achieve system synchronization and stabilization. However, our work focuses on designing state feedback controllers that determine the inherent control input rules of the system without disrupting the internal topological structure or dynamic transition relationships between network nodes. For future studies, if the proposed equivalent algebraic conditions for cluster synchronization under DoS attacks cannot be satisfied, adopting state-flipping control to achieve cluster synchronization under DoS attacks would represent a feasible and meaningful research direction.

Double reinforcement learning method

In this subsection, we aim to propose a model-free RL algorithm to compute the state feedback gain matrix K in system (6), which ensures that system (6) achieves asymptotic cluster synchronization with probability one.

Next, we outline how the DDQN method can be applied to address the cluster synchronization problem of system (6). In light of Lemma 2 and Theorem 2, we will proceed to design an algorithm for system (9). System (9) is a Markov decision process (MDP) model. An MDP is a mathematical framework for sequential decision-making, represented by the tuple . Here, the set S represents the set of all possible states of the environment, and in system (9). The set A denotes the set of actions, and in system (9). The transition probability matrix P governs the rules for moving from the current state to the next state. In system (9), the matrix is a column-stochastic matrix representing transition probabilities. Specifically, each entry corresponds to the probability of transitioning to state from state under an action selected from the action set . This is formally expressed as . It is worth noting that, during the RL process, the transition matrix P is unknown to the agent. The immediate reward received after executing action in state Xt is denoted by , where rt + 1 represents the immediate reward.

RL aims to select an optimal policy that maximizes the expected sum of future rewards. The state-value function and the action-value function are defined as follows:

where is the discount factor, is the state, and is the action. To find an optimal policy, the optimal action-value function q*(x,a) is defined as

Furthermore, the optimal policy can be obtained by

The optimal action-value function q*(x,a) is estimated as the following Q-function, which is updated using the Bellman equation,

Here, is the expected reward for taking action at in state Xt, rt + 1 and Xt + 1 are the reward and state obtained after taking action at, is the learning rate, and is the discount factor. Additionally, is the TD target, and is the TD error.
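In code, one tabular Bellman update reads as follows (a generic Q-learning step written for illustration; alpha and gamma are the learning rate and discount factor from the text):

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One update: TD target = r + gamma * max_a' Q(s', a'); TD error = target - Q(s, a)."""
    td_target = r + gamma * max(Q[s_next].values())
    td_error = td_target - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error

# Minimal usage with a two-state, two-action table.
Q = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.0, 1: 0.0}}
q_update(Q, s=0, a=1, r=1.0, s_next=1)
```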

The double deep Q-network (DDQN) is trained by minimizing the loss function defined as follows:

The parameter is updated using stochastic gradient descent,
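The defining feature of DDQN is that the online network selects the greedy action while the target network evaluates it. The NumPy sketch below shows this target together with an importance-sampling-weighted squared TD loss; network forward passes are abstracted as plain arrays, so this is an illustration rather than the paper's implementation.

```python
import numpy as np

def ddqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.95):
    """Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).
    Both q_* arguments have shape (batch, n_actions)."""
    greedy = np.argmax(q_online_next, axis=1)               # selection: online network
    next_q = q_target_next[np.arange(len(greedy)), greedy]  # evaluation: target network
    return rewards + gamma * (1.0 - dones) * next_q

def weighted_td_loss(q_pred, y, is_weights):
    """Importance-sampling-weighted squared TD error (used together with PER)."""
    td_error = y - q_pred
    return np.mean(is_weights * td_error ** 2), td_error
```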

To enhance the performance of DDQN, we employ the Prioritized Experience Replay (PER) technique [46] as the sampling method. Specifically, in the replay buffer , the probability of sampling any tuple is defined as follows:

where , is a small constant, and controls the degree of prioritization.

However, prioritized replay may introduce bias. To address this, importance sampling weights are assigned as where is the size of the replay buffer, is the probability of sampling the i-th experience, and is a hyperparameter that gradually adjusts the degree of importance sampling during training to reduce bias while maintaining stability.
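A compact sketch of proportional prioritized replay along the lines of [46]: priorities are |TD error| + ε raised to the power α, sampling probabilities are their normalization, and importance-sampling weights use the exponent β. All hyperparameter values below are illustrative defaults, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def per_sample(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-3):
    """Return sampled indices and normalized importance-sampling weights."""
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()                 # sampling probability P(i)
    idx = rng.choice(len(td_errors), size=batch_size, p=probs)
    weights = (len(td_errors) * probs[idx]) ** (-beta)    # w_i = (N * P(i))^(-beta)
    return idx, weights / weights.max()                   # normalize for stability
```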

The state and action at time t are defined as Xt = Z(t) and at = U(t) for system (9), respectively, and a designed reward function is given by

where is defined in (14).
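The numerical values of this reward are not recoverable from the extracted text; a typical shaping consistent with the description (a positive reward once the augmented state enters the target set, and a small penalty otherwise) could look like the following, where all values are assumptions:

```python
def reward(z_next, target_set, hit_reward=1.0, step_penalty=-0.1):
    """Illustrative reward only; the exact values used in the paper are not shown here."""
    return hit_reward if z_next in target_set else step_penalty
```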

Next, Algorithm 1 describes the setup and training of the DDQN algorithm incorporating PER.

Algorithm 1. Cluster synchronization with probability one using DDQN.
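The individual steps of Algorithm 1 are not reproduced in the extracted text. As a simplified, runnable stand-in, the loop below uses tabular double Q-learning on a toy MDP to illustrate the core idea that DDQN realizes with neural networks (decoupling action selection from action evaluation); the prioritized replay machinery sketched above and the network training are omitted for brevity, and the interface (P, R, target) is our own.

```python
import numpy as np

rng = np.random.default_rng(2)

def toy_double_q_learning(P, R, target, episodes=1000, horizon=5,
                          alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular double Q-learning: P[a, s] is a probability vector over next states,
    R[s] is the reward collected on entering state s, target is a set of goal states."""
    n_actions, n_states, _ = P.shape
    QA = np.zeros((n_states, n_actions))
    QB = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = rng.integers(n_states)                       # random initial state
        for _ in range(horizon):
            q = QA[s] + QB[s]
            a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(q))
            s_next = int(rng.choice(n_states, p=P[a, s]))
            r = R[s_next]
            if rng.random() < 0.5:                       # update A: A selects, B evaluates
                a_star = int(np.argmax(QA[s_next]))
                QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
            else:                                        # update B: B selects, A evaluates
                b_star = int(np.argmax(QB[s_next]))
                QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])
            s = s_next
            if s in target:
                break
    return QA, QB
```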

Simulations

We consider a financial network comprising three investment institutions (A, B, and C) with coupled strategies in the renewable energy sector. Institution A operates as a subsidiary of institution B. The strategic alignment requirement between hierarchical entities necessitates that the investment strategies of A and B achieve asymptotic synchronization, while institution C, though organizationally independent from A and B, maintains bidirectional strategic interactions with them.

The investment decisions of these institutions are influenced by two external factors: government policy (u1), where u1 = 1 and u1 = 0 indicate favorable and unfavorable renewable energy policies, respectively; and bank loan interest rates (u2), where u2 = 1 and u2 = 0 represent interest rate reductions and increases, respectively.

To model the aforementioned financial network as a BCN, let , , denote the investment strategy of institutions A, B, and C at time , respectively. Specifically, xi(t) = 1 and xi(t) = 0, , represent the decisions of institutions A, B, and C to invest and not invest in the renewable energy sector, respectively.

However, the information exchange between institutions A and B is vulnerable to external adversarial attacks, such as DoS attacks, which can disrupt the exchange. This leads to non-real-time data reception between institutions, resulting in data loss. Therefore, let , , denote the observed strategies of institutions A, B, and C at time t, respectively. The data loss phenomenon follows a Bernoulli distribution, and the state observation of each institution at time t is expressed as

Here, the sequence is modeled as a Bernoulli distributed random variable with the following probability distribution,

That is, 0.9, 0.8, and 1 are the probabilities of successful information transmission for institutions A, B, and C, respectively.

Furthermore, assume that the strategic interactions evolve according to the following rules.

  • Institution A withholds investment (x1(t + 1) = 0) if and only if both institutions B and C are observed in non-investment states ().
  • Institution B initiates investment (x2(t + 1) = 1) if both of the following hold:
    • a positive policy signal is present (u1(t) = 1), and
    • its subsidiary A is observed to be investing (y1(t) = 1).
  • Institution C initiates investment (x3(t + 1) = 1) if either:
    • a positive policy incentive exists (u1(t) = 1), or
    • both of the following are true: interest rates are favorable (u2(t) = 1), and institution A is investing (y1(t) = 1).

Then, the strategic interactions can be modeled as a BCN under DoS attacks, described as follows:

(22)

In addition, the network structure of system (22) is illustrated in Fig 1.

Fig 1. The network structure of system (22).

Here, the directed edges colored in red represent information transmission processes between nodes that are subject to DoS attacks, resulting in probabilistic data loss, while the black directed edges indicate information transmission processes that are not subject to DoS attacks.

https://doi.org/10.1371/journal.pone.0327252.g001
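To illustrate the example, the sketch below simulates the three-institution network by reading the itemized interaction rules above as if-and-only-if definitions (x1(t+1) = y2(t) ∨ y3(t), x2(t+1) = u1(t) ∧ y1(t), x3(t+1) = u1(t) ∨ (u2(t) ∧ y1(t))); this encoding is our reconstruction, since the explicit form of (22) is not reproduced in the extracted text.

```python
import numpy as np

rng = np.random.default_rng(3)
p_success = {0: 0.9, 1: 0.8, 2: 1.0}      # transmission probabilities for A, B, C

def step(x, y_prev, u):
    """One step of the example network under Bernoulli data loss (our reconstruction of (22))."""
    y = np.array([x[i] if rng.random() < p_success[i] else y_prev[i] for i in range(3)])
    x_next = np.array([
        int(y[1] or y[2]),                # A invests unless B and C are both observed idle
        int(u[0] and y[0]),               # B invests iff the policy is favorable and A is observed investing
        int(u[0] or (u[1] and y[0])),     # C invests iff the policy is favorable, or rates are low and A invests
    ])
    return x_next, y

x, y = np.array([1, 0, 0]), np.array([0, 0, 0])
for t in range(6):
    u = np.array([0, 1])                  # example input: u1(t) = 0, u2(t) = 1
    x, y = step(x, y, u)
    print(t, x)
```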

In order to obtain the state feedback control (7) that synchronizes the investment strategies between A (x1) and B (x2), let , , and , in (22). Furthermore, let and , then one has the following equivalent algebraic form of (22),

(23)

Here, state transition matrix is expressed as , where , , and .

Since all nodes can be divided into two clusters and , set defined in (14) can be obtained as . Furthermore, the largest control invariant set of is .

Then, let , based on (23) and the construction of system (9), one can obtain that

(24)

Here, the state transition matrix of (24), , is constructed in Fig 2. Besides, .

Based on the definition of sets , , in (20), one can obtain that

(25)

Furthermore, one has that , where

(26)

Since , based on Corollary 2, system (22) achieves asymptotic cluster synchronization with the state feedback gain matrix K defined by (26).

For instance, according to (26), for any time , if institution A adopts an investment strategy (x1(t) = 1) while institutions B and C choose not to invest (), and the observed strategies of all three institutions at time t–1 were non-investment (), we may implement the following external intervention measures at time t:

  • No need for favorable renewable energy policies (u1(t) = 0),
  • Reduced bank loan interest rates (u2(t) = 1).

Furthermore, let , , N = 1000, , , T = 5, , and in Algorithm 1. Then, Figs 3 and 4 show the TD error and success rate for reaching set , averaged over 500 experimental episodes, respectively, demonstrating the effectiveness of our proposed method. Then, the state feedback gain matrix K of system (23) can be obtained below using the DDQN method,

(27)

In order to demonstrate the advantages of the DDQN method, let , , N = 1000, , , T = 5, , and in Algorithm 1. Then, Figs 5 and 6 show the TD error and the variance of the Q value for reaching set of the DDQN and deep Q-learning (DQN) methods, respectively. We can observe that under this parameter configuration, the DDQN method exhibits significantly faster convergence than the DQN method, along with lower Q-value variance. This demonstrates the stability and effectiveness of the DDQN method.

Fig 5. The average TD error across episodes of DDQN and DQN methods.

https://doi.org/10.1371/journal.pone.0327252.g005

Fig 6. The average Q value variance across episodes of DDQN and DQN methods.

https://doi.org/10.1371/journal.pone.0327252.g006

For instance, according to (27), for any time , if all institutions A, B, and C adopt an investment strategy (xi(t) = 1, i = 1,2,3), and the observed strategies of all three institutions at time t–1 were investment (), we may implement the following external intervention measures at time t:

  • Implement favorable renewable energy policies (u1(t) = 1),
  • Reduced bank loan interest rates (u2(t) = 1).

Conclusion

This paper has investigated the asymptotic cluster synchronization of BCNs under DoS attacks, where each state node is subject to random data loss governed by a Bernoulli distribution. The algebraic representation of BCNs under DoS attacks has been established through the STP method, enabling a systematic analysis framework. Necessary and sufficient algebraic conditions for achieving asymptotic cluster synchronization under DoS attacks have been derived. For scenarios with known and unknown system models, suitable state feedback controllers ensuring asymptotic cluster synchronization have been designed: a set-iteration method addresses the model-based case, while a DDQN-based double reinforcement learning algorithm identifies an appropriate state feedback control policy in the model-free case. The efficacy of these methods has been validated through a numerical simulation, demonstrating robust cluster synchronization under DoS attacks.

References

  1. Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol. 1969;22(3):437–67. pmid:5803332
  2. Huang C, Wang W, Lu J, Kurths J. Asymptotic stability of Boolean networks with multiple missing data. IEEE Trans Automat Contr. 2021;66(12):6093–9.
  3. Wang Y, Li B, Pan Q, Zhong J, Li N. Asymptotic synchronization in coupled Boolean and probabilistic Boolean networks with delays. Nonl Anal: Hybrid Syst. 2025;55:101552.
  4. Liu F, Sun Y, Zhang C, Xu L, Zhang H. Set stability and synchronization of generalized asynchronous probabilistic Boolean networks with impulsive effects. PLoS One. 2025;20(2):e0318038. pmid:39937801
  5. Eduati F, Corradin A, Di Camillo B, Toffolo G. A Boolean approach to linear prediction for signaling network modeling. PLoS One. 2010;5(9):e12789. pmid:20862273
  6. Rivera-Torres P, Llanes Santiago O. Fault detection and isolation in smart grid devices using probabilistic Boolean networks. Computational intelligence in emerging technologies for engineering applications. 2020. p. 165–85.
  7. Akutsu T, Hayashida M, Ching W-K, Ng MK. Control of Boolean networks: hardness results and algorithms for tree structured networks. J Theor Biol. 2007;244(4):670–9. pmid:17069859
  8. Cheng D, Qi H, Li Z. Analysis and control of Boolean networks: a semi-tensor product approach. Springer. 2010.
  9. Wang Y, Zhong J, Pan Q, Li N. Minimal pinning control for set stability of Boolean networks. Appl Math Comput. 2024;465:128433.
  10. Chen H, Wang Z, Shen B, Liang J. Model evaluation of the stochastic Boolean control networks. IEEE Trans Automat Contr. 2022;67(8):4146–53.
  11. Chen H, Wang Z, Liang J, Li M. State estimation for stochastic time-varying Boolean networks. IEEE Trans Automat Contr. 2020;65(12):5480–7.
  12. Chen H, Wang Z, Shen B, Liang J. Distributed recursive filtering over sensor networks with nonlogarithmic sensor resolution. IEEE Trans Automat Contr. 2022;67(10):5408–15.
  13. Li Y, Li H, Ding X. Set stability of switched delayed logical networks with application to finite-field consensus. Automatica. 2020;113:108768.
  14. Ma Z, Wang ZJ, McKeown MJ. Probabilistic Boolean network analysis of brain connectivity in Parkinson’s disease. IEEE J Sel Top Signal Process. 2008;2(6):975–85.
  15. Kabir MH, Hoque MR, Koo B-J, Yang S-H. Mathematical modelling of a context-aware system based on Boolean control networks for smart home. In: The 18th IEEE International Symposium on Consumer Electronics (ISCE 2014). 2014. p. 1–2. https://doi.org/10.1109/isce.2014.6884406
  16. Diveev AI, Sofronova EA. Synthesis of intelligent control of traffic flows in urban roads based on the logical network operator method. In: 2013 European Control Conference (ECC). 2013. p. 3512–7. https://doi.org/10.23919/ecc.2013.6669696
  17. Roli A, Villani M, Serra R, Benedettini S, Pinciroli C, Birattari M. Dynamical properties of artificially evolved Boolean network robots. In: AI*IA 2015 Advances in Artificial Intelligence. 2015. p. 45–57.
  18. Makarov SI, Boldyrev MA. Application of bool variables in analysis of risks in the bond market. Digital Technologies in the New Socio-Economic Reality. 2022. p. 479–88.
  19. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang D. Complex networks: structure and dynamics. Phys Rep. 2006;424(4–5):175–308.
  20. Allen F, Gale D. Optimal currency crises. Carnegie-Rochester Conf Ser Publ Policy. 2000;53(1):177–230.
  21. Xu X, Zeng Z, Xu J, Zhang M. Fuzzy dynamical system scenario simulation-based cross-border financial contagion analysis: a perspective from international capital flows. IEEE Trans Fuzzy Syst. 2017;25(2):439–59.
  22. Zhao Z, Chen D, Wang L, Han C. Credit risk diffusion in supply chain finance: a complex networks perspective. Sustainability. 2018;10(12):4608.
  23. Feng S, Cetinkaya A, Ishii H, Tesi P, Persis CD. Networked control under DoS attacks: tradeoffs between resilience and data rate. IEEE Trans Automat Contr. 2021;66(1):460–7.
  24. Zhu S, Lu J, Cao J, Lin L, Lam J, Ng M, et al. Undetectable attacks on Boolean networks. In: 2023 62nd IEEE Conference on Decision and Control (CDC). 2023. p. 1698–703. https://doi.org/10.1109/cdc49753.2023.10383321
  25. Wang Y-W, Zeng Z-H, Liu X-K, Liu Z-W. Input-to-state stability of switched linear systems with unstabilizable modes under DoS attacks. Automatica. 2022;146:110607.
  26. Mölsä J. Mitigating denial of service attacks: a tutorial. JCS. 2005;13(6):807–37.
  27. Abdollahi Biron Z, Dey S, Pisu P. Real-time detection and estimation of denial of service attack in connected vehicle systems. IEEE Trans Intell Transport Syst. 2018;19(12):3893–902.
  28. Falowo OI, Ozer M, Li C, Abdo JB. Evolving malware and DDoS attacks: decadal longitudinal study. IEEE Access. 2024;12:39221–37.
  29. Kochemazov S, Semenov A. Using synchronous Boolean networks to model several phenomena of collective behavior. PLoS One. 2014;9(12):e115156. pmid:25526612
  30. Gualdi S, Cimini G, Primicerio K, Di Clemente R, Challet D. Statistically validated network of portfolio overlaps and systemic risk. Sci Rep. 2016;6:39467. pmid:28000764
  31. Musciotto F, Marotta L, Piilo J, Mantegna RN. Long-term ecology of investors in a financial market. Palgrave Commun. 2018;4(1).
  32. Mu T, Feng J-E, Wang B, Jia Y. Identification of Boolean control networks with time delay. ISA Trans. 2024;144:113–23. pmid:37865590
  33. Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. AAAI. 2016;30(1):1–7.
  34. Acernese A, Yerudkar A, Glielmo L, Vecchio CD. Double deep-Q learning-based output tracking of probabilistic Boolean control networks. IEEE Access. 2020;8:199254–65.
  35. Moschoyiannis S, Chatzaroulas E, Šliogeris V, Wu Y. Deep reinforcement learning for stabilization of large-scale probabilistic Boolean networks. IEEE Trans Control Netw Syst. 2023;10(3):1412–23.
  36. Brewer J. Kronecker products and matrix calculus in system theory. IEEE Trans Circuits Syst. 1978;25(9):772–81.
  37. Mehmood S, Amin R, Mustafa J, Hussain M, Alsubaei FS, Zakaria MD. Distributed Denial of Services (DDoS) attack detection in SDN using optimizer-equipped CNN-MLP. PLoS One. 2025;20(1):e0312425. pmid:39869573
  38. Salim MM, Rathore S, Park JH. Distributed denial of service attacks and its defenses in IoT: a survey. J Supercomput. 2019;76(7):5320–63.
  39. Patil S, Chaudhari S. DoS attack prevention technique in wireless sensor networks. Procedia Comput Sci. 2016;79:715–21.
  40. Salem FM, Youssef H, Ali I, Haggag A. A variable-trust threshold-based approach for DDOS attack mitigation in software defined networks. PLoS One. 2022;17(8):e0273681. pmid:36037194
  41. Ren Y, Lu J, Liu Y, Shi K. Cluster synchronization of Boolean networks under probabilistic function perturbation. IEEE Trans Circuits Syst II. 2022;69(2):504–8.
  42. Guo Y, Zhou R, Wu Y, Gui W, Yang C. Stability and set stability in distribution of probabilistic Boolean networks. IEEE Trans Automat Contr. 2018;64(2):736–42.
  43. Ni J, Tang Y, Li F. Minimum-cost state-flipped control for reachability of Boolean control networks using reinforcement learning. IEEE Trans Cybern. 2024;54(11):7103–15. pmid:39288054
  44. Du L, Zhang Z, Xia C. A state-flipped approach to complete synchronization of Boolean networks. Appl Math Comput. 2023;443:127788.
  45. Du L, Zhang Z, Xia C. A node-pinning and state-flipped approach to partial synchronization of Boolean networks. Nonl Anal: Hybrid Syst. 2024;53:101501.
  46. Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. International Conference on Learning Representations. 2016. p. 1–23.