Abstract
Ship collision avoidance has become a focal issue in maritime navigation. Existing methods often struggle to simultaneously meet the hierarchical decision-making requirements of the International Regulations for Preventing Collisions at Sea (COLREGs), address the dynamic uncertainty of ship risk attitudes, and cope effectively with multi-ship coupling risks. To solve these problems, this paper proposes an algorithm that combines multi-agent systems with game theory and integrates ship collision avoidance rules into the reward function design. The algorithm constructs a two-stage framework. The risk attitude perception layer uses a Long Short-Term Memory (LSTM) network to predict the short-term motion states of target ships, and dynamically infers the probability distribution of target ships’ risk attitudes through a Bayesian network combined with historical Automatic Identification System (AIS) data and encounter characteristics. The decision-making execution layer integrates the Stackelberg game with the Multi-Agent Actor-Critic (MAAC) algorithm, and embeds COLREGs as rigid constraints into the action space to ensure the compliance of the algorithm. Experimental verification is carried out based on historical AIS data and simulation scenarios. The results show that the proposed algorithm has advantages in four key indicators: the collision rate, the COLREGs compliance rate, the trajectory smoothness, and the average risk. Statistical significance tests confirm the robustness and superiority of the algorithm. This study provides a reliable technical scheme for ship collision avoidance strategies in multi-ship waters.
Citation: Xu T, Wang T, Zhao J, Hu Q (2026) AIS data-driven MAAC-Stackelberg multi-ship cooperative collision avoidance algorithm. PLoS One 21(6): e0345950. https://doi.org/10.1371/journal.pone.0345950
Editor: Yile Chen, Macau University of Science and Technology, MACAO
Received: December 18, 2025; Accepted: March 12, 2026; Published: June 3, 2026
Copyright: © 2026 Xu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
With the rapid development of autonomous shipping, maritime traffic safety has become a core focus of the international maritime community. The International Regulations for Preventing Collisions at Sea (COLREGs) provide the fundamental framework for ship collision avoidance, but the complexity of multi-ship encounter scenarios and the uncertainty of maritime environments pose significant challenges to the practical implementation of these regulations [1]. Collision accidents not only cause enormous economic losses but also lead to severe environmental pollution, making the development of efficient and reliable multi-ship collision avoidance algorithms an urgent demand in ocean engineering [2].
Automatic Identification System (AIS) data, as a key source of maritime traffic information, has been widely used in collision risk assessment, anomaly detection, and ship behavior analysis due to its advantages of high accuracy and real-time performance [3]. Recent studies have demonstrated the potential of AIS data-driven methods in improving the intelligence of collision avoidance systems, such as ship domain estimation [4], risk assessment and navigation decision-making [5]. However, the effective integration of AIS data with advanced decision-making algorithms to solve multi-ship cooperative collision avoidance problems remains a critical research gap.
In the field of multi-agent decision-making, multi-agent actor-critic (MAAC) has emerged as a promising algorithm for cooperative-competitive environments, enabling multiple agents to learn optimal strategies through interactive training [6]. Several studies have applied MAAC and other multi-agent reinforcement learning (MARL) methods to ship collision avoidance, achieving positive results in two-ship or simple multi-ship scenarios. Nevertheless, these methods often ignore the hierarchical decision-making characteristics of multi-ship encounters, where ships may have different priorities, such as give-way ships and stand-on ships as required by COLREGs [7], leading to suboptimal cooperative effects in complex encounter situations.
Game theory provides an effective tool for modeling multi-agent interactive decision-making, and Stackelberg equilibrium, as a hierarchical game solution, is particularly suitable for scenarios with leader-follower relationships. However, existing game-based methods either focus on two-ship encounters [8] or lack effective integration with MARL algorithms, making it difficult to adapt to the dynamic and cooperative characteristics of multi-ship collision avoidance in real maritime environments. Additionally, although some studies have combined game theory with reinforcement learning, they rarely consider the guidance of AIS data in strategy learning, resulting in poor generalization of the algorithms to actual maritime traffic scenarios [9].
To address the aforementioned limitations, this research introduces a multi-ship cooperative collision avoidance algorithm based on the integration of AIS data and the MAAC-Stackelberg framework. The core innovations of this work are as follows: (1) incorporate AIS data preprocessing and feature extraction to deliver precise environmental information and comprehensive ship state details, enhancing the effectiveness of the decision-making system; (2) integrate MAAC with Stackelberg equilibrium to develop a hierarchical cooperative decision-making model, allowing vessels to modify their strategies based on their COLREGs-specified priorities [10]; (3) design a cooperative reward function considering both collision avoidance safety and navigation efficiency, ensuring the algorithm’s compliance with COLREGs and practical engineering applicability.
The remainder of this paper is structured as follows. Section 2 reviews the relevant literature on collision risk assessment, game-theoretic approaches, and multi-agent reinforcement learning in maritime collision avoidance decision-making (CADM). Section 3 details the methodology, including the risk attitude perception layer based on AIS data and Bayesian reasoning, and the collision avoidance decision execution layer integrating MAAC and the Stackelberg game. Section 4 presents the simulation results and analysis, followed by conclusions and future work in Section 5.
2. Literature review
2.1. Collision risk assessment in maritime navigation
Collision risk assessment is the foundational premise of collision avoidance decision-making, responsible for quantifying the probability and severity of potential collisions to provide decision support for subsequent avoidance maneuvers. In the maritime field, this research has evolved from static geometric indicators to dynamic data-driven models, but key limitations remain in adapting to complex multi-ship and human-in-the-loop scenarios [11].
Traditional collision risk assessment relies heavily on kinematic parameter-based indicators. The Distance to Closest Point of Approach (DCPA) and Time to Closest Point of Approach (TCPA) are the most widely used core metrics, as they can quickly reflect the spatial proximity and temporal urgency of two-ship encounters [12]. However, these indicators are inherently static and fail to capture the dynamic evolution of encounter situations such as sudden speed changes of target ships and the subjective risk preferences of mariners [13].
To address this, the ship domain theory has become a mainstream research direction since its proposal by Fujii (1971) [14]. Early models such as circular and rectangular domains simplified the navigational environment and treated all ships as homogeneous. With the popularization of Automatic Identification System data, recent studies have focused on dynamic and asymmetric ship domains tailored to maritime characteristics. For instance, Silveira et al. (2022) extracted quaternion ship domain parameters from AIS data by considering differences in bow, stern, port and starboard safety distances, which improved the fitting degree with actual navigation behavior [15].
Additionally, data-driven methods have been widely adopted. Wang et al. (2023) used a multi-scale risk estimation model based on AIS data to quantify collision risks in complex waters [16], and Guo et al. (2023) proposed the VT-MDM method, which achieves high-precision multi-modal vessel trajectory prediction [17].
Despite these advancements, collision risk assessment in the maritime field still faces three critical limitations. One limitation is that most models, such as ship domain and DCPA/TCPA, treat risk perception as deterministic, ignoring the variability of mariners’ risk attitudes (conservative, neutral and aggressive) that arises from experience, education and navigation scenarios [18]. For instance, a conservative mariner may trigger avoidance actions when the target ship is 5 ship lengths away, while an aggressive one may wait until 3 ship lengths, a difference that existing models fail to capture. Another limitation is that in multi-ship encounters, the collision risk between any two ships is not independent but mutually coupled, as avoiding Ship A may increase the risk of colliding with Ship B. However, current models either simplify multi-ship scenarios into pairwise interactions or rely on pre-defined rules to reduce complexity [19]. A further limitation is that risk assessment results are often used as independent early warning signals rather than being deeply integrated into subsequent collision avoidance decision-making [20]. This disconnect leads to suboptimal strategies, such as overly conservative maneuvers based on high-risk warnings that compromise navigation efficiency.
2.2. Game-theoretic approaches in maritime CADM
Game theory provides a rigorous mathematical framework for modeling interactive decision-making among multiple vessels, where each ship’s strategy depends on the expected behaviors of others. In the maritime field, game-theoretic approaches have been extensively explored to resolve collision avoidance conflicts, with particular emphasis on Nash equilibrium and Stackelberg equilibrium as core analytical paradigms. However, significant gaps remain in adapting to incomplete information and in COLREGs compliance.
Nash equilibrium is a prevailing game-theoretic model applied to maritime CADM, assuming simultaneous and independent decision-making by all ships. Wan et al. (2025) proposed a cooperative collision avoidance framework integrating game theory and intelligent optimization [21].
The Stackelberg equilibrium, also known as the leader-follower paradigm, has attracted growing scholarly attention in recent years. This model assumes that the leader, which comprises stand-on ships and large-tonnage vessels, makes decisions first, while the follower, which consists of give-way ships and small vessels, responds optimally. Additionally, probabilistic game theory has been explored to address uncertainty. A probabilistic Stackelberg game with a leader-follower structure has been introduced for autonomous decision-making, aiming to compute equilibrium strategies under perceptual uncertainty.
Game-theoretic approaches in maritime CADM have key shortcomings. One limitation is that most models assume perfect knowledge of other ships’ intentions, risk attitudes, and maneuver capabilities, but in reality, AIS data may be noisy, delayed, or lost, and human operated ships exhibit unpredictable behaviors, leading to incomplete information. Another limitation is that existing models in the maritime field often use fixed rules such as tonnage and COLREGs status to identify leaders and followers, failing to dynamically adjust based on real-time encounter evolution; for instance, a sudden speed reduction by the leader may require role switching. A further limitation is that the utility functions in game models are typically designed based on universal safety and efficiency criteria, ignoring differences in risk attitudes between ships. For instance, an aggressive follower may not respond to a leader’s maneuver as expected, leading to strategy failure [22].
2.3. Multi-agent reinforcement learning in maritime CADM
MARL enables ships to learn nearly optimal collision avoidance strategies through interaction with the environment and other agents, making it suitable for dynamic multi-ship scenarios. The MAAC algorithm, distinguished by its architecture of centralized training and decentralized execution, has emerged as an active area of research within maritime CADM. However, its implementation continues to encounter difficulties with adherence to rules and scalability.
Multi-Agent Reinforcement Learning algorithms such as Multi-Agent Deep Deterministic Policy Gradient (MADDPG), Multi-Agent Actor-Critic with Attention (MAAC) and Q-Mixing Network (QMIX) have been widely explored in maritime collision avoidance. For example, Wang et al. (2025) proposed a communication-integrated multi-agent deep deterministic policy gradient algorithm to address cooperative control issues for Unmanned Surface Vehicles (USVs) in USV-based multi-agent systems under partial observability and non-stationarity. The algorithm was verified on cooperative navigation and collision avoidance tasks, outperformed traditional single-agent methods, and enabled effective coordination by establishing communication protocols and sharing observations to compensate for missing information [23]. Zhu et al. (2024) proposed a multi-ship autonomous collision avoidance decision-making algorithm based on MER-D3QN that enables all ships to complete collision avoidance tasks safely and efficiently [24].
MARL in maritime CADM has three limitations. One limitation is that although some studies incorporate COLREGs into reward functions, the rules are often simplified, for example by adopting binary rewards that distinguish between compliance and non-compliance, rather than being deeply embedded in the learning process. This leads to strategies that may violate detailed provisions such as those related to the timing and amplitude of maneuvers in complex scenarios. Another limitation is that MARL’s training efficiency and strategy performance depend heavily on reasonable state partitioning, but existing multi-ship partitioning methods fail to account for dynamic risk attitudes and coupled risks, leading to unstable learning in dense traffic. A further limitation is that MARL relies on large-scale AIS data for training, but real-world data often contains noise, gaps, and outliers. Current models lack effective preprocessing and adaptation mechanisms, resulting in degraded performance in practical applications [25].
2.4. Contributions and innovations of this study
To address these gaps, this study proposes a novel collision avoidance strategy combining the MAAC algorithm and the Stackelberg game based on AIS data analysis, with the following key contributions and innovations. (1) Probabilistic dynamic modeling of risk attitudes for maritime scenarios. By analyzing historical AIS data to extract collision avoidance behaviors, we establish a prior distribution of risk attitudes, and use Bayesian reasoning to dynamically update the probability distribution of target ships’ risk attitudes in real time. This overcomes the limitation of static risk attitude assumptions. (2) Stackelberg game with incomplete information tailored to maritime characteristics. We design a dynamic leader-follower identification mechanism based on AIS-derived kinematic features, including distance to conflict point and maneuverability, as well as COLREGs rules, and construct a utility function that integrates risk attitude probabilities. This addresses the problem of static role partitioning and complete information assumptions in traditional maritime game models, ensuring hierarchical decision-making consistent with practical navigation. (3) MAAC algorithm with deep COLREGs integration and multi-ship coupling adaptation. We embed detailed COLREGs provisions, including maneuver timing and direction, into the reward function, and optimize the MAAC’s centralized critic network to capture multi-ship coupled risks. This improves the rule adherence and decision-making rationality of the learned strategies, addressing the superficial rule integration and limited multi-ship adaptability found in current MAAC-based maritime CADM systems. (4) Unified multi-ship encounter state partitioning framework. We propose a two-stage partitioning method that first classifies basic encounter states, including head-on, crossing and overtaking, based on AIS kinematic data, then refines sub-states according to risk attitude probabilities and coupled risk levels. This provides a scalable foundation for multi-ship decision-making, bridging the gap between two-ship and multi-ship state modeling.
3. Methodology
This section describes the technical implementation of the AIS data-driven MAAC-Stackelberg multi-ship cooperative collision avoidance algorithm. It begins with data preprocessing, proceeds through state identification, prior analysis and risk assessment, then prediction and inference, and ends with decision optimization. Fig 1 gives the methodology framework, and more details are discussed below.
Fig 1. The overall architecture of the proposed cooperative collision-avoidance system.
3.1. AIS data preprocessing
AIS data serves as the core input for the entire framework, comprising dynamic parameters (position $(x, y)$, speed $v$, heading $\psi$ and rate of turn $r$), static parameters (ship length $L$, width $B$ and ship type $T$), and the timestamp $t$. To eliminate noise and ensure data reliability, a three-stage preprocessing pipeline is implemented, referring to mature AIS data processing protocols in maritime research.
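Abnormal data such as signal drift, sudden jumps and missing values are identified using the $3\sigma$ principle and kinematic constraints, consistent with the data quality control method. For each dynamic parameter $p$, including $v$, $\psi$ and $r$, the anomaly detection criterion is defined as follows:
$$\left| p(t) - \mu_p \right| > 3\sigma_p \quad \text{or} \quad \left| \frac{p(t) - p(t - \Delta t)}{\Delta t} \right| > \Delta p_{\max}$$
where $\mu_p$ and $\sigma_p$ are the mean and standard deviation of parameter $p$ in a sliding window with a window size of 30 s, $\Delta t$ is the interval between consecutive samples, and $\Delta p_{\max}$ is the maximum allowable change rate of parameter $p$, which is determined based on ship maneuverability constraints.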
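The two checks described above (a sliding-window three-sigma band and a maximum rate of change) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `flag_anomalies`, the trailing-window convention and the `max_rate` argument are assumptions, and the rate limit must be supplied per parameter by the user.

```python
from statistics import mean, pstdev

def flag_anomalies(values, times, window_s=30.0, max_rate=None):
    """Return a list of booleans, True where a sample looks abnormal.

    values, times : equal-length sequences for one dynamic parameter.
    window_s      : length of the trailing sliding window in seconds.
    max_rate      : hypothetical per-parameter limit on |dp/dt| (None = skip).
    """
    flags = []
    for k, (t, p) in enumerate(zip(times, values)):
        # Samples inside the trailing window, including the current one.
        window = [q for s, q in zip(times, values) if t - window_s <= s <= t]
        mu, sigma = mean(window), pstdev(window)
        out_of_band = sigma > 0 and abs(p - mu) > 3 * sigma
        too_fast = False
        if max_rate is not None and k > 0:
            dt = t - times[k - 1]
            too_fast = dt > 0 and abs(p - values[k - 1]) / dt > max_rate
        flags.append(out_of_band or too_fast)
    return flags
```

Because the current sample is included in its own window, a single outlier can only exceed the three-sigma band when the window holds roughly a dozen samples or more; with sparser AIS reports the rate check does most of the work.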
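Missing values are supplemented using cubic spline interpolation to ensure trajectory smoothness:
$$p(t) = a_k + b_k (t - t_k) + c_k (t - t_k)^2 + d_k (t - t_k)^3, \quad t \in [t_k, t_{k+1}]$$
where $a_k$, $b_k$, $c_k$ and $d_k$ are interpolation coefficients solved by minimizing the second derivative of the trajectory.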
3.2. Trajectory reconstruction
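To unify the time interval of the AIS data, whose original reporting interval ranges from 1 s to 10 s, the trajectory is resampled at a fixed interval $\Delta t$ using linear interpolation for dynamic parameters. The reconstructed trajectory of ship $i$ is represented as follows:
$$\mathrm{Traj}_i = \left\{ \left( x_i(t_k),\ y_i(t_k),\ v_i(t_k),\ \psi_i(t_k),\ r_i(t_k) \right) \right\}_{k = 0}^{N}$$
where $t_k = t_0 + k\,\Delta t$.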
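As a rough sketch of the resampling step, the helper below linearly interpolates one irregularly sampled parameter onto a uniform time grid. The function name `resample_linear` and the choice of `dt` are assumptions; the resampling interval is left as a parameter here.

```python
def resample_linear(times, values, dt):
    """Linearly interpolate (times, values) onto t0, t0 + dt, ... up to the last time.

    times must be strictly increasing and contain at least two samples.
    Headings should be unwrapped beforehand so that 359 -> 1 degrees is not
    interpolated through 180 degrees.
    """
    out_t, out_v = [], []
    k = 0
    n_steps = int((times[-1] - times[0]) / dt + 1e-9)
    for step in range(n_steps + 1):
        t = times[0] + step * dt
        # Advance to the segment [times[k], times[k + 1]] that contains t.
        while k + 1 < len(times) - 1 and times[k + 1] < t:
            k += 1
        t0, t1 = times[k], times[k + 1]
        w = (t - t0) / (t1 - t0)
        out_t.append(t)
        out_v.append(values[k] + w * (values[k + 1] - values[k]))
    return out_t, out_v
```

Applying the same helper to each dynamic parameter of a ship yields the aligned samples that make up its reconstructed trajectory.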
To facilitate distance computation, the initial WGS-84 geodetic coordinates, denoted by latitude $\varphi$ and longitude $\lambda$, are transformed into a local Cartesian coordinate system $(x, y)$ through a simplified UTM transverse Mercator approximation. This approximation is appropriate for relatively small study regions, generally those that do not exceed 100 km. The transformation is formulated as follows:
$$x = R\,(\lambda - \lambda_0)\,\frac{\pi}{180}\,\cos\!\left(\varphi_0\,\frac{\pi}{180}\right), \qquad y = R\,(\varphi - \varphi_0)\,\frac{\pi}{180}$$
where $\varphi$ and $\lambda$ are the geodetic latitude and longitude in degrees of the point to be converted, and $\varphi_0$ and $\lambda_0$ denote the latitude and longitude of the local coordinate origin, usually chosen as the centroid of the study region. $R = 6378$ km is the WGS-84 reference ellipsoid semi-major axis. The factor $\pi/180$ converts angular values from degrees to radians for trigonometric consistency. This local planar approximation replaces spherical distance calculations with Euclidean computations, reducing computational complexity in trajectory analysis and spatial positioning.
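The local planar conversion can be illustrated with a few lines of Python. This sketch assumes the common form in which the east-west distance is scaled by the cosine of the origin latitude; the function name `to_local_xy` is hypothetical, and results are in kilometres because the radius is given in kilometres.

```python
import math

R_KM = 6378.0  # radius used in the text (WGS-84 semi-major axis, rounded)

def to_local_xy(lat_deg, lon_deg, lat0_deg, lon0_deg, radius_km=R_KM):
    """Convert latitude/longitude in degrees to local (x, y) in km around an origin."""
    k = math.pi / 180.0  # degrees -> radians
    # Assumption: east-west scaling uses the origin latitude, not the point latitude.
    x = radius_km * (lon_deg - lon0_deg) * k * math.cos(lat0_deg * k)
    y = radius_km * (lat_deg - lat0_deg) * k
    return x, y
```

For example, one degree of latitude maps to roughly 111.3 km, and one degree of longitude at an origin latitude of 60° maps to about half of that.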
3.3. Two-ship encounter identification
In accordance with COLREGs and standard maritime navigation practices, two-ship encounters are detected by fusing kinematic indicators including DCPA and TCPA and spatial criteria including ship domain intrusion.
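For ship $i$ (own ship, OS) and ship $j$ (target ship, TS), the relative motion vector is $\mathbf{v}_{ij} = \mathbf{v}_j - \mathbf{v}_i$, and the initial relative position is $\mathbf{p}_{ij} = \mathbf{p}_j - \mathbf{p}_i$. The TCPA and DCPA are calculated using the classic kinematic model:
$$\mathrm{TCPA} = -\frac{\mathbf{p}_{ij} \cdot \mathbf{v}_{ij}}{\lVert \mathbf{v}_{ij} \rVert^{2}}, \qquad \mathrm{DCPA} = \left\lVert \mathbf{p}_{ij} + \mathbf{v}_{ij}\,\mathrm{TCPA} \right\rVert$$
An encounter is identified if $\mathrm{TCPA} > 0$ (future encounter) and $\mathrm{DCPA} < D_{safe}$ (unsafe proximity), where $D_{safe}$ is the safe-distance threshold. Three canonical encounter states (head-on, crossing and overtaking) are classified using the relative bearing and the relative speed ratio, consistent with COLREGs rules.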
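The TCPA/DCPA computation and the encounter test can be sketched as below, using the standard closest-point-of-approach formulas for two ships moving at constant velocity. The function names and the handling of zero relative speed are assumptions made for this illustration; `d_safe` must be supplied by the caller.

```python
import math

def cpa(p_i, v_i, p_j, v_j):
    """Return (tcpa, dcpa) for own ship i and target ship j.

    Positions and velocities are (x, y) tuples in consistent units.
    """
    rx, ry = p_j[0] - p_i[0], p_j[1] - p_i[1]  # relative position
    vx, vy = v_j[0] - v_i[0], v_j[1] - v_i[1]  # relative velocity
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        # No relative motion: the range never changes, so report it with TCPA = 0.
        return 0.0, math.hypot(rx, ry)
    tcpa = -(rx * vx + ry * vy) / speed_sq
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)
    return tcpa, dcpa

def is_encounter(tcpa, dcpa, d_safe):
    """An encounter needs a future closest approach inside the safe distance."""
    return tcpa > 0 and dcpa < d_safe
```

With the own ship heading north at 10 m/s and a target 1000 m ahead heading south at 10 m/s, the closest approach is in 50 s; a 300 m lateral offset gives a DCPA of 300 m.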
To further refine encounter patterns for complex multi-ship scenarios, the azimuth-based sub-classification and encounter pattern mapping are established as shown in Table 1 and Table 2.
Traditional models for two-ship encounter situations rely on pairwise DCPA and TCPA calculations and fixed relative bearing rules to identify and classify head-on, crossing, and overtaking encounter states. In a real environment, however, ship roles can change dynamically. For instance, a target ship (TS) in one encounter may act as an own ship (OS) in another ship pair’s encounter. Meanwhile, risk attitudes are heterogeneous among target ships, which may exhibit conservative, neutral, or aggressive navigation behaviors, and this further influences collision avoidance decision-making. To address these challenges, Table 1 and Table 2 serve as the core tools for state structuring. Both tables adopt the azimuth $Az$ instead of the relative bearing, because $Az$, as a global angle, is more stable in complex multi-ship scenarios and does not fluctuate frequently with the turning of the own ship, making it more suitable for multi-ship relationship modeling. On this basis, Table 1 discretizes the continuous azimuth $Az$ into 8 sub-classes to encode orientation information, laying a foundation for the refined classification of encounter patterns in Table 2. For an OS $i$ and $n$ TSs indexed by $j = 1, \dots, n$, the relative azimuth $Az_{ij}$ is defined as the angle of TS $j$ relative to OS $i$’s navigation coordinate system, and it is calculated for each ship pair to form an initial multi-ship azimuth set $\{Az_{i1}, Az_{i2}, \dots, Az_{in}\}$.
Table 1 resolves the lack of granularity in two-ship state classification by subdividing azimuths into 8 sub-classifications (A1/A2, B1/B2, C1/C2 and D1/D2) based on maritime kinematic rules. Each $Az_{ij}$ in the multi-ship azimuth set is matched to the azimuth ranges in Table 1 to determine the corresponding sub-classification. For instance, an azimuth within the A1 range is classified as A1, and an azimuth within the B2 range is classified as B2. This step converts unstructured multi-ship azimuth data into structured sub-classification labels, forming a multi-ship sub-classification set $\{S_{i1}, S_{i2}, \dots, S_{in}\}$, where $S_{ij}$ represents the sub-classification of TS $j$ relative to OS $i$.
Table 2 expands pairwise encounter patterns to multi-ship scenarios by establishing a mapping between sub-classification pairs and encounter situations. For each TS $j$, the sub-classification pair $(S_{ij}, S_{ji})$ is constructed, where $S_{ji}$ is the sub-classification of OS $i$ relative to TS $j$, calculated via the same azimuth matching method as in the previous step. The pair is then matched to the encounter pattern column in Table 2 to determine the pairwise encounter situation; for example, the sub-classification pair A1-A1 maps to Head-on and B1-D2 maps to Crossing. For multi-ship coupled scenarios, such as TS1 and TS2 both forming Crossing situations with the OS, the “Crossing-(Head-on solution)” situation in Table 2 is applied together with clustering-based multi-ship grouping, which assigns TSs with overlapping encounter domains to the same group and prioritizes collision avoidance operations for groups with smaller TCPA. This step forms a multi-ship encounter pattern set $\{P_{i1}, P_{i2}, \dots, P_{in}\}$, where $P_{ij}$ represents the refined encounter situation of TS $j$ relative to OS $i$, achieving a standardized description of multi-ship states.
After the ship encounter situations have been determined, prior analysis can be conducted on AIS data in combination with the identified situation characteristics to provide targeted data support for subsequent research.
3.4. Prior analysis of AIS data
Prior analysis of historical AIS data aims to extract statistical characteristics of collision avoidance behaviors, laying the foundation for subsequent risk attitude modeling and algorithm training [26].
A collision avoidance behavior is defined as a maneuver in which the ship’s turning angle $\Delta\psi$ or speed change $\Delta v$ (in kn) reaches a preset threshold during an encounter. The probability distribution of core avoidance parameters, including the turning angle $\Delta\psi$ and the avoidance timing $t_{lead}$, is fitted using the Weibull distribution, which is suitable for non-negative continuous variables in maritime engineering. Here $t_{lead}$ is defined as the difference between $t_0$ and $t_{start}$, with $t_0$ being the time when DCPA equals $D_{safe}$:
$$f(t_{lead}) = \frac{k}{\lambda} \left( \frac{t_{lead}}{\lambda} \right)^{k-1} \exp\!\left[ -\left( \frac{t_{lead}}{\lambda} \right)^{k} \right], \quad t_{lead} \ge 0$$
where $k$ is the shape parameter, reflecting the concentration degree of collision avoidance behavior: for larger $k$ the collision avoidance actions are concentrated, and for smaller $k$ they are dispersed. $\lambda$ is the scale parameter, corresponding to the characteristic value of the collision avoidance action initiation time, which is obtained through maximum likelihood estimation from historical AIS trajectories. The parameter
is set to 180 s.
where $M$ is the number of historical avoidance behavior samples. An eccentric elliptical ship domain, widely used in modern maritime collision risk assessment, is adopted, with the OS at the left rear of the ellipse center. The domain boundary is expressed in terms of $(x', y')$, the coordinates of the TS relative to the OS rotated by the OS’s heading $\psi_i$; $a = 3.5$ is the long-axis coefficient corresponding to the bow direction, $b = 1.5$ is the short-axis coefficient corresponding to the port-starboard direction, and $L$ is the OS’s length [27]. According to the boundary constraint of the elliptical ship domain established above, the spatial risk $R_s$ is defined by the domain violation degree $d_{dv}$, which is the ratio of the TS’s intrusion depth to the domain semi-axis, with a decay coefficient $k_s = 2.3$ calibrated via AIS data.
On the temporal dimension, the temporal risk $R_t$ is constructed to characterize the time urgency of collision risk. It is calculated from TCPA normalized by the critical time $t_c = 600$ s (10 min).
To achieve a comprehensive and quantitative evaluation of ship collision risk from both spatial and temporal dimensions, the comprehensive collision risk index $R_c$ is obtained by weighted fusion of $R_s$ and $R_t$:
$$R_c = w_s R_s + w_t R_t$$
where $w_s$ is the spatial risk weight and $w_t$ is the temporal risk weight. Both are determined by the Analytic Hierarchy Process (AHP).
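To make the weighted fusion concrete, the sketch below derives two weights from a 2×2 AHP pairwise comparison matrix and applies them to a pair of risk values. For a 2×2 reciprocal matrix $[[1, a], [1/a, 1]]$ the normalized principal eigenvector is $(a/(1+a),\ 1/(1+a))$, so no eigen-solver is needed. The comparison ratio of 1.5 and the risk values used in the example are purely hypothetical.

```python
def ahp_weights_2x2(ratio):
    """Weights from the AHP pairwise matrix [[1, ratio], [1/ratio, 1]].

    ratio > 0 states how much more important the spatial risk is judged to be
    than the temporal risk (a hypothetical expert judgement).
    """
    w_s = ratio / (1.0 + ratio)
    return w_s, 1.0 - w_s

def fused_risk(r_s, r_t, w_s, w_t):
    """Weighted fusion of spatial and temporal risk; weights should sum to 1."""
    return w_s * r_s + w_t * r_t

# Example with an assumed ratio of 1.5, giving weights of about 0.6 and 0.4.
w_s, w_t = ahp_weights_2x2(1.5)
risk = fused_risk(0.8, 0.5, w_s, w_t)
```

A 2×2 comparison matrix is always perfectly consistent, so the usual AHP consistency-ratio check only becomes relevant when more than two criteria are compared.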
3.5. Feature engineering
To support subsequent prediction and inference models, 22-dimensional features are extracted and divided into three categories, referring to the feature selection framework for maritime collision avoidance. Dynamic features are the ship motion parameters. Encounter features include DCPA, TCPA and related quantities, which are calculated through the corresponding relational expressions. Static features are the ship’s static parameters. Features are standardized using Z-score normalization to eliminate scale differences, a method common for MARL algorithms:
$$x' = \frac{x - \mu_x}{\sigma_x}$$
where $\mu_x$ and $\sigma_x$ are the mean and standard deviation of feature $x$ in the training set. Based on prior AIS data, key scenario parameters, including the initial relative position, the speeds $v_i$ and $v_j$, and the headings of the two ships, are sampled from their empirical distributions. For each sample, DCPA and TCPA are calculated through the corresponding relational expressions to determine valid encounters, namely those in which TCPA is greater than 0 and DCPA is less than $D_{safe}$.
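The Z-score normalization described above can be sketched as follows: statistics are fitted on the training set only and then reused for any new value. The function names are hypothetical, and the population standard deviation is an assumption, since the estimator is not specified in the text.

```python
from statistics import mean, pstdev

def fit_zscore(train_values):
    """Return (mu, sigma) estimated from the training set of one feature."""
    return mean(train_values), pstdev(train_values)

def zscore(x, mu, sigma):
    """Standardize x; a constant feature (sigma == 0) is mapped to 0."""
    return 0.0 if sigma == 0 else (x - mu) / sigma
```

Reusing the training-set statistics at inference time keeps the feature scale seen by the learned policy consistent between training and deployment.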
For each valid scenario, the TS’s avoidance behavior is simulated using the prior distributions of $\Delta\psi$ and $t_{lead}$. The maneuver process is modeled via the first-order Nomoto ship motion equation, a classic model of ship maneuverability:
$$T\dot{r} + r = K\delta$$
where $r$ is the rate of turn, $\delta$ is the rudder angle, $K$ is the rudder gain and $T$ is the time constant, which is calibrated by ship type from AIS data. The simulated avoidance trajectory is obtained by integrating this model over the maneuver.
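A minimal sketch of how a first-order Nomoto model can be integrated into a trajectory is given below, using forward Euler steps at constant speed. The function name, the units (rudder angle in degrees, rate of turn in degrees per second), the heading convention (clockwise from north) and all numeric values in the usage note are assumptions for illustration only.

```python
import math

def simulate_nomoto(delta_deg, K, T, v, duration, dt=1.0, psi0_deg=0.0):
    """Euler integration of T * dr/dt + r = K * delta at constant speed v.

    Returns a list of (t, x, y, psi_deg, r) tuples, starting at the origin.
    The step dt should be small compared with T for a stable, accurate result.
    """
    x = y = 0.0
    psi, r, t = psi0_deg, 0.0, 0.0
    track = [(t, x, y, psi, r)]
    for _ in range(int(round(duration / dt))):
        r += dt * (K * delta_deg - r) / T  # first-order response of the turn rate
        psi += dt * r
        # Heading is measured clockwise from north (the +y axis).
        x += dt * v * math.sin(math.radians(psi))
        y += dt * v * math.cos(math.radians(psi))
        t += dt
        track.append((t, x, y, psi, r))
    return track
```

With a constant rudder angle the turn rate approaches the steady value `K * delta_deg`; for example, `simulate_nomoto(10.0, 0.1, 10.0, 5.0, 200.0)` ends with a turn rate very close to 1 degree per second.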
Simulated avoidance behaviors are aggregated to obtain the avoidance distributions of $\Delta\psi$ and $t_{lead}$, and the 95% confidence interval of each distribution is computed.
3.6. LSTM-BN hybrid framework for target ship state
To address the uncertainty of target ship motion and the ambiguity of risk attitude in maritime encounter scenarios, a hybrid framework integrating Long Short-Term Memory (LSTM) and Bayesian Network (BN) is proposed. This framework first leverages LSTM to capture temporal dependencies in AIS data for predicting future motion states, and then fuses the predicted states with static or dynamic encounter features as inputs to the BN for probabilistic inference of the target ship’s risk attitude. The combination of LSTM’s temporal prediction capability and BN’s uncertainty reasoning provides reliable state support and risk prior information for subsequent cooperative collision avoidance decision-making, which is consistent with the hybrid intelligent reasoning paradigm in maritime safety research. The specific process is shown in Fig 2.
Fig 2. The overall architecture of the proposed cooperative collision-avoidance system.
For prediction, given the time-series nature of AIS data, including speed $v_j$, course $\psi_j$ and rate of turn $r_j$, an LSTM network with fully-connected preprocessing and postprocessing layers is designed to predict the target ship’s motion states at the future key time steps of 5 s, 10 s and 15 s. The input of the LSTM network is a 6-step historical AIS feature sequence $X = [f_0, f_1, \dots, f_5]$, consisting of feature vectors $f_t$ from $t = 0$ to $t = 5$, where each time-step feature vector $f_t \in \mathbb{R}^3$ has elements $v_j(t)$, $\psi_j(t)$ and $r_j(t)$. To enhance feature expression ability, a fully-connected preprocessing layer, with weight matrix $W_{pre}$ and bias vector $b_{pre}$, maps each 3-dimensional time-step feature to a high-dimensional space. The processed sequence is fed into a two-layer LSTM network to model temporal dependencies, yielding the hidden state $h_t$ at each time $t$, together with the final hidden state $h_n$ and the final cell state $c_n$ of the LSTM network. A dropout layer with a dropout rate of 0.2 is added between the two LSTM layers to mitigate overfitting. The final hidden state $h_n$ is input to a fully-connected postprocessing layer, whose weights and biases are trainable parameters, for motion state prediction. The output $\hat{y}$ represents the predicted motion states of the target ship at 5 s, 10 s and 15 s in the future. The LSTM network is trained with the Adam optimizer (with a preset learning rate) and the Mean Squared Error loss function, as recommended for maritime motion prediction tasks [28].
The BN is constructed to infer the target ship’s risk attitude by fusing LSTM-predicted motion states and original AIS encounter features. The BN is formally defined as BN = (V, E, P). Node set V includes 1 target node and 10 feature nodes. The feature nodes are divided into two categories. LSTM-predicted motion features adopted from the 5 s predicted results for real-time decision-making. Original AIS encounter or static features
, where DCPA and TCPA characterize collision risk,
is the relative course.tlead is the lead time.Ris the encounter range. Lj is the target ship length. Tj is the ship type. Edge set
directed edges from all 10 feature nodes to the target node A, indicating the causal relationship between feature variables and risk attitude. Conditional Probability
calibrated using historical AIS trajectories, where the prior probability of risk attitude is determined by statistical analysis of collision avoidance behaviors. According to the statistical results,
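. Based on Bayes’ theorem, the posterior probability that the target ship holds risk attitude Ak (k = 1, 2, 3), given the feature vector X, is calculated as:
$$P(A_k \mid X) = \frac{P(X \mid A_k)\,P(A_k)}{\sum_{i=1}^{3} P(X \mid A_i)\,P(A_i)}$$
where $P(X \mid A_k)$ denotes the likelihood of the feature variables under risk attitude Ak. Considering the continuity of maritime features, this likelihood is modeled as a multivariate Gaussian distribution [29]:
$$P(X \mid A_k) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_k|^{1/2}} \exp\!\left(-\tfrac{1}{2}(X-\mu_k)^{\top}\Sigma_k^{-1}(X-\mu_k)\right)$$
where d = 10 is the number of feature nodes, X is the feature vector, and $\mu_k$ and $\Sigma_k$ are the mean vector and covariance matrix of the features under risk attitude Ak, calibrated from historical data.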
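The posterior computation can be illustrated with a minimal sketch. It assumes a diagonal covariance matrix (a simplification of the full multivariate Gaussian), maps the 5:8:7 prior ratio reported in the experimental setup onto the order aggressive, neutral, conservative (an assumption), and uses a hypothetical two-feature example instead of the d = 10 features of the paper. All function names and numeric parameters are illustrative.

```python
import math

# Prior ratio 5:8:7 comes from the experimental setup; its assignment to
# (aggressive, neutral, conservative) in this order is an assumption.
ATTITUDES = ("aggressive", "neutral", "conservative")
PRIORS = (5 / 20, 8 / 20, 7 / 20)


def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (simplifying assumption)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posterior(x, means, variances, priors=PRIORS):
    """Return P(A_k | x) for every attitude k via Bayes' theorem."""
    log_joint = [
        math.log(p) + log_gaussian_diag(x, m, v)
        for p, m, v in zip(priors, means, variances)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_joint)
    weights = [math.exp(lj - top) for lj in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical per-attitude means and variances for two normalized features.
means = [(0.2, 0.2), (0.5, 0.5), (0.8, 0.8)]
variances = [(0.04, 0.04)] * 3
probs = posterior((0.25, 0.2), means, variances)
print(dict(zip(ATTITUDES, (round(p, 3) for p in probs))))
```

Working in log space avoids underflow when d grows to 10 features; with a full covariance matrix the per-feature sum would be replaced by the quadratic form in the equation above.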
To construct a comprehensive decision feature set for the subsequent MAAC-Stackelberg collision avoidance algorithm, three types of features are fused: the normalized original AIS features , the LSTM-predicted motion states
(6 dimensions,
), and the BN-inferred risk attitude probabilities
. The fused decision feature set is defined as:
resulting in a 28-dimensional feature vector. To eliminate redundant information and enhance decision relevance, feature selection is performed using mutual information. Specifically, a feature is retained if its mutual information with the optimal collision avoidance actions (rudder angle and speed change)
is greater than 0.1:
This selection criterion ensures that the final feature set is both compact and discriminative, laying a foundation for efficient and accurate collision avoidance decision-making.
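The selection rule can be sketched as follows. This is a minimal illustration that assumes the features and the action labels have already been discretized into bins, and that mutual information is measured in nats; neither the binning scheme nor the unit is stated in the text. The feature names and data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_features(columns, actions, threshold=0.1):
    """Keep the names of features whose MI with the action labels exceeds the threshold."""
    return [
        name for name, col in columns.items()
        if mutual_information(col, actions) > threshold
    ]


# Hypothetical binned data: 'dcpa_bin' tracks the action, 'noise' is independent of it.
actions = ["starboard", "starboard", "hold", "hold"]
columns = {
    "dcpa_bin": [0, 0, 1, 1],
    "noise": [0, 1, 0, 1],
}
print(select_features(columns, actions))
```

In this toy case the informative feature reaches ln 2 nats and is kept, while the independent feature scores zero and is dropped.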
3.7. The framework of COLREGs embedded MAAC-Stackelberg
The decision module integrates the Stackelberg game, which handles hierarchical decision-making, with MAAC, which handles multi-agent collaboration, and embeds COLREGs as rigid constraints. It thereby combines game-theoretic reasoning with the adaptive learning of multi-agent reinforcement learning (MARL).
Key COLREGs rules are converted to mathematical constraints for decision variables :
- Rule 13 (Overtaking): Overtaking ship
(starboard turn),
kn (reduce speed);
- Rule 14 (Head-on):
(starboard turn),
kn (no excessive deceleration);
- Rule 15 (Crossing): TS on OS’s port side
(starboard turn); TS on starboard side
(maintain course).
These constraints are embedded into the MAAC action space:
where takes the value of 30° and
takes the value of 5 kn, with both being consistent with ship maneuverability limits.
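One way to embed such constraints is to project each raw actor output onto the permitted action set. The sketch below is only an illustration: it uses the 30° and 5 kn bounds stated above, assumes that a positive rudder angle means a starboard turn, and replaces the rule-specific numeric bounds with sign-only constraints. The encounter labels and function names are hypothetical.

```python
MAX_RUDDER_DEG = 30.0       # rudder-angle bound stated in the text
MAX_SPEED_CHANGE_KN = 5.0   # speed-change bound stated in the text


def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def constrain_action(rudder_deg, speed_change_kn, encounter):
    """Project a raw (rudder, speed change) action onto the constrained action space.

    Assumed sign convention: positive rudder angle = starboard turn.
    `encounter` is a hypothetical label; unknown labels only get the box bounds.
    """
    rudder = clip(rudder_deg, -MAX_RUDDER_DEG, MAX_RUDDER_DEG)
    speed = clip(speed_change_kn, -MAX_SPEED_CHANGE_KN, MAX_SPEED_CHANGE_KN)
    if encounter in ("overtaking", "head_on", "crossing_port"):
        rudder = max(rudder, 0.0)   # starboard turn only
    if encounter == "overtaking":
        speed = min(speed, 0.0)     # reduce speed, never accelerate
    if encounter == "crossing_starboard":
        rudder = 0.0                # maintain course
    return rudder, speed


print(constrain_action(-45.0, 8.0, "head_on"))
```

Projecting after sampling keeps the actor network unchanged; masking the policy output before sampling would be an alternative design with the same feasible set.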
In the leader-follower identification, leader L possesses a higher priority level.
where . Follower F is the other ship.
The leader’s utility aims to maximize safety and efficiency:
The follower’s utility is to respond to the leader’s action:
where is the leader’s optimal action.
which are calibrated via simulation.
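The game is solved via backward induction: the follower’s best response to each leader action is determined first, and the leader then chooses the action that maximizes its own utility given that response.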
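For a finite, discretized action set, backward induction reduces to two nested maximizations. The following sketch shows this under stated assumptions: the utility functions and the rudder-angle grid are hypothetical stand-ins, not the calibrated utilities of the paper.

```python
def stackelberg_backward_induction(leader_actions, follower_actions, u_leader, u_follower):
    """Return (leader_action, follower_action) at the Stackelberg equilibrium of a finite game."""

    def best_response(a_l):
        # Stage 2: the follower maximizes its own utility given the leader's action.
        return max(follower_actions, key=lambda a_f: u_follower(a_l, a_f))

    # Stage 1: the leader anticipates the follower's best response to each of its actions.
    a_l_star = max(leader_actions, key=lambda a_l: u_leader(a_l, best_response(a_l)))
    return a_l_star, best_response(a_l_star)


# Hypothetical rudder-angle grid in degrees (positive = starboard).
RUDDERS = [0, 10, 20, 30]


def u_leader(a_l, a_f):
    separation = a_l + a_f                   # crude proxy for passing distance
    return min(separation, 30) - 0.5 * a_l   # safety saturates; own deviation is costly


def u_follower(a_l, a_f):
    separation = a_l + a_f
    return min(separation, 30) - 0.2 * a_f


print(stackelberg_backward_induction(RUDDERS, RUDDERS, u_leader, u_follower))  # (0, 30)
```

In this toy game the leader holds its course because it anticipates that the follower will make the full avoidance maneuver, which is the first-mover advantage that distinguishes a Stackelberg solution from a simultaneous-move one.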
Under the Centralized Training-Decentralized Execution (CTDE) framework, the MAAC algorithm improves multi-agent coordination through its attention mechanism, but it suffers from slow convergence and from the uncertainty of multiple equilibrium solutions. To address these issues, the Stackelberg Equilibrium (SE) is incorporated into the objective function as a regularization term. In the training phase, the central critic uses global information to suppress mutual interference between strategies, narrow the exploration space, and guide convergence to a unique subgame perfect equilibrium through the sequential constraints of the SE. In the execution phase, the leader and follower agents make independent decisions based on local observations, without communication. This design retains the engineering feasibility of the CTDE framework while addressing the coordination and convergence problems of MAAC through the inductive bias of the leader-follower game. The same structure also suits other large-scale hierarchical decision-making scenarios such as power grid regulation and traffic control.
Centralized Critic:
Decentralized Actors generate ship-specific actions for real-time execution:
where is the exploration noise coefficient. This parameter is tuned jointly with the core hyperparameters of the algorithm, including the learning rate of the Adam optimizer and the regularization coefficient of the Stackelberg equilibrium, to balance collision avoidance safety, COLREGs compliance, and training convergence.
Critic loss aims to minimize temporal difference error:
Actor loss incorporates Stackelberg equilibrium for hierarchical decision-making:
where is the action distribution;
is the regularization coefficient.
The procedure alternates between centralized training, which updates and
, and decentralized execution, in which each ship uses its own Actor. Training stops when the collision rate on the validation set falls below 0.5% and the COLREGs compliance rate exceeds 95%.
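The critic objective and the stopping rule can be sketched in a framework-agnostic way. The block below assumes a standard one-step temporal-difference target with a discount factor gamma whose value is not given in the text; the transition layout and the callables `q` and `q_target` are hypothetical.

```python
def td_critic_loss(batch, q, q_target, gamma=0.95):
    """Mean squared one-step temporal-difference error over a batch.

    Each transition is (state, joint_action, reward, next_state, next_joint_action, done).
    `q` and `q_target` map (state, joint_action) to a float; gamma is an assumed value.
    """
    total = 0.0
    for s, a, r, s_next, a_next, done in batch:
        target = r if done else r + gamma * q_target(s_next, a_next)
        total += (q(s, a) - target) ** 2
    return total / len(batch)


def should_stop(collision_rate, compliance_rate):
    """Stopping rule from the text: validation CR < 0.5% and CCR > 95%."""
    return collision_rate < 0.005 and compliance_rate > 0.95


# Toy check with a constant critic that always predicts 1.0.
constant_q = lambda s, a: 1.0
batch = [
    (0, 0, 1.0, 1, 0, False),   # target = 1.0 + 0.5 * 1.0 = 1.5
    (1, 0, 0.0, 2, 0, True),    # terminal, target = 0.0
]
print(td_critic_loss(batch, constant_q, constant_q, gamma=0.5))  # 0.625
print(should_stop(0.004, 0.96))  # True
```

A separate target critic (`q_target`) is a common stabilization choice in actor-critic training; the text does not state whether the authors use one, so passing the same callable twice is equally valid here.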
4. Experiments
4.1. Experimental setup
The experimental dataset consists of historical AIS data from multiple sources, covering open waters of the East China Sea (122–123°E, 36–37°N). It contains a large number of valid ship trajectories across vessel categories such as power-driven ships, container ships, bulk carriers, and fishing vessels, and it represents typical encounter scenarios (head-on, crossing, and overtaking) together with aggressive, neutral, and conservative risk-attitude behaviors. The probabilistic reasoning component adopts a directed acyclic graph with 10 feature nodes and one target node whose three states represent aggressive, neutral, and conservative navigation behaviors. The feature nodes include the dynamic motion states predicted by the LSTM module as well as static and dynamic encounter characteristics such as DCPA and TCPA, with the prior probability distribution of risk attitudes set at 5:8:7. All compared algorithms use the same training and test datasets and the same action space to ensure a fair comparison.
4.2. Evaluation indicators
A comprehensive evaluation system is constructed covering four dimensions: safety, rule compliance, navigation efficiency, and decision stability. The specific indicators and their mathematical definitions are as follows:
1. Collision Rate (CR):
2. Minimum Safety Distance (MSD):
where K is the number of ship pairs in each scenario.
3. Average Risk:
where d0 = 1 n mile, t0 = 300 s, vr is the relative speed, and .
4. COLREGs Compliance Rate (CCR):
5. Average Navigation Time (ANT):
6. Speed Loss Rate (SLR):
7. Trajectory Smoothness:
where T = 10 s, is the rudder angle of ship j at time t, and
.
8. Comprehensive Performance Score (CPS):
where: w1 = 0.4 (safety), w2 = 0.3 (compliance), w3 = 0.2 (stability), w4 = 0.1 (efficiency).
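The weighting can be illustrated with a short sketch. It assumes that each of the four sub-scores has already been normalized to [0, 1] with higher values being better; the normalization itself is not reproduced here, and the example inputs are hypothetical rather than values from the paper.

```python
# Weights as stated in the text.
WEIGHTS = {"safety": 0.4, "compliance": 0.3, "stability": 0.2, "efficiency": 0.1}


def comprehensive_score(scores, weights=WEIGHTS):
    """Weighted sum of sub-scores, each assumed to lie in [0, 1] with higher being better."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical sub-scores, not results from the paper.
example = {"safety": 0.9, "compliance": 0.8, "stability": 0.7, "efficiency": 0.6}
print(round(comprehensive_score(example), 3))  # 0.8
```

Because safety carries 40% of the weight, a 0.1 gain in the safety sub-score moves the CPS four times as much as the same gain in efficiency.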
4.3. Experimental results
4.3.1. Training process analysis.
Fig 3 shows the training curves of the five algorithms over 100 episodes, reflecting the convergence of the collision rate, average risk, and COLREGs compliance rate.
Collision rate, Average risk, COLREGs compliance rate over 100 training episodes for the proposed MAAC-Stackelberg and four baseline algorithms.
The proposed MAAC-Stackelberg algorithm converges fastest and stabilizes after 60 episodes. In the late training stage it achieves the lowest collision rate and risk level, with a compliance rate exceeding 0.9.
4.3.2. Single indicator comparison.
Fig 4 compares trajectory smoothness. MAAC-Stackelberg achieves the highest score, 4% higher than the second-ranked DRL-COLREGs. This is because the LSTM-based motion prediction module forecasts states 5 s ahead, which reduces abrupt rudder adjustments.
MAAC-Stackelberg achieves the highest smoothness score.
Fig 5 compares the average risk of the different algorithms. MAAC-Stackelberg achieves the lowest average risk, below both DRL-COLREGs and the rule-based method. This reduction stems from the algorithm’s integrated design: the Bayesian Network infers the target ships’ risk attitudes so that strategies can be adjusted preemptively according to their behaviors, while the Stackelberg game hierarchy ensures coordinated risk mitigation among multiple ships.
Proposed algorithm maintains the lowest average risk.
Fig 6 demonstrates that MAAC-Stackelberg has the lowest collision rate (0.045). The rule-based method shows the highest collision rate (0.125) due to its inability to handle dynamic multi-ship interactions.
Proposed algorithm obtains the smallest average collision risk.
Fig 7 shows the COLREGs compliance rate comparison among different algorithms. MAAC-Stackelberg achieves the highest compliance rate (0.920), which is higher than DRL-COLREGs (0.880). This is attributed to the Stackelberg game’s leader-follower mechanism, which strictly follows COLREGs, while Pure MAAC (0.750) ignores rule constraints in pursuit of collision avoidance.
Proposed algorithm achieves the highest COLREGs compliance rate.
4.3.3. Comprehensive performance evaluation.
Fig 8 integrates the four core indicators into a radar chart, where a larger enclosed area indicates better comprehensive performance. MAAC-Stackelberg shows balanced superiority in all dimensions. To quantify the overall performance of each algorithm, a comprehensive performance score is calculated from the four dimensions of the radar chart (safety, compliance, trajectory smoothness, and navigation efficiency) according to preset weights.
Radar plot of safety, compliance, smoothness and efficiency indices for the five evaluated algorithms.
Fig 9 shows the weighted comprehensive score, with weights of 40% for safety, 30% for compliance, 20% for smoothness, and 10% for efficiency. MAAC-Stackelberg achieves the highest score of 0.818, exceeding both DRL-COLREGs and the rule-based method.
MAAC-Stackelberg outperforms the rule-based method because the latter relies on manually preset collision avoidance logic and fixed thresholds. It is essentially a passive response mode with static triggering, a single-ship perspective, and no optimization capability. In contrast, MAAC-Stackelberg is data-driven and built on reinforcement learning, and it generalizes adaptively to complex multi-ship scenarios by learning the mapping between environment, action, and reward through end-to-end optimization.
DRL-COLREGs makes decisions from a single-agent perspective: it focuses only on its own operations and does not model the intention game and behavioral interaction between multiple ships, which results in insufficient decision-making robustness in complex scenarios such as multi-ship parallel encounters and crossing encounters. MAAC-Stackelberg additionally introduces a risk perception module to realize “forward-looking collision avoidance,” which further improves decision-making stability and comprehensive benefits while meeting COLREGs constraints, and hence achieves a higher comprehensive score.
Pure MAAC scores 0.75 on the compliance rate, significantly lower than the 0.92 of the proposed algorithm. It takes collision punishment as the sole constraint, so its training objective is limited to “avoiding collisions” without the mandatory guidance of maritime rules. The absence of rule constraints also leaves the agent with an excessively large exploration space, making it prone to frequent, large-amplitude steering and speed changes. The resulting poor navigation smoothness is detrimental to protecting the ship’s power system and to navigation safety.
Proposed algorithm achieves the highest overall performance score.
4.3.4. Superiority across scenario complexity and environmental conditions.
To further illustrate the scalability and adaptability of MAAC-Stackelberg across diverse operational conditions, we analyze the 3D performance distribution in Fig 10, which visualizes the performance scores of three algorithm categories across scenario complexity.
Complexity–environment surface showing hybrid algorithm peak at high-density, high-uncertainty encounters.
Fig 10 shows that the hybrid MAAC-Stackelberg algorithm achieves a peak performance score of 0.92 in complex scenarios. This superiority stems from its integrated architecture, which combines the Stackelberg game, risk attitude inference, and MAAC’s centralized attention mechanism and thereby enables robust decision-making in highly dynamic and uncertain maritime environments.
The LSTM-only algorithm scores 0.78 with limited adaptability, performing moderately in intermediate scenarios but failing to match the hybrid algorithm. This reflects the limitations of single-modality sequence prediction in capturing multi-agent game dynamics.
To further validate these quantitative findings and illustrate the model’s practical behavior in complex maritime environments, we present trajectory visualizations and conduct safety performance verification under realistic multi-ship encounter scenarios.
4.4. Visualization of trajectories and verification of safety performance in realistic multi-ship encounter scenarios
This section constructs an 8-ship encounter scenario based on real AIS data from Chengshantou. Fig 11 shows the differences in collision avoidance trajectories among the algorithms. The solid lines are the original AIS trajectories, the dashed lines are the trajectories output by the algorithms, and the red dots are the points at which collision avoidance decisions are triggered. Key nodes are marked at 0 s, 30 s, 60 s, and 90 s. In the actual navigation data for this scenario, the average minimum safe distance of the 8 ships in Chengshantou waters is 5.3104 nautical miles. Under the MAAC-Stackelberg algorithm, it reaches 5.3477 nautical miles, which is 0.0373 nautical miles (approximately 0.7%) above the real value. The algorithm generates 24 decision points with an average interval of 12 seconds between adjacent trigger points. For the DRL-COLREGs algorithm, the average minimum safe distance is 5.3131 nautical miles. It also produces 24 decision points, but it leads to 2 groups of local trajectory overlaps. The rule-based method yields an average minimum safe distance of 5.3019 nautical miles, which is lower than the real value, and generates only 16 decision points. To further quantify the advantages of MAAC-Stackelberg over the baseline DRL-COLREGs method across different encounter scales, key navigation and safety indicators are compared in both two-ship and multi-ship scenarios. The quantitative improvements are summarized in Table 3.
Performance evaluation of MAAC-Stackelberg, DRL-COLREGs and rule-based method.
4.4.1. Statistical significance test.
To verify that the superior collision avoidance performance of the proposed MAAC-Stackelberg algorithm is not attributable to random fluctuations, two-tailed t-tests were performed on the two core performance metrics, collision rate (CR) and COLREGs compliance rate (CCR), comparing MAAC-Stackelberg with the benchmark DRL-COLREGs algorithm. These tests provide statistical support for the performance differences rather than a purely numerical comparison. The t-statistic for CR was 6.87 (p < 0.01) and the t-statistic for CCR was 7.32 (p < 0.01), so both differences are statistically significant at the 0.01 level.
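A t-statistic of this kind can be computed from per-run metric samples. The sketch below assumes an independent-samples Welch test, since the text does not state whether the test was paired or pooled; the sample values are placeholders chosen for easy arithmetic, not experimental data. The two-tailed p-value would follow from the Student t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    va = variance(sample_a) / len(sample_a)   # squared standard error of mean A
    vb = variance(sample_b) / len(sample_b)   # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Placeholder samples (e.g. one metric value per independent run of each algorithm).
runs_a = [1, 2, 3, 4, 5]
runs_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(runs_a, runs_b)
print(round(t_stat, 4), round(dof, 3))
```

Welch's variant does not assume equal variances between the two algorithms, which is a conservative default when the run-to-run spread of the baselines differs.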
4.5. Discussion
To compare the comprehensive performance of the different algorithms, Table 4 compiles the experimental results of each algorithm on five indicators. The hybrid Bayesian Network-LSTM (BN-LSTM) framework resolves the ambiguity of target ship risk attitudes [30]. Unlike traditional models that assume static risk preferences, the BN module infers dynamic posterior probabilities of aggressive, neutral, and conservative attitudes from AIS-derived features, while the LSTM network captures temporal dependencies to predict short-term motion states at 5 s, 10 s, and 15 s.
The integration of the Stackelberg equilibrium with MAAC addresses the hierarchical decision conflicts inherent in COLREGs. The leader-follower mechanism, dynamically identified from ship priority (ship type [31], length, and DCPA), ensures compliance with the give-way and stand-on provisions, while MAAC’s centralized-training, decentralized-execution architecture captures multi-ship coupled risks. This avoids the local optimality of pairwise interaction models [32]. Despite these advancements, the study has notable limitations. First, the current model assumes stable environmental conditions, ignoring the impact of extreme weather such as strong winds and currents on ship maneuverability and AIS data reliability. Second, the scope is limited to general power-driven ships in open waters, excluding special ship types such as fishing vessels and ships with restricted maneuverability. Third, MAAC’s centralized Critic and the Stackelberg equilibrium solution introduce computational overhead, which may hinder real-time deployment on shipborne embedded systems. Finally, the reliance on historical AIS data and simulated scenarios, without field tests on autonomous surface vessels (ASVs), limits the assessment of practical engineering applicability [33].
5. Conclusion
This study proposes a novel AIS data-driven multi-ship collision avoidance strategy that integrates MAAC, Stackelberg Game, Bayesian Network, and LSTM with rigid embedding of the COLREGs to address challenges in multi-ship navigation scenarios. The BN-LSTM hybrid framework enables dynamic modeling of target ship risk attitudes by fusing prior distributions derived from historical AIS data and predictions of short-term motion states. This framework overcomes the limitation of traditional models that rely on static risk assumptions, enhancing the strategy’s adaptability to the behavioral variability of human-operated ships. The integration of Stackelberg Game and MAAC aligns hierarchical decision-making processes with the priority requirements of COLREGs. The proposed strategy demonstrates strong robustness, maintaining a low collision rate in multi-ship scenarios. In conclusion, this approach integrates theoretical algorithm development with the practical demands of autonomous navigation in real-world scenarios.
References
- 1. Meftah LH, Cherif A, Braham R. Improving autonomous vehicles maneuverability and collision avoidance in adverse weather conditions using generative adversarial networks. IEEE Access. 2024;12:89679–90.
- 2. Li L, Wu D, Huang Y, Yuan Z-M. A path planning strategy unified with a COLREGS collision avoidance function based on deep reinforcement learning and artificial potential field. Appl Ocean Res. 2021;113:102759.
- 3. Kijima K, Furukawa Y. Automatic collision avoidance system using the concept of blocking area. 6th IFAC Conference on Manoeuvring and Control of Marine Craft (MCMC 2003), Girona, Spain, 17-19 September, 1997. IFAC Proceedings Volumes. 2003;36(21):223–8.
- 4. Everett M, Chen YF, How JP. Collision avoidance in pedestrian-rich environments with deep reinforcement learning. IEEE Access. 2021;9:10357–77.
- 5. Li G, Yang Y, Zhang T, Qu X, Cao D, Cheng B, et al. Risk assessment based collision avoidance decision-making for autonomous vehicles in multi-scenarios. Transp Res Part C: Emerg Technol. 2021;122:102820.
- 6. Degrieck A, Uyttersprot B, Sutulo S, Guedes Soares C, Van Hoydonck W, Vantorre M, et al. Hydrodynamic ship–ship and ship–bank interaction: a comparative numerical study. Ocean Eng. 2021;230:108970.
- 7. Niu Y, Zhu F, Wei M, Du Y, Zhai P. A multi-ship collision avoidance algorithm using data-driven multi-agent deep reinforcement learning. J Mar Sci Eng. 2023;11(11):2101.
- 8. Liu J, Zhang J, Yang Z, Zhang M, Tian W. A game-based decision-making method for multi-ship collaborative collision avoidance reflecting risk attitudes in open waters. Ocean Coast Manage. 2024;259:107450.
- 9. Liu P, Xiao F, Wei B, Yu M. Generalized Nash equilibrium seeking for noncooperative games with heterogeneous individual dynamics. IEEE Trans Automat Contr. 2024;69(4):2492–9.
- 10. Tang M, Liao H, Wu X. A Stackelberg game model for large-scale group decision making based on cooperative incentives. Inform Fusion. 2023;96:103–16.
- 11. Wen Liu R, Lu Y, Gao Y, Guo Y, Ren W, Zhu F, et al. Real-time multi-scene visibility enhancement for promoting navigational safety of vessels under complex weather conditions. IEEE Trans Intell Transport Syst. 2024;25(12):19979–94.
- 12. Cheng Z, Chen P, Mou J, Chen L. Novel collision risk measurement method for multi-ship encounters via velocity obstacles and temporal proximity. Ocean Eng. 2024;302:117585.
- 13. Seo C, Noh Y, Abebe M, Kang Y-J, Park S, Kwon C. Ship collision avoidance route planning using CRI-based A* algorithm. Int J Naval Arch Ocean Eng. 2023;15:100551.
- 14. Fujii Y, Tanaka K. Traffic capacity. J Navigation. 1971;24(4):543–52.
- 15. Silveira P, Teixeira AP, Guedes Soares C. A method to extract the Quaternion Ship Domain parameters from AIS data. Ocean Eng. 2022;257:111568.
- 16. Wang M, Wang Y, Cui E, Fu X. A novel multi-ship collision probability estimation method considering data-driven quantification of trajectory uncertainty. Ocean Eng. 2023;272:113825.
- 17. Guo S, Zhang H, Guo Y. Toward multimodal vessel trajectory prediction by modeling the distribution of modes. Ocean Eng. 2023;282:115020.
- 18. Fan C, Wróbel K, Montewka J, Gil M, Wan C, Zhang D. A framework to identify factors influencing navigational risk for Maritime Autonomous Surface Ships. Ocean Eng. 2020;202:107188.
- 19. Ma J, Liu K, Tan C. Finite-time robust containment control for autonomous surface vehicle with input saturation constraint. Ocean Eng. 2022;252:111111.
- 20. Jialin M, Zhongyi Z, Zihao L. Quantitative modeling of ship collision hazards based on complex networks. Navig China. 2025;48(1):26–33, 68.
- 21. Wan Z, Gan L, Zhang L, Shu Y. Multi-ship collision avoidance decision-making based on cooperative game and particle swarm optimization algorithm. Ocean Eng. 2025;341:122537.
- 22. Zhong H, Zhang F, Gu Y. A Stackelberg game based two-stage framework to make decisions of freight rate for container shipping lines in the emerging blockchain-based market. Transp Res Part E: Logist Transp Rev. 2021;149:102303.
- 23. Wang Y, Zhao Y. Multiple ships cooperative navigation and collision avoidance using multi-agent reinforcement learning with communication. Ocean Eng. 2025;320:120244.
- 24. Zhu F, Niu Y, Du Y, Wei M, Zhai P, Wang Z, et al. Multi-ship Autonomous Collision Avoidance Decision-making Algorithm Based on MER-D3QN. In: Proceedings of the 2024 International Conference on Intelligent Driving and Smart Transportation. IDST ’24. New York, NY, USA: Association for Computing Machinery; 2024. pp. 77–83. https://doi.org/10.1145/3704657.3704671
- 25. Ma J, Feng Y, Wang X, Jiang Z. An end-to-end multilingual framework for intelligent analysis of risk influence factors in ship grounding accidents. Reliab Eng Syst Saf. 2026;267:111788.
- 26. Wang W, Huang L, Liu K, Zhou Y, Yuan Z, Xin X, et al. Ship encounter scenario generation for collision avoidance algorithm testing based on AIS data. Ocean Eng. 2024;291:116436.
- 27. Liu RW, Liang M, Nie J, Yuan Y, Xiong Z, Yu H, et al. STMGCN: mobile edge computing-empowered vessel trajectory prediction using spatio-temporal multigraph convolutional network. IEEE Trans Ind Inf. 2022;18(11):7977–87.
- 28. Fiskin R. An advanced decision-making model for determining ship domain size with a combination of MCDM and fuzzy logic. Ocean Eng. 2023;283:114976.
- 29. Gao D, Zhu Y, Guedes Soares C. Uncertainty modelling and dynamic risk assessment for long-sequence AIS trajectory based on multivariate Gaussian Process. Reliab Eng Syst Saf. 2023;230:108963.
- 30. Chun D-H, Roh M-I, Lee H-W, Ha J, Yu D. Deep reinforcement learning-based collision avoidance for an autonomous ship. Ocean Eng. 2021;234:109216.
- 31. Zhang X, Wang C, Jiang L, An L, Yang R. Collision-avoidance navigation systems for Maritime Autonomous Surface Ships: a state of the art survey. Ocean Eng. 2021;235:109380.
- 32. Liu L, Xu Y, Huang Z, Wang H, Wang A. Safe cooperative path following with relative-angle-based collision avoidance for multiple underactuated autonomous surface vehicles. Ocean Eng. 2022;258:111670.
- 33. Montewka J, Goerlandt F, Kujala P. On a systematic perspective on risk for formal safety assessment (FSA). Reliab Eng Syst Saf. 2014;127:77–85.