Abstract
An effective Multi-Agent Path Finding (MAPF) algorithm must efficiently plan paths for multiple agents while adhering to constraints, ensuring safe navigation from start to goal. However, due to partial observability, agents often struggle to determine optimal strategies. Thus, developing a robust information fusion method is crucial for addressing these challenges. Information fusion expands the observation range of each agent, thereby enhancing the overall performance of the MAPF system. This paper explores a fusion approach in both temporal and spatial dimensions based on Graph Attention Networks (GAT). Since MAPF is a long-horizon, continuous task, leveraging historical observation dependencies is key for predicting future actions. Initially, historical observations are fused by incorporating a Gated Recurrent Unit (GRU) with a Convolutional Neural Network (CNN), extracting local observations to form an encoder. Next, GAT is used to enable inter-agent communication, utilizing the stability of scaled dot-product aggregation to merge agents’ information. Finally, the aggregated data is decoded into the agent’s final action strategy, effectively solving the partial observability problem. Experimental results show that, under varying map sizes and agent densities, the proposed method improves accuracy and time efficiency by 24.5% and 47% over GNN, and by 37.5% and 73% over GAT, respectively. Notably, the performance enhancement is more pronounced in larger maps, highlighting the algorithm’s scalability.
Citation: Zhang Q, Wang P, Ni C, Liu X (2025) Graph attention networks based multi-agent path finding via temporal-spatial information aggregation. PLoS One 20(6): e0318981. https://doi.org/10.1371/journal.pone.0318981
Editor: Qionghao Huang, Zhejiang Normal University, CHINA
Received: April 18, 2024; Accepted: January 26, 2025; Published: June 16, 2025
Copyright: © 2025 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
With the rapid advancement of artificial intelligence, multi-agent technology has seen significant progress and is now widely applied in areas such as warehouse logistics [1–3], inspection [4–6], security, and autonomous driving [7–10]. As computational capabilities improve and algorithms are optimized, multi-agent systems—which simulate the cooperation and competition among agents—have become vital for solving complex challenges. In such systems, Multi-Agent Path Finding (MAPF) plays a critical role in enabling autonomous navigation for mobile agents. The primary goal of MAPF is to generate efficient paths for multiple agents within given constraints, ensuring they can safely move from the starting point to the target while avoiding collisions with obstacles or other agents [11–13]. The effectiveness of MAPF directly influences the operational efficiency of multi-agent systems. Although path finding techniques have received much attention, a partial observability problem remains in MAPF. Since each agent can only observe the information within its field of view, agents may become “short-sighted” in complex scenes, which can lead to collisions. To solve this problem, we extend the observable range of each agent and realize information sharing between agents, so as to improve the soundness of decision-making and the reliability of path finding.
In centralized MAPF, a master control unit typically holds complete information about the environment and the controlled agents, uses a planning algorithm to decompose tasks, and then distributes them to each controlled agent, organizing them to complete the tasks. Representative centralized MAPF algorithms can be separated into four categories: search-based algorithms [14,15], conflict-based search algorithms [16,17], cost-growth tree search algorithms [18], and reduction-based algorithms [19]. Centralized MAPF algorithms are a classical class of MAPF algorithms that can achieve good results in terms of the speed and quality of the solution, but their flexibility and adaptability to the environment are poor.
Compared to centralized MAPF, distributed MAPF allows multiple agents to learn concurrently, leading to higher learning efficiency [20], as well as greater flexibility and adaptability to the environment. Currently, most distributed MAPF algorithms are based on reinforcement learning [21–24]. However, when applied to multi-agent path finding, reinforcement learning faces challenges such as complex state-action combinations, slow learning speeds, and sparse rewards. To address these issues, imitation learning [25–27] has been proposed as a solution, using expert algorithms and providing dense rewards to tackle multi-agent path finding problems. Additionally, as agent density increases, effective communication [28,29] mechanisms are needed for agents to share position and path information, enabling coordination in path finding and avoiding conflicts. Many Graph Neural Network (GNN)-based communication methods fail to account for the relative importance of features received from neighboring agents, which affects information fusion. As a result, agents may fail to assess the intentions of neighboring agents, leading to poor decision-making that impacts the Accuracy and Time of path finding.
In this paper, we propose a method for temporal-spatial two-dimensional information fusion built on GAT and the idea of imitation learning. The method utilizes scaled dot-product attention to realize communication between agents, and fuses historical information with the current observations of multiple agents, focusing on solving the partial observability problem of agents, with a view to improving the Accuracy and Time of MAPF under different map sizes and agent densities. The main contributions of this paper are as follows:
- (1). Considering that MAPF is a continuous, long-horizon task, a gated recurrent unit (GRU) is added to the feature extraction network (CNN), so that features of the agent’s historical observations and current observations are extracted and fused, improving the observability of the agent in the temporal dimension;
- (2). GAT is used to establish message communication between multiple agents, and scaled dot-product attention is used to assign importance weights to the features received from neighboring agents, which improves the efficiency of information aggregation and thus the observability of agents in the spatial dimension;
- (3). Experimental results demonstrate that our model can achieve better results in both Accuracy and Time of MAPF, especially on large-scale maps with high agent density.
2. Related work
2.1. Path finding based on historical information
In recent years, many researchers have used Recurrent Neural Networks (RNN) to encode historical trajectories, capturing key patterns in time series data [30,31], especially when historical information needs to be considered and future states need to be predicted. By modeling previous paths, motion history, and environmental changes, RNNs can help agents predict the optimal paths to avoid obstacles or adapt to different working scenarios. For example, Wang et al. [32] used Long Short-Term Memory (LSTM) networks to manage local path trajectories, focusing on extracting historical trajectories to reduce path duration and length. El-Ela et al. [33] combined historical and real-time data with LSTM to predict the next optimal path segment using a neural network model. Compared to LSTM, the Gated Recurrent Unit (GRU) structure is simpler, faster in computation, requires fewer parameters, and is more suitable for resource-constrained or real-time applications. Feng et al. [34], Huang et al. [35] and Choi et al. [36] all applied GRU to the multi-agent path finding problem. To capture more comprehensive entity information from neighboring entities and relationships, Tiwari et al. [37] proposed a GRU-based method that takes into account the memory of relationships in the path. Ling-Xiao et al. [38] proposed a GRU-GAT framework aimed at solving the issue of preserving neighbor information and historical trajectory information.
2.2. Communication-based path finding
In multi-agent path finding, communication allows for the sharing of position information, task status, and environmental data, which helps avoid path conflicts, optimize overall path finding, and enables real-time adjustments in dynamic environments [28,29]. Li et al. [39] used Convolutional Neural Networks (CNN) to extract sufficient features from the local observations of each agent and employed Graph Neural Networks (GNN) [40,41] to transmit these features between agents. They then trained the model using imitation learning based on expert algorithms. However, this approach does not consider the relative importance of the features received from neighboring agents before making decisions. Building on this, Li et al. proposed the Message-aware Graph Attention Network (MAGAT) [42] in 2021, which uses a dot-product attention method to determine the importance of features received from each neighboring agent. They further incorporated multi-head attention mechanisms and a bottleneck structure into the MAGAT model to test the communication performance of MAGAT-P and MAGAT-B models, but no significant improvements were observed. Additionally, Lin et al. [43] introduced the SACHA method, which utilizes communication between agents to facilitate information exchange, enabling path finding in crowded environments. Wang et al. [44] also proposed SCRIMP, which relies on an improved Transformer for local communication, assisting in generating independent and conflict-free paths between agents.
3. SDPGAT-G infrastructure
Our model proposed in this paper contains three main parts: extracting time dimension features using GRU-CNN, aggregating spatial dimension information using SDPGAT-G, and decoding action strategies using MLP. The model framework is shown in Fig 1.
3.1. Formulation of the MAPF problem
Modeling MAPF can transform MAPF into a sequential decision problem. Each agent needs to take an immediate action at time t with three constraints: (1) to reach the goal point from the start point quickly; (2) to ensure that the chosen path is optimal; and (3) to avoid collisions between the agents as much as possible [45].
Grid map: First, a grid map M of the two-dimensional world is generated, as shown in Fig 2, with width W and height H, respectively. M contains a set of static obstacles S ⊂ M. Let A = {a_1, a_2, …, a_N} be a set of N agents, each agent a_i having an independent start point s_i and a goal point g_i.
Observation range: In the grid map M, agent i has a local field of view defined by a radius r_FOV (the purple range in Fig 2), of size (2·r_FOV + 1) × (2·r_FOV + 1); beyond this range the agent observes nothing. The agent is located at the center of the local field of view, it is not aware of its global position, and the map perceived by agent i is defined as M_i^t. To simplify the learning task of MAPF, within the finite field of view we separate the available information into different channels. Specifically, the feature tensor Z_i^t observed by agent i at time t consists of binary-valued matrices representing the positions of obstacles, the target point, the position of the agent itself (if it is within the field of view), and the positions of other observable agents, respectively, as shown in Fig 2.
Purple ranges indicate observation ranges, blue ranges indicate communication ranges, black squares indicate obstacles, colored squares indicate the agents, and red squares indicate target points.
When an agent is close to the edge of the map M, obstacles are added to all positions outside the observation boundary. When the agent’s target point is not within its field of view, the target point is projected onto the boundary of the observation range, so the agent only perceives the direction of the target.
Communication: Each agent can only communicate with neighboring agents within its communication range. Each agent has a communication radius r_COMM (the blue range shown in Fig 2); beyond this range, communication between agents cannot be carried out. Assuming that such communication is formalized as a dynamic distance-based communication network, the graph G_t = (V, E_t, W_t) is defined as the communication network of the agents at time t, where V is the set of vertices, i.e., agents, in the graph, E_t is the set of edges between vertices, and W_t is the function that assigns weights to the edges. Since the graph is distance-based, agents v_i and v_j, with coordinates p_i = (x_i, y_i) and p_j = (x_j, y_j), can communicate with each other at time t when ‖p_i − p_j‖ ≤ r_COMM.
Conversely, when ‖p_i − p_j‖ > r_COMM, v_i and v_j cannot communicate. By representing the connectivity between vertices v_i and v_j as edges e_ij ∈ E_t, the information between the nodes of the whole graph G_t can be stored as a two-dimensional array, represented by the adjacency matrix A_t, as expressed in Eq. (1):

A_t(i, j) = 1 if ‖p_i − p_j‖ ≤ r_COMM, and A_t(i, j) = 0 otherwise. (1)

The corresponding edge weight w_ij = W_t(e_ij) thus carries two meanings: whether the agents can communicate, and the importance of the communication between them.
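The distance-based adjacency construction of Eq. (1) can be sketched as follows. This is a minimal NumPy illustration; the function and variable names are ours, not the paper's.

```python
import numpy as np

def adjacency_matrix(positions, r_comm):
    """Build the distance-based adjacency matrix A_t of Eq. (1).

    positions: (N, 2) array of agent grid coordinates at time t.
    r_comm:    communication radius; agents farther apart cannot talk.
    Returns an (N, N) 0/1 matrix with zero diagonal.
    """
    positions = np.asarray(positions, dtype=float)
    # Pairwise Euclidean distances between all agents (broadcasting).
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist <= r_comm).astype(int)
    np.fill_diagonal(adj, 0)  # an agent does not communicate with itself
    return adj

# Three agents; only the first two are within r_comm = 3 of each other.
A = adjacency_matrix([(0, 0), (2, 0), (10, 10)], r_comm=3)
```

The matrix is symmetric by construction, matching the undirected, distance-based communication graph described above.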
3.2. Temporal-dimensional information extraction by GRU-CNN
MAPF is a continuous, long-horizon task, so using the agent’s historical observations to capture long-term temporal features is an important consideration. CNN [46] is a commonly used model for feature extraction; through filters and convolutions it successfully captures the dependencies among observed feature values, building tighter connections between the observed features, which we apply to MAPF. The main goal of the CNN is to aggregate the information of all neighboring pixels to highlight the most important ones. However, the CNN only utilizes the agent’s state at the current moment when extracting features, and does not exploit whether, or how much, the agent’s state at previous moments influences its current state. This incomplete grasp of information may cause the agent to collide or generate suboptimal paths during path finding. Therefore, utilizing historical information [47–49] over multiple time steps can help the agent better understand the dynamics of the environment and the actions of other agents, in order to optimize the overall path.
GRU [50] is a variant of the recurrent neural network, similar to LSTM [51] (Long Short-Term Memory), which was proposed to solve the long-term memory problem. By merging the forget and input gates of the LSTM into a single update gate, and by mixing the cell state with the hidden state, the GRU effectively reduces the number of parameters of the LSTM units and shortens the model’s training time. In this paper, the GRU is utilized to capture long-term dependencies in the information observed by the agent; its gates effectively control the flow of information and “memorize” historical information that can be used to predict future movement trends.
In the MAPF model, it is assumed that the feature vector extracted by each agent at time t has size F. The observations of each agent together form the observation matrix X_t of all agents, as shown in Eq. (2). The feature captured by the CNN is defined as a linear combination of the observation matrix of the current agent at moment t and the matrices of the neighboring nodes in the graph G_t. The value of the feature at node v_i at moment t, after the graph-convolution operation, is shown in Eq. (3).
Where the result denotes a linear combination of the current node’s observations and those of its neighbors, and the filter coefficients form a set of F × G matrices combining different observations, with F and G denoting the dimensions of the input and output layers of the graph convolution. In addition, the aggregation is computed through K communication exchanges with 1-hop neighbors. The CNN module is composed of a cascade of L graph-convolution layers, each as shown in Eq. (4), followed by an activation function σ. Here σ acts on each layer of the graph convolution, fusing the output features of the previous layer l − 1 as input to the current layer l and computing the current aggregated information.
The input-output structure of the GRU is similar to that of an ordinary RNN [52]. The feature input x_t from the previous l layers at the current moment t and the hidden state h_{t−1} from the previous moment, kept in memory, are each spliced and multiplied by their weight vectors; dot-product operations then yield the gating signals: the reset gate r_t and the update gate z_t. The reset data is r_t ⊙ h_{t−1}; this reset data is spliced with x_t, and the tanh activation function is then used to obtain the candidate state h̃_t, as shown in Eq. (5). h̃_t contains the current input data x_t, with the current hidden state added purposefully, which is equivalent to memorizing the current moment. The updated h_t is the sum of the “forgotten” and “memorized” parts, as shown in Eq. (6). Where (1 − z_t) ⊙ h_{t−1} represents the selective forgetting of some unimportant information in the original hidden state, and z_t ⊙ h̃_t represents the selective “remembering” of some information from the current candidate state.
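The gating computations of Eqs. (5) and (6) can be sketched numerically as follows. This is a minimal NumPy GRU step; the weight names and shapes are illustrative assumptions, and bias terms are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, W_r, W_z, W_h):
    """One GRU step following Eqs. (5)-(6).

    x_t and h_prev are concatenated ("spliced") before each gate, so every
    weight matrix has shape (hidden, features + hidden).
    """
    xh = np.concatenate([x_t, h_prev])
    r_t = sigmoid(W_r @ xh)                          # reset gate
    z_t = sigmoid(W_z @ xh)                          # update gate
    # Candidate state: reset data r_t * h_prev spliced with x_t, Eq. (5).
    h_cand = np.tanh(W_h @ np.concatenate([x_t, r_t * h_prev]))
    # "Forgotten" part + "memorized" part, Eq. (6).
    h_t = (1.0 - z_t) * h_prev + z_t * h_cand
    return h_t

rng = np.random.default_rng(0)
F, H = 4, 3                                          # toy feature / hidden sizes
h = gru_cell(rng.normal(size=F), np.zeros(H),
             rng.normal(size=(H, F + H)),
             rng.normal(size=(H, F + H)),
             rng.normal(size=(H, F + H)))
```

Because the update gate interpolates between h_{t−1} and the bounded tanh candidate, the hidden state stays in (−1, 1), which matches the stability argument above.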
It is this memory role of the GRU that enables the agent to utilize its historical observation information and combine it with the current observation information, coordinating the agent with a longer view of the whole situation. Therefore, in this paper GRU and CNN are fused into one network structure, GRU-CNN, which comprehensively utilizes historical information rather than only the current information; the agent communicates with other agents through the communication network G_t at moment t, ensuring that it can move to the destination in the shortest possible time while avoiding collisions with other agents. Using the GRU-CNN network for information extraction, Eq. (3) is changed to Eq. (7).
Where H_t is the history information matrix at the agent’s current moment t, representing the sum of the agent’s memorized history information over the previous moments; it is calculated as shown in Eq. (8), where Φ is the mapping relationship between the current information and the history information. The new features extracted by GRU-CNN are cascaded into the new output feature tensor, and the captured information is used for the aggregation of GAT information; the structure of the feature-extraction network is shown in Fig 3.
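The temporal fusion of GRU-CNN described above can be sketched as follows. This is a simplified stand-in: the "CNN" is a single projection over the flattened observation and the recurrence uses only an update gate, so it illustrates the idea of Eqs. (7)-(8) rather than reproducing the paper's network.

```python
import numpy as np

def cnn_features(obs, W_c):
    """Stand-in for the CNN encoder: flatten the local observation tensor
    and project it to an F-dimensional feature (a real model would use
    convolutions over the observation channels)."""
    return np.tanh(W_c @ obs.ravel())

def encode_history(observations, W_c, W_z):
    """Fuse a window of past observations in time: a gated running state
    'memorizes' history and is combined with each new CNN feature.
    Weights and gating are illustrative simplifications of GRU-CNN."""
    F = W_c.shape[0]
    h = np.zeros(F)                       # accumulated history information
    for obs in observations:              # oldest observation ... current one
        f = cnn_features(obs, W_c)
        z = 1.0 / (1.0 + np.exp(-(W_z @ np.concatenate([f, h]))))
        h = (1.0 - z) * h + z * f         # GRU-style update gate
    return h                              # feature tensor passed on to the GAT

rng = np.random.default_rng(1)
K, fov, C, F = 4, 5, 3, 8                 # history length, field of view, channels, features
obs_seq = rng.normal(size=(K, C, fov, fov))
W_c = rng.normal(size=(F, C * fov * fov)) * 0.1
W_z = rng.normal(size=(F, 2 * F))
feat = encode_history(obs_seq, W_c, W_z)
```

The returned vector plays the role of the cascaded output feature tensor that the next section feeds into the attention-based aggregation.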
3.3. Spatial-dimensional information aggregation and update by SDPGAT-G
GRU-CNN performs feature extraction on the input tensor to obtain the feature tensor. Then, in the spatial dimension, GAT is utilized to establish a communication mechanism among multiple agents: the feature tensor is passed into the GAT, and the observability of the agents is improved by aggregating the information of neighboring agents. However, as the number of agents increases and the dimensionality of the feature tensor grows, the dot-product attention used by GAT for information aggregation may produce a biased distribution of attention weights, causing the model to focus on some specific dimensions and reducing the efficiency of MAPF. Compared with plain dot-product attention, scaled dot-product attention scales the feature tensor during the dot-product computation, keeping the dot-product values within a smaller range and thus ensuring that the attention weights maintain a uniform distribution across dimensions. Especially for large-scale datasets or high-dimensional data, scaled dot-product attention usually offers higher computational efficiency and stability.
3.3.1. Information aggregation by SDPGAT-G.
Inspired by the message network MAGAT [42], this paper uses a scaled dot-product attention mechanism as the information aggregation model, such that the edge weights between nodes are determined by the relative importance of the node’s features, which allows the agent to aggregate the information features received from its neighbors with a selective focus. Formally, inspired by the literature [53], a SDPGAT-G model is defined, as shown in Fig 4.
The red circle is the current agent, the blue circles are the current agent’s 1-hop neighbors, and the green circles are the blue circles’ 1-hop neighbors (and also the red circle’s neighbors within 2 hops).
Assume the input feature tensor of the GNN is the set of node features {x_1, x_2, …, x_N}, where N is the number of nodes and F is the number of features per node, so the tensor is a matrix of size N × F. In order to compute the attention mechanism, find the assigned weights, and obtain the corresponding input-to-output transformation, it is necessary to consider a trainable matrix W of size F × F′ containing the weights of all nodes, where F and F′ are the numbers of input and output features of the matrix W, respectively.
The scaled dot-product model is chosen for each vertex to compute the attention score: e_ij is the correlation score between each agent v_i and its neighbor agent v_j, so that each agent gets a correlation score of the same dimension. Let a(x, y) be the function calculating the importance between x and y, which represents the importance of the influence of the information of neighbor agent v_j on the decision making of agent v_i, and let d_k be the dimension of the input feature tensor values; then e_ij is calculated as shown in Eq. (9). The reason for dividing by √d_k is to limit the value of the attention score to an appropriate range, which facilitates model optimization, improves the stability of the network during training, and ensures that the weights are uniformly distributed across all dimensions.
After that, masked attention is used to allocate attention only over the node set N_i of agent v_i, which is the set of all its neighboring nodes. The attention score α_ij is obtained by normalizing over the neighboring nodes with the Softmax operation, as shown in Eq. (10). For Eq. (10), the larger the order of magnitude of the attention score e_ij, the larger the input to the Softmax function; when normalized, the result will be very close to 1, and Softmax will assign almost all the weight to the vertex corresponding to the maximum value, resulting in a biased weight assignment. To solve this problem, we divide the similarity score by the square root of the dimension d_k of the feature tensor values. In this way, the inputs to Softmax have relatively small gaps before normalization, ensuring the stability of the attention-weight allocation and making it more conducive to the next step, message passing and updating within the communication hop count K.
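The masked, scaled dot-product scoring of Eqs. (9)-(10) can be sketched as follows. This is a minimal NumPy version; the shared projection W and the function names are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(X, W, adj):
    """Masked scaled dot-product attention weights, Eqs. (9)-(10).

    X:   (N, F) node features.
    W:   (F, Fp) shared trainable projection.
    adj: (N, N) 0/1 adjacency matrix (1 = j is a neighbor of i).
    Returns (N, N) weights alpha; each row sums to 1 over the neighborhood.
    """
    H = X @ W                                        # project node features
    d_k = H.shape[1]
    scores = (H @ H.T) / np.sqrt(d_k)                # Eq. (9): scaled dot product
    scores = np.where(adj > 0, scores, -np.inf)      # masked attention
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability shift
    e = np.exp(scores)
    alpha = e / e.sum(axis=1, keepdims=True)         # Eq. (10): softmax over neighbors
    return alpha

rng = np.random.default_rng(2)
X = rng.normal(size=(4, 6))
W = rng.normal(size=(6, 6))
adj = np.ones((4, 4))                                # fully connected, incl. self-loops
alpha = scaled_dot_product_attention(X, W, adj)
```

Dividing by √d_k before the softmax is exactly what keeps the row-wise weights from collapsing onto a single neighbor when the feature dimension is large.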
3.3.2. Message delivery and update by SDPGAT-G.
The message-delivery update between multiple agents is synchronized with the process of each agent aggregating its neighbors’ information. In this paper, we use h_i to denote the feature tensor of a vertex at a given communication hop, and e_ij denotes the attributes of the edge between vertex v_i and vertex v_j. First, the information of the vertex set N_i adjacent to vertex v_i is aggregated to vertex v_i, as shown in Eq. (11). Where ϕ is the differentiable function used to aggregate the neighbor information. The obtained neighbor information is then nonlinearly combined with the attention score matrix using the λ function, as shown in Eq. (12). Finally, the updated information of vertex v_i is obtained by passing the vertex’s own information together with the aggregated neighbor information through a nonlinear transformation γ, as shown in Eq. (13). The vertex (i.e., agent a_i) synchronously completes message delivery and aggregation with its K-hop neighbors when updating its own information, and the weight parameters of the GAT enable it to aggregate the more important information more efficiently.
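One round of the aggregate-and-update step in Eqs. (11)-(13) can be sketched as follows. This is a simplified illustration: the paper's ϕ, λ, and γ are learned, while here aggregation is an attention-weighted sum and the update is a fixed tanh combination.

```python
import numpy as np

def gat_update(X, alpha, W):
    """One message-passing round following Eqs. (11)-(13).

    X:     (N, F) node features.
    alpha: (N, N) attention weights over neighbors (rows sum to 1).
    W:     (F, Fp) shared projection.
    """
    H = X @ W                        # shared projection of node features
    msg = alpha @ H                  # Eqs. (11)/(12): attention-weighted aggregation
    return np.tanh(H + msg)          # Eq. (13): combine own and neighbor information

rng = np.random.default_rng(3)
N, F, Fp = 5, 4, 4
X = rng.normal(size=(N, F))
W = rng.normal(size=(F, Fp))
alpha = np.full((N, N), 1.0 / N)     # uniform attention, for illustration only
H_new = gat_update(X, alpha, W)
```

Running this update K times corresponds to the K-hop communication described above: each round lets information travel one hop further through the graph.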
3.4. Aggregated information decoding by MLP
In the model of MAPF, the path finding problem is abstracted into a sequential decision problem, where at each time t the problem each agent must solve is how to reach its destination. The goal of this work is for agent i, at time t, to learn a mapping that determines an appropriate action based on the agent’s observed information and the information of the communication network G_t. The MLP plays the role of this mapping.
MLP [54] (Multilayer Perceptron), a multilayer fully connected neural network, is widely used in many predictive classification problems. In this paper, by learning the features extracted by the CNN and fused by the GAT, the MLP employs a probability-distributed stochastic action strategy to decode predictions over all possible actions for each agent i, and then synthesizes a current optimal action, which determines a collision-free path from the starting point to the goal point.
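The decoding step can be sketched as follows. This is a two-layer MLP illustration; the action set, layer sizes, and the greedy final decode are assumptions for the example, not the paper's exact configuration.

```python
import numpy as np

# A typical 4-connected grid-MAPF action set (assumed for illustration).
ACTIONS = ["up", "down", "left", "right", "stay"]

def mlp_policy(feature, W1, b1, W2, b2):
    """Decode the aggregated feature into a probability distribution over
    actions using a small fully connected network."""
    hidden = np.maximum(0.0, W1 @ feature + b1)      # ReLU hidden layer
    logits = W2 @ hidden + b2
    e = np.exp(logits - logits.max())                # stable softmax
    return e / e.sum()                               # action probabilities

rng = np.random.default_rng(4)
F, H = 8, 16                                         # feature and hidden sizes
probs = mlp_policy(rng.normal(size=F),
                   rng.normal(size=(H, F)), np.zeros(H),
                   rng.normal(size=(len(ACTIONS), H)), np.zeros(len(ACTIONS)))
action = ACTIONS[int(np.argmax(probs))]              # one possible final decode
```

During training the distribution itself is used (a stochastic strategy); at execution time one action per step is selected from it.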
4. Experiments
4.1. Experiment target
The algorithm proposed in this paper is trained and tested on randomly generated map sets and compared with GNN [39], GAT [55], MAGAT [42], MAGAT-P4, MAGAT-P4-B, RL-RVO [30], and ERL-MAPF [31] to evaluate its effectiveness under different map sizes and agent densities. GNN is a graph neural network model that enables communication between multiple agents, but it does not weigh the relative importance of features received from neighboring agents and is prone to collisions as the number of agents increases. GAT is a graph attention network that performs a weighted summation over individual nodes, but its attention is not stable enough in this form. MAGAT, also a graph attention network model, uses dot-product attention to assign weights, but this calculation leads to biased weights. MAGAT-P4 and MAGAT-P4-B are both variants of MAGAT with better performance. RL-RVO is a path finding algorithm that uses a GRU for feature extraction and action computation in continuous space, and ERL-MAPF is an evolutionary reinforcement learning MAPF method based on a GRU.
4.2. Experimental setup
The CPU used for our experiments is an Intel(R) Core(TM) i7-10700, the GPU is an NVIDIA Corporation Device 2482, and the operating system is Ubuntu 20.04. The experimental environment was implemented with TensorFlow 2.2.0. The specific experimental parameter settings are shown in Table 1.
4.3. Data preparation
We prepared the test map sets shown in Table 2 with randomly generated obstacles in the maps, each type of map set contains 20000 randomly generated maps. Then we train the model of this paper using Map 1 and test it on all map sets, and the resultant data is the average of multiple tests. Agent density is calculated as shown in Eq. (14).
Where W and H are the width and height of the map and N is the number of agents.
4.4. Metrics
(1) Accuracy: Accuracy indicates the ratio of the number of tests in which the agents reach the target point to the total number of tests, as shown in Eq. (15). Here n_success is the number of tests in which the agents successfully reach the target point from the start point within a certain number of steps and time, and n_total is the total number of tests. This “success rate” can be used to measure the success, or accuracy, of a MAPF task.
(2) MakeSpan: MakeSpan is the ratio of the time required for all agents to move from the start point to the goal point to the expected time, as shown in Eq. (16). Where T_actual is the actual time for all agents to move from the start point to the goal point and T_expected is the expected time for all agents to move from the start point to the goal point.
(3) FlowTime: FlowTime is the ratio of the difference between the actual path length and the expected path length to the expected path length, as shown in Eq. (17). Where L_actual is the actual executed path length and L_expected is the expected path length.
(4) FailedReachGoal: Complementary to Accuracy, FailedReachGoal is the ratio of the number of tests in which the agents did not successfully reach the target point to the total number of tests, as shown in Eq. (18). Where n_fail is the number of tests in which the agents did not reach the target point from the start point within a certain number of steps and time.
(5) Time: the time from the beginning to the completion of agent path finding, i.e., the time taken by all agents to reach the goal point from the start point, which is used to measure the efficiency of MAPF. “Time” is an important index to measure the time efficiency of MAPF task.
In the experimental process, this paper chooses the most representative Accuracy and Time as the evaluation indexes of the model, and analyzes and compares different MAPF models on different maps.
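The four ratio metrics of Eqs. (15)-(18) can be computed together as follows. The argument names are illustrative, and FlowTime is assumed to be normalized by the expected path length, as the "ratio of the difference" phrasing suggests.

```python
def mapf_metrics(n_success, n_total, t_actual, t_expected,
                 len_actual, len_expected):
    """Compute the evaluation metrics of Eqs. (15)-(18) from per-run
    test counts and aggregate path statistics."""
    return {
        "Accuracy": n_success / n_total,                          # Eq. (15)
        "MakeSpan": t_actual / t_expected,                        # Eq. (16)
        "FlowTime": (len_actual - len_expected) / len_expected,   # Eq. (17)
        "FailedReachGoal": (n_total - n_success) / n_total,       # Eq. (18)
    }

# Example: 79 of 100 tests succeed; paths take 20% longer than expected
# in time and 10% longer in length.
m = mapf_metrics(n_success=79, n_total=100,
                 t_actual=120.0, t_expected=100.0,
                 len_actual=550.0, len_expected=500.0)
```

Accuracy and FailedReachGoal always sum to 1, which is why the paper treats them as complementary views of the same outcome.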
4.5. Experiments and analysis
4.5.1. Ablation study and analysis.
For convenience of description, the models proposed in this paper are denoted MAGAT-G, SDPGAT, and SDPGAT-G. Among them, MAGAT-G adds the GRU-CNN network proposed in this paper to MAGAT for extracting features from the agents’ observed information; SDPGAT replaces MAGAT’s attention mechanism with the scaled dot-product attention proposed in this paper; and SDPGAT-G adds the GRU-CNN feature-extraction network on top of SDPGAT.
The test is performed on Maps 1–5, and the results are shown in Table 3. From Table 3, it can be seen that the Accuracy of the models with GRU is higher than that of the baseline models. This is because the GRU provides the agent with additional history information, enabling it to optimize its strategy under partial observability by learning from history and neighbor information when deciding on an action, thereby improving the Accuracy of path finding. This effect is not obvious on Map 1; instead, the advantage is more apparent on Maps 2 and 3, especially Map 3, where path finding Accuracy improves by 31.17% compared with the MAGAT model. It is not difficult to see that the effect of introducing the GRU becomes more obvious as the map size increases. When the map is small and simple, history information has little effect on long-horizon decisions; as the map size grows, however, the influence of historical information gradually increases, and combining historical and current information better guides the agent toward more reasonable actions, thus improving the Accuracy of path finding.
However, the Accuracy cannot fully measure the quality of the model, as it only focuses on whether the agent achieves the goal within the given time, ignoring the time cost of execution. Therefore, it is also necessary to use Time to measure the model’s pathfinding time. As shown in Table 3, the SDPGAT-G and SDPGAT models have shorter running times than the MAGAT-G model. This is because the scaled dot-product attention uses a scaling factor to adjust the scale of the dot-product attention scores. The purpose of this is to reduce the bias of the weight values when the dimensionality is large. Due to the more stable numerical properties of scaled dot-product attention, the calculation of the attention mechanism is more scientifically sound, thereby saving multi-agent pathfinding time and improving pathfinding efficiency. Especially on Map 2 and Map 3, the proposed model shows great potential, with pathfinding times reduced by 24.90% and 20.03%, respectively, compared to the MAGAT model. This proves that the proposed model is more time-efficient than other models.
4.5.2. Comparison to state-of-the-art.
Similarly, tests were conducted on Maps 1–5, with results shown in Tables 4–8. The proposed SDPGAT-G model outperforms MAGAT in both pathfinding Accuracy and Time, and also surpasses other improved versions of MAGAT, such as MAGAT-P4 and MAGAT-B, as well as the GRU-related RL-RVO and ERL-MAPF algorithms. On Map 1, the Accuracy of our model is comparable to that of the seven baselines. However, as map size and obstacle count increase, the Accuracy of all models declines to varying degrees, especially for GNN and GAT. This is likely because GNN and GAT encounter performance bottlenecks in large-scale, dense environments: as obstacles and map size grow, graph density and computational complexity grow rapidly, which affects model stability. In contrast, the Accuracy of our model declines much more slowly than that of the seven baselines. In particular, on Map 2 (Table 5), the Accuracy of our model improves to varying extents while MakeSpan, FlowTime, and Time are all reduced. The trend is even more pronounced on Map 3 (Table 6), where Accuracy reaches 79%. MAGAT, MAGAT-P4, and MAGAT-B perform relatively steadily, but their Accuracy drops significantly on larger and more crowded maps, possibly because these models do not fully exploit historical information when optimizing pathfinding decisions. RL-RVO and ERL-MAPF perform relatively well, but still fall short of our model.
4.5.3. Trends in different map sizes and densities.
As the map size increases, as shown in Fig 5, the performance of the various algorithms on each metric changes significantly. For Accuracy, most algorithms maintain a high success rate on small maps (e.g., Map 1 and Map 2). However, on larger maps (e.g., Map 3 and Map 5), the Accuracy of GAT and GNN declines significantly, possibly because they fail to fully leverage historical information during pathfinding, leading to more path conflicts or failures. In contrast, the MAGAT-G and SDPGAT series maintain stable Accuracy even on larger maps, demonstrating better scalability in pathfinding.
Regarding Time and MakeSpan, the impact of large maps on the algorithms is particularly notable. As map size increases, all algorithms show some increase in Time, but the growth of SDPGAT-G and RL-RVO is relatively moderate, suggesting that their runtime scales better on larger maps. Additionally, large maps are often associated with higher FlowTime, and some algorithms, owing to low pathfinding efficiency, fail to optimize their paths effectively.
From the perspective of maps with varying densities, as shown in Fig 6, FailedReachGoal serves as a critical indicator. In low-density maps, all algorithms can complete their target tasks effectively with fewer conflicts. However, as the map density increases (e.g., higher obstacle ratios or more agents), the proportion of unfinished goals for GAT and GNN rises significantly. In contrast, superior algorithms, such as the SDPGAT series and MAGAT-G, can still maintain a low failure rate in high-density maps.
In terms of MakeSpan and Time, increasing density significantly raises the complexity of pathfinding, causing most algorithms to incur substantial increases in runtime. This is particularly evident for GAT and GNN, which show the largest growth in time overhead and path length, reflecting the computational bottleneck of such methods in complex scenarios. Conversely, algorithms such as SDPGAT-G and ERL-MAPF balance conflict avoidance and pathfinding efficiency better: even in high-density maps, they maintain shorter total path lengths and lower time costs, making them better suited to multi-agent pathfinding in complex scenarios.
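For readers unfamiliar with the metrics compared above, the following hypothetical helper shows one common way to compute them from per-agent arrival times; the exact definitions used in the paper's evaluation may differ.

```python
def mapf_metrics(arrival_steps, time_limit):
    """Compute common MAPF evaluation metrics from per-agent arrival steps
    (None = goal never reached within the episode). Hypothetical helper,
    not the paper's evaluation code."""
    reached = [t for t in arrival_steps if t is not None and t <= time_limit]
    failed = len(arrival_steps) - len(reached)
    accuracy = len(reached) / len(arrival_steps)        # fraction of agents reaching goal in time
    makespan = max(reached) if reached else time_limit  # step at which the last successful agent arrives
    flowtime = sum(reached) + failed * time_limit       # total steps, charging the limit for failures
    return {"Accuracy": accuracy, "MakeSpan": makespan,
            "FlowTime": flowtime, "FailedReachGoal": failed}

# Four agents: three arrive at steps 12, 15, and 9; one never reaches its goal.
m = mapf_metrics([12, 15, None, 9], time_limit=50)
print(m)
```

Under these conventions, higher density raises MakeSpan and FlowTime through extra conflict-avoidance detours, and FailedReachGoal counts the agents that time out entirely.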
4.6. Summary
In summary, the SDPGAT and SDPGAT-G models proposed in this paper maintain both pathfinding Accuracy and Time on maps of different sizes, and outperform the baseline model in most cases. This indicates that the SDPGAT family has strong potential in generalization and in handling congestion to avoid collisions, and confirms that the proposed model better balances the competing demands of pathfinding, ensuring that more agents complete their tasks in a shorter period of time.
5. Conclusion
In this paper, we fuse the observation information of agents in both temporal and spatial dimensions and propose a scalable and transferable MAPF model, SDPGAT-G. The model uses a GRU to record historical information when constructing local observation encoders, and uses the SDPGAT mechanism for inter-agent communication to improve MAPF performance. In most cases, all evaluation metrics outperform the baseline model. Although our model performs well in the current experiments, several aspects still need improvement: for example, latency and redundancy in the communication process may significantly affect the performance and reliability of the model. These issues concern not only real-time performance and accuracy but may also limit applicability in dynamic and complex scenarios. In addition, the uncertainty of dynamic environments, diverse scenarios, and complex interactions among agents may challenge the scalability of the model. In future research, we will therefore focus on these problems, in particular optimizing communication redundancy, designing more efficient communication topologies and data-processing mechanisms to enhance overall system performance, and verifying the scalability of the model in diverse scenarios.
Acknowledgments
I would like to express my heartfelt gratitude to everyone who has supported me in the completion of this research paper. Special thanks to my advisor for their guidance and expertise. I am also thankful to my colleagues for their valuable insights and encouragement. Additionally, I appreciate the support from my family and friends, whose unwavering belief in me kept me motivated. Thank you all for your contributions; I am truly grateful for your help and support.