Abstract
An effective Multi-Agent Path Finding (MAPF) algorithm must efficiently plan paths for multiple agents while adhering to constraints, ensuring safe navigation from start to goal. However, due to partial observability, agents often struggle to determine optimal strategies. Thus, developing a robust information fusion method is crucial for addressing these challenges. Information fusion expands the observation range of each agent, thereby enhancing the overall performance of the MAPF system. This paper explores a fusion approach in both temporal and spatial dimensions based on Graph Attention Networks (GAT). Since MAPF is a long-horizon, continuous task, leveraging historical observation dependencies is key for predicting future actions. Initially, historical observations are fused by incorporating a Gated Recurrent Unit (GRU) with a Convolutional Neural Network (CNN), extracting local observations to form an encoder. Next, GAT is used to enable inter-agent communication, utilizing the stability of scaled dot-product aggregation to merge agents’ information. Finally, the aggregated data is decoded into the agent’s final action strategy, effectively solving the partial observability problem. Experimental results show that, under varying map sizes and agent densities, the proposed method improves accuracy and time efficiency by 24.5% and 47% over GNN, and by 37.5% and 73% over GAT, respectively. Notably, the performance enhancement is more pronounced in larger maps, highlighting the algorithm’s scalability.
Citation: Zhang Q, Wang P, Ni C, Liu X (2025) Graph attention networks based multi-agent path finding via temporal-spatial information aggregation. PLoS One 20(6): e0318981. https://doi.org/10.1371/journal.pone.0318981
Editor: Qionghao Huang, Zhejiang Normal University, CHINA
Received: April 18, 2024; Accepted: January 26, 2025; Published: June 16, 2025
Copyright: © 2025 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
With the rapid advancement of artificial intelligence, multi-agent technology has seen significant progress and is now widely applied in areas such as warehouse logistics [1–3], inspection [4–6], security, and autonomous driving [7–10]. As computational capabilities improve and algorithms are optimized, multi-agent systems—which simulate the cooperation and competition among agents—have become vital for solving complex challenges. In such systems, Multi-Agent Path Finding (MAPF) plays a critical role in enabling autonomous navigation for mobile agents. The primary goal of MAPF is to generate efficient paths for multiple agents within given constraints, ensuring they can safely move from the starting point to the target while avoiding collisions with obstacles or other agents [11–13]. The effectiveness of MAPF directly influences the operational efficiency of multi-agent systems. Although path finding techniques have received much attention, a partial observability problem remains in MAPF. Since each agent can only observe the information within its field of view, agents may become “short-sighted” in complex scenes, which can lead to collisions. To solve this problem, we extend the observable range of each agent and realize information sharing between agents, so as to improve the soundness of decision-making and the reliability of path finding.
In centralized MAPF, a master control unit typically holds complete information about the environment and the controlled agents, uses a planning algorithm to decompose tasks, and then distributes them to each controlled agent, organizing them to complete the tasks. Representative centralized MAPF algorithms can be separated into four categories: search-based algorithms [14,15], conflict-based search algorithms [16,17], cost-growth tree search algorithms [18], and reduction-based algorithms [19]. Centralized MAPF algorithms are a classical class of MAPF algorithms that can achieve good results in terms of the speed and quality of the solution, but their flexibility and adaptability to the environment are poor.
Compared to centralized MAPF, distributed MAPF allows multiple agents to learn concurrently, leading to higher learning efficiency [20], as well as greater flexibility and adaptability to the environment. Currently, most distributed MAPF algorithms are based on reinforcement learning [21–24]. However, when applied to multi-agent path finding, reinforcement learning faces challenges such as complex state-action combinations, slow learning speeds, and sparse rewards. To address these issues, imitation learning [25–27] has been proposed as a solution, using expert algorithms and providing dense rewards to tackle multi-agent path finding problems. Additionally, as agent density increases, effective communication [28,29] mechanisms are needed for agents to share position and path information, enabling coordination in path finding and avoiding conflicts. Many Graph Neural Network (GNN)-based communication methods fail to account for the relative importance of features received from neighboring agents, which affects information fusion. As a result, agents may fail to assess the intentions of neighboring agents, leading to poor decision-making that impacts the Accuracy and Time of path finding.
In this paper, we propose a method for temporal-spatial two-dimensional information fusion built on GAT and the idea of imitation learning. The method utilizes scaled dot-product attention to realize communication between agents, and fuses historical information with the current observations of multiple agents, focusing on solving the partial observability problem of agents, with a view to improving the Accuracy and Time of MAPF under different map sizes and agent densities. The main contributions of this paper are as follows:
- (1). Considering that MAPF is a continuous, long-horizon task, a gated recurrent unit (GRU) is added to the feature extraction network (CNN), so that features of the agent’s historical observations and current observations are extracted and fused, improving the observability of the agent in the temporal dimension;
- (2). GAT is used to establish message communication between multiple agents, and scaled dot-product attention is used to assign importance weights to the features received from neighboring agents, which improves the efficiency of information aggregation and thus the observability of agents in the spatial dimension;
- (3). Experimental results demonstrate that our model can achieve better results in both Accuracy and Time of MAPF, especially on large-scale maps with high agent density.
2. Related work
2.1. Path finding based on historical information
In recent years, many researchers have used Recurrent Neural Networks (RNN) to encode historical trajectories, capturing key patterns in time series data [30,31], especially when historical information needs to be considered and future states need to be predicted. By modeling previous paths, motion history, and environmental changes, RNNs can help agents predict the optimal paths to avoid obstacles or adapt to different working scenarios. For example, Wang et al. [32] used Long Short-Term Memory (LSTM) networks to manage local path trajectories, focusing on extracting historical trajectories to reduce path duration and length. El-Ela et al. [33] combined historical and real-time data with LSTM to predict the next optimal path segment using a neural network model. Compared to LSTM, the Gated Recurrent Unit (GRU) structure is simpler, faster in computation, requires fewer parameters, and is more suitable for resource-constrained or real-time applications. Feng et al. [34], Huang et al. [35] and Choi et al. [36] all applied GRU to the multi-agent path finding problem. To capture more comprehensive entity information from neighboring entities and relationships, Tiwari et al. [37] proposed a GRU-based method that takes into account the memory of relationships in the path. Ling-Xiao et al. [38] proposed a GRU-GAT framework aimed at solving the issue of preserving neighbor information and historical trajectory information.
2.2. Communication-based path finding
In multi-agent path finding, communication allows for the sharing of position information, task status, and environmental data, which helps avoid path conflicts, optimize overall path finding, and enables real-time adjustments in dynamic environments [28,29]. Li et al. [39] used Convolutional Neural Networks (CNN) to extract sufficient features from the local observations of each agent and employed Graph Neural Networks (GNN) [40,41] to transmit these features between agents. They then trained the model using imitation learning based on expert algorithms. However, this approach does not consider the relative importance of the features received from neighboring agents before making decisions. Building on this, Li et al. proposed the Message-aware Graph Attention Network (MAGAT) [42] in 2021, which uses a dot-product attention method to determine the importance of features received from each neighboring agent. They further incorporated multi-head attention mechanisms and a bottleneck structure into the MAGAT model to test the communication performance of MAGAT-P and MAGAT-B models, but no significant improvements were observed. Additionally, Lin et al. [43] introduced the SACHA method, which utilizes communication between agents to facilitate information exchange, enabling path finding in crowded environments. Wang et al. [44] also proposed SCRIMP, which relies on an improved Transformer for local communication, assisting in generating independent and conflict-free paths between agents.
3. SDPGAT-G infrastructure
Our model proposed in this paper contains three main parts: extracting time dimension features using GRU-CNN, aggregating spatial dimension information using SDPGAT-G, and decoding action strategies using MLP. The model framework is shown in Fig 1.
3.1. Formulation of the MAPF problem
Modeling MAPF can transform MAPF into a sequential decision problem. Each agent needs to take an immediate action at time t with three constraints: (1) to reach the goal point from the start point quickly; (2) to ensure that the chosen path is optimal; and (3) to avoid collisions between the agents as much as possible [45].
Grid map: First, a grid map M of the two-dimensional world is generated, as shown in Fig 2, with width W and height H, respectively. M contains a set of static obstacles S ⊂ M. Let A = {a_1, a_2, …, a_N} be a set of N agents, each agent a_i having an independent start point s_i and a goal point g_i.
Observation range: In the grid map M, agent i has a local field of view defined by a radius r_FOV (the purple range in Fig 2), of size (2·r_FOV + 1) × (2·r_FOV + 1); beyond this range the agent observes nothing. The agent is located at the center of the local field of view, it is not aware of its global position, and the map perceived by agent i is defined as M_i^t. To simplify the learning task of MAPF, within the finite field of view we separate the available information into different channels. Specifically, the feature tensor Z_i^t observed by agent i at time t consists of binary-valued matrices representing the positions of obstacles, the target point, the position of the agent itself (if it is within the field of view), and the positions of other observable agents, respectively, as shown in Fig 2.
Purple ranges indicate observation ranges, blue ranges indicate communication ranges, black squares indicate obstacles, colored squares indicate the agents, and red squares indicate target points.
When an agent is close to the edge of the map M, obstacles are added to all positions outside the observation boundary. When the agent’s target point is not within its field of view, the target point is projected onto the boundary of the observation range, so the agent only perceives the direction of the target.
Communication: Each agent can only communicate with neighboring agents within its communication range. Each agent has a communication radius r_COMM (the blue range shown in Fig 2); beyond this range, communication between agents cannot be carried out. Assuming that such communication is formalized as a dynamic distance-based communication network, the graph G_t = (V, E_t, W_t) is defined as the communication network of the agents at time t, where V is the set of vertices, i.e., agents, in the graph, E_t is the set of edges between vertices, and W_t is the function that assigns weights to the edges. Since the graph is distance-based, agents v_i and v_j, with coordinates p_i = (x_i, y_i) and p_j = (x_j, y_j), can communicate with each other at time t when ‖p_i − p_j‖ ≤ r_COMM.
Conversely, when ‖p_i − p_j‖ > r_COMM, v_i and v_j cannot communicate. By representing the connectivity between vertices v_i and v_j as edges e_ij ∈ E_t, the information between the nodes of the whole graph G_t can be stored as a two-dimensional array, represented by the adjacency matrix A_t, as expressed in Eq. (1):

A_t(i, j) = 1 if ‖p_i − p_j‖ ≤ r_COMM, and A_t(i, j) = 0 otherwise. (1)

The corresponding edge weight w_ij = W_t(e_ij) thus carries two meanings: whether the agents can communicate, and the importance of the communication between them.
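The distance-based adjacency construction of Eq. (1) can be sketched as follows. This is a minimal NumPy illustration; the function and variable names are ours, not the paper's.

```python
import numpy as np

def adjacency_matrix(positions, r_comm):
    """Build the distance-based adjacency matrix A_t of Eq. (1).

    positions: (N, 2) array of agent grid coordinates at time t.
    r_comm:    communication radius; agents farther apart cannot talk.
    Returns an (N, N) 0/1 matrix with zero diagonal.
    """
    positions = np.asarray(positions, dtype=float)
    # Pairwise Euclidean distances between all agents (broadcasting).
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist <= r_comm).astype(int)
    np.fill_diagonal(adj, 0)  # an agent does not communicate with itself
    return adj

# Three agents; only the first two are within r_comm = 3 of each other.
A = adjacency_matrix([(0, 0), (2, 0), (10, 10)], r_comm=3)
```

The matrix is symmetric by construction, matching the undirected, distance-based communication graph described above.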
3.2. Temporal-dimensional information extraction by GRU-CNN
MAPF is a continuous, long-horizon task, so using the agent’s historical observations to capture long-term temporal features is an important consideration. CNN [46] is a commonly used model for feature extraction; through filters and convolutions it successfully captures the dependencies among observed feature values, building tighter connections between the observed features, which we apply to MAPF. The main goal of the CNN is to aggregate the information of all neighboring pixels to highlight the most important ones. However, the CNN only utilizes the agent’s state at the current moment when extracting features, and does not exploit whether, or how much, the agent’s state at previous moments influences its current state. This incomplete grasp of information may cause the agent to collide or generate suboptimal paths during path finding. Therefore, utilizing historical information [47–49] over multiple time steps can help the agent better understand the dynamics of the environment and the actions of other agents, in order to optimize the overall path.
GRU [50] is a variant of the recurrent neural network, similar to LSTM [51] (Long Short-Term Memory), which was proposed to solve the long-term memory problem. By merging the forget and input gates of the LSTM into a single update gate, and by mixing the cell state with the hidden state, the GRU effectively reduces the number of parameters of the LSTM units and shortens the model’s training time. In this paper, the GRU is utilized to capture long-term dependencies in the information observed by the agent; its gates effectively control the flow of information and “memorize” historical information that can be used to predict future movement trends.
In the MAPF model, it is assumed that the feature vector extracted by each agent at time t has size F. The observations of each agent together form the observation matrix X_t of all agents, as shown in Eq. (2). The feature captured by the CNN is defined as a linear combination of the observation matrix of the current agent at moment t and the matrices of the neighboring nodes in the graph G_t. The value of the feature at node v_i at moment t, after the graph-convolution operation, is shown in Eq. (3).
Where the result denotes a linear combination of the current node’s observations and those of its neighbors, and the filter coefficients form a set of F × G matrices combining different observations, with F and G denoting the dimensions of the input and output layers of the graph convolution. In addition, the aggregation is computed through K communication exchanges with 1-hop neighbors. The CNN module is composed of a cascade of L graph-convolution layers, each as shown in Eq. (4), followed by an activation function σ. Here σ acts on each layer of the graph convolution, fusing the output features of the previous layer l − 1 as input to the current layer l and computing the current aggregated information.
The input-output structure of the GRU is similar to that of an ordinary RNN [52]. The feature input x_t from the previous l layers at the current moment t and the hidden state h_{t−1} from the previous moment, kept in memory, are each spliced and multiplied by their weight vectors; dot-product operations then yield the gating signals: the reset gate r_t and the update gate z_t. The reset data is r_t ⊙ h_{t−1}; this reset data is spliced with x_t, and the tanh activation function is then used to obtain the candidate state h̃_t, as shown in Eq. (5). h̃_t contains the current input data x_t, with the current hidden state added purposefully, which is equivalent to memorizing the current moment. The updated h_t is the sum of the “forgotten” and “memorized” parts, as shown in Eq. (6). Where (1 − z_t) ⊙ h_{t−1} represents the selective forgetting of some unimportant information in the original hidden state, and z_t ⊙ h̃_t represents the selective “remembering” of some information from the current candidate state.
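The gating computations of Eqs. (5) and (6) can be sketched numerically as follows. This is a minimal NumPy GRU step; the weight names and shapes are illustrative assumptions, and bias terms are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, W_r, W_z, W_h):
    """One GRU step following Eqs. (5)-(6).

    x_t and h_prev are concatenated ("spliced") before each gate, so every
    weight matrix has shape (hidden, features + hidden).
    """
    xh = np.concatenate([x_t, h_prev])
    r_t = sigmoid(W_r @ xh)                          # reset gate
    z_t = sigmoid(W_z @ xh)                          # update gate
    # Candidate state: reset data r_t * h_prev spliced with x_t, Eq. (5).
    h_cand = np.tanh(W_h @ np.concatenate([x_t, r_t * h_prev]))
    # "Forgotten" part + "memorized" part, Eq. (6).
    h_t = (1.0 - z_t) * h_prev + z_t * h_cand
    return h_t

rng = np.random.default_rng(0)
F, H = 4, 3                                          # toy feature / hidden sizes
h = gru_cell(rng.normal(size=F), np.zeros(H),
             rng.normal(size=(H, F + H)),
             rng.normal(size=(H, F + H)),
             rng.normal(size=(H, F + H)))
```

Because the update gate interpolates between h_{t−1} and the bounded tanh candidate, the hidden state stays in (−1, 1), which matches the stability argument above.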
It is this memory role of the GRU that enables the agent to utilize its historical observation information and combine it with the current observation information, coordinating the agent with a longer view of the whole situation. Therefore, in this paper GRU and CNN are fused into one network structure, GRU-CNN, which comprehensively utilizes historical information rather than only the current information; the agent communicates with other agents through the communication network G_t at moment t, ensuring that it can move to the destination in the shortest possible time while avoiding collisions with other agents. Using the GRU-CNN network for information extraction, Eq. (3) is changed to Eq. (7).
Where H_t is the history information matrix at the agent’s current moment t, representing the sum of the agent’s memorized history information over the previous moments; it is calculated as shown in Eq. (8), where Φ is the mapping relationship between the current information and the history information. The new features extracted by GRU-CNN are cascaded into the new output feature tensor, and the captured information is used for the aggregation of GAT information; the structure of the feature-extraction network is shown in Fig 3.
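The temporal fusion of GRU-CNN described above can be sketched as follows. This is a simplified stand-in: the "CNN" is a single projection over the flattened observation and the recurrence uses only an update gate, so it illustrates the idea of Eqs. (7)-(8) rather than reproducing the paper's network.

```python
import numpy as np

def cnn_features(obs, W_c):
    """Stand-in for the CNN encoder: flatten the local observation tensor
    and project it to an F-dimensional feature (a real model would use
    convolutions over the observation channels)."""
    return np.tanh(W_c @ obs.ravel())

def encode_history(observations, W_c, W_z):
    """Fuse a window of past observations in time: a gated running state
    'memorizes' history and is combined with each new CNN feature.
    Weights and gating are illustrative simplifications of GRU-CNN."""
    F = W_c.shape[0]
    h = np.zeros(F)                       # accumulated history information
    for obs in observations:              # oldest observation ... current one
        f = cnn_features(obs, W_c)
        z = 1.0 / (1.0 + np.exp(-(W_z @ np.concatenate([f, h]))))
        h = (1.0 - z) * h + z * f         # GRU-style update gate
    return h                              # feature tensor passed on to the GAT

rng = np.random.default_rng(1)
K, fov, C, F = 4, 5, 3, 8                 # history length, field of view, channels, features
obs_seq = rng.normal(size=(K, C, fov, fov))
W_c = rng.normal(size=(F, C * fov * fov)) * 0.1
W_z = rng.normal(size=(F, 2 * F))
feat = encode_history(obs_seq, W_c, W_z)
```

The returned vector plays the role of the cascaded output feature tensor that the next section feeds into the attention-based aggregation.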
3.3. Spatial-dimensional information aggregation and update by SDPGAT-G
GRU-CNN performs feature extraction on the input tensor to obtain the feature tensor. Then, in the spatial dimension, GAT is utilized to establish a communication mechanism among multiple agents: the feature tensor is passed into the GAT, and the observability of the agents is improved by aggregating the information of neighboring agents. However, as the number of agents increases and the dimensionality of the feature tensor grows, the dot-product attention used by GAT for information aggregation may produce a biased distribution of attention weights, causing the model to focus on some specific dimensions and reducing the efficiency of MAPF. Compared with plain dot-product attention, scaled dot-product attention scales the feature tensor during the dot-product computation, keeping the dot-product values within a smaller range and thus ensuring that the attention weights maintain a uniform distribution across dimensions. Especially for large-scale datasets or high-dimensional data, scaled dot-product attention usually offers higher computational efficiency and stability.
3.3.1. Information aggregation by SDPGAT-G.
Inspired by the message network MAGAT [42], this paper uses a scaled dot-product attention mechanism as the information aggregation model, such that the edge weights between nodes are determined by the relative importance of the node’s features, which allows the agent to aggregate the information features received from its neighbors with a selective focus. Formally, inspired by the literature [53], a SDPGAT-G model is defined, as shown in Fig 4.
The red circle is the current agent, the blue circles are the current agent’s 1-hop neighbors, and the green circles are the blue circles’ 1-hop neighbors (and also the red circle’s neighbors within 2 hops).
Assume the input feature tensor of the GNN is the set of node features {x_1, x_2, …, x_N}, where N is the number of nodes and F is the number of features per node, so the tensor is a matrix of size N × F. In order to compute the attention mechanism, find the assigned weights, and obtain the corresponding input-to-output transformation, it is necessary to consider a trainable matrix W of size F × F′ containing the weights of all nodes, where F and F′ are the numbers of input and output features of the matrix W, respectively.
The scaled dot-product model is chosen for each vertex to compute the attention score: e_ij is the correlation score between each agent v_i and its neighbor agent v_j, so that each agent gets a correlation score of the same dimension. Let a(x, y) be the function calculating the importance between x and y, which represents the importance of the influence of the information of neighbor agent v_j on the decision making of agent v_i, and let d_k be the dimension of the input feature tensor values; then e_ij is calculated as shown in Eq. (9). The reason for dividing by √d_k is to limit the value of the attention score to an appropriate range, which facilitates model optimization, improves the stability of the network during training, and ensures that the weights are uniformly distributed across all dimensions.
After that, masked attention is used to allocate attention only over the node set N_i of agent v_i, which is the set of all its neighboring nodes. The attention score α_ij is obtained by normalizing over the neighboring nodes with the Softmax operation, as shown in Eq. (10). For Eq. (10), the larger the order of magnitude of the attention score e_ij, the larger the input to the Softmax function; when normalized, the result will be very close to 1, and Softmax will assign almost all the weight to the vertex corresponding to the maximum value, resulting in a biased weight assignment. To solve this problem, we divide the similarity score by the square root of the dimension d_k of the feature tensor values. In this way, the inputs to Softmax have relatively small gaps before normalization, ensuring the stability of the attention-weight allocation and making it more conducive to the next step, message passing and updating within the communication hop count K.
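The masked, scaled dot-product scoring of Eqs. (9)-(10) can be sketched as follows. This is a minimal NumPy version; the shared projection W and the function names are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(X, W, adj):
    """Masked scaled dot-product attention weights, Eqs. (9)-(10).

    X:   (N, F) node features.
    W:   (F, Fp) shared trainable projection.
    adj: (N, N) 0/1 adjacency matrix (1 = j is a neighbor of i).
    Returns (N, N) weights alpha; each row sums to 1 over the neighborhood.
    """
    H = X @ W                                        # project node features
    d_k = H.shape[1]
    scores = (H @ H.T) / np.sqrt(d_k)                # Eq. (9): scaled dot product
    scores = np.where(adj > 0, scores, -np.inf)      # masked attention
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability shift
    e = np.exp(scores)
    alpha = e / e.sum(axis=1, keepdims=True)         # Eq. (10): softmax over neighbors
    return alpha

rng = np.random.default_rng(2)
X = rng.normal(size=(4, 6))
W = rng.normal(size=(6, 6))
adj = np.ones((4, 4))                                # fully connected, incl. self-loops
alpha = scaled_dot_product_attention(X, W, adj)
```

Dividing by √d_k before the softmax is exactly what keeps the row-wise weights from collapsing onto a single neighbor when the feature dimension is large.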
3.3.2. Message delivery and update by SDPGAT-G.
The message-delivery update between multiple agents is synchronized with the process of each agent aggregating its neighbors’ information. In this paper, we use h_i to denote the feature tensor of a vertex at a given communication hop, and e_ij denotes the attributes of the edge between vertex v_i and vertex v_j. First, the information of the vertex set N_i adjacent to vertex v_i is aggregated to vertex v_i, as shown in Eq. (11). Where ϕ is the differentiable function used to aggregate the neighbor information. The obtained neighbor information is then nonlinearly combined with the attention score matrix using the λ function, as shown in Eq. (12). Finally, the updated information of vertex v_i is obtained by passing the vertex’s own information together with the aggregated neighbor information through a nonlinear transformation γ, as shown in Eq. (13). The vertex (i.e., agent a_i) synchronously completes message delivery and aggregation with its K-hop neighbors when updating its own information, and the weight parameters of the GAT enable it to aggregate the more important information more efficiently.
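One round of the aggregate-and-update step in Eqs. (11)-(13) can be sketched as follows. This is a simplified illustration: the paper's ϕ, λ, and γ are learned, while here aggregation is an attention-weighted sum and the update is a fixed tanh combination.

```python
import numpy as np

def gat_update(X, alpha, W):
    """One message-passing round following Eqs. (11)-(13).

    X:     (N, F) node features.
    alpha: (N, N) attention weights over neighbors (rows sum to 1).
    W:     (F, Fp) shared projection.
    """
    H = X @ W                        # shared projection of node features
    msg = alpha @ H                  # Eqs. (11)/(12): attention-weighted aggregation
    return np.tanh(H + msg)          # Eq. (13): combine own and neighbor information

rng = np.random.default_rng(3)
N, F, Fp = 5, 4, 4
X = rng.normal(size=(N, F))
W = rng.normal(size=(F, Fp))
alpha = np.full((N, N), 1.0 / N)     # uniform attention, for illustration only
H_new = gat_update(X, alpha, W)
```

Running this update K times corresponds to the K-hop communication described above: each round lets information travel one hop further through the graph.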
3.4. Aggregated information decoding by MLP
In the model of MAPF, the path finding problem is abstracted into a sequential decision problem, where at each time t the problem each agent must solve is how to reach its destination. The goal of this work is for agent i, at time t, to learn a mapping that determines an appropriate action based on the agent’s observed information and the information of the communication network G_t. The MLP plays the role of this mapping.
MLP [54] (Multilayer Perceptron), a multilayer fully connected neural network, is widely used in many predictive classification problems. In this paper, by learning the features extracted by the CNN and fused by the GAT, the MLP employs a probability-distributed stochastic action strategy to decode predictions over all possible actions for each agent i, and then synthesizes a current optimal action, which determines a collision-free path from the starting point to the goal point.
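The decoding step can be sketched as follows. This is a two-layer MLP illustration; the action set, layer sizes, and the greedy final decode are assumptions for the example, not the paper's exact configuration.

```python
import numpy as np

# A typical 4-connected grid-MAPF action set (assumed for illustration).
ACTIONS = ["up", "down", "left", "right", "stay"]

def mlp_policy(feature, W1, b1, W2, b2):
    """Decode the aggregated feature into a probability distribution over
    actions using a small fully connected network."""
    hidden = np.maximum(0.0, W1 @ feature + b1)      # ReLU hidden layer
    logits = W2 @ hidden + b2
    e = np.exp(logits - logits.max())                # stable softmax
    return e / e.sum()                               # action probabilities

rng = np.random.default_rng(4)
F, H = 8, 16                                         # feature and hidden sizes
probs = mlp_policy(rng.normal(size=F),
                   rng.normal(size=(H, F)), np.zeros(H),
                   rng.normal(size=(len(ACTIONS), H)), np.zeros(len(ACTIONS)))
action = ACTIONS[int(np.argmax(probs))]              # one possible final decode
```

During training the distribution itself is used (a stochastic strategy); at execution time one action per step is selected from it.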
4. Experiments
4.1. Experiment target
The algorithm proposed in this paper is trained and tested on randomly generated map sets and compared with GNN [39], GAT [55], MAGAT [42], MAGAT-P4, MAGAT-P4-B, RL-RVO [30], and ERL-MAPF [31] to evaluate its effectiveness under different map sizes and agent densities. GNN is a graph neural network model that enables communication between multiple agents, but it does not weigh the relative importance of features received from neighboring agents and is prone to collisions as the number of agents increases. GAT is a graph attention network that performs a weighted summation over individual nodes, but its attention is not stable enough in this form. MAGAT, also a graph attention network model, uses dot-product attention to assign weights, but this calculation leads to biased weights. MAGAT-P4 and MAGAT-P4-B are both variants of MAGAT with better performance. RL-RVO is a path finding algorithm that uses a GRU for feature extraction and action computation in continuous space, and ERL-MAPF is an evolutionary reinforcement learning MAPF method based on a GRU.
4.2. Experimental setup
The CPU used for our experiments is an Intel(R) Core(TM) i7-10700, the GPU is an NVIDIA Corporation Device 2482, and the operating system is Ubuntu 20.04. The experimental environment was implemented with TensorFlow 2.2.0. The specific experimental parameter settings are shown in Table 1.
4.3. Data preparation
We prepared the test map sets shown in Table 2 with randomly generated obstacles in the maps, each type of map set contains 20000 randomly generated maps. Then we train the model of this paper using Map 1 and test it on all map sets, and the resultant data is the average of multiple tests. Agent density is calculated as shown in Eq. (14).
Where W and H are the width and height of the map and N is the number of agents.
4.4. Metrics
(1) Accuracy: Accuracy indicates the ratio of the number of tests in which the agents reach the target point to the total number of tests, as shown in Eq. (15). Here n_success is the number of tests in which the agents successfully reach the target point from the start point within a certain number of steps and time, and n_total is the total number of tests. This “success rate” can be used to measure the success, or accuracy, of a MAPF task.
(2) MakeSpan: MakeSpan is the ratio of the time required for all agents to move from the start point to the goal point to the expected time, as shown in Eq. (16). Where T_actual is the actual time for all agents to move from the start point to the goal point and T_expected is the expected time for all agents to move from the start point to the goal point.
(3) FlowTime: FlowTime is the ratio of the difference between the actual path length and the expected path length to the expected path length, as shown in Eq. (17). Where L_actual is the actual executed path length and L_expected is the expected path length.
(4) FailedReachGoal: Complementary to Accuracy, FailedReachGoal is the ratio of the number of tests in which the agents did not successfully reach the target point to the total number of tests, as shown in Eq. (18). Where n_fail is the number of tests in which the agents did not reach the target point from the start point within a certain number of steps and time.
(5) Time: the time from the beginning to the completion of agent path finding, i.e., the time taken by all agents to reach the goal point from the start point, which is used to measure the efficiency of MAPF. “Time” is an important index to measure the time efficiency of MAPF task.
In the experimental process, this paper chooses the most representative Accuracy and Time as the evaluation indexes of the model, and analyzes and compares different MAPF models on different maps.
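The four ratio metrics of Eqs. (15)-(18) can be computed together as follows. The argument names are illustrative, and FlowTime is assumed to be normalized by the expected path length, as the "ratio of the difference" phrasing suggests.

```python
def mapf_metrics(n_success, n_total, t_actual, t_expected,
                 len_actual, len_expected):
    """Compute the evaluation metrics of Eqs. (15)-(18) from per-run
    test counts and aggregate path statistics."""
    return {
        "Accuracy": n_success / n_total,                          # Eq. (15)
        "MakeSpan": t_actual / t_expected,                        # Eq. (16)
        "FlowTime": (len_actual - len_expected) / len_expected,   # Eq. (17)
        "FailedReachGoal": (n_total - n_success) / n_total,       # Eq. (18)
    }

# Example: 79 of 100 tests succeed; paths take 20% longer than expected
# in time and 10% longer in length.
m = mapf_metrics(n_success=79, n_total=100,
                 t_actual=120.0, t_expected=100.0,
                 len_actual=550.0, len_expected=500.0)
```

Accuracy and FailedReachGoal always sum to 1, which is why the paper treats them as complementary views of the same outcome.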
4.5. Experiments and analysis
4.5.1. Ablation study and analysis.
For convenience of description, the models proposed in this paper are denoted MAGAT-G, SDPGAT, and SDPGAT-G. Among them, MAGAT-G adds the GRU-CNN network proposed in this paper to MAGAT for extracting features from the agents’ observed information; SDPGAT replaces MAGAT’s attention mechanism with the scaled dot-product attention proposed in this paper; and SDPGAT-G adds the GRU-CNN feature-extraction network on top of SDPGAT.
The test is performed on Maps 1–5, and the results are shown in Table 3. From Table 3, it can be seen that the Accuracy of the models with GRU is higher than that of the baseline models. This is because the GRU provides the agent with additional history information, enabling it to optimize its strategy under partial observability by learning from history and neighbor information when deciding on an action, thereby improving the Accuracy of path finding. This effect is not obvious on Map 1; instead, the advantage is more apparent on Maps 2 and 3, especially Map 3, where path finding Accuracy improves by 31.17% compared with the MAGAT model. It is not difficult to see that the effect of introducing the GRU becomes more obvious as the map size increases. When the map is small and simple, history information has little effect on long-horizon decisions; as the map size grows, however, the influence of historical information gradually increases, and combining historical and current information better guides the agent toward more reasonable actions, thus improving the Accuracy of path finding.
However, the Accuracy cannot fully measure the quality of the model, as it only focuses on whether the agent achieves the goal within the given time, ignoring the time cost of execution. Therefore, it is also necessary to use Time to measure the model’s pathfinding time. As shown in Table 3, the SDPGAT-G and SDPGAT models have shorter running times than the MAGAT-G model. This is because the scaled dot-product attention uses a scaling factor to adjust the scale of the dot-product attention scores. The purpose of this is to reduce the bias of the weight values when the dimensionality is large. Due to the more stable numerical properties of scaled dot-product attention, the calculation of the attention mechanism is more scientifically sound, thereby saving multi-agent pathfinding time and improving pathfinding efficiency. Especially on Map 2 and Map 3, the proposed model shows great potential, with pathfinding times reduced by 24.90% and 20.03%, respectively, compared to the MAGAT model. This proves that the proposed model is more time-efficient than other models.
4.5.2. Comparison to state-of-the-art.
Similarly, tests were conducted on Maps 1–5, with results shown in Tables 4–8. The proposed SDPGAT-G model outperforms MAGAT in both pathfinding Accuracy and Time, and also surpasses other improved versions of MAGAT, such as MAGAT-P4 and MAGAT-B, as well as the GRU-related RL-RVO and ERL-MAPF algorithms. On Map 1, the Accuracy of our model is comparable to that of the seven baselines. However, as map size and obstacle count increase, the Accuracy of all models declines to varying degrees, especially for GNN and GAT. This is likely because GNN and GAT encounter performance bottlenecks in large-scale, dense environments: as obstacles and map size grow, graph density and computational complexity grow rapidly, which affects model stability. In contrast, the Accuracy of our model declines much more slowly than that of the seven baselines. In particular, on Map 2 (Table 5), the Accuracy of our model improves to varying extents while MakeSpan, FlowTime, and Time are all reduced. The trend is even more pronounced on Map 3 (Table 6), where Accuracy reaches 79%. MAGAT, MAGAT-P4, and MAGAT-B perform relatively steadily, but their Accuracy drops significantly on larger and more crowded maps, possibly because these models do not fully exploit historical information when optimizing pathfinding decisions. RL-RVO and ERL-MAPF perform relatively well, but still fall short of our model.
4.5.3. Trends in different map sizes and densities.
As the map size increases, as shown in Fig 5, the performance of the various algorithms on each metric changes significantly. For Accuracy, most algorithms maintain a high success rate on small maps (e.g., Map 1 and Map 2). However, on larger maps (e.g., Map 3 and Map 5), the Accuracy of GAT and GNN declines significantly, possibly because they fail to fully leverage historical information during pathfinding, leading to more path conflicts or failures. In contrast, the MAGAT-G and SDPGAT series maintain stable Accuracy even on larger maps, demonstrating better scalability in pathfinding.
Regarding Time and MakeSpan, the impact of large maps on the algorithms is particularly notable. As map size increases, all algorithms show some increase in Time, but the growth of SDPGAT-G and RL-RVO is relatively moderate, suggesting that their runtime scales better on larger maps. Additionally, large maps are often associated with higher FlowTime, and some algorithms, owing to low pathfinding efficiency, fail to optimize their paths effectively.
From the perspective of maps with varying densities, as shown in Fig 6, FailedReachGoal serves as a critical indicator. In low-density maps, all algorithms can complete their target tasks effectively with fewer conflicts. However, as the map density increases (e.g., higher obstacle ratios or more agents), the proportion of unfinished goals for GAT and GNN rises significantly. In contrast, superior algorithms, such as the SDPGAT series and MAGAT-G, can still maintain a low failure rate in high-density maps.
In terms of MakeSpan and Time, increasing density significantly raises the complexity of pathfinding, causing most algorithms to incur substantial increases in runtime. This is particularly evident for GAT and GNN, which show the largest growth in time overhead and path length, reflecting the computational bottleneck of such methods in complex scenarios. Conversely, algorithms such as SDPGAT-G and ERL-MAPF balance conflict avoidance and pathfinding efficiency better: even in high-density maps, they maintain shorter total path lengths and lower time costs, making them better suited to multi-agent pathfinding in complex scenarios.
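For readers unfamiliar with the metrics compared above, the following hypothetical helper shows one common way to compute them from per-agent arrival times; the exact definitions used in the paper's evaluation may differ.

```python
def mapf_metrics(arrival_steps, time_limit):
    """Compute common MAPF evaluation metrics from per-agent arrival steps
    (None = goal never reached within the episode). Hypothetical helper,
    not the paper's evaluation code."""
    reached = [t for t in arrival_steps if t is not None and t <= time_limit]
    failed = len(arrival_steps) - len(reached)
    accuracy = len(reached) / len(arrival_steps)        # fraction of agents reaching goal in time
    makespan = max(reached) if reached else time_limit  # step at which the last successful agent arrives
    flowtime = sum(reached) + failed * time_limit       # total steps, charging the limit for failures
    return {"Accuracy": accuracy, "MakeSpan": makespan,
            "FlowTime": flowtime, "FailedReachGoal": failed}

# Four agents: three arrive at steps 12, 15, and 9; one never reaches its goal.
m = mapf_metrics([12, 15, None, 9], time_limit=50)
print(m)
```

Under these conventions, higher density raises MakeSpan and FlowTime through extra conflict-avoidance detours, and FailedReachGoal counts the agents that time out entirely.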
4.6. Summary
In summary, the SDPGAT and SDPGAT-G models proposed in this paper maintain both pathfinding Accuracy and Time on maps of different sizes, and outperform the baseline model in most cases. This indicates that the SDPGAT family has strong potential in generalization and in handling congestion to avoid collisions, and confirms that the proposed model better balances the competing demands of pathfinding, ensuring that more agents complete their tasks in a shorter period of time.
5. Conclusion
In this paper, we fuse the observation information of agents in both temporal and spatial dimensions and propose a scalable and transferable MAPF model, SDPGAT-G. The model uses a GRU to record historical information when constructing local observation encoders, and uses the SDPGAT mechanism for inter-agent communication to improve MAPF performance. In most cases, all evaluation metrics outperform the baseline model. Although our model performs well in the current experiments, several aspects still need improvement: for example, latency and redundancy in the communication process may significantly affect the performance and reliability of the model. These issues concern not only real-time performance and accuracy but may also limit applicability in dynamic and complex scenarios. In addition, the uncertainty of dynamic environments, diverse scenarios, and complex interactions among agents may challenge the scalability of the model. In future research, we will therefore focus on these problems, in particular optimizing communication redundancy, designing more efficient communication topologies and data-processing mechanisms to enhance overall system performance, and verifying the scalability of the model in diverse scenarios.
Acknowledgments
I would like to express my heartfelt gratitude to everyone who has supported me in the completion of this research paper. Special thanks to my advisor for their guidance and expertise. I am also thankful to my colleagues for their valuable insights and encouragement. Additionally, I appreciate the support from my family and friends, whose unwavering belief in me kept me motivated. Thank you all for your contributions; I am truly grateful for your help and support.