
Adaptive output steps: FlexiSteps network for dynamic trajectory prediction

  • Yunxiang Liu ,

    Contributed equally to this work with: Yunxiang Liu, Hongkuo Niu

    Roles Conceptualization, Supervision

    Affiliation Faculty of Intelligence Technology, Shanghai Institute of Technology, Shanghai, Shanghai, China

  • Hongkuo Niu ,

    Contributed equally to this work with: Yunxiang Liu, Hongkuo Niu

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    236142132@mail.sit.edu.cn

    Affiliation Faculty of Intelligence Technology, Shanghai Institute of Technology, Shanghai, Shanghai, China

  • Jianlin Zhu

    Roles Supervision

    ‡ JZ also contributed equally to this work.

    Affiliation Faculty of Intelligence Technology, Shanghai Institute of Technology, Shanghai, Shanghai, China

Abstract

Accurate trajectory prediction is vital for autonomous driving, robotics, and intelligent decision-making systems, yet traditional models typically rely on fixed-length output predictions, limiting their adaptability to dynamic real-world scenarios. In this paper, we introduce the FlexiSteps Network (FSN), a novel framework that dynamically adjusts prediction output time steps based on varying contextual conditions. Inspired by recent advancements addressing observation length discrepancies and dynamic feature extraction, FSN incorporates a pre-trained Adaptive Prediction Module (APM) to intelligently determine optimal prediction horizons and a Dynamic Decoder (DD) module that enables flexible output generation across different time steps. Additionally, we design a scoring mechanism that leverages the Fréchet distance to evaluate geometric similarity between predicted and ground-truth trajectories while accounting for prediction length, enabling principled trade-offs between prediction horizon and accuracy. Our plug-and-play design allows seamless integration with existing trajectory prediction models. Extensive experiments on benchmark datasets including Argoverse and INTERACTION demonstrate that FSN achieves superior prediction accuracy and contextual adaptability compared to traditional fixed-step approaches.

Introduction

Trajectory prediction plays an essential role in various critical applications such as autonomous driving, robotics, and intelligent decision-making systems. Accurately predicting the future motion of dynamic agents is fundamental to ensuring safety and efficiency in real-world scenarios. Recent advancements in deep learning have significantly improved the precision of trajectory prediction models [1–10]. However, most existing methods are constrained by a fixed-length prediction horizon, limiting their adaptability and effectiveness when confronted with dynamic and unpredictable environments.

Traditional trajectory prediction models typically employ fixed-length output predictions, fundamentally limiting their ability to adapt to dynamic real-world environments. This rigid approach fails to recognize that optimal prediction horizons naturally vary based on changing contextual conditions such as road geometry, traffic density, and agent behaviors. Consequently, these models either generate unnecessarily long predictions in simple scenarios or produce insufficient forecasts when longer horizons are needed for complex situations. While recent studies have begun addressing input-side variability through methods like FlexiLength Network (FLN) [11], which introduced calibration and adaptation mechanisms to handle varying observation lengths as shown in Fig 2, the critical issue of output-length adaptability—dynamically determining how far into the future to predict based on current conditions—remains largely unaddressed in existing literature.

Fig 1. Prediction results of HiVT and HPNet with different fixed prediction steps.

https://doi.org/10.1371/journal.pone.0333926.g001

Similarly, Length-agnostic Knowledge Distillation (LaKD) [12] was proposed to handle varying observation lengths by dynamically transferring knowledge between trajectories of differing lengths, thus overcoming the inherent limitations of traditional fixed-length input methods. LaKD’s approach highlighted that the effectiveness of longer observed trajectories could sometimes be compromised by interference, emphasizing the necessity for adaptive mechanisms to handle real-world trajectory variations.

Inspired by these developments, we propose the FlexiSteps Network (FSN), a novel trajectory prediction framework specifically designed to dynamically adjust the prediction output steps based on contextual cues and environmental conditions. FSN incorporates an innovative pre-trained Adaptive Prediction Module (APM) to intelligently evaluate and determine the optimal number of predicted future steps during inference, ensuring predictions are contextually appropriate. The framework addresses the fundamental limitation of fixed-step predictions by enabling adaptive output generation that responds to varying scenario complexities. Furthermore, to ensure seamless integration with existing architectures, we design a Dynamic Decoder (DD) module that trains multiple specialized decoders for different step lengths, facilitating flexible output generation during inference. This approach significantly enhances prediction adaptability and accuracy across diverse scenarios.

However, the challenge of balancing prediction horizon and accuracy remains a critical issue. Traditional metrics such as Average Displacement Error (ADE) and Final Displacement Error (FDE) often fail to capture the overall shape and temporal consistency of predicted trajectories, leading to suboptimal evaluation of model performance [13]. Recent research [14,46] and our experiments (Fig 1) also show that prediction accuracy degrades as the prediction horizon grows. To address this, we introduce a scoring mechanism that leverages the Fréchet distance, a robust geometric measure that comprehensively assesses trajectory similarity by considering both spatial and temporal relationships [13,15]. By combining the Fréchet distance with the prediction steps, we can effectively evaluate the quality of predictions while dynamically adjusting the output horizon.

This scoring mechanism not only enhances the evaluation of trajectory predictions but also provides a means to trade off between prediction horizon and accuracy. By incorporating the Fréchet distance, we can ensure that the model does not favor shorter-term predictions for higher precision, thus maintaining a balance between flexibility and accuracy.

Through extensive experiments on prominent benchmark datasets, including Argoverse and INTERACTION, our FSN demonstrates significant improvements in flexibility and predictive accuracy compared to traditional models. This framework provides a practical solution to the critical need for adaptive, context-aware trajectory prediction models, setting a new benchmark for effectiveness in dynamic prediction scenarios.

In summary, our work has the following contributions:

  • We propose a novel dynamic trajectory prediction framework FlexiSteps Network (FSN), which enables adaptive output step determination based on varying contextual conditions, addressing a critical limitation of traditional fixed-step prediction models. Both the Adaptive Prediction Module (APM) and Dynamic Decoder (DD) are designed as plug-and-play components for seamless integration with existing learning-based models.
  • We introduce a comprehensive scoring mechanism that enables principled trade-offs between prediction accuracy and temporal horizon, representing the first systematic exploration of metrics to balance prediction step length and accuracy. By incorporating the Fréchet distance, which considers both spatial and temporal trajectory relationships, our approach provides a robust evaluation framework for prediction quality across varying time horizons, addressing limitations of traditional point-wise distance metrics.
  • We validate the accuracy and flexibility of FSN through comprehensive experiments on benchmark datasets including Argoverse [16] and INTERACTION [17].

Related work

Traditional trajectory prediction

Trajectory prediction plays a critical role in applications such as autonomous driving, robotics, and intelligent systems. Traditionally, trajectory prediction approaches focus on using deep neural networks to learn from historical agent movements and contextual data, effectively modeling complex interactions and behaviors. Methods leveraging Graph Neural Networks (GNNs) [18–22], Generative Adversarial Networks (GANs) [23,24], and Transformer-based architectures have significantly advanced the field [3,25–32], demonstrating robust predictive capabilities on various benchmarks.

Variable timestep prediction

A major limitation of traditional trajectory prediction methods is their inability to handle varying data effectively. Xu and Fu [11] propose the FlexiLength Network (FLN), which integrates trajectory data of diverse lengths and employs FlexiLength Calibration (FLC) and FlexiLength Adaptation (FLA) to learn temporally invariant representations. In contrast, Li et al. [12] introduce LaKD, a length-agnostic knowledge distillation framework that dynamically transfers knowledge between trajectories of different lengths, using a dynamic soft-masking mechanism to prevent knowledge conflicts. However, both FLN and LaKD focus primarily on input length variability and do not address the challenge of dynamically adjusting output prediction steps, which is critical for real-time decision-making.

Other methods such as [33] also attempt to select different prediction models by classifying scenarios, but they cannot be applied plug-and-play to other learning-based methods, which greatly limits them. And [7] simply divides the prediction into two key horizons, short-term and long-term, which is not flexible enough to adapt to real-world scenarios. In contrast, our proposed FlexiSteps Network (FSN) introduces a pre-trained Adaptive Prediction Module (APM) that dynamically adjusts the output prediction steps based on contextual cues, ensuring optimal prediction accuracy and efficiency. This approach allows for greater flexibility in handling varying prediction requirements, making FSN a more adaptable solution for dynamic trajectory prediction tasks.

Metrics for trajectory prediction

Traditional metrics such as ADE, FDE, and MR have been used in almost all trajectory prediction methods [19,25,26,34–39] in autonomous driving scenarios. However, these metrics overlook the overall shape and temporal consistency [13]. Song et al. [40] employ the Hausdorff distance [41–43] in their trajectory matching module to ensure coherence. While the Hausdorff distance serves as a natural metric for comparing curves or compact sets, it ignores both the direction and motion dynamics along the curves [44,45].

The Fréchet distance, however, accounts for the order of points, making it a superior measure for assessing curve similarity [44]. As a robust geometric measure, it has been proposed as an alternative to ADE and FDE, providing a more comprehensive evaluation of trajectory prediction quality [14,46,47]. By considering both spatial and temporal relationships, the Fréchet distance offers a more nuanced assessment of trajectory predictions, making it a suitable choice for evaluating dynamic prediction models.

Method

Our FlexiSteps Network (FSN) is designed to dynamically adjust the prediction output steps based on contextual cues and environmental conditions. The overall architecture of FSN is shown in Fig 2. The key components of FSN include a pre-trained Adaptive Prediction Module (APM), a Dynamic Decoder (DD), and a scoring mechanism that incorporates the Fréchet distance to evaluate trajectory predictions. We detail the scoring mechanism in section Scoring mechanism, APM in section Pretrained adaptive prediction module and DD in section Dynamic decoder respectively.

Problem formulation

Given the target agent $i$'s locations $\{p_i^t\}_{t=-T+1}^{0}$ in the past $T$ time steps, where $p_i^t \in \mathbb{R}^2$ represents the 2D coordinates of the agent (vehicles, pedestrians, and all other traffic participants within the perceived range) at time step $t$, we aim to predict the future locations $\{p_i^t\}_{t=1}^{F}$, where $F$ is the number of future steps to be predicted. We denote the historical relative trajectory of agent $i$ as $p_i$ over the historical time steps. Naturally, the target agent interacts with its context, including the historical locations of surrounding agents and the high-definition map (HD map), represented as $p_{\mathrm{oth}}$ and $m$ respectively, where $N_a$ is the number of surrounding agents within a certain radius, $N_m$ is the number of HD map segments in the same radius, and $m_s^k$ and $m_e^k$ are the start and end points of HD map segment $k$.

Overview of FlexiSteps network

The overview of our FlexiSteps Network (FSN) is shown in Fig 2. Most trajectory prediction methods follow an encoder-decoder framework, where the encoder encodes the historical trajectory and context information into a latent representation and the decoder decodes it into the future trajectory. Our FSN also follows this paradigm, and we focus on the decoder side, which means we use the baseline encoder as:

$e_{i,k} = \mathcal{E}(P_i, c_i, c_m)$ (1)

where $k$ is the mode index in the multi-modal prediction framework, $e_{i,k}$ is the latent representation of the target agent $i$ under mode $k$, $\mathcal{E}$ is the encoder function from our baseline methods, including [3,6], $P_i$ is concatenated from $p_i$ and $p_{\mathrm{oth}}$, $c_i$ is the agent attribute including agent heading, velocity, and agent type, and $c_m$ is the HD map lane segment attribute including lane heading, lane turn direction, and whether it is an intersection. The output of the encoder is then fed into our APM and DD modules to generate prediction steps and future trajectory predictions.

The FSN framework consists of three main components: the pre-trained Adaptive Prediction Module (APM), the Dynamic Decoder (DD), and the scoring mechanism. The APM is responsible for dynamically adjusting the prediction output steps based on contextual cues and environmental conditions. It is trained on the assessment results of the trained baseline model at different fixed output steps, where the assessment results are produced by the scoring mechanism. The DD is designed to handle varying prediction lengths, allowing the model to output sequences with different step lengths during inference. This flexibility is crucial for adapting to real-world scenarios where the required prediction horizon may vary significantly. Finally, the scoring mechanism incorporates the Fréchet distance to evaluate the quality of trajectory predictions, considering both spatial and temporal relationships. By combining these components, FSN provides a robust and adaptable solution for dynamic trajectory prediction tasks.

Scoring mechanism

According to the experiments on HiVT [3] and HPNet [6], as shown in Fig 1, prediction accuracy degrades as the prediction horizon grows: the longer the prediction horizon, the lower the prediction accuracy. This would cause our Dynamic Decoder to favor shorter durations for lower prediction errors and degrade the performance of our dynamic prediction. To address this, we introduce a scoring mechanism that incorporates the Fréchet distance to evaluate the quality of trajectory predictions, as shown in Fig 3.

Fig 2. The overview of our FlexiSteps Network(FSN).

The FSN framework consists of three main components: the pre-trained Adaptive Prediction Module (APM), the Dynamic Decoder (DD), and the scoring mechanism. The APM is trained on the assessment results of the trained baseline model (which we also take as the backbone, as shown in the scoring mechanism) at different fixed output steps. The DD then handles varying prediction lengths, allowing the model to output sequences with different step lengths during inference. Finally, the scoring mechanism evaluates the quality of trajectory predictions, considering both spatial and temporal relationships.

https://doi.org/10.1371/journal.pone.0333926.g002

Fig 3. Overview of our scoring mechanism.

The Fréchet distance is used to evaluate the quality of trajectory predictions, and the prediction length is also considered to balance accuracy and prediction horizon.

https://doi.org/10.1371/journal.pone.0333926.g003

Fréchet distance kernel.

The Fréchet distance is a robust geometric measure that comprehensively assesses trajectory similarity by considering both spatial and temporal relationships [13,15]. However, it cannot be applied directly in machine learning frameworks because it is a non-smooth function:

$d_F(A, B) = \min_{\alpha \in \mathcal{A}} \max_{(x, y) \in \alpha} d(x, y)$ (2)

where $d(x, y)$ is the Euclidean distance between vectors $x$ and $y$, $A$ and $B$ are two sets of points, and $\mathcal{A}$ is the set of all possible alignments between the two sets. The Fréchet distance kernel (FRK) [48] designs soft-min and smooth-min approximations to make the Fréchet distance computable in machine learning frameworks. However, the smooth-min approximation of the original FRK is noise-sensitive because the exponential weight assigned to outliers may be too large. To address this issue, we propose a new Fréchet distance kernel (FDK) that approximates the Fréchet distance as a smooth function by introducing Huber loss smoothing [49,50]:

$h_{\delta}(a) = \begin{cases} \frac{1}{2} a^2 & \text{if } |a| \le \delta \\ \delta \left( |a| - \frac{\delta}{2} \right) & \text{otherwise} \end{cases}$ (3)

where δ is the threshold parameter, set to 0.1 in our experiments. The Fréchet distance kernel (FDK) is defined as:

(4)(5)

where the hard minimum of Eq (2) is replaced by its Huber-smoothed counterpart, with parameters $\beta, Z_A > 0$.

Score kernel.

In addition, to balance accuracy and prediction length, computing the Fréchet distance alone is not enough; the prediction length must also be included. A smaller distance means the predicted trajectory is closer to the ground truth. So that a smaller score indicates higher prediction quality, we place the prediction length in the denominator:

$s_i^{f} = \dfrac{d_{\mathrm{FDK}}\left(\hat{Y}_i^{f}, Y_i^{f}\right)}{f}$ (6)

where $s_i^{f}$ is the score of agent $i$ with prediction steps $f$, and $d_{\mathrm{FDK}}(\hat{Y}_i^{f}, Y_i^{f})$ is the Fréchet distance between the predicted trajectory and the ground truth trajectory of agent $i$ over $f$ steps. The score is designed to be lower for better predictions: by combining the Fréchet distance with the prediction length, it penalizes longer predictions that do not match the ground truth well.

Finally, we can get the final score from all prediction steps:

$s_i^{*} = \min_{f} s_i^{f}$ (7)

And the prediction step of agent i is:

$f_i^{*} = \arg\min_{f} s_i^{f}$ (8)

To summarize, we express the scoring mechanism in:

$f_i^{*} = \mathcal{S}\left( \{\hat{Y}_i^{f}\}_{f},\, Y_i \right)$ (9)

where $f_i^{*}$ is the optimal prediction step that achieves the lowest score, i.e. the best result, under our scoring mechanism, $\mathcal{S}$ is the scoring mechanism function, and $Y_i$ is the ground truth future trajectory of agent $i$. To calibrate the trade-off between prediction horizon and trajectory accuracy, we do not introduce additional hyperparameters; instead, the division by the step length $f$ naturally balances longer and shorter predictions. This design ensures that longer horizons are only favored when their trajectory similarity (measured by Fréchet distance) is sufficiently high. We confirmed the stability of this trade-off through empirical validation: the formulation consistently produced reasonable horizon selections across the validation set without requiring further tuning. By combining the Fréchet distance with the prediction steps, we can effectively evaluate the quality of predictions while dynamically adjusting the output horizon.
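The scoring mechanism reduces to dividing each candidate horizon's Fréchet distance by its step count and taking the argmin. A minimal sketch, assuming the per-horizon distances have already been computed (the `dists` values below are hypothetical, purely for illustration):

```python
def step_scores(frechet_by_step):
    """Eq (6) style score: Frechet distance divided by horizon length f,
    so longer horizons are only preferred when their distance stays low."""
    return {f: d / f for f, d in frechet_by_step.items()}

def best_step(frechet_by_step):
    """Eq (8) style argmin over the candidate horizons."""
    scores = step_scores(frechet_by_step)
    return min(scores, key=scores.get)

# Hypothetical per-horizon Frechet distances for one agent.
dists = {5: 0.20, 10: 0.35, 20: 0.90, 30: 1.80}
chosen = best_step(dists)  # horizon whose distance grows slowest relative to f
```

Note how the 10-step horizon wins here even though its raw distance exceeds the 5-step one: the denominator rewards the longer horizon because its error grows slower than its length.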

Pretrained adaptive prediction module

The Adaptive Prediction Module (APM) is a pre-trained module that adaptively adjusts the prediction steps based on contextual information and environmental conditions. The APM is trained using a set of historical trajectories and their corresponding future trajectories, allowing it to learn the optimal prediction steps for different scenarios. The training process involves evaluating the performance of the baseline model with different fixed prediction steps and using the assessment results to guide the APM's adjustments. The detailed evaluation criteria are described in section Scoring mechanism.

Training stage.

To train our APM, we first collect the prediction results from the baseline model trained with different fixed output steps:

$\hat{Y}_{i,k}^{f},\, \hat{\pi}_{i,k}^{f} = \mathrm{MLP}_{f}(e_{i,k})$ (10)

where $\mathrm{MLP}_{f}$ is a two-layer MLP serving as the decoder function from our baseline methods, $f$ is the number of fixed output steps, and $\hat{Y}_{i,k}^{f}$ and $\hat{\pi}_{i,k}^{f}$ are the predicted future trajectory and probability of agent $i$'s mode $k$ with fixed output steps $f$, respectively.

Then we use the scoring mechanism to evaluate the prediction results and return the optimal prediction step $f_i^{*}$ for each agent. After collecting the optimal prediction steps for all agents across the datasets, we train our APM as illustrated in Fig 4 and as follows:

Fig 4. Training stage of the Adaptive Prediction Module (APM).

The APM is trained to predict the optimal prediction step based on the encoded latent features of the target agent and its context.

https://doi.org/10.1371/journal.pone.0333926.g004

$\hat{b}_i = \mathrm{softmax}\left(\mathrm{MLP}_{\mathrm{step}}(e_i)\right)$ (11)

$\mathcal{L}_{\mathrm{cls}} = -\frac{1}{N} \sum_{i=1}^{N} b_i^{\top} \log \hat{b}_i$ (12)

$\hat{f}_i = \sum_{f} f \, \hat{b}_i[f]$ (13)

$\mathcal{L}_{\mathrm{reg}} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{f}_i - f_i^{*} \right|$ (14)

where $\hat{b}_i$ is the predicted probability distribution over prediction steps for agent $i$, $N$ is the number of agents in the dataset, $\mathrm{MLP}_{\mathrm{step}}$ is a two-layer multilayer perceptron serving as the steps decoder, $\mathcal{L}_{\mathrm{cls}}$ is the cross-entropy loss function, $\mathcal{L}_{\mathrm{reg}}$ is the regression loss, and $b_i$ is the one-hot encoded ground truth with $b_i[f_i^{*}] = 1$. The APM is trained to predict the optimal prediction step based on the input context.

In summary, the loss function for training the APM is:

$\mathcal{L}_{\mathrm{APM}} = \mathcal{L}_{\mathrm{cls}} + \mathcal{L}_{\mathrm{reg}}$ (15)
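The classification part of the APM objective is an ordinary cross-entropy between the predicted step distribution and the one-hot optimal step returned by the scoring mechanism. A minimal pure-Python sketch (function names are illustrative, not the paper's):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def apm_cross_entropy(logits, target_idx):
    """Cross-entropy between the APM's predicted step distribution and the
    one-hot optimal step; target_idx indexes the candidate step lengths."""
    return -math.log(softmax(logits)[target_idx])
```

With uniform logits the loss equals log(K) for K candidate steps, and it shrinks toward zero as the logit of the optimal step dominates, which is the gradient signal that teaches the APM to pick the scored-best horizon.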

Inference stage.

APM is applied to the training and inference process of FSN as a plug-and-play module. APM takes the encoded latent features of the target agent and its context as input and outputs the predicted optimal prediction step:

$f_i = \arg\max_{f} \mathrm{APM}(e_i)[f]$ (16)

where fi is the predicted optimal prediction step for agent i. The APM can be integrated into any trajectory prediction model, allowing it to dynamically adjust the prediction steps based on the contextual information and environmental conditions. This adaptability is crucial for ensuring that the model can effectively handle varying prediction requirements in real-world scenarios.

Dynamic decoder

Traditional trajectory prediction models typically use MLP or twolayerMLP as the decoder to decode the latent embedding into future trajectory with fixed steps, as shown in Eq 10. This leads to a limitation in flexibility, as the model can only align weights to a specific output steps. To address this, we propose Dynamic Decoder (DD) that can handle varying prediction lengths, allowing the model to output sequences with different step lengths during inference.

Specifically, each candidate prediction length is matched with a decoder of the corresponding step length. During training, the input $e_{i,k}$ and its output steps $f_i$, which come from the APM, are processed by the matching sub-network, and each sub-network's parameters are updated independently:

$\hat{Y}_{i,k}^{f_i},\, \hat{\pi}_{i,k}^{f_i} = \mathrm{DD}_{f_i}(e_{i,k})$ (17)

where $\hat{Y}_{i,k}^{f_i}$ and $\hat{\pi}_{i,k}^{f_i}$ are the predicted future trajectory and probability of agent $i$'s mode $k$ with output steps $f_i$, and $\mathrm{DD}$ is the decoder function for the whole network. During inference, the sub-network matching the predicted output steps is exclusively activated.
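The DD idea, one sub-network per candidate horizon with only the matching one activated at inference, can be sketched as a dispatch table. The constant-velocity "decoder" below is a stand-in for the paper's learned per-horizon MLPs, purely to show the dispatch structure:

```python
class DynamicDecoder:
    """Dispatch table of per-horizon decoders: only the sub-network
    matching the APM-predicted step count runs at inference."""

    def __init__(self, step_lengths):
        self.step_lengths = set(step_lengths)

    def decode(self, state, f):
        """Roll out f future (x, y) positions from a toy (x, y, vx, vy)
        state; stand-in for a learned decoder of step length f."""
        if f not in self.step_lengths:
            raise KeyError(f"no decoder trained for horizon {f}")
        x, y, vx, vy = state
        return [(x + vx * t, y + vy * t) for t in range(1, f + 1)]
```

In the real network each horizon's sub-network has its own weights and its own gradient updates, which is why unsupported horizons must raise rather than extrapolate.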

Additionally, inputs $e_{i,k}$ carrying different movement information may fit their sub-networks differently, which has potential drawbacks. To address this, we employ KL divergence [12] to distill knowledge from the "lower" score sequences to the "higher" ones:

$\mathcal{L}_{\mathrm{KL}} = D_{\mathrm{KL}}\left( e^{h} \,\|\, e^{l} \right)$ (18)

where $e^{h}$ and $e^{l}$ are the latent features of the "higher" and "lower" score sequences, respectively.

Training objective

In addition to the pre-trained APM, we also need to train the entire FlexiSteps Network. Following HiVT [3] and HPNet [6], we adopt the negative log-likelihood (in HiVT) or the Huber loss (in HPNet) as the regression loss $\mathcal{L}_{\mathrm{reg}}$, and the cross-entropy loss as the classification loss $\mathcal{L}_{\mathrm{cls}}$. The total loss function can be expressed as:

$\mathcal{L} = \mathcal{L}_{\mathrm{reg}} + \mathcal{L}_{\mathrm{cls}} + \lambda \mathcal{L}_{\mathrm{KL}}$ (19)

where λ is a hyperparameter balancing the contribution of the KL loss. The training process involves optimizing the model parameters to minimize the total loss, allowing the FlexiSteps Network to effectively learn from the data and adapt to varying prediction requirements.
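The total objective is a weighted sum of the three terms; a trivial sketch, with `lam` playing the role of the paper's λ (reported as 0.5 in the implementation details):

```python
def total_loss(l_reg, l_cls, l_kl, lam=0.5):
    """Weighted combination of regression, classification, and KL
    distillation losses; lam corresponds to the paper's lambda."""
    return l_reg + l_cls + lam * l_kl
```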

Experiments

Settings

Baselines.

Our FlexiSteps Network (FSN) is a plug-and-play model that can be integrated into deep learning-based trajectory prediction models. We choose HiVT [3] and HPNet [6], a representative method and a state-of-the-art open-source method, as our baseline models. Due to limited computing resources, for the HPNet model we use only the propose stage of its propose-and-refine prediction pipeline, marked as HPNet(h) below. For a more comprehensive evaluation, we include three additional baselines: (1) Isolated Training (IT): the baseline model trained separately with different fixed output steps. (2) Intercepted Result (IR): the model trained once with the longest prediction horizon, with the prediction results truncated to different steps. (3) Fixed Steps Network (fixed): a version of our FSN where the output of the APM is fixed to 5-30 steps during training and inference, respectively. This allows us to isolate the impact of dynamic step prediction from other factors.

Datasets.

Argoverse [16] is a large-scale dataset for autonomous driving, containing diverse scenarios and complex interactions between agents. It includes high-definition maps and rich contextual information, making it suitable for evaluating trajectory prediction models. The dataset is divided into training, validation, and test sets, with a total of 324,557 interesting vehicle trajectories. This rich dataset includes high-definition (HD) maps and recordings of sensor data, referred to as “log segments,” collected in two U.S. cities: Miami and Pittsburgh. These cities were chosen for their distinct urban driving challenges, including unique road geometries, local driving habits, and a variety of traffic conditions.

INTERACTION [17] is an extensive dataset developed for autonomous driving research, particularly focusing on behavioral aspects such as motion prediction, behavior analysis, and behavior cloning. It contains a diverse collection of natural movement patterns from various road users, including vehicles and pedestrians, captured across numerous highly interactive traffic scenarios from multiple countries. This dataset provides valuable information for studying complex agent interactions in different driving environments.

Metrics.

Although we have mentioned the limitations of mainstream metrics such as ADE and FDE, we still report them in order to compare our FSN against traditional methods and verify its performance.

Specifically, for Argoverse dataset, we use minimum ADE (minADE), minimum FDE (minFDE) and Miss Rate (MR) to measure the accuracy and robustness of the model. The minimum ADE is the minimum average displacement error between the predicted trajectory and the ground truth trajectory, while the minimum FDE is the minimum final displacement error. The Miss Rate is the percentage of predicted trajectories that do not match any ground truth trajectory within a certain threshold.

For INTERACTION dataset, we employ minJointADE, minJointFDE and Cross Collision Rate (CCR) to evaluate the performance of joint trajectory prediction. The minJointADE is the minimum average displacement error between the predicted joint trajectory and the ground truth joint trajectory, while the minJointFDE is the minimum final displacement error. The Cross Collision Rate (CCR) is the percentage of predicted joint trajectories that collide with each other, indicating the model’s ability to handle interactions between agents.
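The per-agent metrics above can be computed directly from the predicted modes; the sketch below is a simplified illustration (the 2.0 m miss threshold follows the common Argoverse convention, and joint variants would aggregate over all agents in a scene):

```python
import math

def min_ade_fde(modes, gt):
    """minADE / minFDE over K predicted modes, each a list of (x, y) points
    aligned with the ground-truth trajectory gt."""
    ade = lambda traj: sum(math.dist(p, q) for p, q in zip(traj, gt)) / len(gt)
    fde = lambda traj: math.dist(traj[-1], gt[-1])
    return min(ade(m) for m in modes), min(fde(m) for m in modes)

def miss_rate(modes_per_agent, gts, threshold=2.0):
    """MR: fraction of agents whose best-mode final point misses the
    ground-truth endpoint by more than the threshold."""
    misses = sum(
        1
        for modes, gt in zip(modes_per_agent, gts)
        if min(math.dist(m[-1], gt[-1]) for m in modes) > threshold
    )
    return misses / len(gts)
```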

Implementation details.

For the HiVT baseline, we train our APM and FSN for 64 epochs with a batch size of 32 on 1 NVIDIA A6000 GPU, using the AdamW optimizer with a learning rate of and weight decay of . For the HPNet(h) baseline, we train our APM and FSN for 64 epochs with a batch size of 4 on 1 NVIDIA A6000 GPU, using the AdamW optimizer with a learning rate of and weight decay of .

The hyperparameter λ is set to 0.5 for both baselines. The training process involves optimizing the model parameters to minimize the total loss, allowing the FlexiSteps Network to effectively learn from the data and adapt to varying prediction requirements.

Main results

Argoverse.

As shown in Table 1, our FlexiSteps Network (FSN) outperforms the baselines on the Argoverse dataset across all prediction steps. Specifically, FSN achieves a minimum ADE of 0.2121 and a minimum FDE of 0.1632 at 5 timesteps, which is significantly better than the IT and IR baselines. The results demonstrate that FSN effectively leverages the pre-trained Adaptive Prediction Module (APM) and Dynamic Decoder (DD) to adaptively adjust the prediction steps based on contextual information, leading to improved accuracy and robustness in trajectory prediction.

Furthermore, the FSN(fixed) variant, which uses fixed prediction steps during training and inference, also outperforms the baseline methods but still falls short of our full FSN approach, highlighting the importance of adaptive prediction step selection in improving model performance (Figs 5 and 6).

Fig 5. The prediction results of HiVT with different prediction steps from Intercepted Results (IR).

The blue line is the prediction trajectories, the red line is the ground truth trajectory, and the green line is the historical trajectory.

https://doi.org/10.1371/journal.pone.0333926.g005

Fig 6. The prediction results of HiVT trained with fixed prediction steps.

The blue line is the prediction trajectories, the red line is the ground truth trajectory, and the green line is the historical trajectory. The last subfigure is the prediction results of our FlexiSteps Network (FSN).

https://doi.org/10.1371/journal.pone.0333926.g006

In addition, we also compare our FSN with the state-of-the-art dynamic prediction methods FLN and LaKD. The results show that FSN achieves better performance than FLN and LaKD, demonstrating the effectiveness of our approach in handling varying prediction lengths. The minimum ADE and FDE of FSN at 30 timesteps are 0.6571 and 0.9602, respectively, which are significantly lower than those of FLN and LaKD. This indicates that FSN can effectively adapt to different prediction requirements and achieve better performance in trajectory prediction tasks.

INTERACTION.

As shown in Table 1, our FlexiSteps Network (FSN) also outperforms the baselines on the INTERACTION dataset across all prediction steps. Specifically, FSN achieves a minimum Joint ADE of 0.0066 and a minimum Joint FDE of 0.0067 at 5 timesteps, which is significantly better than the IT and IR baselines. Furthermore, the FSN(fixed) variant also shows improvements over the baseline methods, achieving better performance than IT and IR across different prediction horizons, but still falls short compared to our full adaptive FSN approach. This further demonstrates that FSN effectively leverages the pre-trained Adaptive Prediction Module (APM) and Dynamic Decoder (DD) to adaptively adjust the prediction steps based on contextual information, leading to improved accuracy and robustness in joint trajectory prediction.

Table 1. Performance comparison of FlexiSteps Network (FSN).

https://doi.org/10.1371/journal.pone.0333926.t001

Computational efficiency analysis.

As shown in Table 2, we evaluate the computational efficiency of our FlexiSteps Network (FSN) by measuring the inference time and the number of parameters when integrated with HiVT and HPNet(h) as backbone models. The results indicate that FSN introduces a slight increase in inference time and parameters compared to the original backbone models. Specifically, when using HiVT as the backbone, FSN increases the average inference time from 42.56 ms to 45.58 ms and the number of parameters from 2.5M to 3.1M. Similarly, with HPNet(h) as the backbone, FSN increases the average inference time from 61.72 ms to 64.41 ms and the number of parameters from 4.1M to 4.7M.

The results in Table 2 reveal that FSN introduces a moderate increase in computational resources compared to the baseline models. For inference time, FSN adds approximately 3.02 ms to HiVT and 2.69 ms to HPNet(h); this overhead comes from the additional processing of the Adaptive Prediction Module (APM), which must dynamically evaluate the appropriate prediction horizon during inference. FSN also increases the parameter count by 0.6M for both backbones. This increase stems primarily from the Dynamic Decoder (DD) design, which maintains multiple specialized fixed-length decoders to handle different prediction horizons: while each individual decoder is relatively small, the full set covering prediction lengths from 5 to 30 timesteps adds up. Despite these costs, the performance improvements demonstrated in our experiments suggest that this computational trade-off is justified for applications requiring adaptive prediction horizons.
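The paper does not include its benchmarking script, but average inference latency of the kind reported in Table 2 is typically measured by timing repeated forward passes after a warm-up phase. A framework-agnostic sketch, where `model_fn` stands in for the backbone's forward pass (a hypothetical name):

```python
import time

def avg_inference_ms(model_fn, batch, n_warmup=5, n_runs=50):
    """Rough mean wall-clock latency of model_fn(batch), in milliseconds.

    Warm-up iterations are excluded so one-time costs (cache population,
    lazy initialization, JIT compilation) do not skew the average.
    """
    for _ in range(n_warmup):
        model_fn(batch)
    start = time.perf_counter()
    for _ in range(n_runs):
        model_fn(batch)
    return (time.perf_counter() - start) / n_runs * 1e3
```

For GPU models, an explicit device synchronization before reading the clock would also be needed so that asynchronous kernels are fully accounted for.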

Table 2. Computational Efficiency Analysis of FlexiSteps Network (FSN).

https://doi.org/10.1371/journal.pone.0333926.t002

Distribution of prediction steps.

To demonstrate the adaptability of our FlexiSteps Network (FSN) in predicting varying output steps, we analyze the distribution of predicted steps on the Argoverse validation dataset. As shown in Fig 7, in small-scale scenarios with fewer than 10 agents, longer horizons (25-30 steps) remain prevalent, reflecting relatively simple interaction dynamics. For medium-scale scenarios (11-30 agents), the majority of cases concentrate on 15-20 steps, while still preserving both shorter and longer horizons, indicating diverse prediction demands. As the number of agents increases beyond 30, shorter horizons (5-15 steps) dominate, and in highly congested scenes (51+ agents), predictions are largely restricted to very short lengths. Overall, the distribution highlights that medium horizons (10-20 steps) account for the majority of scenarios, aligning with the intuition that moderate predictive ranges balance stability and adaptability in complex environments.
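A heatmap like the one in Fig 7(b) can be built by bucketing scenarios by agent count and counting occurrences of each (bin, prediction length) pair. A small sketch with `collections.Counter`; the scenario records and the exact bin edges are illustrative assumptions, not the paper's data:

```python
from collections import Counter

# Hypothetical per-scenario records: (number of agents, chosen prediction steps).
scenarios = [(8, 30), (22, 15), (28, 20), (45, 10), (55, 5)]

def agent_bin(n):
    """Bucket the agent count into the coarse groups discussed in the text."""
    if n <= 10:
        return "1-10"
    if n <= 30:
        return "11-30"
    if n <= 50:
        return "31-50"
    return "51+"

# One count per heatmap cell: (agent-count bin, prediction length).
heat = Counter((agent_bin(n), steps) for n, steps in scenarios)
```

Each cell count then maps directly to the color intensity of the corresponding heatmap entry.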

Fig 7. Prediction Steps Distribution on Argoverse validation set.

(a) is the distribution of the number of agents in different scenarios of Argoverse validation set. (b) is a heatmap showing the distribution of prediction lengths across different agent count bins. The x-axis represents the number of agents, while the y-axis represents the prediction length in the scenario. The color intensity indicates the number of scenarios for each combination of agent count and prediction length.

https://doi.org/10.1371/journal.pone.0333926.g007

Ablation study

To evaluate the effectiveness of each component in our FlexiSteps Network (FSN), we conduct an ablation study on the Argoverse validation dataset with 30 prediction steps. The results are presented in Table 3. We analyze the impact of the Dynamic Decoder (DD) and the scoring mechanism on the overall performance of FSN. Specifically, we compare the Dynamic Decoder (DD) with and without the KL divergence loss, and the scoring mechanism with the Fréchet distance against a variant that uses ADE and FDE in its place.
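The KL divergence loss in the ablation distills knowledge between prediction sequences; this excerpt does not give its exact formulation, so the following is only a generic NumPy sketch of a KL term between two step-wise output distributions, with `teacher_logits`/`student_logits` as hypothetical names:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(teacher_logits, student_logits):
    """Mean KL(teacher || student) over the last axis, from raw logits.

    In a distillation setup, minimizing this pulls the student's
    distribution toward the teacher's.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean())
```

In practice such a term is added to the task loss with a weighting coefficient tuned on the validation set.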

The results show that our scoring mechanism with the Fréchet distance achieves the best performance, with a minimum FDE of 0.9602 and a minimum ADE of 0.6571, significantly outperforming the variant that scores with ADE and FDE. This indicates that the Fréchet distance effectively captures the spatial and temporal relationships in trajectory predictions, leading to improved accuracy. The Dynamic Decoder (DD) with the KL divergence loss also improves performance, achieving a minimum FDE of 0.9683 and a minimum ADE of 0.6647, compared to the DD without it. This demonstrates that the KL divergence loss helps to distill knowledge from lower-score sequences to higher-score ones, enhancing the model’s ability to adapt to varying prediction lengths.
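The Fréchet distance used by the scoring mechanism can be computed between sampled trajectories with the discrete dynamic program of Eiter and Mannila [15]. A self-contained sketch (the paper's full score additionally factors in prediction length, which is omitted here):

```python
import numpy as np

def discrete_frechet(p, q):
    """Discrete Fréchet distance between polylines p [n, 2] and q [m, 2].

    ca[i, j] holds the Fréchet distance between the prefixes p[:i+1]
    and q[:j+1]; the recurrence takes the best predecessor coupling
    and the current point-pair distance.
    """
    n, m = len(p), len(q)
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # pairwise dists
    ca = np.empty((n, m))
    ca[0, 0] = d[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]),
                           d[i, j])
    return float(ca[-1, -1])
```

Unlike ADE/FDE, this coupling-based distance respects the ordering of points along each curve, which is why it better captures geometric similarity between predicted and ground truth trajectories.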

Conclusion

In this paper, we propose the FlexiSteps Network (FSN), a novel framework for dynamic trajectory prediction that adapts the prediction output steps based on contextual cues and environmental conditions. FSN consists of three main components: a pre-trained Adaptive Prediction Module (APM), a Dynamic Decoder (DD) module, and a scoring mechanism that incorporates the Fréchet distance to evaluate trajectory predictions. The APM is trained to predict the optimal prediction step based on the encoded latent features of the target agent and its context, while the DD handles varying prediction lengths. The scoring mechanism evaluates the quality of trajectory predictions by combining the Fréchet distance with the prediction length. Our experimental results demonstrate that FSN outperforms state-of-the-art methods on both the Argoverse and INTERACTION datasets, achieving better accuracy in trajectory prediction tasks.

While our FlexiSteps Network (FSN) shows promising results in dynamic trajectory prediction, there are still several limitations and potential areas for future work. One limitation is the reliance on the pre-trained Adaptive Prediction Module (APM), which may not generalize well to unseen scenarios or environments. Future work could explore methods to enhance the adaptability of APM, such as incorporating online learning or transfer learning techniques. Additionally, the scoring mechanism based on Fréchet distance may not capture all aspects of trajectory quality, and further research could investigate alternative metrics or hybrid approaches to improve evaluation accuracy. Finally, while FSN demonstrates flexibility in handling varying prediction lengths, it may still struggle with highly dynamic or unpredictable environments. Future research could focus on enhancing the model’s robustness to such scenarios, potentially through the integration of reinforcement learning or other adaptive techniques.

References

1. Chai Y, Sapp B, Bansal M, Anguelov D. Multipath: multiple probabilistic anchor trajectory hypotheses for behavior prediction. arXiv preprint 2019.
2. Varadarajan B, Hefny A, Srivastava A, Refaat KS, Nayakanti N, Cornman A, et al. MultiPath++: efficient information fusion and trajectory aggregation for behavior prediction. In: 2022 International Conference on Robotics and Automation (ICRA). 2022. p. 7814–21. https://doi.org/10.1109/icra46639.2022.9812107
3. Zhou Z, Ye L, Wang J, Wu K, Lu K. HiVT: hierarchical vector transformer for multi-agent motion prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. p. 8823–33. https://openaccess.thecvf.com/content/CVPR2022/html/Zhou_HiVT_Hierarchical_Vector_Transformer_for_Multi-Agent_Motion_Prediction_CVPR_2022_paper.html?ref=https://githubhelp.com
4. Zhou Z, Wang J, Li YH, Huang YK. Query-centric trajectory prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023. p. 17863–73. https://openaccess.thecvf.com/content/CVPR2023/html/Zhou_Query-Centric_Trajectory_Prediction_CVPR_2023_paper.html?ref=https://githubhelp.com
5. Zhou Y, Shao H, Wang L, Waslander SL, Li H, Liu Y. SmartRefine: a scenario-adaptive refinement framework for efficient motion prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024. p. 15281–90. https://openaccess.thecvf.com/content/CVPR2024/html/Zhou_SmartRefine_A_Scenario-Adaptive_Refinement_Framework_for_Efficient_Motion_Prediction_CVPR_2024_paper.html
6. Tang X, Kan M, Shan S, Ji Z, Bai J, Chen X. HPNet: dynamic trajectory forecasting with historical prediction attention. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024. p. 15261–70. https://openaccess.thecvf.com/content/CVPR2024/html/Tang_HPNet_Dynamic_Trajectory_Forecasting_with_Historical_Prediction_Attention_CVPR_2024_paper.html
7. Wang C, Liao H, Zhu K, Zhang G, Li Z. DEMO: a dynamics-enhanced learning model for multi-horizon trajectory prediction in autonomous vehicles. Information Fusion. 2025;118:102924.
8. Climent Pardo JC. Learning isometric embeddings of road networks using multidimensional scaling. arXiv e-prints. 2025.
9. Liu Y, Niu H, Zhu J. GAMDTP: dynamic trajectory prediction with graph attention mamba network. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology. 2025.
10. Zhu J, Niu H. DyTTP: trajectory prediction with normalization-free transformers. arXiv preprint 2025.
11. Xu Y, Fu Y. Adapting to length shift: FlexiLength network for trajectory prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024. p. 15226–37. https://openaccess.thecvf.com/content/CVPR2024/html/Xu_Adapting_to_Length_Shift_FlexiLength_Network_for_Trajectory_Prediction_CVPR_2024_paper.html
12. Li Y, Li C, Lv R, Li R, Yuan Y, Wang G. LaKD: length-agnostic knowledge distillation for trajectory prediction with any length observations. Advances in Neural Information Processing Systems. 2024;37:28720–44.
13. He S, Wu C. A Fréchet distance-hierarchical density-based spatial clustering of applications with noise for ship trajectory clustering and route recognition. In: 2024 International Conference on Distributed Systems, Computer Networks and Cybersecurity (ICDSCNC). 2024. p. 1–5. https://ieeexplore.ieee.org/abstract/document/10939571
14. Kuo Y-L, Huang X, Barbu A, McGill SG, Katz B, Leonard JJ, et al. Trajectory prediction with linguistic representations. In: 2022 International Conference on Robotics and Automation (ICRA). 2022. p. 2868–75. https://doi.org/10.1109/icra46639.2022.9811928
15. Eiter T, Mannila H. Computing discrete Fréchet distance. Technical Report CD-TR 94/64, Technische Universität Wien; 1994. https://www.kr.tuwien.ac.at/staff/eiter/et-archive/files/cdtr9464.pdf
16. Chang MF, Lambert J, Sangkloy P, Singh J, Bak S, Hartnett A. Argoverse: 3D tracking and forecasting with rich maps. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. p. 8748–57. https://www.argoverse.org/av1.html#forecasting-link
17. Zhan W, Sun L, Wang D, Shi H, Clausse A, Naumann M. INTERACTION dataset: an international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps. arXiv preprint 2019. arXiv:1910.03088
18. Liang M, Yang B, Hu R, Chen Y, Liao R, Feng S, et al. Learning lane graph representations for motion forecasting. Lecture Notes in Computer Science. Springer; 2020. p. 541–56. https://doi.org/10.1007/978-3-030-58536-5_32
19. Wang Y, Zhao S, Zhang R, Cheng X, Yang L. Multi-vehicle collaborative learning for trajectory prediction with spatio-temporal tensor fusion. IEEE Trans Intell Transport Syst. 2022;23(1):236–48.
20. Xu D, Shang X, Peng H, Li H. MVHGN: multi-view adaptive hierarchical spatial graph convolution network based trajectory prediction for heterogeneous traffic-agents. IEEE Trans Intell Transport Syst. 2023;24(6):6217–26.
21. Zhang K, Zhao L, Dong C, Wu L, Zheng L. AI-TP: attention-based interaction-aware trajectory prediction for autonomous driving. IEEE Trans Intell Veh. 2023;8(1):73–83.
22. Sheng Z, Xu Y, Xue S, Li D. Graph-based spatial-temporal convolutional network for vehicle trajectory prediction in autonomous driving. IEEE Trans Intell Transport Syst. 2022;23(10):17654–65.
23. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems. 2014. https://proceedings.neurips.cc/paper_files/paper/2014/hash/f033ed80deb0234979a61f95710dbe25-Abstract.html
24. Gupta A, Johnson J, Fei-Fei L, Savarese S, Alahi A. Social GAN: socially acceptable trajectories with generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. p. 2255–64. https://openaccess.thecvf.com/content_cvpr_2018/html/Gupta_Social_GAN_Socially_CVPR_2018_paper.html
25. Azadani MN, Boukerche A. STAG: a novel interaction-aware path prediction method based on spatio-temporal attention graphs for connected automated vehicles. Ad Hoc Networks. 2023;138:103021.
26. Gu J, Sun C, Zhao H. DenseTNT: end-to-end trajectory prediction from dense goal sets. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. p. 15303–12. https://openaccess.thecvf.com/content/ICCV2021/html/Gu_DenseTNT_End-to-End_Trajectory_Prediction_From_Dense_Goal_Sets_ICCV_2021_paper.html
27. Ngiam J, Caine B, Vasudevan V, Zhang Z, Chiang HTL, Ling J. Scene transformer: a unified architecture for predicting multiple agent trajectories. arXiv preprint 2021.
28. Mohamed A, Qian K, Elhoseiny M, Claudel C. Social-STGCNN: a social spatio-temporal graph convolutional neural network for human trajectory prediction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. p. 14424–32. https://openaccess.thecvf.com/content_CVPR_2020/html/Mohamed_Social-STGCNN_A_Social_Spatio-Temporal_Graph_Convolutional_Neural_Network_for_Human_CVPR_2020_paper.html
29. Gao K, Li X, Chen B, Hu L, Liu J, Du R, et al. Dual transformer based prediction for lane change intentions and trajectories in mixed traffic environment. IEEE Trans Intell Transport Syst. 2023;24(6):6203–16.
30. Zhou X, Zhao W, Wang A, Wang C, Zheng S. Spatiotemporal attention-based pedestrian trajectory prediction considering traffic-actor interaction. IEEE Trans Veh Technol. 2023;72(1):297–311.
31. Hu H, Wang Q, Cheng M, Gao Z. Trajectory prediction neural network and model interpretation based on temporal pattern attention. IEEE Trans Intell Transport Syst. 2023;24(3):2746–59.
32. Hou L, Li SE, Yang B, Wang Z, Nakano K. Structural transformer improves speed-accuracy trade-off in interactive trajectory prediction of multiple surrounding vehicles. IEEE Trans Intell Transport Syst. 2022;23(12):24778–90.
33. Karle P, Furtner L, Lienkamp M. Self-evaluation of trajectory predictors for autonomous driving. Electronics. 2024;13(5):946.
34. Qiao S, Shen D, Wang X, Han N, Zhu W. A self-adaptive parameter selection trajectory prediction approach via hidden Markov models. IEEE Trans Intell Transport Syst. 2015;16(1):284–96.
35. Jiang Y, Zhu B, Yang S, Zhao J, Deng W. Vehicle trajectory prediction considering driver uncertainty and vehicle dynamics based on dynamic Bayesian network. IEEE Trans Syst Man Cybern, Syst. 2023;53(2):689–703.
36. Zyner A, Worrall S, Nebot E. Naturalistic driver intention and path prediction using recurrent neural networks. IEEE Trans Intell Transport Syst. 2020;21(4):1584–94.
37. Chen X, Zhang H, Zhao F, Hu Y, Tan C, Yang J. Intention-aware vehicle trajectory prediction based on spatial-temporal dynamic attention network for internet of vehicles. IEEE Trans Intell Transport Syst. 2022;23(10):19471–83.
38. Zhang K, Feng X, Wu L, He Z. Trajectory prediction for autonomous driving using spatial-temporal graph attention transformer. IEEE Trans Intell Transport Syst. 2022;23(11):22343–53.
39. Zhao H, Gao J, Lan T, Sun C, Sapp B, Varadarajan B. TNT: target-driven trajectory prediction. In: Conference on Robot Learning. 2021. p. 895–904. https://proceedings.mlr.press/v155/zhao21b
40. Song Z, Jia C, Liu L, Pan H, Zhang Y, Wang J, et al. Don’t shake the wheel: momentum-aware planning in end-to-end autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2025. p. 22432–41. https://openaccess.thecvf.com/content/CVPR2025/html/Song_Dont_Shake_the_Wheel_Momentum-Aware_Planning_in_End-to-End_Autonomous_Driving_CVPR_2025_paper.html
41. Huttenlocher DP, Klanderman GA, Rucklidge WJ. Comparing images using the Hausdorff distance. IEEE Trans Pattern Anal Machine Intell. 1993;15(9):850–63.
42. Dubuisson M-P, Jain AK. A modified Hausdorff distance for object matching. In: Proceedings of 12th International Conference on Pattern Recognition. IEEE; 1994. p. 566–8. https://doi.org/10.1109/icpr.1994.576361
43. Belogay E, Cabrelli C, Molter U, Shonkwiler R. Calculating the Hausdorff distance between curves. Information Processing Letters. 1997;64(1):17–22.
44. Shahbaz K. Applied similarity problems using Fréchet distance. arXiv preprint 2013. https://arxiv.org/abs/1307.6628
45. Zhao J-L, Wu Z-K, Pan Z-K, Duan F-Q, Li J-H, Lv Z-H, et al. 3D face similarity measure by Fréchet distances of geodesics. J Comput Sci Technol. 2018;33(1):207–22.
46. Bhattacharyya A, Fritz M, Schiele B. Long-term on-board prediction of people in traffic scenes under uncertainty. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. p. 4194–202. https://openaccess.thecvf.com/content_cvpr_2018/html/Bhattacharyya_Long-Term_On-Board_Prediction_CVPR_2018_paper.html
47. Alt H, Godau M. Computing the Fréchet distance between two polygonal curves. Int J Comput Geom Appl. 1995;05(01n02):75–91.
48. Takeuchi K, Imaizumi M, Kanda S, Tabei Y, Fujii K, Yoda K, et al. Fréchet kernel for trajectory data analysis. In: Proceedings of the 29th International Conference on Advances in Geographic Information Systems. 2021. p. 221–4. https://doi.org/10.1145/3474717.3483949
49. Gupta D, Hazarika BB, Berlin M. Robust regularized extreme learning machine with asymmetric Huber loss function. Neural Comput & Applic. 2020;32(16):12971–98.
50. Meyer GP. An alternative probabilistic interpretation of the Huber loss. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. p. 5261–9. https://openaccess.thecvf.com/content/CVPR2021/html/Meyer_An_Alternative_Probabilistic_Interpretation_of_the_Huber_Loss_CVPR_2021_paper.html