Abstract
Spatiotemporal trajectory classification is essential for intelligent perception systems but faces challenges including weak separability of dynamic features, representation collapse under limited samples, and heterogeneous conflicts in multimodal data. To address these issues, we propose K-M LLM-pro, a physics-guided cross-modal adaptation framework that integrates statistical mechanics with large language models (LLMs) to improve trajectory understanding. Our approach incorporates: (1) physics-informed prompt engineering based on Kramers-Moyal coefficients, embedding physical constraints via reproducing kernel Hilbert space projection; (2) a dynamic patching optimization mechanism combining variance maximization and Lyapunov stability criteria for unified modeling of heterogeneous trajectories; and (3) dual spatiotemporal adapters with a parameter-efficient expansion strategy, injecting domain knowledge while optimizing only 3.8% of new parameters. Experimental results on public datasets such as Geolife and AIS show that K-M LLM-pro outperforms state-of-the-art models in classification accuracy, demonstrating strong performance even in few-shot scenarios with only 1% of training data. To our knowledge, this is the first work to integrate K-M coefficients as interpretable statistical priors into LLMs, offering a lightweight and effective solution for modeling complex spatiotemporal dynamics.
Citation: Ge C, Zhang J, Du J, Yao J, Yang T, Wang X, et al. (2025) K-M LLM-pro: Physics-guided cross-modal adaptation for fine-grained spatiotemporal trajectory classification. PLoS One 20(10): e0334412. https://doi.org/10.1371/journal.pone.0334412
Editor: Aiqing Fang, Chongqing Normal University, CHINA
Received: June 2, 2025; Accepted: September 26, 2025; Published: October 21, 2025
Copyright: © 2025 Ge et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Spatiotemporal trajectory classification [1–3], as the core technology of intelligent perception systems, is evolving from basic behavioral discrimination (e.g., navigation state detection) to fine-grained subclass recognition. This evolution has significantly enhanced its application value across various domains, including smart cities, ecological conservation, and disaster prevention [4,5]. In aerial monitoring, distinguishing trajectory patterns between aircraft types (e.g., H1T helicopters vs. L2J commercial airliners) enhances airspace safety alert precision [6]. Urban traffic management benefits from the accurate identification of vehicular trajectories (e.g., buses, trucks, and private cars), which helps reduce congestion prediction errors [7,8]. Hurricane monitoring systems achieve 48-hour advance warnings through trajectory-based intensity forecasting, significantly mitigating disaster impacts. Ecological studies leverage animal migration trajectory analysis to optimize wildlife habitat preservation strategies.
Regarding the spatiotemporal trajectory classification task, existing methods can be categorized into three types. Firstly, traditional methods rely on manual feature engineering, extracting kinematic features such as speed and acceleration and then using machine learning models like support vector machines or decision trees for classification [1,2,7,9]. Secondly, deep learning methods utilize models such as CNN [10], RNN [11], or Transformer [3] to automatically mine features from raw trajectories [12]. Some studies enhance classification performance by combining trajectory imaging with computer vision techniques. Thirdly, a rule detection and error correction framework has been introduced to enhance the robustness of the models. However, these existing methods still face three major challenges in fine-grained subclass classification.
- Weak Separability of Dynamic Features: Differences among subclasses are often concealed within high-order statistical features. While traditional physics-based models (e.g., Kramers–Moyal coefficients) can capture drift-diffusion characteristics, they struggle with nonlinear couplings. Deep learning approaches also exhibit limitations in implicitly representing such features, leading to blurred decision boundaries.
- Representation Collapse Under Small-Sample Conditions: Limited samples in specialized scenarios (e.g., maritime monitoring) exacerbate overfitting. Although parameter-efficient fine-tuning techniques mitigate catastrophic forgetting, their low-rank adaptation mechanisms often induce gradient conflicts between common and subclass-specific features, hindering optimization.
- Heterogeneous Representation Conflicts in Multimodal Data: Disparities in physical properties—such as irregular sampling intervals and varying spatial scales—complicate unified modeling. For instance, densely sampled urban trajectories and sparse animal migration paths exhibit significant temporal and spatial granularity mismatches, which distort attention mechanisms in standard Transformer models and limit cross-scale generalization.
In recent years, the breakthroughs in large language models (LLMs), represented by GPT [13] and Llama [14], have provided new ideas for addressing the above challenges. LLMs, with their powerful modeling capabilities for long-range dependencies and global patterns through the self-attention mechanism, are highly compatible with the task of spatiotemporal trajectory feature extraction. The general semantic knowledge in their pre-trained parameters can be transferred to the spatiotemporal domain through efficient fine-tuning, significantly reducing the dependence on labeled data. However, there is a semantic gap between the discrete symbolic representation of the text modality and the continuous spatiotemporal evolution of trajectories. Directly fine-tuning LLMs makes it difficult to model the dynamic features constrained by physical laws.
To this end, this paper proposes a physics-guided cross-modal adaptation theory, aiming to break through the existing bottlenecks through systematic innovation: (1) Statistical physics-driven approach: A global prompt word based on the Kramers-Moyal (K-M) coefficients is constructed as a physical prior. It is projected onto the reproducing kernel Hilbert space (RKHS) and concatenated with the trajectory embedding, enabling the LLM to simultaneously perceive the constraints of physical laws during inference. (2) Data-driven adaptive optimization: A Patching Optimizer module is proposed. Based on the variance maximization criterion and Lyapunov exponent analysis, heterogeneous data (such as dense urban trajectories and sparse animal migration paths) are dynamically segmented. Combined with the spatiotemporal feature enhancement adapter (STFEA) and the Spatiotemporal Context Aggregation Adapter (STCA), robust fusion of cross-scale features is achieved. (3) Parameter-efficient expansion strategy: An identity initialization block expansion method is designed. By optimizing only 3.8% of the newly added parameters, domain knowledge can be injected while retaining the general semantic understanding ability of the LLM, breaking through the design boundary of its text modality.
Experiments on multiple public datasets show that the proposed method (K-M LLM-pro) outperforms the existing state-of-the-art models in terms of accuracy and other indicators, especially excelling in few-shot learning tasks. The main contributions of this paper include:
- For the first time, the K-M coefficients are introduced as interpretable statistical priors into the LLM, compensating for the deficiencies of traditional embedding methods in dynamic system modeling.
- A cross-scale heterogeneous data adaptive optimization framework is proposed to solve the problem of unified representation of multi-modal trajectories.
- Parameter-efficient expansion of the LLM is achieved, providing a lightweight solution for domain knowledge transfer.
- The advancement and robustness of the proposed method in complex scenarios are systematically verified.
Related work
In this section, we provide a review of existing spatiotemporal trajectory classification methods and techniques for fine-tuning large models.
Spatiotemporal trajectory classification: Spatiotemporal trajectory classification plays a pivotal role in diverse applications, including location-based services, traffic management, and public safety systems. This classification task fundamentally involves analyzing mobile object trajectory data to discern distinct target types. Conventional methodologies primarily rely on heuristic algorithms and statistical approaches. For example, TraClass [15] implements trajectory classification through grid partitioning and refinement processes, while grid-based methods [16] employ predefined thresholds for uneven expansion of spatiotemporal grids. Additionally, Kramers-Moyal coefficients have demonstrated efficacy in target identification within complex spatiotemporal datasets by extracting higher-order statistical properties. However, these traditional methods exhibit limitations in handling large-scale, high-noise datasets, particularly in capturing complex spatiotemporal characteristics [15].
The evolution of deep learning has substantially advanced spatiotemporal trajectory classification techniques. TrajODE, built upon Recurrent Neural Network (RNN) architecture, integrates continuous-time characteristics with stochastic latent spaces, effectively addressing traditional RNN limitations in processing continuous-time dynamic trajectories [17]. The Spatiotemporal Gated Recurrent Unit (ST-GRU) enhances modeling capabilities through segmental convolutional weights and auxiliary temporal gates, effectively capturing spatiotemporal correlations and irregular time intervals [16]. TraClets introduces an innovative approach by transforming trajectory data into visual representations and employing Convolutional Neural Networks (CNNs) for classification, thereby simplifying preprocessing procedures [15]. TrajFormer, leveraging Transformer architecture, introduces a novel framework that generates continuous point embeddings by incorporating spatiotemporal interval information. This approach accelerates representation learning through a squeeze function and simplifies training via auxiliary losses, demonstrating particular efficacy with large-scale complex datasets [3]. Despite these advancements, current deep learning approaches face challenges in generalization capability, long-range dependency modeling, and high-order feature extraction, limiting their practical application in complex real-world scenarios.
The domain of time series classification has also contributed valuable insights to spatiotemporal trajectory analysis through innovative modeling approaches. Informer [18] enhances long-sequence processing efficiency through a locally sensitive hashing attention mechanism, maintaining an optimal balance between computational efficiency and predictive accuracy. TimesNet [10] introduces a novel paradigm by transforming 1D time series into 2D representations and utilizing Inception modules to capture temporal variations across different tensor configurations, establishing a robust foundation for general time series analysis. The Non-stationary Transformer [19] addresses the inherent non-stationary characteristics of time series data, while iTransformer [20] demonstrates superior performance in multidimensional time series prediction through its inverted Transformer architecture. However, the fundamental distinction between trajectory data and conventional time series data, particularly in terms of periodicity characteristics, presents significant challenges in directly applying these models to trajectory analysis. This limitation stems from the fact that existing models are predominantly optimized for data with strong periodic patterns, whereas trajectory data typically exhibits less pronounced periodicity.
Furthermore, beyond spatiotemporal and time series domains, LLMs have also achieved notable success in computational biology and bioinformatics. For example, DeepAIPs-Pred [21] integrates LLM-based feature extraction within a deep learning framework to identify anti-inflammatory peptides. Similarly, pACP-HybDeep [22] uses a hybrid deep learning model enhanced by pre-trained language representations for anticancer peptide prediction. TargetCLP [23] combines ESM embeddings with structural descriptors and employs feature selection to improve clathrin protein identification. Additionally, pNPs-CapsNet [24] leverages multi-view protein language models with capsule networks to capture hierarchical features in neuropeptides. These cases demonstrate the versatility of LLMs in handling complex biological data. Yet the very richness of such textual representations introduces a shared challenge: redundant or irrelevant dimensions risk overfitting and diminish interpretability [25]. Similarly, in spatiotemporal trajectory analysis, effective feature selection is crucial for capturing discriminative patterns while maintaining model interpretability.
Large-scale model fine-tuning technology: The remarkable advancements in Large Language Models have revolutionized core artificial intelligence domains, particularly Natural Language Processing (NLP) and Computer Vision (CV), prompting extensive research into their application across diverse scientific disciplines. Nevertheless, the fundamental architecture of LLMs, originally designed and optimized for sequential text processing, exhibits significant limitations when handling complex spatiotemporal data structures. These constraints arise from the inherent nature of spatiotemporal data, which is characterized by intricate multidimensional spatial correlations and non-stationary temporal dynamics. To address this architectural disparity, researchers have developed specialized adaptation frameworks, such as adapter-based techniques and Low-Rank Adaptation (LoRA) methodologies [26], which enable selective parameter optimization. These approaches facilitate efficient knowledge transfer from LLMs to spatiotemporal domains while preserving the models’ ability to capture and represent critical spatiotemporal patterns and relationships [12,27].
In the realm of temporal and spatiotemporal data analysis, substantial efforts have been devoted to leveraging pre-trained models for classification tasks through innovative adaptation strategies. A significant breakthrough is the One-Fits-All framework, which enhances pre-trained language and vision models by integrating projection matrices and task-specific adapter modules. This architecture has demonstrated exceptional efficacy in temporal sequence analysis, achieving state-of-the-art (SOTA) performance across various time series benchmarks [27]. Expanding on this paradigm, the STG-LLM study introduced a novel spatiotemporal graph adapter mechanism. This approach effectively bridges the gap between pre-trained language models and spatiotemporal data processing, delivering competitive performance compared to specialized, domain-specific SOTA methodologies [12].
The emergence of Parameter-Efficient Fine-Tuning (PEFT) techniques, including LoRA [26], Prefix Tuning [28], and adapter modules [29], has further advanced the fine-tuning of large models. These methods significantly reduce the number of fine-tuning parameters, enabling large models to adapt to specific task requirements with lower computational overhead. For instance, the LLAMA-PRO model extends Transformer blocks to increase depth, preserving the general capabilities of LLMs while achieving superior performance in specialized fields such as programming and mathematics [30]. These technological developments have not only enhanced the adaptability and efficiency of models but also opened new avenues for the application of LLMs in previously unexplored domains.
Despite these advancements, to the best of our knowledge, no systematic methodology has been established for effectively fine-tuning pre-trained models specifically for applications involving spatiotemporal trajectory data. This gap highlights a critical area for future research and innovation in the field.
Problem definition
This paper defines the spatiotemporal trajectory sequence dataset as $D = \{(X_r, y_r)\}_{r=1}^{|D|}$, where $X_r(i)$ represents the set of observations for trajectory $r$ at timestep $i$ across multiple feature dimensions, $X_r(i) = \big(x_r^{1}(i), x_r^{2}(i), \ldots, x_r^{m}(i)\big)$, with $m$ denoting the number of features, and $y_r$ represents the true label associated with trajectory $r$. The objective of the spatiotemporal trajectory classification task is to utilize dataset $D$ to train our model $f_\theta$ and derive an optimal set of parameters $\theta^{*}$. This enables the model to produce a predicted label $\hat{y}_r = f_{\theta^{*}}(X_r)$ that closely approximates the true label $y_r$ when given any spatiotemporal trajectory data $X_r(i)$ as input, such that
$$\theta^{*} = \arg\min_{\theta} \sum_{(X_r,\, y_r) \in D} \mathcal{L}\big(f_{\theta}(X_r),\, y_r\big).$$
Materials and methods
Our proposed K-M LLM-pro framework establishes a physics-informed multimodal architecture that synergizes statistical mechanics principles with large language models (LLMs) for spatiotemporal trajectory understanding. As illustrated in Fig 1, the methodology comprises four innovative components: (1) Physics-guided cross-modal prompt engineering through Kramers-Moyal (K-M) coefficient extraction and covariance-aware RKHS projection; (2) Dynamic patching optimization with variance-stability co-optimization; (3) Dual spatiotemporal adapters combining CNN-local and Bi-LSTM-global feature learning; (4) Parameter-efficient LLM expansion via identity-initialized Transformer replication. This integrated approach addresses three fundamental challenges in trajectory analysis: weak separability of high-order dynamic features (resolved through K-M moment descriptors), inconsistent sampling rates (handled by Lyapunov-constrained patch selection), and modality gap between physical priors and data-driven embeddings (bridged via SNR-adaptive gated fusion).
0.1. Normalize
Data normalization represents a crucial preprocessing step in deep learning, significantly enhancing model performance and ensuring training stability. In this study, we implement reversible instance normalization [31] for trajectory data standardization. Considering the m-th variable within spatiotemporal trajectories as Xm , the normalization process for each variable is mathematically expressed as follows:
$$\hat{X}_m = \frac{X_m - \mu_m}{\sqrt{\sigma_m^2 + \eta}}$$
where $\mu_m$ is the mean of $X_m$, $\sigma_m^2$ is the variance of $X_m$, and $\eta$ is a very small constant that prevents the denominator from being zero.
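As a minimal sketch of this step, the reversible per-instance normalization can be written as a pair of functions that standardize one variable and retain the statistics needed to invert the transform later (function names are illustrative, not from the paper):

```python
import numpy as np

def instance_normalize(x, eta=1e-5):
    """Normalize one variable of a trajectory instance and keep the
    statistics needed to reverse the transform later."""
    mu = x.mean()
    var = x.var()
    x_norm = (x - mu) / np.sqrt(var + eta)
    return x_norm, (mu, var)

def instance_denormalize(x_norm, stats, eta=1e-5):
    """Invert the normalization using the stored statistics."""
    mu, var = stats
    return x_norm * np.sqrt(var + eta) + mu
```

Storing `(mu, var)` per instance is what makes the normalization reversible, which matters when predictions must be mapped back to the original coordinate scale.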
0.2. Physics-informed Kramers-Moyal prompt engineering
To address the challenge of weak separability in high-order dynamic features, we propose a statistical physics-driven prompt engineering framework that integrates domain knowledge into large language models (LLMs). As shown in Fig 2, our method establishes a dual-channel encoding mechanism where physical priors and trajectory embeddings interact through a shared Hilbert space.
The left branch calculates K-M coefficients from raw trajectories, while the right branch processes spatiotemporal embeddings. Both streams are projected into RKHS for feature fusion. The gating module dynamically adjusts physical constraints based on local SNR.
K-M Coefficient Extraction and Validation: For a trajectory sequence $X(t)$, we compute the Kramers-Moyal coefficients through conditional moments:
$$M^{(n)}(x) = \lim_{\tau \to 0} \frac{1}{\tau}\, \mathbb{E}\!\left[\big(X(t+\tau) - X(t)\big)^{n} \,\middle|\, X(t) = x\right]$$
The first-order ($M^{(1)}$) and second-order ($M^{(2)}$) coefficients capture drift and diffusion characteristics, respectively. Compared to Lyapunov exponents in turbulence analysis (Fig 3), K-M coefficients achieve 8.6% higher average classification accuracy due to their complete characterization of Fokker-Planck dynamics:
$$\frac{\partial p(x,t)}{\partial t} = -\frac{\partial}{\partial x}\!\left[M^{(1)}(x)\,p(x,t)\right] + \frac{1}{2}\frac{\partial^{2}}{\partial x^{2}}\!\left[M^{(2)}(x)\,p(x,t)\right]$$
For Markovian trajectories with multiplicative noise, $M^{(1)}$ and $M^{(2)}$ form a complete basis for stochastic differential equation reconstruction, while Lyapunov exponents only characterize asymptotic stability.
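The conditional-moment estimator above can be sketched for 1-D trajectories by binning the state space and averaging increments within each bin (a finite-$\tau$ approximation; the binning scheme and function name are illustrative):

```python
import numpy as np

def km_coefficients(x, tau=1.0, n_bins=20):
    """Estimate first- and second-order Kramers-Moyal coefficients
    M1(x) (drift) and M2(x) (diffusion) from a 1-D trajectory by
    binning the state space and averaging conditional increments."""
    dx = x[1:] - x[:-1]          # increments X(t+tau) - X(t)
    states = x[:-1]              # conditioning states X(t) = x
    edges = np.linspace(states.min(), states.max(), n_bins + 1)
    idx = np.clip(np.digitize(states, edges) - 1, 0, n_bins - 1)
    m1 = np.full(n_bins, np.nan)
    m2 = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            m1[b] = dx[mask].mean() / tau          # drift estimate
            m2[b] = (dx[mask] ** 2).mean() / tau   # diffusion estimate
    return m1, m2
```

For a mean-reverting process such as an Ornstein-Uhlenbeck trajectory, the estimated drift $M^{(1)}(x)$ should decrease roughly linearly in $x$, which provides a quick sanity check on the extraction.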
Covariance-Aware RKHS Projection: To preserve physical constraints during modality fusion, we implement a Mahalanobis kernel projection with a hybrid covariance:
$$k(\mathbf{x}, \mathbf{y}) = \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\mathbf{y})^{\top}\Sigma^{-1}(\mathbf{x}-\mathbf{y})\right)$$
where $\Sigma = \lambda\,\Sigma_{\text{phys}} + (1-\lambda)\,\Sigma_{\text{data}}$; $\Sigma_{\text{phys}}$ enforces physical constraints, and $\Sigma_{\text{data}}$ learns from embeddings. This achieves 23.4% higher mutual information retention than standard RBF kernels (Fig 4).
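A minimal sketch of such a hybrid Mahalanobis kernel, assuming both covariance matrices are given as dense arrays and `lam` is the mixing weight $\lambda$ (an illustrative hyperparameter, not a value from the paper):

```python
import numpy as np

def hybrid_mahalanobis_kernel(x, y, sigma_phys, sigma_data, lam=0.5):
    """Anisotropic RBF (Mahalanobis) kernel whose covariance blends a
    physics-derived matrix with a data-driven one."""
    sigma = lam * sigma_phys + (1.0 - lam) * sigma_data
    diff = x - y
    d2 = diff @ np.linalg.inv(sigma) @ diff   # squared Mahalanobis distance
    return np.exp(-0.5 * d2)
```

With both covariances set to the identity, this reduces to a standard RBF kernel; the physics-derived $\Sigma_{\text{phys}}$ reshapes the metric so that physically implausible directions are penalized more heavily.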
Stabilized Gated Attention Fusion: We implement two-phase training for robust fusion:
Algorithm 1. Gated attention stabilization.
1: Phase 1: Fix the gating weights and optimize the remaining fusion parameters for 100 epochs.
2: Phase 2: Jointly update the gating weights and fusion parameters.
As shown in Fig 4, this strategy maintains the gate weight variance below 0.15 across datasets while enabling SNR-adaptive fusion: physical priors dominate ($\sigma(g) \to 1$) when SNR < 3 dB, and data-driven features dominate ($\sigma(g) \to 0$) when SNR > 10 dB.
Cross-Modal Prompt Fusion: The physical prompts $\mathbf{P}_{\text{phys}}$ are concatenated with the trajectory embeddings $\mathbf{E}_{\text{traj}}$ through a gated attention mechanism:
$$\mathbf{H} = \left[\,\sigma(g) \odot \mathbf{P}_{\text{phys}} \;\|\; (1-\sigma(g)) \odot \mathbf{E}_{\text{traj}}\,\right]$$
where $g$ denotes the learnable gating weights, $\sigma$ is the sigmoid activation, and $\|$ denotes concatenation. This adaptive fusion allows the LLM to dynamically adjust the influence of physical constraints during different inference phases.
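The gated concatenation can be sketched in a few lines; here `g` is a scalar gate logit for simplicity, whereas in practice the gate would be a learned vector (the layout is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(p_phys, e_traj, g):
    """Blend physical prompt features with trajectory embeddings
    through a sigmoid gate, then concatenate the two weighted streams
    as the input handed to the LLM."""
    w = sigmoid(g)
    return np.concatenate([w * p_phys, (1.0 - w) * e_traj])
```

A large positive gate logit saturates $\sigma(g)$ toward 1, so the physical branch passes through almost unchanged while the trajectory branch is suppressed, matching the low-SNR regime described above.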
Dynamic Feature Disentanglement: We further apply a contrastive regularization term in the RKHS to disentangle subclass-specific dynamic features and sharpen decision boundaries between closely related subclasses.
Our framework provides three key advantages: (1) Physically interpretable descriptors via K-M coefficients; (2) Geometry-preserving RKHS projection with hybrid covariance; (3) SNR-aware feature fusion through stabilized gating.
0.3. Patching optimizer
In the domain of text classification, Radford et al. [32] demonstrated the effectiveness of segmenting text into meaningful tokens for enhanced semantic comprehension. This concept has been extended to time series analysis, where Y. Nie et al. [33] pioneered the application of patch-based embedding for time series data. Building upon this foundation, Gerald Woo et al. [34] established that time series with varying frequencies require distinct patch sizes for optimal processing. While existing techniques predominantly employ fixed patch sizes, the selection of appropriate patch dimensions proves crucial for effectively capturing the characteristics of spatiotemporal trajectories.
Spatiotemporal trajectory sequences, which can be conceptualized as combinations of behavioral features (e.g., turning maneuvers, constant velocity, and acceleration patterns), lend themselves well to patch-based segmentation for extracting spatiotemporal semantic information. However, the inherent challenge of inconsistent sampling rates across different trajectory types necessitates careful patch size selection. This approach enables the model to not only comprehend intra-patch relationships but also capture inter-patch connections across extended temporal spans, effectively transforming the problem into a higher-order Markov model framework.
To address these challenges, we introduce the PatchOptimizer module, which employs a variance-stability co-optimization mechanism to identify the most informative data regions. The module evaluates both the variance of data blocks and their dynamic stability through Lyapunov exponents, a fundamental metric in dynamical systems theory that quantifies the exponential divergence rate of nearby trajectories and serves as a hallmark of chaotic behavior [35]. Following the theoretical framework of phase space reconstruction [36], we compute the maximum Lyapunov exponent $\lambda_{\max}$ for each candidate patch using the Rosenstein algorithm [37]:
$$\lambda_{\max} = \lim_{i \to \infty} \frac{1}{i\,\Delta t} \left\langle \ln \frac{d_j(i)}{d_j(0)} \right\rangle_{j}$$
where $d_j(i)$ is the distance between the $j$-th pair of nearest-neighbor points in the reconstructed phase space after $i$ time steps of length $\Delta t$. An upper bound on $\lambda_{\max}$ is adopted as the stability criterion, inspired by empirical studies on dissipative chaotic systems [38]: values below this threshold indicate weakly chaotic or quasi-periodic dynamics suitable for local feature extraction. While regions with higher variance typically contain richer information, this criterion filters out dynamically unstable patterns. By prioritizing patches that simultaneously maximize variance and satisfy the stability constraint, the approach ensures optimal resource allocation with dynamical consistency guarantees. We summarize the whole algorithm in Table 1.
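The variance-stability selection can be sketched as follows. The Lyapunov estimate here is a deliberately simplified divergence-rate proxy standing in for the full Rosenstein procedure, and the threshold `lam_max` is an illustrative hyperparameter, not a value from the paper:

```python
import numpy as np

def max_lyapunov_proxy(patch, k=1):
    """Crude largest-Lyapunov-exponent estimate: average log growth
    rate of distances between each point and its nearest non-adjacent
    neighbor after k steps. Illustrative stand-in for Rosenstein."""
    n = len(patch)
    rates = []
    for i in range(n - k):
        d = np.abs(patch[: n - k] - patch[i])
        d[max(0, i - 1): i + 2] = np.inf   # exclude self and temporal neighbors
        j = int(np.argmin(d))
        d0 = d[j]
        dk = abs(patch[j + k] - patch[i + k])
        if np.isfinite(d0) and d0 > 0 and dk > 0:
            rates.append(np.log(dk / d0) / k)
    return float(np.mean(rates)) if rates else 0.0

def select_patches(x, patch_len, lam_max=0.05):
    """Score non-overlapping patches by variance and keep those whose
    Lyapunov proxy stays below the stability threshold, highest
    variance first."""
    scored = []
    for s in range(0, len(x) - patch_len + 1, patch_len):
        p = x[s: s + patch_len]
        if max_lyapunov_proxy(p) < lam_max:
            scored.append((p.var(), s))
    return [s for _, s in sorted(scored, reverse=True)]
```

The two criteria play complementary roles: variance ranks patches by information content, while the Lyapunov bound vetoes patches whose local dynamics are too unstable to yield reliable features.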
0.4. Spatiotemporal adapters
The Adapter represents a lightweight fine-tuning approach that introduces a minimal number of trainable parameters, enabling efficient task adaptation while preserving the performance of the pre-trained model [29]. In this work, we design specialized Spatiotemporal Trajectory Sequence Adapters to enhance the capability of pre-trained LLMs in understanding spatiotemporal trajectory sequences.
When designing these adapters, we carefully account for the distinct characteristics of spatiotemporal trajectory sequences compared to traditional time series. Unlike conventional time series, spatiotemporal trajectories encompass both temporal and spatial dimensions, requiring the model to simultaneously capture correlations across time and space. While LLMs excel at processing textual data through their attention mechanisms—particularly in capturing long-range dependencies such as those between sentences and paragraphs—they are less inherently suited for handling the strong local correlations present in spatiotemporal data, where points at adjacent timestamps exhibit significant relationships.
To address this limitation, we integrate CNN modules into our adapter design. CNNs are uniquely advantageous in capturing local patterns and adjacent information, making them particularly effective for processing spatiotemporal trajectory sequences. By combining the global contextual understanding of LLMs with the local feature extraction capabilities of CNNs, our adapters enable the model to better comprehend the complex spatiotemporal relationships inherent in trajectory data.
The architecture of our Spatiotemporal Trajectory Sequence Adapters is as follows:
Spatiotemporal Feature Extractor Adapter (STFEA): To address the inherent limitation of LLMs in understanding spatiotemporal relationships, we propose a Spatiotemporal Feature Extraction Adapter based on a stacked architecture of CNN layers and Bidirectional Long Short-Term Memory (Bi-LSTM) networks. This hybrid design enables the model to effectively capture both spatial and temporal features of trajectory data, thereby enhancing its ability to comprehend spatiotemporal dynamics. The adapter structure is illustrated in Fig 5.
Spatiotemporal Compressor Adapter (STCA): To enhance the spatiotemporal understanding of Large Language Models (LLMs) while maintaining computational efficiency, we employ an Adapter with a Bottleneck Structure. This design leverages a Convolutional Neural Network (CNN) to capture spatiotemporal information, followed by a dimensionality reduction process that maps high-dimensional features into a low-dimensional hidden space and subsequently reconstructs them back to the original high-dimensional space. This bottleneck architecture not only mitigates the risk of overfitting but also retains critical spatiotemporal features, aligning with established practices in adapter design for reducing computational complexity. The detailed structure of the adapter is illustrated in Fig 6.
The classification performance of the two aforementioned adapters on different datasets is presented in the following Table 2.
We also compared the impact of various fine-tuning methods on classification performance metrics, as shown in Table 3. The experimental analysis indicates that the spatiotemporal trajectory sequence adapters significantly improve the model's performance in classification tasks, and that our fine-tuning approach is markedly superior to the other methods. The gain can be attributed to the fact that the STFEA enhances the model's ability to comprehend spatiotemporal dynamics, while the STCA achieves an efficient integration of temporal and spatial features. Their synergy allows the model to attend to local details and global relationships simultaneously, yielding superior performance across datasets and further validating the effectiveness of the design.
0.5. LLM-pro
Current methodologies for fine-tuning large language models (LLMs) in cross-modal learning scenarios have demonstrated extensive applicability across diverse domains. As discussed in section Related work, prominent approaches including adapter modules, LoRA, Prefix Tuning and Prompt Tuning have shown significant promise. Our investigation reveals that augmenting pre-trained LLMs with duplicated Transformer blocks effectively maintains the foundational capabilities of the original models while enabling cross-modal adaptation.
Through empirical validation, we have established that LLMs pre-trained on textual corpora exhibit an inherent capacity for comprehending spatiotemporal trajectory patterns. In this study, we employ the GPT-2 architecture as our experimental framework, specifically utilizing a version pre-trained on Chinese language corpora. Our methodological approach involves the following implementation: We loaded the GPT-2 model pre-trained on Chinese corpora (https://gitee.com/gapyanpeng/GPT2-Chinese) and froze the parameters of its original Transformer blocks. Subsequently, we introduced a trajectory embedding module and an output module to perform trajectory classification tasks on the Geolife and AIS datasets. For comparison, we also constructed a simplified model consisting only of a trajectory embedding module and an output module. The experimental results are shown in Table 4:
The experimental results conclusively demonstrate that GPT-2, pre-trained on Chinese corpora, exhibits inherent capability in comprehending spatiotemporal trajectory patterns. To preserve and enhance this capability, we implement a Transformer block expansion strategy that maintains the model’s original understanding while augmenting its trajectory processing capacity. Empirical research suggests that different model layers contribute variably to task performance, with mid-to-late layers typically playing a more substantial role in encoding task-specific features [27]. Specifically, for discriminative tasks such as classification or regression, the later layers’ outputs demonstrate a stronger correlation with final task objectives, while earlier layers primarily handle low-level feature extraction, including pattern recognition and general representation learning.
Building upon this theoretical foundation, we propose a targeted expansion approach that selectively replicates only the final Transformer blocks of the LLM. This strategy enables efficient injection of domain-specific knowledge while maximally preserving the pre-trained model’s general characteristics. Using GPT-2 as our exemplar architecture, we implement the following methodological process:
Model Initialization and Expansion: The GPT-2 architecture fundamentally comprises a sequential arrangement of Transformer blocks, which constitute the model’s core computational framework. Let N represent the original number of Transformer blocks and L denote the number of additional blocks to be incorporated. Consequently, the expanded model architecture will encompass a total of (N + L) Transformer blocks. For the implementation of these expanded blocks, we have developed an enhanced replication strategy, as illustrated in Fig 7.
Identity Initialization: For the extended Transformer blocks, the initialization method is based on identity initialization, which copies the parameters from the corresponding layers of the pre-trained model. This not only maximally preserves the language comprehension ability that the LLM acquired on large-scale corpora, but also prevents the new layers from disturbing the model's distribution. For the l-th extended layer (l = 1, ..., L), the weight initialization is given by:
$$W_{N+l} = W_{N-L+l}, \qquad b_{N+l} = b_{N-L+l}, \qquad l = 1, \ldots, L$$
where $W_{N-L+l}$ and $b_{N-L+l}$ represent the weight matrix and bias vector of the $(N-L+l)$-th layer of the pre-trained Transformer blocks, respectively.
LayerNorm and Bias Initialization: To prevent gradient explosion or vanishing, we employ the default initialization strategy for the LayerNorm layers and bias terms, so that normalized activations retain zero mean and unit variance:
Let γ and β represent the scale factor and offset of LayerNorm, respectively; they are initialized to their defaults, γ = 1 and β = 0.
The bias terms are initialized to zero, b = 0, which avoids perturbing the pre-trained distribution. Combined with residual connections, this initialization ensures that during the initialization phase the newly added blocks pass input information through unchanged, achieving the so-called 'identity mapping' effect. The model therefore maintains its original performance level after the new blocks are introduced, laying the groundwork for subsequent fine-tuning.
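The expansion and identity-initialization procedure above can be sketched in miniature. The following is an illustrative numpy toy (block structure and function names are ours, not the paper's code): each "block" is reduced to one weight matrix, one bias, and its LayerNorm parameters, and the L appended blocks copy the weights of the final L pre-trained blocks while resetting LayerNorm to its defaults.

```python
import numpy as np

def make_block(d, rng):
    """A toy stand-in for one Transformer block's parameters."""
    return {"W": rng.standard_normal((d, d)), "b": rng.standard_normal(d),
            "gamma": np.ones(d), "beta": np.zeros(d)}

def expand_model(blocks, L):
    """Append L blocks initialized by copying the last L pre-trained blocks
    (identity initialization): W_{N+l} = W_{N-L+l}, b_{N+l} = b_{N-L+l},
    with LayerNorm reset to its defaults (gamma = 1, beta = 0)."""
    N = len(blocks)
    new_blocks = []
    for l in range(L):
        src = blocks[N - L + l]
        new_blocks.append({"W": src["W"].copy(), "b": src["b"].copy(),
                           "gamma": np.ones_like(src["gamma"]),
                           "beta": np.zeros_like(src["beta"])})
    return blocks + new_blocks
```

In a real implementation the copied blocks would be full Transformer layers; the point of the sketch is only the indexing of the copy and the default LayerNorm reset.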
As shown in Table 5, we conducted experiments comparing different initialization strategies to verify their impact on model performance.
Fine-tuning: Upon expanding the model, we conducted fine-tuning of the L newly incorporated blocks utilizing a domain-specific spatiotemporal trajectory dataset D. During this process, only the parameters of the newly added blocks and Wpe were adjusted, whereas the parameters of the original N blocks remained fixed. The principal aim of this fine-tuning was to augment the model’s efficacy on designated tasks, specifically to reduce the loss function L as defined on dataset D.
Let $\hat{y} = f(x; \theta)$ represent the model's output, and let $\theta = \theta_{\mathrm{frozen}} \cup \theta_{\mathrm{new}}$ be the set of all model parameters, where $\mathcal{L}(\hat{y}, y)$ is the loss function used to assess the discrepancy between predicted outcomes and true labels. In the fine-tuning process, we only update the parameters of the newly added blocks (together with $W_{pe}$), $\theta_{\mathrm{new}}$, while the parameters of the original blocks, $\theta_{\mathrm{frozen}}$, remain unchanged:
$$\min_{\theta_{\mathrm{new}}} \; \mathbb{E}_{(x, y) \in D}\, \mathcal{L}\big(f(x; \theta), y\big)$$
In the subsequent experiments, we systematically evaluate the influence of varying the number of added blocks (1, 2, 3, 4, 5, 6) on both the training loss and downstream classification performance. This analysis aims to demonstrate the efficacy of our approach in enhancing model adaptability and task-specific accuracy. By incrementally increasing the number of blocks, we can observe the corresponding improvements or trade-offs in model behavior, thereby providing a comprehensive understanding of the optimal configuration for our fine-tuning strategy.
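The frozen/trainable split described above can be expressed as a single masked update step. The sketch below is illustrative only (parameters reduced to scalars, names ours): only parameters named in the trainable set (the new blocks plus Wpe) move; everything else is returned untouched.

```python
def sgd_step(params, grads, trainable, lr=1e-3):
    """One gradient-descent update in which only the trainable subset
    (the L new blocks plus the position embedding Wpe) is modified;
    the original N blocks stay frozen, as in the fine-tuning protocol."""
    return {name: (value - lr * grads[name]) if name in trainable else value
            for name, value in params.items()}
```

In a framework such as PyTorch the same effect is obtained by setting `requires_grad = False` on the frozen blocks before building the optimizer.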
Experimental results (as shown in Tables 6, 7, 8, and Fig 8) reveal that as the number of extended layers increases, the model’s classification performance (Accuracy) on the two datasets initially improves and subsequently declines, with optimal performance achieved when three layers are added. Notably, both Feature Separability and Attention Distribution Entropy also peak at this configuration. Furthermore, in terms of optimization dynamics, the Convergence Speed and Gradient Stability required to attain an 80% accuracy rate are most favorable when three layers are incorporated.
We attribute this optimal performance to the moderate expansion of the model, which enhances its expressive power for task-specific features while mitigating the risks of overfitting and gradient instability associated with excessive layers. The middle and later layers of the model play a more significant role in extracting task-relevant features, and the addition of three layers strikes an ideal balance between capturing domain-specific trajectory classification features and preserving the general capabilities of the pre-trained model.
Additionally, experiments demonstrate that at this configuration, feature separability and attention distribution entropy reach their highest levels, while gradient stability and convergence speed are also optimized. This indicates that the three-layer configuration achieves an effective synergy between optimization dynamics and feature expression, ultimately leading to enhanced overall performance.
Experimental results (as shown in Tables 6, 7, 8, and Fig 9) reveal an inverted U-shaped relationship between layer numbers and model performance, with optimal accuracy achieved at three added layers. This phenomenon stems from three interdependent mechanisms:
Gradient Dynamics: The gradient norm in Table 8 shows optimal stability (1.2 norm with 0.05 variance) at three layers. Shallower extensions (L ≤ 3) maintain effective gradient flow through identity initialization, while deeper configurations (L > 3) suffer from unstable gradient propagation as signal paths lengthen (evidenced by the 2.0 gradient norm at six layers).
Capacity-Overfitting Tradeoff: Feature separability (Table 7) peaks at 2.4 with three layers, indicating enhanced discriminative power. Beyond this point, separability declines (1.8 at six layers) despite decreasing training loss (Fig 8), suggesting excessive layers memorize dataset noise rather than learning generalizable patterns.
Optimization Equilibrium: The convergence speed (18 epochs to reach 80% accuracy, Table 8) is fastest at the three-layer configuration, indicating that optimization dynamics are most favorable at this depth.
The three-layer configuration achieves dual equilibrium: sufficient non-linearity to model spatiotemporal dependencies (3.1 attention entropy in Table 7), while maintaining proximity to the original pre-trained manifold. This balances feature discriminability against optimization complexity - each additional layer beyond three provides diminishing representational returns while exponentially increasing training instability. The results empirically validate the effectiveness of controlled architectural expansion in parameter-efficient tuning.
1. Experiments
To comprehensively validate the effectiveness of K-M LLM-pro, we designed comparative experiments along the following lines:
1. Compared to existing methods, the performance of K-M LLM-pro on the spatiotemporal trajectory classification task
2. Few-shot experiments to verify the scalability of downstream tasks of the K-M LLM-pro
3. Ablation experiments to verify the effectiveness of each key component of K-M LLM-pro
Baselines: We selected models that have demonstrated superior performance in the classification domain as our baselines, including the machine learning-based Random Forest, the CNN-based TimesNet [10] and TraClets [15], the Transformer-based Non-stationary Transformer [19], Informer [18], iTransformer [20], TrajFormer [3], and UniTraj [39], Mamba [40], Mamba-Attention [41], and the fine-tuned LLM-based GPT-2 adapter [27].
Datasets: This paper selected several representative public trajectory datasets to explore movement patterns in various scenarios. Specifically, we utilized five representative public datasets. The first is the Geolife dataset (https://github.com/yupidevs/trajectory-datasets) [7], collected by the Geolife project at Microsoft Research Asia, covering the daily activity trajectories of 182 users from April 2007 to August 2012, including a variety of outdoor activities such as shopping, hiking, and cycling, as well as various modes of transportation. The second is the Animals dataset (https://github.com/yupidevs/trajectory-datasets) [15], which records the movement trajectories of elk, deer, and cattle in the Starkey Experimental Forest and ranches in Oregon, reflecting the main habitat characteristics of the animals in the form of sparse sample points. The third is the hurdat2 dataset (https://github.com/yupidevs/trajectory-datasets) [11], maintained by the United States National Hurricane Center, which contains hurricane trajectory information in the Atlantic region from 1950 to 2008, classified by hurricane intensity level in the form of sparse points. The fourth is the MarineCadastre AIS Data (https://hub.marinecadastre.gov/), operated jointly by the National Oceanic and Atmospheric Administration (NOAA) and the United States Coast Guard (USCG), from which we selected five consecutive days of Automatic Identification System (AIS) data for maritime and coastal areas in April 2023. The last is the OpenSky ADS-B Data (https://opensky-network.org/data/datasets), which contains ADS-B data for aircraft obtained from the OpenSky network between April and July 2023. Through these datasets, we can comprehensively analyze the spatiotemporal trajectory analysis capabilities of K-M LLM-pro across various domains, from human daily activities to natural phenomena such as animal migration and hurricane paths, to maritime traffic and aviation.
The datasets are summarized in Table 9 below:
Experiment Setup: To ensure a fair comparison with the baselines, we divided each dataset into training, validation, and test sets in a 6:2:2 ratio. Training was conducted on a single NVIDIA A800 40 GB GPU for 100 epochs using the RAdam optimizer with a learning rate of 0.001, a step size of 1,000,000, and a batch size of 64. We used GPT-2 (https://gitee.com/gapyanpeng/GPT2-Chinese), pre-trained on a text corpus, as our foundational large language model. Hyperparameters were selected based on established practices for training deep models and a limited search constrained by computational resources: a learning rate of 0.001 with RAdam proved optimal for stable convergence, while a batch size of 64 balanced memory constraints and gradient stability. The chosen values were validated on a held-out subset of the Geolife dataset prior to full-scale training.
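The setup above can be made concrete with a small sketch (names and structure ours, assuming a simple index-based split; the actual pipeline presumably operates on dataset-specific loaders): a config holding the stated hyperparameters and a shuffled 6:2:2 index split.

```python
import random
from dataclasses import dataclass

@dataclass
class TrainConfig:
    """Hyperparameters stated in the experiment setup (optimizer: RAdam)."""
    epochs: int = 100
    lr: float = 1e-3
    batch_size: int = 64

def split_622(n, seed=0):
    """Shuffle sample indices and split train/val/test in a 6:2:2 ratio."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    a, b = int(0.6 * n), int(0.8 * n)
    return idx[:a], idx[a:b], idx[b:]
```

Fixing the shuffle seed keeps the split reproducible across the baseline comparisons.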
1.1. Data processing
We performed preprocessing on the five trajectory datasets. This process involved cleaning outliers from the data and filling in missing points in the trajectories using cubic spline interpolation and weighted moving average methods, to ensure the high quality and good trainability of the datasets. As can be seen from the statistical data in Table 9, there are significant differences in time step and the number of trajectories across the datasets, indicating that the sampling rates of different datasets vary significantly. Based on these statistical results, we selected an appropriate length for standardizing the truncation of all trajectories, ensuring that the trajectories input to the model have a uniform length, while retaining as much useful information as possible. The specific number of points corresponding to each dataset is detailed in Table 10.
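As an illustration of the gap-filling and length-standardization steps described above, here is a minimal numpy sketch (function names ours; it implements the weighted-moving-average fill and truncation/padding only, not the cubic-spline branch):

```python
import numpy as np

def fill_missing(x, window=2):
    """Fill NaN points with a distance-weighted average of nearby valid
    samples; a simplified stand-in for the paper's interpolation pipeline."""
    x = x.astype(float).copy()
    for i in np.flatnonzero(np.isnan(x)):
        lo, hi = max(0, i - window), min(len(x), i + window + 1)
        idx = np.arange(lo, hi)
        valid = idx[~np.isnan(x[idx])]
        if valid.size:
            w = 1.0 / np.abs(valid - i)          # closer neighbours weigh more
            x[i] = np.sum(w * x[valid]) / np.sum(w)
    return x

def truncate(traj, length):
    """Standardize trajectory length by truncation or zero-padding."""
    out = np.zeros((length, traj.shape[1]))
    out[:min(length, len(traj))] = traj[:length]
    return out
```

The per-dataset `length` argument would correspond to the point counts listed in Table 10.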
1.2. Performance comparison with existing methods on spatiotemporal trajectory classification
We conducted a comprehensive evaluation of the proposed K-M LLM-pro model against multiple baseline methods on five public spatiotemporal trajectory datasets: Geolife, Animals, hurdat2, AIS, and ADS-B. The evaluation metrics include Accuracy (A.), Precision (P.), Recall (R.), and F1-score (F1.), computed from the per-class true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
Results: The results, summarized in Table 11 and Fig 9, demonstrate that K-M LLM-pro achieves state-of-the-art performance across all datasets and metrics, confirming its robustness and superior capability in spatiotemporal trajectory classification.
Specifically, on the Geolife dataset, K-M LLM-pro attains an accuracy of 89.8%, outperforming the second-best model, UniTraj (87.5%), by a margin of 2.3%. This improvement can be largely attributed to the physics-informed Kramers-Moyal prompt engineering, which embeds interpretable statistical priors into the trajectory representation. By projecting K-M coefficients into RKHS and fusing them with trajectory embeddings through a gated attention mechanism, the model effectively captures high-order dynamic features that are typically challenging for purely data-driven approaches. This enables more precise discrimination between subtle behavioral patterns in human mobility data.
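To make the fusion step concrete, a heavily simplified elementwise gate can be sketched as below. This is not the paper's exact gated attention; the weight `Wg` and function name are hypothetical, and the sketch only shows the general pattern of learning a convex combination of the trajectory embedding and the K-M prior embedding.

```python
import numpy as np

def gated_fusion(traj_emb, km_emb, Wg):
    """Elementwise gated fusion: g = sigmoid([traj; km] @ Wg),
    output = g * traj + (1 - g) * km. A simplified illustration of
    fusing physical priors with data-driven embeddings."""
    z = np.concatenate([traj_emb, km_emb], axis=-1)
    g = 1.0 / (1.0 + np.exp(-(z @ Wg)))     # sigmoid gate in (0, 1)
    return g * traj_emb + (1 - g) * km_emb
```

With `Wg` at zero the gate is 0.5 everywhere, so the fusion starts as a plain average and learns to weight the two modalities during training.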
On the Animals dataset, which contains sparse animal migration trajectories, K-M LLM-pro achieves a remarkable accuracy of 97.2%, significantly exceeding all other models. This exceptional performance demonstrates the effectiveness of our dynamic patching optimization mechanism, which combines variance maximization with Lyapunov stability criteria. By adaptively segmenting heterogeneous trajectories based on their intrinsic dynamical properties, the model maintains optimal resource allocation while preserving dynamical consistency, making it particularly suitable for handling irregularly sampled trajectories with varying temporal granularities.
The challenging hurdat2 dataset (hurricane trajectories) presents particularly difficult conditions due to high noise and non-stationary dynamics. Here, K-M LLM-pro still achieves the highest accuracy of 56.8%, surpassing traditional machine learning methods like Random Forest by approximately 5%. This robustness stems from our dual spatiotemporal adapters (STFEA and STCA), which synergistically combine CNN-local feature extraction with Bi-LSTM-global context modeling. The STFEA enhances the model’s ability to capture local spatiotemporal patterns, while the STCA efficiently compresses and reconstructs critical features through its bottleneck structure, preventing overfitting while retaining essential information for classification.
For maritime and aviation trajectories, K-M LLM-pro achieves accuracies of 90.1% on the AIS dataset and 78.9% on the ADS-B dataset, consistently exceeding all baselines. The model’s strong generalization capability across these heterogeneous trajectory types can be attributed to our parameter-efficient expansion strategy with identity initialization. By selectively replicating only the final Transformer blocks and initializing them with pre-trained parameters, we effectively inject domain-specific knowledge while preserving the LLM’s general semantic understanding capabilities. This approach enables efficient adaptation to diverse trajectory patterns without catastrophic forgetting or overfitting.
Furthermore, K-M LLM-pro maintains balanced performance across Precision, Recall, and F1-score, indicating not only high classification accuracy but also consistent discriminative power and recall stability across different classes. This balanced performance suggests that the model’s architectural components work synergistically to address the fundamental challenges in spatiotemporal trajectory analysis: the physics-guided prompt engineering addresses weak separability of high-order features, the dynamic patching optimizer handles inconsistent sampling rates, and the dual adapters bridge the modality gap between physical priors and data-driven embeddings.
In summary, the superior performance of K-M LLM-pro across diverse datasets demonstrates its effectiveness as a unified framework for spatiotemporal trajectory classification. The integration of physical principles with data-driven learning through our novel architectural components enables the model to achieve robust performance across various sampling conditions, spatial scales, and trajectory types, marking a significant advancement in the field of trajectory analysis.
Furthermore, to provide a comprehensive evaluation of model generalization and avoid overfitting, we further recorded the performance metrics (Accuracy, Precision, Recall, F1-score) across training, validation, and testing phases for all baseline models and our proposed K-M LLM-pro. Detailed results and analysis are provided in Appendix A (Table A1). These results confirm that our model not only achieves superior performance on the test set but also maintains consistency across all phases, demonstrating robust generalization capability.
To further validate the robustness of the results, a 5-fold cross-validation was conducted on the AIS and Geolife datasets. The results confirmed the stability of the model’s performance, with mean accuracy scores closely matching those from the standard 6:2:2 split (as shown in Table 12).
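The 5-fold protocol can be sketched with a simple stdlib fold generator (names ours; class stratification omitted for brevity):

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation:
    indices are shuffled once, then each fold serves as the test set
    exactly once while the remaining folds form the training set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Averaging accuracy over the k test folds gives the mean scores reported in Table 12.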
1.3. Evaluation of few-shot learning capabilities for downstream tasks
To validate the generalization capability and transfer learning potential of the K-M LLM-pro model in few-shot learning, we designed this experiment. The aim is to assess the performance of the K-M LLM-pro model under few-shot training conditions across various datasets, thereby evaluating its application value in downstream tasks.
Data Splitting: To simulate real-world few-shot learning scenarios, we employed varying training sample ratios on the AIS and ADS-B datasets, including 1%, 5%, 10%, and 20%, while the ratios for the validation and test sets remained constant (each at 20%). This design facilitates a comprehensive assessment of the K-M LLM-pro model’s performance under different training sample sizes, thereby gaining insights into the model’s generalization ability and stability.
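The few-shot splitting protocol described above can be sketched as follows (a minimal illustration with names ours, assuming index-level sampling): validation and test are fixed at 20% each, and the training set is subsampled from the remainder at the requested ratio of the full dataset.

```python
import random

def fewshot_split(n, train_ratio, seed=0):
    """Few-shot split: 20% validation and 20% test held fixed; the
    training set is drawn from the remaining pool at train_ratio
    (e.g. 0.01, 0.05, 0.10, 0.20) of the full dataset size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = n_test = int(0.2 * n)
    val, test = idx[:n_val], idx[n_val:n_val + n_test]
    pool = idx[n_val + n_test:]
    train = pool[:int(train_ratio * n)]
    return train, val, test
```

Keeping the validation and test indices identical across ratios ensures that only the amount of training data varies between runs.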
Baseline Models: In addition to the K-M LLM-pro, we selected all the aforementioned baseline models to ensure comparisons under identical conditions, thereby validating the few-shot learning advantages of the K-M LLM-pro. Table 13 presents the performance comparison between the K-M LLM-pro and other baseline models across different datasets and varying training sample ratios. The values in the table represent the average accuracy rates of the models on the test sets.
In few-shot learning scenarios, K-M LLM-pro consistently outperforms all baseline models across varying training data ratios. On the AIS dataset, it achieves accuracies of 80.2% with only 1% of the training data and 87.6% with 20%, significantly surpassing competing methods. Similarly, on the ADS-B dataset, it attains 68.5% accuracy at 1% training ratio and improves to 78.1% at 20%.
These results highlight the model’s strong generalization capacity and efficient use of limited samples, which we attribute to its physics-informed representation learning and parameter-efficient adaptation strategy. By incorporating Kramers–Moyal coefficients as physical priors, the model captures essential dynamic features even under extreme data scarcity. Furthermore, the lightweight adapter architecture and stabilized gating mechanism allow effective knowledge transfer without overfitting.
The consistent superiority under low-data regimes confirms that K-M LLM-pro is particularly suited for real-world applications where annotated trajectory data is limited or costly to acquire.
1.4. Ablation study on key components
To explore the individual contributions of various modules within the K-M LLM-pro model, we conducted a comprehensive ablation study by incrementally disabling key components. We evaluate the impact on classification accuracy across the AIS, Geolife, and Hurdat2 datasets. The results are summarized in Table 14.
The results demonstrate that each component is critical to the overall performance. The most pronounced performance degradation occurs on the challenging Hurdat2 dataset, underscoring the model’s reliance on its integrated design to handle sparse, noisy, and heterogeneous trajectory data. The removal of the Kramers-Moyal Prompt or the Extended Transformer Blocks caused the most significant drops in accuracy, highlighting the fundamental importance of injecting physics-guided priors and expanding model capacity for cross-modal adaptation.
1.4.1. Interpretability and generalization analysis of K-M coefficients.
The ablation study in Sect 1.4 preliminarily confirmed the contribution of the K-M prompt to the overall performance. To further quantify its interpretability and its role in enhancing cross-modal generalization, we conducted a comparative analysis against other state-of-the-art dynamical feature descriptors. We evaluated the following methods by replacing the K-M prompt module while keeping the rest of the K-M LLM-pro architecture unchanged:
- Raw Embedding: Using only the trajectory embedding without any physical prior.
- Lyapunov Exponents: A classical descriptor for quantifying chaotic dynamics.
- Fourier Transform (FFT): Representing trajectories in the frequency domain.
- TrajODE [3]: A neural ODE-based method for continuous-time trajectory representation.
- K-M Coefficients (Ours): Our proposed physics-guided feature descriptor.
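For intuition about what the K-M descriptor captures, the first two Kramers-Moyal coefficients (drift and diffusion) of a 1-D trajectory can be estimated by binning states and averaging conditional increments. This is a minimal sketch under simplifying assumptions (1-D, uniform sampling; names ours); the full model further projects these coefficients into an RKHS.

```python
import numpy as np

def km_coefficients(x, dt=1.0, bins=10):
    """Estimate the first two Kramers-Moyal coefficients of a 1-D series:
        D1(x) ~ <dx | x> / dt        (drift)
        D2(x) ~ <dx^2 | x> / (2 dt)  (diffusion)
    by binning the state values and averaging the conditional increments."""
    dx = np.diff(x)
    centers = x[:-1]
    edges = np.linspace(centers.min(), centers.max(), bins + 1)
    which = np.clip(np.digitize(centers, edges) - 1, 0, bins - 1)
    D1 = np.full(bins, np.nan)
    D2 = np.full(bins, np.nan)
    for b in range(bins):
        m = which == b
        if m.any():
            D1[b] = dx[m].mean() / dt
            D2[b] = (dx[m] ** 2).mean() / (2 * dt)
    return D1, D2
```

For a purely deterministic drift (e.g. a linear ramp) the estimator recovers a constant D1 and the corresponding D2, while noisy, non-stationary trajectories such as hurricanes yield state-dependent profiles that serve as the physical prior.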
As shown in Table 15, the K-M coefficients consistently achieve superior performance across the Geolife, AIS, and Hurdat2 datasets. Notably, the improvement is most significant on the challenging Hurdat2 dataset, which consists of sparse and irregular hurricane trajectories. This demonstrates the exceptional capability of K-M coefficients in modeling complex, non-stationary dynamics where traditional descriptors fail. Furthermore, we assessed the cross-modal generalization ability by fine-tuning a model on one dataset (e.g., Geolife) and testing it on another (e.g., AIS). The K-M coefficients yielded the highest generalization accuracy, indicating that the physical prior embedded through K-M coefficients provides a robust and transferable representation that mitigates the distribution shift across heterogeneous trajectory modalities.
Generalization accuracy (Gen.) is reported for model transfer: Geolife → AIS (G→A), AIS → Hurdat2 (A→H), and Hurdat2 → Geolife (H→G).
These results provide empirical evidence that the K-M coefficients are not merely an auxiliary feature but an interpretable statistical prior that fundamentally enhances the model's understanding of spatiotemporal dynamics, leading to improved accuracy and superior generalization across diverse data modalities.
1.4.2. In-depth analysis of patching and adapter contributions.
While Table 14 establishes the necessity of the Patching Optimizer and Spatiotemporal Adapters as whole modules, we conducted a finer-grained ablation to isolate their specific contributions and design choices. We compared our full design against several variants:
- Fixed-Size Patching: Replaces the Lyapunov-based Patching Optimizer with a standard fixed-size strategy (size=32, stride=16).
- w/o STFEA: Removes the Spatiotemporal Feature Extractor Adapter (STFEA).
- w/o STCA: Removes the Spatiotemporal Compressor Adapter (STCA).
- GPT-2 Base: A baseline using only the pre-trained GPT-2 backbone with a simple linear head.
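The fixed-size patching baseline used in this ablation (size=32, stride=16) can be stated in two lines; a minimal sketch (function name ours):

```python
def fixed_patches(x, size=32, stride=16):
    """Fixed-size patching baseline: overlapping windows of `size` points
    taken every `stride` points along the trajectory, regardless of the
    trajectory's local dynamics."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, stride)]
```

By contrast, the Lyapunov-based optimizer adapts patch boundaries to the trajectory's dynamical properties, which is where the gains in Table 16 come from.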
The detailed results in Table 16 reveal several key insights. The Lyapunov-based Patching Optimizer provides a consistent performance advantage over fixed-size patching; the gain is most substantial on Hurdat2 (+2.1%), demonstrating its critical role in adaptively segmenting complex, non-stationary trajectories where fixed patches are suboptimal. The dual adapters are both crucial: the STFEA, specialized in local spatiotemporal feature extraction, has a slightly larger impact on dense trajectories (Geolife, AIS), whereas the STCA, responsible for feature refinement and compression, shows a stronger effect on the complex Hurdat2 dataset. Their combination yields the best result, confirming their complementary functions. Finally, the integrated architecture (Full Model) achieves the highest performance, demonstrating that the adaptive data segmentation by the Patching Optimizer and the powerful feature extraction by the dual adapters work synergistically within the LLM backbone.
In conclusion, the ablation studies confirm that each module in K-M LLM-pro is indispensable. The K-M Prompt and Extended Transformer Blocks provide the foundational cross-modal and capacity enhancements, while the Patching Optimizer and Spatiotemporal Adapters work in concert to enable optimal and robust processing of diverse spatiotemporal trajectories.
1.5. Cross-dataset validation for generalizability assessment
To further evaluate the generalizability and robustness of the proposed K-M LLM-pro framework, we conducted cross-dataset validation experiments. This approach tests the model on an external dataset that was not used during training, thereby assessing its ability to generalize across different data distributions and recording conditions—a critical requirement for real-world deployment.
We trained K-M LLM-pro on two source datasets (Geolife and AIS) and evaluated its performance on three target datasets (Animals, hurdat2, and ADS-B) without any fine-tuning. The results, summarized in Table 17, demonstrate the model’s cross-domain transfer capability.
The model achieves strong transfer performance on the Animals dataset (exceeding 90% accuracy when trained on either Geolife or AIS), indicating that the learned spatiotemporal representations are highly generalizable across human and animal mobility patterns. However, performance on hurdat2 (hurricane trajectories) and ADS-B (aircraft trajectories) is comparatively lower, which can be attributed to the significant domain shift in dynamics and sampling characteristics. Notably, the model still outperforms random baselines and shows consistent metric alignment, confirming that the physics-guided prompts and adaptive patching mechanism contribute to robust feature extraction even under distribution shift.
These results affirm that K-M LLM-pro possesses substantial generalization capability, particularly across datasets with similar spatiotemporal granularity and motion patterns. Future work may involve domain adaptation techniques to further bridge the gap between highly divergent domains such as meteorological and vehicular trajectories.
2. Conclusion and future work
2.1. Conclusion
In this paper, we introduced K-M LLM-pro, a novel physics-guided cross-modal adaptation framework for fine-grained spatiotemporal trajectory classification. By synergistically integrating statistical mechanics principles with large language models (LLMs), our approach effectively addresses long-standing challenges in trajectory analysis, including the weak separability of high-order dynamic features, representation collapse under small-sample conditions, and heterogeneous representation conflicts across multi-modal data.
The core innovations of our work are fourfold:
- Physics-Informed Prompt Engineering: We proposed the novel use of Kramers-Moyal (K-M) coefficients as interpretable statistical priors. These are embedded into the LLM through a covariance-aware Reproducing Kernel Hilbert Space (RKHS) projection, enabling the model to intrinsically perceive and adhere to physical constraints during inference.
- Dynamic Patching Optimization: A variance-stability co-optimization mechanism, guided by Lyapunov exponents, was developed to dynamically segment heterogeneous trajectories. This ensures robust and consistent feature extraction across vastly varying sampling rates and spatial scales.
- Dual Spatiotemporal Adapters: The Spatiotemporal Feature Extractor Adapter (STFEA) and Spatiotemporal Compressor Adapter (STCA) modules were designed to synergistically enhance both local and global spatiotemporal feature extraction, significantly improving the model’s capacity to capture complex, fine-grained trajectory patterns.
- Parameter-Efficient Expansion: By extending the LLM with identity-initialized Transformer blocks and optimizing only 3.8% of the newly introduced parameters, we achieved effective domain adaptation while meticulously preserving the model’s general semantic capabilities and preventing catastrophic forgetting.
Extensive experiments on multiple public datasets (Geolife, AIS, Animals, Hurdat2, ADS-B) demonstrate that K-M LLM-pro consistently outperforms state-of-the-art baselines across all key classification metrics, including accuracy, precision, recall, and F1-score. Notably, the model exhibits exceptional few-shot learning capabilities, achieving competitive performance with as little as 1% of the training data. This highlights its superior data efficiency and strong generalization potential in data-scarce scenarios, a common constraint in real-world applications.
Ablation studies further confirm the critical and individual contributions of each proposed module, underscoring the necessity of the K-M prompt, patching optimizer, spatiotemporal adapters, and controlled LLM expansion for achieving optimal performance.
In summary, K-M LLM-pro represents a significant step toward bridging the gap between domain-specific physical knowledge and general-purpose AI models. Despite its advanced performance, it is important to note that the framework’s efficacy is bound by its underlying physical assumptions; its performance may diminish for systems exhibiting strong non-Markovian properties or in scenarios where the relationship between physical laws and data patterns is severely corrupted (e.g., by extreme noise). Furthermore, the integration of interpretable physical descriptors introduces a computational trade-off, potentially limiting real-time deployment in resource-constrained environments. It offers a lightweight, interpretable, and highly effective solution for complex spatiotemporal trajectory classification tasks, paving the way for more intelligent and reliable intelligent perception systems.
2.2. Future work
While K-M LLM-pro demonstrates compelling performance and robustness, several promising avenues remain for future exploration:
- Architecture Optimization: Integrating lightweight neural operators, such as neural ordinary differential equations (Neural ODEs), could further reduce computational overhead while enhancing the model’s interpretability for continuous-time dynamics.
- Multimodal Integration: Extending the framework to incorporate complementary multimodal inputs—such as visual context from satellite imagery or meteorological data from environmental sensors—could significantly enhance classification robustness in complex, real-world scenarios (e.g., distinguishing aircraft trajectories under varying weather conditions).
- Real-Time Deployment: Developing an incremental learning mechanism with adaptive physical priors is crucial for real-world applications. This would enable the model to dynamically adapt to non-stationary and evolving trajectory patterns in a continuous learning paradigm.
- Cross-Domain Applications: Exploring cross-domain applications in fields like ecological conservation (e.g., predicting endangered species migration routes) and public safety (e.g., real-time anomaly detection in urban crowd flows) could broaden the societal impact of this work.
- Theoretical Analysis: A deeper theoretical investigation into the interaction between K-M coefficients and the self-attention mechanism, particularly their role in mitigating gradient conflicts and stabilizing training during few-shot learning, will be essential for advancing the field of physics-guided LLMs.
We believe these research directions will further advance the integration of domain knowledge with generalizable AI systems, pushing the boundaries of what is possible in spatiotemporal analytics.
Appendix A: Comprehensive performance analysis across all phases
To provide a thorough evaluation of model generalization and training stability, we conducted extensive experiments comparing the performance of K-M LLM-pro against all baseline models across training, validation, and testing phases. The comprehensive results presented in Table 18 offer insights into each model’s learning behavior, generalization capability, and potential overfitting tendencies.
Analysis of comprehensive performance results
The extended results in Table 18 provide valuable insights into the training dynamics and generalization capabilities of all models across five diverse spatiotemporal trajectory datasets. Several key observations emerge from this comprehensive analysis:
1. Superior Generalization of K-M LLM-pro: Our proposed model consistently demonstrates the smallest performance gap between training and testing phases across all datasets. For instance, on the challenging hurdat2 dataset, K-M LLM-pro shows a train-test gap of 11.7% (68.5% → 56.8%), which is notably smaller than the 15.6% gap observed in TraClets (63.5% → 51.6%). This indicates that our physics-guided approach effectively mitigates overfitting, even on complex, sparse trajectory data.
2. Training Stability: K-M LLM-pro achieves the highest training accuracy on all datasets except hurdat2, where it ranks second. More importantly, it maintains this superiority through validation and testing phases, demonstrating both strong learning capacity and effective regularization. The identity initialization and parameter-efficient expansion strategy appear to contribute significantly to this stability.
3. Consistent Performance Across Metrics: The advantage of K-M LLM-pro is consistent across all evaluation metrics (Accuracy, Precision, Recall, F1-Score), suggesting balanced performance without bias toward any specific metric. This is particularly evident on the AIS dataset, where our model outperforms the second-best (TrajFormer) by 5.2% in accuracy, 5.3% in precision, 5.5% in recall, and 5.4% in F1-score.
4. Challenging Dataset Performance: On the most difficult hurdat2 dataset (characterized by extreme sparsity and irregular sampling), K-M LLM-pro achieves a modest but significant 1.2% improvement over the next best model (TimeNet). While absolute performance remains lower on this dataset, the relative improvement is substantial given the inherent challenges.
5. Baseline Model Comparisons:
- Traditional machine learning approaches (random forest) show competitive performance on some datasets (e.g., Geolife) but struggle with more complex patterns in hurdat2 and ADS-B.
- Transformer-based models (TrajFormer, n-s transformer, informer, iTransformer) generally outperform CNN-based approaches (TimeNet, TraClets), highlighting the importance of capturing long-range dependencies in spatiotemporal data.
- The GPT-2 adapter baseline performs reasonably well but is consistently outperformed by our specialized architecture, demonstrating the value of domain-specific adaptations.
6. Validation-Test Consistency: The minimal performance drop from validation to test phases for K-M LLM-pro (e.g., only 0.5% on Geolife: 90.5% → 90.0%) indicates robust hyperparameter tuning and effective early stopping, preventing overfitting to the validation set.
These comprehensive results reinforce the effectiveness of our proposed K-M LLM-pro framework, which integrates physics-guided prompt engineering, dynamic patching optimization, and parameter-efficient LLM expansion to achieve state-of-the-art performance across diverse spatiotemporal trajectory classification tasks.
Appendix B: Data preprocessing criteria for trajectory inclusion, truncation, and resampling
To ensure reproducibility and transparency in our experimental setup, we provide a detailed description of the inclusion/exclusion criteria and preprocessing steps applied to the spatiotemporal trajectory datasets used in this study. All preprocessing was implemented in Python, leveraging libraries such as NumPy, SciPy, and Pandas.
B.1 Trajectory inclusion and exclusion criteria
We applied the following criteria to determine whether a trajectory was included in the dataset:
- Minimum Length Requirement: Trajectories with fewer than 10 recorded points were excluded, as they lack sufficient spatiotemporal context for meaningful analysis.
- Maximum Missing Rate: Trajectories with more than 20% missing values (e.g., due to sensor dropout) were discarded.
- Plausibility Check: Trajectories containing physically implausible points (e.g., sudden jumps exceeding 100 km within a 1-second interval) were removed.
- Label Consistency: Only trajectories with clear and consistent class labels (as provided by the original datasets) were retained.
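The criteria above can be sketched as a single filter function. This is an illustrative reimplementation, not the released code: the function name, thresholds-as-constants, and the assumption that coordinates are expressed in kilometers are all ours.

```python
import numpy as np

# Illustrative thresholds from Appendix B.1; names are hypothetical.
MIN_POINTS = 10            # minimum recorded points per trajectory
MAX_MISSING_RATE = 0.20    # maximum fraction of missing values
MAX_JUMP_KM_PER_S = 100.0  # implausible displacement per second

def passes_inclusion_criteria(traj):
    """traj: array of shape (T, 3) with columns (t in seconds, x, y in km);
    NaN marks a missing coordinate. Returns True if the trajectory
    satisfies the length, missing-rate, and plausibility criteria."""
    traj = np.asarray(traj, dtype=float)
    if len(traj) < MIN_POINTS:
        return False
    missing = np.isnan(traj[:, 1:]).any(axis=1)
    if missing.mean() > MAX_MISSING_RATE:
        return False
    valid = traj[~missing]
    dt = np.diff(valid[:, 0])
    dist = np.linalg.norm(np.diff(valid[:, 1:], axis=0), axis=1)
    # Plausibility: reject jumps exceeding 100 km within a 1-second interval
    if np.any(dist > MAX_JUMP_KM_PER_S * np.maximum(dt, 1e-9)):
        return False
    return True
```

Label consistency is not checked here, since it depends on dataset-specific metadata rather than the coordinate stream itself.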
B.2 Trajectory truncation strategy
To handle variable-length trajectories and ensure uniform input dimensions for the model, we applied the following truncation strategy:
- Target Length Selection: The target length for each dataset was chosen based on the 80th percentile of trajectory lengths in that dataset (see Table 10 in the main text). This ensures that most trajectories retain their complete semantic structure while minimizing padding.
- Truncation Rule: Trajectories longer than the target length were truncated from the end, preserving the initial segment which typically contains the most stable behavioral patterns.
- Padding: Shorter trajectories were padded with zeros at the end to match the target length.
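A minimal sketch of the truncation and padding rule above; the 80th-percentile target length is computed once per dataset. Function names are ours, not from the released code.

```python
import numpy as np

def target_length(lengths):
    """Per-dataset target length: 80th percentile of trajectory lengths."""
    return int(np.percentile(lengths, 80))

def fit_to_length(traj, target_len):
    """Truncate from the end (keeping the initial segment) or zero-pad
    at the end so every trajectory has exactly target_len rows."""
    traj = np.asarray(traj, dtype=float)
    if len(traj) >= target_len:
        return traj[:target_len]
    pad = np.zeros((target_len - len(traj), traj.shape[1]))
    return np.vstack([traj, pad])
```

Keeping the initial segment is a deliberate choice: as noted above, the start of a trajectory typically carries the most stable behavioral patterns.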
B.3 Trajectory resampling and imputation
To address irregular sampling rates and missing values, we applied the following methods:
- Resampling: Trajectories were resampled to a uniform time interval using cubic spline interpolation, which smooths the trajectory while preserving kinematic properties.
- Missing Value Imputation: Small gaps (≤ 3 consecutive points) were filled using a weighted moving average with a window size of 5. Larger gaps were not imputed; instead, the trajectory was split into valid segments, and only the longest segment was retained.
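The uniform resampling step can be sketched with SciPy's cubic spline interpolator. The function name and the choice of grid spacing are ours; the gap-splitting and weighted moving-average imputation are omitted for brevity.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def resample_uniform(t, xy, dt=1.0):
    """Resample an irregularly sampled trajectory onto a uniform time
    grid with spacing dt using cubic spline interpolation.
    t: (T,) sample times; xy: (T, D) coordinates; returns (t_new, xy_new)."""
    t = np.asarray(t, dtype=float)
    xy = np.asarray(xy, dtype=float)
    t_new = np.arange(t[0], t[-1] + 1e-9, dt)
    spline = CubicSpline(t, xy)  # fits each coordinate column jointly
    return t_new, spline(t_new)
```

Cubic splines are C²-continuous, so derived speed and acceleration profiles stay smooth across the resampled grid, which is what "preserving kinematic properties" refers to above.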
B.4 Outlier detection and removal
We used a modified Z-score method with a threshold of 3.5 to detect and remove outliers in speed and acceleration values derived from the trajectory points. Points exceeding this threshold were treated as missing and imputed as described above.
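A sketch of the modified Z-score test, with our own naming. The 0.6745 factor rescales the median absolute deviation (MAD) so that, under normality, the score is comparable to a standard Z-score.

```python
import numpy as np

def modified_zscore_outliers(values, threshold=3.5):
    """Return a boolean mask marking outliers in derived speed or
    acceleration values, using the MAD-based modified Z-score."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    if mad == 0:
        # Degenerate case: no spread, so nothing is flagged
        return np.zeros(len(values), dtype=bool)
    mz = 0.6745 * (values - med) / mad
    return np.abs(mz) > threshold
```

Because the median and MAD are themselves robust to extreme values, this test flags gross speed or acceleration spikes without being skewed by them, unlike a mean/standard-deviation Z-score.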
B.5 Summary of preprocessing by dataset
See Table 19. These criteria and steps ensure that the processed trajectories are both representative of the original data and suitable for deep learning-based classification, while maintaining physical plausibility and temporal consistency.
Appendix C: Data availability
The minimal dataset necessary to replicate all findings reported in this study, including the cleaned and processed trajectory data, has been deposited and is publicly available on GitHub: https://github.com/christerminsder/K-M-LLM-pro.
Additionally, the original public datasets used in this study are available from the following sources:
References
- 1. Zhujun Z, Genlin JI. Research progress of spatial-temporal trajectory classification. Journal of Geo-Information Science. 2017.
- 2. Baeg S, Hammond TA. Ship type classification based on the ship navigating trajectory and machine learning. In: IUI Workshops. 2023. p. 179–89. https://api.semanticscholar.org/CorpusID:257666723
- 3. Liang Y, Ouyang K, Wang Y, Liu X, Chen H, Zhang J, et al. TrajFormer: efficient trajectory classification with transformers. In: Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 2022. p. 1229–37. https://doi.org/10.1145/3511808.3557481
- 4. Wang Y, Guo J, Xu L, Li K, Li ZY. AIS data driven CNN-BiGRU model for ship target classification. In: International Conference on Spatial Data and Intelligence; 2022. https://api.semanticscholar.org/CorpusID:256391598
- 5. Xi B, Scaria K, Bavikadi D, Shakarian P. Rule-based error detection and correction to operationalize movement trajectory classification. arXiv preprint 2023. https://arxiv.org/abs/2308.14250
- 6. Bolton S, Dill R, Grimaila MR, Hodson D. ADS-B classification using multivariate long short-term memory–fully convolutional networks and data reduction techniques. J Supercomput. 2022;79(2):2281–307.
- 7. Zheng Y, Fu H, Xie X, Ma WY, Li Q. Geolife GPS trajectory dataset - User Guide. 2011. https://api.semanticscholar.org/CorpusID:64038268
- 8. Duan H, Ma F, Miao L, Zhang C. A semi-supervised deep learning approach for vessel trajectory classification based on AIS data. Ocean & Coastal Management. 2022;218:106015.
- 9. Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S. On the opportunities and risks of foundation models. arXiv preprint 2021.
- 10. Wu H, Hu T, Liu Y, Zhou H, Wang J, Long M. TimesNet: temporal 2D-variation modeling for general time series analysis. arXiv preprint 2022. https://arxiv.org/abs/2210.02186
- 11. Luo D, Chen P, Yang J, Li X, Zhao Y. A new classification method for ship trajectories based on AIS data. JMSE. 2023;11(9):1646.
- 12. Liu L, Yu S, Wang R, Ma Z, Shen Y. How can large language models understand spatial-temporal data?. arXiv preprint 2024. https://arxiv.org/abs/2401.14192
- 13. Yenduri G, Ramalingam M, Selvi GC, Supriya Y, Srivastava G, Maddikunta PKR, et al. GPT (Generative Pre-Trained Transformer)—a comprehensive review on enabling technologies, potential applications, emerging challenges, and future directions. IEEE Access. 2024;12:54608–49.
- 14. Touvron H, Lavril T, Izacard G, Martinet X, Lachaux MA, Lacroix T. Llama: open and efficient foundation language models. arXiv preprint 2023. https://arxiv.org/abs/2302.13971
- 15. Kontopoulos I, Makris A, Tserpes K, Bogorny V. TraClets: harnessing the power of computer vision for trajectory classification. arXiv preprint 2022.
- 16. Liu H, Wu H, Sun W, Lee I. Spatio-temporal GRU for trajectory classification. In: 2019 IEEE International Conference on Data Mining (ICDM). 2019. p. 1228–33. https://doi.org/10.1109/icdm.2019.00152
- 17. Bolton S, Dill R, Grimaila MR, Hodson D. ADS-B classification using multivariate long short-term memory–fully convolutional networks and data reduction techniques. J Supercomput. 2022;79(2):2281–307.
- 18. Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, et al. Informer: beyond efficient transformer for long sequence time-series forecasting. AAAI. 2021;35(12):11106–15.
- 19. Liu Y, Wu H, Wang J, Long M. Non-stationary transformers: Exploring the stationarity in time series forecasting. Advances in Neural Information Processing Systems. 2022;35:9881–93.
- 20. Liu Y, Hu T, Zhang H, Wu H, Wang S, Ma L. iTransformer: inverted transformers are effective for time series forecasting. arXiv preprint 2023. https://arxiv.org/abs/2310.06625
- 21. Akbar S, Ullah M, Raza A, Zou Q, Alghamdi W. DeepAIPs-Pred: predicting anti-inflammatory peptides using local evolutionary transformation images and structural embedding-based optimal descriptors with self-normalized BiTCNs. J Chem Inf Model. 2024;64(24):9609–25. pmid:39625463
- 22. Shahid, Hayat M, Alghamdi W, Akbar S, Raza A, Kadir RA, et al. pACP-HybDeep: predicting anticancer peptides using binary tree growth based transformer and structural feature encoding with deep-hybrid learning. Sci Rep. 2025;15(1):565. pmid:39747941
- 23. Ullah M, Akbar S, Raza A, Khan KA, Zou Q. TargetCLP: clathrin proteins prediction combining transformed and evolutionary scale modeling-based multi-view features via weighted feature integration approach. Brief Bioinform. 2024;26(1):bbaf026. pmid:39844339
- 24. Akbar S, Raza A, Awan HH, Zou Q, Alghamdi W, Saeed A. pNPs-CapsNet: predicting neuropeptides using protein language models and FastText encoding-based weighted multi-view feature integration with deep capsule neural network. ACS Omega. 2025;10(12):12403–16. pmid:40191328
- 25. Baydili İ, Tasci B, Tasci G. Deep learning-based detection of depression and suicidal tendencies in social media data with feature selection. Behav Sci (Basel). 2025;15(3):352. pmid:40150247
- 26. Hu EJ, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S. LoRA: low-rank adaptation of large language models. arXiv preprint 2021.
- 27. Zhou T, Niu P, Wang X, Sun L, Jin R. One fits all: Universal time series analysis by pretrained lm and specially designed adaptors. arXiv preprint 2023. https://arxiv.org/abs/2311.14782
- 28. Li XL, Liang P. Prefix-tuning: optimizing continuous prompts for generation. arXiv preprint 2021.
- 29. Poth C, Sterz H, Paul I, Purkayastha S, Engländer L, Imhof T. Adapters: a unified library for parameter-efficient and modular transfer learning. arXiv preprint 2023.
- 30. Wu C, Gan Y, Ge Y, Lu Z, Wang J, Feng Y. LLaMA Pro: Progressive LLaMA with Block Expansion. arXiv preprint 2024. https://api.semanticscholar.org/CorpusID:266755997
- 31. Kim T, Kim J, Tae Y, Park C, Choi JH, Choo J. Reversible instance normalization for accurate time-series forecasting against distribution shift. In: International Conference on Learning Representations; 2021.
- 32. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. OpenAI blog. 2019.
- 33. Nie Y, Nguyen NH, Sinthong P, Kalagnanam J. A time series is worth 64 words: long-term forecasting with transformers. arXiv preprint 2022. https://arxiv.org/abs/2211.14730
- 34. Woo G, Liu C, Kumar A, Xiong C, Savarese S, Sahoo D. Unified training of universal time series forecasting transformers. arXiv preprint 2024.
- 35. Wolf A, Swift JB, Swinney HL, Vastano JA. Determining Lyapunov exponents from a time series. Physica D: Nonlinear Phenomena. 1985;16(3):285–317.
- 36. Takens F. Detecting strange attractors in turbulence. In: Dynamical systems and turbulence, Warwick 1980: proceedings of a symposium held at the University of Warwick 1979/80. Springer; 2006. p. 366–81.
- 37. Rosenstein MT, Collins JJ, De Luca CJ. A practical method for calculating largest Lyapunov exponents from small data sets. Physica D: Nonlinear Phenomena. 1993;65(1–2):117–34.
- 38. Eckmann J-P, Ruelle D. Ergodic theory of chaos and strange attractors. Rev Mod Phys. 1985;57(3):617–56.
- 39. Zhu Y, Yu JJ, Zhao X, Wei X, Liang Y. UniTraj: learning a universal trajectory foundation model from billion-scale worldwide traces. arXiv preprint 2024. https://arxiv.org/abs/2411.03859
- 40. Gu A, Dao T. Mamba: linear-time sequence modeling with selective state spaces. arXiv preprint 2024. https://arxiv.org/abs/2312.00752
- 41. Xiong S, Liu S, Tang C, Okubo F, Xiong H, Shimada A. Attention mamba: time series modeling with adaptive pooling acceleration and receptive field enhancements. arXiv preprint 2025. https://arxiv.org/abs/2504.02013