STGAD: Self-temporal generative adversarial framework with transformer attention for unsupervised multivariate time-series anomaly detection and localization

Xiao Liao; Wei Deng; Hongyue Ma; Yihan Mu

doi:10.1371/journal.pone.0349223

Abstract

Unsupervised anomaly detection in multivariate time series is important for maintaining the reliability of complex cyber-physical systems. However, existing methods often face practical challenges in adversarial stability, temporal dependency modeling, and anomaly-score calibration across datasets. We present STGAD, a dual-score generative-adversarial framework for anomaly detection and localization in multivariate time series. STGAD employs a WGAN-GP critic with a Transformer encoder to perform self-temporal modeling of within-window dependencies and cross-variable interactions, and uses a stochastic generator trained under adversarial supervision with sample-level proximity regularization to model normal temporal patterns. During inference, multiple generated candidates are sampled for each input window, and the minimum residual is used as a sample-matching anomaly cue. This residual-based score is fused with the critic-based score after normalization, and final anomaly decisions are produced by distribution-adaptive thresholding. Experiments on five benchmark datasets spanning server monitoring, aerospace telemetry, industrial control, and ECG signals (SMD, SMAP, MSL, SWaT, and MIT-BIH) show that STGAD achieves strong and consistent performance against representative baselines. Ablation and robustness analyses further demonstrate the effectiveness of critic-side temporal modeling, stable adversarial learning, and dual-score fusion in the proposed framework.

Citation: Liao X, Deng W, Ma H, Mu Y (2026) STGAD: Self-temporal generative adversarial framework with transformer attention for unsupervised multivariate time-series anomaly detection and localization. PLoS One 21(5): e0349223. https://doi.org/10.1371/journal.pone.0349223

Editor: Abdul Ahad, Northwestern Polytechnical University School of Software and Microelectronics, CHINA

Received: October 16, 2025; Accepted: April 27, 2026; Published: May 21, 2026

Copyright: © 2026 Liao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Data were obtained from third-party public repositories and controlled-access sources. The SMD (Server Machine Dataset) is available at the OmniAnomaly repository: https://github.com/NetManAIOps/OmniAnomaly. The SMAP and MSL (NASA Telemetry Datasets) are available at the NASA telemetry repository: https://s3-us-west-2.amazonaws.com/telemanom/data.zip. The SWaT (Secure Water Treatment Dataset) is available upon request from iTrust, Singapore University of Technology and Design, at: https://itrust.sutd.edu.sg/test-beds/secure-water-treatment-swat/. The MIT-BIH Arrhythmia Database is available from PhysioNet at: https://physionet.org/content/mitdb/1.0.0/. The code implementation of STGAD is publicly available at: https://github.com/shy3119/STGAD.git.

Funding: This work was supported in part by the State Grid Information and Telecommunication Group Co., Ltd., which coordinates scientific and technological projects (SGIT0000XTJS2401078). There was no additional external funding received for this study. The funder provided support in the form of salaries for authors X.L., W.D. and H.M., but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: STGAD, Self-Temporal Generative Adversarial Detector; MTS-AD, multivariate time-series anomaly detection; GAN, Generative Adversarial Network; WGAN-GP, Wasserstein GAN with Gradient Penalty; MLP, Multi-Layer Perceptron; RNN, Recurrent Neural Network; CNN, Convolutional Neural Network; VAE, Variational Autoencoder; ELBO, Evidence Lower Bound; GNN, Graph Neural Network; SMD, Server Machine Dataset; SMAP, Soil Moisture Active Passive (NASA dataset); MSL, Mars Science Laboratory (NASA telemetry dataset); SWaT, Secure Water Treatment (testbed dataset); MIT-BIH, MIT-BIH Arrhythmia dataset; AUC, Area Under the Receiver Operating Characteristic Curve; PR-AUC, Area Under the Precision–Recall Curve; P, Precision; R, Recall; F1, F1-score (harmonic mean of Precision and Recall); POT, Peak-Over-Threshold; CPU, Central Processing Unit; ECG, Electrocardiogram

1. Introduction

Modern IT operations and industrial IoT ecosystems continuously generate massive, high-dimensional, and strongly correlated streams of sensor readings and logs at sub-second cadence to monitor system health, service performance, and security risk in large-scale infrastructures [1,2]. Traditionally, engineers and data-mining practitioners have relied on rule bases, statistical control charts, and expert thresholds to flag deviations from nominal behavior and issue fault reports—artifacts that underpin reactive fault tolerance and robust database/system design workflows [3,4]. As cloud-native platforms, edge–cloud pipelines, and automated production lines scale, the dimensionality, sampling rates, and throughput of telemetry increase sharply, while anomaly manifestations diversify (point, contextual, collective; abrupt and gradual), making single rules or static statistics insufficient to meet simultaneous requirements of low miss rate and low false alarm in online monitoring [1,5]. The maturation of big-data analytics and deep learning has therefore brought renewed attention—across data mining and AIOps communities—to the problem of automatically discovering departures from normal mechanisms under weak or absent labels in high-dimensional time series. In Industry-4.0 manufacturing and process control, scenarios centered on service reliability and autonomous fault management are particularly salient: unified modeling of sensor streams and event logs enables early detection of incipient anomalies, triggering failover and self-healing to reduce downtime and maintenance cost [5–7]. This problem is commonly formulated as multivariate time-series anomaly detection (MTS-AD): identifying observations or segments that deviate from expected temporal evolution in high-dimensional, non-stationary sequences that may exhibit long-range dependencies [8,9]. 
Leveraging scalable computation and streaming frameworks, data-driven sectors—including distributed computing, IoT/vehicular systems, robotics and process industries, as well as urban infrastructure for energy and transportation—are increasingly adopting machine-learning-based unsupervised/self-supervised methods to improve alert timeliness and accuracy: one line learns normal behavior and measures deviation via reconstruction/prediction residuals; another performs explicit or implicit density modeling to estimate sample-level likelihood; still others employ adversarial or representation learning to better capture complex inter-variable structure and weak-magnitude anomalies [10,11]. This shift from expert-driven heuristics to data-driven modeling provides a practical pathway toward predictive maintenance and strengthened security in large-scale operational systems [12–14].

As modern telemetry scales across devices and services, multivariate time-series anomaly detection is becoming increasingly difficult due to higher dimensionality, richer modalities, and greater volatility. Contemporary IoT/IT platforms continually add sensors and microservices, inflating correlation structure and sampling heterogeneity, which in turn raises the data requirements for reliable inference and robust calibration. Meanwhile, privacy constraints and the rise of federated/geo-distributed training make cross-site synchronization costly, yielding fragmented, non-IID local datasets that hinder generalizable representation learning. Because these time series originate from engineered systems interacting with humans and environments, observations exhibit both stochastic variability and structured temporal dynamics; models must disentangle random fluctuations from genuine deviations under non-stationarity, concept drift, long-range dependencies, and time-varying inter-variable couplings [11,15]. Moreover, label scarcity and anomaly diversity limit the direct use of supervised paradigms that succeed elsewhere in data mining, while contamination of the “normal” set by a small number of anomalies weakens unsupervised objectives and thresholding [16,17]. Finally, practical value hinges not only on detecting anomalies but also on root-cause localization—identifying which variables, components, or subsystems drive failures—turning the task into joint detection and attribution under distribution shift and sparse feedback, which further complicates methodology and evaluation [18,19].

Existing approaches to multivariate time-series anomaly detection (MTS-AD) fall into several families. Statistical/signal-processing methods (control charts, change-point tests, ARIMA/Kalman filtering, and spectral/wavelet analysis) offer lightweight, interpretable pipelines grounded in explicit stochastic assumptions, yet struggle with high-dimensional nonlinearity and time-varying couplings [20–23]. Classical machine-learning techniques (e.g., OC-SVM, Isolation Forest, LOF and streaming variants) quantify outliers via density/distance or margin-based criteria, working well in moderate dimensions or weakly temporal settings but lacking expressivity for long-range dependencies and inter-variable structure [24–27]. Deep learning advances have proceeded along two main lines: reconstruction/prediction-driven models (autoencoders/VAEs, TCN/LSTM, Seq2Seq forecasting) that learn nominal dynamics and score deviations via residuals; and probabilistic/generative models (normalizing flows, diffusion, GANs) that estimate sample likelihoods or discriminator confidences from explicit or implicit distributions [11,13,14]. With the rise of attention mechanisms, Transformer-based architectures strengthen long-horizon temporal modeling, while graph neural/attention models encode cross-variable relations using prior or adaptive graphs; hybrid designs combine temporal and graph structure for greater expressivity, and recent engineering AI studies have also highlighted the value of structured relational reasoning for learning from complex heterogeneous data [28–31]. On the decision side, researchers explore multi-score fusion, distribution-adaptive thresholds (e.g., EVT/POT), segment-level post-processing, and calibration to reduce false alarms and improve cross-domain comparability [32,33]. Practicality is further supported by work on online/incremental learning, knowledge distillation/pruning, federated learning, and explainable/causal attribution [34].
Despite these advances, notable gaps remain: robust detection of weak-magnitude and long-period anomalies; reliable score calibration under non-stationarity and distributional shifts; and cross-scenario generalization. Reconstruction/prediction objectives can “absorb” mild anomalies and compress decision margins; generative/adversarial objectives face training stability and score-scale consistency issues; and attention/graph models often rely on windowing or fixed priors that become brittle when transferred.

Building on sequence modeling and adversarial learning, we propose STGAD, a dual-score generative-adversarial framework for unsupervised multivariate time-series anomaly detection. STGAD combines a WGAN-GP critic with a Transformer encoder for within-window temporal modeling and a stochastic generator trained with adversarial and sample-level proximity constraints. During inference, the generator produces multiple candidate windows, and the minimum residual to the observed window is used as a sample-matching anomaly cue. This residual-based signal is fused with the critic-based score after validation-based normalization, and anomaly decisions are obtained through distribution-adaptive thresholding. Extensive experiments on five benchmark datasets show that this design provides strong and consistent performance across heterogeneous scenarios. Overall, STGAD offers an effective combination of temporal expressivity, adversarial stability, score calibration, and practical applicability for anomaly detection and localization in multivariate time series. The remainder of this paper is organized as follows. Section 2 reviews representative studies on multivariate time-series anomaly detection. Section 3 presents the proposed STGAD framework, including the model architecture and the training/inference procedure. Section 4 describes the datasets, preprocessing steps, model configuration, and evaluation protocol. Section 5 reports the comparative results together with ablation, robustness, sensitivity, and interpretability analyses. Finally, Section 6 concludes the paper and outlines directions for future work.

2. Related work

Traditional anomaly detection methods typically model the distribution of time-series data in an unsupervised manner. One major line is clustering (e.g., k-means and regression-style scoring), which identifies “normal” prototypes and flags deviations via distances or within-cluster residuals [35]; another line comprises distance-based schemes (fixed-radius counts, k-NN distance ranks) that trigger alarms when local neighborhoods thin out [36]. In parallel, density models (e.g., kernel density estimation or mixture models) label low-likelihood samples using (log-)probability thresholds [37], while isolation methods (Isolation/half-space trees) recursively partition the feature space so that points that are easier to isolate receive higher anomaly scores [38]. To better exploit temporal structure, signal-transform techniques—most notably wavelet and Hilbert transforms—map sequences into time–frequency or instantaneous amplitude/phase domains, from which robust multiscale features are extracted and fed to the above scorers [39]. A complementary autoregressive forecasting–residual paradigm (AR/ARMA/ARIMA/SARIMA, and ARIMAX when exogenous covariates are available) models short-range autocorrelation and seasonality and then uses one- or multi-step prediction errors as anomaly scores [40]. However, these classical routes generally rely on near-stationarity and weak nonlinearity: in high-order, strongly coupled, and non-stationary multivariate telemetry, distance/density methods become sensitive to metric choice and bandwidths with poor threshold transferability; clustering and density models require frequent re-training under concept drift; transform-domain methods depend on band/scale selection; and linear AR models struggle with long-range dependencies, time-varying inter-variable couplings, and weak-magnitude/contextual/collective anomalies. 
As a result, classical techniques often need to be combined with more expressive temporal representations and calibration strategies in modern operational settings.

For high-dimensional, tightly coupled telemetry, modern MTS-AD models replace hand-crafted scores with learned reconstructions, likelihoods, and relational reasoning. DAGMM [41] couples a deep autoencoder with a latent Gaussian-mixture estimator: the decoder minimizes reconstruction error while the GMM yields an energy score unifying residual and likelihood; however, near-Gaussian latent assumptions can underfit multi-modal normality and hinder cross-dataset calibration. OmniAnomaly [11] employs a stochastic recurrent VAE with temporal priors to capture non-stationary dynamics and uncertainty, detecting via reconstruction probability and KL terms; training/scoring can be sensitive to sequence length and hyperparameters, and mild anomalies may be “absorbed” by the decoder. MAD-GAN [14] uses recurrent generators/discriminators so that discriminator confidence and reconstruction discrepancy jointly expose outliers, yet adversarial optimization can be unstable (e.g., mode collapse) and is often confined to short windows. MSCRED [42] constructs multi-scale signature matrices (variable–variable correlation snapshots) and applies convolutional encoder–decoders to reveal deviations across time and inter-series structure; performance depends on window/scale choices and may miss weak, long-period drifts. USAD [33] adopts a dual-autoencoder scheme trained with complementary reconstruction objectives (and a minimax-style loss on first/second-stage reconstructions) to produce more robust residual scores under contamination; while efficient and architecture-simple, it remains window-based, can still compress weak anomalies into the reconstruction channel, and requires careful threshold calibration across datasets. TranAD [28] leverages a Transformer encoder–decoder with self-conditioning and adversarial training to stabilize long input windows and amplify small residuals; nevertheless, attention is typically bounded to a fixed window and score calibration still needs extra care. 
GDN [30] learns an adaptive variable graph and applies graph attention to quantify node-level deviations from learned relations—well suited to time-varying couplings—yet learned graphs are noise-sensitive, erroneous edges inflate false alarms, and graph construction/inference costs grow with dimensionality. Overall, these models advance expressivity across reconstruction/likelihood, adversarial discrimination, and temporal-graph structure, while leaving open issues in weak-magnitude/long-period detectability, robust calibration, and brittleness to windowing or priors.

Table 1 summarizes the strengths and limitations of mainstream methods: DAGMM performs representation compression via an autoencoder coupled with a Gaussian mixture model but exhibits limited temporal modeling; OmniAnomaly employs a variational recurrent architecture to capture temporal uncertainty, yet is sensitive to noise; MAD-GAN adopts a generative adversarial framework with recurrent units to model nonlinear dynamics, but suffers from training instability; MSCRED extracts multi-scale features using convolutional and recurrent networks at a higher computational cost; TranAD applies Transformer self-attention within local windows, which restricts global context; GDN encodes inter-variable relations with graph neural networks, although it often relies on prior graph structure.

Table 1. Comparison of representative methods in terms of key ideas, advantages, and limitations.

https://doi.org/10.1371/journal.pone.0349223.t001

STGAD addresses these gaps through a dual-score generative-adversarial design for multivariate time-series anomaly detection. The framework combines a WGAN-GP critic with a Transformer encoder to capture within-window temporal dependencies and cross-variable interactions, together with a stochastic generator that provides complementary residual evidence through sample matching at inference time. By fusing the critic-based score and the residual-based score under validation-based normalization and distribution-adaptive thresholding, STGAD enhances sensitivity to weak, gradual, and contextual anomalies while improving score stability across datasets. Overall, the framework provides an effective balance of temporal modeling, adversarial robustness, and practical anomaly scoring without relying on external graph priors.

3. Methodology

This section presents the detailed methodology of the proposed STGAD framework. We begin by formulating the multivariate time-series anomaly detection problem and describing how the data is segmented for training. We then introduce the overall model architecture, including the design of the Generator and Discriminator components. Finally, we outline the training and inference procedures, highlighting the use of WGAN-GP and the dual scoring mechanism for anomaly detection.

3.1. Problem formulation

Let $\mathcal{X} = \{x_1, x_2, \ldots, x_N\}$ be a multivariate time series, where each time step $x_t \in \mathbb{R}^d$ denotes a d-dimensional observation vector collected from a system with multiple sensors. The goal of anomaly detection is to determine whether a given subsequence exhibits behavior that deviates significantly from the normal patterns observed in the historical data.

To capture local temporal dependencies and generate structured inputs for training, we segment the time series into overlapping fixed-length windows of size $w$. Each window is defined as:

$$W_t = [x_t, x_{t+1}, \ldots, x_{t+w-1}] \in \mathbb{R}^{w \times d} \tag{1}$$

The resulting training set is a sequence of windows $\mathcal{W} = \{W_1, W_2, \ldots, W_M\}$, where $M = N - w + 1$ for a unit stride.
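The windowing step above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `sliding_windows` and the parameter names `w` and `stride` are assumptions introduced here and are not taken from the STGAD code base.

```python
from typing import List, Sequence


def sliding_windows(
    series: Sequence[Sequence[float]], w: int, stride: int = 1
) -> List[List[List[float]]]:
    """Cut an (N x d) series into overlapping windows of w consecutive steps."""
    if w <= 0 or stride <= 0:
        raise ValueError("w and stride must be positive")
    # Window starts are 0, stride, 2*stride, ... while a full window still fits.
    return [
        [list(row) for row in series[start:start + w]]
        for start in range(0, len(series) - w + 1, stride)
    ]


# Toy series with N = 6 time steps and d = 2 variables.
series = [[float(t), float(10 * t)] for t in range(6)]
windows = sliding_windows(series, w=3)
```

With a stride of one, a series of 6 steps and a window of 3 steps yields 6 - 3 + 1 = 4 windows, each of shape 3 x 2.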

We assume that the training dataset consists primarily of normal behavior, with negligible or no anomalous data. The model is trained in an unsupervised manner, learning the distribution of normal patterns. During inference, given a test window $x$, the model computes an anomaly score $S(x)$ and compares it to a predefined threshold $\tau$ to determine whether an anomaly has occurred:

$$y = \begin{cases} 1, & S(x) > \tau \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

where $y = 1$ denotes an anomaly and $y = 0$ indicates normal behavior.

Modeling time series at the window granularity enables us to encode temporal evolution together with cross-variable dependencies—capabilities that are crucial for uncovering subtle and complex anomalies in multivariate settings.

3.2. Model architecture

STGAD adopts a dual-score generative–adversarial framework for multivariate time-series anomaly detection. The framework consists of a stochastic Generator and a WGAN-GP-based Discriminator. During training, the Generator synthesizes plausible temporal windows under adversarial supervision together with a sample-level proximity constraint, while the Discriminator learns to distinguish observed windows from generated ones. During inference, anomaly evidence is derived from two complementary signals: a critic-based score from the Discriminator and a sample-matching residual obtained from multiple generated candidates. Fig 1 illustrates the overall architecture and information flow of the framework.

Fig 1. Overall architecture of the proposed STGAD framework.

https://doi.org/10.1371/journal.pone.0349223.g001

Generator Architecture. The Generator in STGAD is a stochastic temporal generator that maps latent noise sequences to plausible multivariate windows. As shown in Fig 2, a sequence of latent vectors is first projected into a shared hidden space, then processed by an LSTM-based temporal encoder to capture sequential dependencies, and finally decoded step by step by a two-layer MLP to produce the multivariate output sequence. In this way, the Generator learns structured temporal dynamics under adversarial training rather than relying on a deterministic autoencoding pathway.

Fig 2. Internal architecture of the Generator in STGAD.

https://doi.org/10.1371/journal.pone.0349223.g002

In addition to the adversarial objective, the Generator is regularized by a sample-level proximity term during training, which constrains generated outputs to remain close to the observed input pattern. As a result, the Generator serves as a stochastic sequence model with sample-level consistency regularization. During inference, STGAD draws multiple generated candidates from noise and uses the minimum residual with respect to the input window as a sample-matching anomaly cue. This residual signal complements the critic-based score from the Discriminator in the final decision process.

Discriminator Architecture. The Discriminator evaluates whether an input window is more consistent with the observed data distribution or with generated samples. As shown in Fig 3, each time step is first projected into a latent representation through a shared linear layer followed by LeakyReLU activation. Sinusoidal positional encodings are then added to preserve temporal order. The resulting sequence is processed by a standard Transformer encoder embedded in the critic, which performs global within-window temporal reasoning through multi-head self-attention. This design enables the Discriminator to capture long-range temporal dependencies and cross-variable interactions before the sequence is summarized by temporal average pooling and mapped to a scalar normality score through a two-layer MLP.

Fig 3. Internal architecture of the Discriminator in STGAD.

https://doi.org/10.1371/journal.pone.0349223.g003
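As a small illustration of the positional-encoding step in the Discriminator, the sketch below builds a sinusoidal table and adds it to per-step embeddings. It assumes the standard formulation from the original Transformer (sine on even dimensions, cosine on odd dimensions, base 10000); the text only states that sinusoidal encodings are used, so the exact variant, the function names, and the toy sizes are assumptions.

```python
import math
from typing import List


def sinusoidal_positions(num_steps: int, d_model: int) -> List[List[float]]:
    """Return a num_steps x d_model table of fixed sinusoidal encodings."""
    table = []
    for pos in range(num_steps):
        row = []
        for j in range(d_model):
            # Dimensions 2i and 2i+1 share the frequency 1 / 10000^(2i / d_model).
            angle = pos / (10000 ** ((2 * (j // 2)) / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(
    embedded: List[List[float]], table: List[List[float]]
) -> List[List[float]]:
    """Add the encoding of each time step to that step's embedding."""
    return [
        [e + p for e, p in zip(step, pos_row)]
        for step, pos_row in zip(embedded, table)
    ]


pe = sinusoidal_positions(num_steps=4, d_model=4)
encoded = add_positions([[0.0] * 4 for _ in range(4)], pe)
```

Because the table depends only on the position index, it can be computed once per window length and reused for every window passed to the critic.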

In summary, STGAD combines a stochastic Generator with sample-level proximity regularization and a Transformer-based critic under WGAN-GP training, producing complementary residual-based and critic-based anomaly evidence for downstream detection and localization.

3.3. Training and inference

STGAD adopts the Wasserstein GAN with Gradient Penalty (WGAN-GP) to improve adversarial stability and mitigate common failure modes such as mode collapse. During training, the Generator and Discriminator are jointly optimized so that the Generator learns plausible temporal windows under adversarial supervision and sample-level proximity regularization, while the Discriminator learns a stable critic for distinguishing observed from generated sequences. During inference, anomaly likelihood is evaluated using two complementary signals: a sample-matching residual obtained from multiple generated candidates and a critic-based score from the Discriminator.

The Discriminator loss combines the score difference between real and generated sequences with a gradient penalty computed from interpolated samples:

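$$\mathcal{L}_D = \mathbb{E}_{z \sim p_z}\big[D(G(z))\big] - \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[D(x)\big] + \lambda\, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big] \tag{3}$$

Here $\hat{x} = \epsilon x + (1 - \epsilon)\, G(z)$ with $\epsilon \sim U[0, 1]$ is a random interpolate between a real and a generated window, and $\lambda$ is the gradient-penalty weight.

The Generator loss is defined as the negative of the Discriminator score on generated samples together with a sample-level proximity term:

$$\mathcal{L}_G = -\mathbb{E}_{z \sim p_z}\big[D(G(z))\big] + \alpha\, \mathbb{E}_{x, z}\big[\lVert G(z) - x \rVert\big] \tag{4}$$

Here $\alpha$ is the proximity weight.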

In Eq. (4), the first term encourages the Generator to produce samples that obtain higher scores from the Discriminator, while the second term introduces a sample-level proximity constraint between the generated sequence and the input window x. This term regularizes the generated trajectory, suppresses excessive deviations, and improves the stability of adversarial optimization.
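To make the structure of the two losses concrete, the following sketch evaluates them for a toy linear critic on flattened windows. It is a hand-worked illustration under stated assumptions rather than the training code: the linear critic stands in for the Transformer critic, the proximity measure is taken to be a mean absolute difference, and all names (`critic_loss`, `generator_loss`, `gp_weight`, `prox_weight`) are hypothetical. A linear critic is convenient because its input gradient equals its weight vector everywhere, so the gradient penalty can be written without automatic differentiation.

```python
import math
from typing import Iterable, List

Vector = List[float]


def mean(values: Iterable[float]) -> float:
    values = list(values)
    return sum(values) / len(values)


def critic(x: Vector, weights: Vector, bias: float) -> float:
    """Toy linear critic D(x) = <weights, x> + bias."""
    return sum(w_i * x_i for w_i, x_i in zip(weights, x)) + bias


def critic_loss(real: List[Vector], fake: List[Vector],
                weights: Vector, bias: float, gp_weight: float) -> float:
    """Score gap (generated minus real) plus a gradient penalty."""
    score_gap = (mean(critic(f, weights, bias) for f in fake)
                 - mean(critic(r, weights, bias) for r in real))
    # For a linear critic the input gradient is `weights` at every point,
    # including every real/generated interpolate.
    grad_norm = math.sqrt(sum(w_i * w_i for w_i in weights))
    return score_gap + gp_weight * (grad_norm - 1.0) ** 2


def generator_loss(real: List[Vector], fake: List[Vector],
                   weights: Vector, bias: float, prox_weight: float) -> float:
    """Negative critic score on generated samples plus a proximity term."""
    adversarial = -mean(critic(f, weights, bias) for f in fake)
    # Assumption: mean absolute difference as the proximity measure.
    proximity = mean(
        mean(abs(f_i - r_i) for f_i, r_i in zip(f, r))
        for f, r in zip(fake, real)
    )
    return adversarial + prox_weight * proximity


real = [[1.0, 1.0], [2.0, 2.0]]
fake = [[0.0, 0.0], [1.0, 1.0]]
unit_weights = [0.6, 0.8]  # gradient norm 1, so the penalty vanishes
loss_d = critic_loss(real, fake, unit_weights, 0.0, gp_weight=10.0)
loss_g = generator_loss(real, fake, unit_weights, 0.0, prox_weight=0.5)
```

In this toy case the critic already has unit gradient norm, so the critic loss reduces to the score gap; rescaling the weights to `[3.0, 4.0]` makes the penalty term dominate, which is the behaviour the gradient penalty is meant to discourage.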

The overall optimization routine follows the standard WGAN-GP schedule in which the Discriminator is updated multiple times per Generator step. The resulting procedure is summarized in Algorithm 1 (Training Procedure of STGAD), which instantiates the losses in Eqs. (3)–(4) and specifies the update loop.

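Algorithm 1. STGAD Training Algorithm (WGAN-GP)

Require: Generator G, Discriminator D; training window set $\mathcal{W}$; batch size B; critic steps n_critic; learning rate $\eta$; gradient-penalty weight $\lambda$; proximity weight $\alpha$; total iterations T

1: Initialize weights of G and D

2: for t=1 to T do

3: for j=1 to n_critic do

4: Sample real batch $\{x_i\}_{i=1}^{B} \subset \mathcal{W}$ and noise $\{z_i\}_{i=1}^{B} \sim p_z$

5: $\tilde{x}_i \leftarrow G(z_i)$

6: $\hat{x}_i \leftarrow \epsilon_i x_i + (1 - \epsilon_i)\, \tilde{x}_i$, with $\epsilon_i \sim U[0, 1]$

7: $\mathrm{GP} \leftarrow \frac{1}{B} \sum_{i=1}^{B} \big(\lVert \nabla_{\hat{x}_i} D(\hat{x}_i) \rVert_2 - 1\big)^2$

8: $\mathcal{L}_D \leftarrow \frac{1}{B} \sum_{i=1}^{B} \big[D(\tilde{x}_i) - D(x_i)\big] + \lambda\, \mathrm{GP}$

9: Update D using Adam on $\mathcal{L}_D$ with step size $\eta$

10: end for

11: Sample real batch $\{x_i\}_{i=1}^{B} \subset \mathcal{W}$ and noise $\{z_i\}_{i=1}^{B} \sim p_z$

12: $\tilde{x}_i \leftarrow G(z_i)$

13: $\mathcal{L}_G \leftarrow -\frac{1}{B} \sum_{i=1}^{B} D(\tilde{x}_i) + \frac{\alpha}{B} \sum_{i=1}^{B} \lVert \tilde{x}_i - x_i \rVert$

14: Update G using Adam on $\mathcal{L}_G$ with step size $\eta$

15: end for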

During inference, STGAD evaluates anomaly likelihood using two complementary signals. First, the Generator produces multiple candidate windows from independent noise draws, and the minimum residual with respect to the observed input window is recorded:

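$$r(x) = \min_{k = 1, \ldots, K} \lVert x - G(z_k) \rVert \tag{5}$$

where $z_1, \ldots, z_K$ denote the $K$ independent noise draws.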

Because the Generator is stochastic, multiple candidates are sampled for each window and the minimum residual is retained as a robust sample-matching score. This score approximates the distance between the observed window and the learned normal temporal manifold, while reducing sensitivity to occasional poor generated samples.
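The sample-matching idea can be illustrated with a stand-in generator. In the sketch below, `toy_generator` is a hypothetical placeholder that emits a fixed normal pattern plus small Gaussian noise (it is not the LSTM-based Generator), and the residual is assumed to be a mean absolute difference over all entries of the window.

```python
import random
from typing import Callable, List

Window = List[List[float]]


def residual(x: Window, candidate: Window) -> float:
    """Mean absolute difference between two windows of equal shape."""
    diffs = [
        abs(a - b)
        for row_x, row_c in zip(x, candidate)
        for a, b in zip(row_x, row_c)
    ]
    return sum(diffs) / len(diffs)


def min_residual(x: Window, generate: Callable[[random.Random], Window],
                 num_draws: int, rng: random.Random) -> float:
    """Smallest residual between x and num_draws generated candidates."""
    return min(residual(x, generate(rng)) for _ in range(num_draws))


NORMAL_PATTERN: Window = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]


def toy_generator(rng: random.Random) -> Window:
    # Hypothetical stand-in: the normal pattern plus small Gaussian noise.
    return [[v + rng.gauss(0.0, 0.1) for v in row] for row in NORMAL_PATTERN]


normal_window = [row[:] for row in NORMAL_PATTERN]
shifted_window = [[v + 5.0 for v in row] for row in NORMAL_PATTERN]

r_normal = min_residual(normal_window, toy_generator, 16, random.Random(0))
r_shifted = min_residual(shifted_window, toy_generator, 16, random.Random(0))
```

A window that follows the normal pattern is matched closely by at least one candidate, whereas the shifted window stays far from every candidate, so its minimum residual remains large. Taking the minimum over more draws can only lower the residual for a given noise stream, which is why a single poor sample does not inflate the score.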

Second, the critic-based score is computed as:

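$$s_{\mathrm{critic}}(x) = -D(x) \tag{6}$$

Since the Discriminator outputs a normality score, negating it assigns higher values to windows that the critic regards as less normal.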

The final anomaly score is obtained by fusing the normalized residual-based and critic-based components:

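$$S(x) = \beta\, \tilde{r}(x) + (1 - \beta)\, \tilde{s}_{\mathrm{critic}}(x) \tag{7}$$

where $\tilde{r}(x)$ and $\tilde{s}_{\mathrm{critic}}(x)$ denote the normalized residual-based and critic-based scores.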

In practice, both per-window scores are min–max normalized on the validation split before fusion. The fusion weight β is selected through validation sensitivity analysis and then fixed during testing. Final anomaly decisions are produced using POT-based thresholding. The inference and dual-score anomaly scoring procedure is summarized in Algorithm 2.
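The normalization and fusion steps can be sketched as follows. The sketch assumes that the fused score is a convex combination with weight β on the residual-based component, and that test-time values falling outside the validation range are clipped to [0, 1]; the clipping rule and all function names are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Stats = Tuple[float, float]


def fit_minmax(validation_scores: Sequence[float]) -> Stats:
    """Record the min and max of a score on the validation split."""
    return min(validation_scores), max(validation_scores)


def apply_minmax(value: float, stats: Stats) -> float:
    """Scale a score with fixed validation statistics."""
    lo, hi = stats
    if hi == lo:
        return 0.0  # degenerate case: constant validation scores
    scaled = (value - lo) / (hi - lo)
    # Assumption: clip test values that leave the validation range.
    return min(1.0, max(0.0, scaled))


def fused_score(r: float, s_critic: float, r_stats: Stats,
                s_stats: Stats, beta: float = 0.5) -> float:
    """Convex combination of the two normalized scores."""
    return (beta * apply_minmax(r, r_stats)
            + (1.0 - beta) * apply_minmax(s_critic, s_stats))


r_stats = fit_minmax([0.0, 1.0, 4.0])      # residual scores on validation
s_stats = fit_minmax([-3.0, -1.0, 1.0])    # critic scores on validation
score = fused_score(2.0, -1.0, r_stats, s_stats)
```

Both example inputs sit at the midpoint of their validation ranges, so the fused score is 0.5 for any β. Fixing the statistics on the validation split keeps the two components on a comparable scale without looking at test data.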

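Algorithm 2. STGAD Inference and Dual-Score Anomaly Scoring

Require: Trained G, D; test window x; number of draws K; fusion weight $\beta$; validation statistics for min–max normalization; threshold $\tau$

1: for k=1 to K do

2: Sample $z_k \sim p_z$

3: $\tilde{x}_k \leftarrow G(z_k)$

4: $r_k \leftarrow \lVert x - \tilde{x}_k \rVert$

5: end for

6: $r(x) \leftarrow \min_{k} r_k$ ▷ (sample-matching residual)

7: $s_{\mathrm{critic}}(x) \leftarrow -D(x)$ ▷ (critic-based score)

8: Normalize r(x) and s_critic(x) to [0,1] using validation statistics

9: Obtain normalized scores $\tilde{r}(x)$ and $\tilde{s}_{\mathrm{critic}}(x)$

10: $S(x) \leftarrow \beta\, \tilde{r}(x) + (1 - \beta)\, \tilde{s}_{\mathrm{critic}}(x)$

11: if $S(x) > \tau$ then

12: Label x as anomalous

13: else

14: Label x as normal

15: end if

16: return S(x) and the predicted label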

Both the Generator and Discriminator are optimized using the Adam optimizer (learning rate $\eta = 0.001$, with momentum coefficients $\beta_1$ and $\beta_2$), with batch size 64. The time-series data are segmented into overlapping windows and reshaped into $w \times d$ sequences, where $w$ is the number of time steps per window and $d$ is the number of variables. Gradient clipping and early stopping are applied to ensure training stability and generalization. The model is implemented in PyTorch and supports GPU acceleration.

4. Experimental setup

This section details the experimental protocol used to assess STGAD’s effectiveness. We first present the benchmark datasets together with their preprocessing procedures, then specify the windowing scheme and model hyperparameters, and finally describe the evaluation metrics and the baseline comparison configuration so that results remain fair and reproducible.

4.1. Datasets and preprocessing

We assess STGAD on five widely used multivariate time-series anomaly benchmarks: SMD, SMAP, MSL, SWaT, and MIT-BIH. These corpora cover varied application areas—cloud operations, industrial control, aerospace telemetry, and physiological signal analysis. Brief summaries are provided below:

SMD (Server Machine Dataset) [43]: Multivariate system metrics from 28 cloud servers, each with synthetic anomalies. Features include CPU, memory, and network load.

SMAP & MSL (NASA Telemetry) [44]: Satellite and rover telemetry data with labeled anomalies due to system faults. The data contain multiple correlated physical channels.

SWaT [45]: A cyber-physical dataset collected from an industrial control system testbed for water treatment. It includes attack scenarios labeled as anomalies.

MIT-BIH (MIT-BIH Arrhythmia) [46]: ECG recordings annotated with cardiac arrhythmias, converted into multivariate time series to test sensitivity to periodic yet irregular biological signals.

All datasets are standardized with per-variable z-score normalization computed exclusively from the training split; the resulting means and standard deviations are then applied unchanged to validation and test data. When an official train/test split is available, we adopt it verbatim; otherwise, we use a chronological split in which the first 80% of each series constitutes the training portion (assumed normal) and the remaining 20% serves as the test set. The anomaly labels in our benchmarks arise either from naturally occurring events or from controlled synthetic injections designed to probe robustness; together, these cases cover both realistic operational drifts and stress scenarios.
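The leakage-free standardization described above can be sketched as below: statistics are fitted on the training rows only and then applied unchanged to any other split. The use of the population standard deviation and the fallback of 1.0 for constant channels are assumptions of this sketch, as are the function names.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def fit_zscore(train_rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-variable mean and standard deviation from the training split only."""
    columns = list(zip(*train_rows))
    means = [fmean(col) for col in columns]
    # Assumption: constant channels get a unit scale to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_zscore(rows: Sequence[Sequence[float]],
                 means: Sequence[float],
                 stds: Sequence[float]) -> List[List[float]]:
    """Standardize any split with the fixed training statistics."""
    return [
        [(v - m) / s for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


train = [[1.0, 10.0], [3.0, 30.0]]
test = [[5.0, 20.0]]
means, stds = fit_zscore(train)
test_scaled = apply_zscore(test, means, stds)
```

Here the training means are 2 and 20 and the standard deviations 1 and 10, so the test row maps to 3 and 0 standard deviations from the training mean for the two variables.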

For window-level evaluation, a window is marked anomalous if any timestamp within the window overlaps a ground-truth anomaly segment, consistent with the point-adjusted protocol in Section 4.3. To ensure reproducibility, no resampling is performed, and each dataset’s native sampling rate and channel configuration are preserved throughout preprocessing and evaluation.
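The window-labeling rule amounts to an "any" reduction over the point labels covered by each window. A minimal sketch, with the hypothetical helper `window_labels` and a unit stride by default:

```python
from typing import List, Sequence


def window_labels(point_labels: Sequence[int], w: int, stride: int = 1) -> List[int]:
    """Label a window 1 if any timestamp inside it is anomalous, else 0."""
    return [
        int(any(point_labels[start:start + w]))
        for start in range(0, len(point_labels) - w + 1, stride)
    ]


point_labels = [0, 0, 0, 1, 0, 0, 0, 0]
labels = window_labels(point_labels, w=3)
```

A single anomalous timestamp at index 3 marks the three windows that cover it (those starting at indices 1, 2, and 3), which shows how overlapping windows spread one point label over several window labels.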

4.2. Windowing and model configuration

Fig 4 summarizes the end-to-end procedure used in our experiments, from preprocessing to inference and score fusion. We segment each multivariate time series into overlapping windows, with window length $w$ and stride $s$. This dense windowing scheme increases the effective number of training samples and allows the model to observe short-period regularities and typical lead–lag relationships across variables in a consistent manner.

Fig 4. Windowing and fusion procedure.

https://doi.org/10.1371/journal.pone.0349223.g004

We choose L = 5 as the default setting because short-to-moderate windows provide the best balance between temporal context and anomaly localization. In our setting, shorter windows preserve localized anomaly signatures and reduce the risk that weak or brief abnormal patterns are diluted by excessive surrounding context. At the same time, they keep the within-window variability and computational cost manageable. This choice is also supported by the sensitivity analysis in Section 5.4, where short-to-moderate window lengths consistently provide a strong accuracy–efficiency trade-off across datasets. Unless otherwise stated, the same window length and stride are used for all datasets to maintain a unified experimental protocol.

During inference, the residual-based score and the critic-based score are min–max normalized using statistics computed on the validation split; these normalization statistics remain fixed at test time. The two normalized scores are then fused with equal weighting, i.e., a weight of 0.5 for the residual-based component and 0.5 for the critic-based component (β = 0.5). The fusion weight is selected based on validation sensitivity analysis and kept unchanged during testing. After fusion, the decision threshold is determined on the validation split using POT and then applied unchanged to the test split to produce timeline-level anomaly decisions.
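
The normalization and fusion step can be illustrated with a few lines of NumPy. This is a hedged sketch: the helper names and toy scores are invented here, both scores are assumed to be oriented so that larger means more anomalous, and which component β multiplies is an assumption (with β = 0.5 the two readings coincide).

```python
import numpy as np

def fit_minmax(val_scores):
    """Min and max of a score on the validation split; frozen afterwards."""
    val_scores = np.asarray(val_scores, dtype=float)
    return float(val_scores.min()), float(val_scores.max())

def apply_minmax(scores, lo, hi, eps=1e-12):
    """Scale with fixed validation statistics; test values may leave [0, 1]."""
    return (np.asarray(scores, dtype=float) - lo) / max(hi - lo, eps)

def fuse(residual_norm, critic_norm, beta=0.5):
    """Convex combination of the two cues; beta on the residual term is assumed."""
    return beta * residual_norm + (1.0 - beta) * critic_norm

# Toy validation and test scores (illustrative values only).
val_residual, val_critic = [0.0, 2.0, 4.0], [-1.0, 0.0, 1.0]
test_residual, test_critic = [1.0, 6.0], [0.5, 1.0]

r_lo, r_hi = fit_minmax(val_residual)
c_lo, c_hi = fit_minmax(val_critic)
fused = fuse(apply_minmax(test_residual, r_lo, r_hi),
             apply_minmax(test_critic, c_lo, c_hi))
```

Note that the second test window has a normalized residual above 1, since its raw residual exceeds the validation maximum; the fixed statistics make such out-of-range values visible rather than clipping them.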

All remaining hyperparameters are kept fixed unless a dataset imposes a documented domain-specific constraint. We train the model using Adam with a learning rate of 0.001 and a batch size of 64, together with gradient clipping and early stopping based on validation performance. Implementation details of adversarial optimization and update scheduling are reported in Section 3 and are not repeated here.

4.3. Evaluation protocol

We assess STGAD and all baselines with standard anomaly-detection criteria—Precision, Recall, F1-score, and ROC-AUC. A point-adjusted protocol is employed: an anomalous episode counts as detected if any timestamp within its span is flagged. For datasets with pronounced class imbalance, we additionally report Precision–Recall AUC (PR-AUC) or a Composite F1-score. To bolster reliability, each metric is averaged across multiple independent runs.
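
For readers unfamiliar with the point-adjusted protocol, the following minimal sketch shows one common way to implement it: if any timestamp inside a ground-truth anomalous segment is flagged, the whole segment is counted as detected before Precision, Recall, and F1 are computed. The function names and toy labels are illustrative assumptions.

```python
import numpy as np

def point_adjust(pred, truth):
    """Expand each hit inside a true anomalous segment to cover the whole segment."""
    pred = np.asarray(pred).astype(bool).copy()
    truth = np.asarray(truth).astype(bool)
    # Locate segment boundaries via the discrete difference of the padded labels.
    edges = np.diff(np.concatenate(([0], truth.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)   # exclusive end indices
    for s, e in zip(starts, ends):
        if pred[s:e].any():
            pred[s:e] = True
    return pred

def precision_recall_f1(pred, truth):
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

truth = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
pred = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
adjusted = point_adjust(pred, truth)
precision, recall, f1 = precision_recall_f1(adjusted, truth)
```

In this toy case the first segment is credited in full after a single hit, the second segment is missed, and the isolated false alarm still counts against precision.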

For fair benchmarking, we evaluate all baseline methods—DAGMM, USAD, OmniAnomaly, MAD-GAN, and TranAD—in an identical unsupervised setting. This means all models are trained solely on normal data, employing identical windowing preprocessing and applying uniform scoring rules consistent with STGAD. Furthermore, we use each baseline’s original open-source implementation with its default hyperparameter settings (as reported in the respective papers), without any additional tuning. Thresholding is applied uniformly to all methods. For each model, we fit the POT threshold on validation anomaly scores and then apply the resulting threshold unchanged to the test split. This protocol avoids using test labels for threshold selection and ensures a consistent, fair comparison across STGAD and all baselines.
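
The validation-only thresholding step can be sketched as a simplified peaks-over-threshold (POT) procedure: excesses over a high initial quantile are fitted with a generalized Pareto distribution (GPD), and the final threshold is extrapolated to a target risk level. Several points are assumptions made for this illustration: the initial quantile (0.98), the risk level (1e-3), the function name, and the use of method-of-moments GPD estimates in place of the maximum-likelihood fit typically used in POT implementations.

```python
import numpy as np

def pot_threshold(val_scores, init_quantile=0.98, risk=1e-3):
    """Simplified POT threshold from validation scores (moment-based GPD fit)."""
    scores = np.asarray(val_scores, dtype=float)
    n = scores.size
    t = float(np.quantile(scores, init_quantile))  # initial high threshold
    excess = scores[scores > t] - t                # peaks over the initial threshold
    n_t = excess.size
    if n_t < 2 or excess.var() == 0.0:
        return t                                   # too few peaks to fit a tail
    m, v = excess.mean(), excess.var()
    ratio = m * m / v
    gamma = 0.5 * (1.0 - ratio)                    # GPD shape (method of moments)
    sigma = 0.5 * m * (ratio + 1.0)                # GPD scale (method of moments)
    x = risk * n / n_t
    if abs(gamma) < 1e-6:                          # exponential-tail limit
        return float(t - sigma * np.log(x))
    return float(t + sigma / gamma * (x ** (-gamma) - 1.0))

rng = np.random.default_rng(0)
val_scores = rng.exponential(scale=1.0, size=5000)   # stand-in validation scores
test_scores = rng.exponential(scale=1.0, size=1000)  # stand-in test scores
threshold = pot_threshold(val_scores)
test_flags = test_scores > threshold  # threshold is frozen; no test labels are used
```

The key property for the protocol above is that `threshold` depends only on validation scores, so the same function can be applied to STGAD and to every baseline without touching test labels.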

5. Results and analysis

This section summarizes the empirical results of STGAD. We first present benchmark performance and efficiency comparisons, followed by ablation and stability analyses. We then report robustness, sensitivity, and interpretability results to further characterize the model’s behavior in practical settings.

5.1. Comparative evaluation with baseline methods

To evaluate the effectiveness of the proposed STGAD model comprehensively, we compare its performance with five representative baseline methods: TranAD, USAD, OmniAnomaly, DAGMM, and MAD-GAN. Experiments are conducted on five benchmark multivariate time-series anomaly detection datasets: MIT-BIH, SMAP, MSL, SWaT, and SMD. We report four standard metrics—Precision (P), Recall (R), F1-score, and Area Under the ROC Curve (AUC)—to provide a comprehensive assessment of detection accuracy and robustness. The performance comparison across the five datasets is summarized in Table 2.

Table 2. Performance comparison of STGAD and baseline methods across multiple datasets.

https://doi.org/10.1371/journal.pone.0349223.t002

As summarized in Table 2, STGAD achieves the best or near-best results across the five datasets, with consistently strong F1 and AUC. The improvements are most evident on MIT-BIH, MSL, and SMD, while on SWaT (highly imbalanced and noisy), all methods exhibit lower recall and STGAD remains competitive.

In addition to detection accuracy, we report computational cost in Tables 3 and 4 to complement the practical evaluation.

Table 3. Training time (seconds) for 5 epochs.

https://doi.org/10.1371/journal.pone.0349223.t003

Table 4. STGAD inference throughput and latency.

https://doi.org/10.1371/journal.pone.0349223.t004

Table 3 summarizes the wall-clock training time of each method for five epochs under the same experimental environment. Overall, STGAD exhibits competitive training efficiency relative to several reconstruction-based and probabilistic baselines, while remaining substantially faster than heavier generative models such as OmniAnomaly on larger datasets.

Table 4 reports the inference throughput and latency of STGAD on an RTX 4060 Laptop GPU under the default sampling setting used in our experiments. Across datasets, the model processes approximately 793–861 windows per second, corresponding to about 1.16–1.26 ms per window. These results indicate that STGAD is suitable for offline analysis and near-real-time monitoring scenarios. Since the residual branch relies on sample matching over multiple generated candidates, the inference cost increases approximately linearly with the sampling budget N. In practice, this means that a moderate N offers a favorable balance between residual robustness and latency, while stricter real-time deployments may reduce N to satisfy tighter timing constraints.

5.2. Ablation analysis

To further evaluate the contribution of each component in STGAD, we conduct an ablation study focusing on three key design factors: the attention-based temporal interaction module in the critic, the Transformer encoder used for within-window temporal modeling, and the WGAN-GP training objective. Accordingly, we construct three variant models for comparison.

w/o Self-Attn: removes the multi-head self-attention operation from the critic while retaining the remaining projection and scoring layers.

w/o Transformer: replaces the Transformer encoder in the critic with a simplified single-layer MLP, thereby removing structured within-window temporal modeling.

w/o WGAN-GP: replaces the WGAN-GP objective with the standard GAN loss, in order to assess the effect of Wasserstein training with gradient penalty on optimization stability and detection performance.

For brevity, Table 5 uses the abbreviations w/o Attn for w/o Self-Attn, w/o Trans for w/o Transformer, and VGAN for the variant without WGAN-GP. Each variant is evaluated across the five benchmark datasets (MIT-BIH, SMAP, MSL, SWaT, and SMD). Table 5 presents the comparative results across Precision (P), Recall (R), ROC-AUC, and F1-score metrics.

Table 5. Ablation study comparing STGAD and its variant models on multiple datasets.

https://doi.org/10.1371/journal.pone.0349223.t005

Table 5 shows that removing either attention or the temporal encoder reduces performance, confirming that both cross-time modeling and within-window temporal encoding are beneficial.

The effect of removing the temporal encoder is dataset-dependent and is particularly pronounced on SMD, where F1 drops substantially, indicating that richer temporal encoding is important for complex telemetry.

Replacing WGAN-GP with a vanilla GAN objective (VGAN) leads to severe performance degradation on multiple datasets, often collapsing to near-zero F1, highlighting the importance of stable adversarial training.

In addition to the endpoint metrics in Table 5, Fig 5 and Table 6 provide training-stability evidence for the effect of WGAN-GP. Under the standard GAN objective (STGAD w/o WGAN-GP), the discriminator shows clear saturation behavior, whereas the WGAN-GP variant maintains a well-controlled gradient penalty throughout training.

Table 6. Training-loss variability (std) under standard GAN vs WGAN-GP.

https://doi.org/10.1371/journal.pone.0349223.t006

Fig 5. Representative training stability evidence on MSL.

https://doi.org/10.1371/journal.pone.0349223.g005

Consistently, Table 6 shows that WGAN-GP substantially reduces the variability of discriminator loss across datasets, indicating smoother and more stable adversarial optimization. Together, these results support that WGAN-GP is an important component for reliable training in STGAD.

5.3. Robustness analysis

To evaluate robustness under sensor noise, we conduct Gaussian noise injection experiments by adding zero-mean Gaussian noise to the input time series during training. We consider two noise intensities, σ = 0.1 and σ = 0.25, and evaluate all methods under the same noise settings for a fair comparison on five benchmark datasets (MIT-BIH, SMAP, MSL, SWaT, and SMD). We report AUC and F1-score to assess both ranking quality and detection accuracy under noise in Table 7.
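
A minimal sketch of the noise-injection step is given below. It assumes the noise is added to the z-score-normalized series, so that σ is expressed in units of each variable's training standard deviation; the paper does not state this explicitly, and the function name and synthetic data are illustrative.

```python
import numpy as np

def add_gaussian_noise(x, sigma, rng):
    """Add i.i.d. zero-mean Gaussian noise with standard deviation sigma."""
    return x + rng.normal(loc=0.0, scale=sigma, size=x.shape)

rng = np.random.default_rng(0)
clean = rng.normal(size=(10000, 3))  # stand-in for a normalized (T, D) series
noisy = {sigma: add_gaussian_noise(clean, sigma, rng) for sigma in (0.1, 0.25)}
```

Using a seeded generator keeps the perturbation identical across methods, which is what a fair comparison under the same noise settings requires.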

Table 7. Robustness Comparison under Gaussian Noise (AUC/F1).

https://doi.org/10.1371/journal.pone.0349223.t007

While most methods perform well at σ = 0.1, several baselines degrade noticeably when noise increases to σ = 0.25. In contrast, STGAD shows comparatively stable performance across the two noise levels on all datasets, maintaining strong AUC and F1. Overall, the robustness of STGAD is consistent with its design of combining complementary signals and modeling temporal structure within windows, which reduces sensitivity to noise perturbations.

5.4. Sensitivity analysis

We investigate the sensitivity of STGAD to three key hyperparameters that affect the final detection behavior and efficiency: (i) the fusion weight β for combining critic-based and residual-based anomaly cues, (ii) the number of generator samples N used to compute the sample-matching residual, and (iii) the window length L that determines the temporal context seen by the temporal encoder and critic. Unless otherwise specified, we vary one factor at a time while keeping the remaining settings fixed, and report the F1-score trends across datasets. The corresponding results are summarized in Fig 6.

Fig 6. Sensitivity analysis of STGAD to key hyperparameters (β, N, and window length L).

https://doi.org/10.1371/journal.pone.0349223.g006

Overall, STGAD shows stable performance over a broad range of configurations, and the trends are consistent across datasets. First, the fusion weight β mainly controls the relative reliance on critic evidence versus sample-matching residuals. We observe that extremely small β is more likely to underperform, while mid-to-high β typically yields strong and stable results. This indicates that the residual term provides a reliable complementary signal, and combining the two cues is generally more robust than relying on either alone.

Second, increasing the number of generator samples N helps mitigate generation stochasticity and reduces the chance that the residual-based cue is dominated by occasional poor samples. In practice, performance usually improves when moving from very small N to a moderate range, after which the gains gradually saturate. Since the computational cost grows approximately linearly with N, the choice of N should be matched to the deployment budget. For offline analysis and post-event diagnosis, a moderate sampling budget is preferable because it provides a more stable residual estimate and stronger detection robustness. For near-real-time monitoring, the default setting offers a practical balance between residual stability and latency. For stricter real-time scenarios, N can be reduced to meet tighter latency constraints, with the understanding that this may slightly weaken the robustness of the sample-matching residual. Therefore, N serves as a controllable knob for trading residual stability against inference cost under different operational requirements.
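
The role of N can be made concrete with a small sketch of the sample-matching residual: N candidates are drawn from the generator, and the smallest residual to the input window is kept. Everything below is illustrative: `toy_generator` is a hypothetical stand-in for the trained generator, the latent dimension is arbitrary, and mean squared error is an assumed residual measure.

```python
import numpy as np

def min_residual_score(window, generator, n_samples=10, latent_dim=8, rng=None):
    """Minimum residual between a (L, D) window and n_samples generated candidates."""
    if rng is None:
        rng = np.random.default_rng()
    z = rng.normal(size=(n_samples, latent_dim))   # one latent draw per candidate
    candidates = generator(z)                      # expected shape: (n_samples, L, D)
    residuals = np.mean((candidates - window) ** 2, axis=(1, 2))
    return float(residuals.min())                  # best-matching candidate wins

def toy_generator(z, length=5, dims=3):
    """Hypothetical generator: emits near-zero constant windows (the 'normal' pattern)."""
    return np.tile(0.1 * z[:, :1, None], (1, length, dims))

normal_window = np.zeros((5, 3))
anomalous_window = np.full((5, 3), 3.0)
score_normal = min_residual_score(normal_window, toy_generator,
                                  rng=np.random.default_rng(0))
score_anomalous = min_residual_score(anomalous_window, toy_generator,
                                     rng=np.random.default_rng(0))
```

The generator is called once per window with a batch of N latent vectors, so the cost of this branch grows roughly in proportion to N, which matches the latency trade-off discussed above.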

Third, the window length L determines how much temporal context is available for within-window modeling. We find that short-to-moderate window lengths are generally preferable: they preserve localized anomaly signatures while keeping variability and computational cost manageable. In contrast, overly long windows may dilute short abnormal patterns with excessive surrounding context and can introduce unnecessary complexity, which may reduce both detection effectiveness and efficiency.

Based on these observations, we adopt β = 0.5, N = 10, and L = 5 as the default configuration. In particular, the choice L = 5 reflects a practical balance: it preserves localized anomaly signatures, avoids unnecessary context dilution, and consistently provides a strong accuracy–efficiency trade-off across datasets under a unified configuration.

5.5. Interpretability analysis

We assess interpretability primarily on SMAP, a NASA telemetry dataset with strong cross-channel coupling and expert-labeled drift and step anomalies. All analyses are performed on fixed-length windows under the same training-split z-score normalization used in the main experiments. For visualization, each anomalous window is paired with a nearest-normal reference obtained by averaging its K closest normal windows. Explanations are produced using two complementary views: (i) a temporal attention map that highlights influential time steps, and (ii) gradient-based feature saliency that ranks influential variables.
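
The nearest-normal reference used for visualization can be sketched as follows. The Euclidean distance on flattened windows, the value of K, and the function name are assumptions made for this illustration; the text above only specifies that the K closest normal windows are averaged.

```python
import numpy as np

def nearest_normal_reference(window, normal_windows, k=5):
    """Average the k normal windows closest to `window` (Euclidean distance assumed)."""
    flat_diff = (normal_windows - window).reshape(len(normal_windows), -1)
    dists = np.linalg.norm(flat_diff, axis=1)
    idx = np.argsort(dists)[:k]
    return normal_windows[idx].mean(axis=0), idx

# Toy pool: ten constant windows with values 0..9, each of shape (5, 2).
normal_windows = np.stack([np.full((5, 2), float(i)) for i in range(10)])
query = np.full((5, 2), 2.2)
reference, neighbor_idx = nearest_normal_reference(query, normal_windows, k=3)
```

Averaging several neighbors rather than taking the single closest window smooths out idiosyncrasies of any one normal window in the overlay plots.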

To relate model saliency to model-agnostic statistical evidence, we compute two simple feature rankings for each window: (i) Max|Δ|, the largest within-window change for each variable, and (ii) Max|z|, the largest standardized deviation for each variable. Agreement is summarized by overlap@5 between the saliency Top-5 variables and the Top-5 variables ranked by these two statistics. We refer to these measures as Saliency–Local overlap@5 (vs. Max|Δ|) and Saliency–Global overlap@5 (vs. Max|z|), respectively. In addition, we summarize decision strength using the score margin (score minus threshold) and the score percentile among all windows.
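
These agreement measures are simple enough to sketch directly. The code below assumes that the window is already z-score normalized with training statistics, that Max|Δ| uses first differences between consecutive timestamps, and that overlap@5 is reported as a fraction in [0, 1]; the function names, the toy window, and the saliency vector are illustrative.

```python
import numpy as np

def max_abs_delta(window):
    """Max|Δ|: largest absolute step between consecutive timestamps, per variable."""
    return np.abs(np.diff(window, axis=0)).max(axis=0)

def max_abs_z(window):
    """Max|z|: largest absolute standardized value per variable (window is z-scored)."""
    return np.abs(window).max(axis=0)

def top_k(values, k=5):
    return set(np.argsort(values)[::-1][:k].tolist())

def overlap_at_k(a, b, k=5):
    """Fraction of shared variables between the two Top-k sets."""
    return len(top_k(a, k) & top_k(b, k)) / k

def score_percentile(score, all_scores):
    """Percentage of windows whose score does not exceed `score`."""
    return 100.0 * float(np.mean(np.asarray(all_scores) <= score))

# Toy window (L=5, D=8): steps on channels 0-3 and 5, constant offset on channel 4.
window = np.zeros((5, 8))
for c in range(4):
    window[3:, c] = 2.0 + 0.1 * c
window[:, 4] = 2.5
window[3:, 5] = 0.5
saliency = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.1, 0.05, 0.01])  # made-up attribution

local_overlap = overlap_at_k(saliency, max_abs_delta(window))
global_overlap = overlap_at_k(saliency, max_abs_z(window))
margin = 0.9 - 0.7                                   # score minus threshold (toy values)
percentile = score_percentile(0.9, [0.1, 0.2, 0.9, 1.0])
```

In this example the constant offset on channel 4 is invisible to Max|Δ| but prominent under Max|z|, which illustrates why the two rankings can disagree on the same window.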

We illustrate interpretability on a true-positive SMAP window by examining attribution along both the temporal and variable axes. As shown in Fig 7A, a small subset of channels dominates the saliency ranking. Several of these variables also appear among the Top-K features ranked by Max|Δ| and Max|z| in Fig 7B, indicating that the model’s attribution is aligned with two intuitive model-agnostic anomaly cues: abrupt local changes and globally rare deviations.

Fig 7. Feature Attribution and Top-K Statistical Deviations (SMAP).

https://doi.org/10.1371/journal.pone.0349223.g007

In Fig 8, we overlay the anomalous window against the nearest-normal reference for the Top-6 salient channels and annotate each subplot with a confidence ribbon. Several structured departures can be observed, including persistent step-like drops, pulse-shaped excursions, and late rebounds that are absent in the normal reference windows. These deviations co-occur with attribution peaks in Fig 7A and with high rankings under Max|Δ| or Max|z| in Fig 7B, providing channel-level evidence that is broadly consistent with the saliency interpretation.

Fig 8. Channel-wise Shape Deviations on SMAP (Top-6 Salient Variables).

https://doi.org/10.1371/journal.pone.0349223.g008

While the SMAP case study provides an intuitive visualization, interpretability should also be examined at the distribution level. We therefore quantify agreement between saliency-ranked variables and statistical cues using overlap@5 over quantile-sampled true-positive windows. Specifically, true-positive windows are defined under the same evaluation protocol as the main experiments, stratified by confidence margin, and sampled across multiple margin quantiles. Table 8 and Table 9 report the median (IQR) overlap values and the number of sampled windows for each dataset.

Table 8. Saliency–Local overlap@5 (vs. Max|Δ|).

https://doi.org/10.1371/journal.pone.0349223.t008

Table 9. Saliency–Global overlap@5 (vs. Max|z|).

https://doi.org/10.1371/journal.pone.0349223.t009

As shown in Table 8 and Table 9, Saliency–Local overlap@5 is consistently high on several datasets, suggesting that abrupt local changes are an important cue captured by the model in these domains. In contrast, Saliency–Global overlap@5 varies more across datasets, reflecting differences in how anomalies manifest. For example, the lower overlap values on SMD suggest that anomalies can be more distributed, weaker, or less attributable to a small set of variables, whereas the near-perfect overlap on SWaT and MIT-BIH indicates that anomalies are often concentrated in a small subset of channels.

We emphasize that these explanations should be interpreted as anomaly localization or attribution cues rather than strict root-cause identification. Operationally, the variable-level localization results serve as diagnostic prioritization cues: once an anomalous window is detected, practitioners can first inspect the top-ranked variables and compare them with known sensor-to-subsystem mappings, control loops, or equipment modules. This can help narrow the troubleshooting scope, prioritize targeted sensor validation and subsystem inspection, and support alarm triage in monitoring centers. The localization output of STGAD is therefore most useful as a decision-support layer that helps experts focus on the most relevant channels before conducting deeper causal or engineering analysis. The agreement statistics, in turn, complement the case-study visualizations and provide a dataset-level view of how consistently the model’s saliency aligns with simple statistical evidence.

In Fig 9, we further visualize the alignment between the predicted anomaly score and the ground-truth labels on SMAP using a small subset of representative channels, including highly attributed channels identified in the case study. In these panels, the normalized time series is shown in blue, the model’s anomaly score in green, purple shading marks the ground-truth anomalous intervals, and orange shading highlights the predicted anomalous intervals.

Fig 9. Qualitative Alignment of Predicted and True Anomalies on SMAP.

https://doi.org/10.1371/journal.pone.0349223.g009

The plots reveal prompt, clearly discernible responses at genuine fault onsets: predicted anomalous spans closely track the labeled intervals with minimal localization error, while scores remain low and steady throughout nominal periods, which suggests resilience to transient noise. Taken together, the SMAP case illustrates precise timing capture and strong temporal alignment, in line with the quantitative results reported earlier.

6. Conclusion

In this paper, we presented STGAD, an unsupervised generative-adversarial framework for multivariate time-series anomaly detection and localization. STGAD integrates a standard Transformer encoder into the critic of a WGAN-GP framework, enabling global within-window temporal modeling under stable adversarial training. This critic-side temporal modeling is complemented by a sample-matching residual derived from a stochastic generator with sample-level proximity regularization, and the two signals are fused for final anomaly scoring.

Experiments on five benchmark datasets—MIT-BIH, SMAP, MSL, SWaT, and SMD—show that STGAD achieves strong and consistent performance across diverse anomaly-detection scenarios. Additional ablation, robustness, sensitivity, and interpretability analyses further support the effectiveness of the overall design, particularly the combination of critic-side temporal modeling, dual-score fusion, and stable adversarial optimization.

From a practical perspective, the results suggest that STGAD provides a favorable balance between detection effectiveness and deployment cost, especially for offline analysis and near-real-time monitoring settings. At the same time, the sensitivity analysis indicates that moderate sampling and short-to-moderate windows are sufficient to capture most of the performance gains without incurring excessive computational overhead. Moreover, the variable-level localization output can serve as a practical prioritization signal for downstream diagnosis, while the sampling budget N can be adjusted according to the latency requirements of different deployment scenarios.

Future work may extend STGAD toward online and incremental learning, multi-resolution temporal modeling, and privacy-preserving or federated deployments. These directions may further improve the applicability of the framework in large-scale cyber-physical monitoring environments.

References

1. Adams C, Alonso L, Atkin B, Banning J, Bhola S, Buskens R, et al. Monarch: Google’s planet-scale in-memory time series database. Proceedings of the VLDB Endowment. 2020;13(12):3181–94.
2. Andersen MP, Culler DE. BTrDB: optimizing storage system design for timeseries processing. 14th USENIX Conference on File and Storage Technologies (FAST 16); 2016.
3. Venkatasubramanian V, Rengaswamy R, Yin K, Kavuri SN. A review of process fault detection and diagnosis: Part I: Quantitative model-based methods. Computers & Chemical Engineering. 2003;27(3):293–311.
4. Mustafa FE, Ahmed I, Basit A, Alvi U-E-H, Malik SH, Mahmood A, et al. A review on effective alarm management systems for industrial process control: Barriers and opportunities. International Journal of Critical Infrastructure Protection. 2023;41:100599.
5. Zamanzadeh Darban Z, Webb GI, Pan S, Aggarwal C, Salehi M. Deep learning for time series anomaly detection: A survey. ACM Comput Surv. 2024;57(1):1–42.
6. Zhong Z, Fan Q, Zhang J, Ma M, Zhang S, Sun Y. A survey of time series anomaly detection methods in the aiops domain. 2023. https://arxiv.org/abs/2308.00393
7. Jiang M, Hou C, Zheng A, Hu X, Han S, Huang H, et al. Weakly supervised anomaly detection: A survey. 2023. https://arxiv.org/abs/2302.04549
8. Singh P, Saman Azari M, Vitale F, Flammini F, Mazzocca N, Caporuscio M, et al. Using log analytics and process mining to enable self-healing in the Internet of Things. Environ Syst Decis. 2022;42(2):234–50.
9. Zonta T, da Costa CA, da Rosa Righi R, de Lima MJ, da Trindade ES, Li GP. Predictive maintenance in the Industry 4.0: A systematic literature review. Computers & Industrial Engineering. 2020;150:106889.
10. Blázquez-García A, Conde A, Mori U, Lozano JA. A Review on outlier/anomaly detection in time series data. ACM Comput Surv. 2021;54(3):1–33.
11. Su Y, Zhao YJ, Niu CH, Liu R, Sun W, Pei D. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD). Anchorage, AK, 2019.
12. Munir M, Siddiqui SA, Dengel A, Ahmed S. DeepAnT: A deep learning approach for unsupervised anomaly detection in time series. IEEE Access. 2019;7:1991–2005.
13. Xu H, Feng Y, Chen J, Wang Z, Qiao H, Chen W, et al. Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications. Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW ’18, 2018. 187–96.
14. Li D, Chen DC, Shi L, Jin BH, Goh J, Ng SK. MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. Tech Univ Munchen, Klinikum Rechts Isar, Munich, GERMANY, 2019.
15. Liu J, Yang D, Zhang K, Gao H, Li J. Anomaly and change point detection for time series with concept drift. World Wide Web. 2023;26(5):3229–52.
16. Goswami M, Challu C, Callot L, Minorics L, Kan A. Unsupervised model selection for time-series anomaly detection. 2022. https://arxiv.org/abs/2210.01078
17. Guo H, Wang Y, Zhang J, Lin Z, Tong Y, Yang L. Label-efficient interactive time-series anomaly detection. 2022. https://arxiv.org/abs/2212.14621
18. Yang W, Zhang K, Hoi SC. A causal approach to detecting multivariate time-series anomalies and root causes. 2022. https://arxiv.org/abs/2206.15033
19. Han X, Zhang L, Wu Y, Yuan S. On root cause localization and anomaly mitigation through causal inference. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023. 699–708.
20. Page ES. Continuous Inspection Schemes. Biometrika. 1954;41(1/2):100.
21. Lowry CA, Woodall WH, Champ CW, Rigdon SE. A multivariate exponentially weighted moving average control chart. Technometrics. 1992;34(1):46.
22. Kalman RE. A new approach to linear filtering and prediction problems. 1960.
23. Mallat SG. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans Pattern Anal Machine Intell. 1989;11(7):674–93.
24. Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC. Estimating the support of a high-dimensional distribution. Neural Comput. 2001;13(7):1443–71. pmid:11440593
25. Breunig MM, Kriegel HP, Ng RT, Sander J. Lof: Identifying density-based local outliers. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 2000.
26. Liu FT, Ting KM, Zhou ZH. Isolation forest. 2008 eighth ieee international conference on data mining; 2008.
27. Tan SC, Ting KM, Liu TF. Fast anomaly detection for streaming data. IJCAI proceedings-international joint conference on artificial intelligence, 2011.
28. Tuli S, Casale G, Jennings NR. TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data. Proceedings of the VLDB Endowment, 2022. 1201–14.
29. Xu J, Wu H, Wang J, Long M. Anomaly transformer: Time series anomaly detection with association discrepancy. 2021.
30. Deng A, Hooi B. Graph neural network-based anomaly detection in multivariate time series. AAAI. 2021;35(5):4027–35.
31. Akhtar MU, Liu J, Xie Z, Cui X, Liu X, Huang B. Multilingual entity alignment by abductive knowledge reasoning on multiple knowledge graphs. Engineering Applications of Artificial Intelligence. 2025;139:109660.
32. Siffer A, Fouque P-A, Termier A, Largouet C. Anomaly Detection in Streams with Extreme Value Theory. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017. 1067–75.
33. Audibert J, Michiardi P, Guyard F, Marti S, Zuluaga MA. Usad: Unsupervised anomaly detection on multivariate time series. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020.
34. González GG, Casas P, Fernández A, Gómez G. Steps towards continual learning in multivariate time-series anomaly detection using variational autoencoders. Proceedings of the 22nd ACM Internet Measurement Conference, 2022. 774–5.
35. Chaovalitwongse WA, Fan Y-J, Sachdeo RC. On the Time Series $K$-Nearest Neighbor Classification of Abnormal Brain Activity. IEEE Trans Syst, Man, Cybern A. 2007;37(6):1005–16.
36. Abid A, Guiloufi AB, Nasri N, Kachouri A, Mahfoudhi A, Abid M. Centralized KNN anomaly detector for WSN. International Multi-Conference on Systems, Signals and Devices (SSD). Mahdia, TUNISIA, 2015.
37. Ester M, Kriegel HP, Sander J, Xiaowei X. A density-based algorithm for discovering clusters in large spatial databases with noise. KDD-96 Proceedings Second International Conference on Knowledge Discovery and Data Mining, 1996. 226–31.
38. Bandaragoda TR, Ting KM, Albrecht D, Liu FT, Zhu Y, Wells JR. Isolation‐based anomaly detection using nearest‐neighbor ensembles. Computational Intelligence. 2018;34(4):968–98.
39. Fujieda S, Takayama K, Hachisuka T. Wavelet convolutional neural networks. 2018. https://arxiv.org/abs/1805.08620
40. Shumway RH, Stoffer DS. ARIMA models. Time series analysis and its applications: with R examples. Springer. 2017. 75–163.
41. Zong B, Song Q, Min MR, Cheng W, Lumezanu C, Cho D-k, et al., editors. Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection. International Conference on Learning Representations; 2018.
42. Zhang CX, Song DJ, Chen YC, Feng XY, Lumezanu C, Cheng W. A deep neural network for unsupervised anomaly detection and diagnosis in multivariate time series data. 33rd AAAI Conference on Artificial Intelligence / 31st Innovative Applications of Artificial Intelligence Conference / 9th AAAI Symposium on Educational Advances in Artificial Intelligence; 2019; Honolulu, HI, 2019.
43. Shi Y, Wang B, Yu Y, Tang X, Huang C, Dong J. Robust anomaly detection for multivariate time series through temporal GCNs and attention-based VAE. Knowledge-Based Systems. 2023;275:110725.
44. Hundman K, Constantinou V, Laporte C, Colwell I, Soderstrom T. Detecting spacecraft anomalies using LSTMs and nonparametric dynamic thresholding. 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD); 2018; London, ENGLAND, 2018.
45. Goh J, Adepu S, Junejo KN, Mathur A. A dataset to support research in the design of secure water treatment systems. 11th International Conference on Critical Information Infrastructures Security (CRITIS); 2016; Paris, FRANCE, 2018.
46. Moody GB, Mark RG. The impact of the MIT-BIH arrhythmia database. IEEE Eng Med Biol Mag. 2001;20(3):45–50. pmid:11446209

