
Unsupervised cross domain adaptive anomaly detection network for Internet of Things traffic

  • Tiange Yuan ,

    Roles Conceptualization, Investigation, Writing – original draft

    tiangeyuan@126.com

    Affiliation Research Institute Of Nuclear Power Operation, China Nuclear Power Operation Technology Corporation, LTD., Wuhan, China

  • Di Zhai,

    Roles Conceptualization, Methodology, Validation, Writing – review & editing

    Affiliation Research Institute Of Nuclear Power Operation, China Nuclear Power Operation Technology Corporation, LTD., Wuhan, China

  • Anchao Li

    Roles Data curation, Software, Writing – review & editing

    Affiliation Research Institute Of Nuclear Power Operation, China Nuclear Power Operation Technology Corporation, LTD., Wuhan, China

Abstract

The rapid growth of networks of connected devices demands robust methods for detecting anomalies in multivariate traffic. Traditional approaches often fail when data distributions shift across environments or when labeled anomalies are scarce. We introduce the Unsupervised Cross-Domain Adaptive Anomaly Detection Network (CDA-ADN). This framework employs a conditional variational sequence encoder with temporal attention to learn domain-invariant representations of traffic sequences. Domain-specific adaptation layers align input and output distributions by applying transformations guided by latent features. A contrastive learning mechanism operating at both local time-step and global sequence levels separates normal patterns from anomalies. Training occurs in two stages. In the first stage, the model learns general normal behavior on a source environment using only normal samples. In the second stage, a very small number of unlabeled normal samples from the target environment are used for lightweight fine-tuning of the adaptation layers, without requiring any labeled anomaly samples in the target domain. Experiments on two benchmark Internet of Things traffic datasets demonstrate that CDA-ADN outperforms autoencoder and variational autoencoder methods by a wide margin in accuracy, Matthews correlation coefficient, and sensitivity under label-scarce conditions. These results confirm the efficacy of the unsupervised cross-domain approach for real-world IoT security.

Introduction

With the increasing deployment of Internet of Things (IoT) devices across industries such as manufacturing, healthcare, and smart infrastructure, the volume of multivariate network traffic data has grown significantly [1]. This data, continuously generated by sensors and interconnected systems, is essential for monitoring system performance, ensuring security, and detecting anomalous behaviors. However, identifying anomalies in IoT traffic remains a challenging task due to the dynamic nature of data, the presence of concept drift, and the scarcity of labeled anomalies [2,3].

Anomaly detection plays a critical role in ensuring the security and reliability of IoT networks by identifying deviations from normal behavior that may indicate cyberattacks, faults, or system failures [4]. Conventional methods, including rule-based techniques and supervised machine learning models, often rely on predefined thresholds or require extensive labeled datasets. These approaches struggle to generalize across different domains, particularly when data distributions shift due to environmental changes or evolving attack patterns [5]. Moreover, training deep learning models for anomaly detection often necessitates large amounts of labeled data, which are difficult to obtain in real-world scenarios [6-8].

Existing anomaly detection methods for IoT traffic can be broadly categorized into rule-based, statistical, and machine learning-based approaches. Rule-based methods rely on predefined thresholds or signatures to flag anomalies, making them effective for known attack patterns but inadequate for evolving threats. Statistical techniques, such as autoregressive models and Gaussian mixture models, attempt to model normal traffic distributions and detect deviations, but they often struggle with high-dimensional, dynamic IoT data [9]. More recently, deep learning-based methods, including Autoencoders (AEs), Long Short-Term Memory (LSTM) networks, and Transformer-based architectures, have gained traction for their ability to learn complex temporal dependencies in multivariate time-series data [10,11].

While deep learning models have demonstrated strong performance in anomaly detection, they typically require large amounts of labeled data for supervised training [12,13]. However, in real-world IoT environments, obtaining sufficient labeled anomaly samples is challenging due to data privacy concerns, the rarity of certain attack types, and the high cost of manual annotation [14]. Additionally, traditional deep learning models trained on a specific dataset often fail to generalize across different domains, as variations in network traffic patterns and device characteristics lead to performance degradation [15]. These limitations highlight the need for an adaptive and unsupervised approach that can effectively detect anomalies in diverse and dynamically changing IoT environments without relying on labeled target domain data.

In this work, we adopt an unsupervised cross-domain anomaly detection setting where no labeled anomaly samples from the target environment are required. CDA-ADN learns general normal behavioral patterns from a source domain and subsequently adapts to a new target domain using only a very small number of unlabeled normal samples. These normal samples are employed solely to calibrate domain-specific components and do not introduce supervised anomaly information. This setting aligns with widely accepted assumptions in unsupervised domain adaptation and one-class anomaly detection, ensuring that the proposed framework remains fully unsupervised with respect to anomaly identification.

To address the challenges of cross-domain anomaly detection in IoT environments, we propose the Cross-Domain Adaptive Anomaly Detection Network (CDA-ADN), an unsupervised framework that integrates domain adaptation with multi-granularity contrastive learning. Unlike traditional approaches, CDA-ADN requires no labeled target-domain data, a setting whose feasibility for IoT security was previously demonstrated in [16]. It leverages a combination of probabilistic latent-space modeling, dynamic feature adaptation, and hybrid contrastive learning to achieve robust anomaly detection across diverse IoT environments. Our main contributions are:

  • Conditional Variational Sequence Encoding: We introduce a GRU-based conditional variational encoder that captures temporal dependencies and learns transferable representations by conditioning on contextual device features. The encoder enforces structured latent distributions through weighted KL divergence and temporal attention, enabling robust generalization across domains.
  • Dynamic Input-Output Adaptation Layers: We design domain-specific adaptation layers that dynamically align feature distributions using latent-guided affine transformations. These layers mitigate domain shifts by normalizing input and output sequences, ensuring consistent feature representations across source and target domains.
  • Multi-Granularity Contrastive Learning: We propose a hybrid contrastive learning framework that aligns features at both local (time-step) and global (sequence) levels. Combined with Wasserstein distance-based domain adaptation, this approach maximizes the separation between normal and anomalous patterns while maintaining domain-invariant feature consistency.
  • Two-Stage Optimization for Domain Adaptation: We propose a two-stage optimization strategy whose second stage is a lightweight fine-tuning step that uses only a minimal set of normal target-domain samples. This fine-tuning relies on no labeled anomalies and adapts only the latent-guided affine transformation layers to the target distribution. The method therefore preserves the unsupervised nature of the framework while enabling effective cross-domain alignment in label-scarce Internet of Things environments.

The rest of this paper is organized into four unnumbered sections. Related Work reviews prior approaches to IoT anomaly detection and domain adaptation. Method introduces the CDA-ADN framework, detailing its architecture, adaptation mechanisms, and training strategy. Experiments presents our evaluation results and comparisons with baseline methods. Finally, Conclusion summarizes the main findings and outlines directions for future research.

Related work

Anomaly detection in multivariate IoT traffic has been extensively studied, evolving from traditional rule-based methods to deep learning-based techniques and transfer learning approaches. Early methods primarily relied on predefined rules and statistical models. Rule-based approaches, such as Snort and Suricata, use known attack signatures or manually set thresholds to detect anomalies [4]. While effective for well-defined attack patterns, these methods fail to detect novel or evolving threats. Statistical models, including autoregressive integrated moving average (ARIMA) [2] and Gaussian mixture models (GMM) [5], attempt to learn normal network behaviors and flag deviations. However, these methods assume stationarity in data distribution and struggle with high-dimensional, dynamic IoT traffic, where concept drift is common [3].

Deep learning techniques have significantly improved anomaly detection by automatically extracting relevant features from raw traffic data [17,18]. Autoencoders and Long Short-Term Memory networks are widely used for time-series anomaly detection, where reconstruction errors indicate deviations from normal patterns [9]. Variational Autoencoders extend AEs by modeling a probabilistic latent space, enhancing robustness to noise and unseen anomalies [6]. More recently, Transformer-based architectures have been introduced to capture long-range dependencies in IoT data, with models such as the Anomaly Transformer [14] leveraging attention mechanisms to distinguish normal from anomalous behaviors. Hybrid models that combine CNNs with LSTMs have also been proposed to integrate spatial and temporal feature learning, achieving promising results on real-world datasets [11]. Despite their advantages, most deep learning methods require large-scale labeled training data, which is difficult to obtain in real-world IoT applications due to the rarity of labeled anomalies and privacy constraints. In parallel, recent surveys have provided comprehensive overviews of deep learning methods for time-series anomaly detection and graph neural networks for time-series analysis, including anomaly detection tasks [19,20]. These works highlight the rapid development of representation learning techniques for time-series anomalies and motivate the need for models that remain robust under distribution shifts across domains.

To mitigate the reliance on labeled data, transfer learning has been explored as a solution for cross-domain anomaly detection. By leveraging knowledge from a well-defined source domain, transfer learning enables adaptation to a target domain with minimal labeled data. Recent studies have applied domain adaptation techniques using adversarial training and feature alignment to bridge distributional differences [15]. More recent work on time-series domain adaptation under feature and label shifts, as well as sensor-level inter-domain alignment for multivariate time series, further emphasizes the importance of learning domain-invariant temporal representations in an unsupervised or label-efficient manner [21,22]. In the context of IoT security, ResADM [16] demonstrates the potential of transfer learning for cyber-physical systems by improving generalization across datasets.

While existing approaches have made notable progress, they are often constrained by their dependence on labeled target data or their inability to generalize across dynamically changing IoT environments. To address these limitations, we propose the Cross-Domain Adaptive Anomaly Detection Network, an unsupervised transfer learning framework that eliminates the need for labeled target domain data while enhancing adaptability across different IoT environments. By integrating contrastive learning with a GRU-based Conditional Variational Autoencoder, CDA-ADN improves feature discrimination between normal and anomalous patterns. Additionally, input-output adaptation layers mitigate domain discrepancies, allowing the model to generalize effectively to unseen domains. Our approach enables few-shot domain adaptation using only a minimal set of normal target domain samples, reducing computational overhead while maintaining high anomaly detection performance. Experimental results on real-world IoT datasets demonstrate that CDA-ADN outperforms existing methods in terms of accuracy, MCC, and sensitivity, making it a promising solution for scalable and adaptive anomaly detection.

Method

We propose the CDA-ADN, an unsupervised framework that integrates domain adaptation with multi-granularity contrastive learning to tackle the challenges of detecting anomalies in multivariate IoT traffic. As shown in Fig 1, CDA-ADN features: (1) a conditional variational sequence encoder with temporal attention for cross-domain feature extraction, (2) dynamic input-output adaptation layers for domain alignment, and (3) a multi-granularity contrastive learning mechanism enhanced with Wasserstein distance-based domain adaptation. The framework learns domain-invariant temporal patterns through a variational encoder-decoder, aligns feature distributions via latent-guided normalization, and enforces consistency across domains at local and global levels using hybrid contrastive learning. Unlike conventional methods requiring labeled target data, CDA-ADN achieves fully unsupervised anomaly detection through latent space regularization with probabilistic constraints, minimal target domain fine-tuning, and contrastive alignment, ensuring adaptability to diverse IoT environments.

Fig 1. Framework of the Cross-Domain Adaptive Anomaly Detection Network (CDA-ADN).

The process begins with multivariate IoT traffic data X as input. The data is segmented into fixed-length sequences before being processed by a GRU-based Conditional Variational Autoencoder to extract latent representations while preserving temporal dependencies. Input-output adaptation layers align feature distributions between the source and target domains, reducing domain discrepancies. Contrastive learning further enhances feature separation, enabling robust anomaly detection based on the learned representations. Anomalies are identified using a threshold on reconstruction error or feature deviations.

https://doi.org/10.1371/journal.pone.0344009.g001

At the beginning of the training pipeline, CDA-ADN performs source-domain learning using only normal traffic sequences, without requiring any labels from the target domain. During the subsequent domain adaptation stage, a very small number of unlabeled normal target-domain samples are used exclusively to calibrate the domain-specific transformation layers. No labeled anomaly data from the target domain are employed at any point, which clearly distinguishes our unsupervised setting from semi-supervised alternatives.

Problem formulation for cross-domain anomaly detection

Anomaly detection in multivariate IoT traffic presents significant challenges due to dynamic variations in network conditions, device heterogeneity, and the absence of labeled anomalies in the target domain. Traditional supervised models struggle to generalize across domains because of distribution shifts, making them ineffective in real-world deployments. To address this issue, we formulate anomaly detection as an unsupervised cross-domain adaptation problem, where the goal is to detect anomalies in an unseen target domain without requiring labeled target data.

We consider a source domain \(\mathcal{D}_S\) and a target domain \(\mathcal{D}_T\), where each domain consists of multivariate time-series sequences:

\[ X = \{x_1, x_2, \ldots, x_T\}, \qquad x_t \in \mathbb{R}^d, \]

where T represents the sequence length, and d denotes the number of features, such as network traffic attributes or sensor readings. The objective is to learn a function \(f: \mathcal{X} \rightarrow \mathcal{Z}\) that transforms sequences into a latent space \(\mathcal{Z}\), where normal and anomalous behaviors are well-separated.

However, due to domain shifts, the probability distributions of the source and target domains may differ:

\[ P_S(X) \neq P_T(X). \]

This distributional discrepancy hinders the direct application of models trained on \(\mathcal{D}_S\) to \(\mathcal{D}_T\). To overcome this limitation, we incorporate input-output adaptation layers to standardize feature representations and minimize domain discrepancies. Additionally, we leverage contrastive learning to enforce similarity among normal samples and maximize separation between normal and anomalous samples.

Anomalies are identified through reconstruction errors and deviations within the latent space. Given an encoded representation h, the model reconstructs the input sequence as \(\hat{X}\), and the reconstruction loss is computed as

\[ \mathcal{L}_{rec} = \frac{1}{T} \sum_{t=1}^{T} \lVert x_t - \hat{x}_t \rVert^2. \]

Higher reconstruction errors indicate deviations from normal traffic patterns, making this a key criterion for anomaly detection. However, relying solely on reconstruction errors may not be sufficient, as certain anomalies may still be reconstructed with low error. To further improve feature discrimination, we introduce multi-granularity contrastive learning, where representations of similar samples are pulled together while dissimilar ones are pushed apart:

\[ \mathcal{L}_{con} = \sum_{(i,j) \in \mathcal{P}} d(z_i, z_j)^2 + \sum_{(i,j) \in \mathcal{N}} \max\big(0,\, m - d(z_i, z_j)\big)^2, \]

where \(\mathcal{P}\) and \(\mathcal{N}\) denote the sets of positive (similar) and negative (dissimilar) pairs, \(d(\cdot,\cdot)\) represents the distance between representations, and m is a margin parameter.
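As a concrete illustration, a margin-based contrastive objective of this kind can be sketched in NumPy. The Euclidean distance and the explicit pair lists are simplifying assumptions for exposition, not the paper's exact implementation:

```python
import numpy as np

def contrastive_loss(z, pos_pairs, neg_pairs, margin=0.5):
    """Margin-based contrastive loss over latent vectors z of shape (n, d).

    Positive pairs are pulled together (squared distance); negative
    pairs are pushed apart until they exceed the margin m.
    """
    def dist(i, j):
        return np.linalg.norm(z[i] - z[j])

    pos = sum(dist(i, j) ** 2 for i, j in pos_pairs)
    neg = sum(max(0.0, margin - dist(i, j)) ** 2 for i, j in neg_pairs)
    return (pos + neg) / max(1, len(pos_pairs) + len(neg_pairs))
```

A negative pair already farther apart than the margin contributes nothing, so the loss concentrates on pairs that violate the desired separation.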

To enforce cross-domain feature alignment, we introduce a Wasserstein distance-based domain adaptation loss:

\[ \mathcal{L}_{W} = W_1(P_S, P_T) = \inf_{\gamma \in \Pi(P_S, P_T)} \mathbb{E}_{(z_s, z_t) \sim \gamma}\big[\lVert z_s - z_t \rVert\big], \]

where \(\Pi(P_S, P_T)\) denotes the set of all joint distributions with marginals \(P_S\) and \(P_T\). This loss minimizes distributional discrepancies between \(\mathcal{D}_S\) and \(\mathcal{D}_T\), enabling better generalization in the target domain.
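For intuition about the quantity being minimized, the Wasserstein-1 distance between one-dimensional empirical distributions has a closed form: optimal transport matches sorted samples, so W1 reduces to the mean absolute difference of order statistics. This sketch illustrates the metric itself, not the critic-based estimator used for high-dimensional latents:

```python
import numpy as np

def wasserstein1_1d(a, b):
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples.

    In 1-D the optimal coupling pairs the i-th smallest of a with the
    i-th smallest of b, giving the mean absolute sorted difference.
    """
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    assert a.shape == b.shape
    return float(np.mean(np.abs(a - b)))
```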

In this formulation, the few-shot fine-tuning mechanism operates only on a limited subset of unlabeled normal samples from the target domain. This process adjusts the latent-guided affine adaptation layers to account for domain-specific characteristics while preserving the unsupervised nature of anomaly detection, as no anomaly labels from the target domain are used.

The final optimization objective integrates reconstruction loss, contrastive loss, KL divergence, and domain adaptation loss:

\[ \mathcal{L}_{total} = \lambda_1 \mathcal{L}_{rec} + \lambda_2 \mathcal{L}_{KL} + \lambda_3 \mathcal{L}_{con} + \lambda_4 \mathcal{L}_{W}. \]

Through joint optimization of these components, the model learns domain-invariant representations, ensuring consistency of normal patterns across domains while effectively distinguishing anomalies. This formulation enables unsupervised anomaly detection in the target domain without requiring labeled anomaly samples.

Conditional variational sequence encoding

To effectively capture temporal dependencies and learn transferable representations, we employ a conditional variational sequence encoder based on a Gated Recurrent Unit (GRU). This encoder transforms an input sequence X and contextual information \(s\) (e.g., device type, protocol) into a structured latent space where normal and anomalous behaviors can be effectively distinguished. Unlike deterministic autoencoders, our model introduces a probabilistic latent space conditioned on \(s\), enhancing generalization and anomaly detection.

Concretely, the encoder uses a single-layer GRU with hidden size H = 128 that processes the input sequence X together with the contextual vector \(s\) and produces a final hidden state \(h_T\). Two linear projection heads \(f_\mu\) and \(f_\sigma\) map \(h_T\) to the mean and variance of the latent variable c, which defines the conditional posterior used in the reparameterization step. The KL divergence term is weighted by a coefficient that is linearly increased from 0 to its final value during the first 20 training epochs and then kept fixed, which stabilizes early optimization while still enforcing a well-structured latent space at convergence.

Given an input sequence X and contextual information \(s\), the encoder extracts a latent representation by learning the parameters of a variational distribution:

\[ q_\phi(c \mid X, s) = \mathcal{N}\big(\mu_\phi(X, s),\, \sigma_\phi^2(X, s)\big), \]

where \(\mu_\phi\) and \(\sigma_\phi\) are neural networks that parameterize the mean and variance of the latent distribution. The latent variable c is sampled as:

\[ c = \mu_\phi(X, s) + \sigma_\phi(X, s) \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I), \]

where \(\sigma_\phi = \exp\big(\tfrac{1}{2} \log \sigma_\phi^2\big)\) ensures non-negative variance.
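The sampling step is the standard VAE reparameterization trick, sketched here in NumPy for illustration:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample c = mu + sigma * eps with eps ~ N(0, I).

    The encoder outputs log-variance, so sigma = exp(0.5 * log_var) is
    positive by construction, and the stochasticity is isolated in eps,
    keeping the sample differentiable with respect to mu and log_var.
    """
    sigma = np.exp(0.5 * np.asarray(log_var))
    eps = rng.standard_normal(np.shape(mu))
    return np.asarray(mu) + sigma * eps
```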

To dynamically prioritize critical time steps in the latent space, we introduce a temporal attention mechanism that directly operates on the latent variables \(c_t\). For each time step t, the attention weights are computed as:

\[ \alpha_t = \frac{\exp\big(w^\top \tanh(W c_t)\big)}{\sum_{t'=1}^{T} \exp\big(w^\top \tanh(W c_{t'})\big)}, \]

where W and w are learnable parameters. The attention-weighted latent representation is then given by:

\[ \tilde{c} = \sum_{t=1}^{T} \alpha_t c_t. \]

This mechanism allows the model to focus on latent features from key time steps, enhancing anomaly detection robustness.
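A minimal NumPy sketch of this attention step, assuming a score function of the form \(w^\top \tanh(W c_t)\) followed by a softmax over time; the parameter shapes are illustrative:

```python
import numpy as np

def temporal_attention(c, W, w):
    """Score each time step of a latent sequence c of shape (T, d) and
    return the attention weights and the weighted summary.

    alpha_t = softmax_t( w . tanh(W c_t) );  summary = sum_t alpha_t c_t.
    """
    scores = np.tanh(c @ W.T) @ w            # (T,) raw scores
    scores = scores - scores.max()           # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha, alpha @ c                  # weights (T,), summary (d,)
```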

The decoder reconstructs the input sequence from the attention-weighted latent variables \(\tilde{c}\) and contextual information \(s\). The reconstruction is formulated as:

\[ \hat{X} = g_\theta(\tilde{c}, s), \]

where \(g_\theta\) is a GRU-based network. The reconstruction loss is computed using a weighted mean squared error:

\[ \mathcal{L}_{rec} = \sum_{t=1}^{T} w(t)\, \lVert x_t - \hat{x}_t \rVert^2, \]

with \(w(t)\) being a time-dependent weight function prioritizing critical time steps.
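The weighted reconstruction error is straightforward to sketch in NumPy, with w holding the per-time-step weights \(w(t)\):

```python
import numpy as np

def weighted_mse(X, X_hat, w):
    """Time-weighted reconstruction error for sequences of shape (T, d):
    w[t] scales the squared error at step t so that critical time steps
    dominate the loss."""
    err = np.sum((X - X_hat) ** 2, axis=-1)   # per-step squared error, (T,)
    return float(np.sum(w * err))
```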

To enforce structured latent distributions while incorporating attention, we propose a weighted KL divergence loss:

\[ \mathcal{L}_{KL} = \sum_{t=1}^{T} \alpha_t\, D_{KL}\big(q_\phi(c_t \mid X, s)\,\|\, p(c_t)\big), \]

where \(\alpha_t\) are the attention weights. This ensures that the model pays higher attention to time steps with significant distribution shifts.
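Assuming a standard normal prior \(p(c_t) = \mathcal{N}(0, I)\), each per-step KL term has the familiar closed form for diagonal Gaussians, so the attention-weighted loss can be sketched as:

```python
import numpy as np

def weighted_kl(mu, log_var, alpha):
    """Attention-weighted KL between per-step diagonal Gaussian
    posteriors N(mu_t, sigma_t^2) and a standard normal prior.

    mu, log_var: (T, d) posterior parameters per time step.
    alpha: (T,) attention weights summing to 1.
    """
    # Closed-form KL per step: 0.5 * sum_d (sigma^2 + mu^2 - 1 - log sigma^2)
    kl_t = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)
    return float(np.sum(alpha * kl_t))
```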

Dynamic domain adaptation layers

In real-world IoT deployments, variations in device configurations, network conditions, and environmental factors introduce domain shifts between the source domain \(\mathcal{D}_S\) and the target domain \(\mathcal{D}_T\). Directly applying a model trained on \(\mathcal{D}_S\) to \(\mathcal{D}_T\) often leads to performance degradation due to differences in feature distributions. To mitigate this issue, we introduce input-output adaptation layers that dynamically align feature representations across domains while preserving temporal dependencies. These layers leverage the latent variable c to guide domain-specific feature transformations, ensuring robust adaptation.

The input adaptation layer standardizes feature distributions by transforming input sequences before passing them into the variational sequence encoder. Concretely, given the time-dependent latent variables \(c_1, \ldots, c_T\), we first aggregate them into a sequence-level descriptor

\[ \bar{c} = \frac{1}{T} \sum_{t=1}^{T} c_t. \]

For each domain d, the input adaptation layer predicts feature-wise scaling and shifting vectors from \(\bar{c}\),

\[ (\gamma_d, \beta_d) = g_d(\bar{c}), \]

where \(g_d\) is a domain-specific projection, and the adapted input sequence for domain d is computed as

\[ \tilde{x}_t = \gamma_d \odot x_t + \beta_d. \]

This latent-guided affine transformation enables feature-wise alignment of input distributions across domains prior to encoding.
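The latent-guided affine adaptation can be sketched as follows; the linear projection from the sequence descriptor to the scale and shift vectors is an assumed minimal parameterization (the paper does not fix the projection architecture), and all parameter names are illustrative:

```python
import numpy as np

def adapt_input(X, c_bar, W_gamma, b_gamma, W_beta, b_beta):
    """Latent-guided affine input adaptation for one domain.

    A sequence-level latent descriptor c_bar (k,) predicts feature-wise
    scale gamma and shift beta via assumed linear maps, then every time
    step of X (T, d) is transformed as x_t -> gamma * x_t + beta.
    """
    gamma = c_bar @ W_gamma + b_gamma   # (d,) feature-wise scale
    beta = c_bar @ W_beta + b_beta      # (d,) feature-wise shift
    return gamma * X + beta             # broadcasts over the T axis
```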

On the output side, the decoder produces reconstructed sequences \(\hat{x}_t\), which are further refined using an output adaptation layer. Similarly, the output adaptation layer uses \(\bar{c}\) to parameterize domain-specific normalization statistics

\[ (\gamma_d^{out}, \beta_d^{out}) = g_d^{out}(\bar{c}), \]

and aligns reconstructed features across domains via

\[ \hat{x}_t' = \gamma_d^{out} \odot \hat{x}_t + \beta_d^{out}. \]

These latent-guided affine transformations explicitly couple the shared latent representation with domain-wise feature normalization, making the dynamics of the input-output adaptation layers and the resulting cross-domain alignment more transparent and reproducible.

By incorporating these adaptation layers, the model effectively reduces domain discrepancies while enhancing the robustness of feature representations across diverse IoT environments. The use of \(\bar{c}\) as a conditioning variable ensures that the adaptation process is guided by the latent representation of the input sequence, enabling more precise domain alignment.

Multi-granularity contrastive learning and domain adaptation

To ensure that the learned feature representations generalize well across domains, we introduce a multi-granularity contrastive learning framework that enhances domain-invariant representation learning. The objective is to encourage feature consistency at both local (time-step) and global (sequence) levels while maximizing the separation between normal and anomalous patterns.

Let \(Z_S\) and \(Z_T\) denote the latent representations of sequences from the source domain \(\mathcal{D}_S\) and target domain \(\mathcal{D}_T\), respectively. Given a batch of N sequences, we construct positive and negative pairs at two granularities:

  • Time-Step Level: For each sequence, positive pairs are formed between adjacent time steps, while negative pairs are formed between distant time steps.
  • Sequence Level: Positive pairs are formed between sequences of the same class (normal or anomalous), while negative pairs are formed between sequences of different classes.

Formally, the positive and negative sets are defined as:

\[ \mathcal{P} = \{(z_i, z_j) : y_i = y_j\}, \qquad \mathcal{N} = \{(z_i, z_j) : y_i \neq y_j\}, \]

where the pseudo-label \(y_j\) in the target domain is estimated based on feature proximity to known normal samples.

The contrastive loss integrates both local and global granularities:

\[ \mathcal{L}_{con} = \alpha \mathcal{L}_{local} + (1 - \alpha) \mathcal{L}_{global}, \]

where α is a balancing hyperparameter. Both local and global losses share the same formulation:

\[ \mathcal{L}_{\cdot} = \sum_{(i,j) \in \mathcal{P}_{\cdot}} \sigma\big(d(z_i, z_j)\big) + \sum_{(i,j) \in \mathcal{N}_{\cdot}} \max\big(0,\, m - d(z_i, z_j)\big), \]

where \(\mathcal{P}_{\cdot}\) and \(\mathcal{N}_{\cdot}\) correspond to local or global positive and negative pairs, \(d(\cdot,\cdot)\) denotes the distance between latent representations, \(\sigma\) is the sigmoid function, and m is a margin hyperparameter.

To align feature distributions across domains, we use the Wasserstein distance-based domain adaptation loss:

\[ \mathcal{L}_{W} = W_1(P_S, P_T) = \inf_{\gamma \in \Pi(P_S, P_T)} \mathbb{E}_{(z_s, z_t) \sim \gamma}\big[\lVert z_s - z_t \rVert\big], \]

where \(\Pi(P_S, P_T)\) denotes the set of all joint distributions with marginals \(P_S\) and \(P_T\).

We choose the Wasserstein-1 distance for domain alignment because it provides a well-behaved discrepancy measure with informative gradients even when the source and target feature distributions have limited overlap under large domain shifts. In contrast to the Jensen–Shannon divergence commonly used in GAN-based alignment, which may lead to vanishing gradients when the two distributions are far apart, Wasserstein-1 directly reflects the optimal transport cost required to move probability mass from the source latent space to the target latent space, which typically yields more stable optimization. This property is particularly important in our setting, where the latent representations learned from heterogeneous IoT environments may exhibit substantial mismatch. Following the standard formulation of Wasserstein learning, we implement the objective with a gradient-penalized critic network to enforce the Lipschitz constraint, which further stabilizes training, improves reproducibility, and ultimately yields more reliable cross-domain alignment in the latent space [23,24].

Overall training objective

To ensure clarity and reproducibility, we summarize all loss components of CDA-ADN in a unified formulation. The domain-adaptation loss is implemented using the Wasserstein-1 distance with a gradient-penalized critic network. Given source-domain latent representations \(z_s \sim P_S\) and target-domain latent representations \(z_t \sim P_T\), the critic \(f_w\) is trained to maximize

\[ \mathcal{L}_{critic} = \mathbb{E}_{z_s \sim P_S}\big[f_w(z_s)\big] - \mathbb{E}_{z_t \sim P_T}\big[f_w(z_t)\big], \]

while enforcing 1-Lipschitz continuity via the gradient penalty

\[ \mathcal{L}_{GP} = \mathbb{E}_{\hat{z}}\Big[\big(\lVert \nabla_{\hat{z}} f_w(\hat{z}) \rVert_2 - 1\big)^2\Big], \]

where \(\hat{z}\) is sampled along straight lines between \(z_s\) and \(z_t\). The domain-adaptation objective is therefore

\[ \mathcal{L}_{DA} = \max_{f_w} \big( \mathcal{L}_{critic} - \lambda_{GP}\, \mathcal{L}_{GP} \big). \]
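To make the critic objective and gradient penalty concrete without an autograd framework, the sketch below uses a linear critic \(f_w(z) = w^\top z\): its input gradient is w at every interpolated point, so the penalty collapses to \((\lVert w \rVert_2 - 1)^2\). This is a didactic simplification, not the trained critic network:

```python
import numpy as np

def critic_losses(z_s, z_t, w, gp_weight=10.0):
    """Wasserstein critic objective with gradient penalty for a linear
    critic f(z) = w . z.

    The critic's W1 surrogate is E_s[f(z)] - E_t[f(z)]; because the
    gradient of a linear critic equals w everywhere, the gradient
    penalty at any interpolate reduces to (||w||_2 - 1)^2.
    """
    w1_estimate = float(np.mean(z_s @ w) - np.mean(z_t @ w))
    gp = (np.linalg.norm(w) - 1.0) ** 2
    critic_objective = w1_estimate - gp_weight * gp  # maximized over w
    return w1_estimate, critic_objective
```

With a unit-norm w the penalty vanishes, and the objective is exactly the mean critic-score gap between domains.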

For contrastive learning, we employ margin-based losses at both the local time-step level and the global sequence level. For a pair of latent features \((z_i, z_j)\) with pseudo-label \(y_{ij} \in \{0, 1\}\) indicating a positive or negative relation, the contrastive loss is

\[ \ell(z_i, z_j, y_{ij}) = y_{ij}\, d(z_i, z_j)^2 + (1 - y_{ij}) \max\big(0,\, m - d(z_i, z_j)\big)^2, \]

where m is the contrastive margin. Local and global contrastive objectives are combined as

\[ \mathcal{L}_{con} = \alpha \mathcal{L}_{local} + (1 - \alpha) \mathcal{L}_{global}, \]

with balance coefficient α and temperature parameter τ used in the similarity normalization process.

Finally, the total training objective integrates the variational reconstruction loss \(\mathcal{L}_{VAE}\), the domain-adaptation loss \(\mathcal{L}_{DA}\), and the multi-granularity contrastive loss \(\mathcal{L}_{con}\):

\[ \mathcal{L}_{total} = \mathcal{L}_{VAE} + \beta\, \mathcal{L}_{DA} + \gamma\, \mathcal{L}_{con}, \]

where β and γ are weighting coefficients. Table 5 provides the key hyperparameters used in the contrastive and domain-adaptation losses for completeness.

Training and optimization strategy

The final objective of our model is to jointly optimize reconstruction accuracy, latent space regularization, multi-granularity contrastive learning, and domain-invariant feature alignment. The total loss function integrates the reconstruction loss \(\mathcal{L}_{rec}\), the KL divergence loss \(\mathcal{L}_{KL}\), the multi-granularity contrastive loss \(\mathcal{L}_{con}\), and the Wasserstein distance-based domain adaptation loss \(\mathcal{L}_{W}\):

\[ \mathcal{L}_{total} = \lambda_1 \mathcal{L}_{rec} + \lambda_2 \mathcal{L}_{KL} + \lambda_3 \mathcal{L}_{con} + \lambda_4 \mathcal{L}_{W}, \]

where \(\lambda_1\), \(\lambda_2\), \(\lambda_3\), and \(\lambda_4\) are weighting coefficients that balance different loss components.

To train the model, we employ a two-stage optimization process, as illustrated in Fig 2. In the first stage, the conditional variational sequence encoder and decoder are pre-trained on the source domain \(\mathcal{D}_S\) to learn normal behavior representations by minimizing:

\[ \mathcal{L}_{stage1} = \lambda_1 \mathcal{L}_{rec} + \lambda_2 \mathcal{L}_{KL}. \]

Fig 2. Two-stage training workflow of CDA-ADN.

Stage I pre-trains the model on source-domain normal traffic to learn general normal representations. Stage II performs lightweight adaptation on the target domain using only a small subset of normal samples, where only the domain-specific adaptation components are fine-tuned while the shared encoder-decoder is kept fixed. After adaptation, the model is directly applied to the target domain to compute anomaly scores and make anomaly decisions.

https://doi.org/10.1371/journal.pone.0344009.g002

This step ensures that the model learns a structured latent space before applying cross-domain adaptation.

In the second stage, the model is fine-tuned using both \(\mathcal{D}_S\) and \(\mathcal{D}_T\), incorporating multi-granularity contrastive learning and Wasserstein distance-based domain adaptation. The contrastive pairs for \(\mathcal{L}_{con}\) are dynamically generated using a memory bank approach to maintain representative samples and enhance feature discrimination across domains.

The above two-stage procedure assumes access to a small subset of mostly normal samples from the target domain that are used to calibrate the latent-guided adaptation layers. When this subset is extremely small, the adaptation parameters become underconstrained and the benefit over source-only training is reduced, so CDA-ADN behaves closer to a source-only detector that still leverages contrastive and Wasserstein regularization but with limited domain-specific refinement. In practice, the target subset may also contain a small fraction of anomalous samples. Since these samples are treated as normal during fine-tuning, they can bias the adapted feature statistics and slightly reduce sensitivity to those particular anomalous patterns. A natural extension is to combine the adaptation step with a simple robustness mechanism, for example by discarding target samples with very high reconstruction error before updating the adaptation layers or by using robust aggregation operators instead of plain averages when estimating target-domain statistics. Such strategies preserve the unsupervised nature of CDA-ADN while mitigating the impact of scarce or mildly contaminated target samples.
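The robustness mechanism suggested above, discarding target samples with very high reconstruction error before updating the adaptation layers, can be sketched as a simple quantile cut; the quantile threshold is an assumed knob:

```python
import numpy as np

def filter_target_samples(X_target, recon_error, quantile=0.9):
    """Drop target sequences with unusually high reconstruction error
    before fitting adaptation-layer statistics, a simple guard against
    mild anomaly contamination of the target subset.

    Returns the retained sequences and the boolean keep mask.
    """
    thresh = np.quantile(recon_error, quantile)
    keep = recon_error <= thresh
    return X_target[keep], keep
```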

The optimization follows the Adam optimizer with a learning rate η:

\[ \theta \leftarrow \theta - \eta\, \nabla_\theta \mathcal{L}_{total}, \]

where θ represents the model parameters.
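For reference, a single Adam update expands the gradient step above into the standard moment estimates with bias correction, sketched in NumPy:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient and
    its square, bias correction, then
    theta <- theta - lr * m_hat / (sqrt(v_hat) + eps)."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)          # bias-corrected first moment
    v_hat = v / (1 - b2**t)          # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```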

To ensure stable training, we apply batch normalization within the input-output adaptation layers and employ early stopping based on validation loss to prevent overfitting. The final trained model is evaluated on the target domain without requiring labeled anomaly samples, demonstrating its effectiveness in unsupervised cross-domain anomaly detection.

Experiments

To evaluate the effectiveness of the proposed method, we conduct experiments on real-world IoT datasets. The evaluation focuses on comparing CDA-ADN with baseline methods, analyzing the impact of key components such as contrastive learning and adaptation layers, and assessing the model’s robustness and consistency. Results demonstrate the superior cross-domain anomaly detection performance of CDA-ADN, showcasing its potential for IoT applications.

Experimental setup

The effectiveness of CDA-ADN is evaluated on two real-world IoT datasets. The performance is compared with baseline methods using multiple evaluation metrics to assess both classification accuracy and domain generalization.

The first dataset, WUSTL-IIoT-2021 [25], consists of network traffic collected from an industrial IoT system. It includes both normal traffic and attack scenarios such as denial-of-service (DoS), injection, and backdoor attacks. The second dataset, ACI-IoT-2023 [26], is collected from a real-world IoT environment and contains diverse cyber threats, including reconnaissance, brute-force attacks, and distributed denial-of-service (DDoS). These datasets differ significantly in terms of network structure and attack patterns, making them suitable for evaluating the cross-domain adaptability of the proposed model. In addition, we include the ToN_IoT dataset [27], a large-scale IoT telemetry and network traffic benchmark containing both benign behaviors and diverse cyberattacks. Its heterogeneous sources and attack diversity make it a challenging target domain for evaluating cross-domain anomaly detection under distribution shifts.

To quantify model performance, we use accuracy, Matthews correlation coefficient (MCC), and sensitivity. Accuracy measures the proportion of correctly classified instances, while MCC provides a balanced evaluation considering true positives, false positives, and false negatives, making it well-suited for imbalanced anomaly detection tasks [28]. Sensitivity reflects the model’s ability to correctly identify anomalies, which is critical for security applications.
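For reference, both MCC and sensitivity follow directly from confusion-matrix counts. The sketch below mirrors the standard definitions; it is illustrative and not tied to any particular library.

```python
import math

def matthews_corrcoef(tp, fp, tn, fn):
    """MCC from confusion-matrix counts.

    Ranges over [-1, 1]; returns 0 by convention when any marginal is empty,
    which avoids division by zero on degenerate predictions.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

def sensitivity(tp, fn):
    """True positive rate: fraction of actual anomalies that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0
```

Because MCC uses all four cells of the confusion matrix, a detector that simply predicts the majority class scores near zero even when its accuracy is high, which is why MCC is preferred under class imbalance.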

CDA-ADN is implemented in PyTorch and trained on an NVIDIA A100 GPU. The model is optimized with the Adam optimizer [29] using an initial learning rate η and a batch size of 128. Training runs for at most 100 epochs with early stopping based on validation loss. The contrastive loss margin is set to 0.5, and the weights for the domain adaptation and reconstruction losses are set to 0.1 and 1.0, respectively. For both datasets, continuous traffic features are standardized using z-score normalization, with the mean and standard deviation computed on the source-domain training split and then reused for all target-domain samples. The normalized streams are segmented into fixed-length windows of T = 128 time steps with a stride of S = 64, which serve as the multivariate inputs to CDA-ADN. The main training configurations are summarized in Table 1.
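The preprocessing described above, source-fitted z-score normalization followed by sliding-window segmentation with T = 128 and S = 64, can be sketched as follows. Function names are illustrative assumptions; only the normalization and windowing parameters come from the paper.

```python
import numpy as np

def zscore_fit(train):
    """Fit normalization statistics on the source-domain training split.

    train : array of shape (N, D) of continuous traffic features.
    """
    mu = train.mean(axis=0)
    sigma = train.std(axis=0) + 1e-8   # guard against constant features
    return mu, sigma

def zscore_apply(x, mu, sigma):
    """Apply source-fitted statistics to any split, including target-domain data."""
    return (x - mu) / sigma

def sliding_windows(x, T=128, S=64):
    """Segment a normalized (N, D) stream into overlapping (T, D) windows."""
    starts = range(0, len(x) - T + 1, S)
    return np.stack([x[s:s + T] for s in starts])
```

Reusing the source-domain statistics on target data is deliberate: it preserves the distribution shift that the adaptation layers are meant to absorb, rather than hiding it in the preprocessing.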

Anomalies are flagged by thresholding a reconstruction-based anomaly score. The decision threshold τ is selected on a held-out validation subset containing only normal samples from the source domain, by setting τ to the 95th percentile of the validation scores.
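This percentile rule can be expressed compactly; `select_threshold` and `detect` are illustrative names, and only the 95th-percentile choice comes from the paper.

```python
import numpy as np

def select_threshold(val_scores, percentile=95.0):
    """Pick the decision threshold from anomaly scores of normal-only validation data."""
    return np.percentile(val_scores, percentile)

def detect(scores, tau):
    """Flag a window as anomalous when its score exceeds the threshold."""
    return scores > tau
```

By construction this calibrates the detector to roughly a 5% false-positive rate on source-domain normal traffic; the actual false-positive rate on the target domain depends on how well the adaptation layers align the score distributions.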

Baseline methods

To evaluate the performance of CDA-ADN, we compare it with representative state-of-the-art anomaly detection models, spanning reconstruction-based, Transformer-based, and graph-based deep learning approaches.

  • Autoencoder (AE) [30]: An unsupervised deep learning model that reconstructs input data and detects anomalies based on reconstruction errors.
  • Variational Autoencoder (VAE) [6]: A probabilistic extension of AE that learns a structured latent space, improving anomaly detection generalization.
  • Anomaly Transformer (AT) [14]: A Transformer-based time-series anomaly detector that models long-range temporal dependencies and measures association discrepancy between time steps to highlight abnormal patterns.
  • Graph Deviation Network (GDN) [31]: A graph neural network based model that leverages an adaptive sensor graph to capture feature correlations and detects anomalies by measuring deviations from learned graph-structured normal patterns.

Both AE and VAE are trained on normal data and rely on reconstruction errors for anomaly detection. These methods serve as strong baselines for evaluating CDA-ADN’s ability to enhance anomaly separation and domain adaptation.
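The anomaly-scoring convention shared by AE- and VAE-style detectors is the per-window reconstruction error; a minimal sketch, assuming the model's reconstruction is already available:

```python
import numpy as np

def reconstruction_score(x, x_hat):
    """Mean squared reconstruction error per window (higher = more anomalous).

    x, x_hat : arrays of shape (N, T, D), original windows and reconstructions.
    """
    return ((x - x_hat) ** 2).mean(axis=(1, 2))
```

Because such models are trained only on normal traffic, anomalous windows tend to reconstruct poorly and thus receive high scores; the threshold rule from the experimental setup then converts scores into binary decisions.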

Experimental results and analysis

The performance of CDA-ADN and baseline methods on the target domain is summarized in Tables 2 and 3. To provide a more comprehensive evaluation, we report the mean and standard deviation across multiple runs for five metrics: Accuracy, MCC, Sensitivity, AUC-ROC, and AUC-PR. While Accuracy, MCC, and Sensitivity reflect thresholded classification performance, AUC-ROC and AUC-PR offer threshold-independent views that are particularly informative under class imbalance.

Table 2. Cross-domain detection performance from WUSTL-IIoT-2021 (source) to ACI-IoT-2023 (target). Values are reported as mean ± standard deviation over five independent runs.

https://doi.org/10.1371/journal.pone.0344009.t002

Table 3. Cross-domain detection performance from WUSTL-IIoT-2021 (source) to ToN_IoT (target). Values are reported as mean ± standard deviation over five independent runs.

https://doi.org/10.1371/journal.pone.0344009.t003

Tables 2 and 3 summarize the cross-domain detection results under two transfer settings, i.e., WUSTL-IIoT-2021 → ACI-IoT-2023 and WUSTL-IIoT-2021 → ToN_IoT. Across both target domains, CDA-ADN consistently achieves the best overall performance against all baselines, including reconstruction-based methods (AE, VAE), Transformer-based AT, and graph-based GDN, indicating strong robustness under domain shift.

For WUSTL-IIoT-2021 → ACI-IoT-2023 (Table 2), CDA-ADN achieves the highest Accuracy (92.8 ± 0.7%) and MCC (0.80 ± 0.02), outperforming the strongest baselines GDN (88.6 ± 0.9%, 0.73 ± 0.03) and AT (87.5 ± 1.0%, 0.70 ± 0.03). The advantage is also reflected in Sensitivity (0.88 ± 0.03) and F1-score (0.76 ± 0.03), which confirms that CDA-ADN improves anomaly recall while maintaining balanced precision-recall behavior. In addition, CDA-ADN yields the best threshold-independent metrics, with AUC-ROC of 0.92 ± 0.01 and AUC-PR of 0.64 ± 0.03, suggesting consistently stronger discrimination across decision thresholds under class imbalance.

For WUSTL-IIoT-2021 → ToN_IoT (Table 3), the overall performance of all methods decreases, which is expected due to a larger domain gap and more challenging target distribution. Nevertheless, CDA-ADN remains clearly superior, achieving the best Accuracy (88.2 ± 0.9%) and MCC (0.77 ± 0.02), with notable improvements over GDN (82.1 ± 1.2%, 0.69 ± 0.03) and AT (81.0 ± 1.3%, 0.66 ± 0.03). CDA-ADN also attains the highest Sensitivity (0.86 ± 0.03) and F1-score (0.73 ± 0.03), indicating better anomaly detection capability in the target domain while preserving strong Specificity (0.92 ± 0.01). The AUC-ROC (0.91 ± 0.01) and AUC-PR (0.61 ± 0.03) further support that the proposed framework maintains robust ranking quality even when the transfer becomes more difficult.

Overall, the consistent gains across both transfer directions and across complementary metrics demonstrate that CDA-ADN improves cross-domain anomaly detection by jointly enhancing domain alignment and anomaly separability, yielding stable performance under varying degrees of distribution shift.

The improved performance of CDA-ADN can be attributed to its key components: input-output adaptation layers for domain alignment and contrastive learning for anomaly separation. These mechanisms enable CDA-ADN to achieve a high degree of generalization across different domains while maintaining consistent performance. As shown in Fig 3, CDA-ADN achieves the highest MCC and Sensitivity among all compared models, further confirming its superior anomaly detection capability.

Fig 3. MCC and sensitivity comparison across different models (mean ± standard deviation).

https://doi.org/10.1371/journal.pone.0344009.g003

Ablation study

To evaluate the contributions of different components in CDA-ADN, we conduct an ablation study by systematically removing key modules and analyzing their impact on performance. Specifically, we assess the effects of removing the multi-granularity contrastive learning module, the dynamic input–output adaptation layers, and the conditional variational encoder, as well as a reduced configuration that removes both contrastive learning and adaptation layers. The results are presented in Table 4.

Table 4. Ablation study results on the target domain (mean ± std).

https://doi.org/10.1371/journal.pone.0344009.t004

The results in Table 4 highlight that each component contributes meaningfully to cross-domain anomaly detection. Removing the multi-granularity contrastive learning module results in a clear drop in MCC from 0.80 to 0.73 and in sensitivity from 0.88 to 0.79. This suggests that contrastive learning enhances feature separability, making it easier to distinguish between normal and anomalous patterns under domain shift.

Removing adaptation layers also negatively impacts performance, as shown by the drop in MCC to 0.75 and sensitivity to 0.82. This confirms that the latent-guided input–output transformations effectively reduce cross-domain distribution mismatch and improve generalization to the target domain. Replacing the conditional variational encoder with a deterministic GRU autoencoder leads to a noticeable drop in MCC and sensitivity as well, indicating that probabilistic latent modeling together with KL regularization helps learn a more structured and transferable latent space.

When both contrastive learning and adaptation layers are removed, the performance deterioration is more pronounced. The MCC score drops further to 0.69, and sensitivity decreases to 0.75, indicating a substantial degradation in the model’s ability to detect anomalies. The absence of both components causes the model to rely mainly on reconstruction learning, which is insufficient for capturing domain-invariant yet discriminative features. This confirms that the contrastive module and adaptation layers provide complementary benefits and work synergistically to enhance model robustness under domain shift.

These findings underline the necessity of integrating conditional variational encoding, multi-granularity contrastive learning, and dynamic adaptation layers to achieve robust and generalizable anomaly detection. Removing any single component results in measurable degradation, and removing multiple components leads to a substantial decline in performance.

Sensitivity analysis of hyperparameters

To further assess the robustness of CDA-ADN with respect to the contrastive learning objective, we perform a sensitivity analysis on two key hyperparameters: the balance coefficient α between local and global contrastive terms, and the margin m that controls the separation between positive and negative pairs in the contrastive loss. We vary α and m over a relatively wide range while keeping all other settings fixed, and report classification accuracy on the target domain. The default configuration used in our main experiments sets the margin to m = 0.5.

As shown in Table 5, CDA-ADN maintains consistently high accuracy over a broad and practically reasonable range of α and m. For values near the default configuration, the accuracy remains around 92%–93%, close to the default setting (92.8%), with only minor fluctuations. When the hyperparameters take more extreme values, such as (α, m) = (0.9, 1.0), the accuracy decreases more noticeably (to about 90.7%–90.9%), indicating that excessive emphasis on either local or global contrastive structure, or an overly small or large margin, can harm performance. Overall, these results demonstrate that CDA-ADN is robust to the choice of contrastive hyperparameters within a wide range around the default configuration.

In addition to contrastive-learning hyperparameters, we further examine two implementation-critical factors specific to CDA-ADN: the number K of unlabeled normal target-domain samples used in the fine-tuning stage and the latent dimensionality of the conditional variational encoder. With the latent dimensionality fixed, increasing K from 10 to 50 and 100 improves the target-domain accuracy from about 90.8% to 92.0% and 92.8%, while further increasing to K = 200 yields only a marginal gain to roughly 93.0%, suggesting that performance saturates once a modest amount of normal target data is available. Conversely, with K = 100 fixed, varying the latent dimensionality over {16, 32, 64, 128} results in accuracies of approximately 91.3%, 92.3%, 92.8%, and 92.4%, respectively. These trends indicate that CDA-ADN already performs strongly with K in the range of 50–100 and that a latent dimensionality in the range of 32–64 provides a stable trade-off between representation capacity and robustness.

Table 5. Sensitivity of CDA-ADN to the balance coefficient α and margin m on the target domain (accuracy, mean ± std over five runs).

https://doi.org/10.1371/journal.pone.0344009.t005

Conclusion

This paper presents the Cross-Domain Adaptive Anomaly Detection Network (CDA-ADN), an unsupervised transfer learning framework designed for multivariate anomaly detection in IoT traffic data. By integrating a variational sequence encoder with input-output adaptation layers and contrastive learning, the model effectively captures domain-invariant representations, reducing the need for labeled target-domain data. Experimental evaluations on WUSTL-IIoT-2021, ACI-IoT-2023, and ToN_IoT demonstrate that CDA-ADN outperforms autoencoder-based and other deep learning baselines in terms of accuracy, MCC, and sensitivity. The ablation study further validates the contributions of contrastive learning and adaptation layers in improving domain generalization. Future work will explore the extension of this approach to dynamic adaptation strategies for handling temporal variations in IoT environments.

References

  1. Börner K, Scrivner O, Cross LE, Gallant M, Ma S, Martin AS, et al. Mapping the co-evolution of artificial intelligence, robotics, and the internet of things over 20 years (1998-2017). PLoS One. 2020;15(12):e0242984. pmid:33264328
  2. Cook AA, Misirli G, Fan Z. Anomaly detection for IoT time-series data: a survey. IEEE Internet Things J. 2020;7(7):6481–94.
  3. Yang L, Shami A. A lightweight concept drift detection and adaptation framework for IoT data streams. IEEE Internet Things Mag. 2021;4(2):96–101.
  4. Sharma B, Sharma L, Lal C. Anomaly detection techniques using deep learning in IoT: a survey. 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE); 2019. p. 146–9.
  5. Provotar OI, Linder YM, Veres MM. Unsupervised anomaly detection in time series using LSTM-based autoencoders. 2019 IEEE International Conference on Advanced Trends in Information Theory (ATIT); 2019. p. 513–7.
  6. Kingma DP, Welling M. Auto-encoding variational Bayes. arXiv:1312.6114 [Preprint]. 2013.
  7. Zhou X, Liu S, Chen A, Chen H. Learning in CubeRes model space for anomaly detection in 3D GPR data. In: Larson K, editor. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24. International Joint Conferences on Artificial Intelligence Organization; 2024. p. 5662–70. Available from: https://doi.org/10.24963/ijcai.2024/626
  8. Zhou X, Liu S, Chen A, Chen Q, Xiong F, Wang Y, et al. Underground anomaly detection in GPR data by learning in the C3 model space. IEEE Trans Geosci Remote Sens. 2023;61:1–11.
  9. Lin S, Clark R, Birke R, Schönborn S, Trigoni N, Roberts S. Anomaly detection for time series using VAE-LSTM hybrid model. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2020. p. 4322–6.
  10. Nizam H, Zafar S, Lv Z, Wang F, Hu X. Real-time deep anomaly detection framework for multivariate time-series data in industrial IoT. IEEE Sensors J. 2022;22(23):22836–49.
  11. He J, Dong Z, Huang Y. Multivariate time series anomaly detection with adaptive Transformer-CNN architecture fusing adversarial training. 2024 IEEE 13th Data Driven Control and Learning Systems Conference (DDCLS); 2024. p. 1387–92.
  12. Alturif G, Saleh W, El-Bary AA, Osman RA. Towards efficient IoT communication for smart agriculture: a deep learning framework. PLoS One. 2024;19(11):e0311601. pmid:39570960
  13. Babbar H, Rani S, Driss M. Effective DDoS attack detection in software-defined vehicular networks using statistical flow analysis and machine learning. PLoS One. 2024;19(12):e0314695. pmid:39693292
  14. Xu J, Wu H, Wang J, Long M. Anomaly transformer: time series anomaly detection with association discrepancy. arXiv:2110.02642 [Preprint]. 2022.
  15. Sriram S, Vinayakumar R, Alazab M, Soman KP. Network flow based IoT botnet attack detection using deep learning. IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS); 2020. p. 189–94.
  16. Wang H, Zhang H, Zhu L, Wang Y, Deng J. ResADM: a transfer-learning-based attack detection method for cyber–physical systems. Appl Sci. 2023;13(24):13019.
  17. Hassan MA, Granelli F. Harnessing 1D-CNN for received power prediction in sub-6 GHz RIS: Part I. 2025 IEEE International Conference on Communications Workshops (ICC Workshops). IEEE; 2025. p. 917–22.
  18. Hassan MA, Granelli F. Deep learning-driven optimal beam prediction for drone connectivity via flying-RIS and base station. 2025 IEEE 101st Vehicular Technology Conference (VTC2025-Spring); 2025. p. 1–6.
  19. Zamanzadeh Darban Z, Webb GI, Pan S, Aggarwal C, Salehi M. Deep learning for time series anomaly detection: a survey. ACM Comput Surv. 2024;57(1):1–42.
  20. Jin M, Koh HY, Wen Q, Zambon D, Alippi C, Webb GI, et al. A survey on graph neural networks for time series: forecasting, classification, imputation, and anomaly detection. IEEE Trans Pattern Anal Mach Intell. 2024;46(12):10466–85. pmid:39141471
  21. He H, Queen O, Koker T, Cuevas C, Tsiligkaridis T, Zitnik M. Domain adaptation for time series under feature and label shifts. International Conference on Machine Learning. PMLR; 2023. p. 12746–74.
  22. Wang Y, Xu Y, Yang J, Chen Z, Wu M, Li X, et al. Sensor alignment for multivariate time-series unsupervised domain adaptation. AAAI. 2023;37(8):10253–61.
  23. Arjovsky M, Chintala S, Bottou L. Wasserstein GAN. arXiv:1701.07875 [Preprint]. 2017. https://doi.org/10.48550/arXiv.1701.07875
  24. Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville AC. Improved training of Wasserstein GANs. Adv Neural Inf Process Syst. 2017;30.
  25. Zolanvari M, Teixeira MA, Gupta L, Khan KM, Jain R. Machine learning-based network vulnerability analysis of industrial Internet of Things. IEEE Internet Things J. 2019;6(4):6822–34.
  26. Bastian N, Bierbrauer D, McKenzie M, Nack E. ACI IoT Network Traffic Dataset 2023; 2023.
  27. Moustafa N. A new distributed architecture for evaluating AI-based security systems at the edge: network TON_IoT datasets. Sustain Cities Soc. 2021;72:102994.
  28. Chicco D, Jurman G. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. Bioinformatics. 2020;36(8):250–5.
  29. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv:1412.6980 [Preprint]. 2014.
  30. Sakurada M, Yairi T. Anomaly detection using autoencoders with nonlinear dimensionality reduction. Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis; 2014. p. 4–11.
  31. Deng A, Hooi B. Graph neural network-based anomaly detection in multivariate time series. AAAI. 2021;35(5):4027–35.