Abstract
The rapid evolution of cyber threats poses significant challenges to the adaptability and performance of anomaly detection systems. This study presents an innovative hybrid deep learning framework that integrates Convolutional Neural Networks (CNN), Long Short-Term Memory networks (LSTM), and Transformer models with a novel self-learning mechanism to enhance network traffic anomaly detection. Our key contributions include: (1) a synergistic two-stage model fusion architecture that captures both spatial and temporal traffic patterns; (2) an adaptive learning mechanism with multi-metric drift detection that autonomously responds to evolving threats; and (3) a knowledge preservation strategy that maintains detection capabilities while adapting to new attack patterns. The proposed CNN-LSTM model achieves F1-scores of 0.9778 and 0.9695 on the UNSW-NB15 and CICIDS2017 datasets respectively for binary classification of normal vs. anomalous traffic. The LSTM-Transformer model further classifies specific anomaly types with accuracies of 0.9632 and 0.9528 on these datasets, representing significant improvements over recent methods. Experiments demonstrate the framework’s robustness, maintaining an average accuracy of 0.955 (±0.005) over a 15-day simulated period with multiple induced concept drifts. The self-learning mechanism, with multi-metric drift detection and an efficient model update strategy, enables the system to detect drifts and recover performance within 23.4 ± 0.20 hours post-drift, while achieving a 92.8% detection rate for zero-day attacks. The proposed framework offers a promising direction for developing efficient and autonomous cybersecurity systems capable of handling dynamic and evolving threat landscapes.
Citation: Wang J, Huang N, Zhang H, Liu L, Fu Q, Cao K, et al. (2025) Self-learning model fusion for network anomaly detection: A hybrid CNN-LSTM-transformer framework. PLoS One 20(10): e0332502. https://doi.org/10.1371/journal.pone.0332502
Editor: Ayei Egu Ibor, University of Calabar, NIGERIA
Received: February 2, 2025; Accepted: September 1, 2025; Published: October 29, 2025
Copyright: © 2025 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data used in this study are publicly available from the UNSW-NB15 dataset (https://research.unsw.edu.au/projects/unsw-nb15-dataset) and CICIDS2017 dataset (https://www.unb.ca/cic/datasets/ids-2017.html).
Funding: This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) - Innovative Human Resource Development for Local Intellectualization program grant funded by the Korea government (MSIT) (IITP-2025-RS-2022-00156334, contribution rate: 70%) and the Liaoning Provincial Department of Science and Technology Applied Basic Research Program: Research on Intrusion Detection Technology and Intelligent Defense Strategies for Industrial Internet (Project No. 2025JH2/101300016).
Competing interests: The authors have declared that no competing interests exist.
Introduction
In the digital age, cybersecurity has become a key issue in protecting national infrastructure, business operations, and personal privacy [1]. With the popularity of Internet of Things (IoT) devices, the promotion of 5G technology, and the widespread application of cloud computing, the complexity of network connections and data traffic have increased exponentially, bringing unprecedented challenges to network security [2]. The contemporary digital landscape has witnessed an unprecedented escalation in the multifaceted nature of cyber threats [3]. This period has been characterized by a marked intensification in the frequency, magnitude, and intricacy of malicious cyber activities. The repercussions of these evolving threats extend beyond mere technological disruptions, manifesting in substantial economic ramifications on a global scale. A comprehensive analysis conducted by Cybersecurity Ventures, a respected authority in the field, projects a staggering trajectory for the financial impact of cybercrime. Their forecast indicates that the aggregate global economic burden attributable to cyber malfeasance is anticipated to reach a monumental $10.5 trillion annually by the year 2025 [4]. This projection underscores the critical imperative for robust cybersecurity measures and highlights the potential for severe economic destabilization in the absence of adequate protective strategies.
Network traffic anomaly detection plays a vital role in the first line of active defense against cyber threats [5]. An effective anomaly detection system can identify known attack patterns and discover unknown threats, providing timely warnings and response opportunities for network administrators [6]. However, traditional anomaly detection methods often exhibit limitations, such as high false positive rates, low detection efficiency, and difficulty in handling zero-day attacks [7,8]. Therefore, the development of advanced and adaptive anomaly detection technology has become a top priority in the field of network security [9].
Early anomaly detection methods relied on statistical analysis and expert systems, with Denning’s statistical model [1] laying the foundation. Machine learning techniques, including support vector machines (SVM), decision trees, and K-nearest neighbors (KNN), emerged in the early 21st century [10], offering improved detection accuracy but facing challenges in feature engineering and generalization.
The landscape of anomaly detection underwent a paradigm shift around 2010, driven by the convergence of big data infrastructures and advanced computational capabilities. Deep learning architectures began demonstrating remarkable efficacy in processing high-dimensional data and identifying complex patterns [11]. This transformation was particularly evident in network security applications, where the need for sophisticated threat detection mechanisms had become increasingly critical.
The evolution of deep learning in network anomaly detection has witnessed several significant developments. Convolutional Neural Networks (CNNs) were first adapted for network traffic analysis by Wang et al. [12], who pioneered the transformation of network data into "image-like" representations. This approach was further enhanced by Ahmad et al. [13], who developed multi-channel architectures capable of processing both raw and statistical features simultaneously. The temporal aspects of network traffic were addressed through Long Short-Term Memory (LSTM) networks, with notable work by Yang et al. [14] integrating bidirectional LSTM with attention mechanisms to capture complex sequential patterns.
Unsupervised learning approaches have also made substantial contributions to the field. Autoencoder-based systems, such as Kitsune [15] and variational autoencoders [16], have effectively addressed the challenge of limited labeled data in network environments. These methods excel in discovering hidden patterns and identifying anomalies without extensive labeled training data. The introduction of Generative Adversarial Networks (GANs) by Ring et al. [17] and their subsequent enhancement by Li et al. [18] has significantly improved model resilience and addressed data imbalance issues in anomaly detection.
Recent advances have seen the emergence of more sophisticated architectures. Graph Neural Networks (GNNs) have been successfully applied to topology-aware security analysis [19,20], while Transformer-based models have demonstrated exceptional capability in capturing long-range dependencies in network traffic [21]. These developments have been complemented by advances in ensemble learning, reinforcement learning, and federated learning approaches [22–24], which collectively enhance model generalization, adaptation, and privacy preservation.
Despite these significant advancements, several fundamental challenges persist in the field of network anomaly detection. The efficient processing and analysis of massive, high-dimensional network traffic data remains a significant technical hurdle [25]. Meeting real-time detection requirements in dynamic environments continues to pose substantial challenges [26], while the need to adapt to evolving attack patterns and unknown threats demands ongoing innovation in detection methodologies [27]. These challenges underscore the importance of developing more robust and adaptive detection frameworks that can effectively operate in complex, evolving network environments.
The landscape of network security detection continues to evolve with significant advancements in deep learning approaches. Recent works have demonstrated promising results in using various neural architectures for anomaly detection. Wang et al. [28] proposed a multi-strategy improved model for environmental monitoring that highlighted the importance of adaptive techniques in dynamic environments, while their work on wireless sensor deployment optimization [29] established the significance of balanced resource allocation in network systems. More directly relevant to our domain, Wang et al. [30] developed a CNN-BiLSTM approach for industrial intrusion detection, achieving notable improvements in accuracy over traditional methods. However, several critical challenges remain unaddressed in the current literature: (1) limited integration of spatial and temporal features, (2) inadequate adaptation to concept drift, and (3) imbalance between detection accuracy and computational efficiency.
This investigation proposes an innovative, synergistic deep learning framework to address these contemporary challenges in anomaly detection. The framework integrates the salient features of CNN, LSTM, and Transformer models, augmented by self-learning mechanisms. This amalgamation aims to enhance the model’s generalization capabilities and operational efficiency, thereby advancing the state-of-the-art in identifying complex, evolving network anomalies. Specifically, the main objectives of this study are as follows:
- The proposed integrated neural architecture synergistically captures and analyzes both spatial configurations and temporal evolution of network traffic patterns, significantly enhancing the model’s capacity for comprehensive anomaly detection;
- The adaptive learning mechanisms developed enable the model to continuously respond to emerging and evolving cyber threats, demonstrating improved resilience against novel attack vectors;
- The optimization strategies implemented for computational efficiency and model scalability, coupled with preservation of anomaly detection accuracy, ensure robust performance across diverse network landscapes and data scales, meeting real-time detection requirements in large-scale environments.
Through experimental verification on the UNSW-NB15 [31] and CICIDS2017 [32] benchmark datasets, the following main conclusions are drawn:
- The proposed CNN-LSTM and LSTM-Transformer hybrid models are superior to existing methods in detection accuracy and efficiency, and effectively integrate spatial and temporal feature analysis;
- The introduced self-learning mechanism significantly improves the model’s ability to detect new and unknown attacks, and enhances the adaptability of the system;
- Compared with traditional methods, our method shows better generalization ability and computational efficiency, and can meet real-time detection needs in large-scale network environments.
These findings not only advance the technical capabilities in network traffic analysis but also provide new directions for developing more intelligent and adaptive network security defense systems. The remainder of this paper is organized as follows: the Preliminaries section presents the relevant background and materials. The Proposed framework section details the proposed methodology. The Experiments section describes the experimental setup and results. Finally, the Conclusions section concludes the paper with discussions on future research directions.
Preliminaries
Deep learning architectures for network traffic analysis
Modern deep learning architectures have revolutionized network traffic analysis through their ability to automatically extract complex patterns from high-dimensional data [33]. The evolution of these architectures has enabled increasingly sophisticated approaches to anomaly detection [34], particularly in handling the complex, multi-dimensional nature of network traffic data [35].
Convolutional Neural Networks (CNNs) have demonstrated remarkable capabilities in capturing spatial correlations within network traffic features. For an input traffic sequence X ∈ R^{T×F} representing T timesteps and F features, the CNN applies hierarchical feature extraction through convolution operations: h = σ(W * X + b), where W represents learnable filters, b is a bias term, and σ is a nonlinear activation function. This architecture’s inherent ability to learn hierarchical features has proven particularly effective in identifying spatial patterns characteristic of network attacks [36], eliminating the need for manual feature engineering that often limits traditional approaches.
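As an illustration, the convolution operation can be sketched in NumPy; the shapes, random weights, and filter width below are purely illustrative, not the paper’s trained parameters:

```python
import numpy as np

def conv1d_relu(x, W, b):
    """Slide each filter over the feature sequence and apply ReLU.

    x: (T, F) traffic window (T timesteps, F features)
    W: (k, F, C) filters of width k producing C output channels
    b: (C,) biases
    Returns: (T - k + 1, C) feature map.
    """
    k, F, C = W.shape
    T = x.shape[0]
    out = np.zeros((T - k + 1, C))
    for t in range(T - k + 1):
        window = x[t:t + k]                      # (k, F) local patch
        out[t] = np.tensordot(window, W, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)                  # sigma = ReLU

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))                     # 10 timesteps, 4 features
W = rng.normal(size=(3, 4, 8))                   # width-3 filters, 8 channels
h = conv1d_relu(x, W, np.zeros(8))
print(h.shape)                                   # (8, 8)
```

Each row of the output is a learned local summary of three consecutive timesteps, which is what lets the CNN pick up short-range spatial patterns in traffic features.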
The temporal aspects of network traffic necessitate architectures capable of capturing sequential dependencies [37]. Long Short-Term Memory (LSTM) networks address this challenge through a sophisticated gating mechanism that effectively prevents gradient vanishing in processing long sequential data [38]. The core operations of LSTM are governed by (1)–(5):

i_t = σ(W_i · [h_{t−1}, x_t] + b_i), (1)
f_t = σ(W_f · [h_{t−1}, x_t] + b_f), (2)
o_t = σ(W_o · [h_{t−1}, x_t] + b_o), (3)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ tanh(W_c · [h_{t−1}, x_t] + b_c), (4)
h_t = o_t ⊙ tanh(c_t). (5)

In this formulation, the gating mechanisms (i_t, f_t, o_t) control information flow through the network, enabling the model to capture long-term dependencies crucial for detecting evolving attack patterns [39]. The LSTM’s ability to maintain and update internal state makes it particularly effective in identifying attacks that manifest over extended time periods [40].
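A single LSTM step can be written out directly; the dimensions and randomly initialized weights below are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: input, forget, and output gates plus candidate cell state."""
    Wi, Wf, Wo, Wc, bi, bf, bo, bc = params
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}; x_t]
    i = sigmoid(Wi @ z + bi)                     # input gate
    f = sigmoid(Wf @ z + bf)                     # forget gate
    o = sigmoid(Wo @ z + bo)                     # output gate
    c_tilde = np.tanh(Wc @ z + bc)               # candidate cell state
    c = f * c_prev + i * c_tilde                 # updated cell state
    h = o * np.tanh(c)                           # updated hidden state
    return h, c

rng = np.random.default_rng(1)
d_in, d_h = 4, 6
params = tuple(rng.normal(scale=0.1, size=(d_h, d_h + d_in)) for _ in range(4)) \
       + tuple(np.zeros(d_h) for _ in range(4))
h, c = np.zeros(d_h), np.zeros(d_h)
for t in range(5):                               # roll over a short sequence
    h, c = lstm_step(rng.normal(size=d_in), h, c, params)
print(h.shape)                                   # (6,)
```

The forget gate f_t is what allows the cell state to carry information across many timesteps, which is the property exploited for slowly evolving attacks.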
The Transformer architecture represents a significant advancement in sequence modeling through its introduction of self-attention mechanisms. The core attention operation is defined as (6):

Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, (6)

where Q, K, and V represent learnable transformations of the input features and d_k is the key dimension. This mechanism enables the model to dynamically capture global dependencies across entire traffic sequences, offering unprecedented capabilities in identifying complex attack patterns that manifest across different time scales [41]. The multi-head attention mechanism further enhances this capability by allowing the model to attend to different aspects of the input simultaneously [42].
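Scaled dot-product attention, as in (6), reduces to a few lines of NumPy; the projection matrices here are random placeholders standing in for learned weights:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # pairwise similarity of positions
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 8))                      # 5 sequence positions, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, w = attention(X @ Wq, X @ Wk, X @ Wv)       # Q, K, V as projections of X
print(out.shape, np.allclose(w.sum(axis=1), 1))  # (5, 8) True
```

Because every position attends to every other, dependencies between distant traffic windows are captured in a single operation, unlike the step-by-step recurrence of an LSTM.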
Data drift detection in network security
Network traffic patterns exhibit inherent temporal evolution, manifesting as data drift when the distribution of traffic patterns shifts between the training (source) domain ps(x) and the operational (target) domain pt(x). This phenomenon poses a fundamental challenge to the long-term effectiveness of anomaly detection systems [43]. The performance degradation under drift can be quantified through the accuracy metric (7):

Acc_t(f) = E_{x∼pt(x)}[1(f(x) = y)], (7)

which typically decreases when ps(x) ≠ pt(x).
The detection and adaptation to data drift have emerged as critical components of robust network security systems. However, traditional single-metric drift detection methods face a fundamental limitation in network security contexts: the temporal decoupling between statistical significance and functional impact. Methods like ADWIN [44] may trigger unnecessary adaptations for benign traffic changes, while performance-only approaches like DDM [45] may miss early indicators of sophisticated attacks. Our multi-metric approach addresses this challenge by requiring simultaneous validation across statistical, performance, and distributional dimensions, enabling systems to maintain detection efficacy through adaptive learning while preserving knowledge of previously encountered attack signatures.
Self-learning mechanisms
Self-learning mechanisms enable anomaly detection systems to adapt autonomously using unlabeled traffic data, addressing the practical constraints of obtaining labeled samples in operational environments. The autoencoder paradigm exemplifies this approach, learning normal traffic patterns through the optimization objective (8):

L_AE = E_x[‖x − g(f(x))‖²], (8)

where f and g represent encoder and decoder functions respectively [46]. This framework enables the identification of anomalies through reconstruction error analysis, providing a foundation for unsupervised adaptation to evolving traffic patterns.
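The reconstruction-error idea can be demonstrated with a linear stand-in for the autoencoder: a truncated SVD plays the role of encoder/decoder, and the synthetic "traffic" below (normal samples near a low-dimensional subspace) is purely illustrative:

```python
import numpy as np

def recon_error_scores(X_train, X_test, k=3):
    """Linear autoencoder via truncated SVD: encode with the top-k components,
    decode back, and score each sample by its squared reconstruction error."""
    mu = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    V = Vt[:k].T                                  # encoder f(x) = V^T (x - mu)
    recon = (X_test - mu) @ V @ V.T + mu          # decoder g(z) = V z + mu
    return np.linalg.norm(X_test - recon, axis=1) ** 2

rng = np.random.default_rng(8)
basis = rng.normal(size=(3, 10))                  # normal traffic lives near a 3-D subspace
normal = rng.normal(size=(500, 3)) @ basis + 0.05 * rng.normal(size=(500, 10))
test_normal = rng.normal(size=(50, 3)) @ basis + 0.05 * rng.normal(size=(50, 10))
test_anom = 2.0 * rng.normal(size=(50, 10))       # off-subspace "attack" samples
e_n = recon_error_scores(normal, test_normal)
e_a = recon_error_scores(normal, test_anom)
print(e_n.mean() < e_a.mean())                    # anomalies reconstruct poorly: True
```

Samples that resemble training-time normal traffic reconstruct well; anomalous samples fall outside the learned manifold and receive high scores, which is exactly the signal thresholded in (8)-style detectors.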
Recent studies have introduced more sophisticated self-learning approaches, including contrastive learning for feature space optimization. The contrastive learning objective can be formulated as (9):

L_CL = −log( exp(sim(z_i, z_j)/τ) / Σ_{k≠i} exp(sim(z_i, z_k)/τ) ), (9)

where (z_i, z_j) represents two augmented views of the same traffic sample, and the remaining z_k represent negative samples. This approach enables the model to learn discriminative features without explicit labels.
Furthermore, adaptive threshold mechanisms have been developed to dynamically adjust detection boundaries. The adaptive threshold τ_t at time t can be updated according to (10):

τ_t = μ_t + α·σ_t, (10)

where μ_t and σ_t represent the running mean and standard deviation of the anomaly scores, and α is a sensitivity parameter that controls the detection boundary. The integration of these self-learning mechanisms with deep learning architectures has opened new possibilities for autonomous, adaptive anomaly detection systems. The combination of representation learning and adaptive mechanisms enables continuous model refinement while maintaining robustness to concept drift, a crucial capability in operational network security environments [47].
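A simple way to maintain μ_t and σ_t online is with exponentially weighted moving statistics; the smoothing factor rho and sensitivity alpha below are illustrative choices, not values from the paper:

```python
import numpy as np

def update_threshold(score, mu, sigma2, alpha=3.0, rho=0.05):
    """EWMA update of the anomaly-score mean/variance; threshold = mu + alpha*sigma.

    rho is the smoothing factor; alpha controls detection sensitivity (10).
    """
    mu = (1 - rho) * mu + rho * score
    sigma2 = (1 - rho) * sigma2 + rho * (score - mu) ** 2
    return mu, sigma2, mu + alpha * np.sqrt(sigma2)

rng = np.random.default_rng(3)
mu, sigma2 = 0.0, 1.0
flagged = []
for step, s in enumerate(rng.normal(0.5, 0.1, size=200)):  # simulated scores
    mu, sigma2, tau = update_threshold(s, mu, sigma2)
    if s > tau:
        flagged.append(step)                     # would raise an alert here
print(tau > mu)                                  # threshold sits above the mean: True
```

As the score distribution drifts, μ_t and σ_t track it, so the boundary adapts without retraining the detector itself.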
Proposed framework
Framework overview
The rapid evolution of cyber threats presents unprecedented challenges to network security systems, characterized by sophisticated attack patterns and dynamic behavior changes. Traditional approaches often struggle with these challenges, particularly in scenarios involving zero-day attacks and concept drift. We propose a novel hybrid deep learning framework, as illustrated in S1 Fig, that integrates specialized neural architectures with adaptive learning mechanisms to address these challenges effectively.
Our framework employs a hierarchical detection paradigm that decomposes network anomaly detection into two specialized stages, each optimized for specific aspects of the detection task, with detailed architecture specifications shown in Table 1. The first stage utilizes a CNN-LSTM architecture for binary classification, leveraging CNN’s capability to extract spatial correlations while LSTM captures temporal dependencies. The second stage employs an LSTM-Transformer architecture for fine-grained attack classification, where the Transformer’s self-attention mechanism enables complex pattern recognition across different traffic characteristics.
Two-stage deep model fusion
The CNN-LSTM module serves as the framework’s initial detection layer, transforming network traffic features through hierarchical convolution operations. The CNN component maps input data into a structured feature space where spatial correlations between traffic attributes are captured. For a traffic sequence with
timesteps and
features, the CNN applies feature extraction through convolution operations, enabling the identification of local patterns indicative of anomalous behavior.
The LSTM-Transformer module builds upon the binary classification results, processing identified anomalous traffic for detailed attack type classification. This stage employs bidirectional LSTM layers to create rich temporal representations, followed by a Transformer encoder that uses multi-head self-attention to capture global dependencies. This architecture enables the model to identify subtle differences between attack types while maintaining computational efficiency.
This hierarchical integration addresses the multi-scale nature of network traffic anomalies that single-model approaches cannot handle effectively. Network attacks manifest across different temporal and spatial scales—from microsecond-level packet timing anomalies to hour-long distributed attack campaigns. Independent CNN processing captures local traffic correlations but misses extended attack sequences, while standalone LSTM analysis models temporal patterns but struggles with high-dimensional feature interactions. Direct Transformer application to raw network traffic creates computational overhead that prevents real-time processing in operational environments.
The two-stage architecture optimizes detection efficiency by matching computational complexity to analysis requirements. Binary anomaly detection requires less sophisticated pattern analysis than fine-grained attack classification, making CNN-LSTM processing suitable for initial filtering. The subsequent LSTM-Transformer stage applies intensive analysis only to suspected anomalous traffic, achieving comprehensive detection while maintaining operational feasibility. This design enables the framework to process typical network speeds of 10-500 Mbps while preserving detection accuracy for complex attack patterns.
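The two-stage gating logic described above can be sketched as plain Python; the stand-in scoring functions and the 0.5 threshold are illustrative stubs for the CNN-LSTM and LSTM-Transformer models:

```python
def two_stage_detect(window, stage1_score, stage2_classify, threshold=0.5):
    """Stage 1 screens all traffic cheaply; stage 2 runs only on suspected anomalies."""
    score = stage1_score(window)                 # binary anomaly score (CNN-LSTM role)
    if score <= threshold:
        return "normal"                          # most traffic exits here
    return stage2_classify(window)               # fine-grained label (LSTM-Transformer role)

# Stand-in scoring functions for illustration only.
stage1 = lambda w: 0.9 if max(w) > 10 else 0.1
stage2 = lambda w: "dos" if sum(w) > 50 else "probe"

print(two_stage_detect([1, 2, 3], stage1, stage2))      # normal
print(two_stage_detect([30, 30, 30], stage1, stage2))   # dos
```

Because the expensive classifier only sees traffic the screener flags, average per-window cost stays close to the cheap stage-1 cost, which is what makes line-rate operation feasible.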
Adaptive learning mechanism
The framework implements its self-learning capability through an adaptive learning mechanism that enables autonomous response to evolving attack patterns while preserving knowledge of known threats. As illustrated in S2 Fig, this mechanism operates through multi-metric drift detection and selective model updating. The system monitors potential distribution shifts between the source domain ps(x) and target domain pt(x) through statistical testing and performance monitoring, triggering adaptation when significant changes are detected. This adaptive learning mechanism represents the core of our framework’s self-learning ability, allowing it to continuously improve detection accuracy without human intervention as new attack patterns emerge.
Our multi-metric drift detection method employs a comprehensive approach to identify distribution shifts in network traffic patterns. The drift detection is formulated as follows:
Given a reference distribution Dref from the source domain ps(x) and current traffic distribution D from the potential target domain pt(x), we compute a composite drift score using three complementary metrics:
- Statistical divergence (Dstat): We employ the Kolmogorov-Smirnov test to measure the statistical significance of distribution differences between reference and current traffic features (11):

Dstat = sup_x |F_ref(x) − F_cur(x)|, (11)

where F_ref and F_cur are the empirical cumulative distribution functions of Dref and D.
- Performance degradation (Dperf): This metric quantifies the decline in model performance when applied to new traffic patterns (12):

Dperf = Performance(Dref) − Performance(D), (12)

where Performance(·) represents the model’s F1-score on the respective datasets.
- Distribution shift (Ddist): We calculate the Jensen-Shannon divergence between feature distributions to capture subtle shifts that might not be reflected in immediate performance drops (13):

Ddist = JSD(P‖Q) = ½ KL(P‖M) + ½ KL(Q‖M), (13)

where M = ½(P + Q) and KL represents the Kullback-Leibler divergence.
These three metrics capture different aspects of potential drift: Dstat reflects statistically significant changes, Dperf indicates functional impact on model performance, and Ddist measures intrinsic distribution changes. The composite drift score is calculated as a weighted combination (14):

drift_score = α·Dstat + β·Dperf + γ·Ddist, (14)

where α, β, and γ are importance weights determined through experimental validation. A drift is detected when drift_score exceeds a threshold τ_drift, triggering the model update process. This multi-metric approach provides robustness against false positives while ensuring sensitivity to meaningful distribution shifts across diverse network environments.
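A minimal NumPy sketch of the composite score follows; the KS statistic and Jensen-Shannon divergence are implemented directly, and the weights, bin count, and synthetic feature streams are illustrative assumptions rather than the paper’s validated settings:

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: maximum empirical-CDF gap."""
    xs = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), xs, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), xs, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two histograms."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def drift_score(ref, cur, perf_ref, perf_cur, a=0.4, b=0.4, g=0.2):
    d_stat = ks_statistic(ref, cur)
    d_perf = max(0.0, perf_ref - perf_cur)       # F1 degradation, floored at 0
    bins = np.histogram_bin_edges(np.concatenate([ref, cur]), bins=20)
    d_dist = js_divergence(np.histogram(ref, bins)[0].astype(float),
                           np.histogram(cur, bins)[0].astype(float))
    return a * d_stat + b * d_perf + g * d_dist

rng = np.random.default_rng(4)
ref = rng.normal(0, 1, 2000)                     # reference feature stream
same = rng.normal(0, 1, 2000)                    # same distribution -> low score
shift = rng.normal(1.5, 1, 2000)                 # shifted mean -> high score
print(drift_score(ref, same, 0.97, 0.97) < drift_score(ref, shift, 0.97, 0.90))  # True
```

Requiring all three components to contribute is what keeps a benign fluctuation (small KS gap, no F1 drop) below τ_drift while a genuine shift drives the weighted sum over it.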
Model updates employ Elastic Weight Consolidation to preserve important parameters while adapting to new patterns. When a concept drift is detected, our framework initiates a systematic recovery process through the following steps:
- Important parameter identification: The framework computes the Fisher Information Matrix F to identify parameters crucial for maintaining existing knowledge (15):

F_i = E[ (∂ log p(y|x; θ) / ∂θ_i)² ], (15)

where θ_i represents an individual model parameter and p(y|x; θ) is the probability of predicting class y given input x.
- Constrained optimization: The model parameters are updated with a loss function that balances new learning objectives against the preservation of existing knowledge (16):

L(θ) = L_new(θ) + (λ/2) Σ_i F_i (θ_i − θ*_i)², (16)

where θ*_i represents the original parameter values before the update, and λ controls the strength of knowledge preservation.
This approach enables continuous adaptation to emerging threats while maintaining detection capability for known attack patterns. Through extensive experimentation, we determined that this process enables the model to recover baseline performance within 23.4 ± 0.20 hours post-drift detection, significantly faster than traditional retraining approaches.
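The EWC penalty in (16) is straightforward to compute given a diagonal Fisher estimate; the toy parameter vectors, random Fisher values, and λ below are illustrative only:

```python
import numpy as np

def ewc_loss(new_loss, theta, theta_star, fisher, lam=100.0):
    """EWC objective: task loss plus a quadratic penalty on important weights,
    as in L(theta) = L_new + (lam/2) * sum_i F_i * (theta_i - theta*_i)^2."""
    penalty = 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)
    return new_loss + penalty

rng = np.random.default_rng(5)
theta_star = rng.normal(size=10)                 # parameters after the old task
fisher = np.abs(rng.normal(size=10))             # diagonal Fisher importance
theta = theta_star.copy()
print(ewc_loss(0.3, theta, theta_star, fisher))  # no drift from theta*: penalty is 0
theta_moved = theta_star + 0.5
print(ewc_loss(0.3, theta_moved, theta_star, fisher) > 0.3)  # True
```

Parameters with large F_i are effectively pinned near θ*, so adapting to drifted traffic cannot silently erase the weights that encode known attack signatures.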
The integration of these components creates a robust framework capable of maintaining high detection accuracy across diverse network environments while efficiently processing high-dimensional traffic data. The framework’s effectiveness is thoroughly evaluated through comprehensive experiments detailed in the Experiments section.
Framework implementation
Our framework processes network traffic data through a sophisticated pipeline that transforms raw packet data into comprehensive traffic representations. The core components of our framework and their primary functions are summarized in Table 2.
The feature processing begins with traffic flow aggregation, where individual packets are grouped into meaningful flows based on their temporal and spatial relationships. This aggregation process considers both protocol-specific characteristics and statistical properties of the traffic patterns, enabling the framework to capture complex attack signatures that manifest across multiple packets or flows.
The CNN-LSTM module employs a hierarchical feature extraction approach, where convolutional layers progressively identify increasingly abstract patterns in the traffic data. Through careful architectural design, the CNN component captures local feature correlations while maintaining computational efficiency. The LSTM layers then analyze these features in their temporal context, enabling the detection of sophisticated attacks that evolve over time. The integration of these components allows the model to maintain high detection accuracy while processing traffic data in real-time.
The LSTM-Transformer module builds upon this foundation through its sophisticated attention mechanisms. By incorporating bidirectional LSTM encoding with multi-head self-attention, the module can identify subtle relationships between different aspects of anomalous traffic. This capability is particularly crucial for distinguishing between various attack types that may share similar surface characteristics but differ in their underlying patterns or intentions.
A critical aspect of our implementation is the self-learning mechanism’s integration with both detection stages. This mechanism continuously monitors the traffic distribution and model performance, automatically detecting when the current detection patterns need to be updated. The update process employs Elastic Weight Consolidation to prevent catastrophic forgetting, ensuring that the model maintains its ability to detect known attack patterns while adapting to new threats.
The framework’s effectiveness stems from the synergistic interaction between these components, enabling robust performance across diverse network environments and attack scenarios. Algorithm 1 illustrates the core detection process, while Algorithm 2 and Algorithm 3 detail the adaptive learning and drift detection mechanisms respectively. These algorithms collectively enable the framework to maintain high detection accuracy while efficiently processing high-dimensional traffic data in real-time environments.
Algorithm 1. Two-stage network anomaly detection.
Input: Traffic stream X, CNN-LSTM model M1, LSTM-Transformer model M2, detection threshold
Output: Detection result (“normal” or attack type)
1: Initialize model parameters θ1, θ2
2: for each traffic window w in X do
3: f ← ExtractFeatures(w)
4: score ← M1(f)
5: if score > threshold then
6: type ← M2(f)
7: return type
8: end if
9: end for
10: return “normal”
Algorithm 2. Adaptive learning mechanism.
Input: Current model M, reference distribution Dref, current traffic distribution D, weights α, β, γ, threshold τdrift
Output: Updated model M′
1: Dstat ← KS(Dref, D)
2: Dperf ← Performance(Dref) − Performance(D)
3: Ddist ← JSD(Dref, D)
4: drift_score ← α·Dstat + β·Dperf + γ·Ddist
5: if drift_score > τdrift then
6: F ← ComputeFisherInformation(M, Dref)
7: M′ ← UpdateWithEWC(M, D, F, λ)
8: Dref ← D
9: return M′
10: end if
11: return M
Algorithm 3. Real-time pattern analysis.
Input: Traffic stream S, anomaly score threshold τ
Output: Pattern buffer B
1: Initialize pattern buffer B
2: for each window W in S do
3: p ← ExtractPattern(W)
4: s ← AnomalyScore(p)
5: if s > τ then
6: B ← B ∪ {p}
7: end if
8: end for
9: return B
Experiments
Experimental setup
Our experimental evaluation employs two widely-used network security datasets that represent different network environments and attack scenarios. The UNSW-NB15 dataset encompasses over 2.5 million records with 49 features, featuring nine distinct attack types generated in a controlled environment. The CICIDS2017 dataset contains approximately 2.8 million network flows with 78 features, captured from real-world network traffic and including 14 contemporary attack patterns. A detailed comparison of these datasets is presented in Table 3.
The data preprocessing pipeline consists of several systematic stages designed to enhance data quality and model performance:
- Data cleaning: We addressed missing values using attribute-specific strategies: categorical features were filled with the most frequent value, while numerical features were imputed using the median value to minimize the influence of outliers. Duplicate records were removed to prevent biasing the model toward redundant patterns.
- Normalization: Numerical features were standardized using z-score normalization (μ = 0, σ = 1) to ensure balanced contribution of features with different scales (17):

x′ = (x − μ) / σ. (17)

This step is particularly important for the gradient-based optimization in our deep learning models.
- Outlier handling: We employed the Interquartile Range (IQR) method (τ = 1.5) to detect and address outliers (18), (19):

lower = Q1 − 1.5·IQR, (18)
upper = Q3 + 1.5·IQR, (19)

where IQR = Q3 − Q1. Values outside these bounds were capped rather than removed, preserving data points while limiting their potential to skew the model training.
- Feature selection and dimensionality reduction: We implemented a multi-stage feature selection process:
- Initial filtering through correlation analysis (threshold = 0.1) to eliminate highly redundant features;
- Random Forest importance ranking to identify the most discriminative features;
- Sequential Forward Selection with cross-validation to optimize the feature subset while minimizing information loss.
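The IQR-capping and z-score steps above can be sketched for a single numerical feature; the injected outlier values are synthetic and illustrative:

```python
import numpy as np

def preprocess(x):
    """Cap outliers by the IQR rule (tau = 1.5), then z-score normalize."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    x = np.clip(x, lower, upper)                 # cap, do not remove, extreme values
    return (x - x.mean()) / x.std()              # z-score: mean 0, std 1

rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(0, 1, 1000), [50.0, -40.0]])  # two injected outliers
z = preprocess(x)
print(abs(z.mean()) < 1e-9, abs(z.std() - 1) < 1e-9)         # True True
```

Capping first keeps the extreme values from inflating σ, so the subsequent standardization reflects the bulk of the distribution rather than the outliers.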
Through comprehensive feature analysis, we identified the most significant contributors to attack detection performance across both datasets, as shown in S3 Fig. Flow duration and packet size consistently emerge as the most crucial features, with importance scores above 0.80 across both datasets, while protocol-specific characteristics show varying degrees of importance depending on the network environment.
To address potential redundancy in the feature space, we employed Principal Component Analysis (PCA) for the LSTM-Transformer stage, retaining components that explained 95% of the variance while reducing computational complexity. For the CNN-LSTM stage, we preserved the original feature space to maintain interpretability and spatial correlations.
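Retaining components up to a 95% explained-variance cutoff can be done with a plain SVD; the synthetic data below (a low-rank factor structure plus small noise) is an illustrative stand-in for the real feature matrices:

```python
import numpy as np

def pca_95(X, target=0.95):
    """Project onto the fewest principal components explaining `target` variance."""
    Xc = X - X.mean(axis=0)                      # center features
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(S ** 2) / np.sum(S ** 2)   # cumulative explained variance
    k = int(np.searchsorted(ratio, target)) + 1  # smallest k reaching the target
    return Xc @ Vt[:k].T, k

rng = np.random.default_rng(7)
latent = rng.normal(size=(500, 5))               # 5 underlying factors
X = latent @ rng.normal(size=(5, 40)) + 0.01 * rng.normal(size=(500, 40))
Z, k = pca_95(X)
print(k <= 10)                                   # far fewer than 40 dimensions: True
```

Because the data is effectively low-rank, a handful of components reach the 95% cutoff, which is the computational saving exploited in the LSTM-Transformer stage.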
This systematic feature engineering approach resulted in optimized feature sets of 35 and 45 features for UNSW-NB15 and CICIDS2017 respectively, balancing model performance with computational efficiency. The selected features encompass temporal characteristics, volumetric properties, and protocol-specific attributes, providing a comprehensive representation of network traffic patterns.
All experiments were conducted using TensorFlow 2.15.0 in the Google Colaboratory environment, ensuring consistent evaluation conditions across all comparative analyses.
Performance evaluation
Our framework demonstrates superior performance across both datasets in binary anomaly detection and multi-class attack classification tasks. On the UNSW-NB15 dataset, the framework achieves an F1-score of 0.9778 for binary detection and 0.9632 accuracy for attack classification, representing significant improvements over existing approaches. The framework’s effectiveness is further validated on the CICIDS2017 dataset, achieving F1-scores of 0.9695 and 0.9528 for binary and multi-class tasks respectively. A comprehensive comparison with existing methods is presented in Tables 4 and 5.
The results demonstrate that our integrated CNN-LSTM-Transformer framework outperforms both single-paradigm models and alternative hybrid architectures. Notably, while CTGAN shows competitive performance in binary detection, it underperforms in the more challenging multi-class scenario compared to our approach. The pure Transformer model, despite its theoretical advantages in capturing long-range dependencies, achieves lower F1-scores than our hybrid approach in both binary and multi-class detection tasks. This suggests that the integration of convolutional operations for spatial feature extraction with sequential and attention-based models for temporal analysis provides complementary capabilities that pure attention-based approaches cannot fully replicate.
The framework’s adaptability is further illustrated in S4 Fig, which shows the accuracy comparison between adaptive and non-adaptive models over a 15-day period. The adaptive model (blue line) maintains consistently high accuracy around 0.95 despite concept drift events (marked by red vertical lines), while the non-adaptive model (green line) shows significant performance degradation after each drift occurrence. This demonstrates the effectiveness of our self-learning mechanism in preserving model performance under evolving network conditions.
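A minimal sketch of the drift-monitoring idea behind this behavior is shown below: flag drift when a recent window's mean metric drops more than a margin below the long-run baseline. The window size, margin, and single-metric simplification are our assumptions; the paper's detector combines several such metrics.

```python
from collections import deque

class DriftMonitor:
    """Flag drift when windowed mean accuracy falls below baseline - delta."""

    def __init__(self, window=50, delta=0.05):
        self.window = deque(maxlen=window)
        self.baseline = None
        self.delta = delta

    def update(self, metric):
        self.window.append(metric)
        mean = sum(self.window) / len(self.window)
        # First full window establishes the baseline; no drift reported yet.
        if self.baseline is None and len(self.window) == self.window.maxlen:
            self.baseline = mean
            return False
        return self.baseline is not None and mean < self.baseline - self.delta

mon = DriftMonitor(window=10, delta=0.05)
stable = [mon.update(0.95) for _ in range(10)]   # establishes the baseline
drifted = [mon.update(0.80) for _ in range(10)]  # simulated accuracy collapse
print(any(stable), any(drifted))  # -> False True
```

In the adaptive model, a positive drift signal would trigger the model-update path; the non-adaptive baseline simply keeps degrading.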
The superior performance can be attributed to several key factors:
- The two-stage detection architecture effectively captures both general anomaly patterns and specific attack characteristics;
- The integration of CNN-LSTM and LSTM-Transformer modules provides comprehensive feature analysis at multiple scales;
- The self-learning mechanism enables rapid adaptation to emerging threats while maintaining detection accuracy.
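The two-stage flow summarized in the first bullet can be sketched as a simple gating pipeline: the cheap binary detector screens every flow, and only flagged flows reach the expensive attack classifier. The stand-in scoring functions and threshold below are illustrative placeholders for the trained CNN-LSTM and LSTM-Transformer models.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TwoStageDetector:
    binary_score: Callable[[list], float]  # stage 1 stand-in: CNN-LSTM score
    classify: Callable[[list], str]        # stage 2 stand-in: LSTM-Transformer
    threshold: float = 0.5

    def detect(self, flow: list) -> str:
        # Stage 1 screens all traffic; stage 2 runs only on flagged flows,
        # keeping the expensive model off the common (benign) path.
        if self.binary_score(flow) < self.threshold:
            return "normal"
        return self.classify(flow)

# Toy scoring: flag flows with a large mean feature value (illustrative only).
det = TwoStageDetector(
    binary_score=lambda f: sum(f) / len(f),
    classify=lambda f: "dos" if max(f) > 2.0 else "probe",
)
print(det.detect([0.1, 0.2, 0.1]))  # -> normal
print(det.detect([1.5, 2.5, 0.9]))  # -> dos
```

The gating is what lets the framework reserve second-stage analysis for the small fraction of suspicious traffic, as quantified later in the efficiency discussion.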
Particularly noteworthy is the framework’s consistent performance across both datasets, demonstrating its robustness and generalization capability. The performance improvement is more pronounced in complex attack scenarios and zero-day attack detection, where the framework achieves a 92.8% detection rate for previously unseen attack patterns.
Model analysis
To comprehensively evaluate our framework’s design choices and component contributions, we conducted extensive ablation studies on both datasets. The ablation results, presented in Tables 6 and 7, demonstrate the significant contribution of each component to the framework’s overall performance.
The ablation results illuminate the detection challenges that each component addresses and reveal why their combination proves necessary for comprehensive network anomaly detection. The base CNN-LSTM model achieves F1-scores of 0.9412 and 0.9385, establishing strong detection capability that encounters specific limitations when processing network traffic with distributed attack signatures.
Investigation of misclassified samples reveals that the base model struggles with attacks whose indicators appear across non-contiguous time windows. Consider an advanced persistent threat that establishes legitimate-appearing connections early in a session, maintains dormancy for hours, then exploits these connections for data exfiltration. The CNN-LSTM architecture processes traffic in sequential windows, making correlation of these temporally distant but functionally related events challenging.
The Transformer addition addresses this limitation directly, yielding improvements of 1.32% and 0.70% on the respective datasets. This gain magnitude reflects the Transformer’s role in handling the subset of attacks requiring global temporal correlation rather than replacing the CNN-LSTM’s local processing capabilities. Analysis of newly detected attacks shows that 89% involve multi-stage intrusions where attack indicators span extended time periods with normal traffic interspersed.
The self-learning mechanism produces the most substantial improvements—2.34% and 2.40%—by addressing the pervasive challenge of evolving attack patterns. Without adaptation, model performance degrades by an average of 8.7% over the evaluation period as attack tools and techniques evolve. The self-learning component not only prevents this degradation but enables continuous improvement through exposure to emerging threat patterns.
The integrated framework demonstrates how component interactions create detection capabilities that exceed simple additive combinations. Sequential processing creates synergy where CNN-LSTM output provides structured representations that facilitate Transformer attention computation. Raw traffic features would require the Transformer to simultaneously learn feature extraction and global correlation, reducing both efficiency and accuracy. The CNN-LSTM preprocessing enables the Transformer to focus on pattern correlation across temporal distances.
The self-learning mechanism enables dynamic component weighting based on current traffic characteristics. During periods dominated by local attack patterns, the system emphasizes CNN-LSTM contributions. When multi-stage attacks increase in frequency, Transformer influence grows proportionally. This adaptive weighting prevents over-reliance on any single component while maximizing the strengths of each.
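One plausible realization of this adaptive weighting is a softmax over each component's recent accuracy, so the currently stronger detector dominates the fused score. The softmax rule, temperature, and toy values below are our assumptions; the paper reports the weighting behavior, not this exact formula.

```python
import math

def adaptive_weights(recent_acc, temperature=0.1):
    # Softmax over recent per-component accuracy; lower temperature
    # concentrates weight on the better-performing component.
    exps = [math.exp(a / temperature) for a in recent_acc]
    z = sum(exps)
    return [e / z for e in exps]

def fused_score(component_scores, recent_acc):
    w = adaptive_weights(recent_acc)
    return sum(wi * si for wi, si in zip(w, component_scores))

# When the CNN-LSTM has been more accurate recently, it dominates the fusion.
w = adaptive_weights([0.96, 0.88])  # [cnn_lstm, transformer] recent accuracy
print(w[0] > w[1])                  # -> True
```

Because the weights sum to one, the fused score always stays within the range of the component scores, preventing any single component from being ignored outright.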
Error recovery operates through the complementary nature of component limitations. CNN-LSTM excels at local pattern recognition but may miss distributed attack signatures. The Transformer captures global correlations but requires structured input to operate efficiently. Self-learning adapts both components to evolving threats but needs stable base detection to identify meaningful patterns. Each component compensates for the others’ constraints while amplifying their capabilities.
Different attack types reveal the specialized value of each component. Volume-based attacks like DDoS exhibit strong local signatures that CNN-LSTM detects with 96.2% accuracy. The Transformer contributes minimal improvement for these attacks since their indicators cluster temporally. Self-learning provides enhancement by adapting to evolving attack tools and amplification techniques. Stealth attacks present the opposite pattern, where advanced persistent threats achieve only 78.4% detection with CNN-LSTM alone, as these attacks deliberately distribute their signatures across time. The Transformer contributes substantial improvement by correlating these distributed indicators, while self-learning recognizes new stealth techniques as they emerge.
Zero-day attacks demonstrate the most variable performance, with CNN-LSTM achieving detection rates between 45% and 89% depending on similarity to known attack patterns. The Transformer provides moderate assistance through pattern generalization, but self-learning delivers the most significant improvement by rapidly incorporating new attack signatures into the detection model. This analysis demonstrates why each component addresses specific aspects of the network anomaly detection challenge and why their integration creates detection capabilities that no single component can achieve alone.
Discussion
The experimental results demonstrate both the capabilities and limitations of our proposed framework across different network environments and attack scenarios. Our evaluation on the UNSW-NB15 and CICIDS2017 datasets reveals significant improvements in detection accuracy, with F1-scores of 0.9778 and 0.9695 respectively, representing substantial advances over existing approaches. The framework’s ability to maintain such high performance levels across different network environments suggests robust generalization capabilities, particularly in handling complex attack patterns and previously unseen threats.
The 92.8% zero-day detection rate emerges from the framework’s multi-layered generalization capability and adaptive learning architecture. This performance was evaluated using unseen attack patterns that were temporally separated from training data, representing genuine zero-day scenarios where attack signatures are unavailable during model development.
The detection mechanism operates through three complementary pathways that address different aspects of zero-day threat identification. The CNN-LSTM foundation provides pattern generalization by learning abstract traffic representations that capture attack behaviors independent of specific implementation details. During training, the CNN layers learn to identify spatial correlations in traffic features that indicate malicious intent—such as anomalous packet size distributions, unusual protocol combinations, or irregular timing patterns. These learned representations enable detection of new attacks that exploit similar network vulnerabilities through different technical approaches.
The LSTM component contributes temporal generalization by modeling attack progression patterns that persist across different attack variants. Zero-day attacks often follow recognizable phases—reconnaissance, exploitation, and data exfiltration—even when specific techniques differ from known attacks. The LSTM’s memory mechanism captures these temporal signatures, enabling identification of attack sequences that follow familiar strategic patterns despite employing novel tactical approaches.

The Transformer component provides the most sophisticated zero-day detection capability through its attention-based correlation of distant traffic events. Many advanced zero-day attacks employ distributed strategies where attack indicators appear across non-contiguous time periods to evade detection. The Transformer’s self-attention mechanism enables correlation of these distributed signatures, identifying attack patterns that span extended temporal ranges. This capability proves particularly valuable for detecting advanced persistent threats that establish legitimate-appearing connections before exploiting them for malicious purposes.
The self-learning mechanism enhances zero-day detection through continuous model refinement based on operational feedback. When the system encounters borderline cases where initial classification confidence is low, the multi-metric drift detection evaluates whether these cases represent genuine anomalies requiring investigation. Confirmed zero-day attacks trigger focused parameter updates that incorporate new attack signatures while preserving existing knowledge through Elastic Weight Consolidation. This creates an evolutionary learning process where the system’s zero-day detection capability improves through exposure to diverse threat patterns.
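The Elastic Weight Consolidation idea mentioned above can be shown in a few lines: the new-task loss is augmented with a quadratic penalty, weighted by each parameter's Fisher information, that discourages moving parameters important to previously learned attacks. The lambda value and toy parameter vectors are illustrative assumptions.

```python
import numpy as np

def ewc_loss(task_loss, params, old_params, fisher, lambda_ewc=100.0):
    # EWC: penalize deviation from old parameters, scaled by how much
    # each parameter mattered (Fisher information) for prior tasks.
    penalty = 0.5 * lambda_ewc * np.sum(fisher * (params - old_params) ** 2)
    return task_loss + penalty

theta_old = np.array([1.0, -0.5])
fisher = np.array([10.0, 0.1])   # parameter 0 matters for old attack patterns
theta_a = np.array([1.0, 0.5])   # moves only the unimportant parameter
theta_b = np.array([2.0, -0.5])  # moves the important parameter

# Moving the high-Fisher parameter is penalized far more heavily.
print(ewc_loss(0.0, theta_a, theta_old, fisher))
print(ewc_loss(0.0, theta_b, theta_old, fisher))
```

This asymmetry is what lets drift-triggered updates absorb new attack signatures while leaving parameters critical to existing detection capability largely untouched.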
Compared to traditional approaches, our framework addresses fundamental limitations in zero-day detection methodologies. Signature-based systems fail completely against zero-day attacks due to their reliance on predefined attack patterns. Single-model anomaly detection systems, while capable of identifying novel patterns, typically generate excessive false positives that make operational deployment impractical. The CNN-LSTM baseline in our ablation study achieves 0.9412 F1-score, representing strong detection capability that nonetheless misses complex attacks requiring global temporal correlation.
Pure deep learning approaches without adaptive capabilities suffer from concept drift, where performance degrades as attack patterns evolve beyond the training distribution. Our self-learning mechanism directly addresses this limitation, enabling sustained performance against evolving zero-day threats. The 23.4-hour recovery time following drift detection demonstrates the system’s ability to rapidly incorporate new threat intelligence while maintaining operational effectiveness.
The framework’s zero-day detection capability also demonstrates robustness against adversarial evasion attempts. The multi-component architecture creates multiple detection pathways that attackers must simultaneously evade to avoid detection. An attack that successfully evades CNN-based spatial analysis may still be detected through LSTM temporal analysis or Transformer global correlation. This architectural redundancy, combined with continuous learning adaptation, creates a moving target that complicates adversarial attack development.
The two-stage detection architecture balances detection accuracy with computational efficiency through selective processing. Table 8 shows the framework processes 1250 samples per second for binary classification and 950 samples per second for attack classification, with memory usage of 2.8 GB and 3.5 GB respectively.
The LSTM-Transformer component operates with O(T²·d) complexity for self-attention and O(T·d²) for LSTM processing, where T represents sequence length and d denotes feature dimensions. This complexity remains manageable because only 8-12% of network traffic typically requires the computationally intensive second-stage analysis. The CNN-LSTM stage filters traffic at line speed, reserving Transformer processing for suspected anomalous patterns.
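A back-of-envelope calculation makes the amortization concrete: with the O(T²·d) self-attention and O(T·d²) LSTM term counts, routing only ~10% of flows (the midpoint of the reported 8-12% range) to stage 2 cuts the effective per-flow cost by an order of magnitude. The T, d, and escalation values below are illustrative, not measured from the deployed system.

```python
# Illustrative sequence length and feature dimension (not from the paper).
T, d = 100, 64

attention_ops = T * T * d   # O(T^2 * d) self-attention term count
lstm_ops = T * d * d        # O(T * d^2) LSTM term count
second_stage = attention_ops + lstm_ops

escalation = 0.10           # fraction of flows reaching stage 2 (8-12% reported)
effective = escalation * second_stage

print(second_stage, int(effective))  # full vs. amortized per-flow cost
```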
The framework’s processing capacity suits typical network monitoring requirements. The two-stage design enables efficient resource utilization by applying intensive Transformer processing only to traffic requiring detailed analysis. High-throughput environments can maintain real-time performance through batch processing and GPU acceleration for the LSTM-Transformer component when needed.
Memory utilization scales linearly with batch size rather than sequence length, as both LSTM hidden states and attention weights operate within fixed window sizes. The framework maintains detection performance above 95% across different batch sizes, enabling flexible deployment based on available computational resources. Optimization techniques including attention pruning and weight quantization can reduce computational requirements by 20-30% and memory usage by 50% respectively, with minimal impact on detection accuracy.
Practical implications
This research offers several actionable insights for operational network security implementations. The framework’s practical applicability spans multiple dimensions of cybersecurity operations:
The adaptive recovery mechanism provides a defined resilience metric (recovery within approximately 23.4 hours of a detected drift), enabling organizations to quantify their vulnerability window and allocate resources accordingly. This represents a significant operational advantage over systems requiring manual reconfiguration after detecting novel attacks.
For zero-day threat mitigation, the framework’s 92.8% detection capability can be implemented through an integration protocol where suspicious traffic identified by the CNN-LSTM component triggers escalated scrutiny and potential containment actions before attack vectors can fully materialize.
Resource provisioning can be precisely calculated using the computational efficiency metrics in Table 8. With 1250 samples/second processing capacity in binary detection mode, organizations can determine hardware requirements as a direct function of their network traffic volume, optimizing capital expenditure while maintaining detection efficacy.
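The sizing arithmetic suggested above can be sketched directly from the Table 8 throughput figures. The target flow rate and the 70% headroom factor are our illustrative assumptions; only the per-instance capacities come from the paper.

```python
import math

BINARY_CAPACITY = 1250     # samples/second per instance (Table 8)
MULTICLASS_CAPACITY = 950  # samples/second per instance (Table 8)

def instances_needed(flows_per_sec, capacity, headroom=0.7):
    # Run each instance at ~70% of rated capacity to absorb traffic bursts.
    return math.ceil(flows_per_sec / (capacity * headroom))

# Example: a network sustaining 10,000 flows/second.
print(instances_needed(10_000, BINARY_CAPACITY))      # -> 12
print(instances_needed(10_000, MULTICLASS_CAPACITY))  # -> 16
```

Because the second stage sees only a small fraction of traffic in practice, the multi-class figure is a conservative upper bound rather than a realistic deployment requirement.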
The hierarchical detection architecture enables implementation of tiered security response policies, where routine traffic undergoes baseline screening while anomalous patterns receive comprehensive analysis through the LSTM-Transformer component, creating an efficient resource allocation model aligned with risk-based security management principles.
For organizations with established security operations centers (SOCs), the framework provides multiple integration pathways with existing cybersecurity infrastructure. The CNN-LSTM component serves as an intelligent preprocessing layer for traditional intrusion detection systems, reducing computational overhead while extending detection coverage to unknown threats. Integration with SIEM platforms enables the LSTM-Transformer component to provide enriched threat analysis and automated incident classification. The self-learning mechanism operates through feedback loops where analyst confirmations automatically refine detection parameters, reducing operational overhead while accelerating adaptation to emergent threats.
The modular architecture supports distributed deployment scenarios where organizations can coordinate threat detection across network boundaries through federated learning approaches, enabling collaborative security intelligence while preserving data privacy. Multi-site implementations can deploy specialized framework instances for different network segments, with centralized coordination providing comprehensive threat visibility across diverse infrastructure environments.
These implementation pathways demonstrate the framework’s versatility across diverse operational contexts, providing tangible cybersecurity enhancements for organizations regardless of their security maturity level.
Conclusions
This research advances the field of network security through a novel approach to anomaly detection, introducing a synergistic framework that combines deep learning architectures with autonomous learning capabilities. Our work demonstrates that integrating spatial-temporal feature analysis with adaptive learning mechanisms can significantly enhance the robustness and reliability of network security systems. The framework’s success across different network environments and attack scenarios validates the effectiveness of our hierarchical detection approach and highlights the potential of self-learning systems in cybersecurity applications.

Beyond the immediate technical achievements, this research opens new pathways for developing more intelligent and autonomous security systems. The framework’s ability to maintain effectiveness while adapting to emerging threats represents a significant step toward self-evolving cybersecurity solutions. This advancement is particularly relevant in the context of increasingly sophisticated cyber threats and the growing complexity of network infrastructures. Our findings suggest that future cybersecurity systems will benefit from deeper integration of adaptive learning mechanisms and context-aware detection strategies.
The broader implications of this work extend to the practical aspects of network security management, where the balance between detection accuracy and operational efficiency remains a critical challenge. Our research demonstrates that through careful architectural design and innovative learning mechanisms, it is possible to achieve both high detection accuracy and practical deployment efficiency. This understanding will guide future developments in autonomous security systems, contributing to the evolution of more resilient and adaptable cybersecurity solutions.
Despite these contributions, we recognize several limitations that indicate promising research directions. Further validation in specialized network environments such as industrial control systems and IoT deployments would enhance generalizability. Reducing dependency on labeled training data through transfer learning and few-shot learning techniques represents another critical area for investigation. Additionally, enhancing the framework’s resilience against adversarial evasion attempts and improving model explainability would increase practical utility for security analysts. These areas of future research align with our vision of increasingly autonomous, efficient, and trustworthy anomaly detection systems capable of meeting emerging cybersecurity challenges.
Supporting information
S2 Fig. Visualization of the adaptive learning process showing drift detection and model updating.
https://doi.org/10.1371/journal.pone.0332502.s002
(TIF)
S4 Fig. A line graph showing the accuracy performance of adaptive and non-adaptive models over the 15-day period.
https://doi.org/10.1371/journal.pone.0332502.s004
(TIF)