Few-Shot network intrusion detection based on prototypical capsule network with attention mechanism

Handi Sun; Liang Wan; Mengying Liu; Bo Wang

doi:10.1371/journal.pone.0284632

Abstract

Network intrusion detection plays a crucial role in ensuring network security by distinguishing malicious attacks from normal network traffic. However, imbalanced data degrades the performance of intrusion detection systems. This paper utilizes few-shot learning to solve the data imbalance problem caused by insufficient samples in network intrusion detection, and proposes a few-shot intrusion detection method based on a prototypical capsule network with an attention mechanism. Our method is divided into two parts: a temporal-spatial feature fusion method using capsules for feature extraction, and a prototypical network classification method with attention and vote mechanisms. The experimental results demonstrate that our proposed model outperforms state-of-the-art methods on imbalanced datasets.

Citation: Sun H, Wan L, Liu M, Wang B (2023) Few-Shot network intrusion detection based on prototypical capsule network with attention mechanism. PLoS ONE 18(4): e0284632. https://doi.org/10.1371/journal.pone.0284632

Editor: Mahdi Zareei, Technologico de Monterrey, MEXICO

Received: August 31, 2022; Accepted: April 4, 2023; Published: April 20, 2023

Copyright: © 2023 Sun et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Data are available from https://www.unb.ca/cic/datasets/ids-2018.html.

Funding: Natural Science Foundation of China No. 62262004. (http://www.nsfc.gov.cn). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: No.

1. Introduction

Network intrusion detection is particularly important for network security: it protects the network by distinguishing attack traffic from normal traffic [1]. Deep learning (DL), currently one of the most popular technologies, has been applied to intrusion detection by many researchers. DL-based intrusion detection is essentially a classification task: a classification model is learned from a training set and then used to identify network attack traffic. Many studies have shown that DL can make intrusion detection models stable and achieve a high detection rate [2–6].

However, there are few instances of malicious traffic in real networks, so most intrusion detection datasets exhibit significant class imbalance [7,8]. To address this, many researchers balance the dataset by increasing or decreasing the number of samples [9–11]. But as mentioned in [12], undersampling may lead to overfitting because too few samples remain, and it is difficult for oversampling to generate data that fit the real distribution.

To solve the above problem, we use Few-Shot Learning (FSL), which aims to achieve better classification performance using small amounts of labeled data [13]. The core of FSL is a process similar to human learning: it uses previous information to learn new tasks and requires little data on the new tasks. Research shows that FSL can make up for some shortcomings of DL, for instance by reducing the time spent collecting and labeling large datasets [14]. Therefore, FSL is an effective way to solve the data imbalance problem caused by insufficient samples in network intrusion detection. Because a scarcity of anomalous data is the realistic condition in real networks, FSL is becoming an alternative to traditional DL methods that better reflects the real environment.

The key contributions of this paper, which address the issues above, are summarized as follows:

We propose a prototypical network classification model with attention and vote mechanisms. The spatial attention mechanism is used in the calculation of the class center to assign a weight to each region of the feature map and select more informative feature regions, to obtain more representative class centers. At the same time, when calculating the similarity between the sample and the class center, a voting mechanism is added to improve the few-shot classification ability.
We propose a capsule-based temporal-spatial feature fusion model for feature extraction. In the CNN, we use capsules in place of neurons and pooling to reduce information loss. At the same time, we fuse the temporal-spatial features of traffic through the automatic feature extraction capability of DL to obtain a more representative feature representation and improve the detection effect of the model.
We improve the evaluation indicator by adding the proportion of few-shot categories in the calculation so that the evaluation results can better reflect the detection effect of small sample categories.

The rest of the paper is organized as follows: Section 2 presents the related work on the techniques used in the proposed method; Section 3 describes the proposed method in detail; Section 4 presents experiments that demonstrate the effectiveness of our method and compare it with state-of-the-art methods; finally, Section 5 concludes this paper.

2. Related works

In this section, several issues related to the content of this paper are discussed, including network intrusion detection, FSL, and FSL in intrusion detection.

2.1 Network intrusion detection

In recent years, with the rise of artificial intelligence, researchers have applied DL to intrusion detection and achieved notable gains in detection efficiency and effectiveness.

Zhang et al. [15] used the original data rather than statistical features, which reduced the loss of data information. In addition, they integrated an improved LeNet-5 with Long Short-Term Memory (LSTM) and obtained better experimental results. Zhong et al. [16] proposed an intrusion detection framework that combines LSTM with AutoEncoder. They utilized the AutoEncoder to calculate abnormal scores to flag network traffic and used these markers to train the LSTM. The experimental results were compared with those of Support Vector Machine (SVM), Isolation Forests (IF), and Gaussian Mixture Models (GMM). Li et al. [17] used the PCA algorithm to extract raw traffic features and proposed an intrusion detection model based on Transformer, which improves the detection ability on imbalanced datasets. Wei et al. [18] developed an attention-based LSTM model for more accurate detection. Lei et al. [19] leveraged multi-feature correlation for feature selection and applied a CNN with an attention mechanism to capture features. Gupta et al. [20] suggested a cost-sensitive deep neural network, which assigns a large weight to abnormal samples so that misclassifying them is more costly. Bedi et al. [21] proposed a two-layer ensemble structure for intrusion detection. The first layer integrates three structures for binary classification to identify attacks. These attacks are then sent to the second layer to be identified as different attack classes using multi-class eXtreme Gradient Boosting (XGBoost).

2.2 Few-shot learning

In the case of extremely limited training samples, FSL improves performance on new tasks by using previous knowledge. FSL methods can be classified into three types according to whether they are based on data, models, or tuning algorithms [22].

The first type of FSL is mainly based on data enhancement. The sample size of FSL is usually small, so some data is generated from a small amount of original data to make the model perform well. A common method is to learn a generator by using the auxiliary labeled dataset. Andresini et al. [23] used generative adversarial network (GAN) for data enhancement. Their method leads to better detection accuracy when compared to other methods on four benchmark datasets. Although data augmentation is a straightforward approach, the resulting dataset is often task-specific and not easily extendable to other types of data, such as text or audio, which contain structural and grammatical information.

The second type of FSL is based on the model. The main models of FSL include the Siamese network [24], matching network [25], prototypical network [26], and relational network [27]. Zhang et al. [28] proposed a Contrastive learning-based Task Adaptation model (CTA) for few-shot intent recognition. They improve the prototypical network by computing task prototypes instead of class prototypes, which improves the classification effect.

The third type of FSL is based on tuning algorithms. Xing et al. [29] brought dictionary learning methods into FSL and mapped feature embeddings to a more discriminative subspace for specific tasks. The experimental results show that the performance of the method has been improved.

2.3 Few-shot learning in intrusion detection

In intrusion detection, most methods are based on data enhancement and models, and few methods are based on tuning algorithms. Therefore, we mainly introduce the application of the first two kinds of methods in intrusion detection.

Iliyasu et al. [30] proposed a method for few-shot intrusion detection leveraging discriminative representation learning with a Supervised AutoEncoder. They first used known samples to train discriminative AutoEncoders for feature extraction and then, in the few-shot detection stage, used the AutoEncoder to fit a classification model with new attack categories. Xu et al. [31] proposed a meta-learning framework to implement few-shot intrusion detection. This method defines a binary classification task and constructs pairs of network traffic samples, including normal samples and malicious samples, for model training. The experimental results show that the method is universal and performs well. Wang et al. [32] proposed a Siamese capsule network and an unsupervised subtype sampling scheme to solve the problem of insufficient training data for network attack traffic. Yu et al. [33] utilized a Siamese network consisting of two two-layer CNNs as the classification model. The method achieves an accuracy of 99.99% on the CIC-IDS-2017 and CIC-IDS-2018 datasets. To increase the amount of training data, Ye et al. [34] designed a pseudo sample generation algorithm called Latent Dirichlet Allocation (LDA). Experimental results in real scenarios show that the proposed method can effectively detect malicious traffic when only a few samples are learned. Wang et al. [35] ranked statistical features and used a CNN to generate new features. The new features were combined with the original features and fed into a prototypical network for classification. Although this method achieved good detection results at the time, it relied on statistical feature data, and the dataset used was not recent.

3. The proposed method

In this section, we design a few-shot intrusion detection method based on a prototypical capsule network with the attention mechanism, following reference [28]. Fig 1 shows the architecture of our method. Before entering the data into the neural network, we preprocess the data to make it conform to the input format of the neural network. For the training and testing phases, we design a temporal-spatial feature fusion method using capsules for feature extraction and a prototypical network classification method with spatial attention and vote mechanisms. The rest of this section describes each of these modules in detail.

Fig 1. The architecture of the proposed method.

https://doi.org/10.1371/journal.pone.0284632.g001

3.1 Data preprocessing

This paper uses the raw Pcap files of the CSE-CIC-IDS2018 dataset instead of the feature-ready CSV files. The structure of Pcap files is shown in Fig 2. The Pcap header contains file information, such as version number, timestamp, and file start flag. The Pcap header is followed by multiple packets, each containing a packet header and packet data. However, a Pcap file cannot be fed directly into the neural network, so we preprocess the data into the neural network input format.

Fig 2. The structure of Pcap files.

https://doi.org/10.1371/journal.pone.0284632.g002

Network traffic granularity affects the analysis of data format and data distribution. Dainotti et al. [36] summarized five types of granularity commonly used in network traffic research: TCP connections, flows, sessions, services, and hosts. We extract the quintuple (source IP address, source port, destination IP address, destination port, protocol) from the Pcap file and group packets into session samples according to the quintuple. Next, we anonymize addresses (both MAC addresses and IP addresses), because otherwise the model may classify sessions based on addresses alone. There are two ways to anonymize: replacing addresses with random numbers of the same length, or setting all addresses to the same value. We do the latter, replacing all MAC addresses with 00:00:00:00:00:00 and all IP addresses with 0.0.0.0. A session is byte data encoded in hexadecimal, so we need to convert the bytes to numeric values. A byte can represent a value of 0–255, which matches the range of pixel values in an image, so we convert each session to a sequence of pixel values; the resulting image is shown in Fig 3. Fig 4 shows an instance of a session being split, anonymized, and encoded. In addition, the neural network input size is fixed, so the sample length N must be unified; N is determined experimentally in Section 4.2.
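
The preprocessing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: packets are assumed to be already parsed into dictionaries with hypothetical keys (`src_ip`, `src_port`, `dst_ip`, `dst_port`, `proto`, `raw`), the session key is made direction-independent so both directions of a conversation fall into one session, and the anonymization offsets assume untagged Ethernet II frames carrying IPv4.

```python
from collections import defaultdict

import numpy as np


def session_key(pkt):
    """Direction-independent quintuple, so both directions share one session."""
    a = (pkt["src_ip"], pkt["src_port"])
    b = (pkt["dst_ip"], pkt["dst_port"])
    return (min(a, b), max(a, b), pkt["proto"])


def group_sessions(packets):
    """Concatenate the raw bytes of each session in arrival order."""
    sessions = defaultdict(bytearray)
    for pkt in packets:
        sessions[session_key(pkt)] += pkt["raw"]
    return {key: bytes(buf) for key, buf in sessions.items()}


def anonymize_frame(frame):
    """Zero the MAC and IPv4 addresses of one frame.

    ASSUMPTION: untagged Ethernet II + IPv4, so the MAC addresses occupy
    bytes 0-11 and the IPv4 source/destination addresses bytes 26-33.
    """
    buf = bytearray(frame)
    if len(buf) >= 34:
        buf[0:12] = bytes(12)
        buf[26:34] = bytes(8)
    return bytes(buf)


def to_pixels(session_bytes):
    """Interpret every byte (0-255) as one pixel value."""
    return np.frombuffer(session_bytes, dtype=np.uint8)
```

In a full pipeline, `anonymize_frame` would be applied to each frame before its bytes are appended to the session buffer.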

Fig 3. Visualization of a session sample.

https://doi.org/10.1371/journal.pone.0284632.g003

Fig 4. An instance of data preprocessing.

https://doi.org/10.1371/journal.pone.0284632.g004

3.2 Temporal-spatial feature fusion method using capsules

Network traffic has not only spatial characteristics but also temporal characteristics, so using only one of them as the basis for detection is not comprehensive. Therefore, we design a temporal-spatial feature fusion model using capsules for feature extraction; its structure is shown in Fig 5.

Fig 5. Temporal-spatial feature fusion model using capsule.

https://doi.org/10.1371/journal.pone.0284632.g005

3.2.1 Spatial feature extraction.

Although Convolutional Neural Networks (CNN) are often used to extract spatial features, they have certain limitations. First, data is transmitted between neurons as scalars. Since a scalar has magnitude but no direction, a CNN is weak at capturing the spatial position relationships between the features it identifies. Because network traffic is very sensitive to feature location, confusing these relationships will inevitably affect the accuracy of classification results. Second, classical CNNs use max-pooling layers to explore the relationship between features, which leads to the loss of high-level feature information extracted by the network. For few-shot data, both the lack of samples and the loss of feature information will undoubtedly affect the classification accuracy.

We utilize Capsule Networks (CapsNet) to extract the spatial features of samples. Network intrusions usually produce very significant local features, and compared to other DL methods CapsNet has the unique advantage of exploiting local features. Meanwhile, the dynamic routing of CapsNet avoids the feature loss caused by pooling operations. CapsNet is mainly composed of the primary capsule layer and the digit capsule layer. Its operation can be divided into three steps. The first step is matrix transformation:

\[ \hat{u}_{j|i} = W_{ij} u_i \tag{1} \]

where u_i is the output of low-level capsule i, W_ij is the weight matrix between capsule layers, reflecting the spatial relationship between low-level and high-level features, and \hat{u}_{j|i} is the output of high-level capsule j as predicted by low-level capsule i. The second step is input weighting: the prediction vectors are weighted and summed to obtain the vector s_j:

\[ s_j = \sum_i c_{ij} \hat{u}_{j|i} \tag{2} \]

where the coupling coefficients c_ij are determined by the dynamic routing algorithm. The third step is a nonlinear transformation. The capsule network uses its own activation function, called the Squash function, to produce the output vector v_j:

\[ v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \, \frac{s_j}{\|s_j\|} \tag{3} \]

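The calculation process of dynamic routing is shown in Fig 6. During the calculation, b_ij is initialized to 0, and c_ij is obtained from b_ij with a softmax over the high-level capsules:

\[ c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})} \tag{4} \]

b_ij is then updated according to the agreement between the prediction vector \hat{u}_{j|i} of Eq (1) and the output v_j of high-level capsule j:

\[ b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j \tag{5} \]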

Fig 6. The calculation process of dynamic routing.

https://doi.org/10.1371/journal.pone.0284632.g006

The above process is cycled according to the number of dynamic routing iterations to finally obtain a set of optimal parameters.
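
To make the routing loop concrete, the following NumPy sketch runs the three steps (matrix transformation, weighted sum, Squash) together with the softmax and agreement updates of Eqs (4) and (5). It is a minimal illustration rather than the authors' implementation: the capsule counts, vector dimensions, random weights, and the choice of three routing iterations are all assumptions made for the example.

```python
import numpy as np


def squash(s, eps=1e-9):
    """Squash activation: keeps direction, maps the norm into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def dynamic_routing(u_hat, iterations=3):
    """Route prediction vectors u_hat of shape (num_low, num_high, dim).

    Returns the high-level outputs v (num_high, dim) and the coupling
    coefficients c (num_low, num_high).
    """
    num_low, num_high, _ = u_hat.shape
    b = np.zeros((num_low, num_high))             # routing logits start at 0
    for _ in range(iterations):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)      # softmax over high-level capsules
        s = np.einsum("ij,ijd->jd", c, u_hat)     # weighted sum of predictions
        v = squash(s)                             # nonlinear transformation
        b = b + np.einsum("ijd,jd->ij", u_hat, v) # agreement update
    return v, c


# Illustrative sizes only: 6 low-level capsules (dim 8), 3 high-level (dim 16).
rng = np.random.default_rng(0)
u = rng.normal(size=(6, 8))
W = rng.normal(size=(6, 3, 16, 8)) * 0.1
u_hat = np.einsum("ijdk,ik->ijd", W, u)           # matrix transformation
v, c = dynamic_routing(u_hat)
```

Each row of `c` sums to one, and every output vector in `v` has a norm below one, so the norm can be read as the probability that the corresponding high-level feature is present.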

3.2.2 Temporal feature extraction.

From the perspective of temporal characteristics, network traffic is a series of data packets that are consecutive in time. In this paper, LSTM is used to learn the temporal characteristics of samples. LSTM is a variant of the Recurrent Neural Network (RNN) that mitigates the exploding and vanishing gradient problems of the simple RNN. After the two models generate their respective features, the features are fused into the temporal-spatial features of the samples and then passed into the classification model.

3.3 Prototypical network with attention and vote

In this paper, the prototypical network is employed as a classification model to accomplish the detection task. Its overall structure is shown in Fig 7. Based on the prototypical network, the spatial attention mechanism and the voting mechanism are added to improve performance. In this section, we introduce the proposed classification method from the perspective of two mechanisms.

Fig 7. Proposed classification method.

https://doi.org/10.1371/journal.pone.0284632.g007

3.3.1 Calculate prototypes using attention.

The basic task of network intrusion detection is to classify network traffic samples with a classifier. A task D = {(x_1,y_1),…,(x_i,y_i),…,(x_K,y_K)} contains K samples and their labels, with x_i ∈ R^(h×w) and y_i ∈ {0,1,…,n}. The purpose of this task is to build a classification model f whose input is the sample x_i and whose output is the predicted value of the corresponding label y_i. Generally in DL, the number of samples K is large and the samples are divided into a train set and a test set. In few-shot learning, instead of focusing on a specific task, we build a task model F that learns from the tasks in a task set T = {T_A,T_B,T_C,…} in order to complete a new task T_N. The process of constructing a task is as follows (here N denotes the number of classes in a task, not the session length of Section 3.1). In the first step, N classes are randomly selected from the dataset, and 2K samples are randomly selected from each class. In the second step, from the N×2K samples extracted, K samples of each class are randomly selected as the train set of this task, and the remaining K samples of each class form the test set. In few-shot learning, the train set in each task is called the support set, and the test set is called the query set.
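
The two-step task construction described above maps directly onto a short sampling routine. The sketch below is illustrative only; the function name `sample_episode` and the dictionary layout `data_by_class` (class label to list of samples) are assumptions introduced for the example.

```python
import random


def sample_episode(data_by_class, n_way, k_shot, rng=random):
    """Build one N-way K-shot task: returns (support, query) lists of (x, label)."""
    # Step 1: pick N classes, then 2K samples from each of them.
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label in classes:
        picked = rng.sample(data_by_class[label], 2 * k_shot)
        # Step 2: the first K form the support set, the remaining K the query set.
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query


# Toy data: five classes with twenty integer "samples" each.
toy = {name: list(range(20)) for name in "abcde"}
support, query = sample_episode(toy, n_way=3, k_shot=5, rng=random.Random(0))
```

Because `rng.sample` draws without replacement, the support and query sets of a class never share a sample, and each class needs at least 2K samples to be eligible for a task.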

In this paper, we use the prototypical network. All extracted features of each class in the support set are summed and averaged to obtain the center of each class, which is the prototype. When a new sample is input into the network, the similarity between its features and all prototypes is calculated, and the maximum similarity is used to determine which category the sample belongs to.

However, we believe that computing the prototype by arithmetic mean is too simple. Different regions of the feature map carry different amounts of information, and the arithmetic mean cannot take advantage of this, which may lead to an inaccurate prototype and degrade the performance of the classification model. Therefore, we introduce spatial attention so that the model selectively outputs information, concentrating on key information and obtaining a more accurate prototype. The process is shown in Fig 8. Suppose that sample x_k^c (k = 1,…,K; c = 1,…,C) represents the kth sample selected from category c, and f_θ(x_k^c) is its feature map. We calculate the weight of each region in the feature map to obtain the attention map f_att(x_k^c) of the sample; the formula is as follows: (6) where average pooling and maximum pooling are carried out to get f_Avg(x_k^c) and f_Max(x_k^c), [·] represents the splicing (concatenation) operation, and σ is the Sigmoid function. Then, the weighted feature map f_ATT(x_k^c) is obtained by multiplying f_θ(x_k^c) with f_att(x_k^c). In the training stage, the prototype C* of each class is calculated as the mean of the weighted feature maps of its K support samples, according to formula (7):

\[ C^{*} = \frac{1}{K} \sum_{k=1}^{K} f_{ATT}(x_k^c) \tag{7} \]
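
As a rough illustration of the attention-weighted prototype, the NumPy sketch below pools a feature map with average and maximum pooling, splices the two maps, passes them through a Sigmoid, multiplies the result back onto the feature map, and averages over the support samples of a class. Several details are assumptions that the text does not confirm: pooling is taken along the channel axis (as in CBAM-style spatial attention), and a fixed per-map weighting (`w`, `bias`) stands in for whatever learned layer combines the spliced maps.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def spatial_attention(fmap, w=(1.0, 1.0), bias=0.0):
    """Attention map of shape (H, W) for a feature map of shape (C, H, W)."""
    f_avg = fmap.mean(axis=0)              # average pooling (ASSUMED: over channels)
    f_max = fmap.max(axis=0)               # maximum pooling (ASSUMED: over channels)
    spliced = np.stack([f_avg, f_max])     # splicing -> (2, H, W)
    # ASSUMPTION: a fixed linear mix replaces the learned combining layer.
    logits = w[0] * spliced[0] + w[1] * spliced[1] + bias
    return sigmoid(logits)


def attention_prototype(support_fmaps):
    """Mean of attention-weighted feature maps; input shape (K, C, H, W)."""
    weighted = np.stack([f * spatial_attention(f) for f in support_fmaps])
    return weighted.mean(axis=0)


rng = np.random.default_rng(0)
fmaps = rng.normal(size=(5, 4, 3, 3))      # K=5 support samples of one class
prototype = attention_prototype(fmaps)      # shape (4, 3, 3)
```

Regions whose pooled responses are large receive weights close to one and dominate the prototype, while weak regions are damped; with a plain arithmetic mean every region would count equally.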

Fig 8. The process of spatial attention.

https://doi.org/10.1371/journal.pone.0284632.g008

3.3.2 Vote classification.

After computing the prototype, the samples of the query set are input into the model, and the similarity between their weighted feature maps and each prototype is calculated to complete the classification. To make the classification results more fault-tolerant, we use not only the Euclidean distance but also the cosine distance and the Manhattan distance when calculating the similarity between a sample and a prototype. The formulas are shown in (8)–(10):

\[ d_{E}(x, c) = \sqrt{\sum_i (x_i - c_i)^2} \tag{8} \]

\[ d_{C}(x, c) = 1 - \frac{x \cdot c}{\|x\| \, \|c\|} \tag{9} \]

\[ d_{M}(x, c) = \sum_i |x_i - c_i| \tag{10} \]

where x is the weighted feature map of the query sample and c is a prototype, both flattened to vectors with components x_i and c_i. The Euclidean distance measures straight-line distance, the cosine distance measures directional difference, and the Manhattan distance sums the absolute differences along each axis. For each of the three distances, we take the class whose prototype is closest to the sample as a classification result; we then vote on the three results and choose the class that receives the most votes as the final prediction of the model.
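
The voting step can be sketched as follows. Each of the three distances nominates the class of its nearest prototype, and the majority wins. The text does not say what happens when all three distances disagree, so falling back to the Euclidean result in that case is an assumption of this sketch, as are the function name and the toy vectors.

```python
from collections import Counter

import numpy as np


def vote_classify(query, prototypes):
    """Predict a class index for `query` (D,) given `prototypes` (C, D)."""
    diff = prototypes - query
    euclidean = np.sqrt((diff ** 2).sum(axis=1))
    manhattan = np.abs(diff).sum(axis=1)
    norms = np.linalg.norm(prototypes, axis=1) * np.linalg.norm(query)
    cosine = 1.0 - (prototypes @ query) / (norms + 1e-12)

    # One vote per distance: the class of the closest prototype.
    votes = [int(np.argmin(d)) for d in (euclidean, cosine, manhattan)]
    label, count = Counter(votes).most_common(1)[0]
    # ASSUMPTION: on a three-way disagreement, use the Euclidean vote.
    return label if count >= 2 else votes[0]


prototypes = np.array([[1.0, 0.0], [3.0, 1.5]])
query = np.array([4.0, 0.0])
predicted = vote_classify(query, prototypes)
```

In this toy case the cosine distance prefers the first prototype (same direction as the query), while the Euclidean and Manhattan distances prefer the second, so the vote returns class 1.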

4. Experiments

The experiments in this section have the following three objectives:

Choose a suitable value for the key parameter, the session sample length N.
Perform ablation experiments to verify the validity of the model.
Compare our method with the current related research works in terms of evaluation indicators, focusing on the detection effect of the few-shot class.

4.1 Dataset and evaluation indicators

In this paper, we selected CSE-CIC-IDS2018, a relatively new dataset in the field of intrusion detection. CSE-CIC-IDS2018 was generated by simulating the network traffic distribution of the Internet. It contains a large amount of original network traffic with extremely imbalanced attack data, which makes it very suitable for our study. Because this paper focuses on the classification effect of few-shot classes, we select the categories with a small number of samples in the dataset and sample some categories with a large number of samples to form the dataset of this paper. The resulting dataset is shown in Table 1.

Table 1. The dataset of experiments.

https://doi.org/10.1371/journal.pone.0284632.t001

To calculate the performance of the proposed method, the evaluation indicators Accuracy, Precision, Recall, and F1-Score are used as follows:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{11} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{12} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{13} \]

\[ \text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{14} \]

where TP is the number of True Positives, FP the number of False Positives, FN the number of False Negatives, and TN the number of True Negatives. Because the dataset is imbalanced, Accuracy alone cannot reflect the true validity of our proposed model. Even if the model identifies malicious samples as normal samples, the Accuracy is still quite high due to the large proportion of normal samples. However, such high Accuracy is meaningless for intrusion detection, because few-shot attacks are identified as normal and the detection effect on these few-shot categories is in fact very poor. Therefore, we use multiple evaluation indicators that more accurately reflect the effectiveness of the model.
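
A small numerical example shows why Accuracy is misleading on imbalanced data. The sketch below derives per-class Precision, Recall, and F1-Score from a confusion matrix; the matrix itself (990 normal sessions and 10 attack sessions, all predicted as normal) is invented for illustration and is not taken from the experiments.

```python
import numpy as np


def per_class_metrics(cm):
    """Accuracy and per-class precision/recall/F1 from a confusion matrix.

    cm[i, j] is the number of samples of true class i predicted as class j.
    Undefined ratios (zero denominator) are reported as 0.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm).copy()
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    zeros = np.zeros_like(tp)
    precision = np.divide(tp, tp + fp, out=zeros.copy(), where=(tp + fp) > 0)
    recall = np.divide(tp, tp + fn, out=zeros.copy(), where=(tp + fn) > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1


# Hypothetical detector that labels every session as normal (class 0).
cm = [[990, 0],
      [10, 0]]
accuracy, precision, recall, f1 = per_class_metrics(cm)
```

Here the Accuracy is 0.99 although the Recall and F1-Score of the attack class are both 0: every attack is missed, which is exactly the failure that Accuracy hides.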

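To comprehensively measure all categories of the multi-classification task, we calculate Macro-Averages (Macro-Avg) and Weighted-Averages (Weighted-Avg) of the evaluation indicators. Macro-Avg is the arithmetic average over classes. Taking Precision as an example, with n classes and P_i the Precision of class i, it is given by Eq (15):

\[ \text{Macro-Avg} = \frac{1}{n} \sum_{i=1}^{n} P_i \tag{15} \]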

Weighted-Avg uses each class's share of the samples as its weight, so it mainly reflects the detection effect of classes with a large number of samples. However, this paper focuses on few-shot classes, so we modify the weight calculation to increase the weights of few-shot classes. Taking Precision as an example, assume that there are n classes, α_i is the ratio between the number of samples of class i and the total number of samples, and P_i is the Precision of class i. The formula is as follows: (16)
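
The difference between the two conventional averages is easy to see numerically. The sketch below computes the Macro-Avg and the standard sample-share Weighted-Avg for a hypothetical two-class case; it deliberately does not attempt the modified weighting of Eq (16), and the scores and class sizes are invented for illustration.

```python
import numpy as np


def macro_avg(scores):
    """Arithmetic mean of the per-class scores."""
    return float(np.mean(scores))


def weighted_avg(scores, support):
    """Conventional weighted average: weights are the class shares alpha_i."""
    alpha = np.asarray(support, dtype=float) / np.sum(support)
    return float(np.dot(alpha, scores))


precision = [0.99, 0.50]   # hypothetical: large class, few-shot class
support = [990, 10]        # hypothetical class sizes

macro = macro_avg(precision)                # 0.745
weighted = weighted_avg(precision, support) # 0.9851
```

With conventional weights the few-shot class contributes only 1% of the result, so its poor Precision barely moves the Weighted-Avg, whereas the Macro-Avg drops to 0.745. This is the effect that motivates giving few-shot classes larger weights.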

4.2 Determine the length of session samples

During data preprocessing, we use padding or truncating to unify the length of session samples N. If N is too small, a large amount of payload information will be lost, while if N is too large, too many zeros will be filled in, causing data noise. Whether N is too small or too large, the model performance and the training effect will suffer. Therefore, selecting an appropriate N can improve the representation ability and training performance of the model. In this section, we perform comparison experiments to determine the value of N. We set the candidate range to six values, N = {64, 144, 256, 1024, 2500, 4096}.
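
The padding and truncating step can be sketched as below. Zero padding follows the description above; reshaping the unified vector into a square image is an assumption of this sketch, suggested by the fact that all six candidate lengths are perfect squares (8², 12², 16², 32², 50², 64²). The function names are hypothetical.

```python
import math

import numpy as np


def unify_length(session_bytes, n=1024):
    """Truncate to n bytes, or right-pad with zeros up to n bytes."""
    data = np.frombuffer(session_bytes[:n], dtype=np.uint8)
    out = np.zeros(n, dtype=np.uint8)
    out[: data.size] = data
    return out


def to_square_image(vec):
    """Reshape a length-n vector into a sqrt(n) x sqrt(n) image (ASSUMED layout)."""
    side = math.isqrt(vec.size)
    if side * side != vec.size:
        raise ValueError("length must be a perfect square")
    return vec.reshape(side, side)


short_session = b"\x16\x03\x01"                               # 3 bytes, padded
image = to_square_image(unify_length(short_session, n=1024))  # shape (32, 32)
```

A very short session such as this one yields an image that is almost entirely zeros, which illustrates the noise introduced when N is chosen too large.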

The model is trained on the train set and evaluated on the test set. Except for the length of the samples, all other parameters are kept the same across experiments. We select the N that gives the relatively higher F1-score Macro-Avg, F1-score Weighted-Avg, and Accuracy. The result is shown in Fig 9. The horizontal axis is the session sample length N, while the vertical axis shows the above three values, which are represented by blue, red, and green dashed lines, respectively.

Fig 9. F1-score Macro-Avg, Weighted-Avg, and Accuracy of N.

https://doi.org/10.1371/journal.pone.0284632.g009

It is observed in Fig 9 that, when N is 1024, the three values of the model are the highest, respectively 0.9522, 0.9566, and 0.9888. That is to say that when N is 1024, the detection performance of the model is the best. Therefore, we choose 1024 as the length of session samples.

4.3 Evaluate the proposed method

As mentioned above, we select some categories from CSE-CIC-IDS2018 to form the experimental dataset. The length of session samples in the dataset is 1024, which is used as the input parameter of the model for training and testing. The result is shown in Fig 10 as a heat map.

Fig 10. Heat map: The classification result of the proposed method.

https://doi.org/10.1371/journal.pone.0284632.g010

Fig 10 presents the Precision, Recall, and F1-score for each category from benign to SQL-Injection. The darker the color, the higher the value, as shown in the vertical bar on the right. The value in parentheses indicates the number of session samples for that category in the test set. The overall accuracy and the Macro-Avg and Weighted-Avg of precision, recall, and F1-score are given at the bottom. Fig 10 shows that the proposed method performs well: it not only maintains the detection performance on the majority classes but also improves the detection effect on the few-shot classes. Therefore, the proposed method is effective for data imbalance in intrusion detection.

In addition to analyzing the overall results, we also evaluate the feature extraction model and the classification model separately to verify their performance. We use accuracy and the Weighted-Avg of precision, recall, and F1-score as evaluation indicators. Fig 11 shows the ablation experiment results for the feature extraction model. It is observed that the scores of the four evaluation indicators are all the lowest when CNN or LSTM is used alone. After the fusion of CNN and LSTM, the Accuracy and Recall improve by 3% and 9% compared to CNN, the better-performing of the two when used alone, indicating that our improvement is effective. On this basis, the capsule structure is added, and the four values increase by about 5%, 3%, 6%, and 3%, achieving the best detection effect.

Fig 11. Evaluation of feature extraction model.

https://doi.org/10.1371/journal.pone.0284632.g011

Fig 12 shows the ablation experiment results for the classification model. The abscissa is the four contrasting models, which are prototypical network, prototypical network with attention mechanism, prototypical network with vote mechanism, and prototypical network with both attention and vote mechanism. As can be seen from Fig 12, compared with prototypical network alone, four indexes of the proposed model have been greatly improved, and the values are 0.9888, 0.9484, 0.9635, and 0.9456, respectively.

Fig 12. Evaluation of classification model.

https://doi.org/10.1371/journal.pone.0284632.g012

4.4 Comparison

In this section, we compare the proposed model with traditional machine learning methods and also with two few-shot intrusion detection methods for few-shot categories.

Table 2 shows the comparisons of the proposed model in terms of Weighted-Avg with five selected common traditional machine learning algorithms, namely K-Nearest Neighbors (KNN), Random Forest (RF), Support Vector Machine (SVM), XGBoost, Naive-Bayes (NB).

Table 2. Compared with traditional machine learning.

https://doi.org/10.1371/journal.pone.0284632.t002

As mentioned above, this paper addresses the data imbalance problem in intrusion detection at the model level instead of taking a data-level approach. Consequently, it is not meaningful to choose a data-level method for comparison. We choose a method that is likewise based on model improvement, the PBCNN in reference [29]. At the same time, since this paper is inspired by reference [28] and improves on it, our method needs to be compared with the Siamese capsule network proposed in reference [28] to prove the effectiveness of our improvement. On the same dataset, the Precision, Recall, and F1 of the few-shot classes (Bruteforce-XSS, Infiltration, Bruteforce-web, and SQL-Injection) are compared. The results are shown in Tables 3 and 4.

Table 3. Compare with PBCNN.

https://doi.org/10.1371/journal.pone.0284632.t003

Table 4. Compare with Siamese capsule network.

https://doi.org/10.1371/journal.pone.0284632.t004

Table 3 shows that our method has a better detection effect on the few-shot categories in the dataset. The four evaluation indexes are all higher than those of PBCNN, and the differences are very large. We believe this is because the dataset used for PBCNN contains more samples than ours, so its detection effect on the dataset in this paper is not as good as that reported in reference [29]. While the Siamese capsule network performs better than PBCNN, it is still inferior to the proposed method. Note that reference [28] includes an unsupervised sampling operation, whereas this paper performs no operations on the data other than preprocessing.

To evaluate the time complexity, we measure the training and testing time of the three models under the same conditions. As shown in Table 5, our method spends a little more training time than the other two methods, because it extracts not only spatial but also temporal features. The testing time of the three methods is similar. In summary, our method achieves much better performance at the cost of a small amount of additional training time. It can be stated that our model adapts better to data imbalance in intrusion detection, and its detection effect on few-shot categories is better.

Table 5. Comparison of time complexity.

https://doi.org/10.1371/journal.pone.0284632.t005

5. Conclusions

In this paper, we point out the data imbalance problem caused by insufficient samples in network intrusion detection. We introduce few-shot learning to improve the detection model in both feature extraction and classification. In the feature extraction stage, we incorporate capsules into the CNN and combine it with LSTM to extract the temporal-spatial features. In the classification stage, we improve the prototypical network by introducing the spatial attention mechanism and the voting mechanism. After evaluation and comparison, the proposed method is shown to improve the detection performance of few-shot classes when the intrusion detection data is imbalanced.

In the future, we will design a multi-scale input model that uses as much of the payload information as possible, compensating for the payload information lost through the unified input length, and thereby further improve the detection performance on few-shot classes.

References

1. Ahmed M, Mahmood AN, Hu J. A survey of network anomaly detection techniques. J Netw Comput Appl. 2016;60: 19–31.
2. Liu H, Han D, Li D. Behavior analysis and blockchain based trust management in vanets. J Parallel Distrib Comput. 2021;151: 61–69.
3. Zheng WF. Intrusion Detection Algorithm Based on Convolutional Neural Network. In: ICCEA 2020: International Conference on Computer Engineering and Application; 2020 March 18–20; Guangzhou, China.
4. Garcia JFC, Blandon GET. Deep Learning-Based Intrusion Detection and Preventation System for Detecting and Preventing Denial-of-Service Attacks. IEEE Access. 2022;10: 83043–83060.
5. Akgün D, Hizal S, Cavusoglu Ü. A new DDoS attacks intrusion detection model based on deep learning for cyber security. Comput Secur. 2022;118:1–13.
6. Zhang Y, Chen X, Guo D, Song M, Teng Y, Wang X. PCCN: Parallel Cross Convolutional Neural Network for Abnormal Network Traffic Flows Detection in Multi-Class Imbalanced Network Traffic Flows. IEEE Access. 2019;7: 119904–119916.
7. Oksuz K, Cam BC, Kalkan S, Akbas E. Imbalance problems in object detection: a review. IEEE Trans Pattern Anal Mach Intell. 2021;43(10): 3388–3415. pmid:32191882
8. He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data Eng. 2009;21:1263–1284.
9. Man J, Sun G. A residual learning-based network intrusion detection system. Secur Commun Networks. 2021;2021: 5593435:1–5593435:9. http://dx.doi.org/10.1155/2021/5593435.
10. Zhang W, Ramezani R, Naeim A. WOTBoost: Weighted oversampling technique in boosting for imbalanced learning. In: IEEE Big Data 2019: 2019 IEEE International Conference on Big Data; 2019 Dec 9–12; Los Angeles, CA, USA. p. 2523–2531.
11. Gao X, Shan C, Hu C, Niu Z, Liu Z. An adaptive ensemble machine learning model for intrusion detection. IEEE Access. 2019;7: 82512–82521.
12. Barandela R, Sanchez JS, García V, Rangel E. Strategies for learning in class imbalance problems. Pattern Recognit. 2003;36:849–851.
13. Xie Y, Wang H, Yu B, Zhang C. Secure collaborative few-shot learning. Knowledge-Based Syst. 2020;203:1–10.
14. Duan R, Li D, Tong Q, Yang T, Liu X, Liu X. A Survey of Few-Shot Learning: An Effective Method for Intrusion Detection. Secur Commun Netw. 2021;2021:1–10.
15. Zhang Y, Chen X, Jin L, Wang X, Guo D. Network intrusion detection: based on deep hierarchical network and original flow data. IEEE Access. 2019;7:37004–37016.
16. Zhong Y, Chen W, Wang Z, Chen Y, Wang K, Li Y, et al. HELAD: A novel network anomaly detection model based on heterogeneous ensemble learning. Comput Netw. 2020;169:1–16.
17. Li M, Han DZ, Li D, Liu H, Chang CC. MFVT: an anomaly traffic detection method merging feature fusion network and vision transformer architecture. EURASIP J Wirel Comm. 2022;39:1–22.
18. Wei WT, Gu HX, Deng WS, Xiao Z, Ren XM. ABL-TC: A lightweight design for network traffic classification empowered by deep learning. Neurocomputing. 2022;489:333–344.
19. Lei SW, Xia CH, Li Z, Li XJ, Wang TB. HNN: A Novel Model to Study the Intrusion Detection Based on Multi-Feature Correlation and Temporal-Spatial Analysis. IEEE Trans Netw Sci Eng. 2021;8:3257–3274.
20. Gupta N, Jindal V, Bedi P. CSE-IDS: Using cost-sensitive deep learning and ensemble algorithms to handle class imbalance in network-based intrusion detection systems. Comput Secur. 2022;112:1–21.
21. Bedi P, Gupta N, Jindal V. I-SiamIDS: an improved Siam-IDS for handling class imbalance in network-based intrusion detection systems. Appl Intell. 2021;51:1133–1151.
22. Wang Y, Yao Q, Kwok JT, Ni LM. Generalizing from a Few Examples: A Survey on Few-shot Learning. ACM Comput Surv. 2020;53:1–34.
23. Andresini G, Appice A, Rose LD, Malerba D. GAN augmentation to deal with imbalance in imaging-based intrusion detection. Futur Gener Comp Syst. 2021;123:108–127.
24. Koch G, Zemel R, Salakhutdinov R. Siamese Neural Networks for One-Shot Image Recognition. In: ICML 2015: International Conference on Machine Learning 2015; 2015 Jul 6–11; Lille, France.
25. Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K, Wierstra D. Matching networks for one shot learning. In: NIPS 2016: Annual Conference on Neural Information Processing Systems 2016; Dec 5–10; Barcelona, Spain. p. 3630–3638.
26. Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning. In: NIPS 2017: Annual Conference on Neural Information Processing Systems 2017; Dec 4–9; Long Beach, CA, USA. p. 4077–4087.
27. Sung F, Yang Y, Zhang L, Xiang T, Torr PHS, Hospedales TM. Learning to compare: relation network for few-shot learning. In: CVPR 2018: IEEE Conference on Computer Vision and Pattern Recognition; 2018 June 18–22; Salt Lake City, UT, USA. p. 1199–1208.
28. Zhang X, Cai F, Hu X, Zheng J, Chen H. A Contrastive learning-based Task Adaptation model for few-shot intent recognition. Inf Process Manage. 2022;59:1–14.
29. Xing L, Shao S, Liu W, Han A, Pan X, Liu B. Learning task-specific discriminative embeddings for few-shot image classification. Neurocomputing. 2022;488:1–13.
30. Iliyasu AS, Abdurrahman UA, Zheng L. Few-Shot Network Intrusion Detection Using Discriminative Representation Learning with Supervised Autoencoder. Appl Sci-Basel. 2022;12:1–17.
31. Xu C, Shen J, Du X. A Method of Few-Shot Network Intrusion Detection Based on Meta-Learning Framework. IEEE Trans Inf Forensics Secur. 2020;15:1540–1552.
32. Wang ZM, Tian J, Qin J, Fang H, Chen LM. A Few-Shot Learning-Based Siamese Capsule Network for Intrusion Detection with Imbalanced Training Data. Comput Intell Neurosci. 2021;2021:1–17. pmid:34557226
33. Yu L, Dong J, Chen L, Li M, Xu B, Li Z, et al. PBCNN: Packet Bytes-based Convolutional Neural Network for Network Intrusion Detection. Comput Netw. 2021;194:108–117.
34. Ye T, Li G, Ahmad I, Zhang C, Lin X, Li J. FLAG: Few-Shot Latent Dirichlet Generative Learning for Semantic-Aware Traffic Detection. IEEE Trans Netw Serv Manag. 2022;19(1):73–88.
35. Wang SZ, Xia CH, Wang TB. Feature Generation: A Novel Intrusion Detection Model Based on Prototypical Network. Lect Notes Artif Int. 2020;11944:564–577.
36. Dainotti A, Pescape A, Claffy KC. Issues and future directions in traffic classification. IEEE Network. 2012;26:35–40.

