Fig 1.
Workflow of the proposed malware classification method based on FSGCN.
The workflow proceeds in seven steps: handling imbalanced datasets with the SMOTE algorithm, constructing directed graphs from API sequences, extracting multi-dimensional attribute features, computing first-order and second-order adjacency matrices, generating feature embeddings with GCN, converting these embeddings into grayscale images, and finally classifying the images with a CNN.
Fig 2.
An example of computing the first-order and second-order adjacency matrices for a graph.
A: Example of first-order vertex computation. B: Example of second-order in-degree computation. C: Example of second-order out-degree computation.
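The computation that Fig 2 illustrates can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the captions do not give the formulas, so the second-order matrices below follow one definition commonly used for directed graph convolution (two vertices are second-order in-degree neighbours if they share a predecessor, and second-order out-degree neighbours if they share a successor, with each shared vertex weighted by the inverse of its degree). The function names and the example API sequence are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def first_order_adjacency(api_sequence: List[str]) -> Tuple[List[str], Matrix]:
    """Directed adjacency: A[i][j] = 1 if API i is immediately followed by API j."""
    nodes = sorted(set(api_sequence))
    index = {name: k for k, name in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for src, dst in zip(api_sequence, api_sequence[1:]):
        adj[index[src]][index[dst]] = 1.0
    return nodes, adj


def second_order_in(adj: Matrix) -> Matrix:
    """ASSUMED definition: S[i][j] = sum_k A[k][i] * A[k][j] / out_degree(k)."""
    n = len(adj)
    out_deg = [sum(row) for row in adj]
    result = [[0.0] * n for _ in range(n)]
    for k in range(n):
        if out_deg[k] == 0:
            continue  # vertex k has no successors, so it links nothing
        for i in range(n):
            for j in range(n):
                result[i][j] += adj[k][i] * adj[k][j] / out_deg[k]
    return result


def second_order_out(adj: Matrix) -> Matrix:
    """ASSUMED definition: S[i][j] = sum_k A[i][k] * A[j][k] / in_degree(k)."""
    n = len(adj)
    in_deg = [sum(adj[r][k] for r in range(n)) for k in range(n)]
    result = [[0.0] * n for _ in range(n)]
    for k in range(n):
        if in_deg[k] == 0:
            continue  # vertex k has no predecessors, so it links nothing
        for i in range(n):
            for j in range(n):
                result[i][j] += adj[i][k] * adj[j][k] / in_deg[k]
    return result
```

For the hypothetical sequence `["open", "read", "close", "open", "write", "close"]`, `read` and `write` share the predecessor `open` and the successor `close`, so they are linked in both second-order matrices even though no first-order edge connects them.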
Table 1.
Descriptions of the API node attributes.
Fig 3.
The overall process of the first-order and second-order graph convolutional networks.
Fig 4.
The distribution of samples in Dataset 1 and Dataset 2. (A) Dataset 1. (B) Dataset 2.
Table 2.
Dataset overview.
Fig 5.
Examples of normalized grayscale images obtained by transforming the graph embedding representations of samples from three different types in Dataset 1.
(a), (b), and (c) belong to the Normal type, while (d), (e), and (f) belong to the Ransom type, and (g), (h), and (i) belong to the Miner type.
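The embedding-to-image step behind Fig 5 can be illustrated with a short sketch. It assumes min-max scaling to the 0 to 255 range and zero-padding of the last row; the captions do not state the normalization or image size actually used, so `embedding_to_grayscale` and its `width` parameter are hypothetical.

```python
import math
from typing import List


def embedding_to_grayscale(embedding: List[float], width: int) -> List[List[int]]:
    """Min-max scale a flat embedding to 0..255 and reshape it into rows of `width` pixels."""
    if not embedding or width <= 0:
        raise ValueError("embedding must be non-empty and width must be positive")
    lo, hi = min(embedding), max(embedding)
    span = hi - lo
    if span == 0:
        # A constant embedding carries no contrast; map it to black (assumption).
        pixels = [0] * len(embedding)
    else:
        pixels = [round(255 * (value - lo) / span) for value in embedding]
    # Zero-pad so the last row is complete (assumption).
    height = math.ceil(len(pixels) / width)
    pixels += [0] * (height * width - len(pixels))
    return [pixels[row * width:(row + 1) * width] for row in range(height)]
```

With this scaling, the smallest embedding value maps to black and the largest to white, so images from different samples are comparable in contrast but not in absolute magnitude.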
Table 3.
Performance compared with traditional malware classification methods on two datasets.
Table 4.
Performance compared with graph-convolution-based malware classification algorithms.
Fig 6.
The loss curves of GCN, GraphSAGE, GAT, and our model.
(a): The loss curves of GCN. (b): The loss curves of GraphSAGE. (c): The loss curves of GAT. (d): The loss curves of our work.
Table 5.
Performance comparison of our work with state-of-the-art methods.
Table 6.
Ablation analysis of the dimensionality of node attribute features in our method across two datasets.
Fig 7.
Precision-Recall (PR) performance of our work with multi-dimensional features on Dataset 2.
(a): The Precision-Recall curves of the 1-dimensional feature. (b): The Precision-Recall curves of the 3-dimensional feature. (c): The Precision-Recall curves of the 5-dimensional feature. (d): The Precision-Recall curves of the 7-dimensional feature.
Fig 8.
Visualization of the t-SNE feature space for dimensions 1, 3, 5, and 7 of our model on Dataset 2.
(a): The t-SNE view of the 1-dimensional feature. (b): The t-SNE view of the 3-dimensional feature. (c): The t-SNE view of the 5-dimensional feature. (d): The t-SNE view of the 7-dimensional feature.
Fig 9.
Confusion matrix performance of multi-dimensional features on Dataset 2.
(a): Confusion matrix performance of the 1-dimensional feature. (b): Confusion matrix performance of the 3-dimensional feature. (c): Confusion matrix performance of the 5-dimensional feature. (d): Confusion matrix performance of the 7-dimensional feature.
Table 7.
Ablation analysis of our method on Dataset 2 with and without SMOTE.
Fig 10.
Comparison of loss curves for Dataset 2 with 5-dimensional features: without SMOTE vs. with SMOTE.