Abstract
Artificial Intelligence (AI) is being integrated into ever more domains of everyday activity. While AI offers countless benefits, its convoluted and sometimes opaque internal operations can create difficulties. AI is now heavily employed for cybersecurity evaluations by models that struggle to justify their decisions; this lack of accountability is alarming. Additionally, over the last decade, the sharp rise in malware variants has led scholars to apply Machine Learning (ML) and Deep Learning (DL) approaches to detection. Although these methods yield exceptional accuracy, they are also difficult to understand. Thus, the development of interpretable yet powerful AI models is indispensable to their reliability and trustworthiness. The ambiguous and opaque nature of existing AI-based methods undermines users' trust in the models used for cybersecurity, particularly in light of the increasingly complex and diverse nature of modern cyberattacks.
The present research presents a comparative analysis of an ensemble deep neural network (DNNW) against ensemble techniques such as RUSBoost, Random Forest, Subspace, AdaBoost, and BagTree for prediction on imagery malware data. It identifies the best-performing model, the ensemble DNNW, for which explainability is then provided. There has been relatively little study of explainability, especially for malware imagery data, despite the fact that DL/ML algorithms have revolutionized malware detection. Explainability techniques, namely SHAP, LIME, and Grad-CAM, are employed to present a complete picture of feature significance and of the local and global predictive behavior of the model across malware categories. A comprehensive investigation of significant characteristics and their impact on the model's decision-making process, together with visualizations at multiple query points, are among the contributions. This strategy promotes transparent and trustworthy cybersecurity applications by improving the comprehension of malware detection techniques and integrating explainable AI observations with domain-specific knowledge.
Citation: Nazim S, Alam MM, Rizvi SS, Mustapha JC, Hussain SS, Suud MM (2025) Advancing malware imagery classification with explainable deep learning: A state-of-the-art approach using SHAP, LIME and Grad-CAM. PLoS One 20(5): e0318542. https://doi.org/10.1371/journal.pone.0318542
Editor: Zafi Sherhan Syed, Mehran University of Engineering & Technology, PAKISTAN
Received: August 23, 2024; Accepted: January 17, 2025; Published: May 28, 2025
Copyright: © 2025 Nazim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Dataset is publicly available at this location: https://www.kaggle.com/datasets/gauravpendharkar/blended-malware-image-dataset.
Funding: This study received financial support in the form of an Article Processing Charge from the Universiti Kuala Lumpur, Malaysia.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The integration of AI into our day-to-day lives is becoming more prevalent. According to a quantitative evaluation by Grand View Research, the global AI market was estimated at USD 93.5 billion in 2021, with a projected compound annual growth rate (CAGR) of 38.1% from 2022 to 2030. AI is now exploited considerably not only in the field of cybersecurity but also in several other realms. Mordor Intelligence valued the global cybersecurity industry at $156.24 billion in 2020 and expects it to grow at a yearly rate of 14.5% to $352.25 billion by 2026 [1].
Notably, worldwide cyberattacks increased by 29% in 2021, as stated in the 2021 Cyber Trends Report, even though there were approximately 0.3 billion more internet users globally in 2021 than in the preceding year. An attack on an IT company in June 2022 cut off unemployment compensation and job-search assistance for hundreds of people across several states in the United States, threatening severe social instability during the COVID-19 epidemic. As stated in a paper published by the European Union Agency for Network and Information Security (ENISA) [2], it is estimated that the safety and reliability of digital technology will become even more important under the ethical and social standards newly established by the COVID-19 pandemic [3]. These statistics reveal that cyberspace and its associated systems and machines face ever more hackers and intruders these days [1].
Thus, to ensure the confidentiality, availability, and authenticity of data exchanged over the World Wide Web, trustworthy and safe cybersecurity technology needs to be developed. The constantly increasing content distributed via cyberspace poses challenges for conventional signature-based and rule-based cybersecurity solutions. Meanwhile, cyber attackers persistently devise distinctive, manipulative, and sophisticated attack tactics and utilize progressive technology, such as artificial intelligence, to stay a step ahead of law enforcement and augment the complexity and efficacy of their malicious behavior [4]. Accordingly, to boost the efficacy of cyber protective frameworks, cybersecurity investigators are now concentrating on artificial intelligence-based techniques, specifically ML and DL, in addition to more typical (non-AI) safety measures like game theory, rate control, and autonomous systems [3].
The implementation of AI techniques, particularly ML and DL algorithms, has shown promise in a variety of cybersecurity applications, such as intrusion detection, spam e-mail filtering, botnet detection, fraud detection, and malicious application identification. Nonetheless, these methods are not without deficiencies, and they are more costly than conventional cybersecurity measures [5]. Moreover, some cybersecurity technologists have made their frameworks more intricate and difficult to comprehend in order to attain better precision at the cost of interpretability. The European Union's General Data Protection Regulation has exposed this lack of comprehensibility by preserving the right to understand the rationale underlying an AI algorithmic decision that adversely affects people [6].
AI algorithms suffer from opacity, which makes it difficult to comprehend their underlying operational mechanisms, even when their outcomes and predictions seem effective. In an area like cybersecurity, this attribute creates especially challenging situations, since it poses obvious hazards to delegating significant decision-making responsibility to a technique incapable of self-justification [7]. Considering this state, XAI endorses progressing toward more interpretable AI to overcome this issue. XAI methods intend to develop approaches that both deliver outstanding accuracy and yield more interpretable frameworks. Hence, to trust the evaluations made by cybersecurity frameworks, AI must be understandable and explainable. To fulfill these requirements, several approaches have been offered to make AI decisions more comprehensible to humans; these explainable techniques are usually abbreviated as "XAI" [1].
Challenges, motivation and contribution
A broad range of AI-powered systems can be utilized to identify malware with varying degrees of explainability; yet many of these approaches were designed for non-imagery data and consequently do not effectively tackle the specific challenges inherent in malware imagery analysis. Malware imagery data demands sophisticated approaches capable of delivering both reliable classification and comprehensibility, given its diversified behavioral characteristics across multiple malware families. The currently available explainability approaches provide certain information but have constraints in their utilization; they generally offer only limited explanations that might not fully capture the diverse nature of malware imagery. Furthermore, a diverse approach is required to facilitate more complex and multidimensional explainability, since distinct types of malware vary dramatically in appearance and behavior. This involves developing solutions competent to present layered interpretations of various malware attributes, in addition to performing an extensive evaluation of malware images, confirming that both local and global interpretations are offered. Bridging this gap is critical for constructing accurate frameworks that experts can use in high-stakes cybersecurity scenarios.
The motivation behind this research is to accommodate the rising demand for robust and transparent AI-based solutions in malware imagery analysis. Since cyberattacks are becoming increasingly prevalent and advanced, it is essential that researchers understand the rationale behind the underlying malware classification decisions in order to formulate substantial protections. The ultimate goal of the present investigation is to address the explainability and reliability gap by employing interpretable approaches that can boost model transparency without compromising accuracy. The present study aims to provide comprehensive insights using SHAP, LIME, and Grad-CAM so that cybersecurity practitioners can fully understand model behavior, develop reliance on automated solutions, and make wise decisions in digital security scenarios.
The current research proposes an innovative approach to malware classification that outperforms previous approaches by integrating an ensemble deep neural network with a blended malware dataset. This approach incorporates the MalImg and MaleVis datasets to develop a wide-ranging, diversified dataset that improves model robustness, in contrast to prior investigations that typically involve single datasets or basic neural networks. The technique is particularly advantageous for managing the complications of malware detection with higher efficiency and comprehensibility, since the ensemble deep neural network framework leverages multiple algorithms to enhance classification accuracy. It leverages SHAP, LIME, and Grad-CAM for comprehensibility, pairing explainability with the best-performing predictive framework. This research strengthens global and local system observations by incorporating these comprehensibility approaches, delivering a deeper understanding of the decision-making processes involved in malware identification and classification. The methodology fills the highlighted research gap by demonstrating the distinctive insights that each explainability methodology contributes, in addition to establishing that integrating numerous explainability techniques is feasible. The research moves this area toward more secure and reliable digital security by paving a road map for the development of DL approaches in malware analysis that are more transparent, interpretable, and trustworthy.
Novelty of study
The following key aspects are utilized to highlight the novelty of this study:
- The key aim is to comprehensively address the explainability gap in malware classification techniques, focusing specifically on malware imagery data.
- The best-performing framework for explainability is identified by comparing ensemble DNNW with different ensemble strategies, including RUSBoost, Random Forest, Subspace, AdaBoost, and BagTree.
- The XAI application implements SHAP, LIME and Grad-CAM to give a deep neural network model a variety of explainability viewpoints.
- Interpretability is boosted by an extensive, visually supported analysis of feature importance spanning several query points.
- Promotes the alignment of XAI findings with domain knowledge, producing more transparent and trusted malware detection solutions.
Background
A systematic categorization of the different strategies and methodologies that can be employed to explain an AI model has developed gradually. The key point to note is the difference between explainability and interpretability. Interpretability is about determining the relationship between causes and effects inside an AI system, whereas explainability goes beyond interpretability by delivering a human-understandable account of the methodology and justification behind the model's prediction-making process [7]. Fig 1 presents the existing taxonomy, distinguishing genuinely interpretable models from posthoc explanations (supplementary techniques that shed interpretability on state-of-the-art black-box frameworks). These techniques entail producing local explanations for specific inputs or a global understanding of the model's behavior [1]. Here is a brief synopsis:
Interpretability/explainability: If the performance of the model is considered sufficient during the evaluation phase, the subsequent step is to comprehend or justify the classification findings to the point that the classification is understandable and explicable. Different models have been exploited to imitate real-world data; however, the explanation of potentially critical data depends on the input. The forecasting engine examines the findings using features, graphs, and images. There are two potential ways to produce these justifications: either local rationales generated during the testing stage or global rationales that adjust the parameters of the forecasting engine during the training phase [1, 8]. The following section provides more detail on these various categories of explanations.
XAI categorization
The XAI frameworks are classified based on interpretability along the following dimensions.
- Model-Specific or Model-Agnostic: This distinction concerns whether the inference approach is constrained to a specific framework. Strategies and techniques exclusive to a particular framework are known as model-specific. Model-agnostic techniques, by contrast, can make any ML method interpretable; these techniques cannot view internal model data, such as weights or structural parameters [4].
- Intrinsic and Extrinsic (posthoc): These terms capture whether a model can be explained on its own or whether techniques that investigate frameworks after training are indispensable for interpretability. Straightforward, transparent frameworks such as decision trees are intrinsic; interpretability attained indirectly by applying an interpretation technique after training is extrinsic [1].
- Local or Global Interpretation: The scope of the explanation method, whether it relates to a single data record or to the behavior of the whole model, determines whether it is categorized as local or global [9]. Global techniques and tools explain the entire model, whereas local approaches only interpret a single prediction [6].
Global interpretation: It inspects the framework systematically, highlighting its universal properties, structures, and illustrations. This type of understanding provides insight into the dynamics of the complete model beyond any single data point. It evaluates the degree to which model weights and biases contribute to predictions. In a nutshell, it exhibits the distribution of the estimated results with respect to the attributes. For example, global interpretation may involve evaluating the influence of different layers and neurons on the forecasting process of a neural network trained to classify malware. For global interpretation, techniques like feature significance scores are commonly utilized, which measure the contribution of each feature to the final forecast [6]. However, as the number of hyperparameters increases, global interpretability becomes harder to achieve, specifically when working with feature domains of more than three dimensions. For example, consider a DL model trained to classify malware using a variety of static and dynamic variables [10, 11]. The total significance of features like PE headers, API calls, and network traffic patterns can be investigated using a global interpretation technique; feature importance scores might then demonstrate that API calls have a greater influence on predictions than PE headers [4, 8].
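As a concrete illustration of global, model-agnostic feature significance scores, the short sketch below computes permutation importance: each feature column is shuffled in turn and the resulting drop in accuracy is recorded. The toy classifier and data are hypothetical stand-ins, not the malware model discussed above:

```python
import numpy as np

def permutation_importance(model, X, y, seed=0):
    """Shuffle one feature column at a time and measure the accuracy drop;
    larger drops indicate globally more important features."""
    rng = np.random.default_rng(seed)
    base = (model(X) == y).mean()          # accuracy on intact data
    drops = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        rng.shuffle(Xp[:, j])              # destroy feature j's information
        drops.append(base - (model(Xp) == y).mean())
    return np.array(drops)

# Toy classifier whose label depends only on feature 0 (illustrative only)
model = lambda X: (X[:, 0] > 0.5).astype(int)
rng = np.random.default_rng(1)
X = rng.random((200, 3))
y = (X[:, 0] > 0.5).astype(int)
imp = permutation_importance(model, X, y)
```

Shuffling the decisive feature sharply reduces accuracy while shuffling the irrelevant columns leaves it untouched, mirroring how a global score might rank API calls above PE headers.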
Local interpretation: The key objective of local interpretation is understanding the model's prediction for a single instance. This method helps describe the justification behind an individual prediction. For example, in malware classification and detection, local interpretation could identify the specific features of a malware sample (such as particular API calls or byte sequences) that the model relies on to flag it as malicious. Shapley Additive Explanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) are the methods of choice for local interpretation. These methods construct explanations that expose the most influential features in the prediction [8]. For example, SHAP values compute the impact of each feature based on cooperative game theory, while LIME perturbs the input features and observes changes in the model's outcome to find key features. Consider a malware sample classified as malicious by the framework. A local explanation produced by SHAP analysis highlights the features with the greatest impact on the model's decision: for example, it can explain why unusual API calls and particular byte sequences were crucial in recognizing the sample as malicious software [12].
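To make the local attribution idea concrete, the sketch below computes exact Shapley values for a tiny, hypothetical malware-scoring function with three features; the scoring function and feature names are illustrative assumptions, not the paper's model:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values over a small feature set: average each feature's
    marginal contribution across all coalitions of the other features."""
    features = list(x)
    n = len(features)
    phi = {}
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # instance values for features in the coalition, baseline elsewhere
                with_i = {j: x[j] if (j in S or j == i) else baseline[j] for j in features}
                without_i = {j: x[j] if j in S else baseline[j] for j in features}
                total += weight * (f(with_i) - f(without_i))
        phi[i] = total
    return phi

# Hypothetical linear malware score over three illustrative features
score = lambda z: 0.6 * z["api_calls"] + 0.3 * z["entropy"] + 0.1 * z["packed"]
x = {"api_calls": 1.0, "entropy": 1.0, "packed": 0.0}
base = {"api_calls": 0.0, "entropy": 0.0, "packed": 0.0}
phi = shapley_values(score, x, base)
# Local accuracy: the attributions sum to score(x) - score(base)
```

For a linear score, each Shapley value reduces to the coefficient times the feature's deviation from the baseline, which makes the sketch easy to check by hand.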
XAI frameworks and methods
Explainable AI approaches can be categorized into different groups based on the types of justifications they deliver (feature attribution, rules, or counterfactuals):
- Local Interpretable Model-agnostic Explanations (LIME): The decisions made by a DL model can be explained by employing LIME, a model-agnostic explanation technique. It operates in two steps: initially, it randomly perturbs an input instance to generate many samples. The subsequent step involves passing the samples to the black-box model, whose predictions are then used to train an intermediary framework, typically a decision tree or linear regression, that is inherently understandable. The final justifications for the input are derived from the parameters of this interpretable model [12].
- Shapley Additive Explanation (SHAP): utilizes a game-theoretic methodology to render justifications. It uses Shapley values for feature attribution, although in practice, like LIME, it relies on approximation. The Shapley value characterizes the average marginal impact of a feature across all potential orderings or feature combinations used to make a prediction. Shapley values satisfy the properties of local accuracy and consistency, which are fundamental traits for feature attribution [1].
- Gradient: The association between input attributes and output is rationalized by the gradient approach. It leverages the information movement in a neural network during backpropagation. It computes the gradient of the prediction performed by the model concerning the input [6, 13].
- Integrated Gradient: The integrated gradients technique starts from the gradient of the output class score with respect to the input. However, in some situations important features receive gradients of minimal magnitude because deep neural networks saturate during training. The integrated gradients technique handles this by not relying on the local gradient alone: it aggregates gradients along a linear path from a baseline point to the specific test instance, where the baseline is chosen based on the use case. By accumulating these gradients, the method estimates the share each input contributes to the outcome [14, 15].
- DL Important Features (DeepLIFT): This method calculates feature attribution scores by comparing the activation of each neuron to a reference activation. Similar to integrated gradients, this technique determines the appropriate reference based on the particular use case, and identifying a suitable reference is essential for the accurate computation of feature attribution [1].
- Occlusion: This is a perturbation-based local explanation approach that successively substitutes input features with a fixed value, commonly zero, to calculate feature significance. The relevance of a feature to the trained model is established by the degree of accuracy reduction observed when the feature is substituted. Applied to a machine-learning framework, each of the explanation methods described above outputs the features it finds most significant, according to its fundamental principle of operation. These feature assessments are indispensable for understanding the learning capacity and reliability of the framework [1].
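The occlusion idea above can be implemented in a few lines. The sketch below slides a zero-filled patch over an image and records the drop in the model's score; the one-quadrant "model" is a deliberately trivial stand-in for a real malware-image classifier:

```python
import numpy as np

def occlusion_map(model, image, patch=4, fill=0.0):
    """Replace each patch with a fixed value and record the score drop;
    the larger the drop, the more the region mattered to the model."""
    base = model(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill
            heat[i // patch, j // patch] = base - model(occluded)
    return heat

# Trivial stand-in model: score is the mean intensity of the top-left quadrant
model = lambda img: img[:8, :8].mean()
img = np.ones((16, 16))
heat = occlusion_map(model, img, patch=8)
# Only occluding the top-left patch changes the score
```

The resulting heat map plays the same role as the occlusion-based saliency described above: regions whose removal costs the most score are the ones the model relies on.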
Research problem
The evolution of malware poses a serious hazard to cybersecurity, since new variants are continuously emerging. To address this risk effectively, malware must be precisely classified into families [16]. Such classification accelerates a thorough understanding of the behavioral dynamics of threats and facilitates the advancement of powerful defensive techniques. Both victims and security professionals remain in the dark about the underlying designs and patterns that explain why a malware sample belongs to a particular family, which is why traditional malware detection techniques often prove unsatisfactory in this context. Hence, this study targets the key challenges and issues, specifically the limited explainability of existing AI-based malware detection methods, especially for imagery data.
Literature review and research gap analysis
Much advancement has been achieved in malware identification, particularly due to the advent of ML and DL approaches. However, a shortage of investigations persists on how interpretability could be integrated into these models [17, 18]. A significant amount of present work focuses on standard, non-imagery malware data, achieving excellent identification and categorization accuracy by leveraging features retrieved through binary or textual analysis. These endeavors have led to powerful classifiers, but much less attention has been paid to how interpretable these frameworks are, specifically when addressing malware imagery. The comprehensibility of malware imagery data is significantly deficient, as Table 1 demonstrates, since the majority of currently available explainability research is based on non-imagery data. Although ML/DL methodologies have been employed in studies like [19–21] for image-based malware detection and classification, they are not fully understandable. In comparison with [20, 21], which perform image-based malware classification without any explainability, [19] uses Grad-CAM, which provides only minimal comprehensibility. There is thus preliminary work on malware imagery investigation, but it lacks adequate explainability, revealing a major research gap; this is highlighted by the inclusion of research articles [20, 21] in the present study. Bridging this gap is critical to establishing more transparent and trusted malware detection systems in progressively digital environments. The present investigation bridges this gap by presenting the first comprehensive study of malware image datasets that combines interpretability techniques like LIME, SHAP, and Grad-CAM with a high classification rate. This article renders a novel contribution to the discipline by addressing both the explainability and the performance of these models, laying the ground for more explainable and transparent malware classification methods [22, 23].
Dataset and experimental analysis
This research leverages a blended malware dataset to provide robust malware classification as well as the interpretability and explainability of the predictive framework. The study presents a comprehensive collection of malware images as visual representations of binary files by integrating the MalImg and MaleVis datasets, which renders it feasible to visually evaluate and categorize them using deep learning techniques [31]. The amalgamated dataset serves as a strong basis for training ensemble deep neural networks and other ensemble methods for obtaining the best classification accuracy since it encompasses both balanced and imbalanced class populations. However, the understanding and comprehensibility of the algorithm’s decision-making process are among the primary areas to investigate for this research. The most promising predictions from the deep ensemble framework are thus comprehended using three explainable AI techniques: Grad-CAM, LIME, and SHAP.
Performance evaluation
The blended malware dataset has been categorized by employing ensemble models, including an ensemble deep neural network [33], RUSBoost, Random Forest, Subspace, AdaBoostM2, and Bagging as baseline classifiers. These frameworks were chosen because they can enhance predictive accuracy by combining several weak learners into a more robust framework, which is very beneficial when addressing the intricate and unpredictable nature of malware data. Such ensemble techniques perform effectively with a dataset like this because they can substantially improve accuracy by identifying patterns in the data and managing imbalances. Establishing these robust baselines before applying explainability approaches such as SHAP, LIME, and Grad-CAM is fundamental to guarantee that the interpreted models are reliable and effective in their predictions.
Table 2 presents the performance evaluation of several ensemble models to determine how well they perform when applied to the same dataset. The Ensemble Deep Neural Network [32] obtains the highest accuracy of 96.06%, demonstrating the effectiveness of DL models in classifying intricate patterns in malware data. RUSBoost and BagTree also demonstrate robust performance, with accuracy rates of 94.23% and 93.45%, respectively; these strategies improve generalization and address class imbalance. However, the Subspace technique shows a substantially lower accuracy of 73.50%, indicating that it may not be well suited to this specific dataset. The average performance of AdaBoostM2, with an accuracy of 87.42%, suggests that despite being adaptable, it may not detect every detail of the data as efficiently as other techniques. The accuracy of Random Forest is 93.35%, closely matching the performance of RUSBoost and highlighting its strength as an ensemble method. The referenced prior work displays a comparable efficiency level, with a precision of 95%, which suggests that the proposed ensemble deep neural network offers an advancement over conventional approaches.
The experimental findings of an ensemble deep neural network with a learning rate of 0.003 across 500 epochs are shown in Figs 2 and 3. They indicate that the network effectively captures the core attributes of the data, as reflected in its excellent accuracy and well-balanced performance metrics. Furthermore, the confusion matrices for the different ensemble models, RUSBoost (Fig 4), Random Forest (Fig 5), Subspace (Fig 6), AdaBoostM2 (Fig 7), and Bagging (Fig 8), explain the variations in classification efficacy among these algorithms. RUSBoost, using 3000 learning cycles, demonstrates good classification efficiency, effectively handling class imbalance and limiting misclassification errors (Fig 4).
The Random Forest algorithm (Fig 5), which leverages 3000 trees, demonstrates excellent accuracy and reliability in detecting a range of trends in the data.
Nevertheless, the Subspace model (Fig 6) exhibits limitations in this specific use case, with an elevated number of false positives and false negatives and relatively lower performance. After more than 3000 training iterations, AdaBoostM2 (Fig 7) attains an appropriate balance between efficiency and accuracy, beating Subspace but underperforming relative to RUSBoost and Random Forest. Bagging, with 3000 trees (Fig 8), demonstrates consistent performance across classes and performs similarly to Random Forest, emphasizing the strength and trustworthiness of ensemble algorithms that aggregate numerous weak learners. Overall, these figures show both the strengths and the limitations of each algorithm.
Explainability analysis and statistical intuitions
The SHAP, LIME, and Grad-CAM visualizations together perform a comprehensive explainability analysis, offering statistical insight into feature importance and how features impact the framework's predictions. The SHAP bar graphs of the conditional and interventional algorithms demonstrate the context-dependent and context-independent contributions of individual features by presenting Mean Absolute SHAP values across different malware categories, thereby evaluating feature importance. This is augmented by LIME explainability, which provides instance-specific understanding that underlines the local arrangement of feature impact by generating feature, score, and feature importance maps. Grad-CAM illustrations provide spatial transparency in the decision-making process by emphasizing the image areas most significant to the framework's prediction. Collectively, these approaches present a powerful statistical analytic framework under the explainability umbrella, granting an in-depth understanding of feature contributions and encouraging transparent and explainable malware classification.
SHAP (SHapley Additive exPlanations) analysis
A common approach for interpreting DL/ML model predictions is assigning statistical importance to distinct features via SHAP (SHapley Additive exPlanations) values. SHAP scores, grounded in cooperative game theory, quantify the relevance of each feature to a forecast, facilitating the analysis of intricate models. Grasping which features impact model decisions across samples or categories is especially important for critical applications like malware detection, and this is where SHAP explanations are beneficial.
SHAP analysis is performed through two algorithms: interventional and conditional. The figures, created by the interventional approach (Fig 11) and the conditional technique (Fig 9), evaluate the Mean Absolute SHAP (SHapley Additive exPlanations) values across various malware types. These two graphs represent a global view of feature significance for each malware category across different scenarios by integrating data over 25 query points. Rather than concentrating on a single instance, this method emphasizes the global behavior of the model once exposed to a large dataset.
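The aggregation behind such bar charts is straightforward to sketch: per-instance SHAP attributions are stacked over the query points, and the mean absolute value per feature (optionally grouped by class) gives the bar heights. The numbers and labels below are synthetic placeholders, not values from the study:

```python
import numpy as np

# Synthetic per-instance SHAP attributions: 25 query points x 4 features,
# with made-up class labels standing in for malware categories
rng = np.random.default_rng(0)
shap_vals = rng.normal(size=(25, 4))
labels = rng.integers(0, 3, size=25)

# Global bar heights: mean |SHAP| per feature, overall and per class
mean_abs_global = np.abs(shap_vals).mean(axis=0)
mean_abs_per_class = {c: np.abs(shap_vals[labels == c]).mean(axis=0)
                      for c in np.unique(labels)}
ranking = np.argsort(mean_abs_global)[::-1]  # most influential feature first
```

With real attributions, `mean_abs_per_class` corresponds to one bar group per malware category in the figures, and `ranking` gives the order in which features would be listed.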
The comprehensibility techniques are contrasted in Figs 10 and 12. The conditional SHAP values in Fig 10 respect conditional dependencies across the dataset while highlighting feature significance. This approach reveals subtle dependencies that may define malware patterns of action by exposing instances in which specific traits assist in identifying malware categories under typical inter-feature relationships. Fig 12, showing interventional SHAP values, takes a distinct approach by examining the effect of each feature independently of these associations. By separating the effect of every attribute from the others, the interventional technique presents a more straightforward, unconditional evaluation of feature significance that is less affected by feature relationships.
The consistent feature relevance across the conditional and interventional SHAP values in Figs 10 and 12 shows that these features are significant markers for particular malware classes, irrespective of contextual associations. However, fluctuations in feature scores between the two figures reveal that the model's reliance on specific features can vary depending on whether conditional dependencies are taken into consideration. This demonstrates the value of both techniques: interventional SHAP delivers knowledge of explicit feature contributions while ignoring contextual dependencies, whereas conditional SHAP is especially beneficial for interpreting complicated, dependent feature associations.
LIME explainability
LIME (Local Interpretable Model-agnostic Explanations) is a model-agnostic method that approximates the model locally with an interpretable surrogate, such as linear regression, to explain individual predictions. The procedure entails generating perturbations of the input data, obtaining predictions for the perturbed data, and then fitting a simple surrogate model to determine which features substantially drive the prediction. Data perturbation, prediction, fitting an interpretable model, and feature importance ranking are the steps of the LIME process. It provides detailed knowledge at the feature level, highlighting the specific components that influence the framework's prediction.
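The steps above (perturb, predict, proximity-weight, fit a surrogate) can be sketched as follows. This is a simplified tabular illustration rather than the image-based pipeline used in the study, and the black-box model `f`, the kernel width, and the perturbation scale are all hypothetical choices:

```python
import numpy as np

def lime_explain(predict, x, n_samples=500, kernel_width=0.75, seed=0):
    """Minimal LIME-style sketch: perturb x, query the black-box model,
    weight samples by proximity to x, and fit a weighted linear surrogate
    whose coefficients rank local feature importance."""
    rng = np.random.default_rng(seed)
    n = len(x)
    Z = x + rng.normal(scale=0.5, size=(n_samples, n))  # perturbed inputs
    y = np.array([predict(z) for z in Z])               # black-box outputs
    d = np.linalg.norm(Z - x, axis=1)                   # distance to query
    w = np.exp(-(d ** 2) / kernel_width ** 2)           # proximity kernel

    # Weighted least squares: solve (sqrt(w) X) beta = sqrt(w) y.
    X = np.hstack([np.ones((n_samples, 1)), Z])         # add intercept column
    sw = np.sqrt(w)[:, None]
    beta, *_ = np.linalg.lstsq(X * sw, y * sw.ravel(), rcond=None)
    return beta[1:]  # per-feature local importance (intercept dropped)

# Black-box with one dominant feature; the surrogate should recover
# the local coefficients, ranking feature 0 far above feature 1.
f = lambda z: 5.0 * z[0] + 0.1 * z[1]
coefs = lime_explain(f, np.array([1.0, 1.0]))
```

Because the toy black-box here is itself linear, the weighted surrogate recovers its coefficients almost exactly; for a real nonlinear classifier the coefficients describe only the local behavior around the query point, which is precisely what LIME's feature importance ranking reports.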
Figs 13, 14 and 15 show the feature map, score map, and feature importance map, respectively, as produced by the LIME approach. The feature map (Fig 13) projects the image features that enable the framework to formulate predictions. The score map (Fig 14) emphasizes the areas that contribute the most to the classification score.
The feature importance map (Fig 15) ranks the features according to their significance in the prediction process. Together, these illustrations deliver an extensive understanding of the image segments and features that the model uses for classification, which are listed in Table 3. These prominent features reveal important details about how malware may attempt to evade tracking, maintain persistence, or interface with other platforms.
Grad-CAM explainability
The Gradient-weighted Class Activation Mapping (Grad-CAM) technique computes the gradient of the class score with respect to the feature maps of a convolutional layer. This reveals the regions of an input image that play a key role in the classification decision, producing a heat map that emphasizes significant areas of the image. The technique involves choosing a convolutional layer, computing gradients, applying global average pooling to obtain channel weights, weighting the feature maps accordingly, and applying the Rectified Linear Unit (ReLU) to produce the heatmap. Grad-CAM identifies the model's areas of attention, offering awareness of features such as headers or code patterns that are essential for classification. This improves confidence in the model by confirming its alignment with domain knowledge. Fig 17 exhibits the Grad-CAM explanations of the four sample original images from the Blended Malware Dataset depicted in Fig 16. The Grad-CAM illustrations superimpose heatmaps onto the original images, demonstrating the areas the model focuses on when formulating its classification decision. This helps visually verify that the model's attention aligns with known malware behaviors and structures.
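Assuming the activations and class-score gradients of the chosen convolutional layer are already available, the pooling, weighting, and ReLU steps described above reduce to a few array operations. The following is an illustrative sketch with synthetic data, not the study's implementation; the layer shape and inputs are hypothetical:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Minimal Grad-CAM sketch: global-average-pool the gradients to get
    per-channel weights, form the weighted sum of the feature maps, and
    apply ReLU so only regions with a positive class influence remain.
    Both inputs have shape (channels, H, W)."""
    weights = gradients.mean(axis=(1, 2))              # GAP over H and W
    cam = np.tensordot(weights, feature_maps, axes=1)  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                         # ReLU
    if cam.max() > 0:
        cam /= cam.max()                               # normalize to [0, 1]
    return cam

# Hypothetical activations and class-score gradients for a 3-channel layer.
rng = np.random.default_rng(1)
fmaps = rng.random((3, 8, 8))
grads = rng.normal(size=(3, 8, 8))
heatmap = grad_cam(fmaps, grads)  # (8, 8) map ready to overlay on the image
```

Normalizing the map to [0, 1] is a presentation choice that makes it easy to resize and superimpose on the original image as a heatmap, as done in Fig 17.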
Grad-CAM and LIME are complementary interpretability techniques. Grad-CAM highlights significant regions in the input image, offering a high-level, visual justification that is beneficial for visual examination and for understanding the model's graphical focus. LIME, on the other hand, delivers a detailed, feature-level explanation by identifying specific elements and their importance for the prediction. This exhaustive exploration assists in understanding the influence of local features on the model's prediction process. By integrating these approaches, DL models for malware detection become more explainable and trustworthy, enhancing the efficacy and understandability of cybersecurity solutions.
Conclusion and future work
This study explores ensemble models, including an ensemble deep neural network (DNNW), in terms of both explainability and efficacy on malware imagery datasets. The deep neural network, trained with a learning rate of 0.003 over 500 epochs, established its dominance in feature extraction and classification by obtaining outstanding accuracy. RUSBoost and Bagging, two of the ensemble techniques, proved relatively effective at managing class imbalance while offering reliable classification results. In contrast, the Subspace algorithm is less efficient, suggesting that it may not be adequate for this kind of data. The results of AdaBoostM2 are mediocre, falling between the better and the less effective models.
Explainability assessments are conducted using illustrations such as the LIME feature map, score map, and feature importance map, and Grad-CAM graphical representations. The score map offered details about the model's confidence levels across different categories, while the feature map highlighted the neural network's ability to isolate relevant features. Grad-CAM representations highlighted the regions in the sample images that contributed the most to the model's predictions, and the feature importance map scored the significance of different characteristics in the model's decision-making process. SHAP analysis is performed with interventional and conditional algorithms to provide a global perspective on the model's behavior, highlighting which features consistently influence predictions across samples.
Integrating performance metrics with explainability techniques underscores the importance of both reliability and comprehensibility in model evaluation. The findings indicate that while RUSBoost and Bagging are fairly effective models for this dataset, the deep neural network performs exceptionally well in terms of accuracy and also offers a clear view of its decision-making process through graphical explainability. Its excellent efficiency and transparency combine to make it an effective malware detection tool. Future work may focus on improving these algorithms and investigating alternative approaches for augmenting the accuracy and explainability of models employed in malware classification.
Computing and data resources
The experimental results were generated on a 7th-generation Core i5 machine with 16 GB of RAM. The evaluation is performed using Matlab 2022a with its Deep Learning Toolbox in a Windows 11 environment. Dataset URL: https://www.kaggle.com/datasets/gauravpendharkar/blended-malware-image-dataset