Abstract
Currently, deep learning models are widely used in many classification applications, but their utilization is limited by several factors. Large models can classify a wide range of labels, but they cannot be deployed on small devices. Small models can be deployed on small devices, but the number of labels they support is limited. To solve these problems, this paper proposes a classification method based on the Fusion of Multi-level Deep Learning Models (FM-DLM). We apply the Baidu-AI platform as the Level 0 model to classify a wide range of samples. Then, we use the differences among the Level 1 models to predict the dataset. Next, we use the Level 2 models trained on the predicted dataset to classify the label. Finally, we use the label distribution to achieve higher accuracy. The experimental results show that our method achieves higher accuracy than the existing methods while ensuring a wide range of classification.
Citation: Jin G, Li H, Du H, Song Q (2026) FM-DLM: A new method for image classification based on the fusion of multi-level deep learning models. PLoS One 21(1): e0338137. https://doi.org/10.1371/journal.pone.0338137
Editor: Claudionor Ribeiro da Silva, Universidade Federal de Uberlandia, BRAZIL
Received: March 12, 2025; Accepted: November 18, 2025; Published: January 27, 2026
Copyright: © 2026 Jin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Reference [23]: https://paperswithcode.com/sota/image-classification-on-cifar-10; reference [24]: https://paperswithcode.com/sota/image-classification-on-cifar-100; reference [25]: https://paperswithcode.com/dataset/mini-imagenet; reference [26]: https://paperswithcode.com/dataset/eurosat; reference [27]: https://www.kaggle.com/datasets/puneet6060/intel-image-classification; references [28,29]: https://www.kaggle.com/datasets/hojjatk/mnist-dataset. Code is shared at the following link: https://zenodo.org/records/16777230.
Funding: The fund of Beijing Polytechnic (2023X005-KXD). The fund of Beijing Polytechnic played a role in study design and data collection.
Competing interests: The authors have declared that no competing interests exist.
Introduction
With the development of deep learning technology, various models are constantly being designed [1,2]. The current trend is to design large models, which require more computing resources and lead to high training costs [3,4]. Therefore, some small devices may not be able to run these large models.
With the emergence of commercial large model platforms, we can perform the classification task by calling APIs (Application Programming Interfaces) [5,6]. Although these platforms can provide the classification service through the network, the training of new samples is not convenient, which causes low classification accuracy on these samples.
The fusion of deep learning models is another solution. To increase accuracy, some methods fuse the output of multiple models [7,8]. On the other hand, the accuracy depends on the output of each model. Therefore, the selection of high-precision models is the decisive factor. In addition, if these models are trained on different datasets, we should first predict the dataset.
To address these challenges, we try to efficiently utilize a large model platform and the fusion of deep learning models to achieve a wide range of classification and high accuracy. The contributions of this paper can be summarized as follows. The first one is that we efficiently match the result of the large model to the label of the datasets, which ensures a wide range of classification. The second is that our method optimizes the fusion methods, which achieves dataset prediction while ensuring high accuracy. The third one is that our method can be deployed to different types of small devices, which solves the problem of device dependency.
The structure of our paper is as follows. The first section is the introduction and the second one is related work. The third section introduces our method and the fourth one introduces the experiment. The fifth section summarizes this paper and discusses future work.
Related work
In this paper, we choose some existing deep learning models, a large model platform and some fusion methods as the baselines. The first type of deep learning model is based on the SNN (Spiking Neural Network) mechanism. The second type includes models with other kinds of structures. We introduce a large model platform that will be utilized at our Level 0. Then, we introduce some fusion methods that will be utilized at the other levels.
The first type of deep learning model is based on the SNN. The ANN-SNN model applies the quantization clip-floor-shift activation function to replace the ReLU (Rectified Linear Unit), which better approximates the activation function [9]. The hybrid training SNN utilizes a computationally efficient training technique [10]. The Low-Latency SNN model proposes a low-latency deep spiking network trained with gradient descent, which optimizes the membrane leak and the firing threshold [11]. The direct training SNN model proposes a neuron normalization technique to adjust neural selectivity and develops a direct learning algorithm for deep SNNs [12]. The TSSL-BP (Temporal Spike Sequence Learning Back Propagation) model uses a novel temporal spike sequence learning backpropagation method for training deep SNNs [13]. The TDBN (Threshold Dependent Batch Normalization) model enables direct training of a very deep SNN and the efficient implementation of its inference on hardware [14]. These models utilize different optimization techniques based on the structure of the SNN. Thus, their performance is limited by the structure of the SNN.
We introduce some different deep learning models to expand the range of model selection in our method. The TET (Temporal Efficient Training) model introduces a temporal efficient training approach to compensate for the loss of momentum in gradient descent [15]. The MPD (Membrane Potential Distribution) model attempts to rectify the membrane potential distribution by designing a novel distribution loss, which can explicitly penalize undesired shifts without introducing any additional operations in the inference phase [16]. The WRN (Wide Residual Networks) model proposes a ground radar target classification algorithm and an attention mechanism [17]. The DeiT (Data-efficient image Transformers) model produces a competitive convolution-free transformer by training only on ImageNet [18]. The Swin (Shifted Window) model presents a new vision transformer that capably serves as a general-purpose backbone for computer vision [19]. These models utilize different structures and parameter tuning techniques, which achieve high accuracy on various datasets. On the other hand, no single model achieves the highest accuracy on all datasets. Thus, we try to efficiently utilize these models to achieve high accuracy on multiple datasets.
To enable a wide range of classification, we select the Baidu-AI platform as the large model platform [20]. This platform can identify more than 100,000 kinds of objects and scenes, and provide corresponding API services to fully meet the application needs of various developers and enterprise users. The API request is used for general object and scene recognition; that is, for an input picture (which can be decoded normally and has an appropriate aspect ratio), multiple object and scene labels in the picture are output.
To efficiently use multiple models to achieve high accuracy, we choose some of the latest fusion methods. These methods can fuse the outputs of models on multiple datasets. Voting methods combine the top-performing models to achieve the high accuracy [21]. Weighted voting methods try to fuse various deep neural network models to achieve high accuracy [22].
We summarize these methods in Table 1. Basically, they can be summarized as three types: single models, large model platforms, and fusion methods. We introduce the advantages and limitations of these methods in this table.
Our method
In order to help understand the methods in this paper, we first provide some definitions.
Preliminaries
Firstly, we define $S_n$ as a sample and $G(S_n)$ as the ground truth of $S_n$. We name a dataset $D_j$. Then, we define a deep learning model $M_i$ that is trained on $D_j$ as $M_{i,j}$, and the output of model $M_{i,j}$ on $S_n$ as $F(M_{i,j}, S_n)$. Then, we can define the accuracy of $M_{i,j}$ on a sample set $\{S_n\}$ as

$$A(M_{i,j}, \{S_n\}) = \frac{\left|\{S_n \mid F(M_{i,j}, S_n) = G(S_n)\}\right|}{\left|\{S_n\}\right|}$$

On some samples, if the model with the highest classification accuracy is $M_{i,j}$, we define these samples to belong to $D_j$. We define $L_{k,j}$ as a label of $D_j$, and $N_j = |\{L_{k,j}\}|$ as the number of labels in $\{L_{k,j}\}$. Then, we construct the levels by the following equation:

$$N_0 > N_1 > N_2 \tag{1}$$

where $N_j$ represents the number of labels at Level $j$. Equation 1 represents the relationship between models at different levels: lower-level models have more labels than higher-level models. On the contrary, due to the targeted training on the corresponding datasets, we assume that the higher-level models achieve higher accuracy.
We can define the classification task in this paper as follows. When there are multiple datasets {Dj}, a classification process should first predict which Dy includes Sn. Then, on the predicted dataset Dy, it should classify the label Lx,y of Sn.
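As a minimal illustration, the accuracy definition above can be sketched in Python; the list-based inputs are hypothetical stand-ins for the model outputs $F(M_{i,j}, S_n)$ and the ground truths $G(S_n)$:

```python
def accuracy(predictions, ground_truth):
    """A(M_ij): the fraction of samples whose predicted label matches the ground truth."""
    assert len(predictions) == len(ground_truth)
    correct = sum(1 for p, g in zip(predictions, ground_truth) if p == g)
    return correct / len(predictions)
```

For example, `accuracy(["cat", "dog"], ["cat", "cat"])` returns 0.5, since one of the two predictions matches its ground truth.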
Our framework
In this subsection, we illustrate our method (named FM-DLM) as shown in Fig 1. Step 1 is the selection of models, including the selection of a large model platform, based on whether they are shared and can be deployed to our devices. Step 2 is model training, which follows the training process outlined in the relevant papers. Step 3 is model selection, mainly selecting some trained models with higher accuracy on the corresponding dataset.
After these preparations, we can utilize our method to classify the samples. Step 4 uses the large model platform (named the Level 0 model) to classify the labels of samples. Then, our method filters the label from Level 0 to that of Level 1, which is used to tentatively predict the dataset. Step 5 predicts the dataset more precisely according to the differences among the Level 1 models. Step 6 selects the trained models (named Level 2 models) of the predicted dataset and fuses their outputs to classify the label. Step 7 uses the label distribution to optimize the results, which further increases the classification accuracy.
In practical applications, each device can choose the appropriate type and number of models based on storage and hard disk capacity. When we use the model platform at Level 0, we only need to ensure the internet connection. When we select the Level 1 models, we can choose the appropriate number of models according to memory capacity. Then, we can also run the models one by one in memory, which is suitable for small memory sizes. We summarize the levels as shown in Table 2.
Step 1: Select models
Firstly, we collect some models based on the following conditions. The first condition is that the models should be open-source and shareable. The second condition is that the memory consumption of the models is smaller than the memory of our GPU. The third condition is that we can deploy the models without errors.
We train these models following the training steps that are introduced in the related papers. Some models may have low performance on datasets that are not introduced in these papers. In other words, the tuning of these models is highly related to the corresponding datasets. Thus, the selection of trained models in Step 3 is important.
Step 2: Train models
If we use a commercial large model platform as the Level 0 model, we do not need model training at this level. These large model platforms can classify a wide range of labels.
When training the models at Level 1, we first prepare some existing models $\{M_i\}$ and public datasets $\{D_j\}$. We select each dataset $D_j$ and divide it into a training set $D_j^{train}$, a validation set $D_j^{val}$, and a testing set $D_j^{test}$. Then, each model $M_i$ is trained on $D_j^{train}$ to obtain a trained model $M_{i,j}$. When we train $M_i$ on another dataset $D_{j'}$, we obtain a trained model $M_{i,j'}$. Each model is trained on one dataset in our method.
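The split described above can be sketched as follows; the 70/10/20 proportions match those used later in the experiments, and `split_dataset` is a hypothetical helper, not the authors' code:

```python
import random

def split_dataset(samples, train=0.7, val=0.1, test=0.2, seed=0):
    """Shuffle a dataset D_j and split it into training, validation and testing sets."""
    assert abs(train + val + test - 1.0) < 1e-9
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```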
Step 3: Select trained models
The rule for selecting trained models is based on their accuracy on the validation sets. On each dataset, we choose the trained models $\{M_{x,j}\}$ with higher accuracy as follows:

$$\{M_{x,j}\} = \mathrm{Top}_i\, A\big(M_{i,j}, D_j^{val}\big) \tag{2}$$

where the accuracy $A(M_{i,j}, D_j^{val})$ is defined by the accuracy equation in the Preliminaries and computed on the validation set of $D_j$, and $\mathrm{Top}_i$ means we select the top $i$ models that achieved higher accuracy than the others on the validation set. The number $i$ is also determined by the validation set.
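The Top-i selection can be sketched as below, assuming each trained model's validation-set accuracy is already known; the model names in the usage example are illustrative only:

```python
def select_top_models(val_accuracy, i):
    """Keep the top-i trained models ranked by validation-set accuracy."""
    ranked = sorted(val_accuracy, key=val_accuracy.get, reverse=True)
    return ranked[:i]
```

For instance, with validation accuracies `{"TET": 0.95, "ANN-SNN": 0.93, "WRN": 0.91}` and `i = 2`, the helper keeps TET and ANN-SNN.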
Step 4: Filter
We can classify the label of a sample with the Level 0 model. If the classified label belongs to the collected datasets $\{D_j\}$, we continue to the next process. Otherwise, the classified label is the final result. The large model platform has a relatively wide distribution of labels, but the label format follows its own rules. Therefore, there are some differences between the labels provided by the large model platform and those of $\{D_j\}$. Thus, we should map the labels of Level 0 to those of Level 1. We can define the mapping as follows:

$$\mathrm{Map}(L_{m,0}) = \arg\max_{L_{k,j}} P\big(G(S_n) = L_{k,j} \mid L_{m,0}\big) \tag{3}$$

where $L_{m,0}$ is a label of Level 0 and $L_{k,j}$ is a label of Level 1 as defined in the Preliminaries, and $P(G(S_n) = L_{k,j} \mid L_{m,0})$ is the probability that the ground truth $G(S_n)$ is the label $L_{k,j}$ when Level 0 outputs $L_{m,0}$.
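A sketch of the Level 0 to Level 1 mapping is given below. The mapping table is a hypothetical structure holding, for each Level 0 label, the estimated probabilities of the Level 1 labels; returning `None` stands for the case where the Level 0 label lies outside {D_j} and is therefore the final result:

```python
def map_level0_label(level0_label, mapping_table):
    """Map a platform (Level 0) label to the most probable Level 1 label."""
    candidates = mapping_table.get(level0_label)
    if not candidates:
        return None  # outside the collected datasets: keep the Level 0 result
    # choose the Level 1 label with the highest estimated probability
    return max(candidates, key=candidates.get)
```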
Step 5: Dataset prediction
When Step 4 indicates that $S_n$ belongs to one of the collected datasets, we should predict which dataset contains this sample. We use the differences among the Level 1 models to classify the dataset that may include this sample.
We define the probability of a label $L_{k,j}$ on a sample $S_n$ given by the output of $M_{i,j}$ as $P(M_{i,j}, S_n, L_{k,j})$. On the validation set, we can get the weights $\{W_{i,j}\}$ as shown in the following equation:

$$W_{i,j} = A\big(M_{i,j}, D_j^{val}\big) \tag{4}$$

where $A(M_{i,j}, D_j^{val})$ is the accuracy of the trained model $M_{i,j}$ on the validation set of $D_j$. We define the difference between $M_{i,j}$ and $M_{ii,j}$ on a sample $S_n$ as follows:

$$\mathrm{Diff}(M_{i,j}, M_{ii,j}, S_n) = \sum_{k} \big| W_{i,j}\, P(M_{i,j}, S_n, L_{k,j}) - W_{ii,j}\, P(M_{ii,j}, S_n, L_{k,j}) \big| \tag{5}$$

where $P(M_{i,j}, S_n, L_{k,j})$, $W_{i,j}$ and $W_{ii,j}$ are defined by Equation 4. $W_{i,j}$ is the weight related to model $M_{i,j}$ and $W_{ii,j}$ is the weight related to model $M_{ii,j}$. We select $M_{ii,j}$ as the model that achieves the highest accuracy on $D_j^{val}$. Then, we can predict the dataset that may include $S_n$ by the following equation:

$$D_y = \arg\min_{D_j} \mathrm{Diff}(M_{i,j}, M_{ii,j}, S_n) \tag{6}$$

where we define $D_y$ as the predicted dataset that may contain the sample $S_n$.
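The dataset-prediction step can be sketched as follows, assuming an L1-style weighted difference between the two selected Level 1 models of each dataset; the tuple layout of `per_dataset_outputs` is an illustrative choice, not the published interface:

```python
def pairwise_difference(p_a, p_b, w_a, w_b):
    """Weighted probability gap between two models on one sample (Step 5)."""
    return sum(abs(w_a * pa - w_b * pb) for pa, pb in zip(p_a, p_b))

def predict_dataset(per_dataset_outputs):
    """Pick the dataset whose model pair disagrees the least on the sample.

    per_dataset_outputs maps a dataset name to (p_a, p_b, w_a, w_b), where
    p_a and p_b are the label-probability vectors of the two models.
    """
    return min(per_dataset_outputs,
               key=lambda d: pairwise_difference(*per_dataset_outputs[d]))
```

The intuition matches the analysis later in the paper: models trained on the sample's true dataset output similar probabilities, so the smallest weighted gap identifies that dataset.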
Step 6: Label classification
After $D_y$ is obtained through Equation 6, we can classify the label by the following equation:

$$L_{x,y} = \arg\max_{L_{k,y}} \sum_{i} W_{i,y}\, P(M_{i,y}, S_n, L_{k,y}) \tag{7}$$

where $W_{i,y}$ and $P(M_{i,y}, S_n, L_{k,y})$ are introduced in Equation 4. $P(M_{i,y}, S_n, L_{k,y})$ is the output of the trained models of $D_y$.
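The weighted fusion of the Level 2 outputs can be sketched as below; the equal weights and label names in the usage example are illustrative:

```python
def classify_label(model_probs, weights, labels):
    """Fuse the label probabilities of the Level 2 models by a weighted sum (Step 6)."""
    scores = {
        label: sum(w * probs[k] for w, probs in zip(weights, model_probs))
        for k, label in enumerate(labels)
    }
    return max(scores, key=scores.get), scores
```

With two models outputting [0.6, 0.4] and [0.7, 0.3] and equal weights of 0.5, the fused score of the first label is 0.65 versus 0.35, so the first label is selected.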
Step 7: Label distribution
Generally, a validation set is used to simulate the corresponding testing set. Therefore, we can assume that the label distribution of the validation set is the same as that of the testing set. We compute the distribution of label $L_{k,j}$ on the validation set $D_j^{val}$ using the following equation:

$$Q(L_{k,j}) = \frac{N^{val}(L_{k,j})}{N^{val}} \tag{8}$$

where $N^{val}(L_{k,j})$ is the number of samples (that belong to $D_j^{val}$) for which the ground truth is $L_{k,j}$, and $N^{val}$ is the number of all samples that belong to $D_j^{val}$. After the labels are classified by Equation 7, the scores of some results may be low. We can set thresholds based on the validation set to select these low-score results. For these results, we further increase the accuracy using the following equation:

$$L_{x,y} = \arg\max_{L_{k,y}} \Big[ \sum_{i} W_{i,y}\, P(M_{i,y}, S_n, L_{k,y}) + \lambda\, Q(L_{k,y}) \Big] \tag{9}$$

where $P(M_{i,y}, S_n, L_{k,y})$ is defined by Equation 4 and $Q(L_{k,y})$ is defined by Equation 8. We also compute the hyper-parameter $\lambda$ on the validation set.
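The label-distribution correction can be sketched as follows, assuming an additive combination of the fused score and a lambda-weighted validation-set prior; the additive form and the helper names are assumptions for illustration, not the published implementation:

```python
def label_distribution(val_labels):
    """Empirical label frequencies on the validation set (Step 7 prior)."""
    total = len(val_labels)
    counts = {}
    for label in val_labels:
        counts[label] = counts.get(label, 0) + 1
    return {label: c / total for label, c in counts.items()}

def rescore(fused_scores, prior, lam):
    """Re-rank low-score results by adding the lambda-weighted label prior."""
    return max(fused_scores,
               key=lambda label: fused_scores[label] + lam * prior.get(label, 0.0))
```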
Pseudo code of our method
We introduce the pseudo code related to our method in Table 3.
Experiment
Experimental setup
We selected three public datasets: the CIFAR-10 dataset [23], the CIFAR-100 dataset [24], and the Mini-ImageNet dataset [25]. Generally, for each dataset, we use 70% of the samples for training, 10% for validation, and 20% for testing.
We select some shared models, which are ANN-SNN [9], Hybrid training SNN [10], Low-latency SNN [11], Direct training SNN [12], TSSL-BP [13], TDBN [14], TET [15], MPD [16], WRN [17], DeiT [18], Swin (Shifted Window) [19]. The selection depends on the possibility of implementation on our device. We select Baidu-AI large model platform as the Level 0 model. Table 4 shows the details of classification by Baidu-AI.
For better comprehension, we use Table 5 to explain the evaluation metrics.
The evaluation of trained models (Step 3)
Table 6 shows the classification accuracy of different models on the three public datasets. Due to their different optimization details, the accuracy of these models differs across datasets. We build our framework based on the selection of these models at Step 3.
The evaluation of dataset prediction (Step 5)
Table 7 shows the accuracy of the dataset prediction at Step 5. Compared with the existing methods, our method is 3.5% higher on CIFAR-10, 2.47% higher on CIFAR-100, and 3.96% higher on Mini-ImageNet.
The evaluation of label classification with dataset prediction (Step 6)
Table 8 shows the accuracy of the label classification with dataset prediction at Step 6. Compared with the existing methods, our method is 4.01% higher on CIFAR-10, 3.01% higher on CIFAR-100, and 5.03% higher on Mini-ImageNet.
The evaluation of label distribution (Step 7)
In the label distribution step, we assigned a random label distribution to the samples. Table 9 shows the accuracy of the label distribution. Compared with our method at Step 6, our method with label distribution (Step 7) is 1.83% higher on CIFAR-10, 2.1% higher on CIFAR-100, and 1.5% higher on Mini-ImageNet.
The evaluation of ablation
Table 10 shows the ablation experimental results of the steps. From Step 1 to Step 4, the best performance is achieved by Baidu-AI. After we add dataset prediction at Step 5, we can use the corresponding models to classify the labels, which allows our method to achieve the best performance. When we fuse the outputs of the models to classify labels at Step 6, our method also achieves the best performance. Furthermore, when we optimize the results using the label distribution at Step 7, our method achieves higher accuracy than that of Step 6.
The evaluation of the model selection
Fig 2 shows how the selection of models affects the classification accuracy on Mini-ImageNet. In this figure, the blue column shows the methods with the worst 5 trained models (models that achieve lower accuracy than the others) on each dataset. The orange column shows the methods with the best trained models (models that achieve the highest accuracy on the corresponding dataset). The green column shows the methods with a random selection of trained models (we randomize both the number of models and the selection of these models), for which we compute the average accuracy over 100 runs. As this figure shows, the selection of models plays an important role in the accuracy. Our method achieves the highest accuracy among all of these selections.
The voting method does not consider the importance of high-precision models, which reduces accuracy. The weighted voting method solves this problem, but it only uses the classified label, which is the final output of each model. In contrast, our method fully utilizes the probability of each label before the final output. Therefore, our method can achieve higher accuracy than other methods.
The experiments on more datasets and models
Fig 3 shows our method on 6 datasets. In addition to the 3 datasets collected above, we further collected the EuroSAT dataset [26], the Intel-image-classification dataset (named Intel) [27] and MNIST [28,29]. Furthermore, we adopted related models for MNIST, which are F2PQNN [28] and NoRD [29]. These models achieve high accuracy on MNIST (F2PQNN reaches 99.09% and NoRD reaches 96.74%), so we adopt them in our method. The accuracy on 6 datasets is lower than that on 3 datasets. The variety of samples increases as there are more datasets, which reduces the accuracy of dataset prediction. Thus, label classification by the Level 0 model plays an important role in reducing the difficulty of dataset prediction at Level 1. Compared with the weighted voting method, our method achieved higher accuracy on each dataset.
The analysis
Fig 4 shows a simple illustration of our method. As this figure shows, when a sample $S_n$ belongs to $D_1$, the corresponding trained models $\{M_{i,1}\}$ tend to output similar probabilities for the labels. Furthermore, the probability of the ground truth will be higher than those of the other labels. On the other hand, as the trained models $\{M_{i,0}\}$ on $D_0$ cannot effectively capture the features of this sample, their outputs differ from one another.
Hardware efficiency metrics
The CPU and GPU applied in our experiment are shown in Table 11. We select Mini-ImageNet as an example dataset. We record the maximum execution time, maximum memory consumption and FLOPs (G) of a single model (we record the maximum value among all models), the existing methods and our method, as shown in Table 12. Our method runs multiple models on each dataset, which makes the runtime larger than that of a single model. Furthermore, the connection with Baidu-AI consumes more time than the computation on either the CPU or the GPU.
Compared with the execution time of a single model (we record the model with the maximum execution time), the existing fusion methods run multiple models on the GPU, which leads to longer execution time. Furthermore, these fusion methods fuse the outputs of multiple models on the CPU side, which leads to additional execution time. Our method outputs the probabilities of the labels on the GPU side and computes the final results on the CPU side, which leads to longer execution time than those of the existing methods.
The existing methods and ours run the models one by one on the GPU side. Thus, the maximum memory consumption of the existing methods is the same as that of a single model. Our method needs to store the probabilities output by the models, which leads to additional memory consumption.
The execution time for connecting to Baidu-AI is the same for single models and fusion methods. Without classification by Baidu-AI, none of the methods can perform wide-range classification.
The employed acronyms
We use Table 13 to introduce the employed acronyms in this paper.
Conclusions
This paper proposes a new way to efficiently utilize different levels of deep learning models to achieve high classification accuracy while ensuring wide-range classification. Our method solves the matching problem between large model platforms and deep learning models. Furthermore, we improve the accuracy of dataset prediction and label classification beyond that of the existing fusion methods. Our method can be deployed on small devices, which is important for many applications.
In future work, we will conduct more experiments to study how the diversity of samples affects the performance of trained models, aiming to further increase classification accuracy. Furthermore, as the same wrong results lower the performance of the fusion methods, the similarity of trained models will also be a focus for future research.
References
- 1. Devi SN, Natarajan R, Gururaj HL, Flammini F, Sulaiman Alfurhood B, Krishna S. Ridge Regressive Data Preprocessed Quantum Deep Belief Neural Network for Effective Trajectory Planning in Autonomous Vehicles. Complexity. 2024;2024(1):1–13.
- 2. Karaköse E. An Efficient Satellite Images Classification Approach Based on Fuzzy Cognitive Map Integration With Deep Learning Models Using Improved Loss Function. IEEE Access. 2024;12:141361–79.
- 3. Pietroń M, Żurek D, Śnieżyński B. Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction. J Comput Sci. 2023;67:101971.
- 4. Yao F, Zhang Z, Ji Z, Liu B, Gao H. LBB: load-balanced batching for efficient distributed learning on heterogeneous GPU cluster. J Supercomput. 2024;80(9):12247–72.
- 5. Chai X, Zhang M, Tian H. AI for Science: Practice from Baidu Paddle. In: 2024 Portland International Conference on Management of Engineering and Technology (PICMET), Portland, OR, USA, 2024. p. 1–12.
- 6. Jones N. How should we test AI for human-level intelligence? OpenAI’s o3 electrifies quest. Nature. 2025;637(8047):774–5. pmid:39805930
- 7. Thangavel K, Palanisamy N, Muthusamy S, Mishra OP, Sundararajan SCM, Panchal H, et al. A novel method for image captioning using multimodal feature fusion employing mask RNN and LSTM models. Soft Comput. 2023;27(19):14205–18.
- 8. Wang S, Ni L, Zhang Z, Li X, Zheng X, Liu J. Multimodal prediction of student performance: A fusion of signed graph neural networks and large language models. Pattern Recogn Lett. 2024;181:1–8.
- 9. Jiang C, Zhang Y. A Noise-Based Novel Strategy for Faster SNN Training. Neural Comput. 2023;35(9):1593–608. pmid:37437192
- 10. He X, Li Y, Zhao D, Kong Q, Zeng Y. MSAT: biologically inspired multistage adaptive threshold for conversion of spiking neural networks. Neural Comput Applic. 2024;36(15):8531–47.
- 11. Rathi N, Roy K. DIET-SNN: A Low-Latency Spiking Neural Network With Direct Input Encoding and Leakage and Threshold Optimization. IEEE Trans Neural Netw Learn Syst. 2023;34(6):3174–82. pmid:34596559
- 12. Wu Y, Deng L, Li G, Zhu J, Xie Y, Shi L. Direct training for spiking neural networks: Faster, larger, better. In: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33. 2019. p. 1311–8. https://doi.org/10.1609/aaai.v33i01.3301131
- 13. Zhang W, Li P. Temporal spike sequence learning via backpropagation for deep spiking neural networks. Adv Neural Inf Process Syst. 2020;33:12022–33.
- 14. Zheng H, Wu Y, Deng L, Hu Y, Li G. Going Deeper With Directly-Trained Larger Spiking Neural Networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, volume 35. 2021. p. 11062–70.
- 15. Deng S, Li Y, Zhang S, Gu S. Temporal efficient training of spiking neural network via gradient re-weighting. In: International Conference on Learning Representations. 2021.
- 16. Guo Y, Tong X, Chen Y, Zhang L, Liu X, Ma Z, et al. RecDis-SNN: Rectifying Membrane Potential Distribution for Directly Training Spiking Neural Networks. In: Conference on Computer Vision and Pattern Recognition (CVPR), 2022. p. 326–35.
- 17. Alsekait D, Zakariah M, Amin SU, Khan ZI, Alqurni JS. Privacy preservation in iot devices by detecting obfuscated malware using wide residual network. Comput Mater Cont. 2024;81(11):2395–436.
- 18. Yadav RK, Daniel A, Semwal VB. Enhancing Human Activity Detection and Classification Using Fine Tuned Attention-Based Transformer Models. SN Comput Sci. 2024;5(8):1–21.
- 19. Yao D, Shao Y. A data efficient transformer based on Swin Transformer. Vis Comput. 2023;40(4):2589–98.
- 20. Available from: https://ai.baidu.com/tech/imagerecognition
- 21. Aurangzeb S, Aleem M. Evaluation and classification of obfuscated Android malware through deep learning using ensemble voting mechanism. Sci Rep. 2023;13(1):3093. pmid:36813846
- 22. Açıkkar M, Tokgöz S. An improved KNN classifier based on a novel weighted voting function and adaptive k-value selection. Neural Comput Appl. 2023;36(8):4027–45.
- 23. Kundroo M, Kim T. Demystifying Impact of Key Hyper-Parameters in Federated Learning: A Case Study on CIFAR-10 and FashionMNIST. IEEE Access. 2024;12:120570–83.
- 24. Huang Y, Zhu Y-H, Zhigao Z, Ou Y, Kong L. Classification of Long-Tailed Data Based on Bilateral-Branch Generative Network with Time-Supervised Strategy. Complexity. 2021;2021(1):1–10.
- 25. Bhakta S, Nandi U, Changdar C, Ghosal SK, Pal RK. emapDiffP: A novel learning algorithm for convolutional neural network optimization. Neural Comput Appl. 2024;36(20):11987–2010.
- 26. Günen MA. Performance comparison of deep learning and machine learning methods in determining wetland water areas using EuroSAT dataset. Environ Sci Pollut Res Int. 2022;29(14):21092–106. pmid:34746985
- 27. Available from: https://www.kaggle.com/datasets/puneet6060/intel-image-classification
- 28. Li J, Yuan P, Zhang J, Shen S, He Y, Xiao R. F2PQNN: a fast and secure two-party inference on quantized convolutional neural networks. Comput J. 2025;68(8):998–1012.
- 29. Sharma S, Lodhi SS, Srivastava V, Chandra J. NoRD: A framework for noise-resilient self-distillation through relative supervision. Appl Intell. 2025;55(7).