Abstract
Corn is a critical food crop globally, widely cultivated due to its strong adaptability. However, it is susceptible to various diseases, necessitating advanced intelligent detection methods to enhance disease prevention, control efficacy, and production efficiency. Traditional disease recognition models suffer from high computational costs or inadequate feature extraction capabilities, making it challenging to achieve efficient and accurate disease identification in complex environments. To improve the accuracy and efficiency of corn leaf disease identification and to meet the requirements of portable devices, this paper proposes a novel ES-ShuffleNetV2 (Exponential Linear Unit + Spatial Group-wise Squeeze-and-Excitation Block) lightweight recognition model for corn diseases. The proposed model builds upon the ShuffleNetV2 architecture. Firstly, an improved attention mechanism, SGSE, is incorporated immediately following the first convolutional layer to emphasize fine-grained features in corn leaf disease images, enhancing the model’s focus on key characteristics. Secondly, the model replaces the ReLU activation function in the down-sampling and basic units with the ELU function, facilitating smoother gradient propagation and faster convergence by allowing a small negative gradient inflow. Additionally, layer pruning techniques are employed to eliminate redundant parameters, reduce model complexity, and enhance operational efficiency on mobile devices. Experimental results demonstrated that the ES-ShuffleNetV2 model achieved a recognition accuracy of 97.07%, surpassing the base model’s accuracy of 95.43%. After pruning, the new model reduced parameters by 30.45% and FLOPs by 30.26% compared to the original model, meeting the criteria for a lightweight recognition model.
Furthermore, the ES-ShuffleNetV2 model outperformed competing models in Accuracy and F1-Score, validating its effectiveness in corn leaf disease recognition and providing valuable insights for future research.
Citation: Bai L, Zhu S, Gao H (2025) A lightweight model and corn leaf disease recognition. PLoS One 20(11): e0336945. https://doi.org/10.1371/journal.pone.0336945
Editor: Marco Antonio Moreno-Armendariz, Instituto Politecnico Nacional, MEXICO
Received: January 23, 2025; Accepted: October 29, 2025; Published: November 17, 2025
Copyright: © 2025 Bai et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: A representative, de-identified subset of the dataset used in this study has been deposited in Figshare under the title Corn Leaf Disease Recognition Dataset and is publicly available at https://doi.org/10.6084/m9.figshare.30264037. This repository contains sample images, partial experimental results, and metadata for reproducibility. Due to privacy concerns, the full dataset cannot be made publicly available. However, it may be made available upon reasonable request from the Research Center of Intelligent Planting and Processing Technology of Crops in Anhui Province. Requests should be directed to yjs2022135@ahstu.edu.cn.
Funding: This research was funded by the Anhui Provincial Department of Education’s Natural Science Foundation Major Project, China, grant number 2023AH040275; the Key Discipline Construction Project of Anhui Science and Technology University, China, grant number XK-XJGY002; and Anhui Provincial Department of Education’s Natural Science Foundation Key Projects, China, grant number 2023AH051866. The funders provided support for data collection and publication of the manuscript but had no role in study design, data analysis, or the decision to publish.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Corn is one of the most important staple crops globally, widely cultivated around the world for use in feed, industrial materials, and other fields, and holds significant economic value [1]. However, corn production faces threats from various diseases, such as corn leaf spot, Fusarium head blight, and rice blast, among others. The occurrence of these diseases can hinder the expected growth of corn, leading to severe yield reduction or even plant death, posing significant challenges to agricultural production and food security [2]. In traditional agricultural production, disease recognition mainly relies on farmers’ experience and visual observation. However, due to the lack of systematic scientific methods, this approach suffers from significant issues of inaccuracy and inefficiency. Particularly for diseases with early or subtle symptoms or those in complex environments, traditional methods tend to overlook key symptoms, leading to missed optimal control opportunities, thus exacerbating the spread and proliferation of the disease [3].
The adoption of advanced intelligent technologies and tools has become increasingly necessary to improve the precision and efficiency of disease detection. Since the 1980s, machine learning technologies have gradually emerged and been explored for applications in agriculture [4]. Early studies primarily relied on image scanning and traditional image processing techniques to analyze crop diseases caused by environmental stress, such as harmful gas pollution. These studies focused on disease recognition in leaves and stems, as well as monitoring crop growth conditions, demonstrating the feasibility of image analysis methods for crop health management. However, as traditional image processing techniques were increasingly applied to crop disease identification, their limitations in feature extraction capability and recognition accuracy became evident. Background interference in complex environments, the diversity of crop leaf morphologies, and subtle disease symptom changes have made it difficult for traditional methods to meet precise recognition requirements. To overcome these technical bottlenecks, Hinton et al. [5] proposed Deep Belief Networks (DBN) in 2006, marking the formal introduction of deep learning concepts. With its powerful autonomous feature extraction and generalization capabilities, deep learning has achieved remarkable success in areas such as visual perception, audio analysis, and natural language processing, laying a theoretical and technical foundation for innovations in disease recognition technologies.
Recently, deep learning–based approaches for crop disease identification have been widely explored and implemented. Wu Yehui et al. [6] proposed an image recognition method using a lightweight model based on a randomly augmented Swin-Tiny Transformer. By optimizing the Swin-Tiny Transformer model and fine-tuning parameters on a corn disease dataset, an identification accuracy of 93.59% was achieved, with a parameter size of 28.80M. Luo Yang et al. [7] introduced a crop leaf grading and disease recognition algorithm using backbone information sharing and multi-receptive field feature adaptive fusion. Experimental results demonstrated that CLGDRNet achieved mAP@0.5 and mAP@0.5:0.95 of 85.0% and 76.1%, respectively, on the early-cured tobacco leaf grading dataset, and 97.6% and 74.2%, respectively, on the apple leaf disease dataset, with a model size of 5.0M. Liang Xiuman et al. [8] improved the Fire module of the SqueezeNet network by incorporating spatial attention mechanisms and dense connection modules in deeper network layers to improve the ability to extract and reutilize features. Two lightweight CNN models were constructed for identifying diseased apple leaves, achieving recognition accuracies of 89.60% and 94.37%, corresponding to increases of 2.98 and 7.75 percentage points over the original network, with parameter sizes of 0.9M and 2.5M. Yang et al. [9] developed a self-supervised multi-network fusion classification model to identify common strawberry disease types, attaining an accuracy of 92.48%. Ahmad Loti Nurul Nabilah et al. [10] utilized deep learning–based feature extraction techniques to detect key disease and pest traits from chili leaf images. The features were fed into machine learning classifiers, achieving a best accuracy of 92.10%. Azgomi Hossein et al. [11] employed a multilayer perceptron neural network for apple disease recognition, achieving a best accuracy of 73.7%. Jixia H et al. 
[12] employed remote sensing datasets of pine wood nematode disease to train five well-known models—AlexNet, GoogLeNet, SqueezeNet, ResNet-18, and VGG16—using transfer learning. A combined macro-architecture and micro-module adjustment strategy was used to improve the model structure. Experimental findings indicated that the enhanced SqueezeNet model delivered the highest recognition efficiency and accuracy, achieving 94.90%. These studies emphasize that crop disease recognition accuracy requires further enhancement to satisfy the needs of practical applications. Chen Yu et al. [13] proposed the YOLOv5-CBM model for tea leaf disease recognition, achieving an accuracy of 97.3% with a parameter size of 26.8M. Huang Lüwen et al. [14] designed a tea leaf disease recognition model, CBAM-TealeafNet, based on discrete wavelet transform (DWT) and MobileNetV3 fusion, achieving an overall recognition accuracy of 98.70% for five distinct tea leaf diseases, having a parameter size of 3.16M. Sun Wenbin et al. [15] introduced a disease recognition approach utilizing visible spectra combined with an enhanced attention mechanism, designing a new attention module (SMLP) and a crop disease recognition model (SMLP_ResNet). The disease recognition rates on the two datasets reached 86.93% and 99.32%, respectively, with a model weight size of 48.6MB. Shweta Bondre et al. [16] proposed the IFMR-CNN method, achieving a 96% accuracy in disease localization and classification. Madakannu Arun et al. [17] developed an improved LeNet architecture based on deep convolutional neural networks (CNN) for classifying corn leaf diseases, achieving an accuracy of 97.89% after multiple experiments. Karlekar et al. [18] proposed a soybean leaf disease recognition method for complex backgrounds employing deep learning techniques. They designed a visual data processing module (IPM) to remove complex backgrounds and segment leaf areas before CNN network training, attaining a recognition accuracy of 98.14%. 
Jing Jiaping et al. [19] introduced the BC-YOLOv5 method in the recognition of tomato diseases, achieving 95% accuracy. Lei Tang et al. [20] introduced an enhanced multi-scale inverse bottleneck residual network, based on ResNet-50 and incorporating a triplet parallel attention mechanism, achieving 98.73% accuracy on the apple leaf disease dataset with 116.3M parameters. Jinsheng Deng et al. [21] presented the Ghost ResNeSt-Attention RReLU-Swish network model (GR-ARNet) for banana leaf disease recognition. They introduced a novel K-level VisuShrink algorithm (KVA) for denoising banana leaf images, achieving an average accuracy of 96.98%. Despite achieving high accuracy, these models often have large parameter sizes, necessitating further optimization to reduce model size, shorten inference time, and further enhance recognition accuracy.
In summary, applying deep learning-based object detection techniques for corn disease identification is feasible; however, specific challenges still require attention.
- Traditional neural network models, such as ResNet and vision transformers (e.g., Swin Transformer [22]), exhibit strong feature extraction capabilities but are limited by high complexity, characterized by large parameter sizes and significant computational costs. These constraints impede their deployment on mobile devices. Consequently, lightweight models play a critical role in reducing inference time and enhancing recognition speed, thus fulfilling the requirements for efficient, lightweight, and real-time applications in agricultural settings.
- Lightweight models such as ShuffleNet, SqueezeNet, and the MobileNet series effectively reduce computational costs but face limitations in feature representation. These models often struggle to capture global features of disease data, leading to lower recognition accuracy and a tendency to overlook detailed information during detection. Additionally, complex background interference in real-world data poses further challenges to accurate disease identification.
This paper proposes a lightweight disease recognition model, ES-ShuffleNetV2, to address the shortcomings of current methods. The main contributions of this work are as follows:
- Based on ShuffleNetV2, an improved SGSE attention mechanism is introduced after the first convolution layer, focusing on fine-grained features in corn disease samples. This enhancement enhances the model’s capability to represent features and improves disease recognition accuracy in complex backgrounds.
- Employing the ELU activation function in both the basic and down-sampling units enhances the model’s nonlinear expression capability, thereby accelerating network convergence and improving disease recognition performance.
- A layer pruning strategy is introduced to effectively reduce redundant parameters, decrease model complexity, and enable efficient inference.
Testing on a dataset containing six types of corn diseases demonstrates that the proposed model markedly enhances accuracy and efficiency of corn leaf disease recognition while maintaining its lightweight design. These results provide a novel technical approach for the development of innovative precision agriculture.
2 Dataset collection and construction
2.1 Dataset image collection
Common corn diseases comprise extensive spot disease, small spot disease, rust, gray leaf spot, northern corn leaf blight, stalk rot, and corn smut, among others [23]. This study focuses on six typical corn diseases—Common rust, Bipolaris maydis, Curvularia lunata (Wakker) Boed. spot, Northern leaf blight, Own spot, and Sheath blight—as the subjects for disease recognition analysis [24–26].
The practical dataset used in this study was sourced from the Anhui Academy of Agricultural Sciences, with data collection conducted in corn fields located in the northern region of Anhui Province. To ensure the authenticity and representativeness of the data, the research team utilized camera equipment to capture images of diseased corn plants from multiple angles. The shooting environment was complex, with the background including factors such as straw, soil, and weeds, simulating the actual field conditions. The camera model used was the NIKON D810, with images captured at a resolution of 1001x1500 pixels.
The dataset comprises a total of 2,725 images of different corn leaf diseases, including 470 of Common rust, 645 of Bipolaris maydis, 260 of Curvularia lunata (Wakker) leaf spot, 546 of Northern leaf blight, 356 of Own spot, and 448 of Sheath blight. Sample images from the dataset are presented in Fig 1.
a: Common rust, b: Bipolaris maydis, c: Curvularia lunata (Wakker) Boed. spot, d: Northern leaf blight, e: Own spot, f: Sheath blight.
2.2 Dataset construction
The collected raw images underwent data augmentation, as shown in Fig 2, which included operations such as horizontal flipping, random scaling, random rotation, adding Gaussian noise, and contrast adjustment. The specific parameter ranges employed for these augmentation techniques, such as rotation, scaling, and noise addition, are presented in Fig 2. These operations increase sample diversity, improving the model’s generalization ability for corn leaf disease recognition under natural conditions and mitigating overfitting to the training labels. To decrease the need for computational resources and speed up training, the images were resized to 256 × 256 pixel resolution. Through data augmentation and sample normalization, a total of 10,062 corn disease images were obtained.
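The augmentation operations listed above can be sketched as follows. This is an illustrative NumPy implementation; the parameter ranges (flip probability, noise standard deviation, contrast factor) are placeholders, not the exact values from Fig 2, and arbitrary-angle rotation is approximated here by 90-degree rotations:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """One random augmentation pass over an H x W x 3 float image in [0, 1].
    Parameter ranges are illustrative placeholders."""
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        img = img[:, ::-1, :]
    # Random 90-degree rotation (stand-in for arbitrary-angle rotation).
    img = np.rot90(img, k=int(rng.integers(0, 4)))
    # Additive Gaussian noise.
    img = img + rng.normal(0.0, 0.02, size=img.shape)
    # Contrast adjustment around the image mean.
    factor = rng.uniform(0.8, 1.2)
    img = (img - img.mean()) * factor + img.mean()
    return np.clip(img, 0.0, 1.0)

sample = rng.random((256, 256, 3))
out = augment(sample)
print(out.shape)  # (256, 256, 3)
```

In practice each raw image would be passed through such a pipeline several times to expand the 2,725 raw images toward the reported 10,062.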
To prevent data leakage, the original dataset was randomly divided into training, validation, and test subsets in a 6:2:2 ratio. Data augmentation was applied exclusively to the training set, while the validation and test sets were subjected only to normalization. The detailed distribution of images across each subset is illustrated in Fig 3. This procedure ensured the proper construction of the experimental dataset.
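The leakage-free 6:2:2 split described above amounts to shuffling the original image indices once and carving out disjoint subsets before any augmentation is applied. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(42)

def split_indices(n, ratios=(0.6, 0.2, 0.2)):
    """Shuffle indices once, then slice into train/val/test so that no
    image appears in more than one subset (preventing data leakage)."""
    idx = rng.permutation(n)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# 2,725 original images -> 6:2:2 split; augmentation is then applied
# only to the training indices.
train, val, test = split_indices(2725)
print(len(train), len(val), len(test))  # 1635 545 545
```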
3 Construction of the ES-ShuffleNetV2 detection model
The ES-ShuffleNetV2 model improves upon ShuffleNetV2. It integrates the ELU activation function and the enhanced SGSE attention mechanism while employing model-pruning techniques to eliminate redundant parameters. This approach addresses the challenges of high resource consumption, insufficient feature selection, and low accuracy in traditional deep-learning models for corn leaf disease recognition. The model enhances recognition accuracy while maintaining computational efficiency, making it suitable for resource-constrained corn leaf disease recognition scenarios. The ES-ShuffleNetV2 model is illustrated in Fig 4.
3.1 ShuffleNetV2 base model
In the task of corn leaf disease recognition, the ShuffleNetV2 model serves as an efficient, lightweight convolutional neural network, providing an effective solution for real-time image analysis in environments with limited resources, such as mobile devices and embedded platforms. ShuffleNetV2 was proposed by Zhang et al. in 2018 [27] as an optimized version of the original ShuffleNet, with its network architecture shown in Table 1. The model significantly reduces computational complexity and memory access requirements while maintaining strong feature representation capabilities through mechanisms such as group convolution, depthwise separable convolution, channel split, and channel shuffle.
Specifically, ShuffleNetV2 initially splits the input feature channels into two portions: one portion extracts fine-grained disease features through depthwise separable convolutions, while the other preserves the continuity of the information flow via skip connections. Subsequently, the channel shuffle mechanism recombines the two parts of the features, facilitating information exchange across different feature dimensions and enhancing both feature diversity and inter-channel relationships. This design not only effectively reduces information redundancy but also significantly improves the model’s performance in identifying corn disease features. Furthermore, the model’s lightweight bottleneck module minimizes redundant calculations in group convolutions, lowering the parameter count and enhancing suitability for memory-limited settings. Therefore, this paper selects it as the base model.
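The channel split and channel shuffle operations described above can be sketched in a few lines of NumPy. This is a simplified illustration on an (N, H, W, C) tensor; the convolution branches themselves are omitted:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Channel shuffle as used in ShuffleNetV2: reshape the channel axis
    into (groups, C // groups), transpose, and flatten back so that
    features from the two branches are interleaved."""
    n, h, w, c = x.shape
    x = x.reshape(n, h, w, groups, c // groups)
    x = x.transpose(0, 1, 2, 4, 3)
    return x.reshape(n, h, w, c)

x = np.arange(2 * 4 * 4 * 8, dtype=np.float32).reshape(2, 4, 4, 8)

# Channel split: halve the channels into an identity branch and a
# convolution branch (convolutions omitted in this sketch).
branch_a, branch_b = np.split(x, 2, axis=-1)   # 4 channels each
merged = np.concatenate([branch_a, branch_b], axis=-1)
shuffled = channel_shuffle(merged, groups=2)
print(shuffled.shape)  # (2, 4, 4, 8)
```

After the shuffle, channel order [0..7] becomes [0, 4, 1, 5, 2, 6, 3, 7], so each branch's features are visible to the next unit's both branches.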
3.2 Feature extraction enhancement module
Currently, attention mechanisms such as spatial, channel, and convolutional attention have been widely applied to enhance feature extraction [28]. However, these methods frequently result in higher model complexity and lower operational efficiency. To strike a balance between recognition accuracy and real-time efficiency, this study presents a novel lightweight feature enhancement module called the SGSE module, as shown in Fig 5. This module combines the properties of Squeeze-and-Excitation (SE) and Spatial Group-wise Enhancement (SGE) mechanisms. By leveraging the synergistic effects of channel and spatial attention, it enhances feature extraction while minimizing interference, all without adding extra computational burden. The SE module initially gathers global features for each channel through global average pooling, then produces channel attention weights with two 1 × 1 convolution layers, effectively emphasizing the input features. To enhance the depiction of local features, the SGE module partitions the feature map into multiple groups and calculates spatial attention for each. Using the Softmax function, the attention weights are normalized, enabling spatial dimension feature enhancement. Additionally, a new Hard-Sigmoid function is introduced in the attention weight calculation process. Compared to the traditional Sigmoid function, Hard-Sigmoid [29] offers a nearly linear response and higher computational efficiency, effectively reducing the model’s computational overhead while maintaining a nonlinear mapping of the weights. Ultimately, the module outputs features that have been jointly enhanced along both channel and spatial dimensions, efficiently capturing significant regions and detailed information within the image. Experimental results demonstrate that the SGSE module significantly improves model performance in corn leaf disease recognition tasks.
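The Hard-Sigmoid used in the SGSE weight calculation can be contrasted with the standard Sigmoid as below. One common definition is clip(x/6 + 0.5, 0, 1), equivalent to ReLU6(x + 3)/6; the exact variant adopted by [29] may differ slightly, so this is an illustrative sketch:

```python
import numpy as np

def sigmoid(x):
    """Standard logistic sigmoid (requires an exponential per element)."""
    return 1.0 / (1.0 + np.exp(-x))

def hard_sigmoid(x):
    """Piecewise-linear sigmoid approximation: clip(x / 6 + 0.5, 0, 1).
    Nearly linear around zero and cheaper than the exponential form."""
    return np.clip(x / 6.0 + 0.5, 0.0, 1.0)

xs = np.linspace(-6.0, 6.0, 7)
print(np.round(hard_sigmoid(xs), 3))
print(np.round(sigmoid(xs), 3))
```

Because the piecewise-linear form saturates exactly at 0 and 1 and avoids the exponential, it reduces the per-weight computation while keeping a monotone nonlinear mapping of the attention weights.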
3.2.1 SE attention module.
The SE attention module [30,31] adapts the channel weights to help the network focus on important attributes including color, texture, and form, reducing background interference and improving disease classification accuracy.
Fig 6 shows the principle of the SE attention mechanism, which consists of two operations: Squeeze and Excitation.
Squeeze: The global average pooling (GAP) operation is applied to compress the feature map of each channel into a scalar. Specifically, for a given input feature map X ∈ R^(H×W×C) (where H and W are the height and width of the feature map, and C is the number of channels), the global average pooling operation computes the mean of each channel, resulting in a vector z of length C:

z_c = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} X_c(i, j)

Excitation: This operation learns the importance of each channel through a simple fully connected network, typically a two-layer small MLP. First, the compressed feature vector z is passed through a fully connected layer with a ReLU activation δ; a second fully connected layer with a Sigmoid activation σ then outputs the weight for each channel. Finally, the learned channel weights s are multiplied by the original feature map X to achieve channel recalibration:

s = σ(W_2 δ(W_1 z)),    X̃_c = s_c · X_c

Here z_c denotes the global average feature of channel c; H and W denote the feature map’s height and width; C is the channel count; X is the input feature map; W_1 and W_2 are the weights of the two fully connected layers; σ is the Sigmoid activation function; and s is the channel weight vector.

In this way, the SE module adaptively assigns different weights to different channels, enhancing critical feature representations while suppressing less informative ones. (The above formulation follows Hu et al., 2018.)
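The squeeze and excitation steps can be sketched directly in NumPy. This is a minimal illustration on an (N, H, W, C) feature map; the weights here are random placeholders rather than trained parameters:

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation (Hu et al., 2018): global average pool ->
    FC + ReLU -> FC + sigmoid -> per-channel rescaling.
    w1: (C, C // r), w2: (C // r, C) for reduction ratio r."""
    z = x.mean(axis=(1, 2))                      # squeeze: (N, C)
    h = np.maximum(z @ w1, 0.0)                  # FC + ReLU (delta)
    s = 1.0 / (1.0 + np.exp(-(h @ w2)))          # FC + sigmoid: (N, C)
    return x * s[:, None, None, :]               # channel recalibration

rng = np.random.default_rng(0)
x = rng.random((1, 8, 8, 16)).astype(np.float32)
w1 = rng.standard_normal((16, 4)).astype(np.float32)   # reduction r = 4
w2 = rng.standard_normal((4, 16)).astype(np.float32)
y = se_block(x, w1, w2)
print(y.shape)  # (1, 8, 8, 16)
```

Since the sigmoid weights lie in (0, 1), each output channel is a damped copy of the input channel, which is exactly the recalibration behavior described above.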
3.2.2 SGE attention mechanism module.
The Spatial Group-wise Enhance (SGE) module enhances the model’s ability to represent image features by performing spatial grouping and local enhancement on the feature map [32].
As illustrated in Fig 7, the input tensor is divided into 𝐺 groups along the channel dimension, where each group contains 𝐶/𝐺 channels. This grouping helps to reduce computational complexity and introduces spatial attention to each group, allowing the model to capture local information at different positions and fine-tune the extraction of fine-grained features in disease regions.
For each group, average pooling is performed along the channel dimension to obtain the spatial response map, which is then normalized using the softmax function to generate spatial attention weights. The recalibrated group feature map is computed as:

X̃_g = a_g ⊙ X_g

where a_g denotes the spatial attention weights of the g-th group obtained through softmax normalization, ⊙ represents element-wise multiplication, and X̃_g represents the recalibrated feature of the g-th group.
Through this mechanism, SGE amplifies important regional features and suppresses irrelevant ones, thereby enhancing the fine-grained feature extraction capability of the model. (The above formulation follows Li et al., 2019.).
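The group-wise spatial attention described in the text can be sketched as below. Note this follows the simplified description above (channel-wise average per group followed by a spatial softmax); the full SGE module of Li et al., 2019 additionally normalizes against a global group descriptor:

```python
import numpy as np

def sge_like(x, groups):
    """Group-wise spatial attention, simplified: per group, average over
    channels to get a spatial response map, softmax-normalize it over
    spatial positions, and rescale the group's features."""
    n, h, w, c = x.shape
    xg = x.reshape(n, h, w, groups, c // groups)
    resp = xg.mean(axis=-1)                                # (N, H, W, G)
    flat = resp.reshape(n, h * w, groups)
    e = np.exp(flat - flat.max(axis=1, keepdims=True))     # stable softmax
    attn = (e / e.sum(axis=1, keepdims=True)).reshape(n, h, w, groups)
    return (xg * attn[..., None]).reshape(n, h, w, c)

rng = np.random.default_rng(0)
x = rng.random((1, 4, 4, 8)).astype(np.float32)
y = sge_like(x, groups=2)
print(y.shape)  # (1, 4, 4, 8)
```

Because the softmax runs over spatial positions within each group, positions with stronger responses (e.g., lesion regions) receive proportionally larger weights while flat background regions are suppressed.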
3.3 Training performance enhancement function
Traditional fully convolutional neural networks typically use the ReLU activation function. However, in corn leaf disease recognition, the disease features often occupy relatively small regions in the image. Using the ELU activation function can effectively accelerate model convergence and improve noise resistance, thus improving the model’s capacity to capture subtle disease characteristics [33]. Compared to ReLU, the ELU function’s smooth transition in the negative region allows more stable gradient propagation and prevents ReLU’s zero-gradient problem for negative inputs, leading to faster training, better convergence, and improved model accuracy and generalization ability [34].
The graph of the ELU activation function and its derivative is shown in Fig 8. The calculation is defined as:

f(x) = x,            if x > 0
f(x) = α(e^x − 1),   if x ≤ 0

and its derivative is expressed as:

f′(x) = 1,        if x > 0
f′(x) = α e^x,    if x ≤ 0

where x is the input, usually a weighted sum of neuron activations, and α is a hyperparameter that controls the slope of the negative part, usually set to 1. (The above formulation follows Clevert et al., 2015.)
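A minimal NumPy sketch of ELU and its derivative (following Clevert et al., 2015, with α = 1):

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: identity for x > 0, alpha * (exp(x) - 1) for x <= 0,
    giving a smooth negative saturation instead of ReLU's hard zero."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def elu_grad(x, alpha=1.0):
    """Derivative: 1 for x > 0, alpha * exp(x) for x <= 0, so the
    gradient never collapses to exactly zero for negative inputs."""
    return np.where(x > 0, 1.0, alpha * np.exp(x))

xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(np.round(elu(xs), 4))       # values ~ [-0.8647, -0.3935, 0, 0.5, 2]
print(np.round(elu_grad(xs), 4))  # values ~ [0.1353, 0.6065, 1, 1, 1]
```

The small but nonzero negative-side gradient is what allows the "small negative gradient inflow" mentioned in the abstract, in contrast to ReLU's dead zone.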
3.4 Model parameter optimization methods
Due to their large number of parameters, existing models often suffer from high computational and memory demands, which makes efficient deployment on resource-limited mobile devices difficult [35–37]. To address the issue of high computational and memory costs associated with existing models, we employ model layer pruning in this paper to remove redundant parameters. This reduction in model complexity decreases the computational load and memory usage during inference, ultimately achieving lightweight performance for improved efficiency on mobile devices. Additionally, layer pruning reduces overfitting and improves the model’s generalization performance, thus maintaining performance while significantly improving operational efficiency.
In this paper, the basic ShuffleNetV2 network consists of three stages, Stage2 through Stage4 (see Table 1). Stage2 and Stage4 each consist of one down-sampling unit followed by three basic units, whereas Stage3 includes one down-sampling unit and seven basic units. Layer pruning is applied to remove redundant intermediate layers, resulting in a more lightweight model, as shown in Fig 9. In the final design, Stage2 and Stage4 each keep one down-sampling unit and one basic unit, while Stage3 preserves one down-sampling unit along with two basic units. This modification significantly reduces computational complexity while preserving model performance.
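The pruning described above can be summarized as a change in the per-stage repeat counts (basic units only; each stage always keeps its one down-sampling unit):

```python
# Basic-unit counts per stage in the baseline ShuffleNetV2 and after
# the layer pruning described in the text.
baseline = {"stage2": 3, "stage3": 7, "stage4": 3}
pruned   = {"stage2": 1, "stage3": 2, "stage4": 1}

removed = {s: baseline[s] - pruned[s] for s in baseline}
print(removed)  # {'stage2': 2, 'stage3': 5, 'stage4': 2}
```

Nine of the thirteen basic units are removed, which is consistent with the roughly 30% reductions in parameters and FLOPs reported in the abstract once the unpruned stem and head layers are accounted for.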
3.5 Model recognition flowchart
The specific implementation steps of the ES-ShuffleNetV2 disease recognition model developed for corn diseases are shown in Fig 10. The corresponding pseudocode of this implementation is summarized in Table 2.
Step 1: Standardize the size of the input corn disease images to 256x256 pixels, with the input tensor shape of [256, 256, 3], and normalize the pixel values in each channel.
Step 2: Extract preliminary features through a Conv2D layer with a kernel size of 3x3, a stride of 2, and 24 output channels, using the ReLU activation function to introduce non-linear feature expression capabilities.
Step 3: Apply the SGSE module to enhance the feature map with spatial and channel attention. The module uses global average pooling, channel weight computation, weighted operations, and spatial group enhancement to focus on key features. The results are fused with the convolution output to improve feature representation.
Step 4: For spatial down-sampling, use a MaxPooling2D layer with a 3x3 pooling window and a stride of 2, preserving significant features while reducing computational load.
Step 5: In the core feature extraction phase, the model further extracts and enhances features through multiple v2_block modules. In each module, the input feature map is divided into two branches, where depthwise separable convolutions are applied to extract different spatial features. The channels are restored using 1x1 convolutions, and the ELU activation function enhances the model’s non-linear expression. The features from both branches are then concatenated and channel-shuffled to improve inter-channel feature interaction, progressively extracting richer feature representations.
Step 6: After feature extraction, compress the features using a Conv2D layer. When the scale factor is 2, the output channel count is 2048; otherwise, it is 1024. The kernel size is 1x1, with a stride of 1.
Step 7: Use GlobalAveragePooling2D to convert the feature map into a one-dimensional feature vector, extract global semantic features, and map them to the 6 corn disease categories using a fully connected layer. The output is a probability vector of shape [6].
Step 8: Apply the softmax activation function to convert the output into category probabilities, indicating the likelihood of the input image belonging to each category.
Step 9: The model predicts the disease category of the input image based on the probability vector, enabling automatic identification and classification of corn diseases.
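Steps 1 through 9 can be traced as a sequence of tensor shapes. This sketch tracks shapes only (no actual layers); the stage channel counts 116/232/464 are an assumption based on the standard ShuffleNetV2 1× configuration, since the text specifies only the stem (24 channels) and head (1024/2048 channels, 6 classes):

```python
def forward_shapes(input_hw=256, scale=1.0, num_classes=6):
    """Trace tensor shapes through the recognition pipeline of Steps 1-9.
    Stage channel counts assume the ShuffleNetV2 1x configuration."""
    h = input_hw // 2          # Step 2: 3x3 conv, stride 2, 24 channels
    shapes = [("conv1 + SGSE", (h, h, 24))]
    h = h // 2                 # Step 4: 3x3 max pool, stride 2
    shapes.append(("maxpool", (h, h, 24)))
    for stage, c in (("stage2", 116), ("stage3", 232), ("stage4", 464)):
        h = h // 2             # each stage begins with a down-sampling unit
        shapes.append((stage, (h, h, c)))
    c_last = 2048 if scale == 2 else 1024      # Step 6: 1x1 conv
    shapes.append(("conv5", (h, h, c_last)))
    shapes.append(("gap + fc + softmax", (num_classes,)))  # Steps 7-9
    return shapes

for name, shape in forward_shapes():
    print(f"{name:20s} {shape}")
```

Tracing shapes this way is a quick sanity check that the stride-2 stem, pooling, and three down-sampling units together reduce a 256 × 256 input to an 8 × 8 map before global average pooling.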
4 Experimental designs
4.1 Experimental platform
The test platform runs Windows 10 with an Intel(R) Core(TM) i5-10400F CPU, an NVIDIA GeForce GTX 1650 GPU, and 16GB of memory. The software environment includes the Anaconda IDE, Python 3.6.2, TensorFlow 1.14.0, and CUDA 10.0.
4.2 Evaluation metrics
In classification tasks, dataset imbalance is a common issue, where the accuracy comes from categories with many samples, often resulting in incorrect classification of minority class samples [38]. To thoroughly assess the performance of the proposed ES-ShuffleNetV2 model, we use Accuracy, Precision, Recall, and F1-Score as evaluation metrics. Their formulas are expressed as follows:
Accuracy represents the ratio of correctly classified samples to the total number of samples in a classification task:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision represents the proportion of samples predicted as positive by the classification model that are truly positive:

Precision = TP / (TP + FP)

Recall represents the proportion of actual positive samples in the test set that are correctly predicted by the classification model:

Recall = TP / (TP + FN)

The F1-Score is the harmonic mean of precision and recall, offering a balanced metric that accounts for both:

F1-Score = 2 × Precision × Recall / (Precision + Recall)

In these formulas, TP indicates the number of samples the model correctly predicts as positive, TN denotes the number correctly predicted as negative, FP represents the number incorrectly predicted as positive, and FN refers to the number incorrectly predicted as negative.
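The four metrics follow directly from the confusion counts defined above; a small worked example with illustrative counts:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for one class, for illustration only.
acc, p, r, f1 = classification_metrics(tp=90, tn=85, fp=10, fn=15)
print(round(acc, 4), round(p, 4), round(r, 4), round(f1, 4))
# 0.875 0.9 0.8571 0.878
```

For a six-class task such as this one, these would typically be computed per class and then macro-averaged, which is why F1 complements plain accuracy on the imbalanced classes noted above.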
Additionally, during model training, the loss value, FLOPs, inference time, and parameter count are also key performance evaluation metrics. A faster decrease in the loss value indicates quicker convergence and a lower value suggests better robustness and performance. FLOPs affect the model’s computational complexity, inference speed, and power consumption. Models with high FLOPs require more computational resources and consume more energy. Reducing FLOPs improves the model’s efficiency in resource-constrained environments, such as mobile devices. Inference time denotes the average processing time required per image during model inference, serving as an indicator of the model’s real-time performance and deployment efficiency. The parameter count determines the model’s storage requirements, training time, and computational resources. Models with more parameters have more substantial expressive capabilities but are more prone to overfitting. Reducing the parameter count can lower resource consumption and enhance deployment efficiency on resource-constrained devices.
4.3 Training parameters settings
The parameter settings for the model training process are provided in Table 3.
5 Results and analysis
5.1 Comparative tests on different parameters
This study investigates the impact of different batch sizes (4, 8, 16, 32) and learning rates (0.1, 0.01, 0.001) on the model’s disease detection performance. A total of twelve experiments were performed, with the training and testing outcomes for each group summarized in Table 4. The results showed that both batch size and learning rate significantly impacted the model’s performance during training and testing. Furthermore, to ensure the robustness and reliability of the experimental results, the table reports the average performance over repeated runs for each configuration, minimizing the impact of random fluctuations and strengthening the statistical validity of the findings.
5.1.1 Impact of batch-size on model.
In deep learning, batch size denotes the number of training samples processed in each forward and backward pass, and it significantly affects the training process. Smaller batch sizes result in longer training times and more fluctuation in the curves, whereas larger batch sizes lead to smoother training but may cause overfitting, especially when the training set has insufficient samples or low data diversity, which can impair the model’s ability to generalize. The experimental results in Fig 11 show that batch size influences the convergence speed and stability of model training. With an initial learning rate of 0.01, increasing the batch size from 4 to 32 accelerates the convergence of the accuracy and loss values on the validation set but increases oscillation, leading to greater training fluctuation and reduced generalization ability. At a batch size of 16, the model achieves the best accuracy and convergence stability. Therefore, a batch size of 16 was selected for training the model on the dataset.
5.1.2 Impact of learning rates on the model.
The learning rate sets the step size for parameter updates in each iteration, serving as a scaling factor for weight changes during backpropagation. A smaller learning rate results in slower model updates, extended convergence time, and a gradual decrease in the loss function, significantly increasing training duration. Conversely, an excessively large learning rate accelerates convergence but may overshoot the optimal point or fail to converge, causing instability in the training process and oscillations in the loss curve. Experiments were carried out to assess how the learning rate affects model performance, with values set at 0.1, 0.01, and 0.001, while keeping all other parameters unchanged. Fig 12 illustrates the variation curves of validation accuracy and loss under different learning rates when the batch size was set to 16. The results indicate that with a learning rate of 0.1, the loss values oscillated significantly and the accuracy was low; with a learning rate of 0.001, the model exhibited overfitting; with a learning rate of 0.01, training achieved the best results, characterized by high accuracy and stable curves.
a: Val accuracy; b: Val loss.
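The qualitative behavior described above can be reproduced on a toy one-dimensional objective: a step that is too large oscillates without progress, a moderate step converges quickly, and a tiny step converges slowly. The example is purely illustrative and unrelated to the actual network:

```python
def gradient_descent(lr, steps=100, x0=1.0):
    """Minimize f(x) = 10*x**2 with a fixed learning rate; return final |x|."""
    x = x0
    for _ in range(steps):
        x -= lr * 20 * x  # gradient of 10*x**2 is 20*x
    return abs(x)

# lr=0.1:  update factor (1 - 0.1*20) = -1, so |x| never shrinks (pure oscillation)
# lr=0.01: update factor 0.8, fast and stable convergence
# lr=0.001: update factor 0.98, convergence, but much slower
```

The same trade-off drives the choice of 0.01 over 0.1 and 0.001 in the experiments above, although a real loss landscape is far less well-behaved than this quadratic.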
5.2 Corn leaf disease recognition experiment
Table 5 presents the specific evaluation metrics for the ES-ShuffleNetV2 model’s recognition experiments. The model achieved an average precision of 97.07% in disease recognition, demonstrating its capability for rapid and accurate identification of corn diseases, thereby meeting the expected objectives of the model improvement. Despite the potential influence of class imbalance on model performance, the consistently high values of Precision, Recall, and F1-Score observed across all categories collectively indicate that the model was not significantly compromised in this case.
5.3 Comparison of attention mechanism modules
The SGSE module, proposed as an enhancement to the SE module, improves spatial and channel attention for corn disease feature maps. Likewise, the CBAM (Convolutional Block Attention Module) combines attention mechanisms across both channel and spatial dimensions [39]. To evaluate the effectiveness of the SGSE module, comparative experiments were conducted under identical conditions by incorporating various attention modules into the model. As presented in Table 6, the results demonstrate that the enhanced SGSE module surpasses other attention mechanisms in most metrics—such as Precision, Recall, F1-Score, Training Accuracy, and Testing Accuracy—with only slight differences in a few measures. Moreover, the SGSE module achieves this enhanced performance while maintaining comparable model complexity and computational cost relative to SE and CBAM, demonstrating that the improvement in accuracy is not achieved at the expense of significantly increased parameters or FLOPs. Fig 13 illustrates the validation accuracy and loss curves corresponding to various attention mechanisms. In comparison with SE and CBAM, which exhibit significant oscillations in their curves, the SGSE module demonstrates a smoother trend and higher test accuracy, reflecting superior training stability and generalization performance.
a: Val accuracy; b: Val loss.
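The squeeze-and-excitation idea that both SE and the improved SGSE build on can be sketched in a few lines. The block below is a generic SE-style channel-attention module in NumPy, not the authors' SGSE implementation; the weight shapes and reduction ratio are illustrative:

```python
import numpy as np

def se_block(x, w1, w2):
    """Generic squeeze-and-excitation over a feature map x of shape (C, H, W)."""
    s = x.mean(axis=(1, 2))                  # squeeze: global average pool -> (C,)
    z = np.maximum(w1 @ s, 0)                # excitation: FC + ReLU -> (C//r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))   # FC + sigmoid -> per-channel gate in (0, 1)
    return x * gate[:, None, None]           # rescale each channel by its gate

rng = np.random.default_rng(0)
c, r = 8, 4                                  # channels and reduction ratio (illustrative)
x = rng.standard_normal((c, 6, 6))
w1 = rng.standard_normal((c // r, c))
w2 = rng.standard_normal((c, c // r))
y = se_block(x, w1, w2)
```

The SGSE variant described in the paper additionally incorporates spatial group-wise information before the channel gating; the code above shows only the shared channel-attention skeleton.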
5.4 Comparison of the model before and after pruning
Under the same experimental conditions, this study conducted comparative experiments on both the baseline and the improved models before and after pruning, as shown in Table 7. The results demonstrate that, compared to the unpruned model, the pruned ES-ShuffleNetV2 disease recognition model experienced a 0.93% decrease in training accuracy, while its Precision, Recall, and F1-Score improved by 0.67%, 0.73%, and 0.70%, respectively. Regarding computational efficiency, the number of FLOPs decreased from 10,200,613 to 7,112,563, a reduction of 30.27%, and the number of parameters decreased from 1,272,060 to 884,388, a reduction of 30.48%, thereby satisfying the lightweight requirements for deployment on resource-constrained devices. The baseline model likewise showed significant reductions in computational complexity after pruning while achieving slight improvements in accuracy, which further confirms the effectiveness of the pruning strategy. These findings validate the effectiveness and necessity of applying pruning to the proposed model.
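Although the paper prunes at the layer level, the underlying idea of discarding low-importance parameters can be illustrated with a simpler magnitude-based scheme. The toy sketch below zeroes the smallest 30% of weights by absolute value; this is unstructured pruning, named plainly as a stand-in for the authors' layer pruning:

```python
def magnitude_prune(weights, ratio=0.3):
    """Zero out the fraction `ratio` of weights with the smallest magnitude."""
    k = int(len(weights) * ratio)
    if k == 0:
        return list(weights)
    # Threshold at the k-th smallest absolute value (ties may prune slightly more)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

pruned = magnitude_prune(
    [0.5, -0.1, 0.05, 2.0, -0.02, 0.8, -0.3, 0.15, 1.2, -0.04], ratio=0.3
)
```

Structured approaches such as layer or channel pruning remove whole computational units instead of individual weights, which is what actually yields the FLOP and parameter reductions reported in Table 7.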
5.5 Contrast experiments of different activation functions
To investigate how various activation functions affect model performance, a comparative experiment was conducted utilizing four widely adopted activation functions: Mish, Swish, ReLU, and ELU [40,41]. As presented in Table 8, all activation functions demonstrated robust performance across various metrics. Notably, ELU attained the top values in Precision, Recall, and F1-Score, with each metric reaching 97.07%. Furthermore, ELU exhibited superior inference efficiency, achieving an average inference time of 9.31 ms per image, surpassing Mish, ReLU, and Swish. These findings indicate that ELU is more appropriate as the activation function for this model when considering both accuracy and real-time performance.
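The ELU function compared here is defined as f(x) = x for x > 0 and α(eˣ − 1) otherwise, which admits small negative outputs and saturates at −α for large negative inputs; a direct implementation with the common default α = 1.0:

```python
import math

def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, saturating to -alpha for negative inputs."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)
```

Unlike ReLU, which outputs exactly zero for all negative inputs, ELU keeps a small negative gradient flowing, which is the smoother-propagation property the model modification relies on.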
5.6 Ablation experiment
Ablation experiments were designed to demonstrate the impact of each improvement on the model. The outcomes of these ablation studies are summarized in Table 9. Based on ShuffleNet V2, the network model proposed in this paper is ES-ShuffleNet V2. The network obtained by modifying the activation function of ShuffleNet V2 to ELU is called E-ShuffleNet V2. The network obtained by adding the improved SGSE attention mechanism to the ShuffleNet V2 network is called S-ShuffleNet V2.
The comparison between the training results of ShuffleNet V2 and ES-ShuffleNet V2 indicates that the recognition accuracy of the ES-ShuffleNetV2 model improved by 1.64%, while the testing accuracy, recall, F1-Score, and precision improved by 1.64%, 1.64%, 1.62%, and 1.60%, respectively. Furthermore, the comparison between E-ShuffleNet V2 and ShuffleNet V2 demonstrates that the ELU activation function offers stronger nonlinear representational capabilities than the ReLU function, resulting in a 0.37% increase in recognition accuracy. In addition, the comparison between S-ShuffleNet V2 and ShuffleNet V2 indicates that the integration of the improved SGSE attention mechanism effectively enhances the model’s feature representation ability, leading to a 0.55% increase in recognition accuracy. Based on these findings, this study adopts the ES-ShuffleNet V2 model for corn disease detection.
5.7 Confusion matrix
The confusion matrix is a tool for assessing the performance of a classification model, as shown in Fig 14. It is a two-dimensional structure that visually represents the model’s prediction results, allowing for a comprehensive assessment of accuracy, precision, recall, and F1-Score. Rows correspond to the true categories, while columns correspond to the categories predicted by the model. From the confusion matrix, it can be observed that the model achieves high accuracy for Common rust, Bipolaris maydis, Curvularia lunata (Wakker) Boed. spot, Sheath blight, and Northern leaf blight, with accuracy reaching 100%. Although the accuracy for Own spot is slightly lower due to considerable feature variations among disease samples and complex backgrounds, the overall precision remains nearly 100%.
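A confusion matrix like the one in Fig 14 can be assembled directly from paired lists of true and predicted labels; the class indices in this minimal sketch are illustrative:

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Build a confusion matrix: rows are true classes, columns are predictions."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

# Five samples over three classes; one sample of class 0 is misclassified as class 1
cm = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], n_classes=3)
```

Per-class recall is then each diagonal entry divided by its row sum, and per-class precision is the diagonal entry divided by its column sum, which is how the percentages discussed above are read off the matrix.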
5.8 Cross dataset validation
To further evaluate the generalization capability of the proposed ES-ShuffleNet V2 model under different data distributions, a cross-dataset validation experiment was conducted. The validation dataset was derived from PlantVillage, from which four representative categories were selected, with 500 images per category: a. Cercospora leaf spot, b. Common rust, c. Healthy, and d. Northern leaf blight.
The experimental results are shown in Fig 15 and Table 10. The confusion matrices demonstrate that ES-ShuffleNet V2 exhibits clearer diagonal patterns and fewer misclassifications, indicating superior cross-dataset recognition performance. Among all models, ES-ShuffleNet V2 achieved the best results, significantly outperforming MobileNetV2, ShuffleNet V2, and GoogleNet. Further analysis revealed that most misclassifications occurred between Cercospora leaf spot (a) and Northern leaf blight (d), which is consistent with real-world scenarios where these two diseases exhibit highly similar visual symptoms. In contrast, GoogleNet performed the worst, with an Accuracy of only 81.75%, and showed especially high misclassification rates for classes a and d.
a: Cercospora leaf spot; b: Common rust; c: Healthy, d: Northern leaf blight.
Overall, these findings demonstrate that ES-ShuffleNet V2 not only performs well on the self-constructed dataset but also maintains high recognition accuracy when applied to datasets with different distributions, confirming its practicality and strong generalization ability for real-world agricultural applications.
5.9 Robustness verification
To ensure that the high recognition accuracy of the proposed model was not the result of overfitting or memorization, a robustness evaluation was performed under three challenging conditions: Gaussian noise with a standard deviation of 10, Gaussian blur with a kernel size of 5 × 5, and illumination variation achieved by increasing brightness and contrast by approximately 20%.
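Two of the three perturbations reduce to simple array operations. The sketch below adds Gaussian noise with σ = 10 and applies an approximate +20% brightness and contrast shift; this is illustrative NumPy code under our own parameter choices, and a true Gaussian blur (the third condition) would additionally require a convolution kernel, so it is omitted:

```python
import numpy as np

def add_gaussian_noise(img, sigma=10.0, seed=0):
    """Additive Gaussian noise, clipped back to the valid 8-bit range."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def adjust_brightness_contrast(img, alpha=1.2, beta=0.2 * 255):
    """Roughly +20% contrast (gain alpha) and +20% brightness (offset beta)."""
    out = alpha * img.astype(np.float64) + beta
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.full((4, 4), 128, dtype=np.uint8)  # a flat mid-gray test patch
noisy = add_gaussian_noise(img, sigma=10.0)
bright = adjust_brightness_contrast(img)
```

Applying such transforms only at evaluation time, as done here, probes whether the model relies on fragile pixel-level cues or on more robust disease features.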
The results are summarized in Table 11. Under standard conditions, the proposed ES-ShuffleNet V2 achieved an average accuracy of 97.07%, exceeding that of the baseline ShuffleNet V2 95.43%. When Gaussian noise was introduced, the accuracy of ES-ShuffleNet V2 decreased slightly to 95.14%, while the baseline dropped to 93.97%, indicating stronger resilience to random perturbations. Under Gaussian blur, ES-ShuffleNet V2 maintained 94.52% accuracy, outperforming the baseline by 1.1%. The most significant performance gap appeared under brightness variation: ES-ShuffleNet V2 achieved 88.48% accuracy compared with 81.17% for the baseline, a 7.31% improvement. These results demonstrate that the proposed model maintains high recognition accuracy even under challenging imaging conditions, showing stronger robustness and generalization ability than the baseline.
5.10 Model performance comparison experiment
The proposed ES-ShuffleNet V2 model is compared with the MobileNetV2, GoogleNet, AlexNet, and InceptionV3 models [42–45]. After 60 iterations, the final evaluation metrics for each model are shown in Fig 16. The ES-ShuffleNet V2 model, improved based on ShuffleNet V2, achieves higher classification accuracy, a lower parameter count, and superior overall performance.
5.11 Discussion
The ES-ShuffleNetV2 model proposed in this study demonstrated strong performance in corn leaf disease recognition, achieving an average recognition accuracy of 97.07%, thus enabling accurate identification of corn diseases. By incorporating the SGSE module to enhance the model’s feature representation capability, the model outperformed other attention mechanisms in terms of Precision, Recall, and F1-Score, confirming the effectiveness of the proposed module. However, the current scope of disease categories is limited to six common corn diseases, which may restrict the model’s applicability in more complex and diverse agricultural scenarios. Comparative experiments on activation functions revealed that the ELU function provided superior nonlinear representational capability compared to ReLU and other commonly used functions. This improvement not only enhanced recognition accuracy but also accelerated detection speed, confirming the advantage of ELU in this task. Furthermore, a pruning strategy was applied to reduce the model’s computational complexity and parameter count; as a result, FLOPs and parameters of ES-ShuffleNetV2 were reduced by 30.26% and 30.45%, respectively, compared to the baseline model, achieving both model simplification and performance enhancement. These results highlight the effectiveness of pruning in lightweight model design.
Moreover, to eliminate potential data leakage, the dataset was re-divided at the raw image level, and data augmentation was applied only to the training set. After this correction, the overall accuracy slightly decreased, indicating that the new test set provides a stricter and more realistic evaluation. Nevertheless, the model maintained strong generalization performance under this more rigorous setup. Importantly, cross-dataset validation demonstrated that ES-ShuffleNetV2 maintains high recognition accuracy across datasets with different distributions. This finding indicates that the model possesses strong generalization capability, further supporting its practical applicability in real-world agricultural scenarios. Although the accuracy of both models decreased under noisy, blurred, and uneven lighting conditions, this degradation is reasonable because such perturbations distort key visual details and reduce image clarity. Even so, the smaller drop observed in ES-ShuffleNetV2 suggests that the SGSE module improves the network’s ability to capture meaningful spatial information, while the ELU activation function helps maintain more stable responses when brightness changes. These results indicate that the proposed improvements not only enhance accuracy under normal conditions but also provide better robustness to real-world imaging variations.
In addition, batch size and learning rate were found to have a significant impact on the model’s convergence speed and generalization performance. Based on accuracy and convergence stability, a batch size of 16 and a learning rate of 0.01 were selected as the optimal training settings. While the recognition accuracy for certain diseases approached 100%, the accuracy for Own Spot disease was slightly lower due to variations in sample characteristics and complex image backgrounds.
Future work will focus on several aspects to further enhance the applicability and impact of the ES-ShuffleNetV2 model. One direction involves extending disease recognition to additional crop species and verifying the model’s generalization through cross-regional and cross-seasonal datasets. Another focus will be on incorporating multi-modal information—such as hyperspectral data and environmental factors—to improve the system’s robustness under diverse field conditions. Moreover, lightweight optimization techniques, including model quantization and knowledge distillation, will be investigated to support deployment on mobile or embedded devices for real-time disease monitoring. Finally, model interpretability will be further explored using Grad-CAM and related visualization methods, so that the model’s decision process becomes more transparent and useful for agricultural practitioners. These efforts together are expected to extend the current study and increase its practical value in precision agriculture.
6 Conclusion
This paper proposes an improved corn leaf disease recognition model, ES-ShuffleNetV2, based on ShuffleNetV2, intended to overcome the limitations of existing models in resource-limited settings, including high computational demands and constrained recognition accuracy. By introducing the improved SGSE attention module and replacing the activation function with ELU, the model’s feature extraction capability and training efficiency were significantly enhanced. In addition, a pruning strategy was employed to reduce model complexity, achieving accurate disease recognition with lower computational resource requirements. The ES-ShuffleNetV2 model was validated through corn leaf disease recognition experiments and achieved an average accuracy of 97.07%. It maintained high recognition accuracy across all six corn disease categories. Compared with other attention mechanisms, the SGSE module delivered the best performance in terms of accuracy. Comparative experiments on activation functions demonstrated that the ELU function achieved both the highest accuracy and the shortest inference time, with a single-image inference time of only 9.31 ms. Ablation experiments confirmed the effectiveness of each structural improvement, among which the SGSE module had the most notable impact. The ES-ShuffleNetV2 model achieved a test accuracy of 97.07% in the ablation study, representing a 1.64% improvement over the baseline. It should be emphasized that cross-dataset and robustness evaluations demonstrate that ES-ShuffleNetV2 maintains high recognition accuracy across different data distributions, underscoring its strong generalization ability and practical applicability in real-world agricultural disease monitoring. Future research will aim to apply the model to a broader spectrum of crop diseases and to further optimize feature extraction strategies to enhance robustness and scalability.
Acknowledgments
The authors sincerely thank the Anhui Academy of Agricultural Sciences for their assistance in preparing the data used in this study.
References
- 1. Ai Y, Jane J-L. Macronutrients in corn and human nutrition. Compr Rev Food Sci Food Saf. 2016;15(3):581–98. pmid:33401819
- 2. Nkuna BL, Chirima JG, Newete SW, Van der Walt AJ, Nyamugama A. Unraveling the relationship between soil nutrients and maize leaf disease occurrences in Mopani District Municipality, Limpopo Province, South Africa. Agronomy. 2024;14(10):2237.
- 3. Zhao D. Cognitive process and information processing model based on deep learning algorithms. Neural Netw. 2025;183.
- 4. Wu W, Ma X, Wang Q, Gong M, Gao Q. Learning deep representation and discriminative features for clustering of multi-layer networks. Neural Netw. 2024;170:405–16. pmid:38029721
- 5. Hinton GE, Osindero S, Teh Y-W. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527–54. pmid:16764513
- 6. Wu Y, Li R, Ji R. Maize disease recognition and application based on random augmentation swin-tiny transformer. J Jilin University (Science Edition). 2024;62 (02):381–90.
- 7. Yang LUO, Zifen HE, Yinhui Z, et al. Crop leaf grade and disease recognition method based on backbone information sharing and multi-receptive field feature adaptive fusion. Trans Chinese Soc Agricul Mach. 2025;56(01):377–87.
- 8. Liang X, Gao S, Liu Z. Exploration of identifying apple leaf diseases using lightweight convolutional neural network model. China Plant Prot. 2024;44(04):41–9.
- 9. Guo-feng Y, Yong Y, Zi-kang HE. A rapid, low-cost deep learning system to classify strawberry disease based on cloud service. J Integ Agricul. 2022;21(02):460–73.
- 10. Ahmad Loti NN, Mohd Noor MR, Chang S-W. Integrated analysis of machine learning and deep learning in chili pest and disease identification. J Sci Food Agric. 2021;101(9):3582–94. pmid:33275806
- 11. Hossein A, Roshannia FH, Reza MMS. Diagnosis of some apple fruit diseases by using image processing and artificial neural network. Food Control. 2023;145.
- 12. Huang J, Lu X, Chen L, Sun H, Wang S, Fang G. Accurate identification of pine wood nematode disease with a deep convolution neural network. Rem Sens. 2022;14(4):913.
- 13. Chen Y, Xuemei W, Zhen Z, et al. Method for identifying tea diseases in natural environment using improved YOLOv5s model. Trans Chinese Soc Agricu Eng. 2023;39(24):185–94.
- 14. Lyuwen H, Feifan G, Bo Q, et al. Recognizing tea diseases with fusion of 2D DWT and MobileNetV3. Trans Chinese Soc Agricu Eng. 2023;39(24):207–14.
- 15. Wen-bin S, Rong W, Rong-hua G. Crop disease recognition based on visible spectrum and improved attention module. Spect Spect Anal. 2022;42 (05):72–1580.
- 16. Bondre S, Patil D. Crop disease identification segmentation algorithm based on Mask‐RCNN. Agron J. 2023;116(3):1088–98.
- 17. Ahila Priyadharshini R, Arivazhagan S, Arun M, Mirnalini A. Maize leaf disease classification using deep convolutional neural networks. Neural Comput Applic. 2019;31(12):8887–95.
- 18. Karlekar A, Seal A. SoyNet: soybean leaf diseases classification. Comp Electr Agricu. 2020;172:105342.
- 19. Jing J, Li S, Qiao C, Li K, Zhu X, Zhang L. A tomato disease identification method based on leaf image automatic labeling algorithm and improved YOLOv5 model. J Sci Food Agric. 2023;103(14):7070–82. pmid:37326973
- 20. Tang L, Yi J, Li X. Improved multi-scale inverse bottleneck residual network based on triplet parallel attention for apple leaf disease identification. J Integ Agricu. 2024;23(3):901–22.
- 21. Deng J, Huang W, Zhou G, Hu Y, Li L, Wang Y. Identification of banana leaf disease based on KVA and GR-ARNet. J Integrative Agri. 2024;23(10):3554–75.
- 22. Liu Z, Lin Y, Cao Y, et al. Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 10012–22.
- 23. Parveen R, Chethan D, Durga Y, Biswas M, Debnath S. Assessing the status of banded leaf and sheath blight disease of maize (Zea mays L.) caused by Rhizoctonia solani f. sp. sasakii in West Bengal, India. Int J Plant Soil Sci. 2024;36(12):488–95.
- 24. Iandola FN, Moskewicz MW, Ashraf K, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. CoRR. 2016.
- 25. Ramirez‐Cabral NYZ, Kumar L, Shabani F. Global risk levels for corn rusts (Puccinia sorghi and Puccinia polysora) under climate change projections. J Phytopathol. 2017;165(9):563–74.
- 26. Ijaz B, Fan X. Understanding Northern corn leaf blight (NCLB) disease resistance in maize: past developments and future directions. Plant Stress. 2024;14:100625.
- 27. Ma N, Zhang X, Zheng H-T, Sun J. ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV). 2018.
- 28. Sheng K, Chen P. An efficient mixed attention module. IET Computer Vision. 2023;17(4):496–507.
- 29. Howard AG. MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861. 2017.
- 30. Zhang Z, Zhang T, Zhu X, Li J. SEHRNet: a lightweight, high‐resolution network for aircraft keypoint detection. IET Image Proces. 2024;18(9):2476–89.
- 31. Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 7132–41. https://doi.org/10.1109/cvpr.2018.00745
- 32. Li X, Hu X, Yang J. Spatial group-wise enhance: improving semantic feature learning in convolutional networks. CoRR. 2019.
- 33. Clevert D-A, Unterthiner T, Hochreiter S. Fast and accurate deep network learning by exponential linear units (ELUs). CoRR. 2015.
- 34. Matthew JM, Mustafa MBNM. Enhancement of hybrid deep neural network using activation function for EEG based emotion recognition. Traitement du Signal. 2024;41(4):1991–2002.
- 35. Duhan S, Gulia P, Gill NS, Yahya M, Yadav S, Hassan MM, et al. An analysis to investigate plant disease identification based on machine learning techniques. Expert Systems. 2024;41(8).
- 36. Poyatos J, Molina D, Martinez AD, Del Ser J, Herrera F. EvoPruneDeepTL: An evolutionary pruning model for transfer learning based deep neural networks. Neural Netw. 2023;158:59–82. pmid:36442374
- 37. Wang Y, Tao J, Gao H. Corn disease recognition based on attention mechanism network. Axioms. 2022;11(9):480.
- 38. Zhu S, Gao H. MC-ShuffleNetV2: a lightweight model for maize disease recognition. Egyptian Inform J. 2024;27:100503.
- 39. Woo S, Park J, Lee JY. CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV). 2018: 3–19.
- 40. Misra D. Mish: a self regularized non-monotonic neural activation function. CoRR. 2019.
- 41. Ramachandran P, Zoph B, Le QV. Searching for activation functions. arXiv preprint arXiv:1710.05941. 2017.
- 42. Szegedy C, Liu W, Jia Y. Going deeper with convolutions. CoRR. 2014.
- 43. Szegedy C, Vanhoucke V, Ioffe S. Rethinking the inception architecture for computer vision. CoRR. 2015.
- 44. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM. 2017;60(6):84–90.
- 45. Sandler M, Howard A, Zhu M. MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4510–20.