Abstract
Accurately recognizing rice seed varieties poses significant challenges due to their diverse morphological characteristics and complex classification requirements. Traditional image recognition methods often struggle with both accuracy and efficiency in this context. To address these limitations, this study proposes the Deep Space and Channel Residual Network with Double Attention Mechanism (RSCD-Net) to enhance the recognition accuracy of 36 rice seed varieties. The core innovation of RSCD-Net is the Space and Channel Feature Extraction Residual Block (SCR-Block), which improves inter-class differentiation while minimizing redundant features, thereby optimizing computational efficiency. The RSCD-Net architecture consists of 16 SCR-Blocks, structured into four convolutional stages with 3, 4, 6, and 3 units, respectively. Additionally, a Double Attention Mechanism (A2Net) is incorporated to enlarge the network's global receptive field, improving its capacity to distinguish subtle variations among seed types. Experimental results on a self-collected dataset demonstrate that RSCD-Net achieves an average accuracy of 81.94%, surpassing the baseline model by 4.16%. Compared with state-of-the-art models such as InceptionResNetV2, ConvNeXt, MobileNetV3, and Swin Transformer, RSCD-Net improves accuracy by 1.17%, 3%, 24.72%, and 13.22%, respectively, demonstrating its superior performance. These findings confirm that RSCD-Net provides an effective and efficient solution for rice seed classification, offering a promising reference for similar fine-grained recognition challenges in agricultural applications.
Citation: Zhang T, Zhang C, Yang Z, Wang M, Zhang F, Li D, et al. (2025) Multi-class rice seed recognition based on deep space and channel residual network combined with double attention mechanism. PLoS One 20(5): e0322699. https://doi.org/10.1371/journal.pone.0322699
Editor: Xiaoyong Sun, Shandong Agricultural University, CHINA
Received: November 25, 2024; Accepted: March 26, 2025; Published: May 16, 2025
Copyright: © 2025 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This work was supported by the National Natural Science Foundation of China (62062048) and the Yunnan Province Science and Technology Plan Project (202201AT070113). The funder provided financial support for this work and is the fourth author of this article, Meng WANG.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Rice is a crop of immense economic and nutritional significance, cultivated and consumed extensively around the globe. It is typically available in both fresh and processed forms, playing a crucial role in boosting farmers’ incomes and enhancing agricultural systems. Rich in essential nutrients such as carbohydrates, proteins, fats, and B vitamins, rice is vital for maintaining human health. Asia is the primary region for rice consumption, where it serves as a staple food, particularly in populous countries like India and China. Additionally, several countries in Africa, Latin America, and the Caribbean also incorporate rice into their diets. Over half of the countries worldwide regard rice as their staple food, underscoring its critical role in the global food supply [1]. In addition to being a food source, rice is significant for feed production, agricultural research, industrial raw materials, and the bioenergy sector. To enhance the quality and nutritional value of rice, researchers have developed thousands of varieties, including indica, japonica, and giant rice. Since the specific variety can greatly influence the quality and nutritional content, accurate identification is essential for seed producers, breeders, and consumers alike. Moreover, rice can be processed into various by-products, such as yellow wine and oils that are rich in unsaturated fatty acids. Given that the quality of rice seeds directly affects both yield and quality [2], precise identification and differentiation of these seeds are particularly important.
Traditional classification methods, such as manual sorting and biomolecular classification, are effective to a degree but encounter several challenges, including high costs, low efficiency, and the potential for seed damage. To overcome these issues, research has increasingly shifted toward machine vision and deep learning image recognition technologies. These advanced techniques significantly enhance both the efficiency and accuracy of seed classification by analyzing various characteristics, such as shape, color, chaff, size, and texture. Common classifiers employed for this purpose include Support Vector Machines (SVM), k-Nearest Neighbors, Artificial Neural Networks (ANN), Deep Neural Networks (DNN) [3], and Multi-column Back Propagation Neural Networks (BPNN) [4]. For example, Ruslan developed a method utilizing RGB image analysis and machine learning to successfully identify weedy rice among conventional rice seeds [5]. Similarly, Zhou and colleagues integrated Hyperspectral Imaging (HSI) with various machine learning techniques, including Support Vector Machines, Random Forests, and Gradient Boosting Classifiers, to predict beet seed germination [6]. While traditional image processing methods perform well in a variety of contexts, they often face limitations in classification accuracy and efficiency, especially when dealing with complex edge feature recognition tasks.
The integration of deep learning techniques has markedly advanced the field of image processing, particularly in crop classification. These techniques not only offer efficient and robust solutions but also excel at image recognition tasks, positioning them as a key focus in seed variety classification research. For example, the RiceNet system developed by Din N M U et al. [7] enhanced the accuracy of identifying five distinct rice varieties, achieving a prediction accuracy of 94%. This performance exceeds that of traditional machine learning methods, such as HOG-SVM and SIFT-SVM, as well as pre-trained deep learning models like InceptionV3 and InceptionResNetV2. Koklu M et al. analyzed 75,000 grain images using Artificial Neural Networks (ANN), Deep Neural Networks (DNN), and Convolutional Neural Networks (CNN) [8]. Their work resulted in a remarkable classification success rate of up to 100%, with CNN algorithms demonstrating particularly strong performance. Sharma A et al. introduced the iRSVPred model, which focuses on identifying ten major basmati rice varieties in India [9]. This model achieved perfect accuracy on both the training and internal validation sets, while recognition rates on external validation datasets reached or exceeded 80%. Lin P et al. compared traditional methods with Deep Convolutional Neural Networks (DCNN), finding that DCNN significantly improved the accuracy of rice grain image classification to 95.5% [10]. Additionally, Panmuang M et al. employed the VGG16 model for image screening of five rice varieties, attaining an accuracy of 85% [11]. Further research by Kiratiratanapruk K et al. indicated that utilizing the InceptionResNetV2 deep learning model for classifying 14 rice varieties resulted in an accuracy of 95.15%, outpacing traditional statistical learning methods [12]. Finally, Gilani G et al.
developed an 18-layer convolutional neural network specifically for classifying seven major rice varieties cultivated in Pakistan, achieving 100% classification accuracy for each seed type [13].
Three-dimensional information provides a richer and more accurate representation of appearance features compared to two-dimensional images. Qian et al. utilized three-dimensional point cloud data alongside deep learning networks to classify eight varieties of rice, achieving an average accuracy of 89.4%, which marks a 1.2% improvement over the PointNet model [14]. Meanwhile, He et al. developed a multimodal fusion detection method based on an enhanced voting approach. This method combines two-dimensional and three-dimensional prediction probabilities through weighted averaging, resulting in an impressive average accuracy of 97.4% [15].
Hyperspectral imaging systems offer both spatial and spectral information, significantly enhancing classification accuracy. Chatnuntawech I et al. integrated hyperspectral imaging with deep convolutional neural networks (DCNN) for rice seed classification, achieving an average accuracy of 91.09% [16]. Tang et al. developed a hyperspectral image classification model based on a multilayer perceptron (MLP) network and residual learning, successfully distinguishing three types of processed rice seeds with an impressive accuracy of 98.48% [17]. Jin et al. employed hyperspectral techniques alongside deep learning models such as LeNet, GoogleNet, and ResNet to classify ten varieties of rice seeds, with the ResNet model attaining the highest accuracy at 86.08% [18].
Research on the application of deep learning in rice seed classification and recognition remains limited, with few studies examining more than 15 rice varieties. In our study, we developed a dataset that encompasses 36 distinct rice seed varieties. Because rice classification is a fine-grained classification task [19], the characteristics of different varieties are often very similar, while significant differences may exist within the same variety, which adds complexity to the classification and recognition process. The outcomes from experiments using deep learning networks, such as ResNet, ConvNeXt, and InceptionResNetV2, as referenced in existing rice classification studies, have proven unsatisfactory when applied to our dataset. Additionally, while research that integrates hyperspectral imaging and 3D point cloud data may yield improved results, it also incurs higher costs. Moreover, this paper addresses another crucial challenge: many current studies depend on large quantities of training samples to train their networks. In contrast, our study seeks to effectively classify 36 different rice seed varieties using a more limited training set of only 240 samples per variety. Employing deep learning for the classification of crop seeds presents an efficient and straightforward approach to seed variety classification tasks. However, the exploration of deep learning applications specifically for rice variety classification remains underdeveloped. This study seeks to enhance the recognition accuracy of 36 rice seed varieties by leveraging deep learning technology and introducing a model titled the Deep Space and Channel Residual Network Combined with Double Attention Mechanism (RSCD-Net). The study commences with the design of a Space and Channel Feature Extraction Residual Block (SCR-Block), which aims to amplify inter-class distinctions while minimizing redundancy in feature information, thereby streamlining model computation.
The architecture comprises 16 residual modules, supplemented by batch normalization (BN) layers as well as max and average pooling layers. These residual modules are organized into four convolutional stages containing 3, 4, 6, and 3 units, respectively, to optimize feature extraction. By integrating a double attention mechanism known as Double Attention Networks (A2Net), the model's global receptive field is significantly enlarged, leading to improved recognition capabilities across seed varieties. In conclusion, this study proposes a deep learning-based method for rice seed classification. This approach addresses the challenges commonly associated with traditional classification methods, such as high costs, time consumption, and high misclassification rates, providing a more convenient and effective solution. Furthermore, it offers a superior framework for multi-category and fine-grained image recognition. The anticipated applications of this method are expected to enhance the accuracy and efficiency of rice seed classification, thereby contributing to advancements in rice research and agricultural production. The structure of the article is arranged as shown in Fig 1.
Materials and methods
Dataset creation
In this experiment, a high-definition industrial electron microscope from the Leyue brand was utilized to collect the experimental dataset (Fig 2). To simulate the daily rice trading and laboratory conditions found in agricultural research institutes, LED lighting was used to provide supplementary illumination during the experiment, ensuring consistent brightness, focal length, and shooting depth for each seed. The seeds were systematically arranged on a black acrylic board to achieve the highest quality of experimental data, with output images saved in JPG format at a resolution of 1280 × 720 pixels. Throughout the data collection process, both high-quality and low-quality images were captured, including those featuring moldy seeds, noisy backgrounds, and dust contamination (Fig 3). Following data collection, researchers at the agricultural research institute carefully screened the dataset to ensure it was free from any human interference. The process of collecting data samples is shown in Fig 4.
Thirty-six rice seed varieties from different regions of China were thoughtfully selected for this study. Among these, 11 commercially promoted varieties, such as Heyouwuxiang, Taihexiangnuo, and Lijingyou 221, are featured. The remaining 25 varieties were sourced from a specific agricultural research institute in China, including D44, D45, and others (illustrated in Fig 5). For each variety, 390 surface feature images were captured, culminating in a comprehensive dataset of 14040 images. These images were organized into three sets: 300 for training, 40 for validation, and 50 for testing. It is noteworthy that none of the rice varieties used in this research have been previously studied in the context of deep learning for rice seed classification. To thoroughly evaluate the recognition accuracy of the developed network model, several genetically related rice seeds from the same region were selected, facilitating a more precise and realistic assessment of the model’s performance.
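As a quick sanity check on the figures above, the per-variety split can be verified with a few lines of Python (the counts come directly from the text; the variable names are ours):

```python
# Dataset figures as reported in the text.
varieties = 36
images_per_variety = 390
train, val, test = 300, 40, 50  # per-variety split

# The per-variety split must account for every captured image.
assert train + val + test == images_per_variety

# Total dataset size across all 36 varieties.
total_images = varieties * images_per_variety
print(total_images)  # 14040
```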
Experimental methods
Due to the specific variations among different rice seed types—including subtle differences in size, lemma hair length, and color within the same species—this type of image classification challenge is referred to as fine-grained recognition. Traditional deep learning networks often fall short of achieving the necessary recognition accuracy in this domain. To address this issue and propose a more effective model for multi-category fine-grained recognition, this study introduces an innovative rice seed identification network. This model is designed to overcome the challenges presented by small seed sizes, subtle inter-variety differences, and the intricacies of fine-grained recognition.
Space and channel feature extraction residual block (SCR-Block).
Throughout the evolution of machine learning, various model compression strategies and advanced network design methods have emerged, including network pruning [20], weight quantization [21], low-rank factorization [22], and knowledge distillation [23]. However, these techniques are primarily regarded as post-processing optimizations for existing models, often limiting their performance by the upper bounds of the initial model. To more effectively extract features from the surface images of rice seeds, this study presents a lightweight Space and Channel Feature Extraction Residual Block (SCR-Block), as depicted in Fig 6. The SCR-Block brings significant enhancements to space and channel strategies while incorporating a residual connection to mitigate the vanishing gradient problem, thereby optimizing network performance and bolstering model stability and robustness. The core objective of the SCR-Block is to effectively reduce redundant information within feature data, leading to a substantial decrease in both model parameters and Floating Point Operations (FLOPs), while simultaneously enhancing the model's feature expression capabilities. By exploiting space and channel redundancies among features, it diminishes feature redundancy and improves the backbone network's feature extraction efficiency. This approach not only reduces computational complexity but also significantly enhances the network's accuracy and operational efficiency.
The SCR-Block comprises two primary components: the Spatial Reconstruction Unit (SRU) and the Channel Reconstruction Unit (CRU) [24]. The SRU implements an innovative “separate-reconstruct” approach to effectively minimize spatial redundancy in feature maps, optimizing the model’s efficiency in processing spatial information. In parallel, the CRU utilizes a “segmentation-conversion-fusion” strategy to reduce redundancy in the channel dimension, enhancing the network’s efficiency at that level.
The Spatial Reconstruction Unit (SRU) is a mechanism that capitalizes on the spatial redundancy of features for enhanced processing. It operates through a two-step process, separation and reconstruction, as illustrated in Fig 7. During the separation phase, the aim is to distinguish feature maps that convey rich information from those that exhibit lower spatial content correlation and minimal informative value. To evaluate the information content of the feature maps, the SRU employs the scaling factors derived from the Group Normalization (GN) layer. For an intermediate feature map X ∈ ℝ^(N×C×H×W), where N is the batch dimension, C is the channel dimension, and H and W are the spatial height and width, the SRU first normalizes the input feature X: the mean of each channel is subtracted, and the result is divided by the standard deviation, as shown in Equation (1):

X_out = GN(X) = γ · (X − μ) / √(σ² + ε) + β        (1)

In this context, μ and σ denote the mean and standard deviation of X, respectively. Epsilon (ε) is a small positive constant incorporated to enhance the algorithm's numerical stability. Furthermore, the parameters γ and β are trainable within the layer, allowing for the definition of an affine transformation. Within the GN layer, the trainable parameter γ is responsible for quantifying the spatial pixel variance per batch and channel: the richer the spatial information in a feature map and the larger the variation among its spatial pixels, the greater the value of γ. The normalization weights W_γ = {w_i}, which illustrate the relative significance of the different feature maps, are derived from Equation (2):

w_i = γ_i / Σ_j γ_j,   i, j = 1, 2, …, C        (2)

In this process, the weighted feature map W_γ · GN(X) is transformed into the range (0, 1) using the sigmoid function and then gated with a threshold, set to 0.5 in our experiments. Weights that exceed the threshold are set to 1 to obtain the informative weights W1, while weights below the threshold are set to 0, yielding the non-informative weights W2. The entire procedure for obtaining W can be summarized in Equation (3):

W = Gate(Sigmoid(W_γ(GN(X))))        (3)

By employing this method, the model effectively filters out features that are more relevant to the current task, while suppressing those that are less important, ultimately enhancing the model's performance and generalization capability.
The input feature X is weighted separately using the two sets of weights W1 and W2, resulting in two weighted features with distinct information contents: X1^w, which possesses a higher information content, and X2^w, which contains a lower information content. This process effectively partitions the input features into two categories. On one side, X1^w is rich in information and expressiveness, effectively capturing the critical aspects of spatial content. Conversely, X2^w contains less information and primarily reflects redundancy within the features. Through this separation mechanism, the SRU emphasizes valuable spatial information while minimizing unnecessary redundancy, thereby enhancing the efficiency of processing and utilizing feature maps within the neural network.

In the reconstruction process, the feature with higher information content, X1^w, is combined with the one possessing lower information content, X2^w. This amalgamation produces a feature enriched with information, which not only optimizes space but also enhances feature usage. To further encourage information exchange between these features, cross-reconstruction operations are utilized: each weighted feature is split into two halves along the channel dimension (X11^w, X12^w and X21^w, X22^w), and the halves of the two distinctive informational features are added crosswise, enhancing the information flow between them. The resulting cross-reconstructed features X^w1 and X^w2 are then concatenated to form a more refined spatial feature map X^w. This map incorporates information from various features, enabling it to capture more detailed and comprehensive visual content, while ultimately providing high-quality feature representations for subsequent image processing or analysis tasks.

The subsequent processes are as follows, from Equation (4) to Equation (8):

X1^w = W1 ⊗ X        (4)
X2^w = W2 ⊗ X        (5)
X^w1 = X11^w ⊕ X22^w        (6)
X^w2 = X21^w ⊕ X12^w        (7)
X^w = X^w1 ∪ X^w2        (8)

In this context, ⊗ signifies element-wise multiplication, ⊕ denotes element-wise summation, and ∪ refers to concatenation. When the SRU is applied to the intermediate input feature X, it effectively differentiates between informative and less informative features. This process reconstructs the features to enhance their representational quality while reducing redundancy in the spatial dimension. However, the refined spatial feature map X^w still presents some redundancy in the channel dimension. Consequently, the SRU not only enhances the spatial expressiveness of the features but also optimizes their utilization efficiency.
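To make the separate-and-reconstruct procedure concrete, the sketch below re-implements the SRU in NumPy under simplifying assumptions of ours (a single Group Normalization group, one sample, and hypothetical function and variable names); it illustrates the procedure of Equations (1) through (8) and is not the authors' released code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru(x, gamma, beta, eps=1e-5, threshold=0.5):
    """Spatial Reconstruction Unit sketch.
    x: (C, H, W) feature map; gamma, beta: (C,) GN affine parameters.
    A single normalization group is used to keep the sketch short.
    """
    c = x.shape[0]
    # Group Normalization, Eq. (1): subtract the mean, divide by the std.
    mu, sigma = x.mean(), x.std()
    x_norm = gamma[:, None, None] * (x - mu) / np.sqrt(sigma**2 + eps) \
             + beta[:, None, None]
    # Normalization weights, Eq. (2): relative importance of each channel.
    w_gamma = gamma / gamma.sum()
    # Sigmoid re-weighting and hard gating with threshold 0.5, Eq. (3).
    w = sigmoid(w_gamma[:, None, None] * x_norm)
    w1 = (w > threshold).astype(x.dtype)   # informative weights W1
    w2 = 1.0 - w1                          # non-informative weights W2
    x1, x2 = w1 * x, w2 * x                # Eqs. (4)-(5)
    # Cross-reconstruction, Eqs. (6)-(8): split along channels,
    # add crosswise, then concatenate the two reconstructed halves.
    x11, x12 = x1[: c // 2], x1[c // 2 :]
    x21, x22 = x2[: c // 2], x2[c // 2 :]
    return np.concatenate([x11 + x22, x21 + x12], axis=0)
```

Because W1 and W2 are complementary binary masks, the reconstructed map redistributes, rather than discards, the input's content: its total activation equals that of the input.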
The Channel Reconstruction Unit (CRU) addresses the issue of redundancy among feature channels in Convolutional Neural Networks (CNNs), as shown in Fig 8. Employing a sophisticated “separate-transform-merge” strategy, the CRU optimizes feature channels with precision. This approach effectively reduces the redundant information between channels, resulting in more efficient feature representation. Consequently, the CRU alleviates both the parameter load and computational requirements of the network, while simultaneously enhancing its feature extraction capabilities. Overall, the CRU streamlines the model structure and strengthens its representational power by refining feature channels. The structure of the CRU is depicted below.
The segmentation process divides the refined spatial feature X^w into two distinct parts along the channel dimension: one containing αC channels and the other containing (1 − α)C channels, where α ∈ [0, 1] is a split ratio. A 1 × 1 convolution kernel is then applied to compress the channel numbers of both groups, resulting in X_up and X_low. For the transformation operation, X_up serves as the input for "rich feature extraction": Group-Wise Convolution (GWC) and Point-Wise Convolution (PWC) operations are performed separately, after which the results are combined to produce the output Y1. Concurrently, X_low acts as supplementary input for "rich feature extraction": it undergoes PWC, and its output is integrated with the original input using a union operation to yield Y2. In the fusion phase, a simplified Selective Kernel Network (SKNet) approach is employed to adaptively merge Y1 and Y2. The process begins with global average pooling to consolidate global space and channel statistical information, generating pooled values S1 and S2. SoftMax is then applied across S1 and S2 to derive the feature weight vectors β1 and β2. Finally, the output Y is computed as follows:

Y = β1 · Y1 + β2 · Y2

where Y represents the refined channel feature.
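The adaptive fusion phase of the CRU can be illustrated with a short NumPy sketch. Only the final SKNet-style merge is shown here; the GWC/PWC transforms are ordinary convolutions and are omitted. The function name and tensor shapes are our assumptions:

```python
import numpy as np

def cru_fuse(y1, y2):
    """Fusion phase of the Channel Reconstruction Unit (sketch).
    y1, y2: (C, H, W) transformed feature groups. A simplified SKNet-style
    merge: global average pooling -> cross-branch softmax -> weighted sum.
    """
    # Global average pooling yields per-channel statistics S1 and S2.
    s1 = y1.mean(axis=(1, 2))
    s2 = y2.mean(axis=(1, 2))
    # SoftMax across the two branches gives weight vectors beta1, beta2
    # (subtracting the max keeps the exponentials numerically stable).
    m = np.maximum(s1, s2)
    e1, e2 = np.exp(s1 - m), np.exp(s2 - m)
    beta1 = e1 / (e1 + e2)
    beta2 = e2 / (e1 + e2)
    # Y = beta1 * Y1 + beta2 * Y2, the refined channel feature.
    return beta1[:, None, None] * y1 + beta2[:, None, None] * y2
```

Since β1 and β2 sum to one per channel, feeding two identical branches through the fusion returns them unchanged, which is a useful sanity check on the weighting.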
Double attention networks (A2Net).
Traditional methods for feature extraction typically involve calculating first-order statistics, such as average pooling and max pooling. In recent years, many advanced networks have begun to utilize bilinear pooling to capture second-order statistics of features [25], thereby facilitating the generation of global representations within the network. Compared to traditional methods, bilinear pooling is more effective at capturing feature interactions and maintaining complex relationships. Given the subtlety of features among the various types of rice seeds and the need for fine-grained distinctions, employing bilinear pooling for feature extraction can significantly enhance the global receptive field. Bilinear pooling aggregates all feature vector pairs (a_i, b_i) from two input feature maps A = [a_1, …, a_n] and B = [b_1, …, b_n], as shown in Equation (9):

G = A Bᵀ = Σ_i a_i b_iᵀ        (9)

Here A and B are the outputs of two different convolutional layers that transform the input X: A = φ(X; W_φ) and B = θ(X; W_θ).

After resolving the spatial feature aggregation issue, the next step involves distributing these gathered features to each position of the input. This distribution allows subsequent networks to capture global information with the use of smaller convolutional kernels. The A2Net network assigns adaptive visual primitives to the local feature vector at each position, leading to enhanced flexibility [26]. As a result, the features at each position can complement one another, making the training process more straightforward and improving the ability to capture complex relationships. As shown in Equation (10), the distribution step employs soft attention to select the feature vectors, enabling the network to effectively capture global information:

z_i = Σ_j v_ij g_j = G_gather(X) v_i        (10)

Normalizing each attention vector v_i to unit sum with the softmax function was found to give better convergence. Note that the set of weight vectors V = softmax(ρ(X; W_ρ)) is also generated by convolutional layers and softmax normalizers, where W_ρ contains the parameters of this layer. Each position i generates its own attention vector v_i based on the needs of its local feature, and selects a desired subset of the global features g_j to supplement the current position, forming the output feature z_i.
The Double Attention Networks (A2Net) integrate a spatial feature aggregation module with a global information capture module. The computational model of A2Net, depicted in Fig 9, corresponds to the double attention operation outlined in Equation (11):

Z = [φ(X; W_φ) softmax(θ(X; W_θ))ᵀ] softmax(ρ(X; W_ρ))        (11)

where φ, θ, and ρ denote the three convolutional transforms of the input X. The fundamental concept of A2Net involves employing an auxiliary branch to guide the attention distribution of the main network. The auxiliary branch first extracts features from the input image and then generates an attention map that emphasizes the importance of various regions. The main network utilizes this attention map to refine its attention distribution, enabling it to focus more intently on areas pertinent to the task at hand. A2Net significantly enhances the network's capability to capture both local key information and global contextual insights, effectively highlighting the crucial features within the image data. By incorporating this network module, the model is better equipped to identify and process visual information accurately, resulting in a marked improvement in overall recognition performance.
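The two steps of the double attention operation can be sketched in NumPy on flattened feature maps. The shapes and function names are our choices, and the convolutional transforms φ, θ, and ρ are replaced here by pre-computed input arrays:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def double_attention(a, b, v):
    """Double attention (A2Net) sketch on flattened feature maps.
    a: (m, n) feature vectors from phi(X); b: (k, n) attention maps from
    theta(X); v: (k, n) distribution maps from rho(X); n = H*W positions.
    """
    # Step 1 - gather: bilinear pooling with softmax attention over the
    # n spatial positions produces k global descriptors of dimension m.
    g = a @ softmax(b, axis=1).T          # (m, k)
    # Step 2 - distribute: each position mixes the k global descriptors
    # with its own softmax-normalized attention vector.
    z = g @ softmax(v, axis=0)            # (m, n)
    return z
```

In a full network these three inputs would come from 1 × 1 convolutions on the same feature map; the sketch only shows how gathering and distributing compose.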
Deep space and channel residual network combined with double attention mechanism (RSCD-Net)
Due to the complexities involved in identifying rice species and the challenges posed by fine-grained recognition, this study presents a novel network model known as the Deep Space and Channel Residual Network Combined with Double Attention Mechanism (RSCD-Net), as depicted in Fig 10. The model parameters are shown in Fig 11. The model accepts an image of rice seeds as input and initiates feature extraction through a series of convolutional layers, batch normalization (BN) layers, ReLU activation functions, and max pooling operations. Following this initial phase, the model utilizes 16 layers of SCR-Block to extract crucial features, as well as space and channel information, from the image. The resulting feature representations are then processed by the Double Attention Network (A2Net), which efficiently filters out the most significant features. The extraction process culminates in the network's average pooling layer. As shown in the model's structure, the 16 SCR-Block layers are organized into four primary convolutional blocks: the first block comprises 3 SCR-Blocks, the second includes 4, the third consists of 6, and the fourth again contains 3.
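The stage layout described above can be written down directly; this small snippet (variable names are ours) simply verifies that the four convolutional blocks account for all 16 SCR-Block layers:

```python
# Stage configuration of RSCD-Net as described in the text: four
# convolutional blocks holding 3, 4, 6, and 3 SCR-Blocks respectively.
stages = [3, 4, 6, 3]

# The stages together account for all 16 SCR-Block layers.
total_scr_blocks = sum(stages)
print(total_scr_blocks)  # 16
```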
Results
Experimental setup
We conducted experiments with 36 different varieties of rice seeds, utilizing a total of 14040 images; the specific division of the dataset is shown in Table 1. To maintain fairness and comparability in the experimental results, all network models were configured and executed under the same parameter settings, as detailed in Table 2. This methodology ensures an accurate assessment of each network's performance, thereby enhancing the scientific validity and reliability of the experimental findings. The original size of the images in the dataset is 1280 × 720 pixels. To reduce computational complexity while retaining the main information of each image, the images were preprocessed and resized to 224 × 224 pixels before being fed to the networks.
Analysis of experimental results
The performance of the RSCD-Net network was evaluated using a confusion matrix [27]. This network's design seamlessly integrates space and channel convolution with residual learning. As a result, its experimental outcomes were compared with those of the classic baseline network, ResNet50, as shown in Figs 12 and 13. The visualization of the confusion matrix indicates that the RSCD-Net network's recognition performance is predominantly aligned along the diagonal, achieving an accuracy rate of 81.94%. This surpasses the 77.78% accuracy of the ResNet50 network, reflecting a notable improvement of 4.16%. For specific rice seed varieties, including D251, D33, D34, and D48, the accuracy rates increased by 20%, 22%, 10%, and 40%, respectively, all improvements of at least 10 percentage points. Particularly noteworthy is variety D48, which demonstrated the most significant enhancement, with the number of correctly identified samples increasing from 30 in the ResNet50 network to 50 in the RSCD-Net network, a remarkable 40% boost in accuracy. Furthermore, 15 varieties achieved perfect recognition accuracy with the RSCD-Net network, four more than with the ResNet50 network. Despite the diversity among rice seed varieties and the subtle differences between samples, some varieties had less-than-optimal accuracy rates when assessed with the ResNet50 network. However, following feature extraction with RSCD-Net, the number of correctly identified samples for these varieties improved significantly. For example, the number of correct identifications for variety D251 increased from 10 in the ResNet50 network to 20 in the RSCD-Net network, a 20% enhancement in accuracy. In the case of variety D33, the number of correct identifications grew from 10 in the ResNet50 network to 21, marking a 22% improvement in accuracy.
The results clearly demonstrate that the RSCD-Net network excels in extracting features from multi-variety, fine-grained samples, significantly outperforming the traditional ResNet50 network in both accuracy and robustness.
Ablation experiment
To present model performance more intuitively and scientifically, this paper employs Average Accuracy (A), Precision (P), Recall (R), and F1-score (F) as metrics for evaluating classification performance, with their respective formulas outlined in Equations (12) through (15). To maintain scientific rigor and fairness in assessment, all networks under comparison underwent five independent training sessions conducted under identical sample and environmental conditions. The weight files associated with the best-performing results were selected for sample identification. This approach ensures controlled variables and facilitates an equitable comparison of performance across all networks.
In this context, TP (True Positive) signifies instances where the actual class is positive, and the model accurately predicts it as such. TN (True Negative) represents cases where the actual class is negative, and the model correctly identifies it as negative. FP (False Positive) refers to situations in which the actual class is negative, yet the model incorrectly predicts it as positive. FN (False Negative) indicates instances where the actual class is positive, but the model erroneously predicts it as negative.
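Given these definitions, the macro-averaged precision, recall, and F1-score can be computed directly from the multi-class confusion matrix, since each class’s TP sits on the diagonal while FP and FN are the off-diagonal column and row sums. A minimal sketch (the toy 3-class matrix is illustrative, not from the paper’s experiments):

```python
import numpy as np

def macro_metrics(cm: np.ndarray):
    """Macro-averaged precision, recall, and F1 from a confusion matrix
    (rows = true class, columns = predicted class)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class k but actually another class
    fn = cm.sum(axis=1) - tp  # actually class k but predicted as another class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision.mean(), recall.mean(), f1.mean()

cm = np.array([[45,  3,  2],
               [ 4, 40,  6],
               [ 1,  5, 44]])
p, r, f = macro_metrics(cm)
```

Macro averaging weights all 36 varieties equally, which is appropriate here since each variety contributes a comparable number of test samples.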
The RSCD-Net network was compared with the classic residual network ResNet50, an improved version of ResNet50 incorporating the SCR-Block module (denoted as ResNet50-SC), and another enhanced version integrating the A2Net module (denoted as ResNet50-A2) through ablation experiments to analyze the superiority of the improvements made in the RSCD-Net network.
The experimental results presented in Table 3 clearly indicate that RSCD-Net surpasses the other comparative network models in accuracy, achieving the highest performance overall. The ablation data show that the SCR-Block module enables the base network to capture features of the target data efficiently. This improvement is particularly beneficial for identifying different rice varieties and yields superior results when handling large-scale sample datasets. In addition, ResNet50-A2 achieves a 0.72% accuracy increase over ResNet50, suggesting that the A2Net attention mechanism positively influences the control of local features and has a significant effect on improving the accuracy of fine-grained image recognition.
This article examines the ablation results more deeply through visualized heat maps (as shown in Figs 14 and 15). The figures reveal that the various networks effectively extract key features of rice seeds, including the edges of the lemma, overall size, and epidermal texture. Among these networks, RSCD-Net demonstrates the most concentrated attention distribution on the rice seeds, followed by ResNet50-SC, ResNet50-A2, and finally ResNet50. Notably, both ResNet50-SC and ResNet50 extract a significant number of features from non-subject regions. In comparison, however, ResNet50-SC not only captures features from non-subject areas but also achieves more comprehensive feature extraction from the main region of the rice seed, whereas ResNet50 exhibits limited feature extraction in the seed’s primary area. This indicates that the SCR-Block effectively enhances the network’s capability to extract salient features from the image subject. Furthermore, ResNet50-A2 excels at extracting the primary features of rice seeds while minimizing attention to non-subject regions. Compared with ResNet50, ResNet50-A2 focuses its attention more precisely on the key subject of the image, demonstrating the effectiveness of A2Net in enhancing fine-grained image recognition accuracy and highlighting its strong robustness and ability to improve feature-extraction precision within deep learning models.
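The paper does not specify how the heat maps in Figs 14 and 15 were generated; a common approach for such visualizations is Grad-CAM-style class-activation weighting. A minimal NumPy sketch under that assumption, where the feature maps of a convolutional layer and the gradients of the class score with respect to them are assumed to be available (the function name and shapes are illustrative, not the authors’ implementation):

```python
import numpy as np

def gradcam_heatmap(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """feature_maps, gradients: (C, H, W) arrays for one image.
    Weight each channel by the spatial mean of its gradient, sum the
    weighted maps, keep only positive evidence, and normalize to [0, 1]."""
    weights = gradients.mean(axis=(1, 2))              # one weight per channel
    cam = np.tensordot(weights, feature_maps, axes=1)  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0)                           # ReLU: positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                          # scale into [0, 1]
    return cam

# Illustrative usage with random data in place of real activations.
rng = np.random.default_rng(0)
heatmap = gradcam_heatmap(rng.standard_normal((8, 7, 7)),
                          rng.standard_normal((8, 7, 7)))
```

The normalized map is typically upsampled to the input resolution and overlaid on the seed image, which is consistent with the attention distributions described above.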
The experimental results clearly show that the SCR-Block module and A2Net are highly complementary and integrate effectively, further validating the advantages and effectiveness of the RSCD-Net design.
Comparison with other networks
To illustrate the performance of the RSCD-Net network, we conducted a comparison with several leading networks, including ResNet50 [28], ConvNeXt [29], MobileNetV3 [30,33], InceptionResNetV2 [31], and Swin Transformer [32], assessing them on metrics such as accuracy, precision, recall, and F1-score. The results can be found in Tables 4 and 5, where the top values for each metric are highlighted in bold and underlined. The confusion matrices of each model are shown in Figs 16 and 17.
Based on the data provided in Tables 6 and 7 and Figs 16 and 17, the new model RSCD-Net showcases exceptional performance in terms of precision (P), recall (R), F1-score (F), and average accuracy (A), effectively recognizing more data categories than the other network models.
A comprehensive analysis of each model’s performance on the four evaluation metrics was conducted; the optimal values are shown in Table 6:
Overall, RSCD-Net surpassed the other models. The recall rates (R) for the various seeds show that RSCD-Net significantly outperformed the rest, leading the field with 19 optimal metrics, four more than the second-place model, ConvNeXt. Additionally, both networks demonstrated recognition accuracies of at least 18% for all rice seed varieties, underscoring their ability to capture the subtle features that distinguish different varieties. This validates the effectiveness of employing deep learning techniques for rice seed classification.
The evaluation metrics for the new model, RSCD-Net, indicate that its optimal values occur predominantly in categories with similar inter-class characteristics, suggesting the model’s effectiveness in classifying rice seeds that closely resemble one another in appearance. In contrast, for rice seeds with more pronounced inter-class differences, the evaluation metrics vary only minimally across the various models. The experimental data in Table 7 clearly underscore the significant advantages of RSCD-Net in addressing complex multi-category classification tasks involving fine-grained rice seed identification. This not only highlights the model’s effectiveness in rice seed classification and recognition but also indirectly illustrates the limitations of lightweight networks in managing multi-variety, fine-grained image recognition tasks.
Conclusions
This study presents a novel deep learning model, RSCD-Net, designed specifically for the classification and recognition of diverse rice seed varieties. By integrating the Space and Channel Residual Block (SCR-Block) and a Double Attention Mechanism (A2Net), the proposed model effectively enhances feature extraction, minimizes redundant information, and optimizes computational efficiency. The backbone of RSCD-Net consists of 16 layers of SCR-Blocks, systematically arranged to maximize feature differentiation. Additionally, A2Net expands the network’s global receptive field, further improving its ability to distinguish between different rice seed varieties. To address overfitting and improve generalization, data augmentation techniques were applied during training, increasing the diversity of the dataset.
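The stage layout described above can be summarized structurally: four convolutional stages of SCR-Blocks with 3, 4, 6, and 3 units, i.e., 16 blocks in total, mirroring the ResNet50 stage configuration. A hypothetical sketch of this plan follows; the block naming and per-stage channel widths are assumptions following ResNet50 convention, not the authors’ code:

```python
# Assumed stage configuration of the RSCD-Net backbone (channel widths are
# illustrative, borrowed from the ResNet50 convention).
STAGE_UNITS = [3, 4, 6, 3]
STAGE_CHANNELS = [256, 512, 1024, 2048]

def build_plan():
    """Enumerate the SCR-Blocks stage by stage."""
    plan = []
    for stage, (units, ch) in enumerate(zip(STAGE_UNITS, STAGE_CHANNELS), start=1):
        for unit in range(1, units + 1):
            plan.append(f"stage{stage}/SCR-Block{unit} ({ch} channels)")
    return plan

plan = build_plan()
assert len(plan) == 16  # 3 + 4 + 6 + 3 SCR-Blocks, as stated in the text
```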
Experimental validation was conducted on a self-collected dataset that has not previously been used in deep learning-based rice seed classification. Despite its relatively small sample size compared with datasets in other related studies, RSCD-Net achieved an average accuracy of 81.94%, outperforming ResNet50 (77.78%) by 4.16%. Furthermore, RSCD-Net demonstrated a significant improvement in recognition rates across a broader spectrum of rice seed varieties, clearly surpassing the performance of baseline models. To thoroughly evaluate the model’s effectiveness, extensive ablation experiments and comparisons with leading architectures, including InceptionResNetV2 and ConvNeXt, were conducted. The results confirm that RSCD-Net achieves superior overall performance, consistently exceeding the accuracy of both ablation and comparison models.
A comprehensive review of existing literature on rice seed classification reveals that most studies focus on no more than 10 seed varieties, significantly limiting their practical applications. This study makes a notable contribution by developing a model capable of accurately classifying 36 different rice seed types, including both widely cultivated varieties and specialized strains collected from an agricultural research institute. The exceptional classification capability of RSCD-Net offers practical benefits for farmers, rice traders, and researchers, providing a highly accurate, efficient, and cost-effective method for rice seed identification. Furthermore, the findings of this study lay the foundation for the development of a more comprehensive rice seed classification database, supporting future advancements in precision agriculture and smart farming technologies.
In summary, RSCD-Net represents a powerful and scalable solution for fine-grained rice seed classification, demonstrating superior performance in both accuracy and efficiency. Its integration of advanced feature extraction mechanisms and attention-based learning strategies positions it as a valuable tool for both academic research and real-world applications in the agricultural sector. Future work will explore further enhancements to scalability, dataset expansion, and real-time deployment, ensuring broader adoption of RSCD-Net in agricultural automation and smart seed classification systems.
References
- 1. Zhang G, Sun B, Zhao H, Wang X, Zheng C, Xiong K, et al. Estimation of greenhouse gas mitigation potential through optimized application of synthetic N, P and K fertilizer to major cereal crops: a case study from China. Journal of Cleaner Production. 2019;237:117650.
- 2. Chandio AA, Magsi H, Ozturk I. Examining the effects of climate change on rice production: case study of Pakistan. Environ Sci Pollut Res Int. 2020;27(8):7812–22. pmid:31889271
- 3. Yan C, Li Z, Zhang Y, Liu Y, Ji X, Zhang Y. Depth image denoising using nuclear norm and learning graph model. ACM Trans Multimedia Comput Commun Appl. 2020;16(4):1–17.
- 4. Yan C, Shao B, Zhao H, Ning R, Zhang Y, Xu F. 3D Room layout estimation from a single RGB image. IEEE Trans Multimedia. 2020;22(11):3014–24.
- 5. Ruslan R, Khairunniza-Bejo S, Jahari M, Ibrahim MF. Weedy rice classification using image processing and a machine learning approach. Agriculture. 2022;12(5):645.
- 6. Zhou S, Sun L, Xing W, Feng G, Ji Y, Yang J, et al. Hyperspectral imaging of beet seed germination prediction. Infrared Physics & Technology. 2020;108:103363.
- 7. Din NMU, Assad A, Dar RA, Rasool M, Sabha SU, Majeed T, et al. RiceNet: a deep convolutional neural network approach for classification of rice varieties. Exp Syst Appl. 2024;235:121214.
- 8. Koklu M, Cinar I, Taspinar YS. Classification of rice varieties with deep learning methods. Comput Electr Agric. 2021;187:106285.
- 9. Sharma A, Satish D, Sharma S, Gupta D. iRSVPred: a web server for artificial intelligence based prediction of major basmati paddy seed varieties. Front Plant Sci. 2020;10:1791. pmid:32158451
- 10. Lin P, Li XL, Chen YM, He Y. A deep convolutional neural network architecture for boosting image discrimination accuracy of rice species. Food Bioprocess Technol. 2018;11(4):765–73.
- 11. Panmuang M, Rodmorn C, Pinitkan S. Image processing for classification of rice varieties with deep convolutional neural networks[C]//2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP). IEEE; 2021. p. 1–6.
- 12. Kiratiratanapruk K, Temniranrat P, Sinthupinyo W, Prempree P, Chaitavon K, Porntheeraphat S, et al. Development of paddy rice seed classification process using machine learning techniques for automatic grading machine. J Sens. 2020;2020:1–14.
- 13. Gilanie G, Nasir N, Bajwa UI, et al. RiceNet: convolutional neural networks-based model to classify Pakistani grown rice seed types[J]. Multimed Syst. 2021:1–9.
- 14. Qian Y, Xu Q, Yang Y, Lu H, Li H, Feng X, et al. Classification of rice seed variety using point cloud data combined with deep learning. Int J Agric Biol Eng. 2021;14(5):206–12.
- 15. He X, Cai Q, Zou X, Li H, Feng X, Yin W, et al. Multi-modal late fusion rice seed variety classification based on an improved voting method. Agriculture. 2023;13(3):597.
- 16. Chatnuntawech I, Tantisantisom K, Khanchaitit P, et al. Rice classification using spatio-spectral deep convolutional neural network. arXiv preprint arXiv:1805.11491, 2018.
- 17. Tang X-J, Liu X, Yan P-F, Li B-X, Qi H-Y, Huang F. An MLP network based on residual learning for rice hyperspectral data classification. IEEE Geosci Remote Sensing Lett. 2022;19:1–5.
- 18. Jin B, Zhang C, Jia L, Tang Q, Gao L, Zhao G, et al. Identification of rice seed varieties based on near-infrared hyperspectral imaging technology combined with deep learning. ACS Omega. 2022;7(6):4735–49. pmid:35187294
- 19. Du R, Xie J, Ma Z, Chang D, Song Y-Z, Guo J. Progressive learning of category-consistent multi-granularity features for fine-grained visual classification. IEEE Trans Pattern Anal Mach Intell. 2022;44(12):9521–35. pmid:34752385
- 20. Liu Z, Sun MJ, Zhou TH, et al. Rethinking the value of network pruning[J]. arXiv preprint arXiv:1810.05270; 2018.
- 21. Chen Y, Cai X, Liang X, et al. Compression algorithm for weights quantized deep neural network models[J]. J Xidian Univ. 2019;46(2):7. https://doi.org/10.19665/j.issn1001-2400.2019.02.022
- 22. Zisheng L, Jicheng L, Guo L, Jianchao B, Xuenian L. A new model for sparse and low-rank matrix decomposition. J Appl Analysis Comput. 2017;7(2):600–16.
- 23. Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531; 2015.
- 24. Li JF, Wen Y, He LH. SCConv: spatial and channel reconstruction convolution for feature redundancy[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023. p. 6153–62.
- 25. Gao Z, Wu Y, Zhang X, et al. Revisiting bilinear pooling: a coding perspective[J]. Proc AAAI Conf Artificial Intelligence. 2020;34(4):3954–61.
- 26. Chen YP, Kalantidis Y, Li JS, et al. A2-nets: Double attention networks[J]. Adv Neural Inform Process Syst. 2018;31.
- 27. Heydarian M, Doyle TE, Samavi R. MLCM: multi-label confusion matrix. IEEE Access. 2022;10:19083–95.
- 28. He KM, Zhang XY, Ren SQ, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. p. 770–8.
- 29. Liu Z, Mao H, Wu CY, et al. A ConvNet for the 2020s[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. p. 11976–86.
- 30. Howard A, Sandler M, Chen B, et al. Searching for MobileNetV3[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE; 2020. https://doi.org/10.1109/ICCV.2019.00140
- 31. Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2017.
- 32. Liu Z, Lin YT, Cao Y, et al. Swin Transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. p. 10012–22.
- 33. Qian SY, Ning CR, Hu YP. MobileNetV3 for image classification[C]//2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE). IEEE; 2021. p. 490–7.