Abstract
The lack of an obvious visual difference between germinated and non-germinated seeds lowers the accuracy of rice seed germination rate detection, which remains a challenging issue in the field. In view of this, a new model named Rice Seed Germination-YOLOV8 (RSG-YOLOV8) is proposed in this paper. This model initially incorporates CSPDenseNet to streamline computational processes while preserving accuracy. Furthermore, the Bi-level Routing Attention (BRA), a dynamic and sparse attention mechanism, is integrated to highlight critical features while minimizing redundancy. The third advancement is the employment of a structured feature fusion network, based on GFPN, aiming to reconfigure the original Neck component of YOLOv8, thus enabling efficient feature fusion across varying levels. An additional detection head is introduced, improving detection performance through the integration of variable anchor box scales and the optimization of regression losses. This paper also explores the influence of various attention mechanisms, feature fusion techniques, and detection head architectures on the precision of rice seed germination rate detection. Experimental results indicate that RSG-YOLOV8 achieves a mAP50 of 0.981, marking a 4% improvement over the mAP50 of YOLOv8 and setting a new benchmark on the RiceSeedGermination dataset for the detection of rice seed germination rate.
Citation: Li H, Liu L, Li Q, Liao J, Liu L, Zhang Y, et al. (2024) RSG-YOLOV8: Detection of rice seed germination rate based on enhanced YOLOv8 and multi-scale attention feature fusion. PLoS ONE 19(11): e0306436. https://doi.org/10.1371/journal.pone.0306436
Editor: Luca Bertolaccini, European Institute of Oncology: Istituto Europeo di Oncologia, ITALY
Received: June 17, 2024; Accepted: October 17, 2024; Published: November 12, 2024
Copyright: © 2024 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The underlying data are held at Figshare. DOI: 10.6084/m9.figshare.27288549.
Funding: This work is supported by Project of National Natural Science Foundation of China (52105539), Anhui Natural Science Foundation (2108085QD179), National Engineering Technology Research Center (2005DP173065-2022-01), Key Laboratory of Agricultural Sensors, Ministry of Agriculture and Rural Affairs, Anhui Provincial Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University (KLAS2022KF011). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Rice is a cornerstone of global food security, acting as a principal staple crop worldwide. The germination rate of rice seeds is a critical indicator for evaluating rice yield potential and is a key metric in seed quality assessment [1]. The diminutive nature and clustered configuration of rice seeds, however, complicate accurate assessment, often leading to decreased precision. Thus, the accurate detection of germinated seeds is paramount in precisely estimating rice yields.
In recent advancements, the confluence of computer hardware improvements and the evolution of computer vision and image processing techniques has facilitated significant progress in the assessment of seed germination rate [2, 3]. Zhang et al. unveiled a groundbreaking germination grain counting algorithm based on the premise that the intersection length between the embryo and the grain is shorter than the embryo’s circumference. They improved upon the coarse segmentation results from the k-means clustering algorithm through the application of a one-dimensional Gaussian filter and a fifth-degree polynomial for refinement [4]. Tan et al. introduced an algorithm adept at identifying and counting conjoined rice grains. By employing wavelet transform and Gaussian filtering, they enhanced the contrast of grayscale images and reduced noise. Furthermore, they addressed and unified over-segmented regions by deploying an advanced corner detection algorithm, verifying the alignment of segmentation line endpoints with corner points [5].
Moreover, Zhao et al. introduced a sophisticated approach that combines image segmentation, a Transformer encoder, a dedicated small target detection layer, and control distance intersection-over-union (CDIoU) loss to significantly enhance detection accuracy. Their convolutional neural network (CNN) is adept at identifying the germination status of rice seeds and autonomously quantifying the total number of germinated seeds [6]. Predominantly, these studies underscore the importance of exploiting differences in color or length between seeds and embryos to determine germination status. Nonetheless, the similar texture characteristics and small size disparities between germinated and non-germinated seeds can obscure these differences, leading to potential misclassification and reduced detection accuracy [4].
A comprehensive analysis of the existing literature reveals that although significant progress has been made in the field of target detection, there are still some research gaps. Firstly, the effectiveness of existing models in dealing with multi-scale feature fusion is still limited, especially in the task of target detection in complex contexts. In addition, although multiple attention mechanisms have been proposed, their performance differences in different application scenarios have not been fully explored.
To address the challenge of achieving high accuracy in rice seed germination testing and to fill these gaps, RSG-YOLOV8, a cutting-edge model, is designed to accurately quantify the rice seed germination rate. The principal contributions of this research are delineated as follows:
- Cross Stage Partial DenseNet (CSPDenseNet): The CSPDenseNet is proposed in this work, aimed at augmenting gradient flow and reducing computational demand. Each stage of CSPDenseNet is composed of a partial dense block coupled with a partial transition layer. Unlike the conventional DenseNet, where the base layer’s channel count substantially surpasses the growth rate, the CSPDenseNet utilizes a partial dense block. In this configuration, only half of the initial channels contribute to the dense layer operations, addressing nearly half of the computational bottleneck efficiently [7].
- Bi-level Routing Attention (BRA): The BRA algorithm is introduced to highlight salient features while minimizing redundancy through a dynamic and sparse attention mechanism. By eliminating non-essential key-value pairs at the coarse region level, the BRA ensures the retention of only crucial routing areas, thereby optimizing feature selection [8].
- Generalized Feature Pyramid Network (GFPN): The GFPN is incorporated in the work, an architecture devised to re-engineer the original ’Neck’ component of YOLOv8, facilitating efficacious feature integration across different scales. The GFPN employs dense connections and a Queen-Fusion structure to produce highly integrated features. Furthermore, the Concat operation is utilized over summation for feature fusion, significantly mitigating the risk of information loss [9].
- Added Detection Head: Addressing the limitations inherent in the original YOLOv8 detection heads for the context of rice seed detection, an innovative detection head is introduced in this work. This novel component leverages shallow information extracted from the initial C2f (shortcut) module of the input image and integrates it with a supplementary feature fusion network. The introduction of specific anchor box scales and the optimization of regression losses within this new detection head markedly improve detection accuracy. Consequently, this advancement significantly boosts the model’s performance, catering to the nuanced demands of rice seed detection with increased precision and effectiveness.
1. Rice seed germination rate detection model
1.1 YOLOv8 model.
This research extends the YOLOv8 framework as delineated in Fig 1, incorporating modifications designed to enhance rice seed detection accuracy. The YOLOv8 architecture is structured around three core components: the Backbone, Neck, and Head. The Backbone consists of Convolution (Conv), C2f (shortcut), and Spatial Pyramid Pooling-Fast (SPPF) modules. The Conv module executes convolution operations on the input images, assisting the C2f module in efficient feature extraction, whereas the SPPF module is instrumental in generating outputs of adaptive sizes [10].
The Neck, employing the combined architectures of the Feature Pyramid Network (FPN) and Path Aggregation Network (PANet), adeptly extracts and amalgamates features across multiple scales. This integration significantly enhances the model’s detection performance and robustness [11]. The Head component, with its decoupled structure, segregates classification and regression tasks into distinct branches, thereby reducing task conflict and fostering improved accuracy.
Notwithstanding its merits, YOLOv8 encounters several challenges. Predominantly, its architecture is characterized by an extensive reliance on convolutional and C2f blocks, escalating computational complexity and the total number of parameters. This complexity, coupled with a constrained detection head capacity, does not fully accommodate the specificities of rice seed detection, particularly in identifying objects beyond the model’s original scaling capabilities. Furthermore, there is room for optimization within the Backbone’s feature extraction and the Neck’s feature fusion processes [12].
In response to these challenges, the present paper proposes a set of modifications to the YOLOv8 model. These enhancements are aimed at optimizing the model’s structural efficiency and its effectiveness in the specific application of rice seed detection [13–16].
1.2 RSG-YOLOV8.
Building upon the advancements of the YOLOv8 model, this paper introduces the RSG-YOLOV8 model, depicted in Fig 2. The proposed model incorporates several key modifications aimed at enhancing both efficiency and accuracy. The integration of CSPDenseNet mitigates computational complexity without compromising precision, ensuring optimal performance with reduced computational overhead. Leveraging the BRA algorithm, which features a dynamic and sparse attention mechanism, further enhances feature extraction by prioritizing salient features and minimizing redundancy. This strategic focus boosts the model's ability to discern crucial details within the input data. Moreover, the incorporation of a structured feature fusion network, based on GFPN, reconstructs the Neck component of YOLOv8, facilitating effective feature fusion across multiple scales and bolstering the model's capacity to extract comprehensive information from diverse contexts. Lastly, the introduction of an added detection head refines detection performance by incorporating additional anchor box scales and optimizing regression losses. This enhancement improves the model's ability to accurately localize and classify objects of interest. Together, these components constitute the comprehensive architecture of the RSG-YOLOV8 model, poised to advance the state of the art in object detection tasks.
1.2.1 CSPDenseNet. The Cross Stage Partial (CSP) Network, a modular design employed to enhance the performance of neural networks, is exemplified in the architecture of CSPDenseNet [7], as illustrated in Fig 3. By dividing the network into two distinct segments and incorporating partial connections between them, CSPDenseNet significantly enhances feature transfer and reuse. This configuration allows for a seamless exchange of information between stages, thus improving the feature representation capacity and overall efficacy of the model. Merging the dense connection structure of DenseNet with the innovative CSP module, CSPDenseNet represents a powerful synergy of complementary technologies.
Initially, a DenseNet backbone is established to leverage its dense connectivity. The CSP module is then strategically integrated into the mid-layers of this backbone, effectively splitting the network into two parts while maintaining partial connections between them. This design paradigm enables efficient information transmission, capitalizing on DenseNet’s strengths while simultaneously improving feature reuse and information transmission through the CSP module. Particularly in the context of detecting germinated rice seeds, this combination of methodologies within CSPDenseNet is poised to significantly enhance performance metrics.
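As a rough illustration of the computational saving, the per-position multiply-accumulate cost of a dense block can be compared with and without the CSP split. This is a back-of-the-envelope sketch, not the authors' implementation; the channel counts (base 256, growth rate 32, 6 layers) are hypothetical placeholders:

```python
def dense_block_macs(c_in, growth, num_layers, k=3):
    """Approximate multiply-accumulates per spatial position for a
    DenseNet-style block: each kxk conv maps the accumulated channels
    to `growth` new channels, which are then concatenated back in."""
    macs, c = 0, c_in
    for _ in range(num_layers):
        macs += c * growth * k * k  # conv cost at this layer
        c += growth                 # dense concatenation grows the input
    return macs

# Conventional dense block: all base channels enter the dense path.
full = dense_block_macs(c_in=256, growth=32, num_layers=6)

# CSP partial dense block: only half of the base channels enter;
# the other half bypasses the block and is concatenated at the end.
partial = dense_block_macs(c_in=128, growth=32, num_layers=6)

print(partial / full)  # < 1: the dense-path bottleneck shrinks noticeably
```

The saving approaches one half as the base channel count grows large relative to the growth rate, which matches the observation that the base layer's channels substantially exceed the growth rate in conventional DenseNet.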
1.2.2 BRA-based attention feature fusion. The primary aim of the multi-scale feature fusion network, situated in the Neck section, is to amalgamate feature maps extracted from various layers of the network, thereby augmenting the efficacy of multi-scale object detection. However, the feature fusion layer in YOLOv8 encounters challenges related to redundant information from distinct feature mappings. To mitigate this limitation, this paper explores the incorporation of an attention mechanism within the feature fusion process of the YOLOv8 model. Attention mechanisms, initially proposed to evaluate the relevance of specific features over others, have shown substantial promise in enhancing object detection performance within computer vision.
Six attention mechanisms—Squeeze-and-Excitation (SE) [17], Convolutional Block Attention Module (CBAM) [18], Efficient Channel Attention (ECA) [19], Coordinate Attention (CA) [20], Receptive Field Attention (RFA) [21], and BRA—are identified as holding significant potential for improving object detection performance. Their distinctions arise from SE's and ECA's concentration on channel attention and RFA's and BRA's focus on spatial attention, while CBAM and CA enhance both channel and spatial attention simultaneously. SE recalibrates feature responses at the channel level by explicitly modeling the interdependencies among convolutional feature channels. ECA captures local dependencies among channels with minimal computational demands, eliminating the reliance on global statistics. RFA introduces an effective attention mechanism for facilitating parameter sharing among convolutional kernels, whereas BRA embodies a dynamic, query-aware sparse attention mechanism, enabling a content-aware selection of the most relevant key/value tokens for each query.
Fig 4 illustrates the comprehensive architecture of the BRA module within RSG-YOLOV8. The feature fusion structure is enhanced through the integration of the BRA attention module, facilitating efficient multi-level feature fusion while mitigating redundant information across feature mappings. The introduction of dynamic sparse attention facilitates the integration of feature maps of varying scales by leveraging weight distributions of individual channels and spatial positions, thereby diminishing redundant feature information and enhancing the model’s detection accuracy.
To handle a 2D input feature map $X \in \mathbb{R}^{H \times W \times C}$, the initial step involves partitioning it into $S \times S$ non-overlapping regions, where each region contains $\frac{HW}{S^2}$ feature vectors. This partitioning is achieved by reshaping $X$ as $X^r \in \mathbb{R}^{S^2 \times \frac{HW}{S^2} \times C}$. Subsequently, the query, key, and value tensors $Q, K, V \in \mathbb{R}^{S^2 \times \frac{HW}{S^2} \times C}$ are obtained through linear projections:

$$Q = X^r W^q, \quad K = X^r W^k, \quad V = X^r W^v \tag{1}$$

where $W^q, W^k, W^v \in \mathbb{R}^{C \times C}$ represent the projection weights for the query, key, and value, respectively.
The attending relationship, determining which regions should be attended to for each given region, is established by constructing a directed graph. Initially, region-level queries and keys, $Q^r, K^r \in \mathbb{R}^{S^2 \times C}$, are derived by averaging $Q$ and $K$ per region. Subsequently, the adjacency matrix $A^r \in \mathbb{R}^{S^2 \times S^2}$ of the region-to-region affinity graph is obtained through matrix multiplication between $Q^r$ and the transpose of $K^r$:

$$A^r = Q^r (K^r)^\top \tag{2}$$

Entries in the adjacency matrix $A^r$ quantify the semantic relationship between two regions. The subsequent crucial step involves pruning the affinity graph by retaining only the top-$k$ connections for each region. This is achieved by deriving a routing index matrix $I^r \in \mathbb{N}^{S^2 \times k}$ using the row-wise top-$k$ operator:

$$I^r = \mathrm{topkIndex}(A^r) \tag{3}$$

Thus, the $i$-th row of $I^r$ contains the $k$ indices representing the most relevant regions for the $i$-th region.
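The region-routing pipeline of Eqs (1)-(3) can be sketched numerically. The toy example below uses hypothetical region-level queries and keys for $S^2 = 3$ regions with 2 channels, builds the affinity matrix and the routing index matrix with a row-wise top-k (here k = 2); it is an illustrative sketch, not the BiFormer implementation:

```python
# Hypothetical region-level queries/keys (3 regions, 2 channels),
# already averaged per region as in the BRA formulation.
Qr = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Kr = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

def affinity(Qr, Kr):
    # Eq (2): A^r = Q^r (K^r)^T; entry [i][j] scores region i attending to j.
    return [[sum(q * k for q, k in zip(qrow, krow)) for krow in Kr]
            for qrow in Qr]

def topk_index(Ar, k):
    # Eq (3): row-wise indices of the k most relevant regions.
    return [sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
            for row in Ar]

Ar = affinity(Qr, Kr)
Ir = topk_index(Ar, k=2)
print(Ir)  # each row keeps only the 2 most relevant regions
```

Fine-grained token-to-token attention is then computed only within the regions each row of `Ir` selects, which is what makes the attention sparse yet content-aware.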
In the feature fusion process, RSG-YOLOV8 strategically places the BRA module subsequent to the Convolutional or Upsampling module, enabling the model to concentrate solely on specific regions post feature extraction. The primary objective of the BRA module is to eliminate irrelevant key-value pairs input at a broader regional level, retaining only pertinent domains. Initially, the BRA module takes the feature map as input, partitioning it into distinct regions and deriving queries, keys, and values through linear transformations.
Subsequently, the region-level relationship between queries and keys is fed into an adjacency matrix to construct a directed graph, precisely identifying the association of specific key-value pairs. This delineation determines the involvement of designated areas. Ultimately, utilizing the region-to-region routing index matrix facilitates multi-head self-attention between individual tokens. The double-layer path optimization of multi-head self-attention directs more attention towards the newly germinated part of the feature map of rice seeds, thereby augmenting the model’s capability to detect germinated rice seeds.
The approach proposed in this paper solely employs the attention module BRA from BiFormer, distinguishing it from existing methodologies that incorporate BiFormer into YOLOv8 [8, 22–24].
1.2.3 GFPN. FPN successfully addresses the challenge of integrating hierarchical features into convolutional neural networks, thereby enhancing the performance of object detection models across various scales. PANet further refines feature propagation and information sharing within the feature pyramid. Introducing a bottom-up pathway alongside FPN’s top-down approach, the Bi-directional Feature Pyramid Network (BiFPN) [25] effectively exploits multi-scale features. Conversely, GFPN implements dense connections and the Queen-Fusion with Concat technique to minimize information loss.
As shown in Fig 5, the Queen-Fusion connection at P5 fuses the downsampled P4 of the previous layer, the upsampled P6 of the previous layer, the P5 of the previous layer, and the P4 of the current layer. In this work, bilinear interpolation and max pooling are applied as the upsampling and downsampling functions, respectively. Consequently, under extremely large scale changes, the model must sustain sufficient information exchange between the upper and lower layers. Through its layer-skipping and cross-scale connectivity, GFPN can grow as long as the 'giraffe's neck'. With such a 'heavy neck' and a lightweight backbone, RSG-YOLOV8 strikes a balance between higher accuracy and better efficiency.
C represents the concatenation fusion style, and P′k denotes a node in the next layer.
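At the shape level, a Concat-based Queen-Fusion node can be sketched as follows: each incoming map is resampled to the target resolution (changing only its spatial size), and concatenation, rather than summation, makes the fused channel count the sum of the inputs' channels, so no input's information is collapsed away. The channel/resolution numbers below are hypothetical:

```python
def queen_fusion_shape(input_shapes, target_hw):
    """Shape arithmetic of a Concat-based Queen-Fusion node.

    Each input (C, H, W) is up/downsampled to `target_hw`, which changes
    only the spatial size; concatenation then stacks all the channels."""
    fused_c = sum(c for c, _, _ in input_shapes)
    return (fused_c, *target_hw)

# A P5-level node fusing: downsampled P4, upsampled P6, the previous P5,
# and the current-layer P4 (hypothetical channel counts).
inputs = [(128, 40, 40), (512, 10, 10), (256, 20, 20), (128, 40, 40)]
fused = queen_fusion_shape(inputs, target_hw=(20, 20))
print(fused)  # (1024, 20, 20)
```

A summation-based fusion would instead force every input to the same channel count and output that count, which is where the information loss that Concat avoids would occur.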
FPN and PANet play pivotal roles in facilitating multi-scale feature fusion within the Neck architecture of both YOLOv5 and YOLOv8. YOLOv8’s Neck incorporates the C2f module instead of C3 during the upsampling process, distinguishing it from YOLOv5. FPN extracts CNN feature maps and fuses them in a top-down manner with upsampling and coarse-grained maps, while PANet integrates bottom-up information to preserve spatial details. BiFPN, AFPN [26], and GFPN efficiently integrate features at different levels, thereby enhancing effectiveness by incorporating additional fusion levels. RSG-YOLOV8 enhances the FPN-PANet structure in YOLOv8 to improve multi-level feature fusion through multi-path network fusion [27].
1.2.4 Added detection head. The original YOLOv8 features three detection heads with grid sizes of 20×20, 40×40, and 80×80. However, these heads fail to adequately meet the detection requirements in rice seed detection scenarios, resulting in suboptimal accuracy for objects outside the original detection scales. RSG-YOLOV8 addresses this limitation by introducing an additional 160×160 detection head within the Head component. Moreover, it incorporates a novel feature fusion network structure within the Neck component to enhance detection capabilities across various object scales. Adjacent to the existing 80×80 detection scale in YOLOv8, the new detection head is added, integrating shallow information from the initial C2f (shortcut) module of the input image and combining it with an additional feature fusion network. This augmentation significantly improves the model's ability to detect objects at scales the original heads handled poorly. RSG-YOLOV8's architecture builds upon YOLOv8, integrating new modules such as BRA and CSP, alongside existing modules like Conv, C2f (shortcut), SPPF, Concat, Upsample, and Detect.
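The head grid sizes follow directly from the network's downsampling strides for a 640×640 input: the original 20×20, 40×40, and 80×80 heads correspond to strides 32, 16, and 8, and the added 160×160 head to stride 4, which preserves the finer spatial detail carried by the shallow C2f features. A one-line check:

```python
def head_grid_sizes(img_size=640, strides=(32, 16, 8, 4)):
    # Each detection head sees the input downsampled by its stride.
    return [img_size // s for s in strides]

print(head_grid_sizes())  # [20, 40, 80, 160]
```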
2. Dataset and parameter configuration
2.1 Dataset.
The dataset utilized in this study is the open-source RiceSeedGermination dataset, sourced from the Kaggle official website (https://www.kaggle.com). Comprising rice seed images from nine distinct populations, this dataset showcases wide phenotypic diversity across different strains within each population, including variations in length, shape, and color. In total, the dataset encompasses 600 rice seed images, segregated into training and testing sets at an 8:2 ratio. Within these images, a diverse array of seeds is depicted, characterized by varying sizes, shapes, and colors, alongside incidental impurities such as branch stalks, broken leaves, and rice awns. Notably, these seeds are randomly distributed throughout the images.
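The 8:2 split of the 600 images yields 480 training and 120 testing images. A minimal reproducible split could look like the following sketch, where the seed value and file-naming pattern are hypothetical:

```python
import random

def split_dataset(filenames, train_ratio=0.8, seed=42):
    # Shuffle a copy of the list, then cut at the 8:2 boundary.
    files = list(filenames)
    random.Random(seed).shuffle(files)
    cut = int(len(files) * train_ratio)
    return files[:cut], files[cut:]

# 600 hypothetical image names standing in for the RiceSeedGermination files.
images = [f"seed_{i:03d}.jpg" for i in range(600)]
train, test = split_dataset(images)
print(len(train), len(test))  # 480 120
```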
2.2 Parameter configuration.
RSG-YOLOV8 is trained and tested on an Intel® Core™ i5-10200H (2.40 GHz) CPU and an NVIDIA® GeForce GTX® 1650 12GB GPU. The software environment is the Windows version of PyCharm 2023.1. The proposed method is implemented on the YOLOv8 architecture, and the hyperparameters used to train RSG-YOLOV8 and the other comparative methods mirror those of YOLOv8. Training is configured with a batch size of 1 and 300 epochs. Optimization uses the AdamW optimizer with the initial and final learning rates set to 0.0001, a momentum of 0.937, and a network input size of 640×640 [28].
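Since the initial and final learning rates are both 0.0001, a YOLO-style linear learning-rate schedule degenerates to a constant rate over the 300 epochs. A small sketch of that schedule (the interpolation form is an assumption modeled on the common linear decay, where `lrf` is the final rate as a fraction of `lr0`):

```python
def linear_lr(epoch, epochs=300, lr0=1e-4, lrf=1.0):
    # Linearly interpolate from lr0 at epoch 0 to lr0 * lrf at the last epoch.
    frac = epoch / epochs
    return lr0 * ((1.0 - frac) + frac * lrf)

# With lr0 == lr0 * lrf == 1e-4, the rate stays flat across all 300 epochs.
print(linear_lr(0), linear_lr(150), linear_lr(300))
```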
3. Experiment and result analysis
3.1 Performance indicators.
To objectively evaluate the effectiveness of the proposed method, the paper assesses the performance of RSG-YOLOV8 using five evaluation metrics: Precision, Recall, F1-score, mAP50, and mAP50-95. Precision, Recall, and F1-score are calculated using the following formulas:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{4}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{5}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{6}$$

where TP represents the number of samples correctly classified as positive, FP the number of samples incorrectly classified as positive, and FN the number of samples incorrectly classified as negative.
$$AP = \int_0^1 P(R)\,dR \tag{7}$$

$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i \tag{8}$$

$$mAP_{50} = mAP\big|_{IoU=0.5} \tag{9}$$

$$mAP_{50\text{-}95} = \frac{1}{10}\sum_{t \in \{0.5,\,0.55,\,\ldots,\,0.95\}} mAP\big|_{IoU=t} \tag{10}$$

where AP denotes the area under the precision-recall curve for a specific category at various confidence thresholds, and N is the number of categories. mAP represents the mean average precision, calculated by averaging the AP values across all categories. mAP50 refers to the mAP computed with an IoU threshold of 0.5, while mAP50-95 refers to the mAP averaged over IoU thresholds ranging from 0.5 to 0.95 in steps of 0.05, providing a more stringent evaluation metric.
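The Precision, Recall, and F1-score formulas can be checked with a small worked example; the detection counts below are hypothetical:

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(p, r):
    return 2 * p * r / (p + r)

# Hypothetical counts for one category of germinated-seed detections.
tp, fp, fn = 90, 10, 6
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 4), round(r, 4), round(f1_score(p, r), 4))
```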
3.2 Ablation experiments.
To ascertain the efficacy of the proposed strategy in augmenting the performance of deep neural networks, this paper conducts an exhaustive series of ablation experiments. These experiments are designed to gauge the individual impact of each component within the model on the overall performance metrics. Through this systematic evaluation, this paper aims to elucidate the theoretical underpinnings and practical advantages of the proposed solution.
3.2.1 Ablation experiments of the overall structure. Through the removal of each component, the study assessed four incomplete RSG-YOLOV8 models. As illustrated in Fig 6, it is evident that BRA, GFPN, the added detection head, and Generalized Intersection over Union (GIoU) [29] all contribute to enhancing the accuracy of RSG-YOLOV8. ‘w/o GFPN’ denotes the utilization of the original Neck structure, FPN-PANet, from YOLOv8. Notably, the inclusion of the added detection head yields the most substantial improvement in overall accuracy, particularly for mAP50, followed by GFPN and BRA.
‘w/o’ represents without.
3.2.2 Comparison of different multi-scale feature fusion structures. This paper presents a comparative analysis of the proposed RSG-YOLOV8 with BRSG-YOLOV8 and ARSG-YOLOV8. In BRSG-YOLOV8 and ARSG-YOLOV8, the GFPN in the Neck component of RSG-YOLOV8 is substituted with BiFPN and AFPN for feature fusion. As depicted in Fig 7, RSG-YOLOV8 utilizing the GFPN structure exhibits significantly higher mAP50 and Recall metrics compared to models employing BiFPN and AFPN structures.
Replace the GFPN structure in the Neck part of RSG-YOLOV8 with BiFPN and AFPN. The best results are displayed in bold.
3.2.3 Comparison of different attention mechanisms. This study employed the RSG-YOLOV8 model to investigate various attention mechanisms. The initials of the model names outlined in Table 1 denote the attention mechanisms utilized, specifically S, E, C, A, R, and B, representing SE, ECA, CBAM, CA, RFA, and BRA, respectively. Among the alternative attention mechanisms considered, BRA yielded the most substantial performance enhancement. Additionally, CBAM (i.e. CRSG-YOLOV8) demonstrated an mAP50 second only to BRA (i.e. RSG-YOLOV8), with its accuracy values surpassing those of BRA. Despite ECA (i.e. ERSG-YOLOV8) exhibiting a higher mAP50-95 compared to BRA, its mAP50 notably lagged behind that of BRA.
Replace BRA in RSG-YOLOV8 with SE, ECA, CBAM, CA, RFA respectively. The best results are highlighted in bold.
3.2.4 Comparison of different regression losses. This study conducted ablation experiments to assess the impact of various regression losses in object detection, including GIoU, which penalizes predictions using the smallest axis-aligned rectangle enclosing both boxes, Distance-IoU (DIoU), Efficient-IoU (EIoU), Scylla-IoU (SIoU) [30], and Wise-IoU (WIoU) v3 [23, 31]. These loss functions are denoted by the letter added to the model names in Table 2: G, D, E, S, and W. In comparison to the other regression losses, the original regression loss of YOLOv8, Complete Intersection over Union (CIoU), demonstrates superior robustness for bounding boxes.
Replacing the CIoU in RSG-YOLOV8 with GIoU, DIoU, EIoU, SIoU, and WIoU, respectively. The best results are highlighted in bold.
The mAP50 of DIoU (RSGD-YOLO) closely aligns with that of CIoU (RSG-YOLOV8), suggesting DIoU as a viable competitor to CIoU. Conversely, for mAP50-95, GIoU (RSGG-YOLO) outperforms CIoU. The choice of regression loss depends on specific scenario criteria. In this study, mAP50 serves as the primary indicator for detecting the germination rate of rice seeds. Consequently, CIoU is selected as the regression loss in the proposed RSG-YOLOV8.
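The property that distinguishes GIoU from plain IoU — a penalty derived from the smallest enclosing box, which gives a useful gradient even for non-overlapping boxes — can be seen in a short sketch with axis-aligned boxes (the coordinates are hypothetical):

```python
def iou_giou(a, b):
    """IoU and GIoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    # Intersection rectangle (clamped to zero if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    iou = inter / union
    # Smallest box enclosing both a and b.
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    enclose = (ex2 - ex1) * (ey2 - ey1)
    giou = iou - (enclose - union) / enclose
    return iou, giou

a, b = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
i, g = iou_giou(a, b)
print(i, g)  # GIoU is lower than IoU whenever the enclosing box has slack
```

DIoU, EIoU, SIoU, WIoU, and CIoU each replace or augment this enclosing-box penalty with distance, aspect-ratio, or focusing terms, which is what the ablation in Table 2 compares.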
3.3 Comparison of the results of different models.
To make a fair comparison, this study chose competitive models and evaluated them using the same evaluation metrics. As shown in Table 3, compared to YOLOv8, RSG-YOLOV8 improves Precision, Recall, mAP50, and mAP50-95 by 0.5%, 7.1%, 4%, and 5.8%, respectively. RSG-YOLOV8 not only surpasses the baseline YOLOv8 model but also outperforms the YOLOv5, YOLO-r, and YOLOv7 models.
The best results are highlighted in bold.
Fig 8 illustrates the dynamic trends of various evaluation metrics throughout the training process of the YOLOv5, YOLOv7, YOLOv8, YOLO-r, and RSG-YOLOV8 models.
In Fig 8(A), the fluctuation of Precision among the models with epochs is depicted. Notably, RSG-YOLOV8 initially exhibits lower Precision, gradually ascending to its peak at the 300th epoch. YOLOv7 and YOLOv8 demonstrate parallel increases in Precision, with YOLOv8 surpassing YOLOv7 in later epochs. YOLOv5, represented by the black line, initially presents lower Precision, with marginal improvement over time and YOLO-r model has the fastest convergence speed.
Fig 8(B) showcases the evolution of Recall over epochs, mirroring the trends observed in Precision. RSG-YOLOV8 initiates with a subdued Recall, progressively augmenting thereafter. Similarly, YOLOv7, YOLO-r, and YOLOv8 exhibit consistent growth in Recall. Conversely, YOLOv5 demonstrates a more stabilized Recall progression.
Fig 8(C) delineates the variations in mAP50 over epochs. Initially, RSG-YOLOV8 displays a notably high mAP50, which continues to ascend steadily throughout the training duration, culminating in a significantly elevated level. While YOLOv8’s initial mAP50 slightly lags behind YOLOv7, it experiences rapid enhancement, eventually either matching or surpassing YOLOv7’s mAP50 in later epochs. YOLOv5 commences with a slightly higher initial mAP50, yet its growth rate decelerates over subsequent epochs.
Fig 8(D) illustrates the dynamics of mAP50-95 across epochs. Initially, with fewer epochs, minimal discrepancies in mAP50-95 are discernible among the models; however, as epochs progress, distinctions become more pronounced. The blue line representing YOLOv8 depicts a relatively erratic mAP50-95 trend throughout iterations, gradually improving with increasing epochs. Notably, RSG-YOLOV8 exhibits exceptional mAP50-95 performance with a larger number of epochs, surpassing its counterparts. While the mAP50-95 curves of YOLOv5 and YOLOv7 remain closely aligned, as epochs accumulate, YOLOv7 marginally outperforms YOLOv5 in terms of mAP50-95.
Fig 8(E) shows the dynamics of F1-score over epochs. YOLOv5's F1-score rises smoothly with increasing epochs, demonstrating a stable training process. YOLOv7 behaves similarly to YOLOv5 but improves faster, reaching a higher F1-score at certain epochs. RSG-YOLOV8 does not start with a high F1-score, but as epochs increase its F1-score rises markedly, eventually surpassing the other YOLO-series models. YOLOv8 maintains a high F1-score throughout training, showing good stability and generalization ability. YOLO-r's F1-score fluctuates more, yet it also generally improves as epochs accumulate, though not as significantly as the other models.
Fig 9 illustrates the performance comparison among the YOLOv5, YOLOv7, YOLOv8, YOLO-r, and RSG-YOLOV8 models in germinated rice seed detection, with red prediction boxes denoting germinated rice seeds labeled as 'yes'. The images provide a clear depiction of each model's performance on the same scene and objects. Fig 9(A) and 9(G) depict original images randomly selected from the dataset. Fig 9(C) and 9(I) showcase the results recognized by YOLOv7, which already identifies some germinated rice seeds with relatively clear bounding boxes. Fig 9(B) and 9(H) present the outcomes of YOLOv5 recognition, showing further improvement over YOLOv7 in both object recognition and bounding box accuracy. Fig 9(D) and 9(J) display the recognition results of YOLOv8, which markedly elevate recognition effectiveness: nearly all germinated rice seeds are accurately identified, with bounding boxes aligning closely with the actual seed boundaries. Fig 9(E) and 9(K) showcase the results recognized by YOLO-r, which, although it identifies many germinated seeds, still produces false and missed detections. Lastly, Fig 9(F) and 9(L) demonstrate the results of RSG-YOLOV8 recognition, representing the latest advancement among these versions and showcasing exceptional performance: it successfully identifies all objects with near-perfect bounding box delineation.
Conclusion
The RSG-YOLOV8 model, a novel variant of YOLOv8 tailored specifically for precise detection of rice seed germination in images, is introduced in this paper. Through meticulous optimization of the GFPN feature fusion structure and incorporation of the BRA attention mechanism, along with the addition of a detection head, the target detection capabilities of YOLOv8 are markedly enhanced by RSG-YOLOV8. These enhancements facilitate weighted feature fusion across multiple levels and diverse scales, culminating in the generation of high-quality anchor boxes with a dynamic focusing mechanism. The effectiveness of the proposed algorithm is rigorously validated using the RiceSeedGermination dataset. Experimental results demonstrate the superior detection accuracy of the RSG-YOLOV8 algorithm, with an mAP50 of 0.981, representing a noteworthy 4% improvement over the original YOLOv8 model. Furthermore, RSG-YOLOV8 surpasses alternative comparative experiments, achieving a Recall of 0.968, which is notably 7.1% higher than the baseline YOLOv8 model without any enhancements.
References
- 1. Škrubej U, Rozman Č, & Stajnko D (2015) Assessment of germination rate of the tomato seeds using image processing and machine learning. European Journal of Horticultural Science 80:68–75.
- 2. Guo Y, et al. (2021) Automatic and Accurate Calculation of Rice Seed Setting Rate Based on Image Segmentation and Deep Learning. Frontiers in Plant Science 12. pmid:34970287
- 3. Ye S, et al. (2023) SY-Net: A Rice Seed Instance Segmentation Method Based on a Six-Layer Feature Fusion Network and a Parallel Prediction Head Structure. Sensors 23(13):6194. pmid:37448042
- 4. Zhang Y, Huang H, Xiong B, & Ma Y (2023) An automated method for the assessment of the rice grain germination rate. PLOS ONE 18(1):e0279934. pmid:36595528
- 5. Tan S, Ma X, Mai Z, Qi L, & Wang Y (2019) Segmentation and counting algorithm for touching hybrid rice grains. Computers and Electronics in Agriculture 162:493–504.
- 6. Zhao J, et al. (2023) Deep-learning-based automatic evaluation of rice seed germination rate. Journal of the science of food and agriculture 103(4):1912–1924. pmid:36335532
- 7. Wang C, et al. (2019) CSPNet: A New Backbone that can Enhance Learning Capability of CNN. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW):1571–1580.
- 8. Zhu L, Wang X, Ke Z, Zhang W, & Lau RWH (2023) BiFormer: Vision Transformer with Bi-Level Routing Attention. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR):10323–10333.
- 9. Jiang Y, et al. (2022) GiraffeDet: A Heavy-Neck Paradigm for Object Detection. ArXiv 2202.04256.
- 10. Appe SN, Arulselvi G, & Balaji G (2023) Detection and Classification of Dense Tomato Fruits by Integrating Coordinate Attention Mechanism With YOLO Model. Handbook of Research on Deep Learning Techniques for Cloud-Based Industrial IoT, (IGI Global), pp 278–289.
- 11. Venkatesan R & Balaji GN (2024) Balancing composite motion optimization using R-ERNN with plant disease. Applied Soft Computing 154:111288.
- 12. Zhu R, Hao F, & Ma D (2023) Research on Polygon Pest-Infected Leaf Region Detection Based on YOLOv8. Agriculture 13(12):2253.
- 13. Yang G, Wang J, Nie Z, Yang H, & Yu S (2023) A Lightweight YOLOv8 Tomato Detection Algorithm Combining Feature Enhancement and Attention. Agronomy 13(7):1824.
- 14. Li S, et al. (2023) A Glove-Wearing Detection Algorithm Based on Improved YOLOv8. Sensors 23(24):9906. pmid:38139751
- 15. Zhang Z, Tan L, & Tiong RLK (2024) RFAConv: Innovating Spatial Attention and Standard Convolutional Operation. Sensors 24(3):727.
- 16. Jia R, et al. (2024) Underwater Object Detection in Marine Ranching Based on Improved YOLOv8. Journal of Marine Science and Engineering 12(1):55.
- 17. Hu J, Shen L, Albanie S, Sun G, & Wu E (2020) Squeeze-and-Excitation Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 42:2011–2023. pmid:31034408
- 18. Woo S, Park J, Lee J-Y, & Kweon I-S (2018) CBAM: Convolutional Block Attention Module. ArXiv 1807.06521.
- 19. Wang Q, et al. (2019) ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR):11531–11539.
- 20. Hou Q, Zhou D, & Feng J (2021) Coordinate Attention for Efficient Mobile Network Design. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR):13708–13717.
- 21. Zhang X, et al. (2023) RFAConv: Innovating Spatial Attention and Standard Convolutional Operation. ArXiv 2304.03198.
- 22. Wang G, et al. (2023) UAV-YOLOv8: A Small-Object-Detection Model Based on Improved YOLOv8 for UAV Aerial Photography Scenarios. Sensors 23(16):7190. pmid:37631727
- 23. Tong Z, Chen Y, Xu Z, & Yu R (2023) Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. ArXiv 2301.10051.
- 24. Zhang Y, et al. (2023) Improved YOLOv8 Insulator Fault Detection Algorithm Based on BiFormer. 2023 IEEE 5th International Conference on Power, Intelligent Computing and Systems (ICPICS), (IEEE), pp 962–965.
- 25. Tan M, Pang R, & Le QV (2019) EfficientDet: Scalable and Efficient Object Detection. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR):10778–10787.
- 26. Yang G, et al. (2023) AFPN: Asymptotic Feature Pyramid Network for Object Detection. 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC):2184–2189.
- 27. Xu X, et al. (2022) DAMO-YOLO: A Report on Real-Time Object Detection Design. ArXiv 2211.15444.
- 28. Haridasan A, Thomas J, & Raj ED (2022) Deep learning system for paddy plant disease detection and classification. Environmental Monitoring and Assessment 195(1):120. pmid:36399232
- 29. Rezatofighi SH, et al. (2019) Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR):658–666.
- 30. Gevorgyan Z (2022) SIoU Loss: More Powerful Learning for Bounding Box Regression. ArXiv 2205.12740.
- 31. Zhang H, Wang Y, Dayoub F, & Sunderhauf N (2020) VarifocalNet: An IoU-aware Dense Object Detector. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR):8510–8519.