
Research on rice disease recognition based on improved SPPFCSPC-G YOLOv5 network

  • Bo Yang ,

    Contributed equally to this work with: Bo Yang, Jinping He

    Roles Conceptualization, Supervision, Validation, Writing – original draft, Writing – review & editing

    Affiliation School of Information Engineering, Changchun University of Finance and Economics, Changchun, China

  • Lina Zhang ,

    Roles Investigation, Validation, Writing – review & editing

    zhangln@jlau.edu.cn

    Affiliation School of Information Technology, Jilin Agricultural University, Changchun, China

  • Jinping He

    Contributed equally to this work with: Bo Yang, Jinping He

    Roles Investigation, Validation, Writing – original draft

    Affiliation School of Information Engineering, Changchun University of Finance and Economics, Changchun, China

Abstract

Spatial Pyramid Pooling (SPP) is important for capturing long-range contextual information in pixel-level prediction tasks such as scene-level detection of rice diseases. The detection targets in the rice disease dataset used in this paper are almost uniform in size and do not need to pass through different filters to obtain different receptive fields. This paper therefore proposed a new pooling structure, SPPFCSPC-G, which split the feature vector into two channels for processing. One channel was processed using grouped 1×1 Conv, while the other mainly used multiple identical parallel filters (5×5 MaxPool); in addition, multiple 1×1 and 3×3 grouped convolutions (Group-Conv) were connected in series in that branch to extract more complex rice features. Finally, the two parts were concatenated (Concat), with each convolutional layer divided into 4 groups to reduce the amount of computation in the model. The project team incorporated SPPFCSPC-G into the Backbone of YOLOv5 and trained it on an NVIDIA Tesla T4 (GPU). The experimental results showed that the method improved Precision, Recall, mAP, and training speed, while reducing the number of parameters (Parameters), computational volume (GFLOPs), and model size (Param.). The project team then ran the trained YOLOv5 model on an Intel Core i5 (CPU) for inference detection of rice leaves in real scenarios; experiments showed that both pre-processing and actual inference were faster, computational resource consumption was nearly minimal, and the model effectively identified rice diseases.

Introduction

Rice, an annual aquatic herbaceous plant in the grass family, is an important cereal crop in tropical Asia. More than half of the world’s population depends on rice as their main food source, and consumer demand for high-quality rice is increasing. However, rice cultivation is often plagued by a variety of diseases [1,2], which severely reduce crop yields and cause significant economic losses [3]. Therefore, the identification and management of rice diseases are essential to ensure sustainable agriculture and food security. Rapid and accurate identification of the different types of rice diseases, along with appropriate follow-up solutions and treatment plans, is necessary to prevent further spread and minimize losses.

Detecting objects in an image is one of the fundamental tasks in today’s agricultural detection and classification fields and is often the starting point for many real-world applications. These include applying image recognition techniques from computer vision [4–8] to rice leaf pest recognition [4], apple disease recognition [5], citrus fruit recognition [6], plant leaf disease recognition [7,8], and so on.

The widely used deep learning based target detection algorithms can be categorized into two groups: (1) two-step target detection algorithms, such as Fast R-CNN (Region-based Convolutional Neural Network) [4], Faster R-CNN [9], Mask R-CNN [10], and so on. These models divide target detection into two phases: a Region Proposal Network (RPN) first extracts candidate target region information, a classifier then performs classification, and finally a detection network completes the prediction and identification of candidate region categories and locations. (2) Single-step target detection algorithms, such as R-FCN (Region-based Fully Convolutional Network) [11], SSD (Single Shot MultiBox Detector) [7], and YOLO (You Only Look Once) [12–19]. These algorithms integrate candidate region extraction and classification into a single network, computing category scores and positional offsets directly from the convolutional features. The YOLO series YOLOv1–v3 [12–14] pioneered one-stage object detectors, and YOLOv4 [15] divided the target detection framework into four independent parts: Input, BackBone, Neck, and Head. The subsequent YOLOv5/v6/v7/v8 [16–19] have greatly improved detection speed and accuracy, and they are all candidates for the deployment of efficient detectors.

The SPP module was proposed by Kaiming He et al [20]. It effectively avoids the image distortion caused by cropping and scaling operations on image regions, solves the problem of repeated feature extraction by CNNs, and speeds up the generation of candidate boxes. In 2017, Liang-Chieh Chen et al [21] proposed the ASPP module in the semantic segmentation model DeepLabv2, using multiple parallel filters with different dilation rates for multi-scale feature extraction. In 2018, Songtao Liu et al [22] constructed the RFB module by combining multiple branches with different kernels and a dilated Conv: the multiple kernels are analogous to population receptive fields (pRFs) of different sizes, the dilated convolutional layer assigns a separate eccentricity to each branch to mimic the ratio between pRF size and eccentricity, and a 1×1 Conv is concatenated across all branches to produce the final feature map. In 2020, Jocher Glenn [16] proposed SPPF in the open-source YOLOv5 project: in the pooling process, two serial identical small-kernel pooling operations replace one large kernel, which makes the model much less computationally intensive and greatly improves its speed. At CVPR 2020, Qibin Hou et al [23] changed the convolution to a 1×N or N×1 narrow convolution, proposing the concept of Strip Pooling, which inherits the advantage of global average pooling in collecting long-range dependencies while paying more attention to local details. In 2022, Chuyi Li et al [17] replaced the SiLU activation function of the convolutional layers in SPPF with ReLU in YOLOv6, from which SimSPPF was proposed. At CVPR 2023, Chien-Yao Wang et al [18] proposed the SPPCSPC structure in YOLOv7, which, on the basis of the original SPP module, replaces the activation function in the convolutional layers with ReLU or adds the Sigmoid function after the convolutional layer, resulting in two new convolutional structures, CMS/CBM, which capture feature information at different scales through four different fields of view (5, 9, 13 MaxPool, 1 Conv).

In this study, the SPPF component of the original YOLOv5 structure is modified into SPPFCSPC-G with convolutional grouping introduced, and the superiority of the module is demonstrated through various experiments. Meanwhile, the question of whether an optimizer’s performance on a home-made dataset depends on the theoretical derivation of its computation or on the data itself is explored.

Data collection and processing

To ensure the inclusiveness and comprehensiveness of the rice disease dataset, three common rice diseases, including leaf blight, brown spot and rice blast, and healthy leaves were selected to construct the dataset in this study. A total of 476 original images were collected with a resolution of 300 pixels × 300 pixels or 300 pixels × 500 pixels.

Because diseased rice leaves in natural environments appear at varying times and at random, collecting complete samples of diseased leaves can be challenging. To expand the sample size, improve the accuracy of model training, and prevent overfitting of the convolutional neural network, this study employs data enhancement techniques such as geometric transformations, HSV color processing, variable color temperature, Gaussian blur, and salt-and-pepper noise. Using these techniques, the rice disease dataset was doubled, and the final training set contained about 800 images. The sample dataset is divided into training, validation, and test sets in the ratio 8:1:1. Fig 1 shows image-enhanced leaf blight samples.
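Several of the enhancement operations listed here are simple array transforms. A minimal NumPy sketch of two of them, salt-and-pepper noise and horizontal flipping, is shown below; the function names and parameters are illustrative, not the exact augmentation pipeline used in this study:

```python
import numpy as np

def salt_and_pepper(img, amount=0.02, seed=None):
    """Corrupt a random fraction `amount` of pixels with salt (255) or pepper (0) noise."""
    rng = np.random.default_rng(seed)
    noisy = img.copy()
    mask = rng.random(img.shape[:2]) < amount   # which pixels to corrupt
    salt = rng.random(img.shape[:2]) < 0.5      # salt vs pepper, 50/50
    noisy[mask & salt] = 255
    noisy[mask & ~salt] = 0
    return noisy

def horizontal_flip(img):
    """Mirror the image left to right; flipping twice restores the original."""
    return img[:, ::-1].copy()
```

Each transform preserves image shape and labels, which is what allows the dataset to be doubled without re-annotation.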

Fig 1. Data enhancement processing.

(a) Original image, (b) HSV, (c) Non-uniform scaling, (d) Uniform scaling, (e) Color temperature adjustment, (f) Gaussian blurring, (g) Gaussian noise, (h) Salt-and-pepper noise, (i) Average pooling, (j) Horizontal and vertical flip, (k) Random rectangle occlusion, (l) Adaptive histogram equalization.

https://doi.org/10.1371/journal.pone.0295661.g001

Rice leaf disease identification based on YOLOv5 model

Construction of YOLOv5 model

YOLOv5 [16], a powerful real-time target detection framework that inherits from YOLOv4 [15], has good interference resistance and robustness; it can effectively extract image features and learn the feature information implied in the rice disease dataset to achieve better recognition results. The single-stage object detector YOLOv5 generally consists of four basic parts: Input, BackBone, Neck, and Head. The Input stage uses Mosaic data enhancement together with adaptive anchor box computation and adaptive image scaling. The main structures in the BackBone, CSPDarkNet53, are the Focus, CBL, C3, and SPP modules; the BackBone determines the feature representation capability and is mainly responsible for feature extraction from the input image. The Neck performs multi-scale fusion of low-level physical features with high-level semantic features, establishes pyramidal feature maps at all levels, and passes these features to the prediction end. The Head, the prediction end, consists of several convolutional layers and predicts the detection results from the multi-level features assembled by the Neck for final regression prediction. A typical YOLOv5s is shown in Fig 2.

Spatial Pyramid Pooling study

Classical SPP structure.

SPP (Spatial Pyramid Pooling) [20] acts as a standard pooling layer. SPP accepts an image of any size and extracts spatial feature information at different scales through a set of standard pooling operations, forming a fixed-size feature map that can be used as input to the fully connected layer. This improves the robustness of the model to spatial layout and object variability. The core pooling operation of SPP in YOLOv5 is a set of (5×5, 9×9, 13×13) MaxPool operations applied in parallel to the same input, so each branch pools only once.

SPPF (Spatial Pyramid Pooling Faster) [16] serves the same purpose as SPP, except that SPPF replaces the large 9×9 and 13×13 kernels of the original SPP by chaining three identical 5×5 MaxPool layers in series, as shown in the following figure: two consecutive 5×5 pools cover the same receptive field as one 9×9 pool, and three cover the same field as one 13×13 pool. The repeated pooling of intermediate outputs yields richer feature fusion, while the computation of the model becomes much smaller and model speed is greatly improved.
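The receptive-field equivalence behind SPPF can be checked with a small 1-D sketch: a stride-1 max pool with an edge-clipped window of width 5, applied twice, covers the same window as one width-9 pool, and applied three times matches a width-13 pool. This is illustrative pure Python, not the YOLOv5 implementation:

```python
def maxpool1d(x, k):
    """Stride-1 max pool: output[i] is the max over a width-k window centred at i, clipped at the edges."""
    r = k // 2
    n = len(x)
    return [max(x[max(0, i - r):i + r + 1]) for i in range(n)]

signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 2]

# Two chained 5-wide pools equal one 9-wide pool; three equal one 13-wide pool.
assert maxpool1d(maxpool1d(signal, 5), 5) == maxpool1d(signal, 9)
assert maxpool1d(maxpool1d(maxpool1d(signal, 5), 5), 5) == maxpool1d(signal, 13)
```

The speedup comes from the small kernels: three 5-wide windows inspect far fewer elements per position than 5-, 9-, and 13-wide windows computed independently.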

SimSPPF (Simplified SPPF) [17] only replaces the activation function SiLU of the convolutional layers with ReLU in the SPPF of YOLOv5; no other adjustments are made. The structural information of the specific modules is shown in Fig 3.

Improved SPP structure.

ASPP (Atrous SPP) [21] applies the SPP idea to semantic segmentation, combining atrous (dilated) convolution to enlarge the receptive field of the kernel without loss of resolution (no downsampling followed by upsampling). ASPP processes the input with a 1×1 Conv and with three parallel 3×3 convolutional layers whose padding and dilation rates are 6, 12, and 18. In a separate branch, the input is globally pooled down to 1×1, reduced in dimension with a 1×1 convolution, and finally upsampled back to the original input size; the branches are then concatenated.

SPPCSPC (SPP Cross Stage Partial Channel) [18] consists of multiple CBS modules combined with three MaxPool layers whose receptive fields are (5×5, 9×9, 13×13). Each CBS module is formed by a Conv layer, a BN layer, and the SiLU activation function.

The SPPFCSPC-G (SPPF Cross Stage Partial Channel-Group) proposed in this paper is composed of multiple grouped convolutional blocks (1×1 and 3×3), with three serial 5×5 MaxPool layers of identical receptive field connected in parallel with the other components for the secondary pooling operation. Grouped convolution replaces the ordinary convolutions in SPPFCSPC-G: the input feature maps are divided equally by channel into g = 4 groups, so the number of input channels per group becomes Cin/g, and the number of channels within each convolutional kernel is likewise reduced to Cin/g. For a kernel of size Kernel = K1 × K2, the number of operations of the entire grouped convolution therefore drops to

(1) FLOPs = (Cin/g) × K1 × K2 × Cout × Of

where Of is the number of positions at which one convolutional kernel is applied over the complete output feature map, and the parameter count of the entire convolution falls by the same factor g. The channels within each group perform independent convolution operations, so the computation can be spread across the groups, reducing the overall cost. Specific experimental data for parameters such as FLOPs can be found in the section Experimental analysis of computational parameters.
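The parameter and FLOP reduction from grouping can be sanity-checked with the standard counting formulas in a few lines of Python; this is a generic sketch (channel counts are illustrative), not code from this project:

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution: each of the c_out filters sees only c_in/groups channels."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * k * k * c_out

def conv_flops(c_in, c_out, k, h_out, w_out, groups=1):
    """Multiply count over a full h_out x w_out output feature map (bias ignored)."""
    return conv_params(c_in, c_out, k, groups) * h_out * w_out

# A 3x3 conv over 256 -> 256 channels with g = 4 uses a quarter of the
# weights and multiplies of its ungrouped counterpart.
assert conv_params(256, 256, 3, groups=4) * 4 == conv_params(256, 256, 3)
assert conv_flops(256, 256, 3, 20, 20, groups=4) * 4 == conv_flops(256, 256, 3, 20, 20)
```

The 1/g scaling applies per grouped layer; the whole-model reductions reported later are smaller because only some layers are grouped.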

Because YOLOv5 contains a large number of convolution and pooling operations, the parallel processing and strong floating-point performance of GPUs can quickly complete the computation, reading, and writing of data. Therefore, we train each proposed model on the rice disease dataset using a Tesla T4. SPPFCSPC-G, grouped convolution, and the other ideas for improving the SPP module are visualized in Fig 4.

Fig 4. Improved SPP structure image.

(a) ASPP, (b) SPPCSPC, (c) SPPFCSPC-G.

https://doi.org/10.1371/journal.pone.0295661.g004

Optimization

Stochastic Gradient Descent (SGD).

SGD [24] updates all parameters with the same learning rate, performing a gradient descent update on a single sample at each step. Unlike batch gradient descent (BGD), it does not recompute the gradient over the whole dataset, which offsets the redundant computation caused by large datasets and makes training faster. The drawback is that individual iterations do not always move toward the global optimum, which can reduce accuracy; the advantage is that with a large number of samples, only a fraction of them may be needed to iterate to a near-optimal solution. The SGD gradient update rule is:

(2) θ = θ − η·∇θ J(θ; x(i), y(i))

where x(i) is a training example, y(i) its label, and η the learning rate.
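The per-sample update can be sketched in a few lines of Python; this is illustrative only, since real training uses an optimizer implementation inside the YOLOv5 codebase:

```python
import math

def sgd_step(theta, grad, lr=0.01):
    """One plain SGD update: theta <- theta - lr * gradient of the single-sample loss."""
    return [t - lr * g for t, g in zip(theta, grad)]

theta = [0.5, -0.3]
theta = sgd_step(theta, grad=[0.2, -0.1], lr=0.1)
assert math.isclose(theta[0], 0.48) and math.isclose(theta[1], -0.29)
```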

Adaptive Moment Estimation (Adam).

Adaptive Moment Estimation (Adam) [25] is a widely used optimization algorithm that combines the advantages of momentum and an adaptive learning rate. It calculates an adaptive learning rate for each parameter from first- and second-order moment estimates of the gradient, dynamically adjusting the learning rate and correcting its bias. The computational procedure of Adam can be summarized as follows:

(3) mt = β1·mt−1 + (1 − β1)·gt

(4) vt = β2·vt−1 + (1 − β2)·gt²

Because mt and vt are initialized as zero vectors, they are biased towards zero, so the first- and second-moment estimates are bias-corrected using m̂t = mt/(1 − β1^t) and v̂t = vt/(1 − β2^t) to offset these biases. Substituting these corrected estimates, the parameters are continuously updated to obtain Adam’s gradient update rule:

(5) θt = θt−1 − η·m̂t/(√v̂t + ε)

mt — first-order moment estimate, the exponentially decaying average of the gradient direction at each moment.

vt — second-order moment estimate, an estimate of the uncentered variance of the gradient.

Setting of hyperparameters: first-order momentum decay coefficient β1 = 0.9; second-order momentum decay coefficient β2 = 0.999; equilibrium factor ε = 10^−8 to keep the denominator numerically stable.
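One Adam step with these hyperparameters can be sketched as follows; this is a generic illustration of the algorithm, not the optimizer code used in training:

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; returns new parameters and updated moment estimates (t starts at 1)."""
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = b1 * mi + (1 - b1) * g          # first-moment (momentum) estimate
        vi = b2 * vi + (1 - b2) * g * g      # second-moment (uncentred variance) estimate
        m_hat = mi / (1 - b1 ** t)           # bias correction
        v_hat = vi / (1 - b2 ** t)
        new_theta.append(th - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v
```

Note that on the very first step the bias correction makes the update magnitude approximately equal to the learning rate, regardless of the raw gradient scale.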

Loss

The final loss function of YOLOv5, Lall = λ1Lcls + λ2Lbox + λ3Lobj, is a weighted sum of the class probability loss Lcls, the bounding box loss Lbox, and the objectness loss Lobj. Lobj and Lcls are computed with the BCEWithLogits loss: Lobj is computed over all samples, while Lcls is computed over the positive samples only, as shown in Eq (6).

(6) LBCE = −[y·log σ(x) + (1 − y)·log(1 − σ(x))]

The localization loss Lbox is calculated using the CIoU loss [26] over the positive samples only, as shown in Eqs (7)–(9).

(7) Lbox = 1 − IoU + ρ²(bpre, bgt)/c² + α·v

(8) v = (4/π²)·(arctan(wgt/hgt) − arctan(wpre/hpre))²

(9) α = v/((1 − IoU) + v)

λ1, λ2, λ3 — weights of the multi-task loss function

σ(x) — the Sigmoid function, σ(x) = 1/(1 + e^−x)

b, w, h — center point, width, and height of a box

pre, gt — the predicted box and the ground truth box

ρ — Euclidean distance between the centers of the predicted and ground truth boxes

c — diagonal length of the minimum bounding box that encloses both the predicted and ground truth boxes

v — normalized difference between the aspect ratios of the predicted box and the ground truth box

α — balance factor that weights the loss caused by the aspect ratio against the loss caused by the IoU part
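The CIoU term of Eqs (7)–(9) can be computed directly for axis-aligned boxes; the sketch below is a generic implementation for (x1, y1, x2, y2) boxes, not the YOLOv5 loss code itself:

```python
import math

def ciou(box_p, box_g):
    """CIoU between two (x1, y1, x2, y2) boxes: IoU minus centre-distance and aspect-ratio penalties."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    # Intersection and union areas.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union
    # Squared centre distance over squared enclosing-box diagonal (rho^2 / c^2).
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term v and its weight alpha.
    v = (4 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1))
                              - math.atan((px2 - px1) / (py2 - py1))) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)
    return iou - rho2 / c2 - alpha * v

# Identical boxes: CIoU = 1, so the loss 1 - CIoU = 0; disjoint boxes score below zero.
assert abs(ciou((0, 0, 2, 2), (0, 0, 2, 2)) - 1.0) < 1e-9
assert ciou((0, 0, 1, 1), (2, 2, 3, 3)) < 0
```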

Experimental results and analysis

Experimental analysis of SPP

The SPP module connects the BackBone and the Neck in the overall YOLOv5s network structure. In most of the compared models, the YOLOv5s BackBone was CSPDarkNet53; the project team also compared the performance of YOLOv5s with ResNet-34/VGG-16 and other SPP modules embedded, as shown in Table 1. From Table 1, it can be seen that the method proposed in this paper performed best.

Table 1. Performance comparison of various SPP structures.

https://doi.org/10.1371/journal.pone.0295661.t001

In the context of the Adam optimizer, the SPPFCSPC-G model achieved superior performance across all parameter metrics, including Precision, Recall, mAP_0.5, and mAP_0.5:0.95, surpassing all other SPP structures. The SPPFCSPC-G model demonstrated significant improvements in all metrics compared to the second-ranked model in each category, with increases of 1.0% for Precision, 1.2% for Recall, 1.2% for mAP_0.5, and 0.8% for mAP_0.5:0.95. Moreover, the SPPFCSPC-G model exhibited an average training speed 12.4% faster than the second-ranked SPPCSPC model. This means that SPPFCSPC-G only required 43.70 seconds to complete the entire process of initializing parameters, forward propagation, loss computation, backward propagation, and parameter updates for 800 images.

With the SGD optimizer, the SPPFCSPC-G model also demonstrated the best Precision and Recall, eventually converging to around 96.20% and 99.22%, respectively. The mAP_0.5 metric did not show significant differences among the nine models, and mAP_0.5:0.95 decreased by 0.2%. Nevertheless, when considering the other parameters, such as Loss, F1_Score, Confusion Matrix, Param., Parameters, GFLOPs, and the actual detection performance, the decrease in training speed can be considered negligible. Detailed results are provided in the subsequent sections.

Experimental analysis of optimizer

Comparison of recognition precision performance.

Adam, an amalgamation of SGD with momentum (SGD-M) and RMSprop, combines the advantages of momentum and an adaptive learning rate, making it an optimization method that calculates an adaptive learning rate for each parameter. The theoretical derivation suggests that it should outperform SGD. However, the performance of an optimizer on a dataset depends not only on theory but also significantly on the characteristics of the data itself. Comparing Tables 1 and 2, it is evident that on the rice disease dataset, SGD significantly outperformed Adam on the various metrics (Precision, Recall, mAP_0.5), improving by over 25% in nearly all cases; the difference in mAP_0.5:0.95 reached 50%. This indicates that, in specific scenarios, the characteristics of the data can play a crucial role in determining the performance of the optimizer.

Table 2. Comparison of recognition precision performance of four objects in SGD and Adam.

https://doi.org/10.1371/journal.pone.0295661.t002

F1_Score and Confusion Matrix.

The F-score is a commonly used metric for evaluating the precision of an algorithm. It provides a comprehensive performance measure that reflects an algorithm’s performance in a balanced way by considering both precision P and recall R and computing their harmonic mean: F1_Score = 2PR/(P + R).

The Confusion Matrix is a matrix table used for intuitively evaluating the performance of a classification model by visualizing and summarizing the predictions of a classifier. It compares the true labels from the rice disease leaf test set with the predictions of the model and displays them in a matrix: the rows of the confusion matrix represent the true labels and the columns represent the model predictions. The number in each cell of the main diagonal represents the agreement between the true label and the model prediction.
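The two metrics can be sketched together in a few lines; this is generic Python, not the evaluation code of this project:

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true labels, columns are model predictions."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def f1_score(cm, cls):
    """Per-class F1_Score = 2PR / (P + R), with P and R read off the confusion matrix."""
    tp = cm[cls][cls]
    col = sum(row[cls] for row in cm)   # everything predicted as this class
    rowsum = sum(cm[cls])               # everything truly of this class
    p = tp / col if col else 0.0
    r = tp / rowsum if rowsum else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Toy example: class 1 has precision 2/3 and recall 1, so F1 = 0.8.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], 2)
assert cm == [[1, 1], [0, 2]]
assert abs(f1_score(cm, 1) - 0.8) < 1e-9
```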

Examining the trends of all the data presented below, across Precision, Recall, the P-R curve, F1_Score, and the Confusion Matrix, it became evident that the SGD optimizer consistently outperformed the Adam optimizer on the dataset used in this study.

Experimental analysis of the loss function

Loss measures the difference between the true and predicted values and to some extent determines the merit of the model. The training process of the entire experiment was iterated for 400 epochs. To ensure the stability of the model, a warm-up learning strategy was employed for the first 30 epochs, setting the initial learning rate (lr0) to 0.01. Subsequently, the cosine annealing algorithm controlled the decay of the learning rate, with hyperparameter lrf = 0.1, and the minimum learning rate was set to 0.001. The loss curves obtained after 400 epochs of training are shown in Fig 5. In this figure, box_loss represents the magnitude of the difference between the model-predicted bounding box and the true bounding box (GIoU), cls_loss is the classification loss measuring whether anchor boxes are classified into the correct categories, and obj_loss is the confidence loss, which supervises whether a grid cell contains an object and scores the network’s confidence.

Fig 5. Details of model training of YOLOv5s-SPPFCSPC-G on the dataset.

(a) SPPFCSPC-G_Adam, (b) SPPFCSPC-G_SGD, (c) SPPFCSPC-G_Adam, (d) SPPFCSPC-G_SGD.

https://doi.org/10.1371/journal.pone.0295661.g005

As can be seen in Fig 6, by epoch 100–300 the Bounding box Loss, Objectness Loss, and Class probability Loss of the YOLOv5s-SPPFCSPC-G network have stopped decreasing, reached the convergence point, and stabilized. This indicates that the SPPFCSPC-G model has learned an optimal solution on the given rice diseased leaf training data and has successfully captured the important features of the dataset, thus learning the nature of the data. On the training set, our SPPFCSPC-G model also shows the best convergence.

Fig 6. Comparison of loss functions on the training set.

(a) bounding box Loss, (b) objectness Loss, (c) class probability Loss.

https://doi.org/10.1371/journal.pone.0295661.g006

Experimental analysis of computational parameters

Training.

Considering that this work will be applied to mobile detection in the future and provide detection models for industrial real-world applications, we focus primarily on the complexity and computation of all models after deployment, and secondarily on the speed of the models.

In YOLOv5s, we use different optimizers (SGD, Adam) and the same learning scheme (warm-up learning strategy with cosine annealing) to train each designed model separately. All our models are trained on one NVIDIA Tesla T4 graphics processor, on which speed performance is also measured; the training results are shown in Table 3.

Table 3. Parameters when training the model using Tesla T4.

https://doi.org/10.1371/journal.pone.0295661.t003

After convolutional grouping and our optimization strategy, SPPFCSPC-G has a network depth of 293 layers; its parameter count (Parameters) is only 15.9% higher than that of the 270-layer SPP/SPPF/SimSPPF, but 65.1% lower than SPPCSPC/SPPFCSPC with the same number of layers. GFLOPs measure the number of floating-point operations (in billions) required during the forward propagation of the model; larger GFLOPs values indicate that the model must perform more floating-point operations and may therefore require longer inference times. Despite the reduced parameters of our model, SPPFCSPC-G is the best performing model in the preceding SPP experimental analysis, which indicates that the expressive power of our model does not decrease as a result.

Also, the GFLOPs of our model are nearly minimal, and the final trained SPPFCSPC-G model file is only 15.8 MB under either Adam or SGD.

Inference.

The various trained YOLOv5s models were used to perform rice disease leaf image detection inference on an Intel(R) Core i5, and the related data can be viewed in Table 4. All models were trained for 400 epochs without pre-training or any external data. Both the accuracy and speed performance of the models were evaluated at an input resolution of 640 × 640.

Table 4. Inferred Data (in Intel(R) Core-i5 hardware environment).

https://doi.org/10.1371/journal.pone.0295661.t004

In this study, the project team used the trained SPPFCSPC-G_Adam and SPPFCSPC-G_SGD models to run inference on a test dataset of approximately 100 image samples in an Intel(R) Core(TM) i5 hardware environment, identifying four classes of samples: healthy, blight, spot, and blast. The model provides class confidence for each recognized image, with a preprocessing speed of 3.2 ms/image and a recognition speed of 0.44 s/image for a single image with NMS = 1, though this is not the fastest speed. In all the previous experimental chapters, SPPFCSPC-G outperformed the other models in every other aspect; given the performance improvement this model brings, the resulting speed reduction is negligible.
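Per-stage timings like the preprocessing and recognition speeds quoted above can be measured with a simple wall-clock helper; this is a generic sketch in which the stage function passed in is a placeholder, not the actual detection pipeline:

```python
import time

def avg_time_per_item(fn, items):
    """Average wall-clock seconds spent in fn per item (one timing pass, no warm-up)."""
    items = list(items)
    start = time.perf_counter()
    for item in items:
        fn(item)
    return (time.perf_counter() - start) / len(items)
```

In practice one would time the preprocessing, inference, and NMS stages separately, and discard a few warm-up iterations before measuring.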

As shown in Fig 7, the project team applied the SPPFCSPC-G models trained under Adam and SGD, respectively, to the detection of rice leaves in real scenarios. Adam only detected the more obvious target objects in the blight and spot disease detection, its detection of small targets and dark objects was not ideal, and the class confidence of each image was lower; the detection results using SGD were better.

Fig 7. Identification effect drawing.

(a) Healthy state, (b) Bacterial leaf blight, (c) Brown spot disease, (d) Rice blast disease.

https://doi.org/10.1371/journal.pone.0295661.g007

Conclusion

This study presents a novel approach for recognizing and detecting rice leaf diseases in natural environments using the improved YOLOv5-SPPFCSPC-G model. Initially, a series of data augmentation techniques were applied to the original rice images to expand the dataset. Next, the structure and enhancement methods of various SPP modules were reviewed. The new SPPFCSPC-G module was then designed by combining the structure of convolutional grouping and SPPCSPC, and embedded into the BackBone of YOLOv5s.

Thirteen different models were trained and the superiority of the SPPFCSPC-G model was demonstrated through various experiments, including SPP experiments, optimizer experiments, Loss experiments, and computational parameter experiments. The results were analyzed and compared from different perspectives. Furthermore, this paper explored the question of whether the optimizer’s performance on the dataset depends on the theoretical derivation of the computation or on the characteristics of the data itself.

Finally, the trained YOLOv5 model was applied for actual detection, and it showed good detection results. This approach effectively assisted in the detection and identification of rice diseases.

References

  1. Zhongqiang Qi, Junjie Yu, Rongsheng Zhang, et al. Evaluation of resistance of new rice varieties (lines) and main varieties to rice blast in Jiangsu Province during 2016–2020 [J]. Jiangsu Agricultural Sciences, 2020, 50(01): 91–96.
  2. Junyu Yang, Xiaolong Chen, Lingling Gao, et al. Isolation and antagonism of endophytic bacteria from rice bacterial blight [J]. Jiangsu Agricultural Sciences, 2017, 45(21): 100–105.
  3. Lin Ding. International comparison and empirical analysis on the research level of three main rice diseases based on bibliometrics [D]. Chinese Academy of Agricultural Sciences, 2013.
  4. Dapeng Song. Research and Implementation of Rice Leaf Disease Detection Based on Deep Learning [D]. Ningxia University in China, 2022.
  5. Lichao Zhang, Rong Ma, Yaoxin Zhang. Application of improved LeNet-5 model in apple image recognition [J]. Computer Engineering and Design, 2018, 39(11): 3570–3575.
  6. Shilei Lv, Sihua Lu, Zhen Li, et al. Orange recognition method using improved YOLOv3-LITE lightweight neural network [J]. Transactions of the Chinese Society of Agricultural Engineering, 2019, 35(17): 205–214.
  7. Zhongzheng Fu, Xiao He, Kui Fang, et al. Study on the detection of broccoli leaves based on the improved SSD network [J]. Journal of Chinese Agricultural Mechanization, 2020, 41(4): 92–97.
  8. Yueteng Cao, Xueyan Zhu, Yandong Zhao, et al. Recognition of plant leaf diseases, insects, and pests based on improved ResNet [J]. Journal of Chinese Agricultural Mechanization, 2021, 42(12): 175–181.
  9. Chunming Li, Shanting Lu, Songling Yuan, et al. Weed identification algorithm of weeding robot based on Faster R-CNN [J]. Journal of Chinese Agricultural Mechanization, 2019, 40(12): 171–176.
  10. Lee H S, Shin B S. Potato Detection and Segmentation Based on Mask R-CNN [J]. Journal of Biosystems Engineering, 2020, 45(4): 1–6.
  11. Dandan Wang, Dongjian He. Recognition of apple targets before fruit thinning by robot based on R-FCN deep convolutional neural network [J]. Transactions of the Chinese Society of Agricultural Engineering, 2019, 35(3): 156–163.
  12. Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
  13. Joseph Redmon, Ali Farhadi. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7263–7271, 2017.
  14. Joseph Redmon, Ali Farhadi. YOLOv3: An Incremental Improvement [J]. arXiv: 1804.02767, 2018.
  15. Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao. YOLOv4: Optimal Speed and Accuracy of Object Detection [J]. arXiv: 2004.10934, 2020.
  16. Jocher Glenn. YOLOv5 release v6.1. https://github.com/ultralytics/yolov5, 2020.
  17. Chuyi Li, Lulu Li, Hongliang Jiang, et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications [J]. arXiv: 2209.02976, 2022.
  18. Chien-Yao Wang, Alexey Bochkovskiy, Hong-Yuan Mark Liao. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
  19. Jocher Glenn. YOLOv8. https://github.com/ultralytics/ultralytics, 2023.
  20. Kaiming He, Xiangyu Zhang, Shaoqing Ren, et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition [J]. arXiv: 1406.4729v4, 2015.
  21. Liang-Chieh Chen, George Papandreou, et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J]. arXiv: 1606.00915v2, 2017.
  22. Songtao Liu, Di Huang, Yunhong Wang. Receptive Field Block Net for Accurate and Fast Object Detection [J]. arXiv: 1711.07767v3, 2018.
  23. Qibin Hou, Li Zhang, Ming-Ming Cheng. Strip Pooling: Rethinking Spatial Pooling for Scene Parsing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
  24. Sebastian Ruder. An Overview of Gradient Descent Optimization Algorithms [J]. arXiv: 1609.04747v2, 2017.
  25. Kingma D P, Ba J. Adam: A Method for Stochastic Optimization [J]. arXiv preprint arXiv: 1412.6980, 2014.
  26. Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, et al. Generalized Intersection over Union: A Metric and a Loss for Bounding Box Regression [J]. arXiv: 1902.09630v2, 2019.