MSRRT-DETR: A high-precision apple detection method with strong cross-domain generalization capability in complex orchard scenes

doi:10.1371/journal.pone.0342854

Fig 1.

Sample images from the TSAppleData dataset, showing various growth stages, lighting conditions, occlusion types, and shooting distances.

Fig 2.

Schematic diagram of some of the data augmentation methods used in online augmentation.

Fig 3.

Schematic diagram of the MSRRT-DETR overall architecture, where ResBlock denotes the residual module.

Fig 4.

Schematic diagram of ResNet architecture.

Fig 5.

Schematic diagram of MSBlock architecture.

Fig 6.

Schematic diagram of the SCSA attention mechanism structure.

Fig 7.

Schematic illustration of the Efficient RepGFPN architecture.

Fig 8.

Structural illustration of the CSPStage and Rep modules.

Table 1.

Ablation study on the TSAppleData dataset.

Table 2.

Performance comparison of different attention mechanisms on MSRRT-DETR evaluated on the TSAppleData dataset.

Fig 9.

Feature response heatmaps of different attention mechanisms in apple detection across multiple scenarios.

Fig 10.

Comparison of detection results before and after model improvement for apples at different growth stages and spatial distributions, along with typical false detection examples.

The first row shows the original images, the second and third rows show inference results from the improved and original models respectively, and the fourth row shows enlarged views of the error regions of the original model (marked by gray boxes). Undetected targets are indicated by orange boxes, and false detections by orange circles.

Table 3.

Performance comparison of MSRRT-DETR versus mainstream object detection models on the TSAppleData dataset.

Fig 11.

Performance comparison of different object detection models in terms of FPS, mAP50, parameter count, and Composite Score.

The horizontal axis shows each model's FPS and the vertical axis shows its mAP50. Circle size indicates the model's parameter count (model complexity), with larger circles representing more parameters. Circle color intensity represents the model's Composite Score, where darker colors indicate better overall performance in both accuracy and speed.
