Fig 1.
Heat maps illustrating the contribution of each improved component. Through multi-scale fusion and attention mechanisms, the enhanced model captures defect information at shallow layers while keeping its focus on defect features.
Fig 2.
SAConv module architecture and computational flow.
The module employs multi-scale kernel operations followed by adaptive pooling and attention mechanisms for enhanced shallow feature extraction.
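The caption gives no equations for SAConv. If it follows the switchable atrous convolution of DetectoRS (an assumption), one 3×3 kernel is applied at two dilation rates and the two results are blended by a sigmoid "switch" computed from locally average-pooled input. The single-channel NumPy sketch below illustrates only that blending; the function names, the rates 1 and 3, the 5×5 pooling window, and the switch parameters `a` and `b` are hypothetical.

```python
import numpy as np

def dilated_conv2d(x, w, rate):
    """Single-channel 2D cross-correlation with dilation and zero 'same' padding.

    Assumes an odd-sized kernel so the output keeps the input shape.
    """
    kh, kw = w.shape
    h, wd = x.shape
    ph, pw = (kh // 2) * rate, (kw // 2) * rate
    xp = np.pad(x.astype(float), ((ph, ph), (pw, pw)))
    out = np.zeros((h, wd), dtype=float)
    for i in range(kh):
        for j in range(kw):
            # Each kernel tap reads a shifted view; taps are `rate` pixels apart.
            out += w[i, j] * xp[i * rate:i * rate + h, j * rate:j * rate + wd]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def switchable_atrous_conv(x, w, a=1.0, b=0.0, rates=(1, 3)):
    """Hypothetical SAConv-style block: blend two dilation rates with a sigmoid switch."""
    # Switch: 5x5 average pooling, then a scalar affine map (stand-in for a 1x1 conv).
    pooled = dilated_conv2d(x, np.full((5, 5), 1.0 / 25.0), rate=1)
    s = sigmoid(a * pooled + b)                  # per-pixel weight in (0, 1)
    small = dilated_conv2d(x, w, rate=rates[0])  # small receptive field
    large = dilated_conv2d(x, w, rate=rates[1])  # same weights, larger receptive field
    return s * small + (1.0 - s) * large

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
w = rng.standard_normal((3, 3))
y = switchable_atrous_conv(x, w)
```

Because the switch lies in (0, 1), every output pixel is a convex combination of the small- and large-receptive-field responses. The DetectoRS formulation additionally learns a weight offset for the larger rate; this sketch omits it and shares `w` exactly.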
Fig 3.
Geometric configuration for -FEIoU computation.
The predicted bounding box (blue), ground-truth box (red), and minimum enclosing rectangle (yellow) define the spatial relationships used in the loss calculation. Parameters b, w, and h denote box centers, widths, and heights, respectively.
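The prefix of the loss name is missing from the caption above, so the exact FEIoU definition cannot be recovered here. As a reference point, the sketch below computes the standard EIoU loss, 1 − IoU plus center-distance, width, and height penalties normalized by the minimum enclosing rectangle, from the same quantities the figure labels (b, w, h). Treating FEIoU as built on this EIoU term is an assumption, and any additional reweighting the paper applies is not shown; the `Box` class and `eiou_loss` name are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Box:
    cx: float  # center x (part of b)
    cy: float  # center y (part of b)
    w: float   # width
    h: float   # height

    def corners(self):
        return (self.cx - self.w / 2, self.cy - self.h / 2,
                self.cx + self.w / 2, self.cy + self.h / 2)

def eiou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    """Standard EIoU loss (assumed basis for FEIoU): 1 - IoU + three penalties."""
    px1, py1, px2, py2 = pred.corners()
    gx1, gy1, gx2, gy2 = gt.corners()
    # Intersection-over-union of predicted and ground-truth boxes.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pred.w * pred.h + gt.w * gt.h - inter
    iou = inter / (union + eps)
    # Minimum enclosing rectangle: width cw, height ch, squared diagonal c2.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2
    center = ((pred.cx - gt.cx) ** 2 + (pred.cy - gt.cy) ** 2) / (c2 + eps)
    width = (pred.w - gt.w) ** 2 / (cw ** 2 + eps)
    height = (pred.h - gt.h) ** 2 / (ch ** 2 + eps)
    return 1.0 - iou + center + width + height
```

For two 2×2 boxes whose centers are 4 units apart horizontally, IoU is 0, the enclosing rectangle is 6×2, and the loss is 1 + 16/40 = 1.4; unlike plain IoU loss, it still provides a gradient toward the target when the boxes do not overlap.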
Fig 4.
LSKA module architecture and computational flow.
The module processes input features through cascaded depth-wise convolutions: a standard DW-Conv followed by a dilated DW-D-Conv. The results are concatenated with the original feature map after convolution to produce the final attended output. Parameters: C denotes the number of input channels, H and W the spatial dimensions, d the dilation rate, and k the maximum receptive field extent.
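The relationship between the dilation rate d and the receptive field extent k can be checked with a little arithmetic. The sketch below assumes the decomposition rules of the original LSKA design: a 1D DW-Conv of length 2d − 1 followed by a 1D DW-D-Conv of length ⌊k/d⌋ with dilation d, applied once horizontally and once vertically. Those length rules, the helper names, and the candidate (k, d) pairs are illustrative assumptions, not values taken from this paper.

```python
def lska_kernels(k, d):
    """Return (dw_len, dwd_len), the 1D kernel lengths for DW-Conv and DW-D-Conv.

    Assumed rules: DW-Conv length 2d - 1, DW-D-Conv length floor(k / d).
    """
    return 2 * d - 1, k // d

def cascade_receptive_field(layers):
    """Receptive field along one axis of stacked stride-1 convolutions.

    layers: iterable of (kernel_length, dilation) pairs.
    """
    rf = 1
    for length, dilation in layers:
        rf += (length - 1) * dilation
    return rf

# Illustrative (k, d) pairs; not taken from the paper's tables.
candidates = [(7, 2), (11, 2), (23, 3), (35, 3), (41, 3), (53, 3)]
for k, d in candidates:
    dw_len, dwd_len = lska_kernels(k, d)
    rf = cascade_receptive_field([(dw_len, 1), (dwd_len, d)])
    print(f"k={k}, d={d}: DW-Conv 1x{dw_len}, DW-D-Conv 1x{dwd_len} (dilation {d}), receptive field {rf}")
```

Along one axis the cascade covers d − 1 + ⌊k/d⌋·d pixels, which equals k exactly when k mod d = d − 1. Every pair listed satisfies that, whereas a pair such as k = 24, d = 3 would overshoot to 26. Running the horizontal and vertical cascades in sequence gives a k × k footprint using only 1D kernels.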
Fig 5.
WASPP module architecture and multi-scale feature integration.
The module employs parallel convolutional branches with varying receptive fields, followed by adaptive weighting mechanisms and feature concatenation. Each pathway contributes scale-specific information that is selectively emphasized through sigmoid-based attention before final fusion.
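The caption does not specify how the sigmoid weights are derived. A minimal NumPy sketch of the weighting-and-concatenation step, assuming each branch has already produced a (C, H, W) feature map and that each branch's per-channel gate is computed from its own global average pooling, might look like this. The pooling choice, the gate parameters `scales` and `biases`, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weighted_branch_fusion(branches, scales, biases):
    """Reweight parallel branch outputs with sigmoid gates, then concatenate.

    branches: list of (C, H, W) arrays, one per receptive-field size.
    scales, biases: lists of (C,) arrays, hypothetical learnable gate parameters.
    Returns an array of shape (len(branches) * C, H, W).
    """
    weighted = []
    for feat, a, b in zip(branches, scales, biases):
        pooled = feat.mean(axis=(1, 2))         # global average pool -> (C,)
        gate = sigmoid(a * pooled + b)          # per-channel weight in (0, 1)
        weighted.append(feat * gate[:, None, None])
    return np.concatenate(weighted, axis=0)     # channel-wise concatenation

rng = np.random.default_rng(1)
C, H, W = 4, 6, 6
branches = [rng.standard_normal((C, H, W)) for _ in range(3)]
scales = [np.ones(C)] * 3
biases = [np.zeros(C)] * 3
fused = weighted_branch_fusion(branches, scales, biases)
```

Since each gate lies in (0, 1), a branch can only be attenuated relative to the others, never amplified; in ASPP-style designs a 1×1 convolution typically follows the concatenation to mix the channels, which this sketch leaves out.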
Fig 6.
Representative defect categories in the NEU-DET dataset.
Each class exhibits distinct morphological characteristics and varying degrees of visual complexity, with irregular spatial distributions that challenge detection algorithms.
Fig 7.
Defect category distribution in the GC10-DET dataset.
The ten defect classes represent diverse steel surface anomalies with varying scales, textures, and morphological characteristics.
Table 1.
mAP values under different loss functions with a training-to-testing ratio of 9:1.
Table 2.
mAP values under different loss functions with a training-to-testing ratio of 8:2.
Table 3.
Effects of different LSKA convolution kernel sizes on the NEU-DET dataset.
Table 4.
Effects of different LSKA convolution kernel sizes on the GC10-DET dataset.
Table 5.
Ablation results for each module.
Fig 8.
Comparative feature extraction visualization using HiResCAM.
Heat maps compare the proposed architecture with the baseline YOLOv8n across representative defect categories. The proposed model localizes defects more precisely, showing greater sensitivity to subtle anomalies and improved spatial feature extraction.
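HiResCAM differs from Grad-CAM in how the gradients are used: Grad-CAM averages each channel's gradient over space before weighting the activations, whereas HiResCAM multiplies gradients and activations element-wise and then sums over channels, so the map reflects the locations the score actually depends on. The NumPy sketch below assumes the activations A and the gradients of the target score with respect to A, both shaped (K, H, W), have already been extracted from a chosen layer; the ReLU and min-max normalization are common display conventions, not details stated in the caption.

```python
import numpy as np

def _normalize(cam, eps=1e-12):
    cam = np.maximum(cam, 0.0)                         # keep positive evidence only
    return (cam - cam.min()) / (cam.max() - cam.min() + eps)

def grad_cam(acts, grads):
    """Grad-CAM: channel weights are spatially averaged gradients."""
    weights = grads.mean(axis=(1, 2))                  # (K,)
    return _normalize(np.tensordot(weights, acts, axes=1))  # sum_k w_k * A_k -> (H, W)

def hires_cam(acts, grads):
    """HiResCAM: element-wise gradient x activation, summed over channels."""
    return _normalize((grads * acts).sum(axis=0))      # (H, W)
```

When every channel's gradient is constant over space the two maps coincide, which makes a quick sanity check; they diverge as soon as the gradients vary spatially, which is where HiResCAM's finer localization comes from.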
Table 6.
Performance comparison with state-of-the-art detection methods on the NEU-DET dataset.
Table 7.
Generalization performance comparison across different detection architectures.
Fig 9.
Comparative visualization of detection performance across different methods on industrial defect samples.
Table 8.
Statistical significance analysis of ablation components across 5 random splits on the NEU-DET dataset.
The evaluation indicators are the mean, standard deviation, 95% confidence interval, improvement over baseline, and p-value of the mAP scores.
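The summary statistics named in this table can be computed from per-split mAP scores with the Python standard library. This sketch assumes a two-sided Student-t interval for n = 5 splits (critical value ≈ 2.776 for 4 degrees of freedom) and a paired comparison against the baseline on the same splits. The caption does not say which test produced the p-values, so the paired t statistic below is an assumption; the p-value itself would come from the t distribution (for example, via `scipy.stats.ttest_rel`). The scores are placeholders, not the paper's results.

```python
import math
import statistics

T_CRIT_DF4 = 2.7764  # two-sided 95% Student-t critical value, 4 degrees of freedom

def summarize(scores):
    """Mean, sample standard deviation and 95% CI for exactly 5 splits."""
    if len(scores) != 5:
        raise ValueError("T_CRIT_DF4 assumes exactly 5 splits")
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)                  # sample (n - 1) standard deviation
    half = T_CRIT_DF4 * sd / math.sqrt(len(scores))
    return mean, sd, (mean - half, mean + half)

def paired_t(model, baseline):
    """Mean improvement over baseline and paired t statistic (df = n - 1)."""
    diffs = [m - b for m, b in zip(model, baseline)]
    delta = statistics.mean(diffs)
    t = delta / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
    return delta, t

baseline = [0.750, 0.742, 0.761, 0.748, 0.755]   # placeholder mAP values
model = [0.781, 0.770, 0.789, 0.776, 0.785]      # placeholder mAP values
mean, sd, ci = summarize(model)
delta, t = paired_t(model, baseline)
```

Pairing the scores split by split removes the variation that both models share across splits, so a consistent small improvement can reach significance even when each model's own standard deviation is comparable to the gain.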
Table 9.
Statistical analysis of experimental results across 5 random splits.
The evaluation indicators are the mean, standard deviation, and 95% confidence interval of the mAP scores.
Fig 10.
Confusion matrices demonstrating classification performance under different training-testing data splits.