Fig 1.
Schematic of the improved DeepLabv3+ model.
The ASPP module is replaced with a multiscale attention aggregation (MSAA) module, and an ELA module is added to the shallow feature extraction branch to strengthen the low-level features.
Fig 2.
StarNet follows the conventional hierarchical network design: convolutional layers downsample the resolution directly, the number of channels doubles at each stage, and multiple star blocks are stacked within each stage to extract features.
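The stage-by-stage bookkeeping described above can be traced with a short sketch. It is not the configuration used in the paper: the input size (512), the stem and base widths (32), the per-stage depths (1, 2, 6, 2), and the use of a stride-2 3 × 3 convolution for the stem and for every downsampling step are all hypothetical values chosen for illustration.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def stage_shapes(input_size=512, stem_channels=32, base_channels=32, depths=(1, 2, 6, 2)):
    """Track (name, resolution, channels, num_star_blocks) through a StarNet-style backbone.

    Hypothetical assumptions: a stride-2 3x3 stem, then each stage opens with a
    stride-2 3x3 downsampling conv, uses twice the channels of the previous stage,
    and stacks depths[i] star blocks that keep resolution and channels unchanged.
    """
    size = conv_out(input_size)  # stem halves the resolution
    shapes = [("stem", size, stem_channels, 0)]
    channels = base_channels
    for i, depth in enumerate(depths):
        size = conv_out(size)  # downsampling conv at the start of the stage
        shapes.append((f"stage{i + 1}", size, channels, depth))
        channels *= 2  # channel count doubles for the next stage
    return shapes


for name, size, ch, blocks in stage_shapes():
    print(f"{name}: {size}x{size}, {ch} channels, {blocks} star blocks")
```

With these assumed settings the resolution falls from 256 to 16 while the width grows from 32 to 256, which is the resolution-for-channels trade-off typical of hierarchical backbones.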
Fig 3.
Structure diagram of the star blocks.
A comparison of the star operation with the summation operation shows that the star operation consistently outperforms summation, especially in narrower networks; this is attributed to its ability to map inputs into a high-dimensional feature space without widening the network.
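Why the star operation reaches a higher-dimensional space can be checked numerically. The sketch assumes, as in StarNet, that the star operation is the element-wise product of two linear branches; biases are omitted, and the widths and random weights are hypothetical. Expanding the product shows that each output is a weighted sum of all pairwise products x_i·x_j (an implicit quadratic feature space), whereas summation collapses to a single linear map.

```python
import random


def linear(W, x):
    """Matrix-vector product W @ x using plain lists (no bias)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def star(W1, W2, x):
    """Star operation: element-wise product of two linear branches."""
    return [a * b for a, b in zip(linear(W1, x), linear(W2, x))]


def summation(W1, W2, x):
    """Summation counterpart: element-wise sum of the same two branches."""
    return [a + b for a, b in zip(linear(W1, x), linear(W2, x))]


def star_expanded(W1, W2, x):
    """Star output rewritten as a weighted sum over every pairwise product x_i * x_j."""
    d = len(x)
    return [
        sum(W1[k][i] * W2[k][j] * x[i] * x[j] for i in range(d) for j in range(d))
        for k in range(len(W1))
    ]


def implicit_dim(d):
    """Number of distinct quadratic terms x_i * x_j (i <= j) for input width d."""
    return d * (d + 1) // 2


random.seed(0)
d, out = 4, 3  # hypothetical input and output widths
W1 = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(out)]
W2 = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(out)]
x = [random.uniform(-1, 1) for _ in range(d)]

# Summation is equivalent to one linear layer with weights W1 + W2.
W_sum = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(W1, W2)]

print("star:     ", star(W1, W2, x))
print("expanded: ", star_expanded(W1, W2, x))
print("implicit quadratic terms for d=4:", implicit_dim(d))
```

For an input width of only 4 the star output already mixes 10 distinct quadratic terms, while the summation branch stays within the 4-dimensional linear span of x; this is the intuition behind the advantage in narrow networks.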
Fig 4.
Structural diagram of the multiscale attention aggregation (MSAA) module.
This module applies convolution kernels of different sizes (e.g., 3 × 3, 5 × 5, and 7 × 7) to perform multiscale analysis of the feature maps: small kernels capture the fine details of small defects, whereas large kernels capture the overall shape of large defects.
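The effect of kernel size on each branch can be illustrated with a minimal single-channel sketch. It assumes hypothetical box (averaging) kernels in place of learned weights and plain summation as the aggregation step; the attention component of MSAA is not modeled. A single bright pixel, standing in for a tiny defect, spreads over a k × k neighbourhood in the branch with kernel size k.

```python
def conv2d_same(img, kernel):
    """2-D cross-correlation with zero padding; output size equals input size."""
    h, w, k = len(img), len(img[0]), len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    ii, jj = i + u - r, j + v - r
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[u][v] * img[ii][jj]
            out[i][j] = acc
    return out


def box_kernel(k):
    """Hypothetical k x k averaging kernel standing in for learned weights."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def multiscale_branches(img, sizes=(3, 5, 7)):
    """One convolution branch per kernel size, plus their element-wise sum."""
    branches = {k: conv2d_same(img, box_kernel(k)) for k in sizes}
    h, w = len(img), len(img[0])
    fused = [[sum(branches[k][i][j] for k in sizes) for j in range(w)] for i in range(h)]
    return branches, fused


# A single bright pixel ("small defect") in the middle of an 11 x 11 map.
img = [[0.0] * 11 for _ in range(11)]
img[5][5] = 1.0
branches, fused = multiscale_branches(img)
for k, out in branches.items():
    nonzero = sum(1 for row in out for v in row if v != 0.0)
    print(f"{k}x{k} branch: {nonzero} responding positions")
```

The 3 × 3 branch keeps the response concentrated (9 positions), while the 7 × 7 branch spreads it over 49 positions, i.e. it summarizes a wider context; fusing the branches gives every position access to both views.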
Fig 5.
Structure of the ELA module, illustrating how feature maps are analyzed at multiple scales through convolution kernels and related operations.
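A rough sketch of a strip-pooling local attention block in the spirit of ELA is given below. It assumes the design of the published Efficient Local Attention module (average pooling along rows and columns, a shared 1-D convolution, a sigmoid gate, and rescaling of the input); GroupNorm is omitted, and the kernel values and input are hypothetical. The caption does not specify these details, so this is an illustration rather than the authors' implementation.

```python
import math


def conv1d_same(seq, kernel):
    """1-D cross-correlation with zero padding; output length equals input length."""
    n, k = len(seq), len(kernel)
    r = k // 2
    return [
        sum(kernel[u] * seq[i + u - r] for u in range(k) if 0 <= i + u - r < n)
        for i in range(n)
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ela_like(x, kernel=(0.25, 0.5, 0.25)):
    """Strip-pooling local attention over a list of H x W channels (assumed ELA-style).

    Steps: average-pool each row and each column, run a shared 1-D conv along
    each strip, squash with a sigmoid, then rescale every input value by the
    product of its row and column weights. GroupNorm is omitted here.
    """
    out = []
    for ch in x:
        h, w = len(ch), len(ch[0])
        row_pool = [sum(row) / w for row in ch]                             # length H
        col_pool = [sum(ch[i][j] for i in range(h)) / h for j in range(w)]  # length W
        a_h = [sigmoid(v) for v in conv1d_same(row_pool, kernel)]
        a_w = [sigmoid(v) for v in conv1d_same(col_pool, kernel)]
        out.append([[ch[i][j] * a_h[i] * a_w[j] for j in range(w)] for i in range(h)])
    return out


# Two hypothetical 3 x 4 channels.
x = [[[float(i * 4 + j) for j in range(4)] for i in range(3)],
     [[1.0, -2.0, 3.0, -4.0] for _ in range(3)]]
y = ela_like(x)
```

Because every gate lies in (0, 1), the block can only re-weight positions along height and width; it never changes the feature-map shape, which is what allows it to be inserted into the shallow branch without altering the rest of the decoder.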
Fig 6.
Sample images of steel defects.
From left to right: inclusions, patches, and scratches.
Table 1.
Performance comparison of the model with other semantic segmentation models.
Fig 7.
Charts comparing the mIoU values.
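For reference, mIoU is the mean over classes of IoU_c = TP_c / (TP_c + FP_c + FN_c), computed from pixel counts. The sketch below uses the standard definition; the 4-class confusion matrix (e.g. background plus the three defect types) is made-up data, and skipping classes absent from both prediction and ground truth is one common convention that may differ from the evaluation used here.

```python
def miou(conf):
    """Mean IoU from a square confusion matrix conf[true][pred] of pixel counts.

    IoU_c = TP / (TP + FP + FN). Classes that appear in neither the ground truth
    nor the prediction (denominator 0) are skipped.
    """
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                         # true class c, predicted otherwise
        fp = sum(conf[r][c] for r in range(n)) - tp    # predicted c, true class otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious), ious


# Hypothetical pixel counts: rows = ground truth, columns = prediction.
conf = [
    [50, 2, 0, 0],
    [3, 20, 1, 0],
    [0, 0, 10, 0],
    [1, 0, 0, 5],
]
mean_iou, per_class = miou(conf)
print(f"mIoU = {mean_iou:.4f}", [round(v, 4) for v in per_class])
```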
Table 2.
Computational complexity and model size of StarNet with the added modules.
Table 3.
Comparison of results using StarNet as the backbone network combined with the added modules.
Fig 8.
Curve of the model's loss over the course of training.
Fig 9.
Comparative visualization of defect segmentation results (from left to right: original image, predictions of the improved model, ground truth, MobileNetV2 baseline, Xception baseline).