Fig 1.
The working process of the OFIDA.
Training DynamicFocusNet on the MS-COCO 2017 dataset to achieve accurate classification and localization of target objects (a). Evaluating the performance of DynamicFocusNet on the MS-COCO 2017 test set (b). Using the trained DynamicFocusNet to detect and localize target objects in original images (c), and applying a cropping technique to accurately separate the detected objects from the original images (d), enabling precise one-to-many image data augmentation of samples.
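The cropping step in (c)–(d) can be sketched as follows. This is a minimal NumPy illustration, assuming detections arrive as (x1, y1, x2, y2) pixel boxes; the function name and box format are illustrative, not OFIDA's actual interface:

```python
import numpy as np

def crop_objects(image, boxes):
    """Separate each detected object from the original image (one-to-many).

    image: H x W x C array; boxes: list of (x1, y1, x2, y2) pixel coordinates.
    Returns one cropped sample per detected object.
    """
    h, w = image.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        # Clamp each box to the image bounds before slicing.
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(w, x2), min(h, y2)
        crops.append(image[y1:y2, x1:x2].copy())
    return crops

# One original image yields several augmented object samples.
image = np.zeros((100, 120, 3), dtype=np.uint8)
samples = crop_objects(image, [(10, 20, 50, 60), (70, 10, 110, 90)])
```

Each crop becomes an independent training sample, which is what makes the augmentation one-to-many.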
Fig 2.
Integrated view of the OFIDA framework and its modules.
Fig 3.
The framework of our head network.
Given a feature map X, RepConv performs parameter reorganization, yielding X′. The content-aware attention module (CAAM) then extracts content-aware category representations M from X′. The dynamic graph convolutional network (D-GCN) models global and local relations in M, generating a robust representation P with rich inter-category relational information. Object detection is performed by DETECT on X′, producing classification scores Cls and bounding-box regression results Bbox. Finally, the classification scores Cls are averaged with S, yielding the final scores Y for each category.
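The dataflow in Fig 3 can be sketched as below. This is a minimal NumPy stand-in: every module body is a placeholder, the tensor shapes are illustrative, and deriving the scores S from P is an assumption (the caption does not specify how S is obtained); only the wiring between modules follows the figure.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 4  # number of categories (illustrative)

def rep_conv(x):
    # Placeholder for RepConv's parameter reorganization: identity here.
    return x

def caam(x_prime):
    # Placeholder CAAM: pool features into per-category representations M.
    return x_prime.mean(axis=(1, 2))

def d_gcn(m):
    # Placeholder D-GCN: mix category representations via a relation matrix.
    relation = np.full((C, C), 1.0 / C)
    return relation @ m

def detect(x_prime):
    # Placeholder DETECT head: classification scores and box regressions.
    return rng.random(C), rng.random((C, 4))

x = rng.random((C, 8, 8, 16))   # feature map X (illustrative shape)
x_prime = rep_conv(x)           # X' after parameter reorganization
m = caam(x_prime)               # content-aware category representations M
p = d_gcn(m)                    # relation-enriched representation P
s = p.mean(axis=1)              # per-category scores S from P (assumption)
cls, bbox = detect(x_prime)     # classification scores Cls, boxes Bbox
y = (cls + s) / 2               # final scores Y: average of Cls and S
```

The averaging at the end is the only fusion point between the detection branch and the relational branch, which is why Y carries both appearance and cross-category evidence.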
Table 1.
Parameter settings.
Fig 4.
Visual examples of the object-focused image data augmentation (OFIDA) algorithm: localization, classification, and separation of target regions from original images.
Table 2.
Performance comparison of the OFIDA and several SOTA data augmentation methods for image classification.
Table 3.
Performance evaluation of semantic segmentation on the PASCAL VOC 2012 validation set using mIoU.
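For reference, mean intersection-over-union (mIoU) averages the per-class IoU over all classes present. A minimal NumPy sketch, with illustrative class count and label maps:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """mIoU: average over classes of |pred ∩ gt| / |pred ∪ gt|."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 0], [1, 1]])
gt = np.array([[0, 1], [1, 1]])
score = mean_iou(pred, gt, num_classes=2)  # (1/2 + 2/3) / 2 = 7/12
```
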
Table 4.
Performance evaluation of semantic segmentation on the CITYSCAPES validation set using mIoU.
Table 5.
DynamicFocusNet performance evaluation on MS-COCO 2017 val set.