Fig 1.
Features extracted from the self-supervised backbone.
(a) Input image. (b) A channel that is suitable for foreground-background segmentation. (c) A random channel. (d) A promising channel for instance segmentation. (e) A channel suitable for instance segmentation, multiplied by the foreground mask. As shown, instances in the image are distinguishable by their pixel values.
Fig 2.
Potential suitability of channels for instance segmentation.
As depicted in this figure, a specific channel shows promising potential for the instance segmentation task. The analysis was repeated across different instances.
Fig 3.
Pipelines of discussed deep spectral methods.
(a) Workflow of [18]. An affinity matrix is created from dot products of features from a self-supervised backbone. Eigenvectors of the Laplacian matrix derived from the affinity matrix are used for segmentation tasks. (b) Application of the proposed NCR module to the features from the self-supervised backbone to remove noisy channels. The Fiedler eigenvector is then employed for foreground-background segmentation. (c) Pipeline for instance segmentation. Stable feature-map channels are further reduced based on their standard deviation to enhance feature richness. The resulting feature map is multiplied by the foreground mask, and the affinity matrix is created using the BoC metric. Finally, pixels are clustered using the eigenvectors of the Laplacian matrix, yielding the instance segmentation.
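The spectral workflow in (a) can be sketched in a few lines. The toy feature map, the non-negativity clamp on the affinity matrix, and the unnormalized Laplacian are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np
from scipy.linalg import eigh

# Toy feature map: H x W pixels, C channels (stand-in for backbone features).
rng = np.random.default_rng(0)
H, W, C = 8, 8, 16
feats = rng.random((H, W, C)).reshape(-1, C)  # (N, C), N = H * W

# Affinity via dot products, clamped to be non-negative.
A = np.maximum(feats @ feats.T, 0)

# Unnormalized graph Laplacian L = D - A.
L = np.diag(A.sum(axis=1)) - A

# Eigendecomposition of L; the Fiedler eigenvector (second-smallest
# eigenvalue) yields a foreground/background partition by its sign.
vals, vecs = eigh(L)
fiedler = vecs[:, 1]
fg_mask = (fiedler > 0).reshape(H, W)
```

For a connected affinity graph, the smallest eigenvalue is (numerically) zero and its eigenvector is constant, so the Fiedler eigenvector carries the first non-trivial partition of the pixels.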
Fig 4.
Visualization of some channels from the self-supervised backbone.
As evident in this figure, lower entropy corresponds to a better representation of objects in images, while higher entropy results in a more unclear representation resembling noise.
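The entropy-based channel ranking can be sketched as follows. The histogram-based Shannon entropy, the bin count, and the cutoff `M` are illustrative assumptions rather than the paper's exact definition:

```python
import numpy as np

def channel_entropy(channel, bins=32):
    """Shannon entropy (bits) of a channel's activation histogram."""
    hist, _ = np.histogram(channel, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins to avoid log(0)
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
feats = rng.random((8, 8, 16))  # toy H x W x C feature map

# Rank channels by entropy (ascending) and keep the M lowest-entropy ones,
# on the premise that low-entropy channels represent objects more clearly.
M = 8
entropies = np.array([channel_entropy(feats[..., c]) for c in range(16)])
keep = np.argsort(entropies)[:M]
denoised = feats[..., keep]
```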
Fig 5.
Influence of DCR on the distinction between instances in the YouTube-VIS 2019 dataset.
As the orange curve shows, as the standard deviation of the channels diminishes, Δ, the average difference between instances, also decreases. Consequently, channels with a higher standard deviation are likely to display a larger average difference between instances.
Fig 6.
Qualitative results for instance segmentation, when using dot product to create the affinity matrix.
As illustrated, some channels of the feature map contain pixels with very high or very low values. Because the dot product is sensitive to these extreme values, it can lead to incorrect instance segmentation outputs.
Fig 7.
Comparison of dot product and Bray-Curtis metrics in creating the affinity matrix.
Three pixels belonging to the same instance are selected from the feature maps, and random noise is added to some of their channels. The affinity matrix created by the dot product is unsuitable for our purpose. In contrast, the matrix created by the Bray-Curtis metric correctly captures the correlation between the three pixels.
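The contrast can be illustrated with a toy sketch (the feature values are invented, not taken from the paper's experiment): three pixel vectors from one instance, with a large noise spike on one channel of the third. SciPy's Bray-Curtis distance is converted to a similarity via `1 - d`.

```python
import numpy as np
from scipy.spatial.distance import braycurtis

# Two pixel feature vectors from the same instance (toy values).
p1 = np.array([0.9, 0.8, 1.0, 0.7])
p2 = np.array([0.8, 0.9, 0.9, 0.8])
# A third same-instance pixel whose last channel carries a noise spike.
p3 = np.array([0.9, 0.8, 1.0, 10.0])

# Dot-product affinity: the spike makes p3 look far more similar to p1
# than p2 is, even though the spike is pure noise.
dot_12, dot_13 = p1 @ p2, p1 @ p3

# Bray-Curtis similarity stays bounded in [0, 1] for non-negative
# vectors, so a single outlier channel cannot dominate the affinity.
bc_12 = 1 - braycurtis(p1, p2)
bc_13 = 1 - braycurtis(p1, p3)
```

This boundedness is one plausible reason a Bray-Curtis-style metric yields a better-conditioned affinity matrix than raw dot products when channels contain extreme values.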
Fig 8.
Evaluation of the NCR module on the YouTube-VIS 2019 dataset.
Results of Fg-Bg segmentation for various values of M. Channels are sorted in ascending order based on their entropy.
Fig 9.
Evaluation of the NCR module on the PASCAL VOC 2012 dataset.
Results of Fg-Bg segmentation for different values of M. Channels are arranged in ascending order based on their entropy.
Table 1.
F-score results for Fg-Bg segmentation, considering different values of M, with and without post-processing.
Fig 10.
Qualitative outcomes of Fg-Bg segmentation on the YouTube-VIS 2019 dataset.
Percentages indicate the proportion of channels preserved by NCR. The precise number of channels to retain varies per dataset and is determined during the generation of the final masks.
Table 2.
Comparison of instance segmentation results, in terms of mIoU, between different metrics for creating the affinity matrix and the proposed metric, on the YouTube-VIS 2019 and OVIS datasets.
Table 3.
Quantitative results for different metrics under varying levels of occlusion, in terms of mIoU, on the YouTube-VIS 2019 dataset.
Table 4.
Quantitative results for instance segmentation considering different ratio values, in terms of mIoU, on the YouTube-VIS 2019 dataset.
Fig 11.
Results of instance segmentation under various occlusion conditions.
Instance segmentation results under varying levels of occlusion, represented by the MBOR value while utilizing different metrics for creating the affinity matrix. The proposed metric, BoC, outperforms other metrics and produces more accurate masks, even in scenarios with heavy occlusions.
Fig 12.
Results of instance segmentation across different scale ratio levels.
Extracted masks for different metrics on the YouTube-VIS 2019 dataset. A lower ratio value indicates greater variation in object sizes.
Table 5.
Quantitative results for different metrics under varying levels of distance between instances, in terms of mIoU, on the YouTube-VIS 2019 dataset.
Table 6.
Quantitative results for different metrics under varying levels of FG/BG smoothness, in terms of mIoU, on the YouTube-VIS 2019 dataset.
Fig 13.
Ratio of variance of intra-instance similarity to inter-instance similarity for different similarity metrics.
As depicted in this figure, the BoC metric shows a lower mR value than the alternative metrics. This finding confirms that BoC captures intra-instance and inter-instance similarities more effectively than the other metrics, thereby improving instance segmentation.
Table 7.
An ablation study analyzing the impact of the proposed components on mIoU.
Fig 14.
A visual ablation study analyzing the impact of the proposed BoC metric.