Fig 1.
Example of the workflow for the MabCUT method.
Images are generated in both the S → T and T → S directions, features are extracted with independent embedding blocks, and contrastive learning is performed by querying key points with the attention extractor.
Fig 2.
The framework achieves bidirectional mapping through the mappings A : S → T and B : T → S, enabling I2IT between unpaired images without the strict cycle-consistency constraint. Generators A and B consist of encoders Aenc and Benc and decoders Adec and Bdec. Aenc and HS serve as the embedding blocks that extract features from various layers for the source domain, and Benc and HT serve as the embedding blocks for the target domain. The attention matrix selects features from multiple layers through queries, and these features are used to calculate the PatchNCE loss. Additionally, discriminators DS and DT compute the GAN loss.
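The PatchNCE loss named in this caption can be illustrated for a single query patch. The following is a minimal sketch in plain Python, assuming the usual InfoNCE form with cosine similarity; the function names and the temperature value 0.07 are assumptions for illustration and are not taken from this paper.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def patch_nce_loss(query, positive, negatives, tau=0.07):
    """Cross-entropy of picking the positive among positive + negatives.

    tau is a hypothetical temperature, not a value stated in the caption.
    """
    q = normalize(query)
    # Index 0 holds the positive logit, the rest are negatives.
    logits = [dot(q, normalize(positive)) / tau]
    logits += [dot(q, normalize(n)) / tau for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]
```

The loss is small when the query is closer to its positive patch than to any negative patch, and grows when a negative is the closest match.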
Fig 3.
Operating principle of the global attention extractor.
Image features are extracted from Ss and A(Ss) using embedding blocks S and T, respectively, and are mapped to three-dimensional matrices Fs and Ft. Operations including reshaping and transposing are applied to Fs to derive a two-dimensional attention matrix Mg, from which N rows are selected according to the importance of each row. The selected rows are matched with the value matrices of the target and source domains to locate the key points, positive examples, and negative examples, and the contrastive loss is then calculated. The feature blocks inside the blue, red, and green boxes represent key points, positive examples, and negative examples, respectively.
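The row-selection step described in this caption can be sketched as follows. The caption does not state how row importance is measured; this sketch assumes the entropy of each row of a softmax similarity matrix, with lower entropy treated as more important, which is one common choice in query-selected attention. All function names are hypothetical, and the features are assumed to be already reshaped from C × H × W to HW vectors of length C.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_matrix(feats):
    """feats: HW vectors of length C. Returns an HW x HW matrix whose
    rows are softmax-normalized dot-product similarities."""
    return [softmax([sum(a * b for a, b in zip(q, k)) for k in feats])
            for q in feats]

def row_entropy(row):
    return -sum(p * math.log(p) for p in row if p > 0.0)

def select_rows(attn, n):
    """Indices of the n rows with the lowest entropy
    (assumed importance measure, not confirmed by the caption)."""
    order = sorted(range(len(attn)), key=lambda i: row_entropy(attn[i]))
    return order[:n]

def apply_rows(attn, indices, values):
    """Weight a value matrix (HW x C) by the selected attention rows."""
    channels = len(values[0])
    return [[sum(attn[i][j] * values[j][c] for j in range(len(values)))
             for c in range(channels)]
            for i in indices]
```

Under these assumptions, the same selected rows would be applied once to the source-domain value matrix and once to the target-domain value matrix, so that the query, positive, and negative features refer to the same spatial locations.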
Fig 4.
(a) ResNet generator structure. (b) Depthwise convolution. (c) Pointwise convolution.
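To make panels (b) and (c) concrete, the following is a minimal sketch of depthwise and pointwise convolution on nested lists, assuming no padding, stride 1, and no bias. The function names and shapes are illustrative assumptions; the caption does not give the layer configuration used in the generator.

```python
def depthwise_conv(x, kernels):
    """x: C x H x W nested lists; kernels: C x k x k, one filter per channel.
    Each channel is filtered on its own, so channels are never mixed."""
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        h, w = len(channel), len(channel[0])
        out.append([[sum(channel[i + di][j + dj] * kernel[di][dj]
                         for di in range(k) for dj in range(k))
                     for j in range(w - k + 1)]
                    for i in range(h - k + 1)])
    return out

def pointwise_conv(x, weights):
    """x: C_in x H x W; weights: C_out x C_in (a 1 x 1 convolution).
    Mixes channels independently at each spatial position."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wc * x[c][i][j] for c, wc in enumerate(row))
              for j in range(w)]
             for i in range(h)]
            for row in weights]

def param_counts(c_in, c_out, k):
    """Weight counts (no bias): standard conv vs. depthwise + pointwise."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable
```

For example, with 64 input channels, 128 output channels, and a 3 × 3 kernel, `param_counts(64, 128, 3)` gives 73728 weights for a standard convolution and 8768 for the depthwise plus pointwise pair.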
Table 1.
Image data for the three datasets.
Fig 5.
Comparison results on the Horse2Zebra dataset.
Table 2.
FID and KID×100 scores on the Horse2Zebra dataset, with the best performance indicated in bold.
Fig 6.
Comparison results on the Cat2Dog dataset.
Table 3.
FID and KID×100 scores on the Cat2Dog dataset.
Fig 7.
Comparison results on the Cityscapes dataset.
Table 4.
FID and KID×100 scores on the Cityscapes dataset.
Fig 8.
Qualitative ablation experiment.
MabCUT denotes the results of the proposed model, and (A) to (E) are the I2IT results of each ablation module, in sequence.
Table 5.
Quantitative comparison results from the ablation experiments, demonstrating the effect of each of our contributions on I2IT.
Fig 9.
User rankings of the models. The rankings provided by users are aggregated into the proportion of each rank received by each model, and these proportions are used to compare the quality of the models. The horizontal axis shows the percentage of ranks, and the vertical axis lists the models.
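The aggregation behind this figure can be sketched as below. This is a minimal illustration, assuming each user submits a complete best-first ordering of the same set of models; the function name and the input format are hypothetical and are not described in the caption.

```python
from collections import Counter

def rank_proportions(rankings):
    """rankings: one list per user, ordering model names best-first.
    Returns {model: [percent at rank 1, percent at rank 2, ...]}."""
    models = sorted(rankings[0])
    counts = {m: Counter() for m in models}
    for ranking in rankings:
        for rank, model in enumerate(ranking, start=1):
            counts[model][rank] += 1
    n_users = len(rankings)
    return {m: [100.0 * counts[m][r] / n_users
                for r in range(1, len(models) + 1)]
            for m in models}
```

Each list in the result corresponds to one horizontal bar in a stacked chart, with one segment per rank, and sums to 100 percent.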