Multi-attention bidirectional contrastive learning method for unpaired image-to-image translation

doi:10.1371/journal.pone.0301580

Fig 1.

Example of the workflow for the MabCUT method.

Images are generated in both the S → T and T → S directions during training; features are extracted by independent embedding blocks, and contrastive learning is performed on key points queried by the attention extractor.

Fig 2.

MabCUT framework.

The framework achieves bidirectional mapping through the mappings A : S → T and B : T → S, enabling image-to-image translation (I2IT) between unpaired images without imposing a strict cycle-consistency constraint. Generators A and B each consist of an encoder (A_enc, B_enc) and a decoder (A_dec, B_dec). A_enc together with H_S forms the embedding block that extracts features from several layers for the source domain, and B_enc together with H_T forms the embedding block for the target domain. The attention matrix selects query features from these layers, on which the PatchNCE loss is calculated. Discriminators D_S and D_T compute the GAN loss.
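
The PatchNCE loss named in the caption can be sketched as follows, assuming the formulation used in CUT: patch features are L2-normalized, the temperature is τ = 0.07, and a cross-entropy treats the patch at the same spatial location as the positive and all other sampled patches as negatives. The function names and array shapes are illustrative assumptions, not the authors' code; in MabCUT the per-layer losses would additionally be computed for both translation directions.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Normalize each row (one patch feature) to unit length.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def patch_nce_loss(feat_q, feat_k, tau=0.07):
    """feat_q: (N, C) patch features of the translated image (queries).
    feat_k: (N, C) features at the same N locations of the input image.
    Row i of feat_k is the positive for row i of feat_q; the other rows are negatives."""
    q = l2_normalize(feat_q)
    k = l2_normalize(feat_k)
    logits = q @ k.T / tau                                  # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))               # cross-entropy with diagonal targets

def multilayer_patch_nce(feats_q, feats_k, tau=0.07):
    # Average the loss over the selected encoder layers (assumed, as in CUT).
    return sum(patch_nce_loss(q, k, tau) for q, k in zip(feats_q, feats_k)) / len(feats_q)
```

Matching features drive the loss toward zero, while spatially mismatched features are penalized because the positive no longer dominates its row of the similarity matrix.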

Fig 3.

Operating principle of the global attention extractor.

Image features are extracted from S_s and A(S_s) using embedding blocks S and T, respectively, and mapped to three-dimensional matrices F_s and F_t. Reshaping and transposing F_s yields a two-dimensional attention matrix M_g, from which the N most important rows are selected. These rows are then matched with the value matrices of the target and source domains to identify the key points and their corresponding positive and negative examples, on which the contrastive loss is calculated. The feature blocks inside the blue, red, and green boxes denote key points, positive examples, and negative examples, respectively.
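
The caption does not state how row importance is measured. A minimal sketch, assuming the entropy criterion used by QS-Attn (a row with lower entropy is treated as a more important query) and scaled dot-product softmax attention: F_s is reshaped to an HW × C matrix, M_g is computed from it, the N lowest-entropy rows are kept, and those rows aggregate the value features of both domains so that row i of the two outputs forms a positive pair and the remaining rows serve as negatives. The function name, the √C scaling, and the entropy criterion are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_attention_select(f_s, f_t, n_select):
    """f_s, f_t: (C, H, W) feature maps of the source image and its translation.
    Returns query features (from f_t), key features (from f_s), and the selected row indices."""
    c, h, w = f_s.shape
    v_s = f_s.reshape(c, h * w).T                      # (HW, C) source value matrix
    v_t = f_t.reshape(c, h * w).T                      # (HW, C) target value matrix
    m_g = softmax(v_s @ v_s.T / np.sqrt(c), axis=1)    # (HW, HW) global attention matrix
    entropy = -(m_g * np.log(m_g + 1e-12)).sum(axis=1) # one score per row
    idx = np.argsort(entropy)[:n_select]               # keep the N lowest-entropy rows
    m_sel = m_g[idx]                                   # (N, HW)
    # Applying the same rows to both domains keeps positives aligned by index.
    return m_sel @ v_t, m_sel @ v_s, idx
```

The two returned (N, C) matrices can be passed directly to a PatchNCE-style loss as queries and keys.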

Fig 4.

(a) ResNet generator structure. (b) Depthwise convolution. (c) Pointwise convolution.
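
Panels (b) and (c) show the two halves of a depthwise separable convolution. As a NumPy sketch (stride 1, zero padding, no bias; the function names are illustrative), depthwise convolution filters each channel independently with its own k × k kernel, and pointwise convolution is a 1 × 1 convolution that mixes channels:

```python
import numpy as np

def depthwise_conv2d(x, kernels, pad=1):
    """x: (C, H, W); kernels: (C, k, k). One filter per channel, no channel mixing."""
    c, h, w = x.shape
    k = kernels.shape[-1]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero-pad spatial dims only
    out_h, out_w = h + 2 * pad - k + 1, w + 2 * pad - k + 1
    out = np.zeros((c, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = xp[:, i:i + k, j:j + k]                  # (C, k, k) window
            out[:, i, j] = (patch * kernels).sum(axis=(1, 2))  # per-channel dot product
    return out

def pointwise_conv2d(x, weights):
    """x: (C_in, H, W); weights: (C_out, C_in). A 1x1 convolution across channels."""
    return np.einsum("oc,chw->ohw", weights, x)
```

The factorization saves parameters: for C_in = C_out = 256 and k = 3, a standard convolution has 256 × 256 × 9 = 589,824 weights, whereas the depthwise and pointwise pair has 256 × 9 + 256 × 256 = 67,840 (biases ignored), roughly 8.7× fewer.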

Table 1.

Image data for the three datasets.

Fig 5.

Comparison results on the Horse2Zebra dataset.

Table 2.

FID and KID×100 scores on the Horse2Zebra dataset, with the best performance indicated in bold.
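
KID is the squared maximum mean discrepancy between the feature sets of real and generated images under the cubic polynomial kernel k(x, y) = (xᵀy / d + 1)³, where d is the feature dimension; the tables report it multiplied by 100. A minimal sketch of the unbiased estimator, assuming the features have already been extracted (typically Inception-v3 pooling features) and omitting the averaging over random subsets that common implementations apply:

```python
import numpy as np

def polynomial_kernel(a, b):
    # Cubic polynomial kernel used by KID: (a . b / d + 1) ** 3
    d = a.shape[1]
    return (a @ b.T / d + 1.0) ** 3

def kid(feat_real, feat_fake):
    """Unbiased MMD^2 estimate between (m, d) real and (n, d) generated features."""
    m, n = len(feat_real), len(feat_fake)
    k_xx = polynomial_kernel(feat_real, feat_real)
    k_yy = polynomial_kernel(feat_fake, feat_fake)
    k_xy = polynomial_kernel(feat_real, feat_fake)
    # Exclude the diagonal (self-similarity) terms for the unbiased within-set means.
    mean_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    mean_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return float(mean_xx + mean_yy - 2.0 * k_xy.mean())
```

The value reported as KID×100 is then `100 * kid(real_features, fake_features)`; because the estimator is unbiased, it can be slightly negative when the two distributions match.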

Fig 6.

Comparison results on the Cat2Dog dataset.

Table 3.

FID and KID×100 scores on the Cat2Dog dataset.
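
FID fits a Gaussian to each feature set and computes ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). A NumPy-only sketch, again assuming pre-extracted features; the trace of the matrix square root is taken from the eigenvalues of Σ_r^{1/2} Σ_g Σ_r^{1/2}, which has the same spectrum as Σ_r Σ_g:

```python
import numpy as np

def _sqrt_psd(mat):
    # Square root of a symmetric positive semi-definite matrix via eigendecomposition.
    w, v = np.linalg.eigh(mat)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def fid(feat_real, feat_fake):
    """Frechet distance between Gaussians fitted to (m, d) and (n, d) feature sets."""
    mu_r, mu_g = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    sigma_r = np.cov(feat_real, rowvar=False)
    sigma_g = np.cov(feat_fake, rowvar=False)
    s = _sqrt_psd(sigma_r)
    eig = np.linalg.eigvalsh(s @ sigma_g @ s)            # same eigenvalues as sigma_r @ sigma_g
    tr_covmean = np.sqrt(np.clip(eig, 0.0, None)).sum()  # Tr((sigma_r sigma_g)^(1/2))
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_covmean)
```

As a sanity check, shifting every feature of a set by a constant c leaves the covariance unchanged, so the FID reduces to the squared mean difference d · c².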

Fig 7.

Comparison results on the Cityscapes dataset.

Table 4.

FID and KID×100 scores on the Cityscapes dataset.

Fig 8.

Qualitative ablation experiment.

MabCUT denotes the results of the full model, and (A) through (E) show the I2IT results of the respective ablation variants.

Table 5.

Quantitative comparison results of the ablation experiments.

These results show the effect of each of our contributions on I2IT.

Fig 9.

User study results.

We aggregate the rankings that users assigned to each model, compute the proportion of each rank, and compare model quality on that basis. The horizontal axis shows the percentage of each rank, and the vertical axis lists the models.
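
The aggregation behind such a chart can be sketched as follows, assuming each user submits a full ordering of the models from best to worst. The function name, model names, and toy rankings are hypothetical and serve only to illustrate the computation, not the study's data:

```python
from collections import Counter

def rank_proportions(rankings):
    """rankings: list of orderings, each listing every model from best to worst.
    Returns {model: [percent ranked 1st, percent ranked 2nd, ...]}."""
    models = sorted(set(rankings[0]))
    counts = {m: Counter() for m in models}
    for ordering in rankings:
        for rank, model in enumerate(ordering, start=1):
            counts[model][rank] += 1
    total = len(rankings)
    return {m: [100.0 * counts[m][r] / total for r in range(1, len(models) + 1)]
            for m in models}

# Hypothetical toy input: four users ranking three placeholder models.
toy = [["model_a", "model_b", "model_c"],
       ["model_a", "model_c", "model_b"],
       ["model_b", "model_a", "model_c"],
       ["model_a", "model_b", "model_c"]]
print(rank_proportions(toy))
```

Each model's row sums to 100%, which is what allows the ranks to be drawn as stacked horizontal bars.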
