Fig 1.
CrossSwap model architecture overview.
In the ID extractor detail, I_T, E_T, and P_T denote the identity, expression, and pose coefficients of the target face (I_S, E_S, and P_S for the source face).
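The caption implies that the ID extractor works with separate identity, expression, and pose coefficients for each face. A common way to use such coefficients in face swapping is to keep the source identity (I_S) and take the target expression and pose (E_T, P_T). The sketch below shows that recombination. It assumes 3DMM-style flat coefficient vectors. The sizes `ID_DIM`, `EXP_DIM`, and `POSE_DIM` and the helper names are hypothetical and are not taken from CrossSwap.

```python
import numpy as np

# Hypothetical coefficient sizes (typical of 3DMM setups; not confirmed for CrossSwap).
ID_DIM, EXP_DIM, POSE_DIM = 80, 64, 6


def split_coeffs(coeffs):
    """Split a flat coefficient vector into (identity, expression, pose)."""
    coeffs = np.asarray(coeffs, dtype=np.float32)
    if coeffs.shape[-1] != ID_DIM + EXP_DIM + POSE_DIM:
        raise ValueError("unexpected coefficient length")
    ident = coeffs[..., :ID_DIM]
    expr = coeffs[..., ID_DIM:ID_DIM + EXP_DIM]
    pose = coeffs[..., ID_DIM + EXP_DIM:]
    return ident, expr, pose


def mix_coeffs(source_coeffs, target_coeffs):
    """Build [I_S, E_T, P_T]: source identity with target expression and pose."""
    i_s, _, _ = split_coeffs(source_coeffs)
    _, e_t, p_t = split_coeffs(target_coeffs)
    return np.concatenate([i_s, e_t, p_t], axis=-1)


src = np.ones(ID_DIM + EXP_DIM + POSE_DIM)
tgt = np.zeros(ID_DIM + EXP_DIM + POSE_DIM)
mixed = mix_coeffs(src, tgt)
print(mixed.shape)  # (150,)
```

The first 80 entries of `mixed` come from the source face, and the remaining entries come from the target. The helpers also work on batches of shape `(N, 150)`, because slicing and concatenation use the last axis.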
Fig 2.
U-Net dimensions in CrossSwap.
Fig 3.
Detailed architecture of the cross-attention generator. At layer 0 (the bottleneck feature), the operations on the left are applied; from layer 1 upward, the operations on the right are applied.
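The caption does not say which tensors act as queries, keys, and values inside the generator. As a rough illustration, here is a generic single-head cross-attention step. It assumes that queries come from the flattened generator feature map, that keys and values come from a set of conditioning tokens (for example identity features), and that a residual connection is used. The projection matrices `w_q`, `w_k`, and `w_v` are hypothetical. This is not the authors' layer.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(feat, cond, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention with a residual.

    feat: (C, H, W) generator feature map, used as queries
    cond: (N, D) conditioning tokens, used as keys and values
    w_q: (C, d), w_k: (D, d), w_v: (D, C) hypothetical projections
    Returns a (C, H, W) map.
    """
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T                 # (HW, C): one token per pixel
    q = tokens @ w_q                                  # (HW, d)
    k = cond @ w_k                                    # (N, d)
    v = cond @ w_v                                    # (N, C)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (HW, N): each row sums to 1
    out = tokens + attn @ v                           # residual connection
    return out.T.reshape(c, h, w)


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
cond = rng.standard_normal((3, 16))
w_q = rng.standard_normal((8, 5))
w_k = rng.standard_normal((16, 5))
w_v = rng.standard_normal((16, 8))
out = cross_attention(feat, cond, w_q, w_k, w_v)
print(out.shape)  # (8, 4, 4)
```

Because of the residual path, setting `w_v` to zeros returns the input feature map unchanged. This makes it easy to check the reshaping logic.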
Fig 4.
Cross-attention generator dimensions in CrossSwap.
D denotes the discriminator.
Table 1.
Pose, ID, FID, and expression metrics of CrossSwap and the baseline methods. ↓ indicates that lower is better, and ↑ indicates that higher is better.
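The caption does not define how the ID, pose, and expression metrics are computed. In face-swapping work, a common protocol measures identity with the cosine similarity between face-recognition embeddings, or with ID retrieval accuracy, where higher is better. Pose and expression are measured as a distance between coefficients estimated on the swapped and target faces, where lower is better. The sketch below follows that common protocol and is only an assumption about what this table reports. It leaves out FID, which requires an Inception network. All function names are hypothetical.

```python
import numpy as np


def _normalize(x):
    """L2-normalize embeddings along the last axis."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def id_similarity(emb_swap, emb_src):
    """Mean cosine similarity between swapped and source identity embeddings (higher is better)."""
    return float(np.mean(np.sum(_normalize(emb_swap) * _normalize(emb_src), axis=-1)))


def id_retrieval_accuracy(emb_swap, emb_src):
    """Fraction of swapped faces whose nearest source embedding (by cosine) is their own source."""
    sims = _normalize(emb_swap) @ _normalize(emb_src).T   # (N, N) similarity matrix
    return float(np.mean(np.argmax(sims, axis=1) == np.arange(len(sims))))


def coeff_error(pred, ref):
    """Mean L2 distance between coefficient vectors, e.g. pose or expression (lower is better)."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(ref, dtype=np.float64)
    return float(np.mean(np.linalg.norm(diff, axis=-1)))


emb = np.eye(3)
print(id_similarity(emb, emb))             # 1.0
print(id_retrieval_accuracy(emb, emb))     # 1.0
print(coeff_error(np.zeros((2, 3)), np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]])))  # 2.5
```

The same `id_similarity` function could score either identity embedding in a comparison like the one in Table 2. Only the embedding model that produces `emb_swap` and `emb_src` would change.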
Table 2.
Comparison of the identity metric between the ArcFace embedding and the shape-aware identity embedding. The two models show similar identity preservation.