Fig 1.
Direct multimodal registration of distinctly different modalities is a challenging task (middle path). In this study, we evaluate whether translating one modality into the other by image-to-image (I2I) translation instead yields a simpler monomodal registration problem (either of the two peripheral paths).
Fig 2.
Examples of pairs of original (2D) images acquired by different modalities considered in this study (contrast enhanced for visualisation).
Zurich (remote sensing) dataset: (A) modality A: NIR, (B) modality B: RGB. Cytological data: (C) modality A: Fluorescence microscopy, (D) modality B: QPI. Histological data: (E) modality A: SHG, (F) modality B: BF.
Fig 3.
Example of a pair of (slices of) original 3D radiological images from the RIRE dataset acquired by different modalities considered in this study.
(A) Modality A: T1 MR image, (B) modality B: T2 MR image.
Table 1.
Success of modality translation methods expressed in terms of Fréchet Inception Distance (FID).
Smaller is better. Standard deviations are taken over the 3 folds for Zurich, Cytological and Radiological data. cyc, drit, p2p, star, comir denote the methods CycleGAN, DRIT++, pix2pix, StarGANv2, and CoMIR, respectively. Suffix _A (resp. _B) denotes generated Modality A (resp. B). The best result achieved by an I2I translation method on each of the datasets is shown in bold. FID values between the initially considered multimodal image datasets (B2A), as well as between the training and testing splits within each modality (A2A and B2B) for each dataset, are included as references. FID values between the generated CoMIR representations are not directly comparable to those of the I2I translations, since the method generates a different (artificial) modality. Comparison with the values for the original multimodal data (B2A) confirms that modality translation considerably reduces FID.
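For readers unfamiliar with FID, the following is a minimal sketch of the Fréchet distance between two sets of feature vectors, each modelled as a Gaussian. It assumes the features (for FID, activations of an Inception network) have already been extracted; the extraction step is not shown. The function name `frechet_distance` and the array layout are illustrative assumptions, not the implementation used in this study.

```python
import numpy as np


def frechet_distance(feats_x, feats_y):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_x, feats_y: arrays of shape (n_samples, n_features).
    Hypothetical helper for illustration only.
    """
    mu_x = feats_x.mean(axis=0)
    mu_y = feats_y.mean(axis=0)
    cov_x = np.cov(feats_x, rowvar=False)
    cov_y = np.cov(feats_y, rowvar=False)

    # Tr((cov_x cov_y)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_x @ cov_y, which are real and non-negative for
    # positive semi-definite inputs. Clipping guards against round-off.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * trace_sqrt)
```

Under this definition, two identical feature sets have distance zero, and a pure shift of the mean contributes the squared length of the shift, which is why smaller values indicate that generated and target images are closer in feature space.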
Fig 4.
Modality-translated image samples of the Zurich dataset by different evaluated methods (contrast-enhanced for visualisation).
Each row shows the results on one random image from each fold. Images in Columns 2–6 are generated from the images in (the corresponding row of) Column 1. Top block: Translations generated from Modality B. Bottom block: Translations generated from Modality A. The arrows indicate what to compare for visual inspection of the level of achieved similarity (pointing from generated images to the corresponding target of the learning process).
Fig 5.
Modality-translated image samples of the Cytological dataset by different evaluated methods (contrast-enhanced for visualisation).
Each row shows the results on one random image from each fold. Images in Columns 2–6 are generated from the images in (the corresponding row of) Column 1. Top block: Translations generated from Modality B. Bottom block: Translations generated from Modality A. The arrows indicate what to compare for visual inspection of the level of achieved similarity (pointing from generated images to the corresponding target of the learning process).
Fig 6.
Modality-translated image samples of the Histological dataset by different evaluated methods (contrast-enhanced for visualisation).
Each row shows the results on one random image from each fold. Images in Columns 2–6 are generated from the images in (the corresponding row of) Column 1. Top block: Translations generated from Modality B. Bottom block: Translations generated from Modality A. The arrows indicate what to compare for visual inspection of the level of achieved similarity (pointing from generated images to the corresponding target of the learning process).
Fig 7.
Modality-translated image samples of the Radiological dataset by different evaluated methods.
Each row shows the results on one random slice from each fold. Images in Columns 2–6 are generated from the images in the corresponding row of Column 1. Top block: Translations generated from Modality B. Bottom block: Translations generated from Modality A. The arrows indicate what to compare for visual inspection of the level of achieved similarity (pointing from generated images to the corresponding target of the learning process).
Fig 8.
Success rate of the observed registration approaches.
(A) SIFT on Zurich data, (B) α-AMD on Zurich data, (C) SIFT on Cytological data, (D) α-AMD on Cytological data, (E) SIFT on Histological data, (F) α-AMD on Histological data. x-axis: initial displacement dInit between moving and fixed images, discretised into 10 equally sized bins (marked by vertical dotted lines). y-axis: success rate λ within each bin (averaged over 3 folds for Zurich and Cytological data). In the legend, cyc, drit, p2p, star and comir denote the methods CycleGAN, DRIT++, pix2pix, StarGANv2 and CoMIR, respectively. Suffix _A (resp. _B) denotes that generated Modality A (resp. B) is used for the (monomodal) registration. B2A denotes registration of the original multimodal images, without using any modality translation. MI, MIND, NGF and CA denote registration by MI maximisation, MIND, NGF and CurveAlign, respectively.
Fig 9.
Success rate of the observed 3D registration approaches on Radiological data.
x-axis: initial displacement dInit between moving and fixed images, discretised into 10 equally sized bins (marked by vertical dotted lines). y-axis: success rate λ within each bin (averaged over 3 folds). In the legend, cyc, drit, p2p, star and comir denote the methods CycleGAN, DRIT++, pix2pix, StarGANv2 and CoMIR, respectively. Suffix _A (resp. _B) denotes that generated Modality A (resp. B) is used for the (monomodal) registration. B2A denotes registration of the original multimodal images, without using any modality translation. MI, MIND and NGF denote registration by MI maximisation, MIND, and maximisation of NGF similarity, respectively.
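The per-bin success rate λ plotted in Figs 8 and 9 can be sketched as follows. This is an illustrative sketch only: it assumes that "equally sized" means equal-width bins over the observed range of initial displacements, and that a boolean success flag per registration is already available (the success criterion itself is not restated in these captions). The function name `binned_success_rate` is hypothetical.

```python
import numpy as np


def binned_success_rate(d_init, success, n_bins=10):
    """Success rate within equal-width bins of initial displacement.

    d_init:  initial displacement per registration attempt.
    success: boolean flag per attempt (criterion assumed to be given).
    Returns the bin edges and the success rate per bin (NaN if empty).
    """
    d_init = np.asarray(d_init, dtype=float)
    success = np.asarray(success, dtype=bool)

    edges = np.linspace(d_init.min(), d_init.max(), n_bins + 1)
    # Digitise against the interior edges only, so that the largest
    # displacement falls into the last bin (indices run 0..n_bins-1).
    bin_index = np.digitize(d_init, edges[1:-1])

    rates = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = bin_index == b
        if in_bin.any():
            rates[b] = success[in_bin].mean()
    return edges, rates
```

Averaging the resulting per-bin curves over folds would give curves of the kind shown in the plots, while taking the mean of `success` over all attempts gives an aggregate rate of the kind reported per dataset.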
Table 2.
Overall registration success rate (in percent) for the evaluated methods on four datasets.
Larger is better. The success rate λ is aggregated over all transformation levels for each dataset. Standard deviations are taken over the 3 folds for Zurich, Cytological and Radiological data. cyc, drit, p2p, star, comir, MIND(α-AMD), MIND(MSD), NGF, MI and CA denote the methods CycleGAN, DRIT++, pix2pix, StarGANv2, CoMIR, MIND+α-AMD-based registration, MIND+MSD-based registration, NGF, MI maximisation and CurveAlign, respectively. _A (resp. _B) denotes using generated Modality A (resp. B) for registration. B2A refers to the multimodal registration performance on the acquired images without modality translation. MI, MIND and NGF provide reference performance of good conventional multimodal registration methods. For each dataset, the best I2I-based approach, as well as the overall best performing (multimodal) approach, are shown in bold.
Fig 10.
Relation between average FID reached by a modality translation method and the success rate λ of the subsequent registration.
(A) On Zurich data, (B) on Cytological data, (C) on Histological data, (D) on Radiological data. In the legend, cyc, drit, p2p, star and comir denote the methods CycleGAN, DRIT++, pix2pix, StarGANv2 and CoMIR, respectively. Suffix _A (resp. _B) denotes that generated Modality A (resp. B) is used in (monomodal) registration. The marker style indicates whether α-AMD (aAMD) or SIFT (SIFT) is used for the registration. The error-bars correspond to standard deviation computed over 3 folds for Zurich, Cytological and Radiological data.