Fig 1.
The complete pipeline of SinGAN-Seg to generate synthetic segmentation datasets.
Step 1 represents the training of the four-channel SinGAN-Seg model. Step 2 represents a fine-tuning step using neural style transfer [57]. The four-channel SinGAN constitutes a single training step of our model. Note the stacked input and output compared to the original SinGAN implementation [56], which takes only a single image with a noise vector as input and outputs only an image. In our SinGAN implementation, all the generators except GN (i.e., G0 to GN−1) receive a four-channel image (a polyp image and a ground-truth mask) as input in addition to the noise vector. The first generator, GN, receives only the noise vector as input. The discriminators also receive four-channel images, consisting of an RGB polyp image and a binary mask, as input. The inputs to the discriminators can be either real or fake.
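For illustration, a minimal PyTorch sketch of the four-channel stacking described above; the helper name, shapes, and values are ours for demonstration, not taken from the released code.

```python
import torch

def stack_four_channel(rgb: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Stack an RGB polyp image (3xHxW) and a binary mask (1xHxW) into the
    four-channel tensor used as generator/discriminator input (hypothetical helper)."""
    assert rgb.shape[1:] == mask.shape[1:], "image and mask must share HxW"
    return torch.cat([rgb, mask], dim=0)  # -> 4xHxW

# Example: a generator at scale n would receive the upsampled four-channel
# output of the previous scale plus a noise map of matching shape.
rgb = torch.rand(3, 256, 256)                    # RGB polyp image in [0, 1]
mask = (torch.rand(1, 256, 256) > 0.5).float()   # binary ground-truth mask
x = stack_four_channel(rgb, mask)                # shape: (4, 256, 256)
noise = torch.randn_like(x)                      # noise input of the same shape
```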
Fig 2.
Sample images and corresponding segmentation masks from HyperKvasir [51].
Fig 3.
Distribution of true-pixel percentages, relative to the full image size, of polyp masks in HyperKvasir [51].
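As a minimal sketch of how such a percentage can be computed from a binary mask (the function name is ours, assuming a NumPy array):

```python
import numpy as np

def true_pixel_percentage(mask: np.ndarray) -> float:
    """Percentage of 'true' (polyp) pixels relative to the full image size,
    for a binary mask of shape (H, W)."""
    return 100.0 * np.count_nonzero(mask) / mask.size

# Example: a 256x256 mask whose polyp region covers a 64x64 patch
mask = np.zeros((256, 256), dtype=bool)
mask[96:160, 96:160] = True
print(true_pixel_percentage(mask))  # 6.25
```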
Fig 4.
Samples of real images and SinGAN-Seg-generated synthetic GI-tract images with corresponding masks. The first column contains real images and masks. All other columns show randomly generated synthetic data from our SinGAN-Seg models, which were trained using the image in the first column.
Fig 5.
Analyzing diversity of generated data.
Real masks are presented in the left column. The mean (middle column) and standard deviation (right column) were calculated from 10 random masks generated by SinGAN-Seg.
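A minimal NumPy sketch of this per-pixel statistic (our illustration; the placeholder masks stand in for real SinGAN-Seg output):

```python
import numpy as np

def mask_mean_std(masks: np.ndarray):
    """Per-pixel mean and standard deviation over a stack of generated binary
    masks of shape (N, H, W); a large per-pixel std indicates diverse mask
    shapes across the N samples."""
    return masks.mean(axis=0), masks.std(axis=0)

# Example: 10 masks generated from one SinGAN-Seg model
masks = (np.random.rand(10, 256, 256) > 0.5).astype(float)  # placeholder masks
mean_img, std_img = mask_mean_std(masks)
```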
Fig 6.
Distribution of the true-pixel percentages calculated from 10,000 synthetic masks generated together with synthetic images using SinGAN-Seg.
The 10,000 generated images represent the 1,000 real polyp images: from each real image, 10 synthetic samples were generated. The dataset of 10,000 synthetic samples can be downloaded from https://osf.io/xrgz8/.
Fig 7.
Direct generations of SinGAN-Seg versus style-transferred samples.
Style transfer was performed using a 1:1,000 content-to-style ratio. The first row shows generated images before applying style transfer, and the second row shows the same images after applying it. It can be observed that the style-transferred images in the second row are of better quality.
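As a rough sketch of how the 1:1,000 ratio enters the style-transfer objective [57]: the content and style losses themselves would come from VGG feature maps as in the original method; only the weighting below is taken from the caption.

```python
# Content-to-style weighting of 1:1,000, as reported in the caption.
CONTENT_WEIGHT = 1.0
STYLE_WEIGHT = 1_000.0

def total_loss(content_loss: float, style_loss: float) -> float:
    # total = 1 * L_content + 1,000 * L_style, i.e., a 1:1,000 ratio
    return CONTENT_WEIGHT * content_loss + STYLE_WEIGHT * style_loss
```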
Table 1.
SIFID value comparison for real versus fake images generated from the SinGAN-Seg models.
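For reference, SIFID [56] is the Fréchet distance between the internal deep-feature statistics of a single real and fake image pair; the same distance, computed over dataset-level Inception statistics, gives the FID values in Tables 4 and 5:

```latex
d^2\big((\mu_r,\Sigma_r),(\mu_f,\Sigma_f)\big)
  = \lVert \mu_r - \mu_f \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_f - 2\,(\Sigma_r\Sigma_f)^{1/2} \right)
```

Here μ and Σ are the mean and covariance of deep-feature activations for the real (r) and fake (f) samples; for SIFID these statistics are computed from the internal patches of a single image rather than a whole dataset. Lower values indicate closer feature statistics.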
Fig 8.
The experiment setup to analyze the quality of SinGAN output.
Experiment 01 is the baseline, performed using only real data. In Experiment 02, generated synthetic data is used to train the segmentation models, and real data is used to measure the performance metrics.
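Schematically, the two experiments differ only in the training source; a hypothetical sketch (function names are illustrative, not the paper's code):

```python
def run_experiment(train_set, real_test_set, build_model, fit, evaluate):
    """Train a segmentation model on `train_set` and report metrics
    measured on held-out real data, as in Fig 8."""
    model = build_model()
    fit(model, train_set)
    return evaluate(model, real_test_set)

# Experiment 01 (baseline): train and test on real data
# metrics_01 = run_experiment(real_train, real_test, ...)
# Experiment 02: train on SinGAN-Seg synthetic data, test on real data
# metrics_02 = run_experiment(synthetic_train, real_test, ...)
```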
Fig 9.
Real versus synthetic data performance comparison with UNet++ and the effect of applying style transfer as post-processing.
Note that the Y-axis starts at 0.70 for better visualization of the differences.
Table 2.
Three-fold averages of basic metrics comparing real versus synthetic training performance with UNet++, and the effect of style transfer on performance.
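The basic metrics for segmentation presumably include overlap measures such as the Dice coefficient and IoU; a minimal sketch for a single binary prediction (our illustration, not the paper's evaluation code):

```python
import numpy as np

def dice_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8):
    """Dice coefficient and IoU between binary masks of shape (H, W)."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum() + eps)
    iou = inter / (union + eps)
    return dice, iou
```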
Fig 10.
Distribution comparison between real (top row) and synthetic (bottom row) masks.
Synthetic masks were generated using SinGAN-Seg.
Fig 11.
Real versus fake performance comparison with small training datasets.
Fake datasets were generated with the style transfer method using a content-to-style ratio of 1:1,000.
Table 3.
Real versus fake comparisons for small datasets after applying the style transfer method with a 1:1,000 content-to-style ratio for the fake data.
Fig 12.
Sample images generated from different GAN architectures.
SinGAN-Seg has two versions: one without style transfer (SinGAN-Seg) and one with style transfer (SinGAN-Seg-ST). The best content-to-style ratio of 1:1,000 was used for style transfer.
Table 4.
FID value comparison between the real dataset of 1,000 images and the synthetic datasets of 1,000 images generated from different GAN architectures modified to produce four-channel outputs.
Fig 13.
FastGAN versus SinGAN-Seg and SinGAN-Seg-ST.
SinGAN-Seg-ST represents SinGAN-Seg with style transfer at a 1:1,000 content-to-style ratio.
Table 5.
FID values computed between real and synthetic datasets generated from FastGAN, SinGAN-Seg, and SinGAN-Seg-ST trained with small datasets.