3D fluorescence microscopy data synthesis for segmentation and benchmarking

Automated image processing approaches are indispensable for many biomedical experiments and help to cope with the increasing amount of microscopy image data in a fast and reproducible way. In particular, state-of-the-art deep learning-based approaches usually require large amounts of annotated training data to produce accurate and generalizable outputs, but their use is often hampered by the general lack of such annotated data sets. In this work, we propose how conditional generative adversarial networks can be utilized to generate realistic 3D fluorescence microscopy image data from annotation masks of 3D cellular structures. In combination with mask simulation approaches, we demonstrate the generation of fully annotated 3D microscopy data sets, which we make publicly available for training or benchmarking. An additional positional conditioning of the cellular structures enables the reconstruction of position-dependent intensity characteristics and allows image data of different quality levels to be generated. A patch-wise working principle and a subsequent full-size reassembly strategy are used to generate image data of arbitrary size and for different organisms. We present this as a proof of concept for the automated generation of fully annotated training data sets that requires only a minimum of manual interaction, alleviating the need for manual annotations.

When discussing spherical harmonics, it may be useful for a general audience to compare them to the Fourier transform, a more familiar weighted sum of basis functions. That said, the presentation given is already well done. We agree that the concept of the Fourier transform is very likely more familiar to the reader, and we added a hint at the similarities between both concepts to the spherical harmonics section. Thanks for pointing this out. However, we feel that a full comparison between both approaches would lead to an overly complex and lengthy explanation that might cause confusion, as the Fourier transform is not used later on.
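To illustrate the analogy mentioned above, the following minimal sketch (not from the paper; the coefficients and the restriction to the m = 0 modes Y_0^0 and Y_2^0 are purely illustrative assumptions) shows how a radius function on the sphere can be written as a weighted sum of spherical harmonic basis functions, just as a truncated Fourier series represents a signal as a weighted sum of sinusoids:

```python
import numpy as np

# Closed forms of the first two m = 0 (real) spherical harmonics.
def Y00(polar):
    return np.full_like(polar, 0.5 / np.sqrt(np.pi))

def Y20(polar):
    return np.sqrt(5.0 / (16.0 * np.pi)) * (3.0 * np.cos(polar) ** 2 - 1.0)

# Sample the sphere on a regular (polar, azimuth) grid.
azimuth = np.linspace(0, 2 * np.pi, 64)
polar = np.linspace(0, np.pi, 32)
az_g, pol_g = np.meshgrid(azimuth, polar)

# Illustrative coefficients: a smooth base radius (l = 0 term) plus a
# small l = 2 perturbation -- the spherical analogue of keeping only
# the first few terms of a Fourier series.
radius = 10.0 * Y00(pol_g) + 0.5 * Y20(pol_g)
```

As with a band-limited Fourier series, truncating the expansion at a low degree yields a smooth, coarse approximation of the underlying shape.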
Line 202: The "in conclusion" seems unnecessary. This phrase was removed.
Line 209: It would be good to mention the accuracy measures that will be used here. This should refer to the general quality of the generated structures. However, we think that this sentence is rather confusing and does not provide meaningful information, which is why we removed it.

Line 216: It would be useful to mention the numerical aperture of the imaging lens and the physical pixel sizes from ref 32 here, for a quick understanding of the spatial resolution of the data sets. Good point, we missed this information entirely. The missing information was added to the data set description.

Line 222: It would be useful to mention this data set was acquired with a multi-photon fluorescence microscope (ref 33). Microscopy technique and imaging resolution have been added to the data set description.
Line 250: By integrating the z-dimension for the intensity profiles you lose the ability to assess the depth-dependent performance of the GAN. It would be interesting to have at least two, perhaps an "upper" and a "lower" profile, if not a variety of individual xy planes at different z positions. Assessing the depth-dependent performance was the intention of these plots. To give a better overview of the distribution of intensities, we added further plots showing the distribution over the YZ plane. In our understanding, these plots help to assess how well the cumulative intensity can be reconstructed for different thicknesses of the specimen. We think XY profiles would be harder to interpret and would only show depth-dependent performance when considering quite a few z positions, which would clutter the figure. If you have any tips on how we could represent this information in a more compact format, we would be happy to extend the figure.
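The kind of axis-integrated profile discussed above can be sketched as follows (a minimal illustration with random stand-in data, not the actual stacks from the paper; the (z, y, x) axis order is an assumption):

```python
import numpy as np

# Placeholder for a real or GAN-generated 3D stack, assumed (z, y, x).
rng = np.random.default_rng(0)
stack = rng.random((16, 64, 64))

# Depth profile: integrate each z-slice over the full xy-plane,
# yielding one cumulative intensity value per depth position.
z_profile = stack.sum(axis=(1, 2))

# XZ-style view: integrate over y only, keeping depth (z) and x,
# so depth-dependent intensity changes remain visible along rows.
xz_view = stack.sum(axis=1)
```

Comparing `z_profile` (or `xz_view`) between a real and a synthetic stack then exposes depth-dependent reconstruction errors that a single fully integrated profile would hide.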
Line 252: The similarity between the real and synthetic xz intensity spectra could be qualified by the visually obvious discrepancies in the high-frequency areas, and what accounts for those discrepancies could be discussed. One possible interpretation is that these discrepancies are caused by non-deterministic components such as noise, as well as by differences in signal intensities. We added further interpretations and explanations to Section 4.
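The noise interpretation above can be checked on toy data: a minimal sketch (illustrative signals only, not the paper's actual profiles) shows that additive noise leaves the dominant low-frequency bins of a magnitude spectrum nearly unchanged while dominating the high-frequency bins, where the clean spectrum is close to zero:

```python
import numpy as np

# A clean "intensity profile" built from two low-frequency sinusoids
# (exactly periodic over the 256-sample window, so its spectrum is sparse).
x = np.arange(256) * (4 * np.pi / 256)
real_profile = np.sin(x) + 0.5 * np.sin(3 * x)

# A "synthetic" profile: the same signal plus non-deterministic noise.
rng = np.random.default_rng(2)
synthetic_profile = real_profile + rng.normal(0, 0.1, x.size)

# Magnitude spectra of both profiles.
spec_real = np.abs(np.fft.rfft(real_profile))
spec_syn = np.abs(np.fft.rfft(synthetic_profile))

# The clean spectrum is essentially zero above its two signal bins,
# so the noise floor of the synthetic spectrum stands out there.
high_freq_gap = spec_syn[50:].mean() - spec_real[50:].mean()
```

Under this toy model, `high_freq_gap` is clearly positive, mirroring the visible high-frequency discrepancies between the real and synthetic spectra.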
Line 262 and 263: If a few words describing the approaches in [34] and [35] could be used here it would improve the readability. A short description of both approaches has been added, which hopefully helps to get a better understanding of the experimental setup.
Line 304: "allow to again conclude the generation of realistic 3D image data" is missing a subject; consider "allow us to again conclude…". Thanks for pointing this out; we corrected the sentence.
Line 316: Please describe the approach in [37] for the audience's quick reference. We added further explanations of the approach's concept for a quick reference.
Line 358: It would be interesting to also quantify the different "quality levels" of the synthesized data sets in terms of signal-to-noise ratio, contrast ratio, or some other standard image quality metric. The PSNR was added as a further metric. Due to the different value ranges of PSNR and the other metrics, we replaced the plot with a table.
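For reference, PSNR follows directly from the mean squared error between a reference and a test image; a minimal sketch (assuming images normalized to [0, 1], i.e. a data range of 1):

```python
import numpy as np

def psnr(reference, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return np.inf  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy example: a reference image and a noisy variant of it.
rng = np.random.default_rng(1)
real = rng.random((32, 32))
noisy = np.clip(real + rng.normal(0, 0.05, real.shape), 0, 1)
quality = psnr(real, noisy)
```

Higher PSNR means a smaller deviation from the reference, so lower "quality levels" of the synthesized data would show up as lower PSNR values in the table.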