
Benchmarking diffusion models against state-of-the-art architectures for OCT fluid biomarker segmentation

  • Katherine Du,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Ophthalmology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, United States of America

  • Utkarsh Doshi,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft

    Affiliation Department of Ophthalmology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, United States of America

  • Benjamin DiCenzo,

    Roles Data curation, Formal analysis, Methodology

    Affiliation Department of Ophthalmology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, United States of America

  • Jessica Jiang,

    Roles Data curation, Formal analysis, Methodology

    Affiliation Department of Ophthalmology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, United States of America

  • Ethan Wu,

    Roles Data curation, Formal analysis, Methodology

    Affiliation Department of Ophthalmology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, United States of America

  • Adarsh Gadari,

    Roles Software, Validation

    Affiliation Department of Ophthalmology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, United States of America

  • Sharat Chandra Vupparaboina,

    Roles Data curation, Resources, Software

    Affiliation Department of Ophthalmology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, United States of America

  • Elham Sadeghi,

    Roles Methodology, Validation

    Affiliation Department of Ophthalmology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, United States of America

  • Sandeep Chandra Bollepalli,

    Roles Conceptualization, Methodology, Supervision, Validation

    Affiliation Department of Ophthalmology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, United States of America

  • José-Alain Sahel,

    Roles Funding acquisition, Project administration, Resources, Software

    Affiliation Department of Ophthalmology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, United States of America

  • Jay Chhablani,

    Roles Funding acquisition, Project administration, Resources, Software, Supervision, Validation

    Affiliation Department of Ophthalmology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, United States of America

  • Kiran Kumar Vupparaboina

    Roles Conceptualization, Methodology, Project administration, Resources, Software, Supervision, Validation, Writing – review & editing

    kiran1559@gmail.com, kkv@pitt.edu

    Affiliation Department of Ophthalmology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, United States of America

Abstract

Objectives

Retinal diseases, major causes of vision impairment and blindness, are assessed using optical coherence tomography (OCT) scans. Automated report generation for retinal OCT scans, powered by deep learning, can help standardize interpretations and track retinal disease in clinical settings. A key challenge is accurately segmenting retinal disease signatures. This study explores using diffusion models to segment subretinal fluid (SRF), intraretinal fluid (IRF), and pigment epithelial detachment (PED) in typical clinical settings, comparing their performance to that of other leading segmentation models.

Methods

We labeled OCT scans and extracted those with specific pathologic retinal features: 269 scans with SRF, 224 scans with IRF, and 114 scans with PED. Three trained reviewers manually segmented these features for downstream analysis. Using the manually segmented scans as the ground truth, we trained the diffusion model, Nested U-Net, nnU-Net, TransUNet, and Swin-UNet to predict these segmentations. All models were evaluated using 5-fold cross-validation, with performance measured by Dice coefficient, sensitivity, specificity, Pearson correlation coefficient, and R².

Results

All models show high similarity to the ground truth segmentations in predicting SRF, IRF, and PED, as shown by the Dice coefficients (diffusion model: 0.81 ± 0.12, 0.66 ± 0.09, and 0.75 ± 0.11, respectively). The diffusion model has relatively higher sensitivity than most other models, while all models display very high specificity. The Pearson correlation coefficient and R² values show strongly associated pixel quantification of the segmented areas across models, with the nnU-Net model performing the strongest overall.

Conclusion

This study demonstrates that while diffusion models can segment retinal pathologies comparably using a limited number of manually annotated scans, the nnU-Net model remains the most effective overall for automated OCT analysis.

Introduction

Retinal diseases, including age-related macular degeneration (AMD), diabetic retinopathy (DR), and central serous chorioretinopathy (CSCR), are major contributors to vision impairment and blindness [1]. With the growth of deep learning in medical imaging, a potential application in ophthalmology is automated report generation for optical coherence tomography (OCT) scans, the key imaging modality for evaluating retinal diseases, which provides detailed cross-sectional images of posterior segment structures, including the retinal layers [2,3]. The creation of these automated reports as a medical decision support tool can facilitate standardized interpretation and reporting of retinal disease status in clinic settings [4–6]. It also has the potential to increase the accessibility of ophthalmologic care for patients in areas with provider shortages, by providing a retinal OCT scan report that identifies pathologic retinal features and triages the patient to higher levels of care if necessary [7,8]. Additionally, machine learning techniques can be applied to track the progression of the overall state of the retina, or of specific pathologic features, over time via qualitative and quantitative metrics [9,10].

The segmentation of specific retinal pathologies, including subretinal fluid (SRF), intraretinal fluid (IRF), pigment epithelial detachment (PED), drusen, and hyperreflective dots or foci, is a crucial step toward automated report generation for OCT scans. In recent years, various machine learning (ML)-based approaches have segmented several key disease features, including SRF, IRF, and PED, with fair accuracy [11,12]. These approaches predominantly involve convolutional neural network (CNN)-based models, including U-Net and encoder-decoder fully convolutional neural networks (FCNNs). For instance, a study that employed the U-Net model to segment five crucial neovascular age-related macular degeneration (nAMD) features reported high correlation between automated and manual segmentations and moderate Dice scores [13]. Other studies using an encoder-decoder-style FCNN demonstrate its applicability in detecting and quantifying macular fluid and PED in conventional OCT images [14–16]. Transformer-based segmentation models, such as Swin-UNet and TransUNet, have also been shown to effectively segment SRF, IRF, and PED [17,18]. However, these approaches have been evaluated primarily on disparate, controlled datasets, leaving their generalizability to unseen data uncertain.

Diffusion models are a class of generative models that have gained significant attention in recent years for their ability to generate high-quality images through a stepwise denoising process [19]. These models work by progressively adding noise to data and then learning to reverse this process, iteratively recovering the original data. In the context of OCT image analysis, diffusion models may be well suited for tasks such as segmentation and mask generation [20,21]. Retinal OCT images often present challenges due to noise, low contrast, and variability in tissue structures, and diffusion models have been shown to enhance scan quality [22–24]. However, there is sparse literature on their potential for improving segmentation in medical imaging tasks, where precision is critical.

The objective of this study is to evaluate the effectiveness of diffusion models in segmenting common retinal disease features, specifically SRF, IRF, and PED. We aim to evaluate diffusion models against benchmark approaches in a practical setting, using a relatively small set of manually annotated retinal biomarkers (an amount feasible for clinical centers, given the substantial time, effort, and expertise manual annotation requires). Our goal is to identify which machine learning model is most suitable for retinal pathology segmentation under typical resource limitations. The OCT scans utilized include dry AMD, wet AMD, DR, and CSCR eyes. We employ a custom retinal biomarker segmentation tool for the manual segmentation of these retinal pathologies and for automated preparation of the scans for downstream analysis. Using the manually segmented scans as the ground truth, we train the diffusion model to segment SRF, IRF, and PED. We compare this approach against segmentation based on the nnU-Net, TransUNet, Swin-UNet, and Nested U-Net models. This study provides a stepping stone toward enhanced automated OCT analysis by introducing the diffusion model as an alternative for robustly segmenting retinal pathologies.

Methods

Dataset

This retrospective study was conducted in accordance with the Declaration of Helsinki, and was approved by the Institutional Review Board of the University of Pittsburgh Medical Center (Study ID: STUDY20030263; Retrospective Study of Presentations & Outcome of Vitreo-Retinal Diseases). Written informed consent was obtained from all participants for the inclusion of their retrospective data. All OCT data were de-identified prior to analysis by replacing patient medical record numbers with random numbers and excluding any demographic information; the authors did not have access to patient identifiers during or after data collection.

We utilized OCT volumes obtained with the Cirrus 5000 OCT device (Carl Zeiss Meditec) from 200 unique eyes, each corresponding to an individual subject diagnosed with dry age-related macular degeneration (AMD), wet AMD, diabetic retinopathy (DR), or central serous chorioretinopathy (CSCR). The dataset was roughly balanced, with about one-third of scans from each condition. Each Cirrus OCT volume consisted of 128 B-scans, each spanning 6 mm laterally and 2 mm in depth (512 × 1024 pixels). From each volume, 16 B-scans were uniformly sampled, yielding 3,200 B-scans in total. A subset of these scans containing retinal pathologies was manually segmented. Patients were screened between 2017 and 2019, and the anonymized dataset was accessed on April 8, 2024 for research purposes.
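For illustration, the uniform sampling described above can be realized as follows; the exact indices used in the study are not specified, so the evenly spaced scheme below is an assumption.

```python
import numpy as np

# Hypothetical realization of the sampling described above: select 16 of the
# 128 B-scans per Cirrus volume at evenly spaced indices.
n_bscans, n_sampled = 128, 16
indices = np.linspace(0, n_bscans - 1, n_sampled).round().astype(int)
print(indices)  # [0 8 17 25 34 42 51 59 68 76 85 93 102 110 119 127]
```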

Feature description

The retinal disease signatures segmented for training and testing the models include subretinal fluid, intraretinal fluid, and pigment epithelial detachment (Fig 1).

Fig 1. Examples of OCT scan pathological feature masks used as ground truth labels for automated prediction and segmentation of these pathologies by machine learning models.

https://doi.org/10.1371/journal.pone.0335615.g001

Collecting scans with disease features

We leveraged our previously developed in-house OCT image-labeling software to qualitatively annotate retinal features, including scan quality, reason for a poor-quality scan (if applicable), healthy/diseased scan, foveal scan, drusen, SRF, IRF, PED, geographic atrophy, hyperreflective dots, and hyperreflective foci [25]. Using these qualitative labels, we then filtered out OCT scans with SRF, IRF, and PED features for segmentation. The segmented scans include those from dry AMD, wet AMD, diabetic retinopathy, and CSCR eyes. Table 1 shows the counts of pathologic features segmented.

Table 1. Counts of pathologic features segmented on OCT scans for each retinal disease subtype and in total.

https://doi.org/10.1371/journal.pone.0335615.t001

Annotation strategy

Three trained reviewers independently segmented the OCT B-scan features under consideration using an in-house retinal biomarker segmentation tool, with guidance from a retinal ophthalmologist. To ensure that multiple reviewers labeled scan features in a standardized manner, concurrence between the reviewers and the retinal ophthalmologist was established beforehand. At the end of the segmentation process, the retinal ophthalmologist reviewed all segmentations to ensure their accuracy.

Models

Nested U-Net model.

The Nested U-Net is an advanced variant of the standard U-Net architecture, designed for precise medical image segmentation [26]. It features a symmetric encoder–decoder structure with dense skip connections and residual blocks across multiple depth levels, enhancing multi-scale feature fusion. Residual dense blocks mitigate vanishing gradients and improve feature reuse, while the dense skip connections preserve fine-grained spatial information, allowing the network to capture both local and global context effectively.
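As a concrete illustration of the dense skip connections described above, the following minimal PyTorch sketch implements a single nested decoder node in the style of the UNet++ formulation [26]; the channel sizes and block design are hypothetical, not the study's exact configuration.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3x3 convolutions with batch norm and ReLU, as is common in U-Net variants."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# A dense skip node: same-depth feature maps are concatenated with the
# upsampled deeper feature map before convolving, preserving fine detail.
up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)
x0_0 = torch.randn(1, 32, 128, 128)  # encoder features at depth 0
x1_0 = torch.randn(1, 64, 64, 64)    # encoder features at depth 1
node_0_1 = ConvBlock(32 + 64, 32)
x0_1 = node_0_1(torch.cat([x0_0, up(x1_0)], dim=1))  # nested node output, (1, 32, 128, 128)
```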

nnU-Net model.

nnU-Net is a self-adapting U-Net–based framework for biomedical image segmentation [27]. Unlike manually designed architectures, nnU-Net automatically configures preprocessing, network architecture, training, and post-processing steps to match the properties of a given dataset, serving as a strong and standardized baseline across diverse medical imaging tasks.

TransUNet model.

TransUNet is a hybrid architecture that integrates convolutional and transformer components for medical image segmentation [28]. Its encoder consists of a CNN backbone for multi-scale feature extraction, while the deepest features are processed by a Vision Transformer to capture global dependencies. Skip connections are drawn from the CNN encoder and fused into a U-Net–style decoder. By combining the global context modeling of transformers with the local detail preservation of CNN skip connections, TransUNet achieves robust segmentation performance.

Swin-UNet model.

Swin-UNet is a transformer-based extension of U-Net that replaces traditional convolutional layers with Swin Transformer blocks [29]. It employs hierarchical feature extraction and shifted window self-attention to capture both local and global context efficiently. Structured in an encoder–decoder fashion similar to U-Net, Swin-UNet preserves fine-grained spatial details while maintaining long-range dependencies, making it well suited for dense prediction tasks.

Diffusion model.

Diffusion models are a class of generative models that learn to produce data by reversing a gradual forward noising process [19,30,31]. They operate by adding Gaussian noise to training data in a stepwise fashion and then learning to reconstruct the original data by inverting this process. Once trained, these models can generate new data by applying the learned denoising steps to random noise.
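Concretely, with a variance schedule β_t and ᾱ_t the cumulative product of (1 − β_t), a noised sample is x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε. A minimal sketch of this forward noising step, following the standard DDPM formulation [30] (the schedule values below are the usual defaults, not taken from this study):

```python
import torch

def q_sample(x0, t, alphas_cumprod):
    """Forward noising step of a DDPM: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps, eps

# Standard linear beta schedule from Ho et al. [30].
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1 - betas, dim=0)

x0 = torch.rand(4, 1, 256, 256)             # a batch of images or masks
t = torch.randint(0, T, (4,))               # a random timestep per sample
x_t, eps = q_sample(x0, t, alphas_cumprod)  # noised input and the target noise
```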

For this study, we utilized MedSegDiff [32], a diffusion probabilistic model (DPM) tailored for medical image segmentation. MedSegDiff introduces two key innovations:

  1. Dynamic Conditional Encoding: modulates feature conditions at each denoising step to prioritize region-specific attention, addressing spatial heterogeneity in medical images.
  2. Feature Frequency Parser (FF-Parser): suppresses high-frequency noise during reverse diffusion while preserving structural details (see the sketch after this list).
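The FF-Parser idea can be sketched as a learnable filter applied in the Fourier domain. The following is a minimal illustration of that frequency-gating principle, not the authors' exact module; the class name and shapes are hypothetical.

```python
import torch
import torch.nn as nn

class FFParserSketch(nn.Module):
    """Sketch of the FF-Parser principle: filter feature maps in the Fourier
    domain with a learnable per-frequency weight (not the paper's exact module)."""
    def __init__(self, channels, height, width):
        super().__init__()
        # rfft2 keeps width // 2 + 1 frequency bins along the last axis.
        self.weight = nn.Parameter(torch.ones(channels, height, width // 2 + 1))

    def forward(self, x):                        # x: (B, C, H, W) real-valued features
        spec = torch.fft.rfft2(x, norm="ortho")  # to the frequency domain
        spec = spec * self.weight                # attenuate or boost each frequency
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

features = torch.randn(2, 64, 32, 32)
filtered = FFParserSketch(64, 32, 32)(features)  # same shape as the input
```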

Model training

In this study, we conducted a comprehensive comparative analysis of the diffusion model and four other state-of-the-art architectures for the segmentation of three distinct retinal biomarkers: SRF, IRF, and PED. All models were trained on NVIDIA RTX 2080 GPUs (12 GB VRAM) using 5-fold cross-validation. OCT images were resized to 256 × 256 pixels, and stratification was applied at the subject level to ensure that scans from the same subject appeared in either the training set or the test set, but never both.
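A minimal sketch of such a subject-level split, using scikit-learn's GroupKFold; the file names and subject IDs below are hypothetical, as the study's exact stratification code is not published here.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical arrays: one entry per B-scan, with the subject (eye) ID as the
# grouping key so that no subject appears in both training and test folds.
scan_paths = np.array([f"scan_{i:04d}.png" for i in range(3200)])
subject_ids = np.repeat(np.arange(200), 16)  # 200 eyes x 16 sampled B-scans

gkf = GroupKFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(gkf.split(scan_paths, groups=subject_ids)):
    # No subject may appear on both sides of the split.
    assert not set(subject_ids[train_idx]) & set(subject_ids[test_idx])
    # train on scan_paths[train_idx], evaluate on scan_paths[test_idx]
```

The per-model training configurations were as follows.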

  • Nested U-Net: trained for up to 200 epochs with a learning rate of 1e-4, reduced by a factor of 2 every 8 epochs. Batch size was 4. Early stopping was applied with a patience of 25 epochs. Loss: Dice + Binary Cross-Entropy (sketched after this list).
  • nnU-Net: trained for 1000 epochs with a linearly decaying learning rate reaching 0 at the final epoch. Loss: Dice + Cross-Entropy.
  • TransUNet: initialized from pretrained checkpoints, trained for up to 200 epochs with a learning rate of 1e-4 and weight decay of 1e-5. Early stopping with patience 50 was applied. Training converged after 165–200 epochs. Loss: Dice + Binary Cross-Entropy.
  • Swin-UNet: initialized from pretrained checkpoints, trained for up to 200 epochs with a learning rate of 1e-4 and weight decay of 1e-5. Early stopping with patience 50 was applied. Training converged after 95–130 epochs. Loss: Dice + Binary Cross-Entropy. Due to its architecture, Swin-UNet requires 224 × 224-pixel inputs; predictions were resized to 256 × 256 pixels for Dice score calculation.
  • Diffusion (MedSegDiff): trained uniformly for 500 epochs with a fixed learning rate of 1e-4 and batch size 4. Loss: Mean Squared Error.
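Most of the models above were trained with a compound Dice + binary cross-entropy objective. A minimal PyTorch sketch of one common formulation follows; the equal weighting and smoothing constant are assumptions, not the study's exact settings.

```python
import torch
import torch.nn as nn

class DiceBCELoss(nn.Module):
    """Compound Dice + binary cross-entropy loss for binary segmentation.
    A common formulation; the exact weighting used in the study is an assumption."""
    def __init__(self, smooth=1.0):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.smooth = smooth

    def forward(self, logits, target):
        bce = self.bce(logits, target)
        prob = torch.sigmoid(logits)
        inter = (prob * target).sum(dim=(1, 2, 3))
        denom = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
        dice = (2 * inter + self.smooth) / (denom + self.smooth)
        return bce + (1 - dice).mean()  # soft Dice loss plus BCE
```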

Evaluation metrics

For each of the retinal biomarkers segmented, we evaluated the performance of the diffusion model and the four benchmark models using the following evaluation metrics:

Dice similarity coefficient: The Dice coefficient (DSC) quantifies the spatial overlap between the predicted and ground truth segmentation [33]. It ranges from 0, indicating no overlap, to 1, indicating perfect overlap. This metric provides a balanced assessment of segmentation performance by incorporating both precision and recall, making it particularly suitable for medical image analysis.
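For binary masks A (prediction) and B (ground truth), DSC = 2|A ∩ B| / (|A| + |B|). A minimal sketch:

```python
import numpy as np

def dice_coefficient(pred, gt):
    """Dice similarity coefficient between two binary masks:
    DSC = 2 * |pred AND gt| / (|pred| + |gt|)  [33]."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    total = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / total if total else 1.0
```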

Sensitivity: Sensitivity measures the model’s ability to correctly identify positive cases [34]. It is defined as the proportion of true positives among all actual positive instances.

Specificity: Specificity evaluates the model’s ability to correctly identify negative cases [34]. It represents the proportion of true negatives among all actual negative instances.

Correlation coefficient and R²: The Pearson correlation coefficient (PCC) quantifies the strength and direction of the linear relationship between two variables, ranging from −1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear correlation [35]. R² represents the proportion of the variance in one variable that is explained by its linear relationship with the other [35].
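A minimal sketch of these pixel-wise and area-based metrics, using synthetic per-scan areas purely for illustration:

```python
import numpy as np
from scipy.stats import pearsonr

def sensitivity_specificity(pred, gt):
    """Pixel-wise sensitivity and specificity between two binary masks [34]."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return tp / (tp + fn), tn / (tn + fp)

# Area agreement across scans: Pearson r between ground-truth and predicted
# segmented pixel counts, with R^2 taken as the squared correlation [35].
rng = np.random.default_rng(0)
gt_areas = rng.integers(100, 5000, size=50).astype(float)  # synthetic ground-truth areas
pred_areas = gt_areas + rng.normal(0, 400, size=50)        # synthetic predicted areas
r, _ = pearsonr(gt_areas, pred_areas)
print(f"PCC = {r:.2f}, R^2 = {r ** 2:.2f}")
```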

Results

For the Nested U-Net, nnU-Net, TransUNet, Swin-UNet, and diffusion models, the DSC indicates high similarity between the manually labeled ground truth segmentations and those predicted by each model for SRF, IRF, and PED. As shown in Fig 2, the predictions made by the models were similar to the ground truth segmentations for the majority of OCT scans. However, the diffusion model displays slightly lower DSCs than the other model types overall, except for surpassing Swin-UNet in IRF and PED (Table 2). The nnU-Net provides the most accurate segmentations of the retinal biomarkers of all the models. Specifically, the models are best at predicting SRF and show comparable abilities to correctly segment this feature, with DSCs ranging from 0.81 ± 0.12 for the diffusion model to 0.86 ± 0.10 for the nnU-Net model. For IRF, the nnU-Net's predictions align best with the ground truth segmentations (DSC: 0.72 ± 0.10) and the Swin-UNet model's the least (DSC: 0.65 ± 0.09). Lastly, the average predictions of all models were relatively robust for PED (DSCs ranging from 0.73 ± 0.10 for the Swin-UNet model to 0.85 ± 0.09 for the nnU-Net model).

Table 2. Dice coefficients of the five deep learning models for the segmentation of OCT scan pathology using 5-fold cross-validation.

https://doi.org/10.1371/journal.pone.0335615.t002

Fig 2. Examples of OCT scans manually segmented for each retinal biomarker in the ground truth segmentations, followed by the predicted segmentations from the five deep learning models.

https://doi.org/10.1371/journal.pone.0335615.g002

The sensitivity and specificity of the diffusion and other models, relative to the ground truth, vary with the retinal feature being segmented, as shown in Table 3. Sensitivity measures how well the models correctly identify and segment all relevant regions (true positives). The diffusion model displays relatively high sensitivity in segmenting SRF, IRF, and PED, ranking second or third among the compared models, and all five models successfully identify and segment most of the desired regions. Specificity is also robust for all five models across all retinal features, indicating that the models are proficient at identifying regions that should not be segmented (true negatives).

Table 3. Evaluation metrics of the five deep learning models for the segmentation of OCT scan pathology using 5-fold cross-validation.

https://doi.org/10.1371/journal.pone.0335615.t003

Lastly, in quantifying pixel area between ground truth and predicted segmentations, the nnU-Net model shows slightly higher PCC and R² values for SRF, IRF, and PED (Table 3). The PCC ranges from 0.67 to 0.82 for the diffusion model and from 0.72 to 0.86 for the nnU-Net model across all features segmented. The R² values range from 0.24 to 0.59 for the diffusion model and from 0.43 to 0.71 for the nnU-Net model. Hence, there is a moderate to strong linear relationship, and moderate explained variability, between the ground truth and predicted segmentation pixel areas for all models; in other words, the quantified segmentation areas show a moderate to strong association between the ground truth and predicted segmentations. The models adequately quantify both smaller and larger areas of SRF, IRF, and PED relative to the ground truth segmentations (Fig 3).

Fig 3. Correlation of areas segmented manually (ground truth) versus areas segmented by the diffusion and nnU-Net models (the strongest-performing compared model) per OCT scan.

https://doi.org/10.1371/journal.pone.0335615.g003

Discussion

Our study demonstrates that the diffusion model is a novel machine learning technique that can detect pathologic retinal features with accuracy comparable to other state-of-the-art machine learning models. On our dataset, the diffusion model displays slightly lower DSCs overall, while the nnU-Net consistently achieves the highest performance across all three retinal biomarkers. The sensitivity and specificity metrics demonstrate that the diffusion model is more inclusive of pathologic retinal feature areas than most other models, and that all models reliably avoid segmenting the image background. Moreover, the segmentation areas predicted by the diffusion and other models are moderately correlated with the ground truth areas, with the nnU-Net model exhibiting the highest PCC and R² values for fluid and PED.

Nested U-Net, nnU-Net, TransUNet, and Swin-UNet are advanced deep learning architectures built upon or inspired by the original U-Net and are commonly used for medical image segmentation [36]. They have been shown to effectively segment retinal biomarkers in several studies. A diffusion model, by contrast, generates images by iteratively refining noisy inputs, reversing a diffusion process that progressively adds noise to the image [32]. For the task of segmenting scans, the diffusion model gradually denoises a corrupted image to generate the segmentation, but it usually requires more computation and training than the U-Net-based models, which directly learn to predict the segmentation mask. However, if trained thoroughly, diffusion models may produce more realistic and coherent results by leveraging their generative properties.

While the diffusion model and the various U-Net-based models show similar performance, the relatively small size of the segmented-scan dataset used for training and testing (269 total scans for SRF, 224 for IRF, and 114 for PED) highlights some potential challenges for the diffusion model in segmenting these retinal biomarkers and accurately quantifying their areas [37]. The diffusion model inherently requires more input data to build a robust model because of the nature of its learning and generative processes [38]. However, we aimed to balance this against the practical limitations many clinical centers face in manually segmenting OCT scans, given the substantial human time, effort, and expertise needed, as our goal was to evaluate the diffusion model against other model types for real-world use. Hence, for a few predictions, the output OCT scans from the diffusion model had regions segmented outside the retinal tissue layers, in the background (Fig 4, SRF row). This may be due to textural differences in the background: small patches of solid black were segmented because they resemble fluid texture more closely than the typical speckled gray background does. Additionally, artifacts within the retinal layers themselves, such as the black shadowing in the IRF scan of Fig 4, caused the predicted segmentation to be slightly larger than the ground truth, though still representative overall. Lastly, the thin nature of some double-layer-sign PEDs caused the diffusion model's predicted segmentations to cover a portion of the retinal pathology but not the full feature (Fig 4, PED row).

Fig 4. Examples of OCT scans for each retinal biomarker which presented challenges for the diffusion model in achieving complete segmentation accuracy.

https://doi.org/10.1371/journal.pone.0335615.g004

Limitations of this study include segmenting only larger retinal pathologies and not significantly smaller features, such as drusen, hyperreflective dots, and hyperreflective foci. Future studies can compare the ability of the diffusion model and U-Net-based models to segment these and other retinal biomarkers, including geographic atrophy. This could help elucidate whether these machine learning models differ in correctly distinguishing and segmenting smaller areas or areas outside the retinal tissue layers, and how performance compares across models. Another limitation is the sample size of the segmented features, as diffusion models tend to produce highly accurate representations with more extensive training data [39]. Although we started with thousands of labeled OCT scans, the segmented features represented only a small proportion of the dataset, and we selected examples with variability and good quality, limiting our sample size.

In conclusion, this study evaluates diffusion models as an alternative to conventional U-Net-based approaches for OCT retinal pathology segmentation, showing their potential under typical clinical data limitations. While all models demonstrate robust performance in segmenting these features, the diffusion model shows relatively greater sensitivity overall. However, under realistic clinical constraints involving limited annotated data, the diffusion model demonstrates lower segmentation accuracy and weaker area correlations than other state-of-the-art models such as nnU-Net. From a broader outlook, these models can be applied to automated report generation for retinal OCT scans to identify regions of pathologic retinal features. As shown by this study, a crucial application of these models is accurately quantifying pathologic retinal feature areas to personalize patient care. Quantifying segmentation area will aid in tracking the progression of retinal pathology over time, during its natural course or after ophthalmological treatment, to gauge treatment efficacy. This will support the development of teleophthalmology for structured reporting in the clinic and promote the accessibility of retinal diagnostic imaging for underserved communities.

References

  1. Daich Varela M, Sen S, De Guimaraes TAC, Kabiri N, Pontikos N, Balaskas K, et al. Artificial intelligence in retinal disease: clinical application, challenges, and future directions. Graefes Arch Clin Exp Ophthalmol. 2023;261(11):3283–97. pmid:37160501
  2. Keane PA, Sadda SR. Retinal imaging in the twenty-first century: state of the art and future directions. Ophthalmology. 2014;121(12):2489–500. pmid:25282252
  3. Eladawi N, Elmogy M, Ghazal M, Helmy O, Aboelfetouh A, Riad A, et al. Classification of retinal diseases based on OCT images. Front Biosci. 2018;23(2):247–64.
  4. Abràmoff MD, Lou Y, Erginay A, Clarida W, Amelon R, Folk JC, et al. Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning. Invest Ophthalmol Vis Sci. 2016;57(13):5200–6. pmid:27701631
  5. Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402–10. pmid:27898976
  6. Lee CS, Baughman DM, Lee AY. Deep learning is effective for the classification of OCT images of normal versus age-related macular degeneration. Ophthalmol Retina. 2017;1(4):322–7. pmid:30693348
  7. De Fauw J, Ledsam JR, Romera-Paredes B, Nikolov S, Tomasev N, Blackwell S, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med. 2018;24(9):1342–50. pmid:30104768
  8. Leingang O, Riedl S, Mai J, Reiter GS, Faustmann G, Fuchs P, et al. Automated deep learning-based AMD detection and staging in real-world OCT datasets (PINNACLE study report 5). Sci Rep. 2023;13(1):19545.
  9. Moraes G, Fu DJ, Wilson M, Khalid H, Wagner SK, Korot E, et al. Quantitative analysis of OCT for neovascular age-related macular degeneration using deep learning. Ophthalmology. 2021;128(5):693–705. pmid:32980396
  10. Kurmann T, Yu S, Márquez-Neila P, Ebneter A, Zinkernagel M, Munk MR, et al. Expert-level automated biomarker identification in optical coherence tomography scans. Sci Rep. 2019;9(1):13605. pmid:31537854
  11. Karn PK, Abdulla WH. On machine learning in clinical interpretation of retinal diseases using OCT images. Bioengineering (Basel). 2023;10(4):407. pmid:37106594
  12. Varga L, Kovács A, Grósz T, Thury G, Hadarits F, Dégi R, et al. Automatic segmentation of hyperreflective foci in OCT images. Comput Methods Programs Biomed. 2019;178:91–103.
  13. Ricardi F, Oakley J, Russakoff D, Boscia G, Caselgrandi P, Gelormini F, et al. Validation of a deep learning model for automatic detection and quantification of five OCT critical retinal features associated with neovascular age-related macular degeneration. Br J Ophthalmol. 2024;108(10):1436–42. pmid:38485214
  14. Mantel I, Mosinska A, Bergin C, Polito MS, Guidotti J, Apostolopoulos S, et al. Automated quantification of pathological fluids in neovascular age-related macular degeneration, and its repeatability using deep learning. Transl Vis Sci Technol. 2021;10(4):17. pmid:34003996
  15. Schlegl T, Waldstein SM, Bogunovic H, Endstraßer F, Sadeghipour A, Philip A-M, et al. Fully automated detection and quantification of macular fluid in OCT using deep learning. Ophthalmology. 2018;125(4):549–58. pmid:29224926
  16. Pawloff M, Gerendas BS, Deak G, Bogunovic H, Gruber A, Schmidt-Erfurth U. Performance of retinal fluid monitoring in OCT imaging by automated deep learning versus human expert grading in neovascular AMD. Eye (Lond). 2023;37(18):3793–800. pmid:37311835
  17. Philippi D, Rothaus K, Castelli M. A vision transformer architecture for the automated segmentation of retinal lesions in spectral domain optical coherence tomography images. Sci Rep. 2023;13(1):517. pmid:36627357
  18. Li F, Pan W, Xiang W, Zou H. Automatic segmentation of multitype retinal fluid from optical coherence tomography images using semisupervised deep learning network. Br J Ophthalmol. 2023;107(9):1350–5. pmid:35697498
  19. Yang L, Zhang Z, Song Y, Hong S, Xu R, Zhao Y, et al. Diffusion models: a comprehensive survey of methods and applications. ACM Comput Surv. 2023;56(4):1–39.
  20. Yang B, Li J, Wang J, Li R, Gu K, Liu B. DiffusionDCI: a novel diffusion-based unified framework for dynamic full-field OCT image generation and segmentation. IEEE Access. 2024.
  21. Wu Y, He W, Eschweiler D, Dou N, Fan Z, Mi S, et al. Retinal OCT synthesis with denoising diffusion probabilistic models for layer segmentation. In: 2024 IEEE International Symposium on Biomedical Imaging (ISBI). IEEE; 2024. 1–5.
  22. Hu D, Tao YK, Oguz I. Unsupervised denoising of retinal OCT with diffusion probabilistic model. In: Medical Imaging 2022: Image Processing. SPIE; 2022. 25–34.
  23. Li S, Higashita R, Fu H, Li H, Niu J, Liu J. Content-preserving diffusion model for unsupervised AS-OCT image despeckling. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland; 2023. 660–70.
  24. Ji B, He G, Chen Z, Zhao L. A novel diffusion-model-based OCT image inpainting algorithm for wide saturation artifacts. In: Chinese Conference on Pattern Recognition and Computer Vision (PRCV). Singapore: Springer Nature; 2023. 284–95.
  25. Du K, Shah S, Bollepalli SC, Ibrahim MN, Gadari A, Sutharahan S, et al. Inter-rater reliability in labeling quality and pathological features of retinal OCT scans: a customized annotation software approach. PLoS One. 2024;19(12):e0314707. pmid:39693322
  26. Zhou Z, Rahman Siddiquee MM, Tajbakhsh N, Liang J. UNet++: a nested U-Net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support (DLMIA 2018 and ML-CDS 2018, held in conjunction with MICCAI 2018), Granada, Spain. 2018. 3–11.
  27. Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2021;18(2):203–11. pmid:33288961
  28. Chen J, Lu Y, Yu Q, Luo X, Adeli E, Wang Y, et al. TransUNet: transformers make strong encoders for medical image segmentation. 2021. https://arxiv.org/abs/2102.04306
  29. Cao H, Wang Y, Chen J, Jiang D, Zhang X, Tian Q, et al. Swin-Unet: Unet-like pure transformer for medical image segmentation. In: European Conference on Computer Vision. 2022. 205–18.
  30. Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. Adv Neural Inf Process Syst. 2020;33:6840–51.
  31. Sohl-Dickstein J, Weiss E, Maheswaranathan N, Ganguli S. Deep unsupervised learning using nonequilibrium thermodynamics. In: International Conference on Machine Learning. 2015. 2256–65.
  32. Wu J, Fu R, Fang H, Zhang Y, Yang Y, Xiong H, et al. MedSegDiff: medical image segmentation with diffusion probabilistic model. In: Medical Imaging with Deep Learning. 2024. 1623–39.
  33. Dice LR. Measures of the amount of ecologic association between species. Ecology. 1945;26(3):297–302.
  34. Parikh R, Mathai A, Parikh S, Chandra Sekhar G, Thomas R. Understanding and using sensitivity, specificity and predictive values. Indian J Ophthalmol. 2008;56(1):45–50. pmid:18158403
  35. Schober P, Boer C, Schwarte LA. Correlation coefficients: appropriate use and interpretation. Anesth Analg. 2018;126(5):1763–8. pmid:29481436
  36. Jiangtao W, Ruhaiyem NIR, Panpan F. A comprehensive review of U-Net and its variants: advances and applications in medical image segmentation. IET Image Process. 2025;19(1).
  37. Zhang T, Wang Z, Huang J, Tasnim MM, Shi W. A survey of diffusion based image generation models: issues and their solutions. arXiv preprint. 2023.
  38. Chen M, Mei S, Fan J, Wang M. Opportunities and challenges of diffusion models for generative AI. Natl Sci Rev. 2024;11(12):nwae348. pmid:39554240
  39. Kazerouni A, Aghdam EK, Heidari M, Azad R, Fayyaz M, Hacihaliloglu I, et al. Diffusion models in medical imaging: a comprehensive survey. Med Image Anal. 2023;88:102846. pmid:37295311