
Advanced retinal disease detection from OCT images using a hybrid squeeze and excitation enhanced model

  • Gülcan Gencer,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    gencergulcan@gmail.com

    Affiliation Department of Biostatistics and Medical Informatics, Faculty of Medicine, Afyonkarahisar Health Sciences University, Afyonkarahisar, Turkey

  • Kerem Gencer

    Roles Data curation, Methodology, Supervision, Visualization

    Affiliation Department of Computer Engineering, Faculty of Engineering, Afyon Kocatepe University, Afyonkarahisar, Turkey

Abstract

Background

Retinal diseases are critical because they can cause severe vision loss if left untreated. Traditional diagnosis of retinal disorders relies heavily on manual interpretation of optical coherence tomography (OCT) images, which is time-consuming and dependent on the expertise of ophthalmologists. This creates challenges for early diagnosis, especially as retinal diseases such as diabetic macular edema (DME), drusen, and choroidal neovascularization (CNV) become more prevalent. OCT enables early detection and helps ophthalmologists diagnose patients more accurately. This paper presents an SE (Squeeze-and-Excitation)-Enhanced Hybrid Model for detecting retinal disorders, including DME, drusen, and CNV, from OCT images using artificial intelligence and deep learning.

Methods

The model integrates SE blocks with the EfficientNetB0 and Xception architectures, which provide high success in image classification tasks. EfficientNetB0 achieves high accuracy with fewer parameters through model scaling strategies, while Xception offers powerful feature extraction using depthwise separable convolutions. The combination of these architectures enhances both the efficiency and classification performance of the model, enabling more accurate detection of retinal disorders from OCT images. Additionally, SE blocks increase the representational ability of the network by adaptively recalibrating per-channel feature responses.

Results

The combined features from EfficientNetB0 and Xception are processed via fully connected layers and categorized using a Softmax classifier. The methodology was tested on the UCSD and Duke OCT datasets and produced excellent results. The proposed SE-Enhanced Hybrid Model outperformed the current best-known approaches, with accuracy rates of 99.58% on the UCSD dataset and 99.18% on the Duke dataset.

Conclusion

These findings emphasize the model’s ability to effectively diagnose retinal disorders using OCT images and indicate substantial promise for the development of computer-aided diagnostic tools in the field of ophthalmology.

Introduction

Diseases of the retina are conditions that seriously jeopardize people’s eyesight and have a direct impact on quality of life. The retina is an essential part of the human eye that is made up of vision cells and is responsible for processing visual information. The macula, which is required for central vision, is located inside the retinal layer. Retinal damage, particularly in the macular region, can result in a substantial loss of vision [1]. As a result, early identification of retinal anomalies is crucial for facilitating prompt medical treatment and reducing vision loss [2]. Among the most prevalent retinal disorders are diabetic macular edema (DME) and age-related macular degeneration (AMD). AMD, the primary cause of blindness in people over 65, has two forms: wet AMD (choroidal neovascularization, or CNV) and dry AMD (drusen) [3]. Approximately 25% of people with diabetes develop DME, which is caused by fluid accumulation in the retina resulting from diabetic complications. These disorders can permanently impair eyesight if not treated in a timely manner. As a result, the development of automated diagnostic systems has become essential for efficient treatment planning, as such systems can reduce the burden on clinicians and improve early detection rates [4].

The detection and treatment of retinal disorders have been transformed by advances in medical imaging technologies. Optical coherence tomography is a non-invasive imaging technique that produces high-resolution cross-sectional images of the retina. OCT imaging allows for detailed visualization of retinal layers, providing critical insights into retinal health and enabling early diagnosis of conditions like AMD and DME. However, interpreting OCT images can be time-consuming and requires specialized expertise. To address these challenges, artificial intelligence (AI) and deep learning (DL) methods offer promising solutions for enhancing OCT image analysis and increasing diagnostic accuracy [5]. DL models such as Convolutional Neural Networks (CNNs) have shown impressive results in a number of applications, such as brain tumor and skin cancer detection [6,7]. CNNs have proven useful in ophthalmology for identifying glaucoma from fundus images and segmenting retinal arteries [8]. In the study by Gencer et al. (2024) [9], a comparative analysis of deep learning models was performed for the classification of OCT images; the study found that the ResNet-101 model provided the highest accuracy and specificity compared to the other models. In addition, the study by Gencer and Gencer (2025) [10] emphasizes the success of deep learning models in medical image classification. These AI-powered methods improve clinical decision-making and boost diagnostic precision.

This study proposes a novel SE-Enhanced Hybrid Model for the classification of retinal diseases using OCT images. The model leverages Squeeze-and-Excitation (SE) blocks to adaptively recalibrate feature responses on a per-channel basis, thereby increasing the representational capacity of the network. Additionally, by combining the EfficientNetB0 and Xception architectures, the model benefits from the strengths of both frameworks in feature extraction and classification accuracy. The main contributions of this study are as follows:

  • Proposing a novel SE-Enhanced Hybrid Model that integrates EfficientNetB0 and Xception architectures with SE blocks for enhanced feature extraction and classification performance.
  • Conducting a comparative analysis that demonstrates the superior accuracy and robustness of the proposed model in classifying retinal diseases such as DME, CNV, and drusen.
  • Providing a comprehensive evaluation of the model using two publicly available OCT datasets (UCSD and Duke), highlighting its generalizability and applicability in clinical settings.
  • Addressing the computational efficiency of the model, making it a feasible solution for real-time diagnostic applications in ophthalmology.

The remaining sections of this article are organized as follows. First, a comprehensive analysis of related research is provided. Next, the proposed methodology is detailed, covering CNN, SE blocks, the SE-Enhanced Hybrid model architecture, and the datasets used. The subsequent section presents the experimental results, followed by a discussion of the study’s limitations. Finally, the article concludes with a summary of the findings and suggestions for future research directions.

Related work

With the growth of computing systems in ophthalmology and advances in imaging technology, the analysis and classification of OCT images has attracted increasing attention. Enabled by new technologies and increased processing capacity, AI-supported decision support systems are increasingly used in this field. This section summarizes recent research on OCT datasets comparable to those used in this study.

Using the Inception V3 architecture pre-trained on the ImageNet dataset, Kermany et al. (2018) [11] created a transfer learning system for medical image analysis; when evaluated on OCT images, this technique effectively identified retinal diseases. Similarly, Silva et al. (2023) [12] compared four distinct ensemble CNN models with existing CNN architectures while utilizing transfer learning.

Kim and Tran (2020) [13] examined and assessed CNN-based models for the classification of OCT images. They used an ensemble learning approach based on several ResNet152 architectures to achieve strong performance, and they employed Fully Convolutional Networks (FCN) to remove noise. DenseNet was utilized by Rastogi et al. (2019) [14] to identify retinal abnormalities from OCT images, and they emphasized how successful it was in comparison to conventional CNNs.

Unlike typical CNNs, a capsule network application was developed to extract spatial information from images [15]. A deep learning-based approach that uses OCT images to diagnose macular disorders has been suggested [16]; the suggested model achieves high accuracy and performance by utilizing the VGG-16, VGG-19, and ResNet architectures. For automatic OCT image classification, a deep transfer learning approach based on VGG-16 was employed: comparing CNN and transfer learning models for classifying diabetic macular edema, drusen, and choroidal neovascularization yielded an accuracy of up to 99.38% [17].

A layer-guided convolutional neural network (LGCNN) was proposed to classify retinal diseases using retinal layer segmentation [18]. Deep learning techniques are also being used to reduce noise in OCT images, which aids in the more accurate diagnosis and classification of retinal disorders. Mehdizadeh et al. (2021) [19] provide a technique that combines deep feature loss with a VGG network to denoise OCT images and improve perceptual quality. Hu et al. (2023) [20] present a selective denoising technique that increases classification accuracy by first using a CNN classifier to identify images that require noise reduction and then applying the BM3D algorithm. The semi-supervised Capsule cGAN approach, which mixes supervised and unsupervised losses for speckle noise reduction, is proposed by Wang et al. (2021) [21].

A generative adversarial network (GAN) is used by Hasan et al. (2021) [22] to examine OCT images. The GAN architecture comprises a generator that creates denoised images from noisy inputs and a discriminator that distinguishes between actual and denoised images. Improved performance measurements, such as peak signal-to-noise ratio (PSNR), demonstrate that the GAN-based approach works better than conventional denoising approaches like wavelet-transform, bilateral, non-local means (NLM), and BM3D. Using a variety of datasets, Bogacki and Dziech (2023) [23] provide a novel deep learning technique for denoising OCT images: through BM3D filtering, pairs of clean and noisy images are used to train a deep learning model, and several quantitative criteria demonstrate the substantial improvement in image quality that denoising brings.

CNN models such as ResNet50, Xception, and MobileNetV2 were used to identify disorders in retinal OCT images, with MobileNetV2 found to be the most effective [24]. Zhu et al. (2024) [25] created nnMobileNet, a CNN model that can be utilized on mobile devices, by reevaluating the CNN designs used for the diagnosis of retinal disorders. Using four distinct public datasets, this model has proven to perform better on a variety of tasks, such as the classification of diabetic macular edema, fundus multimorbidity identification, and diabetic retinopathy.

A completely automated technique is presented by Mittal and Bhatnagar (2022) [26] to diagnose disorders such as DME from OCT images. High accuracy rates were obtained by employing an SVM-based classifier with HOG descriptors.

A framework for the automated identification of the retinal pigment layer using spectral domain OCT images was proposed by Naz et al. (2016) [27]. Khalid et al. (2016) [28] developed a classification system for age-related drusen identification that employs multiple curve-fitting procedures, noise reduction approaches, and intensity-based thresholding. In their study, Salaheldin et al. [29] propose a deep learning-based model for automatic detection and grading of papilledema from OCT images. The model aims to aid clinical diagnosis and management by providing an accurate and efficient way to determine and assess the severity of papilledema.

Salaheldin et al. (2024) [30] present a hybrid model that utilizes artificial intelligence techniques for the detection of retinal disorders using OCT images. The model combines various machine learning approaches to increase accuracy and diagnostic confidence in identifying retinal conditions. Kayadibi and Güraksın (2023) [31] developed a CNN-based stacking ensemble learning (EL) method to detect common retinal diseases such as diabetic macular edema, choroidal neovascularization, and drusen from OCT images. The study performed high-accuracy classification using homogeneous and heterogeneous EL methods and achieved over 99% success on the Duke and UCSD datasets. This method contributes to early diagnosis by increasing the accuracy of computer-aided diagnostic tools.

A review of the literature reveals that classical machine learning classifiers dominated research on OCT datasets such as Duke [3,27,32]. The other dataset, originally provided by Kermany et al. [11], is commonly used to evaluate deep learning algorithms, notably CNN architectures. This paper proposes an SE-Enhanced Hybrid Model architecture for detecting retinal diseases from OCT images, which combines SE blocks for improved feature representation with EfficientNetB0 and Xception for effective feature extraction.

Materials and methods

Datasets

This work used two publicly available OCT datasets, the UCSD and Duke datasets, as shown in Table 1. These datasets include images of both normal retinas and retinas affected by disorders.

  • The first dataset is presented by Srinivasan et al. [33]. This collection provides OCT images used to diagnose and classify retinal conditions across three categories: NORMAL, AMD, and DME.
  • The second dataset was given by Kermany et al. [11]. This dataset includes 84,495 OCT images classified into four categories: NORMAL, CNV, DME, and DRUSEN.

Description

In this paper, an SE-Enhanced Hybrid Model architecture is suggested for detecting prevalent retinal disorders using OCT images. A hybrid deep learning model combining two powerful convolutional neural network (CNN) architectures, EfficientNetB0 and Xception, was implemented. This hybrid model also includes Squeeze-and-Excitation (SE) blocks to improve feature extraction. The purpose of the model is to categorize images in the OCT datasets. SE blocks recalibrate feature responses on a channel-by-channel basis by explicitly modeling interdependencies between channels, which improves the representational capacity of the network. The model combines the outputs of the EfficientNetB0 and Xception architectures, both enhanced with SE blocks, to create a robust feature extraction mechanism; the combined features are then passed through fully connected layers to perform classification.

Images were resized to 224 × 224 pixels and rescaled to the [0, 1] pixel intensity range. Data augmentation techniques such as random rotations, shifts, shear transformations, zooms, and horizontal flips were applied to enrich the training data and prevent overfitting. The model was trained using the Adam optimizer with a learning rate of 0.0001. Model checkpointing, learning-rate reduction on plateau, and early-stopping callbacks were used to improve training efficiency and prevent overfitting. The technique illustrated in Fig 1 involves a preprocessing step that ensures the dataset is scaled to the appropriate input sizes for the pre-trained CNN models.

Convolutional neural network

Convolutional Neural Networks (CNNs) are sophisticated neural networks that imitate the visual processing systems of living things in order to interpret and evaluate visual input. CNNs are a subset of deep learning that have shown remarkable performance in a wide range of data-driven applications because of their multilayer structure, which makes it possible to extract hierarchical characteristics from input images. Fig 2 shows an example of a convolutional neural network design. CNNs were brought to the forefront of computer vision research by Krizhevsky and colleagues when they won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with their AlexNet architecture. This achievement highlighted the ability of CNNs to learn features automatically from data and set a new standard for image classification tasks [34]. A typical CNN architecture consists of several types of layers, including convolutional layers, pooling layers, and fully connected layers:

Convolutional Layers apply a series of filters (kernels) to the input image, creating feature maps that capture various aspects of the image, such as edges, textures, and patterns. Mathematically, the convolution of an input image I with a kernel K can be represented as:

S(i, j) = \sum_{m} \sum_{n} I(i + m,\, j + n) \, K(m, n) \qquad (1)

where (m, n) are the filter’s coordinates and (i, j) are the coordinates of the input image. Through this procedure, the network can learn spatial hierarchies of features directly from raw pixel data [35]. An activation function, such as the Rectified Linear Unit (ReLU), is applied after every convolution layer. The ReLU function is defined as:

f(x) = \max(0, x) \qquad (2)

ReLU introduces nonlinearity into the model, allowing it to learn more complex patterns and interactions within the data [36].

Pooling Layers perform subsampling operations, reducing the spatial dimensions of feature maps while preserving the most critical information. A common pooling operation is max pooling, defined as:

P(i, j) = \max_{(m, n) \in \mathcal{R}} F(i + m,\, j + n) \qquad (3)

where F is the input feature map, (m, n) ranges over the local pooling region \mathcal{R}, and P(i, j) is the pooled result. This makes the model more robust to spatial variations in the input data and more computationally efficient [37]. Fully connected layers in a CNN integrate the features extracted by the convolutional layers to reach a final classification decision. These layers function like those in conventional neural networks, connecting every neuron in one layer to every neuron in the next. They are expressed as:

y = f(Wx + b) \qquad (4)

where x is the input, y is the output, b is the bias, W is the weight matrix, and f is an activation function [34].
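To make these layer types concrete, the following is a minimal Keras sketch tying Eqs. (1)–(4) together. This is an illustrative toy network, not the architecture proposed in this paper; the layer sizes and the four-class output are assumptions.

```python
# Minimal Keras sketch of the layer types described above (Eqs. 1-4).
# Illustrative only: layer sizes and the 4-class output are assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    # Convolution (Eq. 1) followed by ReLU activation (Eq. 2)
    layers.Conv2D(32, kernel_size=3, activation="relu",
                  input_shape=(224, 224, 3)),
    # Max pooling (Eq. 3) reduces spatial dimensions
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    # Fully connected layers (Eq. 4) integrate the extracted features
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(4, activation="softmax"),  # e.g., NORMAL, CNV, DME, DRUSEN
])
model.summary()
```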

Transfer learning

A pre-trained model created for one task is utilized as the basis for a new model on a different task, a process known as transfer learning in machine learning. By reusing the knowledge acquired from the first task, this method enhances performance and shortens the training period for the subsequent task [38]. Transfer learning in image classification refers to the use of models that have already been pre-trained on large datasets, such as ImageNet, which contains millions of labeled images organized into thousands of categories. These pre-trained models are helpful for a variety of image recognition tasks because they have learned to extract generic characteristics from images, such as edges, textures, and shapes [39]. Pre-trained models like Xception, EfficientNetB0, AlexNet, VGG16, InceptionV3, and ResNet are frequently used in image classification and have demonstrated strong performance in a range of image recognition tasks after being trained on extensive image datasets [40–42]. By keeping the convolutional layers intact and removing the final classification layer, the pre-trained model is utilized as a feature extractor: these layers extract relevant characteristics from the input images, and the extracted features are then fed to a new classifier trained on the target dataset. The pre-trained model is sometimes further refined on the target dataset by retraining some or all of its layers on the new data; such fine-tuning allows the model to adapt previously learned features to the specific characteristics of the new dataset. Features extracted from the pre-trained model are fed into a new classifier, usually a fully connected neural network or a support vector machine (SVM), trained to perform the final classification task.
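As an illustration of this feature-extraction pattern, the hedged sketch below uses a pre-trained Xception backbone with ImageNet weights, frozen, plus a new classification head; the head sizes and class count are assumptions, not this paper's configuration.

```python
# Hedged sketch of transfer learning as feature extraction: a frozen
# pre-trained backbone plus a new trainable classifier head.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import Xception

base = Xception(weights="imagenet", include_top=False,
                input_shape=(224, 224, 3))
base.trainable = False  # keep convolutional layers intact (feature extraction)

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),   # assumed head size
    layers.Dense(4, activation="softmax"),  # assumed class count
])
# For fine-tuning, some or all base layers can later be unfrozen
# (base.trainable = True) and retrained with a small learning rate.
```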

The Xception model, a convolutional neural network (CNN) structure and an enhanced version of Google’s Inception model, is one of these models. Xception stands for “Extreme Inception”, and its main objective is to improve deep learning model performance and accuracy [43]. Sadik et al. (2023) [44] investigated the use of transfer learning and the Xception model for the diagnosis of skin disorders. Hirahara (2019) [45] conducted a preliminary assessment of the Xception model’s application to transfer learning in the creation of a CADe system for identifying brain tumors. Mukhlif et al. (2023) [46] discussed the use of the Xception model for breast cancer image classification as well as novel transfer learning strategies.

This study used several well-known pre-trained CNN architectures as part of the SE-Enhanced Hybrid Model methodology to classify retinal diseases from OCT images. The selected architectures are EfficientNetB0 and Xception. Each of these models has demonstrated high performance on a variety of image classification challenges, making them suitable candidates for transfer learning in medical image analysis. EfficientNetB0 achieves high efficiency through model scaling strategies and delivers high accuracy with a low number of parameters, while Xception performs powerful feature extraction using depthwise separable convolutions and exhibits high classification performance. SE blocks increase the representational capacity of the network and improve overall performance by adaptively recalibrating feature responses on a channel-by-channel basis. This hybrid model has been utilized as a potent tool for detecting retinal disorders from OCT images, and its usefulness has been tested on OCT datasets. The use of SE blocks and the selected CNN topologies improves classification accuracy, allowing for more trustworthy outcomes in medical diagnosis. Table 2 summarizes the characteristics of the CNN architectures employed in this investigation.

Preprocessing

The proposed method’s preprocessing phase ensures that the raw dataset is scaled to fit the input sizes of the pre-trained CNN architectures. The training and testing images for EfficientNetB0 and Xception were resized to 224 × 224 pixels to match the input dimensions of these CNN architectures, readying them for both training and testing. The pixel values of each image were divided by 255 to rescale them to the [0, 1] intensity range; this normalization helped stabilize the training process by providing consistent input value ranges. To reduce overfitting and boost model resilience, training images were subjected to a variety of data augmentation strategies, including shift transformations, random zooms, random horizontal flips, and random rotations within a fixed range.
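A sketch of this preprocessing and augmentation pipeline using Keras ImageDataGenerator (the tool named in the training section) might look as follows; the exact augmentation ranges and the directory layout are assumptions.

```python
# Sketch of the preprocessing/augmentation pipeline described above.
# Augmentation ranges and the "oct/train" directory are assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # divide pixel values by 255 -> [0, 1]
    rotation_range=15,       # random rotations (assumed range)
    width_shift_range=0.1,   # random shifts
    height_shift_range=0.1,
    shear_range=0.1,         # shear transformations
    zoom_range=0.1,          # random zooms
    horizontal_flip=True,    # random horizontal flips
)
test_datagen = ImageDataGenerator(rescale=1.0 / 255)  # no augmentation at test time

train_gen = train_datagen.flow_from_directory(
    "oct/train",             # hypothetical directory layout
    target_size=(224, 224),  # resize to the CNN input size
    batch_size=32,
    class_mode="categorical",
)
```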

Details of the proposed SE-enhanced hybrid architecture and classification model

In this study, the SE-Enhanced Hybrid Model methodology is proposed to classify retinal diseases from OCT images. The core of this model is a custom CNN head that integrates SE blocks to improve feature extraction. SE blocks are a special block structure that increases the feature extraction power of the network by recalibrating the importance of each channel, generating adaptive weights per channel to enhance the representational capacity of the features. Strong classification performance was achieved by using the EfficientNetB0 and Xception architectures to extract features from the OCT images: EfficientNetB0 provides high accuracy and efficiency with a low number of parameters through model scaling strategies, while Xception performs powerful feature extraction using depthwise separable convolutions. SE (Squeeze-and-Excitation) blocks are integrated after each convolutional layer to adaptively recalibrate feature responses on a per-channel basis [47]. This is accomplished through a two-step process: squeeze and excitation. Mathematically, the SE block is defined as:

s = \sigma\left( W_2 \, \delta(W_1 z) \right) \qquad (5)

where z is the channel descriptor obtained by global average pooling (the squeeze step), W_1 and W_2 are the weights of the two fully connected layers, δ is the ReLU activation, and σ is the sigmoid activation [47]. After the convolution and SE blocks, the extracted features are flattened and passed through a dense layer with 128 neurons, followed by a dropout layer to avoid overfitting. In the final stage, retinal diseases are classified using a Softmax classifier, an activation function for multi-class classification that converts the outputs into per-class probability values so the class with the highest probability is selected [48]. The performance of the model is evaluated with metrics such as accuracy, precision, recall, and F1 score.
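A minimal Keras implementation of the SE block in Eq. (5) could look as follows; the reduction ratio r = 16 follows the original SE paper [47] and is an assumption here, since this study does not state the value it used.

```python
# Minimal Keras sketch of the SE block in Eq. (5).
# The reduction ratio (16) is an assumption taken from the SE paper [47].
from tensorflow.keras import layers

def se_block(x, ratio=16):
    channels = x.shape[-1]
    # Squeeze: global average pooling produces the channel descriptor z
    z = layers.GlobalAveragePooling2D()(x)
    # Excitation: s = sigmoid(W2 * ReLU(W1 * z))  (Eq. 5)
    s = layers.Dense(channels // ratio, activation="relu")(z)   # W1, delta
    s = layers.Dense(channels, activation="sigmoid")(s)         # W2, sigma
    # Recalibration: scale each channel of x by its learned weight
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])
```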

The SE-Hybrid architecture is trained using the Adam optimizer with a learning rate of 0.0001, for 30 epochs with a batch size of 32. During training, data augmentation techniques such as rotation, width and height shifts, shear, zoom, and horizontal flips are applied to increase the robustness of the model and prevent overfitting. The SE-Enhanced Hybrid Model provides a robust framework for classifying retinal diseases from OCT images: by combining the representational power of SE blocks with the feature extraction efficiency of the CNN backbones, significant improvements in classification performance were achieved. The model has been tested on both OCT datasets, demonstrating its effectiveness in detecting retinal diseases; a sketch of how the hybrid architecture can be assembled is shown below.
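In this hedged sketch, SE-recalibrated EfficientNetB0 and Xception features are globally pooled, concatenated, and passed through the 128-unit dense layer with dropout before the Softmax output. The dropout rate and the four-class head are assumptions (the Duke dataset has three classes); se_block is defined in the previous sketch.

```python
# Hedged sketch of the hybrid assembly described above; not the authors'
# exact code. Dropout rate and 4-class head are assumptions.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB0, Xception

eff_base = EfficientNetB0(weights="imagenet", include_top=False,
                          input_shape=(224, 224, 3))
xcp_base = Xception(weights="imagenet", include_top=False,
                    input_shape=(224, 224, 3))

inp = layers.Input(shape=(224, 224, 3))
# NB: each backbone has its own preferred input preprocessing
# (see tf.keras.applications); omitted here for brevity.
eff = layers.GlobalAveragePooling2D()(se_block(eff_base(inp)))
xcp = layers.GlobalAveragePooling2D()(se_block(xcp_base(inp)))

merged = layers.Concatenate()([eff, xcp])       # combined feature vector
h = layers.Dense(128, activation="relu")(merged)
h = layers.Dropout(0.5)(h)                      # assumed rate
out = layers.Dense(4, activation="softmax")(h)  # per-class probabilities

model = models.Model(inp, out)
```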

Experiments and results

This paper proposes an SE-Enhanced Hybrid architecture for detecting retinal disorders from retinal OCT images. The model was built from the pre-trained CNN architectures EfficientNetB0 and Xception, which stand out in medical image analysis for their high accuracy and efficiency and were used to extract features from the images. EfficientNetB0 offers scalability and high accuracy, while Xception provides powerful feature extraction using depthwise separable convolutions. SE (Squeeze-and-Excitation) blocks are integrated after each convolutional layer to adaptively recalibrate feature responses on a per-channel basis. The comprehensive evaluation below highlights the effectiveness of the SE-Enhanced Hybrid Model in accurately detecting retinal diseases from images.

Experimental preparation

All experiments on the SE-Enhanced Hybrid architecture suggested in this paper, as well as the pre-trained CNN architectures, were carried out in Python with the TensorFlow and Keras libraries. The experiments were performed on a computer with the following specifications: Operating system: Windows 10; Processor: Intel(R) Core i7, 2.6 GHz; Graphics card: Nvidia GTX 1650 Ti; Software environment: Python 3.8, TensorFlow 2.x, Keras 2.x.

Efficiency and resource utilization

Table 3 presents a comparison of model training and inference efficiencies between the UCSD and Duke datasets, highlighting key metrics such as training time, inference time, computational FLOPs, and resource utilization (GPU and CPU memory).

Table 3. Comparison of performance and resource usage of the UCSD and Duke models.

https://doi.org/10.1371/journal.pone.0318657.t003

Hyperparameters and training

The hybrid model was created by adding SE blocks to the outputs of the EfficientNetB0 and Xception architectures. SE blocks increased the representational capacity of the network by adaptively recalibrating feature responses on a channel-by-channel basis. The outputs of the EfficientNetB0 and Xception models were combined after global average pooling layers, and final classification was performed with fully connected (FC) layers; the last layer of the model uses the Softmax activation function. For training, the data was prepared using ImageDataGenerator and divided into training, validation, and test sets. The model was trained for 30 epochs. The decision to limit training to 30 epochs was informed by preliminary experiments in which we observed the model’s performance trends, including training and validation accuracy and loss metrics. In these experiments, the model achieved a high accuracy level early in training, and additional epochs did not yield significant performance gains, instead showing signs of potential overfitting. To prevent overfitting, we also employed early stopping, which further reinforced the decision to limit training to this duration, as the model often converged or reached optimal performance within the chosen range. Given the high parameter count and volume of data, we aimed to balance computational efficiency with model generalization, and 30 epochs were determined to be sufficient to reach a robust performance level. Additionally, to strengthen the evaluation of generalization capability, we included the Matthews Correlation Coefficient (MCC) metric in our analysis, alongside accuracy, precision, recall, and specificity, to provide a more comprehensive view of the model’s performance.

The optimization technique used in this study was Adaptive Moment Estimation (Adam) [49], selected to optimize model performance due to its effectiveness in handling sparse gradients. Adam was configured with a learning rate of 0.0001, based on preliminary testing to balance convergence speed and accuracy. Table 4 gives more information on these hyperparameters. The training approach also included tactics such as ReduceLROnPlateau, early stopping, and model checkpointing. The effectiveness of the model’s classification was evaluated using cross-entropy loss, which is used together with Softmax in the last layer of the model and has the following mathematical expression:

L = -\sum_{i=1}^{z} P_i \log(q_i) \qquad (6)

In this case, q stands for the Softmax output, z for the number of classes, and P for the one-hot encoded target classes.

Table 4. Specifics of the architecture-related hyperparameters.

https://doi.org/10.1371/journal.pone.0318657.t004
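Putting this training configuration together, a hedged Keras sketch of the compile and fit steps follows. The learning rate, loss, epoch count, and callback types follow the text; monitored quantities, patience values, and the checkpoint filename are assumptions, and train_gen/val_gen are the data generators from the preprocessing sketch (batch size 32 is set on the generator).

```python
# Hedged sketch of the training setup described in this section.
# Patience values, monitored metrics, and filename are assumptions.
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # lr from the text
    loss="categorical_crossentropy",                         # Eq. (6)
    metrics=["accuracy"],
)

callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1,
                                         patience=3),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint("best_model.h5", save_best_only=True),
]

history = model.fit(train_gen, validation_data=val_gen,
                    epochs=30, callbacks=callbacks)
```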

Performance metrics

The efficiency of the models used in this study was evaluated using the commonly used confusion matrix. Performance measures including precision (Equation 7), sensitivity (Equation 8), specificity (Equation 9), accuracy (Equation 10), and F1 score (Equation 11) were computed from this matrix. To further evaluate the classification models, the Matthews Correlation Coefficient (MCC) is given in Equation 12. MCC is a balanced measure, originally introduced for binary classification, that evaluates the overall performance of the classifier and provides reliable results even when there is an imbalance between positive and negative classes [50].

\text{Precision} = \frac{TP}{TP + FP} \qquad (7)

\text{Sensitivity (Recall)} = \frac{TP}{TP + FN} \qquad (8)

\text{Specificity} = \frac{TN}{TN + FP} \qquad (9)

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (10)

F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (11)

MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (12)

True positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) are the quantities used in these equations, from which the system’s F1 score, accuracy, sensitivity, specificity, and precision were calculated. Specificity measures the true negative rate, whereas accuracy shows how closely predictions match the true labels. Another important criterion in performance evaluation is the ROC (Receiver Operating Characteristic) curve, which jointly reports sensitivity and specificity across decision thresholds.

In this study, the performance of the SE-Enhanced Hybrid architecture and the other pre-trained models was evaluated using the aforementioned metrics. A confusion matrix was computed for each model, and accuracy, sensitivity, specificity, precision, and F1 score were calculated from the TP, TN, FP, and FN values. Additionally, ROC curves were plotted for each class.
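For reference, the sketch below computes these metrics from model predictions with scikit-learn; y_true (integer labels) and y_prob (Softmax outputs) on the test set are assumed NumPy arrays.

```python
# Sketch of computing the evaluation metrics above with scikit-learn.
# y_true (integer labels) and y_prob (Softmax outputs) are assumed inputs.
import numpy as np
from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score,
                             recall_score, f1_score, matthews_corrcoef,
                             roc_curve, auc)

y_pred = np.argmax(y_prob, axis=1)  # predicted class per sample

cm = confusion_matrix(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)                     # Eq. 10
prec = precision_score(y_true, y_pred, average="macro")  # Eq. 7
rec = recall_score(y_true, y_pred, average="macro")      # Eq. 8 (sensitivity)
f1 = f1_score(y_true, y_pred, average="macro")           # Eq. 11
mcc = matthews_corrcoef(y_true, y_pred)                  # Eq. 12

# Per-class specificity (Eq. 9): TN / (TN + FP), derived from the matrix
specificity = []
for k in range(cm.shape[0]):
    tn = cm.sum() - cm[k, :].sum() - cm[:, k].sum() + cm[k, k]
    fp = cm[:, k].sum() - cm[k, k]
    specificity.append(tn / (tn + fp))

# One-vs-rest ROC curve for class 0, as plotted per class in this study
fpr, tpr, _ = roc_curve((np.asarray(y_true) == 0).astype(int), y_prob[:, 0])
roc_auc = auc(fpr, tpr)
```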

EfficientNetB0, Xception versus SE-enhanced hybrid on OCT dataset

In the “Details of the proposed SE-Enhanced hybrid architecture and classification model” section, a hybrid architecture for detecting retinal diseases was proposed and detailed. Tables 5 and 6 show performance measurements of the SE-Enhanced Hybrid against EfficientNetB0 and Xception for the Duke and UCSD datasets, respectively (best results are highlighted in bold). The SE-Enhanced Hybrid outperformed EfficientNetB0 and Xception on all performance metrics. This section also summarizes the performance comparisons between the SE-Enhanced Hybrid, EfficientNetB0, and Xception CNN architectures using ROC curves, confusion matrices, and training and loss graphs: for the first dataset they are displayed in Figs 3 and 4, and for the second dataset in Figs 5 and 6. As all figures show, the best-performing CNN architecture is the SE-Enhanced Hybrid.

Table 5. Performance metrics of SE-EfficientNetB0, SE-Xception and SE-Hybrid on first dataset (Duke).

https://doi.org/10.1371/journal.pone.0318657.t005

Table 6. Performance metrics of SE-EfficientNetB0, SE-Xception and SE-Hybrid on the second dataset (UCSD).

https://doi.org/10.1371/journal.pone.0318657.t006

Fig 3. Training and Validation Performance of SE-EfficientNetB0, SE-Xception and SE-Hybrid Models on the Duke Dataset: Comparison of Accuracy and Loss Metrics Across Epochs.

https://doi.org/10.1371/journal.pone.0318657.g003

Fig 4. Comparative Analysis of Confusion Matrices and ROC Curves for SE-EfficientNetB0, SE-Xception and SE-Hybrid Models on the Duke Dataset.

https://doi.org/10.1371/journal.pone.0318657.g004

Fig 5. Training and Validation Performance of SE-EfficientNetB0, SE-Xception and SE-Hybrid Models on the UCSD Dataset: Comparison of Accuracy and Loss Metrics Across Epochs.

https://doi.org/10.1371/journal.pone.0318657.g005

Fig 6. Comparative Analysis of Confusion Matrices and ROC Curves for SE-EfficientNetB0, SE-Xception and SE-Hybrid Models on the UCSD Dataset.

https://doi.org/10.1371/journal.pone.0318657.g006

Comparison with state-of-the-art deep learning based techniques

In this section, we compare the proposed SE-Enhanced Hybrid Model with state-of-the-art deep learning techniques used for classification. Table 7 provides a summary of these techniques and their performance measures. The proposed SE-Enhanced Hybrid Model significantly outperforms the existing state-of-the-art methods on different datasets, achieving 99.58% accuracy on the UCSD dataset and 99.18% accuracy on the Duke dataset. These results exceed the highest accuracy rates reported by Kim and Tran (2020) [13] and Naz et al. (2016) [27]. Kermany et al. (2018) [11] achieved an accuracy of 96.6% using transfer learning with InceptionNet V3; the SE-Enhanced Hybrid Model improves on this by approximately 3%, highlighting the effectiveness of including SE blocks and combining the EfficientNetB0 and Xception architectures.

Techniques that rely on image preprocessing and hand-crafted feature extraction, such as those of Naz et al. (2016) [27] and Khalid et al. (2016) [28], report varying accuracies (98.00% and 92.00%, respectively); the proposed method not only simplifies the pipeline by leveraging advanced deep learning architectures, but also significantly increases accuracy. Rastogi et al. (2019) [14] used densely connected convolutional neural networks; the superior accuracy of the proposed SE-Enhanced Hybrid Model demonstrates its improved ability to learn and generalize from images. The integration of Squeeze-and-Excitation (SE) blocks increases the representational capacity of the model by improving feature recalibration. This approach, combined with the hybrid EfficientNetB0 and Xception architecture, contributes to the outstanding performance of the model. In summary, the proposed SE-Enhanced Hybrid Model shows significant improvements in image classification accuracy compared to existing state-of-the-art methods: by effectively combining the EfficientNetB0 and Xception architectures with SE blocks, the model leverages advanced feature extraction and recalibration techniques, providing superior performance on both the UCSD and Duke datasets.

Table 7. Summary of techniques compared to state-of-the-art technologies for OCT classification using different datasets.

https://doi.org/10.1371/journal.pone.0318657.t007

Limitations

Although our study obtained promising results, it has some limitations. First, the experiments were limited to the UCSD and Duke datasets; the model’s performance on other datasets must be tested to evaluate its generalization ability. Second, the computational resources and time required for training and testing the model can be quite high due to the large datasets and complex models. Third, the performance of the model is limited to certain retinal diseases, and its validity for other retinal diseases needs to be investigated. Finally, the applicability of the model in clinical settings and its performance under real-world conditions need to be evaluated. These limitations are important considerations for future studies.

Conclusion

In recent years, with the advancement of technology, artificial intelligence has achieved great success in the field of medicine and has begun to be widely used. Medical imaging devices also evolve with these technological advances. In clinical settings, specialist physicians interpret the images; however, the duration and accuracy of this process largely depend on the experience of the specialist. Accordingly, this study proposes an SE-Enhanced Hybrid Model to detect retinal disorders in OCT images. The suggested model combines SE blocks with the EfficientNetB0 and Xception architectures: SE blocks improve the network’s representational ability by adaptively recalibrating per-channel feature responses, and the combined features from EfficientNetB0 and Xception are processed via fully connected layers and categorized with a Softmax classifier. The methodology was tested on the UCSD and Duke datasets and produced excellent results. The proposed SE-Enhanced Hybrid Model outperformed the best available approaches, achieving accuracy rates of 99.58% on the UCSD dataset and 99.18% on the Duke dataset. As shown in Tables 5 and 6, which present performance comparisons for the Duke and UCSD datasets respectively, the proposed method outperformed the alternative approaches. Future work may focus on expanding the model’s validation on diverse OCT datasets and investigating its applicability in real-time clinical settings. Additionally, integrating other feature extraction techniques and further optimizing the SE blocks could enhance model performance, making it a robust tool for broader applications in ophthalmology.

References

  1. Shojaei S, Sabbaghi H, Mehrabi Y, Daftarian N, Etemad K, Ahmadieh H. Vision-related quality of life in patients with inherited retinal dystrophies. J Curr Ophthalmol. 2022;34(1):80–6. pmid:35620379
  2. Saade C, Smith RT. Reticular macular lesions: a review of the phenotypic hallmarks and their clinical significance. Clin Exp Ophthalmol. 2014;42(9):865–74. pmid:24803342
  3. Wang W, Lo ACY. Diabetic retinopathy: pathophysiology and treatments. Int J Mol Sci. 2018;19(6):1816. pmid:29925789
  4. Lee R, Wong TY, Sabanayagam C. Epidemiology of diabetic retinopathy, diabetic macular edema and related vision loss. Eye Vis (Lond). 2015;2:17. pmid:26605370
  5. Stanojević M, Drašković D, Nikolić B. Retinal disease classification based on optical coherence tomography images using convolutional neural networks. J Electron Imaging. 2023;32(3):032004.
  6. Ayshwarya B, Dhanamalar M, Kumar V, editors. MRI image analysis for brain tumor detection using convolutional neural network. 2023 International Conference on Artificial Intelligence and Knowledge Discovery in Concurrent Engineering (ICECONF); 2023: IEEE.
  7. Nguyen T-TP, Ni S, Liang G, Khan S, Wei X, Skalet A, et al. Widefield optical coherence tomography in pediatric retina: a case series of intraoperative applications using a prototype handheld device. Front Med (Lausanne). 2022;9:860371. pmid:35860728
  8. Elsharif A, Abu-Naser S. Retina diseases diagnosis using deep learning. 2022.
  9. Gencer K, Gencer G, Cizmeci İH. Deep learning approaches for retinal image classification: a comparative study of GoogLeNet and ResNet architectures. Int Sci Vocat Stud J. 2024;8(2):123–8.
  10. Gencer K, Gencer G. Hybrid deep learning approach for brain tumor classification using EfficientNetB0 and novel quantum genetic algorithm. PeerJ Comput Sci. 2025;11:e2556.
  11. Kermany DS, Goldbaum M, Cai W, Valentim CCS, Liang H, Baxter SL, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell. 2018;172(5):1122–31.e9. pmid:29474911
  12. Silva LF de J, Cortes OAC, Diniz JOB. A novel ensemble CNN model for COVID-19 classification in computerized tomography scans. Res Control Optim. 2023;11:100215.
  13. Kim J, Tran L, editors. Ensemble learning based on convolutional neural networks for the classification of retinal diseases from optical coherence tomography images. 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS); 2020: IEEE.
  14. Rastogi D, Padhy RP, Sa PK, editors. Detection of retinal disorders in optical coherence tomography using deep learning. 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT); 2019: IEEE.
  15. Tsuji T, Hirose Y, Fujimori K, Hirose T, Oyama A, Saikawa Y, et al. Classification of optical coherence tomography images using a capsule network. BMC Ophthalmol. 2020;20(1):114. pmid:32192460
  16. Han J, Choi S, Park JI, Hwang JS, Han JM, Ko J, et al. Detecting macular disease based on optical coherence tomography using a deep convolutional network. J Clin Med. 2023;12(3):1005. pmid:36769653
  17. Shatil S, Kabir M. Retinal OCT image classification based on CNN and transfer learning. International Conference on Soft Computing and Pattern Recognition; 2022.
  18. Huang L, He X, Fang L, Rabbani H, Chen X. Automatic classification of retinal optical coherence tomography images with layer guided convolutional neural network. IEEE Signal Process Lett. 2019;26(7):1026–30.
  19. Mehdizadeh M, MacNish C, Xiao D, Alonso-Caneiro D, Kugelman J, Bennamoun M. Deep feature loss to denoise OCT images using deep neural networks. J Biomed Opt. 2021;26(4):046003. pmid:33893726
  20. Hu L, Guo R, Li S, Cao J, Liu Q. Accuracy improvement for classifying retinal OCT images by diseases using deep learning-based selective denoising approach. J Innov Opt Health Sci. 2023;16(06).
  21. Wang M, Zhu W, Yu K, Chen Z, Shi F, Zhou Y, et al. Semi-supervised capsule cGAN for speckle noise reduction in retinal OCT images. IEEE Trans Med Imaging. 2021;40(4):1168–83. pmid:33395391
  22. Hasan M, Alom M, Fatema U, Wahid M. Deep learning based retinal OCT image denoising using generative adversarial network. 2021 International Conference on Automation, Control and Mechatronics for Industry 4.0 (ACMI); 2021.
  23. Bogacki P, Dziech A. Effective deep learning approach to denoise optical coherence tomography images using BM3D-based preprocessing of the training data including both healthy and pathological cases. IEEE Access. 2023;11:65395–406.
  24. Tasnim N, Hasan M, Islam I. Comparisonal study of deep learning approaches on retinal OCT image. arXiv preprint. 2019.
  25. Zhu W, Qiu P, Chen X, Li X, Lepore N, Dumitrascu OM, et al., editors. nnMobileNet: rethinking CNN for retinopathy research. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2024.
  26. Mittal P, Bhatnagar C. Detection of DME by classification and segmentation using OCT images. WEB. 2022;19(1):601–12.
  27. Naz S, Ahmed A, Akram MU, Khan SA, editors. Automated segmentation of RPE layer for the detection of age macular degeneration using OCT images. 2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA); 2016: IEEE.
  28. Khalid S, Akram M, Jameel A, Khalil T. Automated detection of drusens to diagnose age related macular degeneration using OCT images. Int J Comp Sci Inform Sec. 2016;14(10).
  29. Salaheldin AM, Abdel Wahed M, Talaat M, Saleh N. Deep learning-based automated detection and grading of papilledema from OCT images: a promising approach for improved clinical diagnosis and management. Int J Imaging Syst Technol. 2024;34(4):e23133.
  30. Salaheldin AM, Abdel Wahed M, Saleh N. A hybrid model for the detection of retinal disorders using artificial intelligence techniques. Biomed Phys Eng Express. 2024;10(5). pmid:38955139
  31. Kayadibi I, Güraksın GE. An early retinal disease diagnosis system using OCT images via CNN-based stacking ensemble learning. Int J Mult Comp Eng. 2023;21(1):1–25.
  32. Li F, Chen H, Liu Z, Zhang X, Wu Z. Fully automated detection of retinal disorders by image-based deep learning. Graefes Arch Clin Exp Ophthalmol. 2019;257(3):495–505. pmid:30610422
  33. Srinivasan PP, Kim LA, Mettu PS, Cousins SW, Comer GM, Izatt JA, et al. Fully automated detection of diabetic macular edema and dry age-related macular degeneration from optical coherence tomography images. Biomed Opt Express. 2014;5(10):3568–77. pmid:25360373
  34. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 2012;25.
  35. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44. pmid:26017442
  36. Goodfellow I, Bengio Y, Courville A. Regularization for deep learning. Deep Learning. 2016;216–61.
  37. Boureau Y-L, Ponce J, LeCun Y, editors. A theoretical analysis of feature pooling in visual recognition. Proceedings of the 27th International Conference on Machine Learning (ICML-10); 2010.
  38. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2010;22(10):1345–59.
  39. Yosinski J, Clune J, Bengio Y, Lipson H. How transferable are features in deep neural networks? Adv Neural Inf Process Syst. 2014;27.
  40. He K, Zhang X, Ren S, Sun J, editors. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016.
  41. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint. 2014.
  42. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z, editors. Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016.
  43. Gulli A, Kapoor A. TensorFlow 1.x deep learning cookbook: over 90 unique recipes to solve artificial-intelligence driven problems with Python. 2017.
  44. Sadik R, Majumder A, Biswas AA, Ahammad B, Rahman MdM. An in-depth analysis of convolutional neural network architectures with transfer learning for skin disease diagnosis. Healthc Anal. 2023;3:100143.
  45. Hirahara D, editor. Preliminary assessment for the development of CADe system for brain tumor in MRI images utilizing transfer learning in Xception model. 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE); 2019: IEEE.
  46. Mukhlif A, Al-Khateeb B, Mohammed M. Classification of breast cancer images using new transfer learning techniques. Iraqi J Comput Sci Math. 2023;4(1):167–80.
  47. Hu J, Shen L, Sun G, editors. Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018.
  48. Goodfellow I. Deep learning. MIT Press. 2016.
  49. Kingma DP. Adam: a method for stochastic optimization. arXiv preprint. 2014.
  50. Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta. 1975;405(2):442–51. pmid:1180967