Abstract
Diabetic retinopathy (DR) is a leading cause of blindness globally and a diagnostically challenging disease, owing to its intricate development process and the complexity of the human eye, which consists of nearly forty connected components such as the retina, iris, and optic nerve. This study proposes a novel approach to DR identification that combines synthetic data generation, a K-Means Clustering-Based Binary Grey Wolf Optimizer (KCBGWO), and Fully Convolutional Encoder-Decoder Networks (FCEDN). Generative Adversarial Networks (GANs) generate high-quality synthetic data, transfer learning supports accurate feature extraction and classification, and these are integrated with Extreme Learning Machines (ELM). A substantial evaluation on the IDRiD dataset yields exceptional outcomes: the proposed model achieves 99.87% accuracy, 99.33% sensitivity, and 99.78% specificity. These results are promising both for the further development of the proposed approach to DR diagnosis and for establishing a new reference point in medical image analysis, supporting more effective and timely treatment.
Citation: Kamal SA, Du Y, Khalid M, Farrash M, Dhelim S (2024) DRSegNet: A cutting-edge approach to Diabetic Retinopathy segmentation and classification using parameter-aware Nature-Inspired optimization. PLoS ONE 19(12): e0312016. https://doi.org/10.1371/journal.pone.0312016
Editor: Jinran Wu, Australian Catholic University, AUSTRALIA
Received: August 31, 2024; Accepted: September 30, 2024; Published: December 5, 2024
Copyright: © 2024 Kamal et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data supporting this study's findings are publicly available and can be accessed via this link: https://ieee-dataport.org/open-access/indian-diabetic-retinopathy-image-dataset-idrid
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The human eye is a complex organ consisting of almost forty interconnected elements, including the retina, iris, and optic nerve, and it can be affected by several diseases such as diabetic retinopathy (DR), glaucoma, and cataracts. Among the long-term complications of diabetes, DR ranks high and is the second most common cause of blindness in the world, affecting the middle-aged and elderly. It is caused by damage to the tiny vessels that supply blood to the retina, which can develop slowly over time and cause blindness if not treated. A sedentary lifestyle, with low physical activity and increased consumption of foods rich in fats and sugars, has also contributed to the further spread of the disease. The WHO has estimated that about one billion people in the world are affected by eye-related disorders. Diabetes affected roughly 537 million individuals globally in 2021, with forecasts that the number will increase to 783 million by 2045 [1–3]. Such conditions must be diagnosed at their early stages and treated, because a worsening retina may lead to further complications [4]. DR is graded into categories ranging from no observable retinopathy to Proliferative Diabetic Retinopathy, where extensive retinal damage is observed. In the first stage, termed Non-Proliferative Diabetic Retinopathy (NPDR), retinal damage can be observed, but there are no new vessels.
In contrast, in Proliferative Diabetic Retinopathy (PDR), the retinal damage is severe and new vessels have formed [5]. Fig 1 presents the typical signs of DR in a retinal fundus image: hard exudates (HE), soft exudates (SE), microaneurysms (MA), and haemorrhages (H). This detailed visualization helps identify the extent of DR and, hence, underscores the need for constant monitoring to combat this disabling condition [6].
The following is a brief outline of the development of DR: Fig 2 shows the stages in which blood vessels narrow and later become blocked, which may result in complications such as microaneurysms, haemorrhages, and the formation of new blood vessels. The illustration covers both the NPDR and PDR phases and thus depicts the essential events that transpire at every stage of the disease.
The various factors and stages of DR make it crucial to monitor and accurately assess ophthalmic diseases regularly. Traditional diagnostic techniques face challenges in complicated cases, which underlines the significance of artificial intelligence in medicine. Sophisticated algorithms such as artificial neural networks and support vector machines [7] are employed frequently because of their ability to analyze complicated patterns in medical images, such as the irregular growth of blood vessels in DR [8–11]. The efficacy of these machine learning models depends on the quality and amount of medical data fed into them, underscoring the importance of acquiring good-quality data to improve diagnostic accuracy and reduce disease complications through early detection.
In addition, there have been advancements in the use of deep learning algorithms in medical diagnostics with the help of image segmentation. These techniques predict pixel-wise classes of the input images, which helps accurately segment the regions of interest needed to properly identify particular lesions in retinal images and other abnormal findings in medical imaging [12–14]. The development of Convolutional Neural Networks (CNNs) has brought great improvements in image processing, improving automated feature detection and prediction. However, CNNs are not ideal for segmentation tasks that require pixel-level classification, because their fully connected layers prioritise class probability over the object's structure [15]. Fully Convolutional Networks (FCNs) were introduced to overcome these challenges, adapting traditional CNN structures by substituting the fully connected layers with convolution and deconvolution layers, thus improving pixel-level segmentation speed [16, 17]. On this basis, the FCEDN introduces both encoder and decoder components, greatly enhancing segmentation performance, since feature maps at corresponding scales can be extracted and then reconstructed to match the original image size in detail and at scale. Models such as FCEDN depend heavily on the quality and amount of training data, which is often manually labelled by physicians. To reduce data restrictions and improve the model's robustness on unseen data, sophisticated data augmentation methods such as GANs are applied [18]. These models produce photorealistic images that augment the training data and lead to better and more efficient diagnostic results in the healthcare sector [19–21].
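The encoder-decoder symmetry described above — features compressed by pooling, then up-sampled back to full resolution so that every input pixel receives a prediction — can be illustrated with a minimal numpy sketch. This is a toy shape demonstration, not the paper's trained FCEDN; a real network would interleave learned convolutions with these resolution changes.

```python
import numpy as np

def max_pool_2x2(x):
    """Encoder step: 2x2 max pooling halves the spatial resolution."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x2(x):
    """Decoder step: nearest-neighbour up-sampling doubles the resolution."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def encoder_decoder_pass(image, depth=3):
    """Toy FCEDN pass: `depth` pooling stages, then symmetric up-sampling,
    so the output carries one value per input pixel (pixel-wise prediction)."""
    feats = image
    for _ in range(depth):
        feats = max_pool_2x2(feats)
    for _ in range(depth):
        feats = upsample_2x2(feats)
    return feats

img = np.random.rand(64, 64)
out = encoder_decoder_pass(img)
print(out.shape)  # (64, 64): same spatial size as the input
```

In trained FCEDNs the fixed nearest-neighbour step is replaced by learned transposed convolutions, which is what makes the up-sampling path trainable.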
This study presents a new method that complements the strengths of deep learning techniques seen in prior research while rectifying their limitations in computational time and clinical relevance. To illustrate the advantages of our model, we compare it with the listed technologies and emphasize both the increase in computation speed and the decrease in the number of computations, while highlighting the possibility of achieving better precision in clinical diagnosis, especially in fundus image inspection.
The science of early diagnosis has still not fully met one of its biggest tasks: detecting the early signs of DR. Reliance on manual reading of fundus images means that the process requires expert ophthalmologists and special equipment, both of which are time-intensive and expensive. While this approach draws on deep expertise, it remains vulnerable to the variation of human judgment. Such variability can produce substantial discrepancies: small ones may be insignificant, but large ones may lead to failure in recognizing early disease signs or to mistaking new findings for benign entities, posing potential health risks and, at the very least, unnecessary patient stress. These problems demonstrate the urgent need to develop more uniform and more practical methods for interpreting retinal images. To address this need, this work employs complex computational tools to enhance the efficacy and reliability of fundus image interpretation. We focus on cutting-edge CAD systems with deep learning and artificial intelligence applications. Modern AI-based CAD systems, as opposed to earlier ones, are more advanced and accurate in detecting minimal irregularities in retinal images, and they represent a substantial advance over the methods used in radiological workflows for many years [22–25]. Deep learning, especially the convolutional neural network (CNN), has shown marked improvement in image processing in the recent past, allowing tremendous gains in automatic feature detection and classification [26].
Although CNNs are widely appreciated for their image classification performance, they have drawbacks when applied directly to segmentation problems. This shortcoming is associated with the fully connected layers in standard CNNs, which discard the inherent spatial structure that is essential for correct pixel-level semantic segmentation, leading to suboptimal results [27]. To address this problem, scholars developed FCNs, which shifted to pixel-level segmentation by replacing the CNNs' fully connected layers with convolution and deconvolution layers, enhancing the segmentation process [8, 15, 16, 28]. Because FCNs contain no dense layers, they enable faster training sessions with fewer parameters. A prototypical FCN includes convolution layers, pooling layers, rectified linear unit layers, and an un-pooling layer only at the end. However, as shown in the following sections, the FCN architecture is constrained by a non-trainable up-sampling layer, which limits its performance. Training these models with a diverse data set is essential, and GANs are valuable in this regard. Creating realistic medical images with GANs allows the data sets used to train deep learning models to be extended without exposing individual patients. This extends the model's usefulness in diagnosing different DR conditions across patient groups.
Similarly, while segmentation has improved, transfer learning (TL) has been added to the deep structured learning toolbox. TL harnesses existing CNN models, pre-trained on abundant data, to extract features from new, limited datasets without learning the models from scratch [13, 29, 30]. The approach is useful because features learned by the first layers of a CNN on large-scale labelled data sets can be reused on other datasets. After the feature extraction step, models such as Extreme Learning Machines (ELM) are applied. ELMs, first developed for single-layer feed-forward networks, were later adapted to deeper structures such as CNNs. They learn far faster than conventional deep learning architectures while guaranteeing high classification rates. In a general workflow for segmenting images and identifying areas of interest, one starts with FCNs for precise pixel-level delineation. Once the FCEDN parameters have been tuned and the training datasets enhanced with GAN-generated images, transfer learning methods can extract pertinent features from the segmented regions. Finally, ELM-based classifiers rapidly and effectively classify these features, providing a one-stop solution to complex image processing problems. This research can potentially revolutionise the diagnosis of DR in its early stages. The approach establishes new standards for objective and efficient medical image analysis in ophthalmology and, subsequently, contributes towards enhanced treatments and improved patient care.
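The closing stage of that workflow — an ELM classifying transfer-learned features — reduces to a random hidden projection followed by a closed-form least-squares solve, which is why ELM training is so fast. The following numpy sketch uses synthetic stand-in features; the hidden-layer size and toy data are illustrative assumptions, not values from this study.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(features, labels, n_hidden=64):
    """Train an ELM: random input weights, analytic output weights."""
    n_in = features.shape[1]
    W = rng.standard_normal((n_in, n_hidden))   # random, never updated
    b = rng.standard_normal(n_hidden)
    H = np.tanh(features @ W + b)               # hidden-layer activations
    # One-hot targets; output weights via the Moore-Penrose pseudo-inverse.
    T = np.eye(labels.max() + 1)[labels]
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(features, W, b, beta):
    return np.argmax(np.tanh(features @ W + b) @ beta, axis=1)

# Stand-in for features extracted by a pre-trained CNN: two separable blobs.
X = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(4, 1, (50, 10))])
y = np.array([0] * 50 + [1] * 50)
W, b, beta = elm_train(X, y)
acc = (elm_predict(X, W, b, beta) == y).mean()
print(acc)
```

Because only `beta` is solved for, there is no iterative back-propagation at all, which matches the fast-learning property claimed for ELMs above.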
Research objectives
The need to improve the efficacy and performance of fundus image (FI) evaluation has formed the basis of research into better methods. Despite being highly credible due to the involvement of experts, conventional manual evaluations of fundus images are tiresome and suffer from inter- and intra-rater reliability issues. To fill these gaps, this study uses synthetically generated data together with segmentation models and state-of-the-art classification techniques to enhance the accuracy of FI analysis. In particular, a Generative Adversarial Network (GAN) is used for synthetic image generation, and a Fully Convolutional Encoder-Decoder Network (FCEDN) is chosen for segmentation. Transfer learning then supplies pre-trained models for feature extraction, and Extreme Learning Machines (ELM), known for rapid and accurate classification, back this progress. The integration of transfer learning with ELM in machine-driven FI analysis is a valuable new contribution. The rapidly evolving medical imaging domain demands improved FI interpretation, and complex computational approaches promise higher accuracy and faster DR detection. To this end, our research pursues concrete objectives that guide the development of an innovative, novel deep-learning framework optimised for perceptive fundus image evaluation. The main objectives of this study are as follows:
- To review the existing practices of traditional FIs interpretation methods as comprehensively as possible, paying most attention to the strengths and possible weaknesses of the existing methods.
- To train deep fusion algorithms on the DR dataset to distinguish between healthy and diseased retinas, given that the ROIs of adjacent DR grades are similar while the normal and advanced-disease classes differ significantly.
- To achieve the best feature extraction, especially when limited data are available, by incorporating novel methodologies such as synthetic data generation and segmentation models into the FI analysis process.
- To apply a novel classification modality after the feature extraction phase, achieving fast and accurate categorization without relying on trial and error.
- To compare the results obtained using the combined approach with those obtained using regular methods of analyzing fundus images, in terms of accuracy, sensitivity, specificity, and time taken.
- To test the proposed methods' scalability and modularity to include larger data sets and future developments in fundus imaging technology or other data environments.
This study aims to achieve these objectives by enhancing FI analysis through synthetic data generation, segmentation models, and sophisticated classification methods.
Key contributions of the research
Improving the analysis of fundus images is highly beneficial in the constantly evolving medical imaging landscape. Diabetic retinopathy (DR) screening is a critical issue in diabetes care that this study aims to address by utilizing state-of-the-art computing, proposing solutions to several key research problems found in the literature. The goal is an automatic retinal fundus image analysis system. Our objectives provided the framework for an efficient, accurate, and robust analysis platform that will transform DR screening and identification. The main contributions of the article are:
- Nature-Inspired Multi-Enhanced Method and KCBGWO Optimization: New methods for determining the severity of DR features have been proposed and put into practice. One such method is the K-Means Clustering-Based Binary Grey Wolf Optimization (KCBGWO) approach, which is used to fine-tune the parameters of the proposed FCEDN and ELM to improve the performance of segmentation and classification.
- DR-Net Framework Efficiency: Examined how well the DR-Net architecture performed while classifying Fundus pictures using various datasets for multiclass classification.
- Adaptive Histogram Equalization (AHE) and Synthetic Data Generation: Applied AHE to reduce noise and improve image quality in the early stages. Addressed the data deficit by generating synthetic, high-resolution fundus images conditioned on DR stages using GANs, particularly GauGAN.
- FCEDN and Transfer Learning Hyper-Parameter Optimization: A hyper-parameter optimized FCEDN was used for semantic segmentation, improving pixel-by-pixel classification accuracy—an important part of early disease detection. We used transfer learning to extract features for our models, considerably reducing overfitting and training time.
- Fine-Tuning ELM and Extensive Performance Analysis: The final hidden layer of the ELM was optimized for DR classification, proving the model's functionality, and the DR-Net was contrasted with other cutting-edge techniques, demonstrating its superior performance.
- Impact and Implications: Highlighted the prospect of more tailored treatment plans and preventative measures for DR, and framed the resulting change in diagnosis as a significant development.
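The AHE preprocessing listed among the contributions divides the image into regions and equalizes each local histogram independently. A simplified numpy sketch follows; it omits the clip limit and bilinear tile blending that production CLAHE implementations add, and the synthetic low-contrast patch is purely illustrative.

```python
import numpy as np

def equalize(tile, n_bins=256):
    """Histogram-equalize one tile of an 8-bit image."""
    hist = np.bincount(tile.ravel(), minlength=n_bins)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1)  # map to [0, 1]
    return (cdf * (n_bins - 1))[tile].astype(np.uint8)

def adaptive_equalize(image, tile=32):
    """Simplified AHE: equalize each tile on its own local histogram."""
    out = np.empty_like(image)
    h, w = image.shape
    for i in range(0, h, tile):
        for j in range(0, w, tile):
            out[i:i + tile, j:j + tile] = equalize(image[i:i + tile, j:j + tile])
    return out

# Low-contrast synthetic patch: grey levels squeezed into [100, 140).
img = (np.random.default_rng(1).random((64, 64)) * 40 + 100).astype(np.uint8)
enh = adaptive_equalize(img)
print(img.std(), enh.std())  # the spread of grey levels increases
```

Equalizing per tile, rather than globally, is what lets faint local structures such as small lesions gain contrast against their immediate surroundings.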
The work proposes novel deep learning networks as well as uses advanced GAN-based data augmentation methods to achieve the best results in automating and improving the accuracy of the diagnosis of DR. Adopting the KCBGWO algorithm to optimize the hyperparameters in FCEDN and ELM models is a step-up from previous methods, thus improving the models’ performance and making quality diagnostic tools available to different healthcare facilities.
The article's outline is as follows: the literature review section surveys major developments through a comprehensive analysis; the proposed research methodology section describes the techniques employed in this research; the experimental results and discussion sections present and explain the findings; and the conclusion summarises the research and the implications of the study.
Literature review
In the rapidly evolving medical imaging field, generative models are crucial for extending often poorly labelled datasets. These models not only increase the amount of usable training data but also greatly enhance diagnostic accuracy and effectiveness. Deep learning is still commonly referred to as artificial intelligence, but it is a subcategory of complex computing techniques that has become the healthcare sector's foundation, particularly in precision medicine. It has proven effective across various diseases, including retinal diseases [12, 32–36], breast cancer [14, 37, 38], skin cancer [39], arrhythmia [40], Alzheimer's disease [41], intracranial diseases [42, 43], HIV infections [44], and lung cancer [29, 30].
Progress in DR diagnosis
Deep learning has driven many significant advancements in DR diagnosis, and numerous innovative methodologies have been proposed for DR detection. In [45], multi-scale shallow CNNs were implemented and outperformed existing models, though they remained highly dependent on variations in the input data. Building on this, [46] laid down important advances in lesion detection by adopting ResNet50 and VGG16, which were expeditious in detecting DR lesions; however, locating micro-aneurysms remained an obstacle because of the fluorescein dye. To improve the accuracy of DR detection, a 3D-CNN ensemble was implemented, although without significant attention to the features used [47]. In addition, [48] used several types of CNNs with many layers; while this strategy enriched the computational process, it came at a computational cost.
Moving to preprocessing, [49] used UM to improve sensitivity, but at the cost of losing image edges. Embracing deep learning, [50] attempted DR class prediction, outlining directions for improvement. An initial attempt at early DR detection using only dimensionality reduction [51] was remarkably innovative but came at the expense of eradicating spatial information. Adapting famous architectures [52–54], a Siamese-like CNN structure initially achieved reasonable success; however, its effectiveness when transferred to other datasets is yet to be determined.
While attempting to establish a high level of exudate detection, [55] obtained fairly good results, but training took a long time. The DeepDR framework presented by [56] performed exceptionally well in specificity and sensitivity, although the algorithm's resilience on more diverse sets merits investigation. Uneven grey levels caused minor drawbacks in the CNN approach of [57]. Among preprocessing models, [58] showed some potential but faltered on noisy pictures. The ML classification model of [59] was flexible but posed maintenance issues. Building upon prior work aimed at refining classification techniques, the approach in [60] obtained satisfactory VTDR grading accuracy. VTDR risk detection was the focus of both [61] and [62], where the need to improve model performance and the datasets used was highlighted. A two-stage CNN approach [63] looked quite successful but also required more processing resources.
Advancements in synthetic image generation and segmentation
Furthermore, deep learning (DL) has assumed an important role in synthetic image generation for segmenting retinal disease. For example, [64] used the Pix2Pix architecture [65] to produce high-resolution synthetic multi-parametric MRI brain images. This process starts by establishing normal brain segmentation maps extracted from T1-weighted scans, followed by a series of linear transformations that add tumour labels to the maps, resulting in high-resolution synthetic MRI images. Extending the usage of generative models, [66] employed GANs to synthesize artificial liver lesion ROIs to improve CNN classification of complex lesions, including cysts, metastases, and hemangiomas. They proposed three types of DC-GANs, each trained for a specific lesion type, and an AC-GAN that supervised all three lesions together [67]. The DC-GAN models performed better than the AC-GAN ones and offered higher sensitivity and specificity than the basic CNN improvements.
Building on the benefits of generative models, researchers produced artificial chest X-ray pictures using a DC-GAN [19]. The efficacy of the CNN in identifying disorders, including infiltration, atelectasis, and unremarkable findings on the NIH ChestX-ray14 [68] dataset, was improved by incorporating these generated pictures into the actual images. Combining synthetic and real data significantly increased classification accuracy and verified the effectiveness of leveraging synthetic data to enrich actual datasets. [69] used a CycleGAN [70] to create NCCT from CECT images, improving the capacity to assess illness severity. This shift improved the segmentation of essential organs, including the kidneys, liver, and spleen. In addition to streamlining the imaging procedure, the method boosted the Dice scores significantly from 0.535 to 0.747, underscoring the significance of synthetic scans in augmenting the training set and the possibility of noteworthy progress in diagnosis.
Generative models are beneficial when images of a certain ailment, such as Diabetic Retinopathy (DR), are scarce. [20] introduced DR-GAN, a sophisticated multi-scale U-Net-like network specifically developed for generating high-resolution fundus pictures by leveraging DR stage and lesion information. The synthetic images were utilized in downstream tasks, like DR grading and lesion segmentation, to enhance accuracy on the well-known EyePACS [71] and FGADR [72] datasets. The study [73] employed a two-step methodology: the first part utilized ProGAN to construct semantic label maps of the retinal arteries, and the maps were then transformed into realistic retinal pictures using an image-to-image translation network. This approach was extensively trained and verified on the DRIVE and CHASE_DB1 datasets, achieving segmentation accuracy comparable to or better than the current leading methods.
Feature selection and optimization techniques
Several investigations have been carried out on segmentation; one prominent pioneer is [74]. They used convolutional neural networks (CNNs) to understand retinal images and the U-Net model to identify the optic disc in VTDR. Morphological filtering and watershed transformation have been employed for detecting the optic disc (OD) and optic cup (OC) [75]. One exudate detection method used deep convolutional networks with SVM classifiers [76]. In their investigation of glaucoma optic neuropathy screening, the authors of [28] and [77] proposed an ensemble system that effectively integrated both local and global image levels. These approaches differ from the methods proposed by Ferreira et al. (2018), Ran et al. (2018), Zhang et al. (2017), and Xu et al. (2020), which offer segmentation and classification methods for cataract detection. Li et al. extended the previously stated technique by introducing models [76–78] that employ more intricate frameworks, using GoogleNet-CAM to conduct automatic cataract detection. The significance of hyperparameter selection in deep learning networks has been emphasized in [79–81]. To address the complexity of manual tuning, researchers have proposed nature-inspired techniques such as particle swarm optimization (PSO) [83] and quantum-behaved PSO [82]. The grey wolf optimization (GWO) algorithm [86] replicates the hunting behaviour of grey wolves, and several variations of GWO have been suggested. The primary issue is that the early GWO population may lack a clear direction, as highlighted in [84–88], which can impact the algorithm's effectiveness.
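The GWO hunting metaphor translates into a simple update rule: every wolf moves toward the three current best solutions (alpha, beta, delta), with a coefficient that decays over iterations to shift from exploration to exploitation. A compact numpy sketch on a toy objective follows; the sphere function here merely stands in for a real hyperparameter loss surface.

```python
import numpy as np

rng = np.random.default_rng(42)

def gwo_minimize(f, dim, n_wolves=20, iters=100, lo=-5.0, hi=5.0):
    """Plain Grey Wolf Optimizer: the pack is steered by its three best
    wolves; `a` shrinks from 2 to 0, narrowing the search over time."""
    X = rng.uniform(lo, hi, (n_wolves, dim))
    for t in range(iters):
        fit = np.apply_along_axis(f, 1, X)
        order = np.argsort(fit)
        alpha, beta, delta = X[order[0]], X[order[1]], X[order[2]]
        a = 2 - 2 * t / iters
        new_X = np.empty_like(X)
        for i in range(n_wolves):
            parts = []
            for leader in (alpha, beta, delta):
                A = 2 * a * rng.random(dim) - a   # exploration/exploitation
                C = 2 * rng.random(dim)
                D = np.abs(C * leader - X[i])     # distance to the leader
                parts.append(leader - A * D)
            new_X[i] = np.mean(parts, axis=0)     # average of three pulls
        X = np.clip(new_X, lo, hi)
    fit = np.apply_along_axis(f, 1, X)
    return X[np.argmin(fit)], fit.min()

# Toy stand-in for a hyperparameter loss: the sphere function, minimum at 0.
best, best_val = gwo_minimize(lambda x: np.sum(x ** 2), dim=5)
print(best_val)
```

The randomly initialized pack illustrates exactly the directionless-start weakness noted in [84–88], which motivates seeded initialization schemes.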
Research gap
Among the major challenges identified, deep learning models are very sensitive to the input data, and variations in it can have a huge effect on model precision and training. Advanced models, including deep convolutional neural networks, also impose high computational requirements that may not be feasible in every clinical setting. Even with these advancements, segmentation precision remains an issue, particularly for images with varying contrast and acquisition conditions, which greatly affects diagnostic accuracy. Another major problem is the lack of large datasets annotated for particular medical conditions such as diabetic retinopathy, which limits how machine learning models can be trained. In addition, consistently achieving optimal model performance is challenging, since deep learning hyperparameter tuning is time-consuming and sensitive to human intervention.
Our research proposes a variety of innovative approaches to overcoming these issues:
- Enhanced Synthetic Data Generation: We use cutting-edge Generative Adversarial Networks (GANs) to produce artificial pictures that are more lifelike and better represent the pathological circumstances of the patient. This method trains deep learning models more effectively and becomes more generalizable to other medical problems.
- Optimization Algorithms: We apply nature-inspired algorithms such as Grey Wolf Optimization (GWO) to automate and enhance hyperparameter selection. This strategy minimizes manual tuning and, therefore, improves the efficiency and precision of the model training process.
- Advanced Segmentation Techniques: The suggested technique improves pixel-wise classification performance by utilizing Fully Convolutional Networks (FCNs) and Fully Convolutional Encoder-Decoder Networks (FCEDN). These models are specifically designed to work with medical images, yielding segmentation results that are more precise and crisp.
- Utilization of Extreme Learning Machines (ELM): ELMs are incorporated to improve detection and classification after the segmentation stage. As mentioned earlier, ELMs learn fast, making them suitable for real-time diagnosis of medical ailments and able to accommodate large data sets easily.
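The paper's KCBGWO couples k-means clustering with a binary Grey Wolf Optimizer; the exact coupling is specific to this study, but two common ingredients of such hybrids can be sketched: seeding the wolf pack with k-means centroids of an oversampled candidate pool (so the initial population is spread out rather than directionless), and mapping continuous positions to binary selection masks via a sigmoid transfer function. All dimensions, pool sizes, and the seeding scheme below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def kmeans_seeds(candidates, k, iters=10):
    """Cluster random candidates with k-means and return the centroids,
    giving the wolf pack a spread-out, representative starting population."""
    centroids = candidates[rng.choice(len(candidates), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(candidates[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = candidates[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def to_binary(position):
    """Sigmoid transfer: continuous wolf position -> binary selection mask."""
    return (1 / (1 + np.exp(-position)) > rng.random(position.shape)).astype(int)

dim, pack = 12, 6
pool = rng.uniform(-4, 4, (200, dim))   # oversampled random candidates
wolves = kmeans_seeds(pool, pack)       # k-means-seeded initial pack
masks = np.array([to_binary(w) for w in wolves])
print(masks.shape)  # one binary mask per wolf, e.g. selected hyperparameter bits
```

Each mask could then gate which hyperparameter options (or features) a candidate configuration uses, with the standard GWO update applied in the continuous space.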
It is anticipated that the specific problems identified in the literature will be addressed by these suggested solutions, which will also raise the bar for medical image analysis by increasing its efficiency, speed, and adaptability. While substantial progress has been made, the primary unresolved challenges in the field include the lack of large annotated datasets, computational complexity, and the manual effort required for hyperparameter tuning. Our model addresses these concerns by leveraging synthetic data generation, advanced optimization algorithms, and enhanced segmentation techniques to improve overall performance in medical image recognition.
Proposed research methodology
The study presents a new method for DR that applies multi-class classification with the DR-Net architecture. A block diagram depicting the steps involved in DR detection using the proposed approach is shown in Fig 3. This diagram clearly illustrates how the proposed system efficiently identifies and categorizes DR using DR-Net.
Dataset
The open public Indian Diabetic Retinopathy Image Dataset (IDRiD) [89] is used to identify macular oedema and DR from retinal images with the computerized technique. This dataset is a useful source of information for researchers developing computer-assisted screening and diagnostic tools. IDRiD encompasses high-quality, colour fundus images:
In the photographs, retinal signs of both DR and macular oedema were labelled, including microaneurysms, haemorrhages, and both hard and soft exudates. The annotations cover the progressive phases of the disease, from mild through proliferative DR. IDRiD provides retinal images at high resolutions of 4288 x 2848 and 3456 x 2304 pixels, which is critical for grading the disease since significant features are well resolved. All photographs are catalogued by professionals who can attest to the disease's presence and degree. The IDRiD dataset has 134 images without DR, 20 images with mild NPDR, 84 with moderate NPDR, 74 with severe NPDR, and 49 with PDR. In addition, a subset refers to instances of DME; its count differs from the other subcategories because some cases may belong to several of them. The segmentation masks in this dataset are designed to achieve precise spatial accuracy for four types of lesions: soft exudates, hard exudates, haemorrhages, and microaneurysms. As shown in Fig 4, the images included in the IDRiD dataset are fundus images (FIs) with their ground-truth masks. The dataset is mostly used to train models that detect and predict the severity of DR and diabetic macular oedema. Experts utilized it for final model tuning with tasks such as transfer learning, data augmentation, and new state-of-the-art convolutional neural network architectures. This is important as it offers well-annotated data for improving medical image analysis and, in turn, patient care through early, highly accurate diagnosis.
(a) Retinal Lesions in Fundus Images, (b) hard exudates, (c) soft exudates, (d) haemorrhages, and (e) microaneurysms.
Retinal fundus images fusion using GANs
The fundus images are reconstructed using the GauGAN algorithm to produce realistic fundus images. GauGAN builds on Generative Adversarial Networks (GANs), VAE-GANs, and a class-conditioned Variational Autoencoder (VAE). This structure allows GauGAN to steer the Generator with a style, as shown in Fig 5, which depicts the entire structure during training.
The Encoder, shown in Fig 5(A), uses the fundus pictures as input to obtain the mean and variance of a Gaussian distribution. Fig 5(B) illustrates how the Generator employs a residual learning architecture. From a Gaussian distribution, it generates the retinal fundus pictures using one-hot encoded semantic lesion maps and randomly chosen latent vectors, z2 ∈ Z. Adding stochasticity through this variational sampling strategy improves the pictures' diversity. Features from the mask used as a conditioning factor are integrated with the help of the SPADE (SPatially-ADaptivE normalization) approach. For each semantic label map, SPADE effectively learns unique scaling and bias parameters, as shown in Fig 5(C), maximizing the spatial adaptation of these parameters. GauGAN uses four loss functions to train the Generator and one for the Discriminator. The first step in training the Generator is calculating the GAN loss, computed as an expectation over the Discriminator's predictions:
L_GAN = E_(x,m)[min(0, −1 + D(x, m))] + E_(z,m)[min(0, −1 − D(G(z, m), m))] (1)
The Generator utilizes Feature Matching Loss to synchronize the feature spaces of the Generator and the Discriminator with the original images, effectively reducing discrepancies in the Discriminator’s assessments of both original and generated images. The loss formula is expressed as:
L_FM = E_(x,z) Σ_i (1/N_i) ‖D^(i)(x, m) − D^(i)(G(z, m), m)‖₁ (2)
Moreover, VGG Feature-Matching Loss is implemented to ensure that the synthetic images closely resemble the real images used in pre-training on ImageNet. This involves comparing the feature map outputs from various layers of a pre-trained VGG-19 model, specifically relu 1_1, relu 2_1, relu 3_1, relu 4_1, and relu 5_1. The corresponding loss function is:
L_VGG = Σ_{i=1..5} (1/M_i) ‖F^(i)(x) − F^(i)(G(z, m))‖₁ (3)
The Encoder adopts a KL-loss function to ensure the latent vectors adhere to a normal distribution. This is defined as:
L_KL = D_KL(q(z | x) ‖ N(0, I)) (4)
The total loss for the Generator, which combines these components, is calculated as:
L_G = L_GAN + λ_FM L_FM + λ_VGG L_VGG + λ_KL L_KL (5)
This strategy ensures a balanced optimization across different dimensions of the generative model.
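A minimal sketch of how the generator-side terms could be combined. The hinge form of the generator GAN loss and the lambda weights used here are illustrative assumptions, not values taken from this paper:

```python
import numpy as np

def hinge_gan_loss_generator(d_fake):
    # Hinge-style generator GAN loss: push the Discriminator's
    # scores on generated images upward (assumed form)
    return -np.mean(d_fake)

def total_generator_loss(l_gan, l_fm, l_vgg, l_kl,
                         lam_fm=10.0, lam_vgg=10.0, lam_kl=0.05):
    # Weighted sum of the four generator-side terms; the lambda
    # defaults are illustrative, not the paper's tuned settings
    return l_gan + lam_fm * l_fm + lam_vgg * l_vgg + lam_kl * l_kl
```

In practice each term would be computed from network outputs; the point here is only the weighted-sum structure of the total loss.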
K-means clustering-based Binary Grey Wolf Optimizer
As depicted in Fig 6, KCBGWO integrates the Binary Grey Wolf Optimizer (BGWO) with K-means clustering, enhancing the optimization process by effectively navigating the search space. The initial population P consists of N grey wolves, each represented by a binary vector and initialized randomly:
P = {X₁, X₂, …, X_N}, X_i ∈ {0, 1}^D (6)
This random initialization ensures a diverse population that aids in avoiding local optima. The population is divided into K clusters via K-means clustering:
P = K₁ ∪ K₂ ∪ … ∪ K_K, with K_i ∩ K_j = ∅ for i ≠ j (7)
Each cluster K_i is represented by its centroid c_i, calculated as:
c_i = (1 / |K_i|) Σ_{X ∈ K_i} X (8)
The centroids help update the search process by guiding wolves towards better solutions. The objective function minimizes the sum of squared distances within clusters:
J = Σ_{i=1..K} Σ_{X ∈ K_i} ‖X − c_i‖² (9)
Wolves are ranked by fitness f(X_i), and the roles (α, β, δ, ω) are assigned to the fittest:
X_α = arg min_{X ∈ P} f(X) (10)
Wolves are thus given specific roles: α, β, δ, and ω. The ranking assures:
f(X_α) ≤ f(X_β) ≤ f(X_δ) ≤ f(X_ω) (11)
The hunting behaviour of grey wolves inspires the position update. The distance between wolves and prey is calculated using:
K = |C · X_p(τ) − X(τ)| (12)
Here, A and C vectors are updated to control exploration and exploitation:
A = 2a · r₁ − a (13)
C = 2 · r₂, with r₁, r₂ random vectors in [0, 1]^D (14)
These vectors help balance randomization and convergence. The updated position of each wolf is given by:
X(τ + 1) = X_p(τ) − A · K (15)
The green arrow, representing the ideal result, shows how the equation moves each wolf's location closer to the prey. The step size, determined by the distance from the prey and the quality of the best solutions, is given by the term A · K. This updating strategy mirrors how grey wolves hunt, shifting their positions to close in on their target. The dynamic adjustment ensures that the wolves continuously refine their locations during the search, improving the quality of the solutions.
The sigmoid function is utilized to adjust the continuous dynamics to binary search spaces:
S(x) = 1 / (1 + e^(−x)) (16)
The updated rule definition of the binary position is:
x_{i,d}(τ + 1) = 1 if rand < S((x_{α,d} + x_{β,d} + x_{δ,d}) / 3), and 0 otherwise (17)
This allows smooth transitions between continuous and binary spaces, improving the optimization of binary decision problems. Each wolf is then evaluated individually with a problem-specific fitness function f:
(18)
The KCBGWO algorithm minimizes the Euclidean distance between data points and centroids, ensuring local search efficiency. The parameter a(τ) linearly decreases over iterations, transitioning the search from global exploration to local exploitation:
a(τ) = 2(1 − τ / T) (19)
Finally, the optimization process stops when the maximum number of iterations T is reached or when the fitness of the alpha wolf meets a predefined threshold.
The KCBGWO algorithm is a powerful hyperparameter optimization tool, particularly effective for binary and categorical search spaces. Integrating clustering and binary updates balances exploration and exploitation, offering machine learning experts an efficient method for optimizing algorithm configurations. The pseudocode for this algorithm is presented in Algorithm 1, providing a step-by-step guide for implementation.
Algorithm 1. Pseudocode of KCBGWO
- Input:
- N ‐ Total number of wolves (population size)
- K ‐ Number of clusters
- D ‐ Dimensionality of the search space
- T ‐ Maximum number of iterations
- f ‐ Fitness-evaluation function
- Output:
- X_α ‐ Optimal solution discovered
- Procedure:
- 1. Initialize Population:
- Create a population P of N wolves, each a D-dimensional binary vector.
- For each wolf X_i in P:
- For each dimension d = 1 to D:
- X_i[d] = GenerateBinary()
- 2. Form Clusters:
- Apply K-means to split P into K clusters.
- For each cluster K_i:
- Calculate its centroid as the average position of the wolves in K_i.
- 3. Assign Roles:
- For each cluster K_i:
- Order wolves by their fitness using f.
- Designate roles α (Alpha), β (Beta), δ (Delta), ω (Omega) in fitness order.
- 4. Update Positions:
- For iteration t = 1 to T:
- Adjust the control parameter: a = 2 * (1 - t / T)
- For each wolf X_i in P:
- For each leader X_p in {X_α, X_β, X_δ}:
- Set vectors A and C:
- A = 2 * a * GenerateVector() ‐ a
- C = 2 * GenerateVector()
- Compute the distance from X_p to X_i:
- dist = |C * position of X_p at t ‐ position of X_i at t|
- Candidate position: position of X_p at t ‐ A * dist
- Use the sigmoid for binary transitions:
- For each dimension d = 1 to D:
- probability = Sigmoid((candidate_α[d] + candidate_β[d] + candidate_δ[d]) / 3)
- X_i[d][t+1] = 1 if GenerateRandom() < probability else 0
- Reassess fitness and readjust roles:
- For each wolf X_i in P:
- Update α, β, δ based on the new fitness evaluations.
- Reapply K-means to adjust clusters and centroids.
- 5. Finalize:
- Return X_α, the wolf with the best fitness.
- Supporting Functions:
- GenerateBinary() ‐ Outputs 0 or 1 randomly.
- GenerateVector() ‐ Produces a vector with random elements each in [0, 1].
- GenerateRandom() ‐ Yields a random float between 0 and 1.
- Sigmoid(x) ‐ Computes 1 / (1 + exp(-x)).
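The binary position update at the core of the pseudocode can be sketched in Python. This is a minimal illustration under simplifying assumptions: leaders are taken from a single global fitness ranking, and the K-means clustering and role-reassignment steps are omitted for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def kcbgwo_step(pop, fitness, a, rng):
    """One binary GWO position update (clustering omitted).
    pop: (N, D) binary matrix; fitness: (N,) values to minimize;
    a: control parameter from a = 2 * (1 - t / T); rng: numpy Generator."""
    order = np.argsort(fitness)                 # best (lowest) fitness first
    leaders = pop[order[:3]].astype(float)      # alpha, beta, delta
    n, d = pop.shape
    new_pop = np.empty_like(pop)
    for i in range(n):
        candidates = []
        for lead in leaders:
            A = 2 * a * rng.random(d) - a       # exploration/exploitation vector
            C = 2 * rng.random(d)               # randomization vector
            dist = np.abs(C * lead - pop[i])    # distance to the leader
            candidates.append(lead - A * dist)  # continuous candidate move
        # Sigmoid maps the averaged candidate move to a bit-flip probability
        prob = sigmoid(np.mean(candidates, axis=0))
        new_pop[i] = (rng.random(d) < prob).astype(pop.dtype)
    return new_pop
```

One call produces the next binary population; in the full algorithm this step would alternate with fitness evaluation, role reassignment, and re-clustering.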
Enhancing semantic segmentation with FCEDN and KCBGWO
CNNs are effective in computer-vision classification tasks, but their segmentation performance is limited by their fully connected layers, which discard spatial information. This leads to poor performance where pixel-level detail is required, as the context is not captured. By substituting the fully connected layers with convolutional (Conv) and deconvolutional (de-Conv) layers, Fully Convolutional Networks (FCNs) are well suited to address these challenges. With this modification, FCNs can provide pixel-level outputs while maintaining spatial context, a crucial feature for image segmentation.
FCNs use two main architectures for semantic segmentation:
Basic FCN Structure: Includes convolutional, rectified linear unit (ReLU), pooling, and up-sampling layers. While the Conv and pooling layers decrease the image size and increase the abstraction of important features, the up-sampling layers restore the image to its original size. However, up-sampling without trainable weights is a limitation that may cause loss of detail.
Advanced Encoder-Decoder Setup: The Advanced Encoder-Decoder Configuration in this study incorporates an encoder that conducts downsampling, coupled with a decoder that uses Transpose Convolution (TC) and Upsampling (UP) layers for precise upsampling. Incorporating trainable parameters during the upsampling phases is crucial for enhancing the network’s ability to replicate segmentation outputs with high precision and accuracy.
This Fully Convolutional Encoder-Decoder Network (FCEDN), illustrated in Fig 7, utilizes this sophisticated architecture. The encoder integrates convolutional layers, dropout, and max-pooling for efficient downsampling and feature extraction, reducing spatial resolution effectively. The decoder sequentially employs TC, UP, and additional dropout layers to upscale these features to the input’s original dimensions for precise segmentation. Fine-tuning the hyperparameters of the FCEDN is essential, with the K-Means Clustering-Based Binary Grey Wolf Optimizer (KCBGWO) playing a pivotal role in this process, encompassing four critical stages.
Encoding: Hyperparameters are represented in a k-dimensional Euclidean space, with each dimension encoding the settings of one layer of the FCEDN.
Population Initialization: A number of candidate solution vectors, or "wolves", are created, each encoding one hyperparameter configuration.
Fitness Evaluation: Every vector is then analyzed to evaluate how much it improves the Jaccard coefficient, which measures the similarity between predicted and actual segmentations.
Population Update: The set of hyperparameters is improved based on performance; the better candidates proceed to the next iterations.
Overall, this approach tunes the FCEDN for the best performance and demonstrates the integration of sophisticated neural network designs with cutting-edge optimization algorithms. The end product is a reliable system that achieves highly accurate semantic segmentation, suitable for applications that require detailed examination of the spatial distribution of features, for instance in medical imaging.
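The fitness signal used in the stages above, the Jaccard coefficient between a predicted and a ground-truth binary mask, reduces to a few lines:

```python
import numpy as np

def jaccard(pred, truth):
    """Jaccard coefficient (intersection over union) between binary
    masks, used here as the fitness signal for hyperparameter candidates."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return np.logical_and(pred, truth).sum() / union
```

The empty-mask convention is a design choice for this sketch; any fixed convention works as long as it is applied consistently during optimization.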
Feature extraction
The use of deep learning, in particular Convolutional Neural Networks (CNNs), has expanded significantly in domains such as text analysis, object identification, image interpretation, and facial recognition. Convolutional layers, pooling layers, activation layers, dropout layers (if applicable), and fully connected layers make up current-generation CNN models. This work uses one of the most popular methods, employing pre-trained CNN models to extract features from DR datasets. The model can therefore extract general characteristics from the fundamental convolution layers using DR datasets, which facilitates its use in medical image analysis.
To adapt CNN architectures to DR image datasets, we chose and modified four well-known CNN models: VGG16, ResNet50, DenseNet201, and InceptionResNetV2. These networks serve two primary functions, fine-tuning and feature extraction, and the technique is accordingly divided into two phases. During feature extraction, the pre-trained model uses its prior knowledge to identify pertinent characteristics in the newly collected data, which are then fed into a newly trained classifier. Multiple convolutional layers extract different aspects of the input images: higher layers detect abstract and representational properties, middle layers identify textural and shape characteristics, and low layers detect minor elements like edges and colours.
This research examines the similarities, contrasts, and potential for integration among these variables, considering the wide variety of DR picture types seen in our sample and compared with other datasets. Specifically, by investigating the feature vector fusion method from multiple pre-trained at different levels, we expect that merging features from the numerous models and the various levels would result in a better and more complete representation.
Several fusion techniques have been devised and examined to test these theories. Strategy 1 uses all the convolutional layers from the various pre-trained models to extract and fuse features. Strategy 2 is comparable to Strategy 1 but uses only three models, omitting InceptionResNetV2, and relies solely on partial convolutional layers. Strategy 3 uses the information in the convolutional layers close to the bottom of the network for the final classification. Finally, Strategy 4 combines the outputs of the final three ResNet50 blocks, merging them into feature vectors using maximum pooling to obtain a more thorough feature representation.
Furthermore, an attention mechanism, an additional component frequently used in semantic segmentation and picture classification, was considered to comprehend subtle aspects. In order to minimize interference from other channels and preserve as much information as possible from various places while considering the relevance of distinct spaces, the FuNet model integrates channel and spatial attention modules. Fig 8 shows the specifics of the FuNet model’s Channel Attention and Spatial Attention organization and structure.
KCBGWO-ELM-based DR classification
The study also proposes a novel technique for DR diagnosis using KCBGWO-ELM. The ELM [90] is a feed-forward neural network used for different computational purposes, including classification, regression, and clustering. An ELM can include single or multiple hidden layers, but what differentiates it from networks trained by back propagation is the static nature of the number of hidden nodes and their parameters, the biases and weights. In contrast to conventional backpropagation approaches, which require frequent weight updates and can get trapped in local minima, the ELM parameters remain constant once assigned. The ELM algorithm seeks the minimum training error with normalized weights for better performance, and additional techniques are used to manage local optima. Fig 9 shows how the ELM works, with arrows indicating each step. Owing to its fast learning ability, ELM usually outperforms networks trained with backpropagation.
For Single-Hidden Layer Feed-forward Neural Networks (SLFNs) with G hidden nodes and activation function f(x), let a set of H distinct random samples be defined as (p_i, t_i), where p_i = [p_i1, p_i2, …, p_in]^T ∈ Q^n and t_i = [t_i1, t_i2, …, t_im]^T ∈ Q^m.
The underlying equation that governs this network is given as follows:
Σ_{i=1..G} w_i f(a_i · p_j + c_i) = o_j, j = 1, …, H (22)
The weight vector a_i = [a_i1, a_i2, …, a_in]^T connects the ith hidden node to the input nodes, whereas w_i = [w_i1, w_i2, …, w_im]^T connects the ith hidden node to the output nodes. The variable c_i denotes the threshold of the ith hidden node, and o_j = [o_j1, o_j2, …, o_jm]^T is the output vector of the SLFN for the jth sample.
SLFNs with G hidden nodes and activation function f(x) can approximate these H samples with zero error. This level of accuracy is expressed as:
Σ_{j=1..H} ‖o_j − t_j‖ = 0 (23)
The equation above can be succinctly expressed as M W = T, where M denotes the output matrix of the hidden layer; each column of M is the output of one hidden node in response to the inputs p_1, p_2, …, p_H. The solution of this linear system can be represented as:
W = M⁻¹ T (28)
In this context, M⁻¹ represents the Moore-Penrose generalized inverse of the matrix M. The output function of the ELM is defined as:
o(x) = Σ_{i=1..G} w_i f(a_i · x + c_i) (29)
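The ELM training step described above, random fixed hidden parameters followed by a Moore-Penrose solution for the output weights, can be sketched with numpy, where `np.linalg.pinv` stands in for M⁻¹:

```python
import numpy as np

def train_elm(P, T, hidden_nodes, rng):
    """Minimal single-hidden-layer ELM sketch.
    P: (H, n) inputs; T: (H, m) targets. Input weights a_i and
    biases c_i are drawn once in [-1, 1] and never updated."""
    n_features = P.shape[1]
    a = rng.uniform(-1, 1, (hidden_nodes, n_features))  # input weights a_i
    c = rng.uniform(-1, 1, hidden_nodes)                # hidden thresholds c_i
    M = 1.0 / (1.0 + np.exp(-(P @ a.T + c)))            # hidden output matrix M
    W = np.linalg.pinv(M) @ T                           # output weights via M^-1 T
    return a, c, W

def elm_predict(P, a, c, W):
    M = 1.0 / (1.0 + np.exp(-(P @ a.T + c)))
    return M @ W
```

With more hidden nodes than training samples, the pseudoinverse solution drives the training error to essentially zero, which is the behaviour the zero-error condition above describes.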
KCBGWO-ELM.
The KCBGWO-ELM model, which classifies DR by combining the FuNet architecture with ELM and KCBGWO, is a novel model introduced in this paper. The proposed model can be summarized as follows:
Feature extraction: To extract the relevant characteristics, the DR dataset was first fed into the FuNet architecture.
Configuring parameters: The population size, the maximum number of iterations, and the number of hidden layers for the ELM were then determined as control parameters. First, biases for the hidden layer and random input weights between -1 and 1 were applied to the population.
GWO Initialization: The constituent parts of the vector, which hold the input weights for each candidate Grey Wolf Optimizer (GWO) solution, are defined in this step. These parameters are used in the optimization process.
Fitness evaluation: To evaluate the quality of the solutions produced at each iteration of the optimization process, a cost function is used. The fitness function employed was the Mean Square Error (MSE), computed as follows:
MSE = (1/n) Σ_{i=1..n} (t_i − o_i)² (30)
The variables o_i, t_i, and n represent the observed values, the target values, and the sample size, respectively.
Optimization Progression: The roles of the leader and follower wolves in the binary grey wolf population were adjusted and updated as the algorithm progressed.
Parameter Storage: The weights and biases were updated, and the best values were saved once the maximum number of iterations was reached; otherwise, the procedure restarted from the GWO initialization stage.
Training and Evaluation: The obtained optimized weights and biases were utilized to train the ELM classifier with the training set. Optimization was applied to create the solution vector suggested to reach optimal outcomes. Thus, it was evaluated whether the model fits the data and the possibility of its generalization with the help of the designated test set.
In this paper, we proposed and developed the KCBGWO-ELM method that combines feature extraction, parameter optimization, and classification, which presents a powerful solution to segment and classify images of DR.
Performance evaluation
Evaluating the performance of models in classification tasks is crucial for applying machine learning in practice. This study uses a group of measures suitable for classification analysis and consists of diverse metrics. The following metrics are utilized to assess the model’s picture classification capabilities:
Accuracy (Ac): The proportion of the model’s correct predictions.
Sensitivity (Sn): This reflects the model’s ability to correctly classify positive instances.
Specificity (Sp): Overall, the ability to predict negative instances accurately.
Precision (Pr): The reliability of the model's positive predictions.
By including all these measures, presented in Table 1, we provide a thorough assessment of the model's segmentation and classification ability, facilitate further adjustments, and prepare the model for deployment in real-life environments.
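The four measures follow directly from the confusion-matrix counts of a binary (one-vs-rest) task; a minimal sketch:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and precision from the
    true/false positive/negative counts of a binary task."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)     # positive predictive value
    return accuracy, sensitivity, specificity, precision
```

For the multi-class DR grades, each severity level would be scored one-vs-rest with these formulas.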
Experimental results and discussion
This section provides the quantitative results of the experiments conducted with the proposed experimental setup on the IDRiD dataset, particularly the FIs data, to verify the efficiency of the proposed DR classification method. MATLAB with GPU acceleration was used to run the study's proposed algorithm. The experiments were performed on a system with a 13th-generation Intel Core i7, 32 GB of RAM, and a GeForce RTX 3090 Ti GPU. To measure performance, the IDRiD database, which is widely used in this field, was employed. The proposed enhancement, the BGWO modified by integrating K-means clustering and a weighting factor, was compared with other algorithms and existing methods such as the BGWO, the GWO, and the PSO. The algorithms were implemented in MATLAB on a high-end computer. Each algorithm was run 30 times to make the comparison more reliable. The population size was initialized to 30, and the maximum number of iterations was fixed at 500 for all algorithms. Table 2 summarises the key hyperparameters and performance metrics for the BGWO algorithm integrated with K-means clustering and ELM.
Comparative analysis of KCBGWO with baseline methods
This paper evaluates the KCBGWO's performance and compares it to well-established metaheuristic algorithms, including the BGWO, GWO, and PSO. To assess the effectiveness of the proposed algorithm, we conducted evaluations using ten established benchmark functions, split into two categories: unimodal and multimodal. The unimodal group includes Sphere (F1), Schwefel (F2), Schwefel (F3), Schwefel (F4), and Generalized Rosenbrock (F5). For the multimodal category, we analyzed Generalized Schwefel (F6), Rastrigin (F7), Ackley (F8), Griewank (F9), and Generalized Penalized (F10). Each algorithm was configured according to the methodological approaches outlined in its original paper. Table 3, which enumerates the performance of the algorithms, shows that KCBGWO yields the best performance on most of the test functions. The per-function results and performance metrics in the table emphasize the effectiveness of KCBGWO in these assessments. While the BGWO, GWO, and PSO performed well on specific functions, KCBGWO outperformed them or was more competitive on most of the tasks. Convergence plots further illustrate the behaviour of these algorithms, demonstrating that KCBGWO manages the trade-off between exploration and exploitation throughout the early and later phases of the optimization procedure.
Data balancing and segmentation results
This paper aims to understand GAN sampling and how it can be used to balance datasets. The training set illustrates the imbalance in the data: the number of instances in the most frequent class, grade 3, is six times higher than in the least frequent class, grade 0. To overcome this, GAN sampling was applied to generate synthetic samples for each class so that all classes contain the same number of samples as the largest class. An additional 25 synthetic images were generated to complement existing examples and further balance the classes with GAN-derived samples.
The process was, therefore, one where retina fundus images were generated from lesion maps depicting the DR grades ranging from 0 to 4. Lesion maps in RGB representation were synthesized according to the DR grade. GauGAN was employed to generate synthetic fundus images; thus, the study presented practical examples of utilizing modern image synthesis techniques in the medical field. The image processing starts by performing binarization of the image to enhance the low-intensity regions, which are likely to contain red lesions, converting them into bright white shapes on a black background. Morphological techniques are used to fine-tune these shapes by filling gaps and depicting potential lesions as compact formations to exclude black specks that represent non-essential white noise. Each outlined region is examined for geometric features, including the lesion size, complexity of the borders, and elongation, to distinguish the real red lesion from image artefacts.
For bright lesions, the approach is slightly different but follows the same steps as above, changing the binarization threshold to highlight the image's lighter parts. Candidate bright lesions are likewise refined through morphological operations and geometric checks to ensure that false bright areas or reflections are not mistaken for true lesions. The process ends with Fig 10, which demonstrates the segmentation results and provides an overview of the detailed series of actions in the DR diagnosis process. Subfigure (a) depicts the true retinal image, subfigure (b) provides a synthetic image, and subfigure (c) indicates the detected red lesions (RL) of the retina, namely microaneurysms or haemorrhages, marked accordingly. Subfigure (d) shows the bright lesions (BL) detected by the image segmentation process, namely exudates or cotton-wool spots. Subfigure (e) superimposes numerical feature descriptions of the bright lesions' shape, size, and other measurements. Similarly, arrows and numbers in subfigure (f) represent the red lesions in terms of numerical values, offering a precise quantitative evaluation of the identified lesions.
(a) Initial retinal image, (b) Enhanced image post-preprocessing, (c) Identification of red lesions (RL), (d) Identification of bright lesions (BL), (e) Quantitative feature labels on bright lesions, and (f) Quantitative feature labels on red lesions.
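The binarize-then-clean pipeline described above can be sketched with SciPy; the threshold and minimum-size values here are illustrative assumptions, not the paper's tuned settings:

```python
import numpy as np
from scipy import ndimage

def red_lesion_candidates(gray, thresh=0.3, min_size=5):
    """Sketch of the described pipeline: binarize low-intensity regions,
    close small gaps morphologically, then drop tiny specks (noise).
    gray: 2-D array with intensities in [0, 1]."""
    binary = gray < thresh                                    # dark regions -> white shapes
    binary = ndimage.binary_closing(binary, structure=np.ones((3, 3)))
    labels, n = ndimage.label(binary)                         # connected components
    sizes = ndimage.sum(binary, labels, index=range(1, n + 1))
    keep_ids = np.where(sizes >= min_size)[0] + 1             # discard specks
    return np.isin(labels, keep_ids)
```

A full implementation would additionally score each remaining region on border complexity and elongation, as the text describes, before accepting it as a lesion.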
Classification results
This paper focuses on a complex, innovative approach to feature extraction based on the GoogLeNet model used in computer vision tasks. The development of the KCBGWO-ELM is the main purpose of this research, and its effectiveness is verified in this study. Because of limitations such as memory constraints, it became necessary to down-sample by adding maximum pooling layers at the feature output layer. We obtained a set of CNN-based feature combinations, designated K1 through K4, to obtain a complete picture. Specifically:
- K1, which is based on ResNet50 and uses the full potential of the convolutional layers, includes 2048 features.
- The K2 feature is from the DenseNet201 model, including its entire convolutional structure and 1920 features.
- K3, obtained from the full convolutional basis of the VGG16 model, contains 512 features.
- K4 uses the entire convolutional base of InceptionResNetV2, amounting to 1536 features.
We also found three feature sets—L1, L2, and L3—extracted from the same models that concentrated on particular intermediate layers:
- L1 comprises 1024 distinctive characteristics harvested from the ’conv4_block6_out’ layer within the ResNet50 architecture.
- L2 captures 1792 features sourced from the ’conv4_block6_out’ layer of the DenseNet201 network.
- L3 secures 512 features from the ’block4_pool’ layer associated with the VGG16 model.
Further expanding the array of feature sets, M1, M2, and M3 have been introduced, each sourced from specific intermediate layers unique to their respective models:
- M1 amasses 512 features extracted from the ’conv3_block4_out’ layer in the ResNet50 framework.
- M2 gathers 512 features from the ’conv3_block12_concat’ layer found in DenseNet201.
- M3 pulls 256 characteristics from the ’block3_pool’ layer of the VGG16 model.
These enhanced feature collections provide a robust framework for deep analytical applications. The above explains where the sets of features are derived from, how they are extracted, and their dimensionality, to support image classification tasks.
The study's heatmaps show the performance metrics of four models, ELM, GWO-ELM, BGWO-ELM, and KCBGWO-ELM, across five severity levels (Normal, Mild, Moderate, Severe, and PDR) and four feature categories (K1, K2, K3, and K4). Each heatmap depicts four metrics: accuracy, the fraction of all examples for which the model produced correct predictions; sensitivity, the proportion of positive cases correctly identified; specificity, the proportion of negative cases correctly identified; and precision, the percentage of correct positive predictions out of all cases the model classified as positive. In Category K1, ELM achieves a high accuracy of 99.35% at the Normal severity level but declines as severity rises, reaching 93.70% for PDR. In contrast, the proposed KCBGWO-ELM consistently outperforms the other models at all severity levels, with an accuracy of 99.77% for Normal and 95.50% for PDR. The sensitivity of ELM starts at 99.60% for Normal and decreases to 93.40% for PDR, whereas KCBGWO-ELM achieves a high sensitivity of 99.70% for Normal and 95.30% for PDR. Specificity for ELM ranges from 98.90% (Normal) to 93.90% (PDR), while KCBGWO-ELM achieves the highest values, from 99.79% (Normal) to 95.70% (PDR). Precision is slightly lower for ELM, from 99.10% (Normal) to 93.80% (PDR), while KCBGWO-ELM attains 99.74% for Normal and 95.60% for PDR. Similar trends are observed in Categories K2, K3, and K4, where KCBGWO-ELM outperforms all other models on all evaluated metrics. Fig 11(A) portrays the K1 comparison.
For Category K2, the ELM model generally achieves accuracies ranging from 99% to 100%, with a notable exception where the accuracy drops to 98%. Typical baseline algorithms achieve markedly lower recognition accuracy for PDR, whereas the KCBGWO-ELM model demonstrates substantially higher accuracy, achieving 99.85% for Normal and 96.20% for PDR. ELM's sensitivity remains high at 99.75% for Normal and 93.60% for PDR; in contrast, KCBGWO-ELM shows an even higher sensitivity, reaching 99.80% for Normal and 96.50% for PDR. ELM's specificity reaches 99.10% in the Normal category and 94.40% in PDR, with the lowest false-positive rates among the baseline models, while KCBGWO-ELM attains the highest specificity scores of 99.90% for Normal and 96.70% for PDR. The precision of ELM varies from 99.30% in Normal down to 94.10% in PDR; for KCBGWO-ELM, precision rates are higher still, at 99.97% in Normal and 96.60% in PDR. In Category K3, ELM's accuracy ranges between 99.65% in Normal and 94% in PDR. Several methods, such as PIR-ELM (99.21%), KCE-ELM (98.34%), KCC-ELM (98.94%), KCF-ELM (99.45%), and KCBGWO-ELM (99.71%), show varied effectiveness. Sensitivity for ELM in this category is also high, at 99.70% for Normal and 94.40% for PDR, with KCBGWO-ELM achieving even higher rates. In Category K4, ELM starts at a base accuracy of 99.75% for Normal and 95% for PDR, whereas KCBGWO-ELM peaks at 99.95% for Normal and 97.10% for PDR. The specificity of ELM spans from 99.70% in Normal to 95% in PDR, and for KCBGWO-ELM it reaches 100% in Normal and 97.20% in PDR. These models are depicted in Fig 11(B)–11(D), allowing a clear comparison of their performance under various conditions. The enhanced performance of KCBGWO-ELM is visually represented in bar graphs, with darker colours indicating higher performance metrics.
The discussion below outlines various methodologies for improving the integration of features from multiple data sources. One effective approach involves applying preprocessing techniques, which transform the data into a format better suited for integration. These methods ensure that disparate data types are harmonized, enabling more seamless and effective analysis. This paper also discusses other strategies: Fusion Strategy 1 (FS 1) combines the feature vectors from K1, K2, K3, and K4, leading to a combined vector with a feature length of 6016. Fusion Strategy 2 (FS 2) combines the features from L1, L2, and L3, giving a feature vector of length 3328. Fusion Strategy 3 (FS 3) combines the feature vectors of M1, M2, and M3, which generates a vector of dimension 1280. Res 3 includes features from Res 2 and the previous FS and results in a vector of 2304; Res 4 follows the same pattern with respect to Res 3 and has a vector of 2664; Res 5 takes the features from Res 4 and produces a final vector of 3584.
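The dimensionality bookkeeping of the first three fusion strategies can be checked with a small sketch, with zero vectors standing in for the actual extracted features:

```python
import numpy as np

# Zero vectors stand in for per-image features, using the reported sizes:
# K1..K4 (ResNet50, DenseNet201, VGG16, InceptionResNetV2), L1..L3, M1..M3
k_sets = [np.zeros(n) for n in (2048, 1920, 512, 1536)]
l_sets = [np.zeros(n) for n in (1024, 1792, 512)]
m_sets = [np.zeros(n) for n in (512, 512, 256)]

fs1 = np.concatenate(k_sets)  # Fusion Strategy 1: 6016 features
fs2 = np.concatenate(l_sets)  # Fusion Strategy 2: 3328 features
fs3 = np.concatenate(m_sets)  # Fusion Strategy 3: 1280 features
```

The concatenated lengths match the fused vector sizes stated in the text for FS 1, FS 2, and FS 3.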
Moreover, the FuNet model uses a particular fusion technique that combines features from three component models denoted L1, L2, and L3; here, the feature length is augmented to 7040 through attention blocks. These strategies are significant for extracting varied and rich characteristics from the input images and thus for achieving accurate image classification. The research employs different PMs with unique layers to identify important features in the DR image domain, and the fusion strategies above were carried out carefully to maximize the classification models' performance and accuracy. The paper also compares the accuracy of several machine learning algorithms, namely ELM, GWO-ELM, BGWO-ELM, and the proposed KCBGWO-ELM, across the DR stages: normal, mild, moderate, severe, and proliferative (PDR). Furthermore, the FuNet model is evaluated against conventional approaches to demonstrate the advantages of modern fusion techniques in improving diagnostic outcomes, as shown in Table 4.
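The fusion strategies above reduce to concatenating per-branch feature vectors, optionally weighted by an attention score. The sketch below illustrates both patterns; the branch vectors and their equal 1504-dimensional split of the 6016-length FS 1 vector are hypothetical stand-ins, not the paper's actual extractor outputs.

```python
import numpy as np

def fuse_features(branch_vectors):
    """FS-style fusion: concatenate per-branch feature vectors into one vector."""
    return np.concatenate(branch_vectors)

def attention_fuse(branch_vectors):
    """Toy attention fusion: weight each branch by a softmax over its mean activation."""
    scores = np.array([v.mean() for v in branch_vectors])
    weights = np.exp(scores) / np.exp(scores).sum()
    return np.concatenate([w * v for w, v in zip(weights, branch_vectors)])

# Hypothetical branch outputs (K1-K4); their concatenation mirrors the
# reported FS 1 length of 6016.
k1, k2, k3, k4 = (np.random.rand(1504) for _ in range(4))
fused = fuse_features([k1, k2, k3, k4])
assert fused.shape == (6016,)
```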
The ELM classifier also produced good results across all stages of diabetic retinopathy (DR). For normal cases, it has an accuracy (ACC) of 90.93%, and for the mild, moderate, and severe stages the accuracy remains impressive at 91.75%, 92.65%, and 93.73%, respectively. For PDR cases, the ELM classifier provided an accuracy of 94.82%, a sensitivity (SEN) of 93.93%, a specificity (SPC) of 95.28%, and a precision (PRE) of 94.74%. For the GWO-ELM classifier, the obtained ACC was 90.32% for normal cases, showing good performance, and for PDR the accuracy was slightly higher at 94.54%; this was also evident in the high SEN, SPC, and PRE values. GWO-ELM maintained high metrics across all categories, with accuracies of 91.57%, 92.40%, and 93.46% for mild, moderate, and severe cases, respectively. The BGWO-ELM classifier retained an accuracy of at least 90.59% for normal cases and reached 94.17% for PDR cases, with SEN, SPC, and PRE remaining high across all stages: SEN ranged from 90.03% to 93.69%, SPC from 90.64% to 94.62%, and PRE from 90.46% to 94.08%, signifying a reliable classifier. According to the experimental results, the proposed KCBGWO-ELM classifier had the highest accuracy among these models, 92.82% for normal cases and 96.48% for PDR cases; its SEN, SPC, and PRE were also higher, indicating enhanced detection capability, with SEN reaching 95.74%, SPC 96.90%, and PRE 96.35% for PDR cases. The FuNet classifier achieved the highest overall accuracy of all models implemented. For normal cases, FuNet attained an accuracy of 95.33%, a sensitivity of 94.46%, a specificity of 94.75%, and a precision of 95.00%; in PDR cases, it achieved an ACC of 98.54%, an SEN of 98.03%, an SPC of 98.82%, and a PRE of 98.42%.
Such remarkable performances indicate that the proposed KCBGWO-ELM and FuNet classifiers could accurately detect DR at all the mentioned stages, demonstrating a promising strategy for clinical diagnosis and efficient management of related patients.
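The per-stage ACC, SEN, SPC, and PRE figures reported above are one-vs-rest metrics derived from a confusion matrix. A minimal sketch of that computation, assuming integer stage labels 0..C-1:

```python
import numpy as np

def stage_metrics(y_true, y_pred, cls):
    """One-vs-rest ACC/SEN/SPC/PRE for a single DR stage `cls`."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == cls) & (y_pred == cls))   # correctly flagged as `cls`
    tn = np.sum((y_true != cls) & (y_pred != cls))   # correctly flagged as not `cls`
    fp = np.sum((y_true != cls) & (y_pred == cls))
    fn = np.sum((y_true == cls) & (y_pred != cls))
    return {
        "accuracy":    (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision":   tp / (tp + fp) if tp + fp else 0.0,
    }
```

Calling this once per stage (Normal through PDR) reproduces the layout of the per-class tables discussed above.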
The four ROC plots produced provide a thorough evaluation of the classifiers' efficiency for the four approaches ELM, GWO-ELM, BGWO-ELM, and KCBGWO-ELM, each over the five proposed classes of DR severity: Normal, Mild, Moderate, Severe, and Proliferative Diabetic Retinopathy (PDR), as shown below in Fig 12. Every method was tested on synthetic data generated with different random seeds to cover all aspects of each approach. The first plot, for the ELM method, shows that all the classifiers performed well. From the results in Fig 12, the Moderate class received the highest AUC of 0.93, which is highly desirable for a diagnostic test because it indicates strong discriminative ability. The model's macro and micro average AUCs, which indicate comparable classifier accuracy across all classes, were both 0.91. The ROC curves for the other classes likewise show high accuracy, with AUCs of 0.90 for Severe and Normal and 0.92 for PDR and Mild. The second plot demonstrates that GWO-ELM substantially improves the classifier: an AUC of 0.94 was obtained for the Mild, Moderate, and PDR classes, demonstrating the improved accuracy and dependability of the approach, and the ROC curves show that the Normal and Severe classes likewise had positive outcomes, with an AUC of 0.92. The equal micro and macro average AUC of 0.93 shows that this strategy was more consistent and trustworthy across all categories. The third plot, for the BGWO-ELM approach, shows the Moderate class with the highest AUC of 0.97, while the Normal, Severe, Mild, and PDR classes all attained an AUC of 0.95.
The micro and macro average AUCs, both equal to 0.96, indicate that BGWO-ELM offers high performance without compromising between classes, confirming the method's reliability. Finally, the fourth plot, for the proposed KCBGWO-ELM, is the most promising: the Mild class secures an AUC of 0.97, and both the Normal and Severe classes attain an AUC of 0.96. The micro and macro average AUCs were both 0.97, meaning that the proposed method classifies the data consistently and accurately.
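The one-vs-rest AUCs and macro averages summarized above can be computed directly from class scores without plotting, using the rank-based (Mann-Whitney) formulation of AUC. A sketch under the assumption that the classifier outputs a per-class score matrix:

```python
import numpy as np

def auc_ovr(y_true, scores, cls):
    """One-vs-rest AUC for class `cls`, given an (n_samples, n_classes) score matrix."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos = scores[y_true == cls, cls]
    neg = scores[y_true != cls, cls]
    # Probability that a random positive outranks a random negative (ties count 0.5).
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def macro_auc(y_true, scores):
    """Unweighted mean of the per-class one-vs-rest AUCs."""
    classes = np.unique(np.asarray(y_true))
    return float(np.mean([auc_ovr(y_true, scores, c) for c in classes]))
```

A micro average would instead pool all (sample, class) pairs into one binary problem before ranking, which is why the two averages coincide only when per-class performance is balanced.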
Table 5 gives a comprehensive analysis of techniques adopted in various categorization tasks, focusing on the performance metrics of precision, recall, and F1-measure. In the dynamic field of medical image analysis, FuNet-based models such as ELM, GWO-ELM, BGWO-ELM, and KCBGWO-ELM have offered a tremendous boost to diagnostic efficiency, especially on the IDRiD dataset. The proposed model, KCBGWO-ELM, yields the best results: 99.8% accuracy, 99.4% sensitivity, and 99.9% specificity. These metrics indicate high diagnostic accuracy and an extraordinary ability to classify clinical cases.
It is important to place the proposed model's performance in perspective by comparing it to other notable techniques published up to 2023. For example, a popular SVM model applied to the IDRiD dataset in 2021 yielded a sensitivity of 83%, a specificity of 77%, and an overall accuracy of 82%; another SVM configuration achieved a sensitivity of 67% with a specificity of 100%. Even where the SVM model attains perfect specificity, its sensitivity is much lower than that of KCBGWO-ELM. Likewise, the CNN+UNet model developed in 2023 reached an accuracy of 96.50% with a specificity of 97% and a sensitivity of 89%, while a further compared model reported a sensitivity of 100% and a specificity of 99%. In all aspects, the KCBGWO-ELM model has proved to deliver better results than these models.
Other variants of the proposed model, such as GWO-ELM and BGWO-ELM, also yield impressive results. The GWO-ELM variant achieved an accuracy of 99.12%, a sensitivity of 98.95%, and a specificity of 99.39%, exceeding many of the preceding models. Similarly, the BGWO-ELM variant was characterized by an accuracy of 99.23%, a sensitivity of 98.75%, and a specificity of 99.51%, a better result than the performance indicators of other networks such as 2023's GNN and 2019's R-CNN, which showed weaker results on these measures.
Compared with more modern methods, such as the 2023 supervised contrastive learning approach on image data, which achieved an accuracy of 98.91%, the proposed KCBGWO-ELM once again demonstrates a high level of effectiveness in medical image diagnosis. This analysis establishes the FuNet-based models as an innovation shaping the future of the field, creating avenues for higher accuracy, sensitivity, and specificity and paving the way for the use of these models in actual clinical practice.
Conclusion
This study proposes a novel approach for screening and classifying diabetic retinopathy utilizing modern neural network systems. By combining Fully Convolutional Encoder-Decoder Networks (FCEDN) with the K-Means Clustering-Based Binary Grey Wolf Optimizer (KCBGWO), the performance and accuracy of retinal image segmentation and analysis are greatly improved. Generative Adversarial Networks (GANs) for synthetic data generation and transfer learning for feature extraction further enhance the approach, providing unmatched reliability and robustness. Experimental results on the IDRiD dataset confirm the effectiveness of our proposed KCBGWO-ELM model in obtaining high accuracy, sensitivity, and specificity in DR diagnosis. These results reaffirm the efficacy and applicability of this method for reaching high accuracy in early DR diagnosis with scalable methodologies, thus guaranteeing better outcomes for patients. However, despite these promising results, our work has certain limitations. The dependency on high-quality annotated datasets like IDRiD restricts the generalizability of our model to other datasets and to real-world scenarios with less comprehensive annotations. Additionally, the computational complexity of our approach may limit its deployment in resource-constrained environments.
Future work will address these limitations by incorporating larger and more diverse datasets to enhance the model's robustness and generalizability. We also plan to explore the application of our framework to other ophthalmic conditions, leveraging advanced optimization techniques to boost performance further. Moreover, efforts will be directed towards optimizing the computational efficiency of our model, making it more accessible for broader clinical use, including in low-resource settings. This will ensure that our approach can truly advance DR detection and contribute to more effective and timely treatment strategies.
References
- 1. Ogurtsova K, da Rocha Fernandes JD, Huang Y, Linnenkamp U, Guariguata L, Cho NH, et al. IDF Diabetes Atlas: Global estimates for the prevalence of diabetes for 2015 and 2040. Diabetes Res Clin Pract. 2017;128. pmid:28437734
- 2. Cho NH, Shaw JE, Karuranga S, Huang Y, da Rocha Fernandes JD, Ohlrogge AW, et al. IDF Diabetes Atlas: Global estimates of diabetes prevalence for 2017 and projections for 2045. Diabetes Res Clin Pract. 2018;138. pmid:29496507
- 3. Sun H, Saeedi P, Karuranga S, Pinkepank M, Ogurtsova K, Duncan BB, et al. IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin Pract. 2022;183. pmid:34879977
- 4. Pourghaffari Lahiji A, Gohari M, Mirzaei S, Nasiriani K. The effect of implementation of evidence-based eye care protocol for patients in the intensive care units on superficial eye disorders. BMC Ophthalmol. 2021;21. pmid:34256729
- 5. Wilkinson CP, Ferris FL, Klein RE, Lee PP, Agardh CD, Davis M, et al. Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales. Ophthalmology. 2003;110. pmid:13129861
- 6. Bilal A, Sun G, Mazhar S. Survey on recent developments in automatic detection of diabetic retinopathy. Journal Francais d’Ophtalmologie. 2021. pp. 420–440. pmid:33526268
- 7. Bilal A, Sun G, Li Y, Mazhar S, Khan AQ. Diabetic Retinopathy Detection and Classification Using Mixed Models for a Disease Grading Database. IEEE Access. 2021;9: 23544–23553.
- 8. Bilal A, Liu X, Shafiq M, Ahmed Z, Long H. NIMEQ-SACNet: A novel self-attention precision medicine model for vision-threatening diabetic retinopathy using image data. Comput Biol Med. 2024;171: 108099. pmid:38364659
- 9. Bilal A, Liu X, Baig TI, Long H, Shafiq M. EdgeSVDNet: 5G-Enabled Detection and Classification of Vision-Threatening Diabetic Retinopathy in Retinal Fundus Images. Electronics. 2023;12: 4094.
- 10. Voets M, Møllersen K, Bongo LA. Reproduction study using public data of: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. PLoS One. 2019;14. pmid:31170223
- 11. Bilal A, Imran A, Baig TI, Liu X, Long H, Alzahrani A, et al. Improved Support Vector Machine based on CNN-SVD for Vision-Threatening Diabetic Retinopathy Detection and Classification. PLoS One. 2024.
- 12. Bilal A, Sun G, Mazhar S. Diabetic Retinopathy detection using Weighted Filters and Classification using CNN. 2021 Int Conf Intell Technol CONIT 2021. 2021.
- 13. Bilal A, Sun G, Mazhar S, Imran A, Latif J. A Transfer Learning and U-Net-based automatic detection of diabetic retinopathy from fundus images. Comput Methods Biomech Biomed Eng Imaging Vis. 2022; 1–12.
- 14. Bilal A, Imran A, Liu X, Liu X, Ahmed Z, Shafiq M, et al. BC-QNet: A Quantum-Infused ELM Model for Breast Cancer Diagnosis. Comput Biol Med. 2024; 108483. pmid:38704900
- 15. Baykal Kablan E, Dogan H, Ercin ME, Ersoz S, Ekinci M. An ensemble of fine-tuned fully convolutional neural networks for pleural effusion cell nuclei segmentation. Comput Electr Eng. 2020;81.
- 16. Badrinarayanan V, Kendall A, Cipolla R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans Pattern Anal Mach Intell. 2017;39. pmid:28060704
- 17. Wang D, Li Z, Dey N, Ashour AS, Moraru L, Simon Sherratt R, et al. Deep-segmentation of plantar pressure images incorporating fully convolutional neural networks. Biocybern Biomed Eng. 2020;40.
- 18. Karras T, Laine S, Aila T. A Style-Based Generator Architecture for Generative Adversarial Networks. IEEE Trans Pattern Anal Mach Intell. 2021;43. pmid:32012000
- 19. Bhattacharya D, Banerjee S, Bhattacharya S, Uma Shankar B, Mitra S. GAN-Based Novel Approach for Data Augmentation with Improved Disease Classification. 2020.
- 20. Liao K, Lin C, Zhao Y, Gabbouj M. DR-GAN: Automatic Radial Distortion Rectification Using Conditional GAN in Real-Time. IEEE Trans Circuits Syst Video Technol. 2020;30.
- 21. Tan H, Liu X, Yin B, Li X. DR-GAN: Distribution Regularization for Text-to-Image Generation. IEEE Trans Neural Networks Learn Syst. 2023;34. pmid:35442894
- 22. Tiwari D, Dixit M, Gupta K. Breast cancer-caps: a breast cancer screening system based on capsule network utilizing the multiview breast thermal infrared images. Turkish J Electr Eng Comput Sci. 2022;30.
- 23. Görgün AR, Baytore C, Comlekci S, Tuglu MI, Kaya A. Microwave hyperthermia application with bioimplant single slot coaxial antenna design for mouse breast cancer treatment. Turkish J Electr Eng Comput Sci. 2022;30.
- 24. Kalra M, Kumar V, Kaur M, Idris SA, Öztürk Ş, Alshazly H. A novel binary emperor penguin optimizer for feature selection tasks. Comput Mater Contin. 2022;70.
- 25. Öztürk Ş, Akdemir B. HIC-net: A deep convolutional neural network model for classification of histopathological breast images. Comput Electr Eng. 2019;76.
- 26. Ozturk S, Akdemir B. Automatic leaf segmentation using grey Wolf optimizer based neural network. Proceedings of the 21st International Conference on Electronics. 2017.
- 27. Bittner K, Cui S, Reinartz P. Building extraction from remote sensing data using fully convolutional networks. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences—ISPRS Archives. 2017.
- 28. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2015.
- 29. Anas Bilal, Muhammad Shafiq, Fang Fang, Muhammad Waqar, Inam Ullah, Yazeed Yasin Ghadi, et al. IGWO-IVNet3: DL-Based Automatic Diagnosis of Lung Nodules Using an Improved Gray Wolf Optimization and InceptionNet-V3. Sensors (Switzerland). 2022. https://doi.org/10.3390/s22249603
- 30. Bilal A, Sun G, Li Y, Mazhar S, Latif J. Lung nodules detection using grey wolf optimization by weighted filters and classification using CNN. J Chinese Inst Eng Trans Chinese Inst Eng A. 2022;45.
- 31. Wang J, Lu S, Wang SH, Zhang YD. A review on extreme learning machine. Multimed Tools Appl. 2022;81.
- 32. Gao W, Fan B, Fang Y, Song N. Lightweight and multi-lesion segmentation model for diabetic retinopathy based on the fusion of mixed attention and ghost feature mapping. Comput Biol Med. 2024;169: 107854. pmid:38109836
- 33. Bilal A, Zhu L, Deng A, Lu H, Wu N. AI-Based Automatic Detection and Classification of Diabetic Retinopathy Using U-Net and Deep Learning. Symmetry (Basel). 2022;14.
- 34. Bilal A, Sun G, Mazhar S, Imran A. Improved Grey Wolf Optimization-Based Feature Selection and Classification Using CNN for Diabetic Retinopathy Detection. Lect Notes Data Eng Commun Technol. 2022;116: 1–14.
- 35. Singh LK, Khanna M, Garg H SR. Efficient feature selection based novel clinical decision support system for glaucoma prediction from retinal fundus images. Med Eng Phys. 2024;1: 104077. pmid:38365344
- 36. Khanna M, Singh LK, Thawkar S, Goyal M. Deep learning based computer-aided automatic prediction and grading system for diabetic retinopathy. Multimed Tools Appl. 2023.
- 37. Singh LK SK. An enhanced and efficient approach for feature selection for chronic human disease prediction: A breast cancer study. Heliyon. 2024;40.
- 38. Wetstein SC, de Jong VMT, Stathonikos N, Opdam M, Dackus GMHE, Pluim JPW, et al. Deep learning-based breast cancer grading and survival analysis on whole-slide histopathology images. Sci Rep. 2022;12. pmid:36068311
- 39. Alam TM, Shaukat K, Khan WA, Hameed IA, Almuqren LA, Raza MA, et al. An Efficient Deep Learning-Based Skin Cancer Classifier for an Imbalanced Dataset. Diagnostics. 2022;12. pmid:36140516
- 40. Madan P, Singh V, Singh DP, Diwakar M, Pant B, Kishor A. A Hybrid Deep Learning Approach for ECG-Based Arrhythmia Classification. Bioengineering. 2022;9. pmid:35447712
- 41. Duc NT, Ryu S, Qureshi MNI, Choi M, Lee KH, Lee B. 3D-Deep Learning Based Automatic Diagnosis of Alzheimer’s Disease with Joint MMSE Prediction Using Resting-State fMRI. Neuroinformatics. 2020;18. pmid:31093956
- 42. Singh LK, Khanna M, Monga H PG. Nature-inspired algorithms-based optimal features selection strategy for COVID-19 detection using medical images. New Gener Comput. 2024;10: 1–64.
- 43. Shi Z, Miao C, Schoepf UJ, Savage RH, Dargis DM, Pan C, et al. A clinically applicable deep-learning model for detecting intracranial aneurysm in computed tomography angiography images. Nat Commun. 2020;11. pmid:33257700
- 44. Bilal A, Sun G, Mazhar S, Junjie Z. Neuro-optimized numerical treatment of HIV infection model. Int J Biomath. 2021;14.
- 45. Chen W, Yang B, Li J, Wang J. An Approach to Detecting Diabetic Retinopathy Based on Integrated Shallow Convolutional Neural Networks. IEEE Access. 2020;8.
- 46. Pan X, Jin K, Cao J, Liu Z, Wu J, You K, et al. Multi-label classification of retinal lesions in diabetic retinopathy for automatic analysis of fundus fluorescein angiography based on deep learning. Graefe’s Arch Clin Exp Ophthalmol. 2020;258. pmid:31932886
- 47. Tymchenko B, Marchenko P, Spodarets D. Deep learning approach to diabetic retinopathy detection. ICPRAM 2020 ‐ Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods. 2020.
- 48. Qummar S, Khan FG, Shah S, Khan A, Shamshirband S, Rehman ZU, et al. A Deep Learning Ensemble Approach for Diabetic Retinopathy Detection. IEEE Access. 2019;7.
- 49. Pao SI, Lin HZ, Chien KH, Tai MC, Chen JT, Lin GM. Detection of Diabetic Retinopathy Using Bichannel Convolutional Neural Network. J Ophthalmol. 2020;2020. pmid:32655944
- 50. de la Torre J, Valls A, Puig D. A deep learning interpretable classifier for diabetic retinopathy disease grading. Neurocomputing. 2020;396.
- 51. Gadekallu TR, Khare N, Bhattacharya S, Singh S, Maddikunta PKR, Ra IH, et al. Early detection of diabetic retinopathy using pca-firefly based deep learning model. Electron. 2020;9.
- 52. Ju Y, Jian M, Wang C, Zhang C, Dong J, Lam KM. Estimating High-resolution Surface Normals via Low-resolution Photometric Stereo Images. IEEE Trans Circuits Syst Video Technol. 2023.
- 53. Ju Y, Shi B, Chen Y, Zhou H, Dong J, Lam K-M. GR-PSN: Learning to Estimate Surface Normal and Reconstruct Photometric Stereo Images. IEEE Trans Vis Comput Graph. 2023; 1–16.
- 54. Zeng X, Chen H, Luo Y, Ye W. Automated diabetic retinopathy detection based on binocular siamese-like convolutional neural network. IEEE Access. 2019;7.
- 55. Mateen M, Wen J, Nasrullah N, Sun S, Hayat S. Exudate Detection for Diabetic Retinopathy Using Pretrained Convolutional Neural Networks. Complexity. 2020;2020.
- 56. Zhang W, Zhong J, Yang S, Gao Z, Hu J, Chen Y, et al. Automated identification and grading system of diabetic retinopathy using deep neural networks. Knowledge-Based Syst. 2019;175.
- 57. Samanta A, Saha A, Satapathy SC, Fernandes SL, Zhang YD. Automated detection of diabetic retinopathy using convolutional neural networks on a small dataset. Pattern Recognit Lett. 2020;135.
- 58. Bibi I, Mir J, Raja G. Automated detection of diabetic retinopathy in fundus images using fused features. Phys Eng Sci Med. 2020;43. pmid:32955686
- 59. Math L, Fatima R. Adaptive machine learning classification for diabetic retinopathy. Multimed Tools Appl. 2021;80.
- 60. Rekhi RS, Issac A, Dutta MK. Automated detection and grading of diabetic macular edema from digital colour fundus images. 2017 4th IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics, UPCON 2017. 2017.
- 61. Marin D, Gegundez-Arias ME, Ponte B, Alvarez F, Garrido J, Ortega C, et al. An exudate detection method for diagnosis risk of diabetic macular edema in retinal images using feature-based and supervised classification. Med Biol Eng Comput. 2018;56. pmid:29318442
- 62. Kunwar A, Magotra S, Sarathi MP. Detection of high-risk macular edema using texture features and classification using SVM classifier. 2015 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2015. 2015.
- 63. Perdomo O, Otalora S, Rodríguez F, Arevalo J, González FA. A Novel Machine Learning Model Based on Exudate Localization to Detect Diabetic Macular Edema. 2017.
- 64. Shin HC, Tenenholtz NA, Rogers JK, Schwarz CG, Senjem ML, Gunter JL, et al. Medical image synthesis for data augmentation and anonymization using generative adversarial networks. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2018.
- 65. Mishra P, Herrmann I. GAN meets chemometrics: Segmenting spectral images with pixel2pixel image translation with conditional generative adversarial networks. Chemom Intell Lab Syst. 2021;215.
- 66. Frid-Adar M, Klang E, Amitai M, Goldberger J, Greenspan H. Synthetic data augmentation using GAN for improved liver lesion classification. Proceedings ‐ International Symposium on Biomedical Imaging. 2018.
- 67. Kumaar MA, Samiayya D, Rajinikanth V, Vincent P M DR, Kadry S. Brain Tumor Classification Using a Pre-Trained Auxiliary Classifying Style-Based Generative Adversarial Network. Int J Interact Multimed Artif Intell. 2023;In Press.
- 68. Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017. 2017.
- 69. Sandfort V, Yan K, Pickhardt PJ, Summers RM. Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Sci Rep. 2019;9. pmid:31729403
- 70. Zhu JY, Park T, Isola P, Efros AA. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. Proceedings of the IEEE International Conference on Computer Vision. 2017.
- 71. EyePACS-1. In: The world of eyepacs [Internet]. 2015. Available: http://www.eyepacs.com
- 72. Zhou Y, Wang B, Huang L, Cui S, Shao L. A Benchmark for Studying Diabetic Retinopathy: Segmentation, Grading, and Transferability. IEEE Trans Med Imaging. 2021;40. pmid:33180722
- 73. Andreini P, Ciano G, Bonechi S, Graziani C, Lachi V, Mecocci A, et al. A two-stage GAN for high-resolution retinal image generation and segmentation. Electron. 2022;11.
- 74. Wang L, Liu H, Lu Y, Chen H, Zhang J, Pu J. A coarse-to-fine deep learning framework for optic disc segmentation in fundus images. Biomed Signal Process Control. 2019;51. pmid:33850515
- 75. Kwasigroch A, Jarzembinski B, Grochowski M. Deep CNN based decision support system for detection and assessing the stage of diabetic retinopathy. 2018 International Interdisciplinary PhD Workshop, IIPhDW 2018. 2018.
- 76. Li J, Xu X, Guan Y, Imran A, Liu B, Zhang L, et al. Automatic Cataract Diagnosis by Image-Based Interpretability. Proceedings ‐ 2018 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2018. 2019.
- 77. Dong Y, Zhang Q, Qiao Z, Yang JJ. Classification of cataract fundus image based on deep learning. IST 2017 ‐ IEEE International Conference on Imaging Systems and Techniques, Proceedings. 2017.
- 78. Li J, Xie L, Zhang L, Liu L, Li P, Yang J jiang, et al. Interpretable Learning: A Result-Oriented Explanation for Automatic Cataract Detection. Lecture Notes in Electrical Engineering. 2019.
- 79. Aziz RM, Mahto R, Goel K, Das A, Kumar P, Saxena A. Modified Genetic Algorithm with Deep Learning for Fraud Transactions of Ethereum Smart Contract. Appl Sci. 2023;13.
- 80. Saxena A. An efficient harmonic estimator design based on Augmented Crow Search Algorithm in noisy environment. Expert Syst Appl. 2022;194.
- 81. Mohakud R, Dash R. Survey on Hyperparameter Optimization Using Nature-Inspired Algorithm of Deep Convolution Neural Network. Smart Innovation, Systems and Technologies. 2021.
- 82. Li Y, Xiao J, Chen Y, Jiao L. Evolving deep convolutional neural networks by quantum behaved particle swarm optimization with binary encoding for image classification. Neurocomputing. 2019;362.
- 83. Wang Y, Zhang H, Zhang G. cPSO-CNN: An efficient PSO-based algorithm for fine-tuning hyper-parameters of convolutional neural networks. Swarm Evol Comput. 2019;49.
- 84. Kim S, Lee S, Choi J Il, Cho H. Binary genetic algorithm for optimal joinpoint detection: Application to cancer trend analysis. Stat Med. 2021;40. pmid:33205511
- 85. Rahnamayan S, Tizhoosh HR, Salama MM. Opposition-based differential evolution. Stud Comput Intell. 2008;143.
- 86. Rahnamayan S, Tizhoosh HR, Salama MMA. A novel population initialization method for accelerating evolutionary algorithms. Comput Math with Appl. 2007;53.
- 87. Gao W feng, Huang L ling, Wang J, Liu S yang, Qin C dong. Enhanced artificial bee colony algorithm through differential evolution. Appl Soft Comput J. 2016;48.
- 88. Shaheen MAM, Hasanien HM, Alkuhayli A. A novel hybrid GWO-PSO optimization technique for optimal reactive power dispatch problem solution. Ain Shams Eng J. 2021;12.
- 89. Porwal P, Pachade S, Kamble R, Kokare M, Deshmukh G, Sahasrabuddhe V, et al. Indian diabetic retinopathy image dataset (IDRiD): A database for diabetic retinopathy screening research. Data. 2018;3.
- 90. Ding S, Zhao H, Zhang Y, Xu X, Nie R. Extreme learning machine: algorithm, theory and applications. Artif Intell Rev. 2015;44.
- 91. Harangi B, Toth J, Baran A, Hajdu A. Automatic screening of fundus images using a combination of convolutional neural network and hand-crafted features. Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS. 2019. pmid:31946452
- 92. Sakaguchi A, Wu R, Kamata S ichiro. Fundus image classification for diabetic retinopathy using disease severity grading. ACM International Conference Proceeding Series. 2019.
- 93. Kind A, Azzopardi G. An Explainable AI-Based Computer Aided Detection System for Diabetic Retinopathy Using Retinal Fundus Images. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2019.
- 94. Li X, Hu X, Yu L, Zhu L, Fu CW, Heng PA. CANet: Cross-Disease Attention Network for Joint Diabetic Retinopathy and Diabetic Macular Edema Grading. IEEE Trans Med Imaging. 2020;39. pmid:31714219
- 95. Elswah DK, Elnakib AA, El-Din Moustafa H. Automated Diabetic Retinopathy Grading using Resnet. National Radio Science Conference, NRSC, Proceedings. 2020.
- 96. Saranya P, Prabakaran S. Automatic detection of non-proliferative diabetic retinopathy in retinal fundus images using convolution neural network. J Ambient Intell Humaniz Comput. 2020.
- 97. Alcalá-Rmz V, Maeda-Gutiérrez V, Zanella-Calzada LA, Valladares-Salgado A, Celaya-Padilla JM, Galván-Tejada CE. Convolutional Neural Network for Classification of Diabetic Retinopathy Grade. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2020.
- 98. Shaukat N, Amin J, Sharif M, Azam F, Kadry S, Krishnamoorthy S. Three-Dimensional Semantic Segmentation of Diabetic Retinopathy Lesions and Grading Using Transfer Learning. J Pers Med. 2022;12. pmid:36143239
- 99. Jiwani N, Gupta K, Afreen N. A Convolutional Neural Network Approach for Diabetic Retinopathy Classification. Proceedings ‐ 2022 IEEE 11th International Conference on Communication Systems and Network Technologies, CSNT 2022. 2022.
- 100. Albadr MAA, Ayob M, Tiun S, AL-Dhief FT, Hasan MK. Gray wolf optimization-extreme learning machine approach for diabetic retinopathy detection. Front Public Heal. 2022;10. pmid:35979449
- 101. Chandran J. Jasper Gnana, J. Jabez SS. Auto-Metric Graph Neural Network optimized with Capuchin search optimization algorithm for coinciding diabetic retinopathy and diabetic Macular edema grading. Biomed Signal Process Control. 2023;80: 104386.
- 102. Saranya P, Pranati R, Patro SS. Detection and classification of red lesions from retinal images for diabetic retinopathy detection using deep learning models. Multimed Tools Appl. 2023.
- 103. Nasajpour M, Karakaya M, Pouriyeh S, Parizi RM. Federated Transfer Learning For Diabetic Retinopathy Detection Using CNN Architectures. Conference Proceedings—IEEE SOUTHEASTCON. 2022.
- 104. Zhu D, Ge A, Chen X, Wang Q, Wu J, Liu S. Supervised Contrastive Learning with Angular Margin for the Detection and Grading of Diabetic Retinopathy. Diagnostics. 2023;13. pmid:37510133
- 105. Ashwini K, Dash R. Grading diabetic retinopathy using multiresolution based CNN. Biomed Signal Process Control. 2023;86.
- 106. Oulhadj M, Riffi J, Khodriss C, Mahraz AM, Bennis A, Yahyaouy A, et al. Diabetic Retinopathy Prediction Based on Wavelet Decomposition and Modified Capsule Network. J Digit Imaging. 2023;36. pmid:36973632
- 107. Desika Vinayaki V, Kalaiselvi R. AHO-MLCNN: archerfish hunting optimisation based modified lightweight CNN for diabetic retinopathy detection. Comput Methods Biomech Biomed Eng Imaging Vis. 2023;11.
- 108. Raiaan MAK, Fatema K, Khan IU, Azam S, Rashid MRU, Mukta MSH, et al. A Lightweight Robust Deep Learning Model Gained High Accuracy in Classifying a Wide Range of Diabetic Retinopathy Images. IEEE Access. 2023;11.