Abstract
This paper provides an exploratory analysis of underwater video analysis techniques to enhance image quality and facilitate accurate classification of different marine species. Our methodology progresses through several steps, beginning with an assessment of underwater image quality, which may be degraded by factors such as decreased light intensity, color distortion, and limited visibility. These attributes pose significant challenges to developing accurate object detection methods. This paper outlines the processing pipeline employed to enhance the quality of images from underwater videos and facilitate precise object detection. First, we use the Gray World (GW) algorithm for image enhancement, effectively mitigating the challenges of the aquatic environment, such as color distortion and low contrast. Subsequently, we compare the traditional Histogram Equalization (HE) and Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithms to assess their efficacy in enhancing underwater image quality. Next, Canny edge detection is utilized to identify the prominent features in the enhanced images, aiding in subsequent classification tasks. Then, three state-of-the-art deep learning models, the Visual Geometry Group 16-layer network (VGG16), the 50-layer Residual Network (ResNet50), and the 121-layer Densely Connected Convolutional Network (DenseNet121), are leveraged through transfer learning to classify underwater species, including fish, coral reefs, and sea turtles. We also apply the Enhanced Super-Resolution GAN (ESRGAN), a superior Generative Adversarial Network (GAN) technique, to improve the quality of noisy images. Finally, by enhancing the visual quality of underwater images, our research contributes to a better understanding of the underwater ecosystem and supports conservation efforts.
This paper contributes to advancing the field of underwater image and video analysis, offering valuable insights for applications in marine biology, environmental monitoring, underwater robotics, and autonomous navigation.
Citation: Homoud A, Das S (2026) GAN-based underwater image enhancement and scene classification using transfer learning. PLoS One 21(3): e0345593. https://doi.org/10.1371/journal.pone.0345593
Editor: Don S, Amrita Vishwa Vidyapeetham, INDIA
Received: September 20, 2025; Accepted: March 6, 2026; Published: March 27, 2026
Copyright: © 2026 Homoud, Das. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript.
Funding: Saudi Arabia Cultural Bureau in London. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
The ocean is a crucial component of the Earth’s surface that provides humanity with food, medicine, raw materials, tourism, and ecological research opportunities. Marine organisms play a crucial role in the Earth’s oxygen cycle, climate regulation, and waste decomposition, making the monitoring of fish biodiversity an essential element of marine research. Basic information for marine analysis includes fish abundance, species distribution, sex ratio, and behavioral traits. Scientists can also predict climate change in a particular area by analyzing changes in the population size of a specific marine species [1]. Furthermore, classifying fish helps monitor their movement patterns and migration trends, contributing to a more thorough understanding of different species. However, relying on oceanographers to manually process and analyze video data is time-consuming, subjective, and labor-intensive [2]. Therefore, research on automated underwater fish detection and video analysis has promising prospects [3]. Research based on extensive underwater data emphasizes the importance of using large, robust, accurately labeled, and unbiased datasets [4]. The underwater environment typically presents several image processing challenges, such as low resolution, inadequate lighting, intricate backgrounds, occlusion, distortion due to camera movement, and the resemblance between various fish species [5,6].
Advanced algorithms have improved the effectiveness of image processing models for underwater scene classification tasks, as they prevent the model from becoming overly complex even for low-quality images [7]. Various enhancement techniques have been proposed in the scientific literature, such as the HE and CLAHE algorithms, which aim to improve overall contrast by redistributing pixel intensities. The GW algorithm, used for color correction, attempts to compensate for wavelength-dependent attenuation by averaging the RGB (Red-Green-Blue) channels [8,9]. These methods are known for their computational efficiency and ease of implementation. However, they operate at the pixel level and often amplify noise under the low-illumination conditions typically encountered in the underwater environment [10]. Recently, image enhancement techniques, particularly GANs, have received increasing attention for their ability to enhance images. GAN-based methods, such as SRGAN and ESRGAN, have achieved substantial improvements in the quality of blurry and noisy images [11,12]. However, several works indicate that GAN-based enhancement may introduce color distortion and reduced contrast as side effects [13].
Deep learning approaches, particularly convolutional neural networks (CNNs) and their variants, now represent the state of the art in fish identification research for automated image categorization [3]. CNNs have been utilized in underwater video analysis on industrial-scale farms and in natural ocean environments [14]. Deep learning algorithms enable accurate recognition, classification, and tracking of species by automatically learning complex patterns in large visual datasets. These models help fish farms automate routine tasks and identify abnormal behavior, increasing efficiency and reducing manual labor [15]. This paper aims to improve the quality of underwater images and to simplify the accurate classification of marine species.
2. Previous works on underwater video processing
2.1. Model-based enhancement methods
Image enhancement, segmentation, tracking, and object detection are well-established concepts, utilized in autonomous vehicles, smartphones, and surveillance systems. Studies on underwater image enhancement provide an in-depth review of the methods used to address the challenges of capturing images in underwater environments. Many methods have been designed for image enhancement, such as dark channel prior-based methods [16,17] and polarization-based imaging [18,19], which aim to simulate and correct light propagation phenomena, such as water attenuation and scattering, to fix color distortion and improve visibility. Moreover, histogram-based methods [20,21], retinex-based methods [22], fusion-based methods [23], and machine learning approaches [24] directly modify the captured image data to improve contrast, clarity, and color accuracy.
The Dark Channel Prior (DCP) approach is an enhancement technique employed to increase the quality of underwater images. It is based on the observation that in natural images, the dark channel typically consists of low-intensity values, indicating regions not impacted by haze or underwater scattering [25]. DCP uses this characteristic to estimate the haze-free image by examining the dark channel of the input image and removing the haze component. This method improves the clarity and sharpness of underwater images by efficiently minimizing the impact of haze and turbidity. The DCP method is widely used in underwater image processing because it effectively enhances image quality under difficult underwater conditions [26]. The study in [27] showcased the efficacy of the DCP approach in improving the quality of underwater images and emphasized its significance in underwater imaging. In contrast, the authors in [28] demonstrated that the haze cannot be completely eliminated, and as a result, images produced by these approaches often exhibit a color cast, typically green or blue, owing to the absorption and scattering of light in the underwater environment. Polarization-based imaging, as described in [18], utilizes simulations of light interactions with water molecules to correct color changes and improve visibility in underwater videos of fish and turtles. Polarization-based imaging methods utilize the polarization characteristics of light to improve underwater visibility and minimize glare. By analysing the polarization patterns of light reflected from underwater objects, these techniques can eliminate unwanted reflections and improve the distinction between different elements in underwater video footage [29]. Polarization-based imaging thus provides an efficient method to enhance visibility and improve the sharpness of underwater images, which makes it extremely valuable for recording intricate footage of marine species.
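As an illustration of the prior itself, the sketch below computes a dark channel and the standard transmission estimate t(x) = 1 - omega * dark(I / A). This is a simplified NumPy sketch, not the exact method of [25]; the function names, the patch size, and the omega default are our illustrative choices.

```python
import numpy as np

def dark_channel(image, patch=15):
    """Per-pixel minimum over the color channels, followed by a local
    minimum filter over a patch x patch neighborhood."""
    min_rgb = image.min(axis=2)
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode="edge")
    h, w = min_rgb.shape
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def estimate_transmission(image, atmosphere, omega=0.95, patch=15):
    """Standard DCP transmission estimate: t = 1 - omega * dark(I / A),
    where A is the estimated atmospheric (background) light."""
    normalized = image.astype(np.float64) / atmosphere
    return 1.0 - omega * dark_channel(normalized, patch)
```

A full dehazing pipeline would additionally estimate A from the brightest dark-channel pixels and recover the scene radiance; this sketch covers only the prior and transmission steps described above.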
Retinex-based approaches are inspired by the human visual system’s capacity to perceive color constancy under different lighting conditions. The techniques described in [30] involve decomposing an image into its reflectance and illumination components. By enhancing the reflectance, color distortions can be corrected and contrast can be improved. Fusion-based methodologies combine data from various images or distinct processing procedures to obtain a single improved image. This can involve combining images taken under different lighting conditions or integrating results from several enhancement algorithms to produce enhanced visibility and accurate color representation in underwater images [23]. Together, these enhancement techniques offer efficient options for improving the sharpness and detail of underwater images, enabling more accurate observation and analysis of marine organisms, such as fish and turtles. A comparison of the literature survey is summarized in Table 1.
2.2. Contrast and color correction enhancement methods
Underwater video analysis is a rapidly developing field with diverse applications that range from marine biology research to underwater robotics and environmental monitoring. Existing methods and techniques in this domain encompass a wide range of approaches aimed at enhancing image quality, detecting objects, and classifying marine organisms. Traditional techniques often rely on handcrafted features and image processing algorithms, while more recent advances leverage deep learning algorithms for automated analysis.
One common challenge in underwater video analysis is poor visibility caused by factors such as water turbidity, light attenuation, and color distortion. To address this, researchers have proposed various image enhancement techniques tailored to the underwater environment. These include methods based on HE [21], CLAHE [37], and the GW algorithm [9]. HE and CLAHE modify the intensity distribution of an image to enhance contrast and visibility [21]. These techniques improve overall visual quality by spreading out the most frequent intensity values, enhancing the visibility of details in underwater scenes [38]. While these techniques can improve image quality to some extent, they often struggle with preserving color fidelity and enhancing fine details, especially in highly turbid or low-light conditions [39].
2.3. Deep learning for video enhancement and classification
Deep learning methods are frequently applied to many scenarios in computer vision and video data analysis. Recent works have explored GAN-based enhancement methods, such as SRGAN and ESRGAN, which are intended to recover fine details and enhance visual image quality [40]. While these methods produce visually appealing results, several studies indicate that optimization using competing networks can introduce artificial textures and fail to reliably improve recognition in underwater environments, mainly due to discrepancies with real training data [40]. Deep learning has become an essential method for classifying underwater organisms despite the limited data available for classification. CNN architectures such as ResNet, DenseNet, and VGG have been widely studied [41], with the DenseNet model showing improved feature reuse [7]. These models benefit from their ability to take advantage of features pre-trained on large datasets [42], and have achieved improvements in classification accuracy across underwater datasets [7]. Recent studies include temporal modeling of streaming video data for understanding the overall context of underwater scenes. However, these models introduce increasing computational complexity and are difficult to deploy in resource-limited underwater environments.
3. Motivation and novelty
Our current research is pursued through a novel strategy that involves a series of sequential computational steps. Firstly, we conducted video exploration to collect a variety of underwater footage from various marine environments and conditions. Secondly, we employ advanced image enhancement algorithms, specifically the GW algorithm, to improve the visual quality of underwater images. This step focuses on resolving typical problems such as color distortion and low contrast to improve the clarity of the footage. Following image enhancement, a comparative analysis is performed on two commonly used techniques: HE and the CLAHE algorithm [37], to evaluate their effectiveness in improving image quality. Additionally, the Canny edge detector is used to find important features in the enhanced images, which helps with the classification tasks. Next, we utilize deep learning models with transfer learning in well-known architectures, including VGG16, ResNet50, and DenseNet121 to classify marine organisms, including fish, coral reefs, and sea turtles. Finally, we utilize the ESRGAN technique to improve the visual quality of underwater images.
In summary, this paper makes the following contributions:
- We enhance the quality of underwater images of fish species.
- We improve the classification of underwater species, especially fish, coral reefs, and sea turtles, using the latest techniques in ecological image/video analysis.
- We test different algorithms to distinguish and show structural differences between different species.
4. Proposed methodology
Fig 1 shows a flow chart that outlines the steps of the computational experiments. Initially, all videos were decomposed into individual images. To avoid losing any fish-related data, we extracted every individual frame together with its corresponding annotation. Frames and annotations are stored in separate folders. The second phase involves converting the initial annotations into the appropriate format for CNN designs such as the VGG16, ResNet50, and DenseNet121 models, followed by applying data cleaning to the entire dataset.
The pipeline starts with the input video and preprocessing; enhancement techniques improve image quality, and the enhanced frames are passed to the classification model, which outputs the final label.
4.1. Dataset description
In this study, we used an underwater video dataset sourced from the Undersea Production website [43], a professional underwater film production company. The video was recorded in Indonesia and covers locations such as Bali, Komodo, Raja Ampat, Lembeh, Fakfak, and Triton Bay, as shown in Fig 2. These regions host rich marine life, helping us study fish diversity and classify fish species. The dataset contains high-resolution video featuring various species under different environmental conditions, such as lighting, turbidity, and underwater species types. We also used the LifeCLEF-2015 dataset [44] for the enhancement techniques. This dataset is challenging because of its low image quality and high inter-class similarity, as shown in Fig 3. We consider three types of underwater species in this study, namely fish, coral reefs, and turtles; these classes come from the video footage, while LifeCLEF-2015 includes many fish species. The dataset includes 9,162 annotations across 15 species, but the distribution is strongly imbalanced: Dascyllus reticulatus has 3,165 annotations, Chaetodon lunulatus 1,217, and Pempheris vanicolensis 999. Some species, such as Neoglyphidodon nigroris with 85 annotations and Zebrasoma scopas with 72, as shown in Table 2, are severely underrepresented, making them challenging for deep learning models to learn. This imbalance has direct implications for detection and classification performance, particularly for rare or visually ambiguous species.
These spatial distribution maps were generated using a Python-based geospatial pipeline with Matplotlib to visualize the Indonesian region.
It contains 15 classes of underwater species [44].
4.2. Video data exploration
In the initial step of our methodology, we conducted video exploration to gain insights into the characteristics of the underwater footage. This entailed examining the metadata linked to the video, such as frame count, frame size, and frame rate. The information indicated that the video consisted of 9,940 frames, each with a width of 1920 pixels and a height of 1012 pixels. The footage was captured at a frame rate of 59.94 frames per second. We visually inspected the video by plotting a selection of the frames to create representative samples of the footage. Fourteen frames selected from the video are displayed in Fig 4, providing a snapshot of the underwater world captured in the footage. This initial study laid the foundation for the subsequent data processing and analysis phases. These frames provide valuable context for understanding the challenges and opportunities in the analysis of underwater video and guide the subsequent steps described in the methodology, including image enhancement and classification tasks. In the last step before image enhancement, we use augmentation techniques to improve the robustness and efficiency of the model when dealing with underwater images. Underwater conditions such as turbidity, low lighting, and color distortion make it crucial to expose the model to a wide variety of training images [45]. Data augmentation techniques such as rotation, flipping, and contrast adjustment help simulate different environmental conditions and make the model more adaptable, as shown in Fig 5.
Frames selected from the underwater video: rows 1 and 3 display the original images, and rows 2 and 4 display the same images in grayscale. This shows how grayscale conversion removes color while retaining intensity and important details in the image.
First row: a) original image, b) contrast adjustment; second row: a) original image, b) flipping; third row: a) original image, b) rotation. These augmentations help improve model learning and performance.
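The augmentation steps described above (flipping, rotation, and contrast adjustment) can be sketched as follows; the helper name and the [0.8, 1.2] contrast range are our illustrative choices, not parameters taken from the paper.

```python
import numpy as np

def augment(frame, seed=0):
    """Return three simple augmented variants of one frame (H x W x 3):
    horizontal flip, 90-degree rotation, and a random contrast adjustment."""
    rng = np.random.default_rng(seed)
    flipped = np.fliplr(frame)
    rotated = np.rot90(frame)
    # Contrast: scale pixel deviations from the mean by a factor in [0.8, 1.2].
    factor = rng.uniform(0.8, 1.2)
    mean = frame.mean()
    contrast = np.clip((frame.astype(np.float64) - mean) * factor + mean,
                       0, 255).astype(np.uint8)
    return {"flip": flipped, "rotate": rotated, "contrast": contrast}
```

In practice, such transforms are usually applied on the fly during training (e.g., via a data-generator pipeline) rather than stored on disk.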
4.3. Image enhancement methods
We utilize the GW algorithm to correct color distortion and improve contrast in underwater images. It assumes that, under ideal illumination, an image should appear neutral gray with equal intensities for the red, green, and blue channels. First, the algorithm calculates the mean intensity of each color channel and the average intensity of the image. Each channel’s mean is scaled proportionally to match the overall average, neutralizing color casts. The adjusted values are clipped to stay within the valid range [0, 255] [39]. The GW algorithm is simple to use and effective with various color distributions. Its performance may be limited in images with a single color tone, since the assumption of a neutral gray may not hold. However, it is useful for correcting color imbalances and preprocessing images for further analysis or augmentation. The visualization of the GW algorithm for underwater video is shown in Fig 6. For complex underwater environments, we used the LifeCLEF-2015 dataset [44], as shown in Fig 7. The mean intensity of each channel is computed as follows [39]:
Frames selected from underwater video with clear environments: rows 1 and 3 show original images, and rows 2 and 4 show images obtained by the GW algorithm.
A complex underwater environment with low visibility due to turbidity, inconsistent lighting, and noisy backgrounds. Row 1 shows original images and row 2 shows images obtained by the GW algorithm, which corrects the color imbalance. This improves the color and visual quality of the images.
$\mu_c = \frac{1}{N}\sum_{i=1}^{N} I_c(i), \quad c \in \{R, G, B\}$

where $N$ is the total number of pixels in the image and $I_R$, $I_G$, and $I_B$ are the channel intensities. The global gray reference value is then computed as:

$\mu_{gray} = \frac{\mu_R + \mu_G + \mu_B}{3}$

The corrected image channels are obtained by:

$I'_c = \alpha_c \, I_c, \quad c \in \{R, G, B\}$

where $\alpha_R = \mu_{gray}/\mu_R$, $\alpha_G = \mu_{gray}/\mu_G$, and $\alpha_B = \mu_{gray}/\mu_B$ are the scaling factors for each channel, and the corrected values are clipped to the range [0, 255].
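The Gray World correction described in this section can be sketched in NumPy as follows; the function name is ours, and the sketch assumes an 8-bit RGB input.

```python
import numpy as np

def gray_world(image):
    """Gray World color correction: scale each channel so its mean matches
    the global mean of the three channel means, then clip to [0, 255]."""
    img = image.astype(np.float64)
    channel_means = img.reshape(-1, 3).mean(axis=0)  # mu_R, mu_G, mu_B
    gray = channel_means.mean()                      # mu_gray
    scale = gray / channel_means                     # alpha_c per channel
    corrected = np.clip(img * scale, 0, 255)
    return corrected.astype(np.uint8)
```

For example, a frame with a strong blue cast (a high blue-channel mean) gets its blue channel scaled down and the other channels scaled up until all three means coincide.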
4.4. Histogram equalization vs. CLAHE
In this step, we conducted a comparative analysis between HE and CLAHE to evaluate their effectiveness in enhancing the visual quality of underwater images. HE is a widely used technique in image processing that aims to improve contrast by redistributing pixel intensities such that the histogram of the output image becomes approximately uniform. It operates globally on the intensity values of the image without considering local neighborhoods. CLAHE is an adaptive method that divides the image into small tiles and applies histogram equalization to each tile separately. This adaptive approach prevents over-amplification of noise in regions with low contrast, which can occur with standard histogram equalization. For our analysis, we applied HE and CLAHE to each underwater video frame. The HE process involved using OpenCV’s ‘equalizeHist’ function to equalize the intensity histogram of the RGB channels independently. For CLAHE, we utilized the ‘createCLAHE’ function to create a CLAHE object with specified parameters and applied it to the L* channel of the L*a*b* color space, which represents colors using three channels: L* for lightness, and the a* and b* channels for correcting color casts (green/blue) [20]. The resulting frames from both methods were visually inspected and compared to assess their performance in enhancing the image contrast and visibility of underwater features. We present the results of our analysis through the following examples.
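The global HE mapping can be sketched in plain NumPy as follows (OpenCV’s ‘equalizeHist’ applies the same cumulative-distribution mapping per channel, and ‘createCLAHE’ adds per-tile equalization with a clip limit); the function name and the test image are ours.

```python
import numpy as np

def equalize_hist(channel):
    """Global histogram equalization of one 8-bit channel: map the
    cumulative distribution of intensities onto the full [0, 255] range."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                 # first nonzero cumulative count
    scale = 255.0 / max(channel.size - cdf_min, 1)
    lut = np.clip(np.round((cdf - cdf_min) * scale), 0, 255).astype(np.uint8)
    return lut[channel]                        # apply the lookup table
```

Applying this to a low-contrast frame stretches its occupied intensity range to span [0, 255], which is exactly the "redistributing pixel intensities" effect described above.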
Fig 8 provides visual representations of the enhancement achieved by each method, aiding in our comparative analysis. This comparative analysis provides insights into the strengths and limitations of each method and helps in determining the most suitable approach for enhancing underwater imagery.
Images showing the resulting frames after applying HE and CLAHE, alongside the corresponding original frames. This illustrates the contrast enhancement of the two techniques: HE improves overall contrast, while CLAHE improves local details.
4.5. Canny edge detection
In this step, we employed the Canny edge detector to identify the edges in underwater images. Fig 9 presents the edge-detected frames alongside the original frames. This visual representation aids in analysing the effectiveness of the Canny edge detector in identifying underwater features and structures. Edge detection is a fundamental technique in image processing that aims to identify abrupt changes in brightness, which typically correspond to object boundaries or significant features within the image as shown in Fig 10. The Canny edge detector, introduced in [46], is renowned for its accuracy and robustness. It operates by first smoothing the image with a Gaussian filter to reduce noise. Subsequently, it computes the gradient magnitude and direction at each pixel using derivatives of Gaussian filters. Finally, it performs non-maximum suppression to thin out the detected edges and applies hysteresis thresholding to determine the final edge pixels. Mathematically, the process involves convolving the image with a Gaussian kernel to obtain the smoothed image [47]:
Row 1 shows the original frames. Row 2 shows the Canny edge-detected frames, which highlight fine edges while reducing noise to detect clear boundaries.
Edge detection for turtle, coral reef, and fish images is shown with varying σ. A small value of σ preserves fine details, while a higher σ reduces noise but preserves fewer details of the original structures. Comparisons are made at different noise levels with σ = 1, σ = 20, σ = 40, σ = 60, σ = 80, and σ = 100.
$G(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$

Equation (5) represents the Gaussian kernel, where $\sigma$ is the standard deviation of the Gaussian. Next, equation (6) gives the gradient magnitude and direction, which are computed as:

$M = \sqrt{G_x^2 + G_y^2}, \quad \theta = \arctan\!\left(\frac{G_y}{G_x}\right)$
Here, Gx and Gy are the derivatives of the smoothed image in the horizontal and vertical directions, respectively. Finally, the hysteresis thresholding is applied to determine the final edge pixels, where a high threshold is used to identify strong edge pixels and a low threshold is employed to link weak edge pixels to the strong ones. By implementing the Canny edge detector, we could visualize the detected edges in the underwater images. This technique is invaluable for extracting essential features and structures from the imagery, facilitating subsequent analysis tasks such as object detection and classification. Here, Fig 11 shows the histogram of different contrast images and the pixel values to analyze the intensity levels.
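The smoothing and gradient steps of equations (5) and (6) can be sketched in NumPy as follows. The kernel size of 5 and the Sobel operators are illustrative choices; a full Canny implementation (e.g., OpenCV’s ‘Canny’) additionally performs non-maximum suppression and hysteresis thresholding.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """2-D Gaussian kernel of equation (5), normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def convolve2d(image, kernel):
    """Naive same-size correlation with edge padding (sketch-grade)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    h, w = image.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out

def gradients(image, sigma=1.0):
    """Smooth with the Gaussian, then compute Sobel derivatives Gx, Gy
    and the gradient magnitude/direction of equation (6)."""
    smoothed = convolve2d(image.astype(np.float64), gaussian_kernel(5, sigma))
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    gx = convolve2d(smoothed, sobel_x)
    gy = convolve2d(smoothed, sobel_x.T)
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)  # arctan2 handles the Gx = 0 case
    return magnitude, direction
```

On a frame with a sharp vertical boundary, the magnitude peaks along the boundary column and falls to zero in the flat regions, which is what the hysteresis stage then thresholds into edge pixels.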
Images of different contrast, such as a dark image, a bright image, a color image, and high- or low-contrast images.
5. Transfer learning models, training and performance evaluation
In this step, we employ transfer learning to fine-tune the pre-trained CNN models, namely VGG16, ResNet50, and DenseNet121, for the image classification task across the fish, coral reef, and sea turtle classes. Each model was initialized with pre-trained weights from the ImageNet dataset [48]. We customized the models by adding additional layers for classification. Specifically, we appended a flatten layer to the output of the pre-trained models, followed by one or more dense layers. These dense layers utilized Rectified Linear Unit (ReLU) activation functions to introduce nonlinearity, and a final dense layer with softmax activation outputs the class probabilities. The final processed dataset consisted of approximately 9,940 images divided into 80% training, 10% testing, and 10% validation.
The training data, consisting of 3,000 images per class, was utilized to fine-tune these models. We applied image augmentation techniques, including rotation, width and height shift, shear, zoom, and horizontal flip, to diversify the training dataset and improve the models’ generalization capabilities [45]. The fine-tuned models achieved impressive training accuracies, with DenseNet121 leading with the best training accuracy of 99.375%. ResNet50 closely followed with 99.04%, while VGG16 achieved a slightly lower accuracy of 96.88%. Additionally, the models were evaluated on a separate test dataset containing 2,988 images with an equal distribution of the three classes. The performance metrics of precision, recall, and F1-score for each model are summarized below in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). With $\text{Precision} = TP/(TP + FP)$ and $\text{Recall} = TP/(TP + FN)$, the F1-score is calculated as:

$F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
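The F1-score is the harmonic mean of precision and recall, both derived from the TP/FP/FN counts. A minimal helper illustrating the computation (ours, not the authors’ exact evaluation code):

```python
def classification_metrics(tp, fp, fn):
    """Precision, recall, and F1-score from raw counts of true positives,
    false positives, and false negatives for one class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For a multi-class task such as this one, these per-class scores are typically averaged (macro or weighted) to obtain a single model-level F1.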
After applying regularization techniques and fine-tuning hyperparameters, DenseNet improved its test accuracy to 99%, with an F1-score of 0.99, as shown in Fig 12. This represents a significant reduction in the gap between training and test performance compared to the initial results (training accuracy 99.37%, F1-score 0.9021), indicating enhanced generalization ability.
Performance metrics of precision, recall, and F1-score for DenseNet121, VGG16 and ResNet50 transfer learning models are compared.
We conducted statistical analysis using 95% confidence intervals (CI) for the F1-score. The results are summarized in Table 3: ResNet50 achieved an F1-score of 1.0000 with a very narrow CI (0.995–1.000). DenseNet also performed strongly, with a CI of (0.985–0.996), while VGG16 obtained a lower F1-score of 0.9476 with a CI of (0.930–0.962). The non-overlapping intervals between VGG16 and the other models indicate that the observed differences are statistically significant, supporting the claim that ResNet and DenseNet outperform VGG16 on the LifeCLEF-2015 dataset.
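One common way to obtain such confidence intervals is a percentile bootstrap. The sketch below is our illustration of that idea, not necessarily the authors’ exact procedure: it resamples (truth, prediction) pairs with replacement and recomputes the per-class F1 on each resample.

```python
import numpy as np

def bootstrap_f1_ci(y_true, y_pred, positive, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) confidence interval for the
    F1-score of one class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)           # resample with replacement
        t, p = y_true[idx], y_pred[idx]
        tp = np.sum((t == positive) & (p == positive))
        fp = np.sum((t != positive) & (p == positive))
        fn = np.sum((t == positive) & (p != positive))
        denom = 2 * tp + fp + fn
        scores[b] = 2 * tp / denom if denom else 0.0   # F1 = 2TP/(2TP+FP+FN)
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

Narrow intervals such as ResNet50’s (0.995–1.000) arise when nearly every resample yields the same near-perfect F1.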
In summary, these results demonstrate the effectiveness of the fine-tuned models in accurately classifying underwater images and highlight the superior performance of ResNet50 in this image classification task.
6. Numerical experimental results
In this comprehensive exploration, our research delved into several critical aspects of underwater video analysis, yielding significant insights and advancements. Below, we provide a detailed account of our experimental findings across various stages of this comparative study on video analysis.
6.1. Underwater video exploration
Our meticulous examination of the underwater video dataset unveiled crucial details about its composition and structure. Metadata analysis revealed a vast repository comprising 9,940 frames, each with dimensions of 1920 × 1012 pixels, captured at a rate of 59.94 frames per second (fps), giving us invaluable insights into the temporal and spatial dimensions of the underwater environment captured in the video. Furthermore, plotting a subset of successive frames provided a compelling visual narrative, offering further context and setting the stage for subsequent big data processing endeavors.
6.2. Image enhancement using Gray World (GW) algorithm
Drawing upon the GW algorithm, we aim to improve the visual fidelity of underwater images, tackling prevalent challenges such as color distortion and inadequate contrast. Through a meticulous application of mathematical adjustments rooted in the L*a*b* color space, our implementation of the GW algorithm yielded significantly improved outcomes. Notably, the color channels were harmoniously balanced, leading to a perceptible enhancement in color equilibrium and contrast within the processed images. Such enhancements are pivotal for discerning finer details and nuances within underwater scenes, consequently enriching the subsequent analysis and interpretation.
6.3. Comparison between histogram equalization and CLAHE algorithm
An important novelty of our study is a thorough comparative evaluation between the traditional HE method and the sophisticated CLAHE algorithm. This comparative analysis aimed to discern the efficacy of these techniques in mitigating the inherent challenges prevalent in underwater image processing. Our findings emphasize the superiority of CLAHE over its conventional counterpart, as evidenced by its ability to preserve intricate image details while reducing the undesirable artifacts introduced by the application of these algorithms.
6.4. Edge detection using canny edge detector
Implementing edge detection principles through the Canny edge detector provided a significant milestone in our research, helping us to unveil the intricate contours and boundaries of underwater imagery. Using Gaussian blur and gradient calculations, the Canny edge detector exhibited remarkable insight in identifying the salient features, providing invaluable insights for subsequent analysis. We performed an ablation study by evaluating the performance of the models with and without edge features, as shown in Table 4. Edge features improved classification: ResNet50 achieved a perfect score of 1.0 with Canny edges compared to 0.984 without, DenseNet121 improved from 0.975 to 0.991, and VGG16 gained 0.017. The delineation of edges facilitates object detection and segmentation and increases the interpretability and comprehensibility of underwater scenes, thus fostering a deeper understanding of marine ecosystems.
6.5. Transfer learning-based image classification
To harness the transformative potential of transfer learning, we explored three pre-trained CNN architectures: VGG16, ResNet50, and DenseNet121. We use a multifaceted approach to discern marine biodiversity by fine-tuning these models for image classification tasks on the fish, coral reef, and sea turtle categories. With a meticulously curated dataset comprising over 30,000 images, our models underwent training and validation, paving the way for valuable insights into the classification process. The robust performance exhibited by DenseNet121, culminating in an impressive training accuracy of 99.375%, underscored the efficacy of transfer learning in supporting image classification tasks within the marine biodiversity assessment domain. The commendable performance of ResNet50 and VGG16 further corroborated the transformative potential of transfer learning to allow accurate and discerning classification of aquatic organisms. As shown in Figs 13-15, the training and testing performance curves illustrate the convergence behavior of VGG16, ResNet50, and DenseNet121. Confusion matrices were used to emphasize class-wise predictions and misclassifications, as shown in Figs 16-18. In addition, violin plots shown in Figs 19-21 highlight the distribution of performance for each class, enabling a more nuanced comparison of model robustness. Table 5 displays comparisons across various fish species and the performance of classification models on different benchmarks.
(left) VGG training performance, (right) VGG testing performance of accuracy, precision, recall, and F1-score.
(left) ResNet training performance, (right) ResNet testing performance of accuracy, precision, recall, and F1-score.
(left) DenseNet training performance, (right) DenseNet testing performance of accuracy, precision, recall, and F1-score.
7. Image enhancement using GAN
The focus of this section is to improve the quality of noisy images using state-of-the-art image processing methods, specifically the GAN models SRGAN and ESRGAN. Through these techniques, the video processing pipeline is improved by underwater image denoising, super-resolution image generation, and post-processing to produce high-quality visuals [11,40]. To illustrate the performance of GAN-based image enhancement, the LifeCLEF-2015 dataset has been used as a benchmark [44]. This dataset contains a range of underwater images with poor lighting and contrast, making it a good choice for evaluating SRGAN and ESRGAN. Preprocessing, which involves cleaning up noisy images, takes place before the resolution of these images is improved using GANs. Fast non-local means denoising removes noise by considering the colors of neighboring pixels and choosing the best color for each pixel in the noisy image, as shown in Fig 22. This technique removes noise while preserving important details such as edge sharpness [54].
SRGAN and ESRGAN are advanced deep learning methods that improve the quality of blurry images and enhance image resolution [11,12]. According to [55], these models have two main parts: first, a generator (G) that improves low-resolution (LR) images by turning them into high-resolution (HR) images; second, a discriminator (D) that distinguishes images that are originally high quality from those enhanced by the generator. The generator uses a perceptual loss to create images that look similar to the ground truth. The content loss for SRGAN is described as follows:
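Following the standard SRGAN formulation [55], and using the feature-map dimensions W, H, C, the images I^HR and I^SR, and the layer feature map φ_i defined in the text, the content loss can be written as (reconstructed here, not the paper's original typesetting):

```latex
\mathcal{L}_{\mathrm{content}}
  = \frac{1}{WHC}\sum_{x=1}^{W}\sum_{y=1}^{H}\sum_{z=1}^{C}
    \Bigl(\phi_i\bigl(I^{HR}\bigr)_{x,y,z} - \phi_i\bigl(I^{SR}\bigr)_{x,y,z}\Bigr)^{2}
```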
Here, W, H, and C refer to the dimensions of the feature map: its width, height, and number of channels. I^HR and I^SR are the ground-truth and generated images, respectively, and φ_i is the feature map from the i-th layer. SRGAN also includes an adversarial loss to motivate the generator to create realistic images [55]:
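In the standard SRGAN form [55], this adversarial term pushes the generator to produce outputs the discriminator accepts as real (reconstructed sketch; N is the number of training samples):

```latex
\mathcal{L}_{\mathrm{adv}} = \sum_{n=1}^{N} -\log D\bigl(G\bigl(I^{LR}_{n}\bigr)\bigr)
```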
The total generator loss is a weighted sum of the content loss and the adversarial loss:
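Consistent with the terms above, the combined objective is (reconstructed sketch; the original SRGAN work weights the adversarial term with λ = 10⁻³):

```latex
\mathcal{L}_{G} = \mathcal{L}_{\mathrm{content}} + \lambda\,\mathcal{L}_{\mathrm{adv}}
```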
Building on the architecture of SRGAN, ESRGAN introduces improvements to further enhance image quality. These include Residual-in-Residual Dense Blocks (RRDB) in the generator, which improve the capacity of the network to learn high-frequency details and prevent degradation during training, and a relativistic GAN loss that yields more realistic images. The adversarial loss in ESRGAN and the total generator loss are described as:
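Following the ESRGAN formulation [11], these losses can be written as (reconstructed sketch; x_r and x_f denote real and generated images, C the raw discriminator output, and σ the sigmoid):

```latex
% Relativistic average discriminator:
D_{Ra}(x_r, x_f) = \sigma\!\bigl(C(x_r) - \mathbb{E}_{x_f}[C(x_f)]\bigr)
% Generator-side relativistic adversarial loss:
\mathcal{L}_{\mathrm{adv}}^{Ra}
  = -\,\mathbb{E}_{x_r}\!\bigl[\log\bigl(1 - D_{Ra}(x_r, x_f)\bigr)\bigr]
    - \mathbb{E}_{x_f}\!\bigl[\log D_{Ra}(x_f, x_r)\bigr]
% Total generator loss (perceptual, adversarial, and L1 content terms,
% with weighting coefficients lambda and eta):
\mathcal{L}_{G} = \mathcal{L}_{\mathrm{percep}}
  + \lambda\,\mathcal{L}_{\mathrm{adv}}^{Ra} + \eta\,\mathcal{L}_{1}
```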
SRGAN produced more realistic images and stronger detail enhancement than ESRGAN, as shown in Fig 23. We evaluate model performance using objective metrics such as Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity Index (SSIM), and Mean Squared Error (MSE) [56]. These measures allow a more rigorous comparison between the ESRGAN and SRGAN methods and validate improvements in perceptual quality with numerical evidence. The evaluation of both GAN models indicates that SRGAN achieves lower MSE and higher PSNR than ESRGAN. These results show that SRGAN generates enhanced underwater images that are closer to the HR ground truth, whereas ESRGAN yields higher MSE, lower PSNR, and lower SSIM, as shown in Table 6. These results suggest that ESRGAN is less effective than SRGAN for underwater image enhancement.
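MSE and PSNR are simple enough to compute directly; a minimal numpy sketch (the toy images are illustrative, not the paper's data):

```python
import numpy as np

def mse(a, b):
    """Mean Squared Error between two images (lower is better)."""
    return np.mean((a.astype(float) - b.astype(float)) ** 2)

def psnr(a, b, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB (higher is better)."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(max_val ** 2 / m)

# Toy check: shifting every pixel by 5 gives MSE = 5^2 = 25
gt = np.full((32, 32), 100, dtype=np.uint8)
pred = np.full((32, 32), 105, dtype=np.uint8)
print(mse(gt, pred))             # 25.0
print(round(psnr(gt, pred), 2))  # ~34.15 dB
```

SSIM is patch-based and less compact; in practice it is typically taken from a library such as scikit-image's `structural_similarity`.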
(left) VGG confusion matrix using training data, (right) VGG confusion matrix using testing data.
(left) ResNet confusion matrix using training data, (right) ResNet confusion matrix using testing data.
(left) DenseNet confusion matrix using training data, (right) DenseNet confusion matrix using testing data.
The plots showing accuracy, precision, recall, and F1-score using training data (left) and testing data (right).
The plots showing accuracy, precision, recall, and F1-score using training data (left) and testing data (right).
The plots showing accuracy, precision, recall, and F1-score using training data (left) and testing data (right).
The first row shows original images and the second row shows denoised images. This method improves edge accuracy by removing noise while keeping important details.
The first row shows original images, the second row shows images enhanced using the SRGAN technique, and the third row shows images enhanced using the ESRGAN technique.
8. Conclusion and future works
This paper presents valuable insights and advancements in underwater video analysis through transfer learning models. Through systematic exploration of various techniques and methodologies, we have achieved significant progress in enhancing underwater image quality and facilitating accurate classification of marine species. Key findings from our study include the effectiveness of the Gray World algorithm in addressing color distortion and contrast issues, the superiority of CLAHE over traditional HE for image enhancement, and the successful application of the Canny edge detector for feature extraction in underwater image processing. Moreover, the transfer learning-based approach, leveraging pre-trained CNN models such as VGG16, ResNet50, and DenseNet121, has demonstrated remarkable performance in image classification tasks, with DenseNet121 emerging as the most accurate underwater image classifier. The significance of our work lies in its potential to advance marine science and environmental monitoring efforts through improved underwater video analysis techniques. By enhancing the visual quality of underwater images and enabling precise identification of marine species, our research contributes to a deeper understanding of aquatic ecosystems and supports conservation initiatives. SRGAN proved to be an excellent GAN technique for improving the quality of blurry and noisy images, as it sharpened details without removing essential information. We used an open dataset, LifeCLEF-2015 [57], for our analysis. This dataset consists of images extracted from different underwater videos with limited species, behaviors, and water conditions, making it a challenging benchmark for underwater image enhancement. Moreover, the GW algorithm, which is supposed to balance the color distribution, failed in the presence of lighting reflections and shadows.
The GW algorithm is limited: it struggles to correct color shifts, often producing unnatural results, and it performs poorly in low-lighting conditions, where weak intensity in dark shadows leads to poor color correction. Future research may focus on addressing these identified limitations by improving both underwater image enhancement and classification performance. First, for enhancement, deep models such as GANs should be trained on underwater datasets that capture color distortion and low-contrast conditions, to improve SSIM and reduce distortions. In addition, the training data should offer greater environmental realism, including underwater video with motion, depth variation, and multiple species, to increase model robustness. Second, for classification, future research may address class imbalance and rare-species recognition through targeted training data and learning strategies.
In summary, our work represents a significant step forward in underwater video analysis, offering valuable contributions to scientific knowledge and practical applications in marine biology, ecology, and environmental conservation. Through continued collaboration between machine learning and ecology, we can unlock new opportunities to understand and preserve the diverse ecosystems of our oceans for future generations.
Acknowledgments
For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) license to this manuscript.
References
- 1. Sung M, Yu SC, Girdhar Y. Vision based real-time fish detection using convolutional neural network. In: OCEANS 2017-Aberdeen. IEEE; 2017. 1–6.
- 2. Weinstein BG. A computer vision for animal ecology. J Anim Ecol. 2018;87(3):533–45. pmid:29111567
- 3. Siddiqui SA, Salman A, Malik MI, Shafait F, Mian A, Shortis MR, et al. Automatic fish species classification in underwater videos: exploiting pre-trained deep neural network models to compensate for limited labelled data. ICES Journal of Marine Science. 2017;75(1):374–89.
- 4. Ditria EM, Lopez-Marcano S, Sievers M, Jinks EL, Brown CJ, Connolly RM. Automating the Analysis of Fish Abundance Using Object Detection: Optimizing Animal Ecology With Deep Learning. Front Mar Sci. 2020;7.
- 5. Tamou AB, Benzinou A, Nasreddine K, Ballihi L. Transfer Learning with deep Convolutional Neural Network for Underwater Live Fish Recognition. In: 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS), 2018. 204–9. https://doi.org/10.1109/ipas.2018.8708871
- 6. Jalal A, Salman A, Mian A, Shortis M, Shafait F. Fish detection and species classification in underwater environments using deep learning with temporal information. Ecological Informatics. 2020;57:101088.
- 7. Homoud A, Das S, Townley S. Comparative Analysis of Deep Learning Model Efficiency for Underwater Video Classification. In: 2025 IEEE International Conference on High Performance Computing and Communications (HPCC), 2025. 1–9. https://doi.org/10.1109/hpcc67675.2025.00135
- 8. Ancuti CO, Ancuti C, De Vleeschouwer C, Bekaert P. Color Balance and Fusion for Underwater Image Enhancement. IEEE Trans Image Process. 2018;27(1):379–93. pmid:28981416
- 9. An S, Xu L, Deng Z, Zhang H. HFM: A hybrid fusion method for underwater image enhancement. Engineering Applications of Artificial Intelligence. 2024;127:107219.
- 10. Chong-Yi Li, Ji-Chang Guo, Run-Min Cong, Yan-Wei Pang, Bo Wang. Underwater Image Enhancement by Dehazing With Minimum Information Loss and Histogram Distribution Prior. IEEE Trans Image Process. 2016;25(12):5664–77. pmid:28113974
- 11. Wang X, Yu K, Wu S, Gu J, Liu Y, Dong C, et al. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. Lecture Notes in Computer Science. Springer International Publishing. 2019. p. 63–79. https://doi.org/10.1007/978-3-030-11021-5_5
- 12. Yang R. Exploring SRGAN-Based Model for Image Super-Resolution. ACE. 2024;83(1):175–81.
- 13. Ruan Z, Wang Z, He Y. DeformableFishNet: a high-precision lightweight target detector for underwater fish identification. Front Mar Sci. 2024;11.
- 14. Dasari SK, Sravani L, Kumar MU, Rama Venkata Sai N. Image Enhancement of Underwater Images Using Deep Learning Techniques. Lecture Notes in Networks and Systems. Springer Nature Singapore. 2023. p. 715–30. https://doi.org/10.1007/978-981-99-3878-0_61
- 15. Chen JC, Chen T-L, Wang H-L, Chang P-C. Underwater abnormal classification system based on deep learning: A case study on aquaculture fish farm in Taiwan. Aquacultural Engineering. 2022;99:102290.
- 16. Ueki Y, Ikehara M. Underwater Image Enhancement Based on the Iteration of a Generalization of Dark Channel Prior. In: 2019 IEEE Visual Communications and Image Processing (VCIP), 2019. 1–4. https://doi.org/10.1109/vcip47243.2019.8965726
- 17. Zhou J, Yang T, Zhang W. Underwater vision enhancement technologies: a comprehensive review, challenges, and recent trends. Appl Intell. 2022;53(3):3594–621.
- 18. Liu F, Wei Y, Han P, Yang K, Bai L, Shao X. Polarization-based exploration for clear underwater vision in natural illumination. Opt Express. 2019;27(3):3629–41. pmid:30732380
- 19. Chen Y, Li Y, Wang Y, Mi Z, Wang Y, Fu X. Robust polarization-based underwater image enhancement method using anchor brightness adaptation. Optics and Lasers in Engineering. 2023;169:107737.
- 20. Mohd Azmi KZ, Abdul Ghani AS, Md Yusof Z, Ibrahim Z. Natural-based underwater image color enhancement through fusion of swarm-intelligence algorithm. Applied Soft Computing. 2019;85:105810.
- 21. Liu K, Peng L, Tang S. Underwater Object Detection Using TC-YOLO with Attention Mechanisms. Sensors (Basel). 2023;23(5):2567. pmid:36904769
- 22. Zhang W, Dong L, Pan X, Zhou J, Qin L, Xu W. Single Image Defogging Based on Multi-Channel Convolutional MSRCR. IEEE Access. 2019;7:72492–504.
- 23. Song H, Wang R. Underwater Image Enhancement Based on Multi-Scale Fusion and Global Stretching of Dual-Model. Mathematics. 2021;9(6):595.
- 24. Li J, Skinner KA, Eustice RM, Johnson-Roberson M. WaterGAN: Unsupervised Generative Network to Enable Real-time Color Correction of Monocular Underwater Images. IEEE Robot Autom Lett. 2017;:1–1.
- 25. He K, Sun J, Tang X. Single Image Haze Removal Using Dark Channel Prior. IEEE Trans Pattern Anal Mach Intell. 2011;33(12):2341–53. pmid:20820075
- 26. Peng Y-T, Cao K, Cosman PC. Generalization of the Dark Channel Prior for Single Image Restoration. IEEE Trans Image Process. 2018;27(6):2856–68. pmid:29570087
- 27. Li C, Guo C, Ren W, Cong R, Hou J, Kwong S, et al. An Underwater Image Enhancement Benchmark Dataset and Beyond. IEEE Trans Image Process. 2019;:10.1109/TIP.2019.2955241. pmid:31796402
- 28. Liang Z, Ding X, Wang Y, Yan X, Fu X. GUDCP: Generalization of Underwater Dark Channel Prior for Underwater Image Restoration. IEEE Trans Circuits Syst Video Technol. 2022;32(7):4879–84.
- 29. Hu K, Weng C, Zhang Y, Jin J, Xia Q. An Overview of Underwater Vision Enhancement: From Traditional Methods to Recent Deep Learning. JMSE. 2022;10(2):241.
- 30. Srinivas S, Siddharth VR, Dutta S, Khare NS, Krishna L. Channel prior based Retinex model for underwater image enhancement. In: 2022 Second International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), 2022. 1–10. https://doi.org/10.1109/icaect54875.2022.9807919
- 31. Parihar AS, Singh K. A study on Retinex based method for image enhancement. In: 2018 2nd International Conference on Inventive Systems and Control (ICISC), 2018. 619–24. https://doi.org/10.1109/icisc.2018.8398874
- 32. Tang C, von Lukas UF, Vahl M, Wang S, Wang Y, Tan M. Efficient underwater image and video enhancement based on Retinex. SIViP. 2019;13(5):1011–8.
- 33. Khan A, Ali SSA, Malik AS, Anwer A, Meriaudeau F. Underwater image enhancement by wavelet based fusion. In: 2016 IEEE International Conference on Underwater System Technology: Theory and Applications (USYS), 2016. 83–8. https://doi.org/10.1109/usys.2016.7893927
- 34. Islam MJ, Xia Y, Sattar J. Fast Underwater Image Enhancement for Improved Visual Perception. IEEE Robot Autom Lett. 2020;5(2):3227–34.
- 35. Katayama T, Song T, Shimamoto T, Jiang X. GAN-based color correction for underwater object detection. In: OCEANS 2019 MTS/IEEE SEATTLE, 2019. 1–4.
- 36. Zhu J-Y, Park T, Isola P, Efros AA. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In: 2017 IEEE International Conference on Computer Vision (ICCV), 2017. 2242–51. https://doi.org/10.1109/iccv.2017.244
- 37. Sharma R, Kamra A. A Review on CLAHE Based Enhancement Techniques. In: 2023 6th International Conference on Contemporary Computing and Informatics (IC3I), 2023. 321–5. https://doi.org/10.1109/ic3i59117.2023.10397722
- 38. Bai L, Zhang W, Pan X, Zhao C. Underwater Image Enhancement Based on Global and Local Equalization of Histogram and Dual-Image Multi-Scale Fusion. IEEE Access. 2020;8:128973–90.
- 39. Alsakar YM, Sakr NA, El-Sappagh S, Abuhmed T, Elmogy M. Underwater image restoration and enhancement: a comprehensive review of recent trends, challenges, and applications. Vis Comput. 2024;41(6):3735–83.
- 40. Sukesh M, Muthunayagam M, Latha M. Super-resolution performance: A comparative analysis of SRGAN and ESRGAN techniques for single image restoration. In: 2024 Intelligent Systems and Machine Learning Conference (ISML), 2024. 128–34.
- 41. Qu H, Wang G-G, Li Y, Qi X, Zhang M. ConvFishNet: An efficient backbone for fish classification from composited underwater images. Information Sciences. 2024;679:121078.
- 42. Catalán IA, Álvarez-Ellacuría A, Lisani J-L, Sánchez J, Vizoso G, Heinrichs-Maquilón AE, et al. Automatic detection and classification of coastal Mediterranean fish from underwater images: Good practices for robust training. Front Mar Sci. 2023;10.
- 43. 4K UltraHD underwater biodiversity stock footage demo video. 4K Underwater Footage.
- 44. Joly A, Goëau H, Glotin H, Spampinato C, Bonnet P, Vellinga WP. LifeCLEF 2015: multimedia life species identification challenges. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction: 6th International Conference of the CLEF Association, CLEF’15, Toulouse, France, September 8-11, 2015, Proceedings, 2015. 462–83.
- 45. Homoud A, Das S, Townley S. Challenges in Underwater Object Detection and Video Segmentation Using Deep Learning. In: 2024. 1–6.
- 46. Canny J. A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1986;PAMI-8(6):679–98.
- 47. Agrawal H, Desai K. Canny edge detection: a comprehensive review. IJTRS. 2024;9(Spl):27–35.
- 48. Xu L, Bennamoun M, An S, Sohel F, Boussaid F. Deep Learning for Marine Species Recognition. Smart Innovation, Systems and Technologies. Springer International Publishing. 2019. p. 129–45. https://doi.org/10.1007/978-3-030-11479-4_7
- 49. Shah C, Nabi MM, Alaba SY, Ebu IA, Prior J, Campbell MD, et al. YOLOv8-TF: Transformer-Enhanced YOLOv8 for Underwater Fish Species Recognition with Class Imbalance Handling. Sensors (Basel). 2025;25(6):1846. pmid:40292985
- 50. Shao X, Chen H, Magson K, Wang J, Song J, Chen J, et al. Deep Learning for Multilabel Classification of Coral Reef Conditions in the Indo‐Pacific Using Underwater Photo Transect Method. Aquatic Conservation. 2024;34(9).
- 51. Hassanudin WM, Utomo VG, Apriyanto R. Fine-grained analysis of coral instance segmentation using YOLOv8 models. Sinkron: jurnal dan penelitian teknik informatika. 2024;8(2):1047–55.
- 52. Baek J-W, Kim J-I, Kim C-B. Deep learning-based image classification of sea turtles using object detection and instance segmentation models. PLoS One. 2024;19(11):e0313323. pmid:39585892
- 53. Adam L, Čermák V, Papafitsoros K, Picek L. SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification. In: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024. 7131–41. https://doi.org/10.1109/wacv57701.2024.00699
- 54. Buades A, Coll B, Morel J-M. A Non-Local Algorithm for Image Denoising. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05). 60–5. https://doi.org/10.1109/cvpr.2005.38
- 55. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial networks. Commun ACM. 2020;63(11):139–44.
- 56. Zhang R, Isola P, Efros AA, Shechtman E, Wang O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. 586–95. https://doi.org/10.1109/cvpr.2018.00068
- 57. Homoud A, Das S, Townley S. Challenges in Underwater Video Data Processing. In: 2025 Eighth International Women in Data Science Conference at Prince Sultan University (WiDS PSU), 2025. 1–6.