Abstract
Background:
Underwater environments face challenges with image degradation due to light absorption and scattering, resulting in blurring, reduced contrast, and color distortion. This significantly impacts underwater exploration and environmental monitoring, necessitating advanced algorithms for effective enhancement.
Objectives:
The study aims to develop an innovative underwater image enhancement algorithm that integrates physical models with deep learning to improve visual quality and surpass existing methods in performance metrics.
Methods:
The proposed method employs a multi-level design combining physical insights from the diffusion model with deep learning. It utilizes an encoder-decoder pipeline for image decomposition, adaptive color adjustment through a color encoder network, and pixel optimization using an inverse denoising diffusion model to achieve enhanced visual quality.
Results and conclusion:
Experimental results demonstrate significant improvements, with an average increase of 6.61 in PSNR, 0.15 in SSIM, and 0.87 in UIQM. The integration of physical models and deep learning proves effective for underwater image enhancement. This research advances the field by providing a robust framework for addressing imaging challenges in underwater environments.
Citation: Fang Y, Li Q, Wang K (2025) Multi-scale diffusion model for underwater image restoration and enhancement. PLoS One 20(9): e0331465. https://doi.org/10.1371/journal.pone.0331465
Editor: Yakun Ju, Nanyang Technological University, SINGAPORE
Received: March 5, 2025; Accepted: August 16, 2025; Published: September 10, 2025
Copyright: © 2025 Fang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting information files.
Funding: This study was supported by the Scientific Research Fund of Liaoning Provincial Education Department (Grant number JYTMS20230416 to YF) and the Natural Science Foundation of Liaoning Province (Grant number 2022-BS-263 to YF).
Competing interests: The authors have declared that no competing interests exist.
Introduction
Underwater image processing plays a crucial role in acquiring oceanic information and is widely applied in marine ecology research, underwater archaeology, and marine monitoring. It enhances situational awareness by integrating human expertise, machine intelligence, and physical-environmental dynamics. However, complex underwater environments—characterized by light scattering, wavelength-dependent attenuation, and turbidity—lead to severe visual data degradation, significantly hindering autonomous decision-making [1]. The absorption and scattering effects of the water medium, along with interference from suspended particles, result in substantial image quality degradation, manifesting as color distortion, low contrast, blurring, and insufficient brightness. These issues not only impair visual quality but also pose significant challenges for computer vision tasks such as underwater image segmentation and object detection. Consequently, enhancing underwater image quality and restoring visual clarity have become critical research challenges in marine science and underwater exploration.
The primary factors contributing to underwater image degradation include particle scattering and light absorption in seawater, which reduce image clarity and introduce color shifts, low contrast, and blurring [2]. These distortions severely affect computer vision performance, making underwater image enhancement essential for obtaining clear and reliable information. Traditional enhancement approaches often rely on physical models that estimate medium transmission maps and atmospheric light to improve image quality. However, these methods frequently suffer from inaccurate degradation modeling and fail to achieve satisfactory results in complex underwater conditions. In recent years, deep learning has provided a breakthrough in underwater image enhancement. Techniques such as convolutional neural networks (CNNs) and generative adversarial networks (GANs) have demonstrated the capability to learn nonlinear mappings between degraded and clear images, achieving significant improvements across various underwater tasks [3].
Despite these advancements, several challenges remain. Many deep learning-based methods depend on paired datasets for training, but acquiring high-quality reference images for underwater environments remains a major obstacle. Additionally, existing enhancement techniques often struggle to simultaneously address multiple degradation factors. For example, improving brightness may introduce additional noise, while enhancing color fidelity could compromise texture details [4]. Furthermore, many approaches heavily rely on prior knowledge and lack adaptability to varying environmental conditions. To address these issues, this study employs a diffusion model for unsupervised learning, integrating the advantages of physical modeling and deep learning. A multi-level enhancement strategy is designed to refine image details and overcome the limitations of current methods [5].
The key contributions of this paper are twofold: (1) leveraging the diffusion model’s capability to simulate complex data distributions and generate high-quality underwater images, and (2) decoupling degradation factors in the frequency domain to enhance each aspect of image quality, thereby effectively addressing the challenge of handling multiple degradation factors simultaneously.
Related work
Underwater image enhancement techniques
In 1994, Zuiderveld first proposed the Contrast Limited Adaptive Histogram Equalization (CLAHE) method, a classical image enhancement technique that holds a significant position in the field. In 2011, Han, Yang, and Lee introduced a new 3D color histogram equalization method that produces a uniform 1D grayscale histogram [6]. These methods enhance image quality by adjusting pixel values, but they do not account for the various factors causing image degradation. Although traditional CLAHE methods can improve image contrast, they may also introduce noise, especially in underwater images, which are complex and often noisy. The hybrid CLAHE algorithm designed by Hitam et al. optimized the processing pipeline, successfully avoiding the noise amplification issue inherent in traditional CLAHE methods, thereby proving more efficient for underwater image enhancement. Naik and his team proposed a shallow neural network model, which, compared to traditional deep neural network (DNN) models, requires fewer parameters and exhibits higher computational efficiency while still maintaining strong performance. This model has demonstrated good generalization in enhancement tasks across various types of images, making it highly adaptable in practical applications such as underwater image enhancement.
Currently, underwater image enhancement methods can be broadly classified into two categories: physical model-based approaches and deep learning-based techniques. Physical model-based methods restore degraded underwater images by estimating medium transmission characteristics and atmospheric light to enhance visual effects and improve overall image quality. For example, the color recovery model proposed by Kang attempts to compensate for the absorption and scattering of different wavelengths of light by the water body, thus recovering the scene's true colors [7]. D.L. Che et al. proposed an underwater image enhancement method based on light estimation, which, in combination with the scattering model of underwater images, estimates depth and lighting conditions to restore image clarity and contrast [8]. Deep learning-based methods typically use Convolutional Neural Networks (CNNs) to perform tasks such as dehazing, denoising, and color correction to improve image quality. For instance, GAN-based image enhancement networks can effectively reconstruct high-quality underwater images, especially when dealing with low-contrast and color-distorted images [9]. CNN-based methods, such as the Deep Underwater Image Enhancement Network (DUINet), optimize and enhance underwater images through an end-to-end training mechanism. However, compared to physical model-based methods, deep learning approaches often rely heavily on the distribution of the training data, and any distribution shift in the underwater images at inference time can lead to a significant degradation in enhancement performance.
Denoising diffusion probabilistic models
The Denoising Diffusion Probabilistic Model (DDPM), proposed by Ho et al. in 2020, forms the foundation of diffusion models. It generates samples by gradually adding noise to data and learning how to remove the noise through a reverse process [10]. To improve sampling efficiency, the Denoising Diffusion Implicit Model (DDIM), introduced by Song et al. in 2020, made key optimizations to the DDPM. DDIM introduces a non-Markovian diffusion process, significantly accelerating the sampling process of diffusion models [11]. To make the diffusion model capable of generating more controllable images, conditional information can be incorporated into the model, enabling conditional generation. This method allows the diffusion model to generate images under specific conditions, such as incorporating label information, text descriptions, or other conditional constraints. Dhariwal et al. [12] proposed a classifier-guided training strategy within the diffusion model, which adjusts the conditional information during the generation process. By using the gradient of a pre-trained classifier, this approach conditions each step of the generation process, ensuring that the generated samples adhere to the specified conditions. Benefiting from conditional generation, diffusion models have been successfully applied to various image restoration tasks, including denoising, deblurring, and inpainting. For example, in the task of underwater image enhancement, Tang et al. [13] utilized the generative capabilities of diffusion models to design a lightweight architecture that improves image quality and detail recovery.
Methodology
This section primarily introduces the framework and details of the image enhancement network, which is designed to restore degraded underwater images while preserving texture details and ensuring color consistency. The network first employs frequency-domain processing to more effectively identify and differentiate various degradation factors, enabling targeted enhancement strategies. The objective of this network is to transform an input degraded underwater image of shape B × H × W × C into its corresponding clear image, where B, H, W, and C represent the batch dimension, height, width, and number of channels, respectively. The optimized clear image is then fed into a diffusion model to generate a higher-quality output image.
Network architecture
First, the input image is obtained and initially enhanced to produce the enhanced underwater image x0. The process first obtains an output of shape 64 × 256 × 256 for each group through multi-scale dilated convolution groups; feature fusion then combines the group outputs into a single output. The enhanced underwater image x0 is then mapped into the feature space through a convolutional layer. Next, these features are fed into an encoder-decoder network for processing, where most of the computations are performed by the spatial-frequency blocks. This module consists of downsampling and upsampling operations, with each upsampling layer having a shortcut connection to the corresponding level of the encoder. Finally, a coarse output is generated.
Meanwhile, the initial image x0 is also fed into a color encoder. The color encoder network extracts visual features from the image and subsequently maps the extracted features to different scales for adaptive color querying of image features. Next, the coarse output image is combined with the generated color features to obtain a vivid output image. Finally, the best matching natural image is found by searching the natural image domain, and the final image is generated by the inverse denoising model.
In addressing the problem of underwater image enhancement, decoupling degradation factors such as color distortion, low contrast, and blurring in the frequency domain has proven to be an effective strategy. Frequency-domain processing enables clearer identification and differentiation of these degradation factors, allowing for targeted enhancement approaches. However, a straightforward application of the Fast Fourier Transform (FFT) to decompose an image into its amplitude and phase components, followed by an Inverse Fast Fourier Transform (IFFT) with amplitude and phase swapping, introduces additional artifacts such as shadows and noise [14].
Alternatively, decomposing an image into phase and amplitude components via FFT and subsequently reconstructing a new image using IFFT can provide further insights based on prior knowledge. Specifically, images reconstructed from phase information predominantly preserve texture and structural details, whereas those reconstructed from amplitude components primarily retain color and illumination details, while completely losing texture and structure. This distinction underscores the necessity of carefully designing frequency-domain processing techniques to achieve effective underwater image enhancement.
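The amplitude/phase behavior described above can be sketched with NumPy. The snippet below is purely illustrative (a small random array stands in for one image channel; it is not the paper's implementation): reconstructing with unit amplitude keeps only phase (structure) information, reconstructing with zero phase keeps only amplitude (energy/illumination) information, and using both recovers the original image.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((8, 8))  # stand-in for one image channel

F = np.fft.fft2(img)
amplitude, phase = np.abs(F), np.angle(F)

# Phase-only reconstruction (unit amplitude): retains texture/structure cues.
phase_only = np.real(np.fft.ifft2(np.exp(1j * phase)))

# Amplitude-only reconstruction (zero phase): retains color/illumination energy.
amp_only = np.real(np.fft.ifft2(amplitude))

# Round-trip with both components recovers the original image.
recon = np.real(np.fft.ifft2(amplitude * np.exp(1j * phase)))
```

The exact lossless round-trip in the last line is why the decomposition is attractive: any enhancement applied separately to amplitude and phase can be mapped back to the spatial domain without an inherent reconstruction penalty.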
In this work, a network is designed to ensure that the enhanced image exhibits improved color fidelity, increased brightness, minimized noise, and enriched texture details. The fundamental processing module adopts a three-path architecture, consisting of an initial enhancement path, a color correction path, and a deblurring path.
The initial enhancement path aims to perform preliminary enhancements on the input image to improve its overall quality. This is achieved by adjusting fundamental parameters such as contrast and brightness, thereby increasing the visibility of the image. The color correction path further refines the image through deep processing, enhancing spatial details and structural information to achieve better perceptual quality. Meanwhile, the deblurring path utilizes a diffusion-based structure to refine the image, effectively recovering texture and structural details while optimizing color, brightness, and noise characteristics. The network structure and flowchart are shown in Figs 1 and 2.
Module implementation
Initial enhancement module.
Various underwater image enhancement techniques have been proposed, including methods that decompose color, brightness, texture, and other image attributes in the frequency domain [15]. However, raw underwater images often suffer from noise, low color fidelity, and poor contrast, which can negatively impact the enhancement process. To address these issues, a preliminary enhancement step is necessary to suppress noise and improve the overall image quality before further processing. This preliminary enhancement improves color fidelity, increases brightness, reduces noise, and enriches texture details.
As illustrated in Fig 3, the image undergoes a preliminary enhancement stage, followed by feature extraction using an encoder-decoder pipeline similar to the traditional U-Net structure [16]. In this framework, the input features are partitioned into two components, x1 and x2, along the channel dimension. The image is then transformed from the spatial domain to the frequency domain via the Fast Fourier Transform (FFT), decomposing it into magnitude and phase components. The transformation is mathematically represented in Eq (1).
where x1 is the first half of the input signal along the channel dimension, FFT(x1) is the fast Fourier transform of x1, converting it from a spatial-domain signal to a complex signal in the frequency domain, M is the magnitude component, and P is the phase component extracted from the transformed image.
The magnitude and phase components are processed separately. The magnitude component undergoes convolution using a 1 × 1 convolutional kernel to preserve the structural integrity of the magnitude information and prevent distortion caused by excessively large kernels.
To enhance the magnitude features, the LeakyReLU activation function is applied. During network training, LeakyReLU aids in retaining more detailed information, thereby improving the model’s ability to capture and represent complex data features. This enhances the network’s capability to process intricate information effectively. The transformation process is mathematically represented in Eq (2).
The phase component P plays a crucial role in preserving the shape and details of the image. Therefore, a 3 × 3 convolutional kernel is applied to process the phase information. During the model construction, the Swish activation function is employed to enhance the non-linear representation capability. By doing so, the model can more effectively maintain the structural details of the image while processing, thereby improving the effectiveness of feature extraction. The transformation process is mathematically represented in Eq (3).
After processing both the magnitude and phase components, the inverse fast Fourier transform (IFFT) is applied to convert the data back to the spatial domain. This transformation is mathematically represented in Eq (4).
The enhanced magnitude and phase components are connected to the original input x1 through a residual connection, resulting in the final output feature y1. This process is mathematically represented in Eq (5).
For x2, convolution operations are directly performed in the spatial domain using two 3 × 3 convolutional layers, each followed by a GELU activation function to enhance spatial features. The GELU activation function smooths nonlinear transformations, thereby improving feature detail enhancement and enabling the model to better learn and express data features. Finally, the output of the spatial pathway is connected to x2 via a residual connection to obtain y2. This process is mathematically represented in Eqs (6) and (7).
where x2 is the second half of the input signal along the channel dimension. By employing a concatenation operation, y1 and y2 are merged to obtain the final output feature y. This feature fusion strategy enables the model to leverage both frequency-domain and spatial-domain information simultaneously, thereby enhancing feature representation capabilities and facilitating the disentanglement of degradation factors. This process is mathematically expressed in Eq (8).
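The split-and-fuse data flow of this dual-path block can be sketched as follows. This is a shape-level illustration only: the learned 1 × 1 and 3 × 3 convolutions of Eqs (2)-(7) are replaced by simple placeholder operations (an identity-like LeakyReLU stand-in on the magnitude and a tanh-based GELU approximation on the spatial path), so only the channel split, the frequency/spatial branches, the residual connections, and the final concatenation reflect the design described above.

```python
import numpy as np

def spatial_frequency_block(x):
    """Sketch of the dual-path block; x has shape (C, H, W)."""
    c = x.shape[0] // 2
    x1, x2 = x[:c], x[c:]  # split along the channel dimension

    # Frequency path on x1: magnitude and phase processed separately
    # (learned convolutions are replaced by placeholders here).
    F = np.fft.fft2(x1)
    mag = np.maximum(np.abs(F), 0.01 * np.abs(F))  # LeakyReLU stand-in
    pha = np.angle(F)
    y1 = x1 + np.real(np.fft.ifft2(mag * np.exp(1j * pha)))  # residual, Eq (5)

    # Spatial path on x2: GELU-style smooth nonlinearity as a conv stand-in.
    gelu = 0.5 * x2 * (1 + np.tanh(np.sqrt(2 / np.pi) * (x2 + 0.044715 * x2**3)))
    y2 = x2 + gelu  # residual, Eq (7)

    return np.concatenate([y1, y2], axis=0)  # channel concat, Eq (8)

y = spatial_frequency_block(np.random.default_rng(1).random((4, 8, 8)))
```

The output keeps the input shape, so the block can be stacked freely inside the encoder-decoder, which is what makes the U-Net-style shortcut connections straightforward.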
Color correction module.
The issue of color attenuation in underwater imaging presents another significant challenge. To address this, previous studies have proposed various prior-based solutions. For instance, some methods rely on the dark channel prior (DCP), while others utilize underwater scene priors or histogram distribution priors for processing [17]. However, due to the complexity of underwater environments, the degree of color attenuation varies, and a single prior-based approach cannot effectively handle all color cast and attenuation issues.
Therefore, in the color correction module, a color encoder network is designed to extract visual features from the image. The extracted features are then mapped to different scales to enable adaptive color queries for image characteristics. The color encoder network consists of multiple convolutional layers and downsampling operations, which progressively extract comprehensive image features through hierarchical downsampling. These multi-scale features are subsequently used to optimize color queries.
Additionally, this module employs a color encoder to compute the color histogram of the image in the RGB color space. The color histogram represents the color distribution of the image by counting the frequency of specific pixel intensities or intensity intervals. Given an image x, the pixel value at a given position is denoted as Eq (9).
where the components of the pixel value represent its intensities in the red, green, and blue channels, respectively. For each color channel c ∈ {R, G, B}, the pixel values are segmented into K intervals, as shown in Eq (10), where the interval width determines the binning. To compute the number of pixels in channel c that belong to each interval, the calculation is given by Eq (11), where the histogram entry represents the frequency of the k-th interval for channel c, and the indicator function equals 1 if the condition holds and 0 otherwise.
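The binning and indicator-function counting of Eqs (10) and (11) amount to a per-channel histogram, which can be sketched directly. The interval width of 256/K for 8-bit images is a common choice assumed here for illustration, not a value stated by the paper.

```python
import numpy as np

def color_histogram(img, k=16):
    """Per-channel color histogram; img is an HxWx3 uint8 RGB image,
    k is the number of intervals per channel (Eq 10)."""
    delta = 256 / k  # interval width (assumed 256/K for 8-bit images)
    hist = np.zeros((3, k), dtype=int)
    for c in range(3):
        # Map each pixel to its interval index, clamping the top edge.
        bins = np.minimum((img[..., c] / delta).astype(int), k - 1)
        for b in range(k):
            hist[c, b] = np.sum(bins == b)  # indicator-function count (Eq 11)
    return hist

h = color_histogram(np.zeros((4, 4, 3), dtype=np.uint8), k=16)
```

In practice such histograms are usually normalized to frequencies before being compared across images, so that the distance between two histograms is independent of image size.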
As shown in Fig 4, during the training process, the color encoder is utilized to extract color features consistent with the reference image. The extracted features are then mapped to different scales for adaptive color querying of image features. Furthermore, cross-attention connections are established between the color features and the image features at each scale.
Deblurring module.
Since the vivid output image contains additive Gaussian noise, the deblurring module is used not only to remove this noise but also to improve structural definition and restore lost texture. The deblurring task in this paper is realized with an inverse denoising diffusion model. Diffusion models generate images through a forward diffusion process and a reverse diffusion process [18]. The purpose of this module is to process blurred images and transform them into sharper images through a series of algorithms and operations, thereby improving image quality. The key idea behind the diffusion model is to progressively add noise to the original image (the forward diffusion process) and then learn how to gradually recover the image from the noise (the reverse diffusion process) [19].
During the forward diffusion process, noise is gradually added to the image x0, obtained from the previous two modules, to simulate the blurring or degradation process. The forward diffusion process progressively transforms the image x0 into pure noise xT over multiple time steps t.
In the deblurring task, the blurry image is obtained through a similar noise addition process. This process is formulated as shown in Eq (12), where xt represents the image at step t, βt is a factor between 0 and 1 that controls the intensity of the noise, ε is Gaussian noise, and T denotes the maximum time step in the diffusion process.
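Under the standard DDPM formulation (which Eq (12) appears to follow; the closed-form expression and the linear schedule below are assumptions for illustration), the noised image at any step t can be sampled in one shot from x0:

```python
import numpy as np

def forward_diffusion(x0, t, betas, rng):
    """Closed-form q(x_t | x_0) of standard DDPM:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    alpha_bar = np.prod(1.0 - betas[: t + 1])  # cumulative product of (1 - beta)
    eps = rng.standard_normal(x0.shape)        # Gaussian noise
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)          # linear noise schedule (assumed)
x0 = rng.random((8, 8))
xT, _ = forward_diffusion(x0, 999, betas, rng) # near-pure noise at t = T
```

At t = T the cumulative product of (1 - βt) is vanishingly small, so xT is statistically indistinguishable from Gaussian noise, which is exactly what lets the reverse process start from a noise sample.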
In general, the reverse diffusion process learns how to gradually recover the original image from pure noise. This process uses a neural network model to predict the noise at the current time step t, but there is no clear reference image with which to train a deep model to control its diffusion direction. Therefore, this module uses the natural image domain as a condition to control the diffusion direction, so as to generate higher-quality images. To recover the data from the noise, each step of the reverse process can be defined as shown in Eq (13).
where the left-hand side represents the distribution that predicts the previous time-step image x(t-1) from the current image xt, the mean predicted by the model is used to generate x(t-1), and the variance term denotes the noise scale in the reverse diffusion step. To perform conditional generation, the inverse procedure conditioned on natural images can be rewritten as shown in Eq (14), where the first factor is an unconditional denoising model, the image condition term measures how well the current noisy image x matches the target condition image c, and C1 is the normalization constant.
The next goal is to generate the enhanced image by searching the natural image domain to find the best matching natural image c. The first step is to adjust the mean calculation to include the conditional gradient. The process is defined as shown in Eq (15), where s is the guidance strength and Σ is the diffusion model variance. At generation step t, sampling xt, the conditional posterior probability is approximated as shown in Eq (16). By adjusting the strength s, the fidelity and condition consistency of the generated image are controlled. The sample is transformed to align with c in the natural image domain, and the similarity between the generated image and the natural image is computed as the matching function. The network structure of the deblurring module is shown in Fig 5.
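The mean adjustment of Eq (15) follows the classifier-guidance pattern: the model mean is shifted by the variance-scaled gradient of the log condition probability. The toy snippet below illustrates only that arithmetic; the quadratic matching gradient used here is a stand-in for the paper's actual matching function.

```python
import numpy as np

def guided_mean(mu, sigma2, grad_log_p_c, s):
    """Guided reverse-step mean in the style of Eq (15):
    mu_hat = mu + s * Sigma * grad_x log p(c | x)."""
    return mu + s * sigma2 * grad_log_p_c

# Toy example: a Gaussian matching term log p(c|x) = -||x - c||^2 / 2
# has gradient (c - x), pulling the sample toward the condition image c.
mu = np.zeros(4)         # unconditional model mean (toy)
c = np.ones(4)           # target condition image (toy)
grad = c - mu            # gradient of the toy matching term at x = mu
mu_hat = guided_mean(mu, sigma2=0.1, grad_log_p_c=grad, s=2.0)
```

Larger s pushes samples harder toward the condition (better consistency, possibly lower diversity/fidelity), while s = 0 recovers the unconditional mean, matching the trade-off described above.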
Loss function.
For the preliminary enhancement module, the loss aims to ensure the rationality of the initial image decomposition and the stability of the diffusion denoising. The loss function of the enhancement module, defined in Eq (17), combines a decomposition reconstruction loss, a diffusion prior loss, and an illumination smoothness constraint.
The color correction module loss is based on color histogram analysis and adversarial training. The loss function of the color correction module, defined in Eq (18), combines a histogram Earth Mover's Distance (EMD) loss, a chrominance-luminance decoupling loss, and an adversarial color loss.
The deblurring module loss is designed by combining conditional diffusion with perceptual optimization. The loss function of the deblurring module, defined in Eq (19), combines a variational lower bound loss, a conditional gradient-guided loss, and a perceptual sharpness loss.
To ensure that these modules maintain output consistency when working together and to avoid conflicts among their optimization objectives, a cross-module consistency loss is designed as shown in Eq (20), combining a frequency-domain alignment loss and a multi-scale pyramid loss.
To support the multi-task collaborative optimization strategy, a hierarchical loss function is designed to cover the joint training requirements of the three sub-modules: image enhancement, color correction, and deblurring. The total loss function is defined in Eq (21), where the weighting coefficients are hyper-parameters that balance the contribution of each loss.
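The hierarchical objective of Eq (21) is a weighted sum of the sub-module losses. The helper below sketches that combination; the example weights and loss values are arbitrary placeholders, not the paper's settings.

```python
def total_loss(losses, weights):
    """Eq (21)-style weighted sum of sub-module losses.
    losses:  iterable of scalar loss values
    weights: matching iterable of hyper-parameter weights."""
    return sum(w * l for w, l in zip(weights, losses))

# Placeholder values for the enhancement, color correction, deblurring,
# and cross-module consistency losses (illustrative only).
L = total_loss([0.8, 0.5, 1.2, 0.3], [1.0, 0.5, 1.0, 0.1])
```

Keeping the weights as explicit hyper-parameters makes it easy to rebalance the sub-modules during training without touching any individual loss implementation.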
Experiment results analysis
Dataset.
The constructed dataset consists of real underwater images captured using an underwater camera at a port and dock in Dalian, Liaoning Province. Initially, 1000 real underwater images were extracted from video recordings. Among these, images with different types of degradation were selected, including 300 images with color cast, 200 images with blurriness, and 260 images with low illumination.
The test dataset is divided into two parts: Test1 and Test2. Test1 contains 760 underwater images exhibiting the three types of degradation. Test2 consists of the remaining 240 underwater images without significant degradation.
Experiment settings.
To enhance the robustness of our model, horizontal flipping and rotation were employed as data augmentation during training. To ensure a fair comparison, all images were resized to 256 × 256 during both training and testing. The weights in Eq (21) are the hyper-parameters of our model; other parameters adhere to the default settings in PyTorch. We trained the models for 100 epochs with a batch size of 32 and a fixed learning rate.
Comparative experiments
Quantitative evaluation results.
To avoid bias in the experimental results caused by differences in input image quality, the original images of all test sets were evaluated for no-reference quality using UIQM, NIQE, and PIQE before the performance of the different methods was assessed, ensuring objective consistency in the degree of input degradation.
To evaluate the performance of different methods, both quantitative and subjective visual quality assessments were conducted on the Test1 dataset. The evaluation metrics used for Test1 include Structural Similarity Index (SSIM), Peak Signal-to-Noise Ratio (PSNR), Natural Image Quality Evaluator (NIQE), and Perceptual Image Quality Evaluator (PIQE) to measure the difference between the enhanced images and the reference images [20–22]. Additionally, the Underwater Image Quality Metric (UIQM) was introduced to evaluate the intrinsic quality of the images.
SSIM measures the similarity of two images in three dimensions: brightness, contrast, and structure. SSIM values lie in the range [−1, 1]; the closer the value is to 1, the more similar the images. PSNR evaluates image distortion based on the mean squared error (MSE); higher PSNR values indicate better image quality. NIQE is a no-reference image quality metric that measures the naturalness of an image; lower NIQE scores indicate that the image is closer to a natural image. PIQE is a no-reference image quality metric that evaluates visual quality by analyzing the texture and noise characteristics of the image; the lower the PIQE score, the better the image quality. UIQM combines results across multiple dimensions such as brightness, contrast, and texture into a composite score; the higher the UIQM score, the better the image quality.
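Of these metrics, PSNR has the simplest definition and can be computed directly from the MSE, as sketched below (peak value 255 assumes 8-bit images; the function name and signature are illustrative, not from a specific library).

```python
import numpy as np

def psnr(ref, img, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB from the MSE against a reference."""
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak**2 / mse)

ref = np.zeros((4, 4))
img = np.full((4, 4), 16.0)
value = psnr(ref, img)  # MSE = 256 against an 8-bit peak of 255
```

Because PSNR is a pure pixel-wise measure, it is usually reported alongside SSIM and the no-reference metrics above, which capture structural and perceptual aspects that MSE misses.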
Specifically, higher values of SSIM, PSNR, and UIQM indicate better image enhancement performance, while lower values of NIQE and PIQE reflect better image quality. Tables 1 and 2 compare the experimental results on Test1 and Test2 for the selected models: WaterNet, UDCP, CLAHE, RGHS, HE, FUnIE-GAN, and the proposed model [23–28].
Qualitative evaluation results.
Subjective analysis of the Test 1 sample images reveals that the original images exhibit degradation phenomena such as blurring and color distortion. After processing with the CLAHE and HE models, overexposure is commonly observed. Although the brightness of the images significantly increases, the overall structure is compromised, resulting in a loss of considerable details and a decline in image quality. The images enhanced by the RGHS and UDCP methods tend to have a yellow-green tint. Among these, the RGHS method performs better in terms of brightness enhancement, yielding visually superior results, while the images enhanced by UDCP show average brightness. The overall enhancement effects of the WaterNet and FUnIE-GAN models are better, with improved image clarity, but they remain relatively dark, and the brightness and contrast are still insufficient. In comparison with other methods, the technique proposed in this paper shows significant improvements in restoring the true colors of the image and substantially enhances the image contrast. This leads to a notable improvement in the visual quality of the processed images and a marked improvement in overall clarity, resulting in higher-quality images. The comparison of results and detailed images between the proposed method and other algorithms is shown in Figs 6 and 7.
Subjective analysis of the Test 2 sample images indicates that the original images enhanced by WaterNet generally yield good results, but the details are relatively average. The images enhanced by UDCP are overall darker. The CLAHE method effectively improves the image clarity, achieving a relatively good enhancement, but the processed images show some color deviation compared to the original images. Both the RGHS and HE methods perform well, with RGHS being particularly strong in dehazing, while HE-enhanced images show average clarity, with some regions still appearing blurred. FUnIE-GAN excels in image enhancement, effectively improving clarity. However, images processed with this method suffer from lower overall brightness and insufficient contrast, which somewhat impacts the visual quality. In contrast, the method proposed in this paper not only effectively removes underwater haze but also provides superior overall clarity, better color balance, and the highest degree of color matching with the original images. Furthermore, this method outperforms other methods in terms of brightness and contrast and shows more refined detail handling. The enhanced images generated by each model on the Test 2 dataset are shown in Fig 8.
Ablation experiment.
To verify the effectiveness of the deblurring module, we set up two experimental groups and ran the task on real underwater images. Fig 9 illustrates the results for the different models, where group A denotes the model without the deblurring module and group B denotes the full model, allowing the independent contribution of the deblurring module to be verified. Without the deblurring module, we observed inaccurate color enhancement: the overall color of the image appeared yellowish and the edge details were blurred. This confirms that the deblurring module plays a crucial role in enhancing the overall color and texture details of the images.
It should be noted that for both PSNR and SSIM, larger values indicate better image quality. UIQM is a no-reference underwater image quality metric motivated by the human visual system; here too, a larger value indicates better quality. Table 3 shows the experimental results of the two groups on the dataset.
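To make the two full-reference metrics concrete, the following minimal NumPy sketch implements PSNR and a simplified single-window SSIM. This is an illustration only, not the paper's evaluation code: standard SSIM averages over local Gaussian windows rather than one global window, and UIQM's colorfulness, sharpness, and contrast terms are omitted here for brevity. The function names and toy images are our own.

```python
import numpy as np

def psnr(ref, img, max_val=255.0):
    """Peak signal-to-noise ratio in dB; larger means less distortion."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(ref, img, max_val=255.0):
    """Simplified SSIM computed over one global window (standard SSIM
    averages the same expression over local 11x11 Gaussian windows)."""
    ref = ref.astype(np.float64)
    img = img.astype(np.float64)
    c1 = (0.01 * max_val) ** 2  # stabilizing constants from the SSIM paper
    c2 = (0.03 * max_val) ** 2
    mu_x, mu_y = ref.mean(), img.mean()
    var_x, var_y = ref.var(), img.var()
    cov = ((ref - mu_x) * (img - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

For example, a uniform image shifted by 10 gray levels against a 255-level reference gives an MSE of 100 and hence a PSNR of about 28.1 dB, while identical images yield an SSIM of 1.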
Conclusion
This paper adopts the diffusion model for unsupervised learning, combines the advantages of physical models and deep learning, and designs an underwater image enhancement algorithm that restores image detail through a multi-level design. The original image is processed in three steps. First, the image is decomposed using an encoder-decoder pipeline similar to the traditional U-Net structure. Next, an initial enhanced image is extracted by a color encoder network, which performs adaptive color querying. Finally, the processed image is deblurred using reverse denoising diffusion to generate a clear underwater image. The comparative experiments clearly show that, relative to a variety of existing methods, the proposed method has significant advantages in eliminating the effects of water-induced degradation. Images enhanced by this method reproduce the original features of objects more accurately, effectively improving image quality and information restoration.
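The final deblurring stage relies on reverse denoising diffusion. As a generic illustration of that mechanism (a standard DDPM ancestral sampling step in the style of Ho et al., not the paper's actual network or schedule), one reverse step can be sketched in NumPy as follows, with `eps_pred` standing in for the noise predicted by a trained network:

```python
import numpy as np

def ddpm_reverse_step(x_t, eps_pred, t, betas, rng):
    """One DDPM ancestral sampling step x_t -> x_{t-1}.

    x_t:      current noisy image
    eps_pred: noise estimate from the denoising network (a placeholder here)
    t:        current timestep index
    betas:    noise schedule beta_1..beta_T
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)          # cumulative product of alphas
    # Posterior mean: remove the predicted noise component, then rescale.
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bar[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean                         # final step is deterministic
    sigma = np.sqrt(betas[t])               # simple choice: sigma_t^2 = beta_t
    return mean + sigma * rng.standard_normal(x_t.shape)

# Illustrative usage: iterate from t = T-1 down to 0 starting from pure noise.
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 50)
x = rng.standard_normal((16, 16, 3))
for t in reversed(range(len(betas))):
    x = ddpm_reverse_step(x, np.zeros_like(x), t, betas, rng)
```

In the paper's pipeline the zero placeholder would be replaced by the trained denoiser conditioned on the color-corrected image, so that each reverse step progressively removes blur and noise.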