
ILR-Net: Low-light image enhancement network based on the combination of iterative learning mechanism and Retinex theory

  • Mohan Yin,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft

    Affiliation School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China

  • Jianbai Yang

    Roles Formal analysis, Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing

    yangjianbai@hrbnu.edu.cn

    Affiliation School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China

Abstract

Images captured in nighttime or low-light environments are often affected by external factors such as noise and lighting. Existing image enhancement algorithms tend to focus overly on increasing brightness while neglecting the enhancement of color and detail features. This paper proposes a low-light image enhancement network based on a combination of an iterative learning mechanism and Retinex theory (termed ILR-Net) to enhance detail and color features simultaneously. Specifically, the network continuously learns local and global features of low-light images across different dimensions and receptive fields to achieve a clear and convergent illumination estimation. Meanwhile, denoising is applied to the reflection component after Retinex decomposition to enhance the image’s rich color features. Finally, the enhanced image is obtained by concatenating the features along the channel dimension. In the adaptive learning sub-network, a dilated convolution module, a U-Net feature extraction module, and an adaptive iterative learning module are designed. These modules respectively expand the network’s receptive field to capture multi-dimensional features, extract the overall and edge details of the image, and adaptively enhance features at different stages of convergence. The Retinex decomposition sub-network focuses on denoising the reflection component before and after decomposition to obtain a low-noise, clear reflection component. Additionally, an efficient feature extraction module, global feature attention, is designed to address the problem of feature loss. Experiments were conducted on six common datasets and in real-world environments. The proposed method achieved PSNR and SSIM values of 23.7624 dB and 0.8653 on the LOL dataset, and 26.8252 dB and 0.7784 on the LOLv2-Real dataset, demonstrating significant advantages over other algorithms.

Introduction

Currently, computer vision tasks have made significant progress in the areas of target detection, image classification and image segmentation. However, these are built on the basis of well-lit daytime scenes, and images acquired or captured under conditions such as low light and backlighting usually face challenges such as low brightness, loss of details, and color shifts, which seriously affect the effectiveness of various vision tasks. Therefore, enhancement of images captured in low-light environments is of great significance and practical value.

Over decades of progress in low-light image enhancement (LLIE) [1–3], numerous methodologies have emerged, encompassing techniques such as histogram equalization (HE) [4], Retinex-based methods [5–8], and deep learning approaches, among others. Among these, the Retinex theory [9–13] is commonly utilized to mimic human visual perception of objects and to decompose an image into reflection and illumination components. The mathematical representation of an image I can therefore be expressed as:

I = R ∘ L (1)

where L represents the illumination component, containing variations in image brightness and the distribution of light intensity; R represents the reflection component, containing rich image details and color characteristics; and ∘ denotes element-wise multiplication. Although early Retinex-based algorithms can enhance image brightness, their visual results are poor, they are prone to heavy noise, and they are computationally complex. For this reason, scholars have developed end-to-end LLIE networks by combining deep learning with the Retinex model, using the network to estimate and enhance the illumination and reflection maps respectively. However, it is difficult to denoise and retain detailed information simultaneously while maintaining the enhancement effect. This is shown in Fig 1:

Fig 1. The image obtained by Retinex-Net has well-preserved color details, but there is noise; the image obtained by KinD and KinD++ has good denoising effect, but the details are blurred.

https://doi.org/10.1371/journal.pone.0314541.g001
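As context for the decomposition in Eq (1), the following is a minimal NumPy sketch of the Retinex model: the observed image is the Hadamard product of reflectance and illumination, and a toy gamma-based illumination adjustment illustrates why recovering the two components enables enhancement. The gamma step is an illustrative stand-in, not the paper’s method.

```python
import numpy as np

def retinex_compose(R, L):
    """Eq (1): element-wise (Hadamard) product of reflectance and illumination."""
    return R * L

def naive_enhance(I, L_est, gamma=0.5, eps=1e-6):
    """Recover R = I / L, then brighten by raising the illumination to gamma.
    (A toy adjustment for illustration only.)"""
    R = I / (L_est + eps)
    return retinex_compose(R, L_est ** gamma)

R = np.full((2, 2), 0.8)        # reflectance: scene colors and details
L = np.full((2, 2), 0.25)       # dim illumination
I = retinex_compose(R, L)       # observed low-light image
enhanced = naive_enhance(I, L)  # brighter result after illumination adjustment
```

Because R carries the color and detail content while L carries the brightness, operating on the two components separately (as ILR-Net does) lets denoising and brightening be handled independently.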

In this paper, we propose ILR-Net, a low-light image enhancement network based on the combination of iterative learning mechanism and Retinex theory. The network consists of two sub-networks: adaptive learning and Retinex decomposition. In the adaptive learning sub-network, the original image undergoes initial feature extraction through a multi-branch dilated convolution module. The extracted features are then processed by feature enhancement units and a U-Net feature learning module for deeper-level learning. These outputs are subsequently passed into feature fusion units for information integration, resulting in a clear and converged image. Throughout this process, weight sharing is applied. In the Retinex decomposition sub-network, unlike traditional Retinex approaches that enhance the illumination component, we focus on directly denoising the reflection component, which contains rich detail information. The low-light image is first processed by a layer-by-layer denoising decomposition module, which integrates Coordinate Attention (CA) [14], Squeeze-and-Excitation Attention (SE) [15], and residual layers before decomposition. This step yields a reflectance map with detailed information. The reflectance map is then further processed through the reflectance component denoising module for feature extraction and denoising, guiding the fusion of the final enhanced image.

The contributions of this paper are as follows:

  • This paper presents a novel LLIE network. The network contains an adaptive learning sub-network and a Retinex decomposition sub-network. Extensive experiments show that our method outperforms other state-of-the-art LLIE methods and exhibits good subjective visual quality.
  • Global Feature Attention (GFA) is designed in the U-Net feature learning module, inspired by the Convolutional Block Attention Module (CBAM) [16], to improve the extraction of image detail information and retain more feature information. An adaptive iterative learning module with weight sharing is designed to realize the fused convergence of the results at each stage and to obtain a clearer image through several iterations of learning.
  • We propose a layer-by-layer denoising decomposition sub-network, which performs denoising before image decomposition to obtain better decomposition results.

The subsequent content of this paper is organized as follows: Section “Related work” categorizes pertinent methodologies for LLIE. Section “Proposed method” introduces the ILR-Net framework, detailing each module and the loss function. Section “Experimental results and analysis” presents the experimental results and analyses. Section “Conclusions” concludes the paper.

Related work

Currently, there has been significant development in the field of LLIE, which can be broadly categorized into traditional low-light image enhancement and deep learning-based low-light image enhancement.

Traditional methods

Faced with the challenges of computer vision tasks in low-light environments, early scholars utilized traditional LLIE methods for image enhancement, including histogram equalization-based enhancement and Retinex theory-based enhancement. Histogram equalization (HE) [4] enhances the contrast of an image by redistributing pixel values to make the image histogram more uniform, thereby achieving better visibility; examples include global histogram equalization and adaptive histogram equalization [17–19]. This approach is fast, effective, and requires no additional parameters, but it suffers from loss of detail in local areas, poor enhancement quality, and amplification of image noise.

The Retinex theory [10–13] views an image as consisting of two components, the illumination component and the reflection component, and argues that the brightness and color of an image depend mainly on the reflective properties of objects rather than on the lighting conditions alone. The illumination component is enhanced to obtain a corresponding normal-light image. Variants of this theory include Single-Scale Retinex (SSR) [5, 6], Multi-Scale Retinex (MSR) [7], and Multi-Scale Retinex with Color Restoration (MSRCR) [8]. Although these methods improve image enhancement, color restoration, and detail preservation, they are slow and cannot be applied to real-time scenarios.

In most cases, therefore, traditional enhancement methods rely heavily on manually designed priors or statistical models, and their performance varies when applied to different scenarios.

Deep learning methods

To enhance image quality and efficiency, researchers have made significant strides by integrating convolutional neural networks (CNNs) and generative adversarial networks (GANs). These networks enable independent learning of image feature information, resulting in higher-quality and more realistic image enhancement. LLNet [20] represents a pioneering application of deep learning in low-light image enhancement. It utilizes a deep neural network structure that employs stacked sparse denoising autoencoders and an end-to-end training mechanism. However, the enhanced images produced by this method often exhibit residual noise and excessive smoothing. MBLLEN [21] enhances images by extracting features at various levels through multiple sub-networks. While this approach improves the quality of enhanced images in several respects, some outputs may appear somewhat overexposed. Wei et al. [22] introduced the Retinex Network (Retinex-Net), which integrates neural networks with Retinex theory to decompose images into reflectance and illumination components. The method learns the light-independent reflectance and the smoothness of the illumination map, followed by enhancement and denoising of both components. While this approach achieves clearer image enhancement, it is susceptible to random noise. Zhang et al. [23, 24] successively proposed the KinD and KinD++ algorithms, which use the same decomposition network as Retinex-Net but allow the illumination component to be flexibly adjusted, performing enhancement and denoising on the illumination and reflection components respectively. These methods give better color recovery, but local details remain unclear.

In response to challenges in supervised learning, such as overfitting and the difficulty of obtaining paired images, Guo et al. [25] proposed a zero-reference learning method, Zero-Reference Deep Curve Estimation (Zero-DCE). Zero-DCE tackles low-light image enhancement by framing it as a curve estimation problem: treating a low-light image as input and generating higher-order curves as output, it adjusts the input dynamic range at the pixel level to produce an enhanced image. However, this method relies heavily on multi-exposure training data, neglects noise, and is ineffective under extreme enhancement conditions. Jiang et al. [26] proposed the Enlighten Generative Adversarial Network (EnlightenGAN), an unsupervised generative adversarial network. EnlightenGAN incorporates a global-local discriminator structure to capture more detailed features, coupled with a self-regularized perceptual loss and attention mechanisms for enhanced results. Recently, Wu et al. [27] introduced URetinex-Net, a Retinex-based deep unfolding network. This approach reformulates the optimization problem as a learnable network, effectively addressing the decomposition problem by implicitly regularizing the model. Through adaptive fitting of the implicit prior in a data-driven manner, URetinex-Net achieves noise suppression and detail preservation in its decomposition results. Ma et al. [28] proposed a self-calibrated illumination learning framework (SCI) that establishes a cascaded illumination learning process with weight sharing to achieve image enhancement. A self-calibration module is constructed to reduce the computational cost, and an additional network module assists training so that testing uses only a single block, improving the efficiency of the model while enhancing image quality. Hue et al.
[29] proposed a novel unsupervised enhancement framework (PSENet) to address the limitations of current methods in dealing with overexposed images; it trains the network by constructing synthetic images to simulate all potential exposure scenarios, making it robust to various lighting conditions and allowing better enhancement of images under extreme conditions. Fu et al. [30] proposed learning a simple low-light image enhancer from paired low-light instances (PairLIE), an unsupervised method for learning adaptive priors from pairs of low-light images, and designed a simple self-supervised mechanism to remove implausible features from the original images to assist Retinex decomposition. Two low-light images are utilized for training to fully extract the information they contain, and a simpler network achieves the image enhancement. Wang et al. [31] proposed a zero-reference low-light enhancement framework (QuadPrior), which is based on physical light transfer theory and designs an illumination-invariant prior to connect normal-light and low-light images. A lightweight prior framework was also designed and trained with normal-illumination images to automatically realize low-light enhancement. Yu et al. [32] proposed a novel learning-based perceptual resampling method. This approach utilizes model knowledge to learn perceptual information from input images, enabling the customization of resampling features and further enhancing the model’s ability to extract features. Lv et al. [33] proposed a novel zero-shot framework called FourierDiff, which embeds Fourier priors into a pre-trained diffusion model to mitigate the degradation of the model’s capabilities; moreover, this method has low requirements for training data. To produce better visual results, a spatial-frequency optimization method was further designed to precisely enhance image detail, achieving superior enhancement outcomes. Zhu et al.
[34] proposed a simple and efficient flow-based image enhancement framework, FlowIE, which estimates a direct path from the feature distribution to high-quality images. A linear many-to-one transport mapping is constructed through conditional rectification to accelerate the network’s inference. Furthermore, a faster inference algorithm was introduced, optimizing path estimation using the tangent direction at the midpoint based on the Lagrange mean value theorem, to achieve better visual results. Shi et al. [35] proposed a novel method that combines denoising and enhancement of low-light images and is not affected by training data or noise. It adjusts the enhancement level of each pixel by scaling the denoised image according to the illumination intensity; noise is then removed from the original low-light image in the form of reflections, improving the network’s denoising capability. This approach achieves optimal enhancement results without losing image information.

In conclusion, both types of methods have limitations. The former relies on manual parameter adjustment and performs poorly in complex scenarios, while the latter depends heavily on extensive, high-quality training data and requires precautions against overfitting. Although existing models obtain good image results, problems such as blurred details and poor denoising remain. The method in this paper addresses both detail preservation and denoising and achieves better results.

Proposed method

The ILR-Net framework is divided into two branches: Retinex decomposition and feature enhancement, with the overall flowchart shown in Fig 2. The final enhanced image is obtained by merging the low-noise reflection component, which contains rich color signals from the Retinex decomposition, with the illumination estimation from the feature enhancement branch.

The ILR-Net network structure is shown in Fig 3. In the adaptive learning sub-network, the multi-branch dilated convolution module and the U-Net feature learning module each perform feature extraction on the input image; the output of the former passes through the feature enhancement unit for deeper feature learning. The resulting feature map is then dot-multiplied with the output of the multi-branch dilated convolution module, and this product, together with the output of the U-Net module, enters the feature fusion unit to obtain a clearer image. In the Retinex decomposition sub-network, the low-light image is noise-suppressed and decomposed into illumination and reflection components with the aid of CA [14] and SE [15]. The obtained reflection component undergoes further feature extraction and denoising in the reflection component denoising module to yield a low-noise, clear reflection component. Finally, the clear image derived from the adaptive learning sub-network is concatenated with the denoised reflection component along the channel dimension, and the final enhanced image is obtained via Efficient Channel Attention (ECA) [36] and a 3×3 convolution.

Multi-branch dilated convolution module

The multi-branch dilated convolution module is shown in Fig 4. Taking the original low-light image as input, initial feature extraction is first performed by a 3×3 convolution. Next, four dilated convolution [37] branches with dilation rates of 1, 2, 4, and 8 perform feature learning under different receptive fields. Finally, the feature maps of the branches are merged to obtain an image containing rich feature information. Dilated convolution can expand the receptive field of the network without using a large convolution kernel, thus capturing richer features. However, owing to the nature of dilated convolution, concatenating dilated convolutions with the same dilation rate easily leads to discontinuous sampling and the gridding effect [38]. Therefore, the dilation rates are set differently to ensure the continuity of the sampling and receptive fields. The computational process of the multi-branch dilated convolution module can be expressed as: (2) where DC stands for dilated convolution, t (t = 1, 2, 4, 8) stands for the dilation rate, and X stands for the corresponding feature map.
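The branch structure above can be sketched in NumPy as follows. This is a single-channel illustration with 'same' padding; the merge-by-stacking at the end is an assumption standing in for channel concatenation, and the weights are placeholders.

```python
import numpy as np

def dilated_conv3x3(x, w, dilation):
    """3x3 convolution with the given dilation rate and 'same' padding."""
    d = dilation
    xp = np.pad(x, d)                  # pad by d so the output keeps x's shape
    H, W = x.shape
    out = np.zeros((H, W))
    for i in range(3):                 # slide the 3x3 kernel; taps are
        for j in range(3):             # sampled d pixels apart
            out += w[i, j] * xp[i * d : i * d + H, j * d : j * d + W]
    return out

def multi_branch(x, weights, rates=(1, 2, 4, 8)):
    """One branch per dilation rate; branch outputs are stacked ('merged')."""
    return np.stack([dilated_conv3x3(x, w, r) for w, r in zip(weights, rates)])
```

Using distinct rates (1, 2, 4, 8) makes the branches sample complementary positions, which is how the module sidesteps the gridding effect mentioned above.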

U-Net feature learning module

To further extract rich detail information from low-light images, a U-Net module based on a fully convolutional strategy is designed on top of the multi-branch dilated convolution, and a residual layer is added at each level to fuse more feature information. Inspired by CBAM [16], global feature attention (GFA) is designed. A multiscale feature extraction module is introduced in GFA, which extracts feature information from the input feature map using parallel convolutional layers and fuses local and global feature information through a residual structure. The multi-branch feature extraction module utilizes several concatenated 3×3 convolution kernels in place of larger kernels; these are incorporated into the network in parallel to reduce the parameter count while acquiring rich feature information. Fig 5 illustrates the structure of the U-Net feature learning module alongside the GFA attention mechanism.

Within the U-Net network, the small-resolution feature map of the lowermost layer holds extensive feature details; hence, this paper integrates the GFA module into this layer to strengthen feature learning. Given an input feature U0, it first passes through the multi-branch feature extraction module. AvgPool2d and MaxPool2d are then used to obtain detail information, which passes through a fully connected layer and is normalized to the interval 0–1 by the Sigmoid function; this is multiplied by the result Uc of the multi-branch feature extraction module to obtain the feature map U1. Subsequently, maximum pooling and average pooling are performed along the channel dimension; after concatenation, the result is normalized by Conv+Sigmoid and multiplied by U1 to obtain the feature map U2. Finally, the original input U0 is added to U2 to realize the fusion of global feature information. The whole calculation process is shown below: (3) where FC and σ denote the fully connected layer and the Sigmoid activation function.
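The U0 → U1 → U2 → U0 + U2 flow described above can be sketched as a CBAM-style channel-then-spatial attention. This is a hedged simplification: the multi-branch feature extraction step is omitted, and Wc1, Wc2, ws are placeholder weights, not the paper’s parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gfa_sketch(U0, Wc1, Wc2, ws):
    """Channel attention, then spatial attention, then a residual add,
    mirroring U1, U2 and U0 + U2 in the text. U0 has shape (C, H, W)."""
    # channel attention from AvgPool2d / MaxPool2d descriptors + shared FC
    avg, mx = U0.mean(axis=(1, 2)), U0.max(axis=(1, 2))
    att_c = sigmoid(Wc2 @ np.maximum(Wc1 @ avg, 0.0)
                    + Wc2 @ np.maximum(Wc1 @ mx, 0.0))
    U1 = U0 * att_c[:, None, None]                 # U1 in the text
    # spatial attention from channel-wise max/avg maps fused by weights ws
    att_s = sigmoid(ws[0] * U1.max(axis=0) + ws[1] * U1.mean(axis=0))
    U2 = U1 * att_s[None, :, :]                    # U2 in the text
    return U0 + U2                                 # residual fusion
```

The residual add at the end is what keeps the original features available even where the attention gates are small, matching the stated goal of retaining more feature information.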

Adaptive iterative learning module

The adaptive iterative learning module, shown in Fig 6, uses a fully convolutional network for adaptive learning and iterative convergence to obtain optimal results. It consists of two parts: a feature enhancement unit and a feature fusion unit. The feature map Dt produced by the multi-branch dilated convolution module is input to the feature enhancement unit to obtain the enhanced feature map Et, and the Hadamard product of Et and Dt forms the output of this part. The input of the feature fusion unit consists of this product and the result Ut from the U-Net feature learning module. First the channels are concatenated, and SE enables the network to concentrate on learning useful channel information; a fusion unit based on Conv+BatchNorm+ReLU then accelerates the fusion of feature information and the convergence of the model, yielding the converged feature map Ft. Ft is used as the input to the next stage of the loop, and the whole iterative learning process shares weights, finally producing a clear, converged enhanced image.
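The weight-sharing loop can be sketched as follows. The stage function here is a toy contraction standing in for the Conv+BN+ReLU enhancement/fusion units; only the control flow (one set of weights reused across all iterations, each output F_t feeding the next stage) reflects the module described above.

```python
import numpy as np

def shared_stage(F, w):
    """One stage; a toy stand-in for the enhancement + fusion units."""
    return np.tanh(w * F)

def iterative_learning(D, w=0.9, steps=4):
    """Weight-sharing loop: the same w is reused at every iteration,
    and each stage's output F_t is the next stage's input."""
    F, history = D, []
    for _ in range(steps):
        F = shared_stage(F, w)
        history.append(F.copy())
    return F, history
```

Because the mapping is reused, the parameter count is independent of the number of iterations, and successive outputs move closer together as the loop converges.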

Feature enhancement unit.

The feature enhancement unit uses Conv+BatchNorm+ReLU for feature extraction and normalization of the input feature maps and improves the learning ability of the network through residual structures, with a uniform convolution size of 3×3. BatchNorm normalizes each channel and reduces the dependency between channels to improve the generalization ability of the network. The first estimated enhancement component Et is obtained as the input to the feature fusion unit. The feature enhancement unit is computed as follows: (4) where BN and σ denote BatchNorm and the Sigmoid activation function, respectively.

Feature fusion unit.

So that the network does not lose the detailed features of the image during learning, the input of the feature fusion unit consists of the result Ut of the U-Net feature extraction module and the product of the feature enhancement unit’s output Et with Dt; the convolution size is likewise uniformly 3×3. First, this product and Ut are processed through convolution layers for feature extraction and concatenated along the channel dimension; SE assigns weights to their channels, and the processed result Kc enters the fusion unit. The fusion unit consists of two Conv+BatchNorm+ReLU layers and three stacked Conv+BatchNorm+ReLU structures, and adopts skip connections to transfer feature information. Kc is normalized by the fusion unit, and the corresponding feature map Ft is obtained through Conv+Sigmoid. Ft is then used as input for the next loop. The calculation process is shown below: (5) where BN, σ and ρ represent BatchNorm, the Sigmoid activation function, and SE, respectively.

Layer-by-layer denoising decomposition module

To minimize the noise generated during decomposition and retain detailed image information, a layer-by-layer denoising decomposition module was devised based on CA and SE [14, 15]. This module shares parameters during training; its structure is shown in Fig 7. First, feature extraction is performed on the low-light image using a 3×3 convolution. Second, CA is employed to allocate varying weights to the feature information in the feature map according to its coordinates, assigning smaller weights to pixel coordinates with higher noise levels and larger weights to those with lower noise levels, thereby suppressing noise within the feature map. Subsequently, two channel-wise denoising operations further suppress the noise: deeper feature extraction is performed on the shallow features from the previous stage using two 3×3 Conv+ReLU layers, and the noise level in each channel is estimated by assigning weights to the channels of the feature map using SE. Meanwhile, to enhance the detail-retention capability of this module, the adjusted features are fused multiple times through a residual structure to prevent loss of detail information. Finally, the reflection and illumination components are decomposed using a residual block and a 3×3 convolution.
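The SE-based channel weighting used above for noise estimation can be sketched as follows: channels judged less informative (e.g., noise-dominated) receive gates closer to 0 and are down-weighted. W1 (reduction) and W2 (expansion) are placeholder fully connected weights, not the trained parameters.

```python
import numpy as np

def se_block(x, W1, W2):
    """Squeeze-and-Excitation on a (C, H, W) feature map:
    global average pool -> two FC layers -> sigmoid gates per channel."""
    z = x.mean(axis=(1, 2))                                     # squeeze: (C,)
    s = 1.0 / (1.0 + np.exp(-(W2 @ np.maximum(W1 @ z, 0.0))))   # excite: (C,)
    return x * s[:, None, None]                                 # reweight
```

Since every gate lies in (0, 1), the block can only attenuate a channel, never amplify it, which is what makes it suitable as a soft noise suppressor here.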

Reflection component denoising module

Based on Retinex theory, the reflection component reflects the characteristics of the object and contains a lot of color detail information. To better deal with low light images, the noise and artifacts in the reflection component are reduced as much as possible. In this paper, a reflection component denoising module is designed to further suppress the noise in the reflection component after decomposition, to obtain a clear and detailed image. The structure of the reflection component denoising module is shown in Fig 8.

This module fine-tunes the initial reflection component Rlow obtained from the layer-by-layer denoising decomposition module. First, after a 3×3 convolution for initial feature extraction, the noise present in each channel is evaluated again by SE [15] and corresponding weights are assigned. Second, deeper feature extraction is performed on the feature maps from the previous stage through three iterations of a learning mechanism consisting of two Conv+ReLU layers and an SE block, which uses a residual learning strategy so that the model focuses on learning the detailed information of the image. The final noise-reduced image is then output after a Conv+ReLU layer and a 3×3 convolution. The convolution kernels are consistently sized at 3×3, and the padding mode is set to replication, thereby preventing edge artifacts.

Loss function

The loss function design in this paper is divided into two parts: the adaptive learning sub-network and the Retinex decomposition sub-network are trained separately.

Loss function of the adaptive learning sub-network.

An unsupervised loss function is devised to train the network, taking into account the structural, spatial, and perceptual information of the image. This loss function can be expressed as:

L = λCB·LCB + λPer·LPer + λSSIM·LSSIM + λref·Lref (6)

where LCB denotes the Charbonnier loss (CB); LPer denotes the perceptual loss (Per); LSSIM denotes the structural similarity loss (SSIM); Lref denotes the reflection consistency loss; and λCB, λPer, λSSIM and λref represent the corresponding loss coefficients, used to balance the loss terms and optimize network performance. They are set to 1.0, 1.0, 0.1, and 0.01, respectively.
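The weighting scheme of Eq (6) can be written as a small helper; the coefficients follow the paper, while the component loss values passed in are placeholders supplied by the caller.

```python
# Coefficients from Eq (6): lambda_CB, lambda_Per, lambda_SSIM, lambda_ref.
WEIGHTS = {"CB": 1.0, "Per": 1.0, "SSIM": 0.1, "ref": 0.01}

def total_loss(losses):
    """L = lambda_CB*L_CB + lambda_Per*L_Per + lambda_SSIM*L_SSIM + lambda_ref*L_ref."""
    return sum(WEIGHTS[name] * value for name, value in losses.items())
```

The small weights on the SSIM and reflection terms keep them as regularizers rather than letting them dominate the pixel-level and perceptual objectives.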

Charbonnier Loss: Instead of the conventional L1 loss, the Charbonnier loss is adopted as a smooth approximation to it, aiming to minimize the disparity between the enhanced image and the real image. The Charbonnier loss is advantageous in optimizing the model and improving image processing performance, especially when combatting noise, preserving edge information, or handling outliers. The formula for the Charbonnier loss is:

LCB = √((y − ŷ)² + c²) (7)

where y and ŷ denote the real image under normal light conditions and the enhanced image, respectively. The constant c regulates the rate of change of the loss function as it approaches zero, ensuring stability; in this paper, c is set to 10−6.
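Eq (7) translates directly into code. Averaging over pixels is an assumption about the reduction; the formula itself is per-pixel.

```python
import numpy as np

def charbonnier(y, y_hat, c=1e-6):
    """Charbonnier loss: a smooth, outlier-robust surrogate for L1 (Eq 7)."""
    return np.mean(np.sqrt((y - y_hat) ** 2 + c ** 2))
```

For differences much larger than c the loss behaves like |y − ŷ| (robust to outliers), while near zero it is smooth and differentiable, unlike plain L1.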

Perceptual Loss: Perceptual loss is incorporated to address the excessive smoothing caused by the structural similarity loss. It measures image disparities by leveraging the intermediate representations of a pre-trained neural network, which enhances the preservation of detailed information and visual fidelity, thereby augmenting the realism of the image. The formula for the perceptual loss is:

LPer = (1 / (Ci,j Wi,j Hi,j)) ‖φi,j(y) − φi,j(ŷ)‖² (8)

where y and ŷ denote the real image under normal light conditions and the enhanced image, respectively. Wi,j and Hi,j represent the width and height of the feature maps obtained from the ith block and jth convolution, respectively, while Ci,j denotes the number of channels. φi,j represents the feature map acquired from the jth convolutional layer of the ith block of the pre-trained Visual Geometry Group 16 (VGG16) model.
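A sketch of Eq (8): squared feature differences normalized by the feature volume C·W·H. Here `feat` is a stand-in for the pre-trained VGG16 layer φi,j; a real implementation would extract features with torchvision’s pre-trained VGG16.

```python
import numpy as np

def perceptual_loss(y, y_hat, feat):
    """Eq (8): normalized squared distance between feature maps.
    `feat` is a placeholder for a pre-trained VGG16 feature extractor."""
    fy, fyh = feat(y), feat(y_hat)
    return np.sum((fy - fyh) ** 2) / fy.size
```

With `feat` set to the identity function this reduces to plain per-pixel MSE, which makes clear that the only difference from MSE is the feature space in which distances are measured.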

Structural Similarity Loss: The structural similarity (SSIM) metric quantifies the likeness between images based on their brightness, contrast, and structural characteristics. It assesses the resemblance between the original image under standard lighting conditions and the enhanced image, improving the preservation of structural details and intricate features. The structural similarity and its loss are computed as:

SSIM(x, y) = ((2μxμy + c1)(2σxy + c2)) / ((μx² + μy² + c1)(σx² + σy² + c2)), LSSIM = 1 − SSIM(x, y) (9)

where x and y denote the test image and the reference image, respectively; μx and μy represent their mean values, reflecting brightness information; σx² and σy² denote the variances of x and y, reflecting contrast information; and σxy signifies their covariance, reflecting the structural information of the image. Additionally, c1 and c2 are small non-zero constants introduced to prevent division by zero.

Reflection consistency loss: Different from the reflection similarity loss in the decomposition sub-network, this loss measures the differences between images by comparing corresponding pixels or feature points. The reflection components are extracted separately for the input and output images, and the squared Euclidean distance between these two reflection components is calculated as the loss value:

Lref = (1/N) Σi (Rlow,i − Ren,i)² (10)

where N represents the total number of pixels, and Rlow,i and Ren,i represent the reflection component at the ith pixel of the input low-light image and of the enhanced image, respectively.
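Eq (10) as stated, i.e. the mean squared Euclidean distance between the two reflectance maps over N pixels; the argument names are illustrative.

```python
import numpy as np

def reflection_consistency(R_in, R_out):
    """Eq (10): mean squared distance between input and enhanced reflectance."""
    return np.mean((R_in - R_out) ** 2)
```

Since Retinex theory treats reflectance as illumination-independent, penalizing this distance encourages the enhancement to change only the illumination, not the scene content.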

Loss function of the Retinex decomposition subnetwork.

To retain the structural information of the original image and enhance the noise reduction capability of the decomposition sub-network, the loss function LDecom of the decomposition sub-network can be expressed as:

LDecom = Lrecon + α·Lir + β·Lsmooth (11)

where Lrecon denotes the decomposition reconstruction loss; Lir denotes the reflection similarity loss; Lsmooth denotes the illumination smoothing loss; and α and β denote the weighting coefficients of the different losses.

The decomposition reconstruction loss is expressed as: \( L_{recon} = \left\| R_{low} \circ I_{low} - S_{low} \right\|_1 \) (12) where Rlow and Ilow denote the reflection and illumination components obtained after decomposition, respectively; Slow denotes the original real image; ∘ denotes element-wise multiplication; and ‖·‖1 is the L1 norm.

The reflection similarity loss is expressed as: \( L_{ir} = \left\| R_{low} - R_{normal} \right\|_1 \) (13) where Rlow denotes the reflection component of the low-illumination image and Rnormal denotes the reflection component of the original image under normal lighting conditions.

The illumination smoothing loss is denoted as: \( L_{smooth} = \sum_{i} \left\| \nabla I_i \circ \exp\left(-\lambda_g \nabla R_i\right) \right\|_1 \) (14) where ∇Ii and ∇Ri denote the gradients of the illumination and reflection components, respectively, and λg is a weighting coefficient that relaxes the smoothness constraint at reflectance edges.
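The structure-aware smoothness idea described above (penalize illumination gradients except where the reflectance itself has edges) can be sketched as follows. Forward differences, the λg value, and the mean reduction are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def grad(img):
    """Sum of absolute horizontal and vertical forward differences."""
    gx = np.abs(np.diff(img, axis=1, append=img[:, -1:]))
    gy = np.abs(np.diff(img, axis=0, append=img[-1:, :]))
    return gx + gy

def illumination_smoothness_loss(I, R, lam_g=10.0):
    # Illumination gradients are down-weighted where the reflectance has
    # strong gradients, so edges are preserved while flat areas stay smooth.
    return np.mean(grad(I) * np.exp(-lam_g * grad(R)))

I = np.ones((16, 16))          # perfectly flat illumination map
R = np.random.rand(16, 16)     # arbitrary reflectance
assert illumination_smoothness_loss(I, R) == 0.0  # no gradients, zero penalty
```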

Experimental results and analysis

Experimental environment and training settings

We conduct all experiments with the PyTorch deep learning framework on a Windows 10 platform with an Intel(R) i5-13600KF CPU and an NVIDIA GeForce 4070 GPU. During training, the samples are uniformly resized to 600×400 and the model is trained on the public LOL dataset with the Adam optimizer. The momentum parameters are set to β1 = 0.5 and β2 = 0.999, the batch size to 16, and the number of training epochs to 300. The learning rate is initialized to 0.001 for the first 200 epochs; thereafter it decays to 10% of its previous value every 20 epochs.
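One plausible reading of this schedule (constant learning rate for the first 200 epochs, then a ×0.1 step every 20 epochs) can be written as a small helper; the function name and the exact epoch at which the first decay fires are our assumptions.

```python
def learning_rate(epoch, base_lr=1e-3, warm_epochs=200, decay_every=20):
    """lr stays at base_lr for the first `warm_epochs` epochs,
    then shrinks to 10% of its previous value every `decay_every` epochs."""
    if epoch < warm_epochs:
        return base_lr
    steps = (epoch - warm_epochs) // decay_every + 1
    return base_lr * (0.1 ** steps)

assert learning_rate(0) == 1e-3
assert learning_rate(199) == 1e-3
assert abs(learning_rate(200) - 1e-4) < 1e-12
assert abs(learning_rate(220) - 1e-5) < 1e-12
```

In PyTorch the same effect could be obtained with a step scheduler applied after epoch 200.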

Image evaluation metrics

In this paper, widely used evaluation metrics are adopted to quantitatively evaluate the models: peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [39], multi-scale structural similarity (MS-SSIM) [40], Perceptual Image Quality Evaluator (PIQE) [41], Blind/Reference-less Image Spatial Quality Evaluator (BRISQUE) [42], Natural Image Quality Evaluator (NIQE) [43], and Learned Perceptual Image Patch Similarity (LPIPS) [44]. PIQE and BRISQUE have no closed-form definition, so this section describes the principles behind these two metrics without presenting mathematical formulas.

The peak signal-to-noise ratio can be expressed as: \( \mathrm{PSNR} = 10\log_{10}\frac{(2^n - 1)^2}{\mathrm{MSE}}, \quad \mathrm{MSE} = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\left( X(i,j) - Y(i,j) \right)^2 \) (15) where H and W denote the height and width of the image, respectively; X(i,j) and Y(i,j) denote the test image and the reference image, respectively; n is the number of bits per pixel (so 2^n − 1 = 255 for 8-bit images); and MSE denotes the mean squared error. A larger PSNR indicates less distortion and better image quality.
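The PSNR computation follows directly from the definition above; this sketch assumes 8-bit images with a peak value of 255.

```python
import numpy as np

def psnr(x, y, max_val=255.0):
    """Peak signal-to-noise ratio in dB for images in [0, max_val]."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: infinite PSNR
    return 10.0 * np.log10(max_val**2 / mse)

a = np.zeros((8, 8))
b = np.full((8, 8), 16.0)  # constant error of 16 everywhere -> MSE = 256
assert abs(psnr(a, b) - 10 * np.log10(255**2 / 256)) < 1e-9  # about 24.05 dB
```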

The structural similarity can be expressed as: \( \mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \) (16) where x and y represent the test image and the reference image, respectively. μx and μy denote their means, while σx² and σy² represent their variances. σxy indicates their covariance, and c1 and c2 are small non-zero constants introduced to prevent division by zero. Unlike PSNR, SSIM accounts not only for differences in brightness and contrast but also for discrepancies in structural information, aligning more closely with human visual perception. SSIM ranges between 0 and 1, where values closer to 1 signify higher similarity between two images and better image quality. Together, the two metrics indicate how much information is preserved and how well the enhanced image is reconstructed.

The natural image quality evaluation can be expressed as: \( D(\nu_1,\nu_2,\Sigma_1,\Sigma_2) = \sqrt{(\nu_1-\nu_2)^{T}\left(\frac{\Sigma_1+\Sigma_2}{2}\right)^{-1}(\nu_1-\nu_2)} \) (17) where ν1 and ν2 are the mean vectors and Σ1 and Σ2 the covariance matrices of the natural multivariate Gaussian (MVG) model and the distorted image's MVG model, respectively. The MVG model is a multivariate Gaussian distribution that describes the joint distribution of quality-aware features extracted from small image patches; these feature vectors are used to compute the NIQE score. The no-reference NIQE metric aligns closely with human visual perception: a higher value indicates poorer image quality, while a lower value suggests greater similarity to a natural image.
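The model-distance computation underlying the NIQE score can be sketched as follows, given the already-fitted mean vectors and covariance matrices (the fitting of the MVG models from image patches is omitted; the function name is ours):

```python
import numpy as np

def niqe_distance(v1, v2, sigma1, sigma2):
    """Distance between the natural and distorted MVG models."""
    diff = (v1 - v2).reshape(-1, 1)
    pooled = (sigma1 + sigma2) / 2.0           # average the two covariances
    return float(np.sqrt(diff.T @ np.linalg.pinv(pooled) @ diff))

v = np.array([0.5, 1.0, -0.3])
cov = np.eye(3)
assert niqe_distance(v, v, cov, cov) == 0.0    # identical models: zero distance
```

With identity covariances the distance reduces to the ordinary Euclidean distance between the two mean vectors.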

The learned perceptual image patch similarity (LPIPS) can be expressed as: \( d(x,x_0) = \sum_{l}\frac{1}{H_l W_l}\sum_{h,w}\left\| w_l \odot \left(\hat{y}^{l}_{hw} - \hat{y}^{l}_{0hw}\right) \right\|_2^2 \) (18) where d represents the distance between x and x0. The feature stacks \(\hat{y}^l\) and \(\hat{y}^l_0\) extracted at layer l are unit-normalized along the channel dimension, the channels are scaled by the vector wl, and the squared L2 distance is computed; the result is averaged over space and summed over layers. LPIPS is close to subjective human perception: the lower its value, the smaller the perceived difference between two images.
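The per-layer distance computation can be sketched with plain arrays standing in for deep features. In the real metric the features come from a pretrained network and the weights are learned; everything here is illustrative.

```python
import numpy as np

def lpips_layer_distance(f, f0, w):
    """One layer's contribution: unit-normalize along channels, scale by the
    per-channel weights w, take the squared L2 over channels, average over space."""
    def unit_norm(t):
        return t / (np.linalg.norm(t, axis=0, keepdims=True) + 1e-10)
    d = (w[:, None, None] * (unit_norm(f) - unit_norm(f0))) ** 2
    return d.sum(axis=0).mean()  # sum over channels, mean over H x W

C, H, W = 4, 8, 8
feat = np.random.rand(C, H, W)
weights = np.ones(C)
assert lpips_layer_distance(feat, feat, weights) == 0.0  # identical features
```

The full distance sums this quantity over the layers of the feature extractor.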

Multi-scale structural similarity (MS-SSIM) can be expressed as: \( \mathrm{MS\text{-}SSIM}(x,y) = \left[l_M(x,y)\right]^{\alpha_M}\prod_{j=1}^{M}\left[c_j(x,y)\right]^{\beta_j}\left[s_j(x,y)\right]^{\gamma_j} \) (19) MS-SSIM takes a multi-scale approach to examine image details at different resolutions. The reference and distorted image signals are used as inputs, and a low-pass filter followed by factor-2 downsampling is applied iteratively. The resolution of the original image is denoted Scale 1, and the resolution after M−1 iterations is denoted Scale M. The contrast measure cj(x,y) and the structure measure sj(x,y) from SSIM are computed at every scale, while the luminance measure lM(x,y) is computed only at the last scale, Scale M. The composite metric is obtained by combining the measurements across the scales. The exponents αj, βj, and γj weight the different components; to simplify parameter selection they are set to αj = βj = γj.
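The iterative filter-and-downsample step described above can be sketched as follows; a 2×2 mean filter stands in for the low-pass filter, and the per-scale contrast/structure computations are omitted.

```python
import numpy as np

def downsample2(img):
    """Low-pass (2x2 mean) filter followed by factor-2 downsampling."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]  # crop odd borders so the reshape is exact
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid(img, scales=5):
    """Scale 1 is the original resolution; Scale M comes from M-1 iterations."""
    out = [img]
    for _ in range(scales - 1):
        out.append(downsample2(out[-1]))
    return out

levels = pyramid(np.random.rand(64, 64), scales=5)
assert [l.shape for l in levels] == [(64, 64), (32, 32), (16, 16), (8, 8), (4, 4)]
```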

Perceptual Image Quality Evaluator (PIQE): PIQE is an image quality evaluation algorithm based on human perception that evaluates quality without a reference image (i.e., without comparing against an original). Its principle is to divide the image into multiple blocks, compute perception-related features within each block, and then combine these features; the block-structure and noise features of the image are used to calculate its quality score. The advantages of PIQE are that it evaluates image quality quickly and accounts well for the factors influencing human perception. A smaller value indicates better image quality.

Blind/Reference-less Image Spatial Quality Evaluator (BRISQUE): BRISQUE extracts the mean-subtracted contrast-normalized (MSCN) coefficients from the image, fits them with an asymmetric generalized Gaussian distribution (AGGD), extracts features from the fitted distribution, and feeds them into a support vector machine (SVM) for regression to obtain the image quality score. A smaller value indicates better image quality.
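The MSCN extraction step can be sketched as follows, with a box window standing in for BRISQUE's Gaussian-weighted local window; the AGGD fitting and SVM regression stages are omitted, and all names are illustrative.

```python
import numpy as np

def box_filter(img, k=7):
    """Local mean with a k x k box window (zero padding at borders)."""
    pad = k // 2
    p = np.pad(img, pad)
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def mscn(img, C=1.0):
    """Mean-subtracted contrast-normalized coefficients:
    (I - local mean) / (local std + C)."""
    mu = box_filter(img)
    sigma = np.sqrt(np.maximum(box_filter(img**2) - mu**2, 0.0))
    return (img - mu) / (sigma + C)

coeffs = mscn(np.random.rand(32, 32) * 255)
assert coeffs.shape == (32, 32)
```

For natural images these coefficients follow a roughly Gaussian distribution, and distortions change that distribution, which is what the AGGD fit measures.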

Comparison experiments

To assess the efficacy of the proposed method, we compare its experimental results with those of current classical and advanced image enhancement methods: Retinex-Net (2018-BMVC) [22], KinD (2019-ACMMM) [23], EnlightenGAN [26], RRDNet (2020-ICME) [45], Zero-DCE (2020-CVPR) [25], Zero-DCE++ [46], RUAS (2021-CVPR) [47], SCI (2022-CVPR) [28], URetinex-Net (2022-CVPR) [27], UNIENet (2022-ECCV) [48], PSENet (2023-WACV) [29], PairLIE (2023-CVPR) [30], and QuadPrior (2024-CVPR) [31]. Subjective visual comparisons are carried out on seven datasets, and on the quantitative side the image quality of the different methods is evaluated with seven image assessment metrics.

As shown in Fig 9, the enhancement results on the LOL dataset [22] indicate that the Retinex-Net method captures the overall feature information of the image but introduces significant noise and color bias. The KinD method effectively removes noise but at the cost of losing some detail, resulting in an overly smoothed image. The RUAS method, while enhancing the image, suffers from overexposure, leading to an unnatural subjective appearance. Both the RRDNet and SCI methods preserve the image's color information well, but their enhancement of fine details is insufficient. PSENet, UNIENet, and PairLIE retain most of the image's details while enhancing it, producing smoother visuals; however, their enhanced images tend to appear darker. Meanwhile, the EnlightenGAN, Zero-DCE, and Zero-DCE++ methods introduce excessive noise during enhancement. QuadPrior and URetinex-Net perform well in terms of image enhancement, noise reduction, and detail preservation. However, compared to these methods, the approach proposed in this paper achieves more natural results, particularly in restoring the original image's color, resulting in a more visually pleasing and realistic outcome.

Fig 9. Subjective visualization of various methods on the LOL dataset.

https://doi.org/10.1371/journal.pone.0314541.g009

On the quantitative side, as shown in Table 1, our method ranks second on the SSIM metric, slightly below URetinex-Net, and lags behind EnlightenGAN and UNIENet on the NIQE index. Notably, it achieves the best scores on PSNR, MS-SSIM, and LPIPS, reaching 23.7624, 0.8804, and 0.1583, respectively. This indicates that the proposed method has an overall advantage in noise suppression, detail retention, and enhancement quality.

Table 1. Objective evaluation results of different algorithms on LOL datasets.

https://doi.org/10.1371/journal.pone.0314541.t001

Further, this paper conducts experiments on the LOLv2-Real dataset [49], which contains 100 pairs of real low-light images, to better evaluate the performance of the proposed method in real scenes. As shown in Fig 10, when handling indoor low-light environments, Retinex-Net introduces significant noise and color distortion. EnlightenGAN, RRDNet, ZeroDCE, ZeroDCE++, RUAS, and SCI also show poor enhancement results, with varying degrees of noise. The enhanced images produced by PSENet and PairLIE exhibit noticeable color bias compared with the real images. In contrast, KinD, UNIENet, URetinex-Net, QuadPrior, and the method proposed in this paper achieve more natural enhancement effects and improved visual quality. Similarly, in real nighttime environments, most methods fail to produce noticeable enhancement, with the exceptions of KinD, UNIENet, URetinex-Net, and our approach. However, the KinD method tends to lose significant detail, resulting in overly smoothed images. In contrast, UNIENet, URetinex-Net, and our method produce visually better results, and the processed images look more realistic.

Fig 10. Subjective visualization of various methods on the LOLv2-Real dataset.

https://doi.org/10.1371/journal.pone.0314541.g010

Also on the quantitative side, as shown in Table 2, our method slightly lags behind URetinex-Net on the NIQE and LPIPS metrics, ranking second, but achieves the best scores on all remaining metrics, reaching 26.8252, 0.7784, and 0.8604 on PSNR, SSIM, and MS-SSIM, respectively. This demonstrates the applicability of the proposed method on the LOLv2-Real dataset [49], where it achieves better visual results in both indoor low-light and nighttime environments.

Table 2. Objective evaluation results of different algorithms on LOL-v2-Real datasets.

https://doi.org/10.1371/journal.pone.0314541.t002

To assess the generalization capability of the proposed method, experiments were conducted on four reference-free datasets, DICM [50], MEF [51], LIME [52], and NPE [53], as well as on a self-captured Real-world dataset; the experimental results are presented in Figs 11–15.

Fig 11. Subjective visualization of various methods on the DICM dataset.

https://doi.org/10.1371/journal.pone.0314541.g011

Fig 12. Subjective visualization of various methods on the MEF dataset.

https://doi.org/10.1371/journal.pone.0314541.g012

Fig 13. Subjective visualization of various methods on the LIME dataset.

https://doi.org/10.1371/journal.pone.0314541.g013

Fig 14. Subjective visualization of various methods on the NPE dataset.

https://doi.org/10.1371/journal.pone.0314541.g014

Fig 15. Subjective visualization of various methods on the Real-world dataset.

https://doi.org/10.1371/journal.pone.0314541.g015

Analyzing the enhancement results across these five datasets: on the DICM dataset, the Retinex-Net method shows considerable noise and color distortion. The RRDNet, ZeroDCE, ZeroDCE++, and UNIENet methods show little enhancement effect and lose much detail. The PSENet and QuadPrior methods exhibit substantial detail loss and color shifting. The EnlightenGAN, RUAS, SCI, URetinex-Net, and PairLIE methods are overexposed, losing much detail information, and some of them show varying degrees of noise. On the MEF, LIME, NPE, and Real-world datasets, the RRDNet, ZeroDCE, UNIENet, PSENet, PairLIE, and QuadPrior methods retain image detail well, but the overall color of the enhanced images is light and the enhancement effect is not pronounced. The Retinex-Net method still produces heavy noise, severe artifacts, and color shifts, so the overall visual effect is unnatural. The EnlightenGAN, ZeroDCE++, RUAS, SCI, URetinex-Net, and PairLIE methods retain the original color information but all exhibit varying degrees of overexposure and lose some details during enhancement. Although the method proposed in this paper also encounters exposure issues on the DICM and MEF datasets, its overall enhancement is more natural; in comparison, it effectively preserves the original detail information of the images, making the results more visually appealing.

On the quantitative side, as shown in Table 3, the proposed method achieves higher scores on the DICM, LIME, MEF, NPE, and Real-world datasets, which further proves that it also performs well on unpaired datasets compared with other state-of-the-art methods. The visual comparisons and quantitative evaluations confirm that the images enhanced by the proposed method are closest to the real images, with comparable results obtained on unpaired low-light images.

Table 3. Objective evaluation results of different algorithms on DICM, LIME, MEF, NPE, Real-world datasets.

https://doi.org/10.1371/journal.pone.0314541.t003

Ablation experiments

To verify the effectiveness of each module and loss function in the proposed method, this subsection carries out ablation experiments on the LOL dataset [22] for the model and the joint loss function, respectively. The network and loss-function changes follow the configurations in Tables 4 and 5 (√ marks the modules and loss functions that are retained), and the loss-function weights and training parameter settings are kept unchanged for every ablated network. PSNR and SSIM are used to comprehensively evaluate image quality in terms of brightness, structural contrast, and noise.

Network module ablation experiments and analysis.

For the network module ablation experiments, this subsection removes, in whole or in part, the multi-branch dilation convolution module (MDC), the U-Net feature learning module (U-Net), the reflection denoising module (Ref), the global feature attention (GFA), and the layer-by-layer denoising decomposition module (Demo). The experiments use the following six combinations: ① H1: remove only the multi-branch dilation convolution module and keep the rest unchanged. ② H2: remove only the U-Net feature learning module, replace it with the output of the feature enhancement unit fed into the feature fusion unit for training, and keep the rest unchanged. ③ H3: remove only the reflection denoising module and keep the rest unchanged. ④ H4: remove only the global feature attention and keep the rest unchanged. ⑤ H5: use four convolution layers instead of the layer-by-layer denoising decomposition module for the Retinex decomposition and keep the rest unchanged. ⑥ H6: remove both the initialization module and the U-Net feature learning module and keep the rest unchanged.

The subjective visualization of the network module ablation experiments is shown in Fig 16, with zoomed-in regions to illustrate the details. From the figure, the image enhanced with the H1 combination has light overall color and blurred color details. The image enhanced with the H2 combination loses part of its color information, and noise appears around objects. The image enhanced with the H3 combination likewise suffers from varying degrees of noise and color deviation. The images enhanced with the H4 and H5 combinations lose more texture details and show varying degrees of noise and artifacts. The image enhanced with the H6 combination suffers severe loss of detail and exhibits serious color deviation, noise, and distortion.

Fig 16. Subjective visualization of network module ablation experiments.

https://doi.org/10.1371/journal.pone.0314541.g016

Quantitatively, the changes in the evaluation indices after removing each module are shown in Table 4. The images enhanced with the H1, H2, and H4 combinations show a slight decrease in the PSNR and SSIM indices. The images enhanced with the H3 and H5 combinations show a significant decrease in both indices, reflecting that the reflection denoising module and the layer-by-layer denoising decomposition module both contribute substantially to denoising the decomposed reflection map and to restoring the rich color information of the original image. The indices after the H6 combination decrease severely: without the initial denoising and feature extraction of the original image, the adaptive iterative learning module is ineffective at noise suppression and detail retention, reflecting the necessity of the MDC and U-Net modules.

Table 4. Objective evaluation results of network module ablation experiments.

https://doi.org/10.1371/journal.pone.0314541.t004

Adaptive learning subnetwork loss function ablation experiments and analysis.

To perform ablation experiments on the adaptive learning sub-network loss function, this subsection removes or replaces the Charbonnier loss (LCB), the structural similarity loss (LSSIM), the perceptual loss (LPre), and the reflection consistency loss (LRef). The experiments use the following four combinations: ① L1: only LCB is removed. ② L2: only LSSIM is removed. ③ L3: only LPre is removed. ④ L4: only LRef is removed.

The subjective visualization of the loss function ablation experiment is depicted in Fig 17, with detailed regions enlarged for clarity. From the figure, it is apparent that the color information in images enhanced with the L1 combination is altered and the image edges appear blurred. Images enhanced with the L2 and L4 combinations exhibit varying degrees of distortion and considerable noise. The L3 combination loses more image texture details and shows varying degrees of noise with a whitish color cast.

Fig 17. Subjective visualization of the loss ablation experiment.

https://doi.org/10.1371/journal.pone.0314541.g017

Regarding quantitative aspects, changes in evaluation indices after removing each loss function are presented in Table 5. It is evident from the table that whether a certain loss function is removed or replaced, the objective evaluation indices PSNR and SSIM decrease compared to those in the method proposed in this paper, indicating the effectiveness of each loss function.

Table 5. Results of objective evaluation of loss ablation experiments.

https://doi.org/10.1371/journal.pone.0314541.t005

Hyperparametric experiments with multibranch dilation convolution modules

To verify the choices of replacing standard convolution with dilated convolution and of setting the number of convolution layers to 4 in the multi-branch dilation convolution module, test experiments were conducted on the LOL dataset [22] for this module. First, the dilation rate of all dilated convolutions in the module is set to 1 (M1). Second, models with 1 (M2), 2 (M3), 3 (M4), and 5 (M5) layers are evaluated in turn. Finally, these are compared with the model in this paper. The subjective visualizations after enhancement are shown in Fig 18 with zoomed-in details.
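To see why both dilation and depth matter in this experiment, the receptive field of a stack of stride-1 dilated convolutions can be computed directly. The dilation rates below are illustrative examples, not the paper's exact settings.

```python
def receptive_field(kernel=3, dilations=(1, 2, 4, 8)):
    """Receptive field of stacked stride-1 dilated convolutions:
    each layer widens the field by (kernel - 1) * dilation."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

assert receptive_field(dilations=(1, 1, 1, 1)) == 9    # four plain 3x3 layers (M1-like)
assert receptive_field(dilations=(1, 2, 4, 8)) == 31   # example dilation schedule
```

With the same four layers, increasing dilation rates grow the receptive field far faster than stacking plain convolutions, which is the motivation for replacing standard convolutions here.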

Fig 18. Multibranch dilation convolution modules ablation experiment subjective visual map.

https://doi.org/10.1371/journal.pone.0314541.g018

The figure illustrates that the brightness enhancement from the M1 model is subtle, with noticeable artifacts in the detailed areas. On the other hand, the image enhanced with the M2 model exhibits significant noise, accompanied by edge blurring. Enhancing with the M3 model results in varying degrees of color distortion. Images enhanced with the M4 and M5 models closely resemble the results obtained with the model proposed in this paper in terms of subjective perception. To further validate the model’s effectiveness, objective evaluations are conducted using two metrics, PSNR and SSIM, with comparison results presented in Table 6.

Table 6. Objective evaluation results of ablation experiments for multibranch dilation convolution module.

https://doi.org/10.1371/journal.pone.0314541.t006

From the table, the use of dilated convolution effectively improves the PSNR and SSIM values of the image. As the number of concatenated layers increases, the PSNR and SSIM values gradually rise, reaching their best scores with the four-layer configuration used by the present algorithm. When the depth increases to five layers (M5), the metric values decrease and model performance starts to degrade. Therefore, this module adopts four layers of parallel dilated convolution for feature extraction to achieve the best results.

Limitations

In our tests, we found that our method loses a large amount of detail when processing images containing both extremely dark and overexposed areas, as shown in Fig 19. As can be seen from the window details and the area around the sun in the sky, our method is effective at enhancing brightness and retaining color and detail when processing low- or medium-brightness areas, achieving good visual results. However, when dealing with overexposed regions, it tends to over-enhance their brightness, resulting in serious loss of image detail.

Fig 19. The visual effect of our method in enhancing images with both very dark and exposed areas.

https://doi.org/10.1371/journal.pone.0314541.g019

The reason for this problem is that our network does not limit the dynamic range of the brightness of the exposed area very well. When there are both very dark and exposed regions in the image, the network favors the enhancement of the darker regions. In addition, we failed to limit the enhancement strength of the exposed regions, and over-enhanced them to the point where the brightness of the exposed regions exceeded the brightness range of the image, losing a significant amount of detail.

In summary, our method can effectively enhance the overall brightness of an image and retain a large amount of detail in low-light and nighttime environments. However, we recognize the need to improve our method to achieve better enhancement when dealing with images with extremely dark and exposed regions. In our next work, we will focus on exploring ways to better limit the extent of exposure area enhancement to address this limitation, and to improve image quality by retaining more image details when dealing with images with both dark and exposed areas.

Conclusions

This paper presents ILR-Net, a low-light image enhancement network that combines an iterative learning mechanism with Retinex theory. The network comprises an adaptive learning sub-network and a Retinex decomposition sub-network. In the adaptive learning sub-network, initial feature extraction is conducted on the input low-light image by concatenating dilated convolutions with varying dilation rates; the outputs undergo deeper learning via the feature enhancement unit and the U-Net feature learning module, and the feature fusion unit then combines these results to generate the corrected enhanced image. The Retinex decomposition sub-network applies Retinex theory to decompose the original image into illumination and reflection components; noise generated during decomposition is suppressed repeatedly to prevent detail loss in subsequent noise reduction, and the reflection component is then denoised and enhanced by the reflection denoising module. Finally, the feature maps from both branches are concatenated in the channel dimension to produce the final enhanced image. The experimental results show that the proposed method effectively improves image brightness and recovers detail and color information; it yields good visual results on seven datasets and higher scores on objective evaluation metrics. On the LOL dataset, our approach improves PSNR by 3.5% over URetinex-Net and 18.21% over QuadPrior, and MS-SSIM by 0.65% and 2.15%, respectively; on the LOLv2-Real dataset, the corresponding PSNR gains are 11.10% and 36.10%, and the MS-SSIM gains are 1.20% and 4.99%. This further demonstrates the superiority of the proposed method. In future work, we will investigate combining the method with other computer vision tasks and reducing the network size so that it can be applied in more scenarios.

References

  1. 1. Wang W, Wu X, Yuan X, and Gao Z. An experiment-based review of low-light image enhancement methods. IEEE Access, vol. 8, pp. 87884–87917, 2020.
  2. 2. Kim W. Low-light image enhancement: A comparative review and prospects. IEEE Access, vol. 10, pp. 84535–84557, 2022.
  3. 3. Rahman Z, Yi-Fei P, Aamir M, Wali S, and Guan Y. Efficient image enhancement model for correcting uneven illumination images. IEEE Access, vol. 8, pp. 109038–109053, 2020.
  4. 4. Abdullah-Al-Wadud M, Kabir M H, Dewan M A A, and Chae O. A Dynamic Histogram Equalization for Image Contrast Enhancement. 2007 Digest of Technical Papers International Conference on Consumer Electronics, Las Vegas, NV, USA, pp. 1–2, 2007.
  5. 5. Jobson D J, Rahman Z, and Woodell G A. Properties and performance of a center/surround retinex. IEEE Trans. Image Process., vol. 6, no. 3, pp. 451462, Mar. 1997. pmid:18282940
  6. 6. Rahman Z, Jobson D J, and Woodell G A. Multi-scale retinex for color image enhancement. In Proc. 3rd IEEE Int. Conf. Image Process., pp. 1003–1006. Sep. 1996.
  7. 7. Jobson D J, Rahman Z, and Woodell G A. A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Trans. Image Process., vol. 6, no. 7, pp. 965976, Jul. 2002.
  8. 8. Jobson D J. Retinex processing for automatic image enhancement. J. Electron. Imag., vol. 13, no. 1, pp. 100110, Jan. 2004.
  9. 9. Land E H. The Retinex theory of color vision. Sci. Am., vol. 237, no. 6, pp. 108–128, 1977. pmid:929159
  10. 10. Ren X, Yang W, Cheng W, and Liu J. LR3M: Robust low light enhancement via low-rank regularized retinex model. IEEE Trans. Image Process., vol. 29, pp. pmid:32286975, Apr. 2020.
  11. 11. Hao S, Han X, Guo Y, Xu X, and Wang M. Low-light image enhancement with semi-decoupled decomposition. IEEE Trans. Multimedia, early access, Jan. 27, 2020.
  12. 12. Gu Z, Li F, Fang F, and Zhang G. A novel retinex-based fractional order variational model for images with severely low light. IEEE Trans. Image Process., vol. 29, pp. pmid:31841409, Dec. 2019.
  13. 13. Hao P, Wang S, Li S, and Yang M. Low-light image enhancement based on retinex and saliency theories. in Proc. Chin. Autom. Congr., Hangzhou, China, pp. 25942597. Nov. 2019.
  14. 14. Hou Q, Zhou D, and Feng J. Coordinate Attention for Efficient Mobile Network Design. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, pp. 13708–13717, 2021.
  15. 15. Hu J, Shen L, and Sun G. Squeeze-and-Excitation Networks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, pp. 7132–7141, 2018.
  16. 16. Woo S, Park J, Lee J-Y, and Kweon I S. CBAM: Convolutional block attention module. Proc. Eur. Conf. Comput. Vis., pp. 3–19, 2018.
  17. 17. Tan S F. and Isa N A M. Exposure Based Multi-Histogram Equalization Contrast Enhancement for Non-Uniform Illumination Images. In IEEE Access, vol. 7, pp. 70842–70861, 2019.
  18. 18. Lee C, Lee C, and Kim C -S. Contrast enhancement based on layered difference representation. 2012 19th IEEE International Conference on Image Processing, Orlando, FL, USA, pp. 965–968, 2012.
  19. 19. Yun S -H, Kim J H, and Kim S. Image enhancement using a fusion framework of histogram equalization and laplacian pyramid. In IEEE Transactions on Consumer Electronics, vol. 56, no. 4, pp. 2763–2771, November. 2010.
  20. 20. Lore K G, Akintayo A, and Sarkar S. LLNet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognit., vol. 61, pp. 650–662, Jan. 2017.
  21. 21. Lv F, Lu F, Wu J, and Lim C. MBLLEN: Low-light image/video enhancement using CNNs. In Proc. Brit. Mach. Vis. Conf. (BMVC), vol. 220, pp. 4, 2018.
  22. 22. Wei C, Wang W, Yang W, and Liu J. Deep Retinex decomposition for low-light enhancement. In Proc. Brit. Mach. Vis. Conf. (BMVC), pp. 1–12, 2018.
  23. 23. Zhang Y, Zhang J, and Guo X. Kindling the darkness: A practical low-light image enhancer. In Proc. 27th ACM Int. Conf. Multimedia, pp. 1632–1640, Oct. 2019.
  24. 24. Zhang Y, Guo X, Ma J, Liu W, Zhang J. Beyond Brightening Low-light Images. Int J Comput Vis, vol. 129, pp. 1013–1037 (2021), April. 2020.
  25. 25. Guo C, Li C, Guo J, Loy C C, Hou J, Kwong S, et al. Zero reference deep curve estimation for low-light image enhancement. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 1777–1786, Jun. 2020.
  26. 26. Jiang Y, Gong X, Liu D, Cheng Y, Fang C, Shen X, et al. EnlightenGAN: Deep light enhancement without paired supervision. IEEE Trans. Image Process., vol. 30, pp. 2340–2349, 2021. pmid:33481709
  27. 27. Wu W, Weng J, Zhang P, Wang X, Yang W. and Jiang J. URetinex-Net: Retinex-based Deep Unfolding Network for Low-light Image Enhancement. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, pp. 5891–5900, 2022.
  28. 28. Ma L, Ma T, Liu R, Fan X, and Luo Z. Toward Fast, Flexible, and Robust Low-Light Image Enhancement. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 2022.
  29. 29. Nguyen H, Tran D, Nguyen K, Nguyen R. Psenet: progressive self-enhancement network for unsupervised extreme-light image enhancement. In Proceedings of the IEEE/CVF Winter Confer ence on Applications of Computer Vision, pp. 1756–1765 (2023).
  30. 30. Fu Z, Yang Y, Tu X, Huang Y, Ding X, Ma K -K. Learning a simplelow-light image enhancer from paired low-light instances. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22252–22261 (2023).
  31. Wang W, Yang H, Fu J, Liu J. Zero-Reference Low-Light Enhancement via Physical Quadruple Priors. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
  32. Yu W, Huang J, Li B, Zheng K, Zhu Q, Zhou M. Empowering Resampling Operation for Ultra-High-Definition Image Enhancement with Model-Aware Guidance. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, pp. 25722–25731, 2024.
  33. Lv X, Zhang S, Wang C, Zheng Y, Zhong B, Li C. Fourier Priors-Guided Diffusion for Zero-Shot Joint Low-Light Enhancement and Deblurring. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, pp. 25378–25388, 2024.
  34. Zhu Y, Zhao W, Li A, Tang Y, Zhou J, Lu J. FlowIE: Efficient Image Enhancement via Rectified Flow. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
  35. Shi Y, Liu D, Zhang L, Tian Y, Xia X, and Fu X. ZERO-IG: Zero-Shot Illumination-Guided Joint Denoising and Adaptive Enhancement for Low-Light Images. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2024.
  36. Wang Q, Wu B, Zhu P, Li P, Zuo W, and Hu Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, pp. 11531–11539, 2020.
  37. Yu F and Koltun V. Multi-scale context aggregation by dilated convolutions. In Proceedings of the International Conference on Learning Representations (ICLR), 2016.
  38. Wang P, Chen P, Yuan Y, Liu D, Huang Z, Hou X. Understanding Convolution for Semantic Segmentation. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, pp. 1451–1460, 2018.
  39. Wang Z, Bovik A C, Sheikh H R, and Simoncelli E P. Image quality assessment: from error visibility to structural similarity. In IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.
  40. Wang Z, Simoncelli E P, and Bovik A C. Multiscale structural similarity for image quality assessment. The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 2003, vol. 2, pp. 1398–1402.
  41. Venkatanath N, Praneeth D, Maruthi Chandrasekhar Bh, Channappayya S S, and Medasani S S. Blind image quality evaluation using perception based features. 2015 Twenty First National Conference on Communications (NCC), Mumbai, India, 2015, pp. 1–6.
  42. Mittal A, Moorthy A K, and Bovik A C. No-Reference Image Quality Assessment in the Spatial Domain. In IEEE Transactions on Image Processing, vol. 21, no. 12, pp. 4695–4708, Dec. 2012. pmid:22910118
  43. Mittal A, Soundararajan R, and Bovik A C. Making a “Completely Blind” Image Quality Analyzer. In IEEE Signal Processing Letters, vol. 20, no. 3, pp. 209–212, Mar. 2013.
  44. Zhang R, Isola P, Efros A A, Shechtman E, and Wang O. The unreasonable effectiveness of deep features as a perceptual metric. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 586–595, Jun. 2018.
  45. Zhu A, Zhang L, Shen Y, Ma Y, Zhao S, and Zhou Y. Zero-Shot Restoration of Underexposed Images via Robust Retinex Decomposition. 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK, pp. 1–6, 2020.
  46. Li C, Guo C, and Loy C C. Learning to Enhance Low-Light Image via Zero-Reference Deep Curve Estimation. In IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 8, pp. 4225–4238, Aug. 2022. pmid:33656989
  47. Liu R, Ma L, Zhang J, Fan X, and Luo Z. Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 10556–10565, Jun. 2021.
  48. Jin Y, Yang W, Tan R T. Unsupervised night image enhancement: when layer decomposition meets light-effects suppression. In Computer Vision—ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXVII, pp. 404–421. Springer, 2022.
  49. Yang W, Wang W, Huang H, Wang S, and Liu J. Sparse Gradient Regularized Deep Retinex Network for Robust Low-Light Image Enhancement. In IEEE Transactions on Image Processing, vol. 30, pp. 2072–2086, 2021.
  50. Lee C, Lee C, and Kim C-S. Contrast Enhancement Based on Layered Difference Representation of 2D Histograms. In IEEE Transactions on Image Processing, vol. 22, no. 12, pp. 5372–5384, Dec. 2013. pmid:24108715
  51. Lee C, Lee C, Lee Y Y, and Kim C. Power-constrained contrast enhancement for emissive displays based on histogram equalization. IEEE Trans. Image Process., vol. 21, no. 1, pp. 80–93, 2012. pmid:21672675
  52. Guo X, Li Y, and Ling H. LIME: Low-light image enhancement via illumination map estimation. IEEE Trans. Image Process., vol. 26, no. 2, pp. 982–993, Feb. 2017. pmid:28113318
  53. Wang S, Zheng J, Hu H-M, and Li B. Naturalness Preserved Enhancement Algorithm for Non-Uniform Illumination Images. In IEEE Transactions on Image Processing, vol. 22, no. 9, pp. 3538–3548, Sept. 2013. pmid:23661319