
Multi-scale error-driven dense residual network for image super-resolution reconstruction

  • Xueri Li,

    Roles Conceptualization, Data curation, Methodology, Software, Visualization, Writing – original draft

    Affiliation School of Computer Science, Guangdong University of Science and Technology, Dongguan, China

  • Lei Yang,

    Roles Conceptualization, Data curation, Software, Visualization

    Affiliation School of Computer Science, Guangdong University of Science and Technology, Dongguan, China

  • Shimin Liang,

    Roles Conceptualization, Formal analysis, Validation, Visualization

    Affiliation School of Computer Science, Guangdong University of Science and Technology, Dongguan, China

  • Jianfang Wu

    Roles Conceptualization, Methodology, Supervision, Writing – original draft, Writing – review & editing

    wujianfang314@gmail.com

    Affiliations Faculty of Data Science, City University of Macau, Macau, China, College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China

Abstract

Image super-resolution reconstructs high-resolution images from low-resolution inputs. However, current single-image super-resolution techniques often struggle to capture multi-scale information and extract high-frequency details, which compromises reconstruction quality. Moreover, the prevalent feed-forward network architectures lack robust feedback mechanisms for iterative refinement and enhanced acquisition of high-frequency information. To overcome these limitations, this research develops advanced strategies for multi-scale feature extraction, fusion, and feedback in single-image super-resolution. We propose an innovative error-driven, multi-scale dense residual network (EMDN) that retains a feed-forward structure while integrating error-driven feedback. Specifically, our approach utilizes dual multi-scale features: one derived from convolutional kernels of varying sizes and another extracted from diverse inputs, both processed concurrently. Comparative evaluations across different scaling factors demonstrate that our method outperforms existing approaches in both subjective and objective assessments. In particular, compared to the baseline feed-forward network, our model achieves improvements of up to 0.385% in peak signal-to-noise ratio and 0.191% in structural similarity index measure. The experimental results validate the effectiveness and practical significance of our proposed method in enhancing image resolution and restoration quality.

Introduction

Humans obtain most of their information about the outside world visually, and images are a key carrier of visual information. In the early information age, device limitations and insufficient transmission rates meant that acquired images often had low resolution and carried little information. With the rapid development of communication and computer technology, high-resolution images have become widely available and quality requirements have risen accordingly. Advances in high-resolution imaging have enabled cameras and sensors to capture more detailed image content, and super-resolution techniques are increasingly used to enhance image resolution and quality. High-resolution images offer better visual quality and richer information and detail, and they are in high demand in many applications. In remote sensing, more localized information is needed to probe the ground; in medical imaging, high-resolution images showing more tissue detail are needed to assist doctors in diagnosis [1]. Acquiring high-resolution medical images typically requires expensive imaging equipment, and increasing scanning frequency or duration to achieve higher resolution can introduce motion artifacts and elevate the physical risks to patients. In contrast, super-resolution reconstruction provides a more cost-effective and lower-risk way to obtain high-resolution medical images [2–4]. High-precision super-resolution reconstruction of facial images also offers substantial potential in law enforcement for personnel identification and screening [5], although optical degradations stemming from factors such as lens blur and aperture diffraction can significantly compromise the detail in these images [6,7]. In many computer vision applications, high-resolution images lead to superior outcomes; in target detection, for instance, they facilitate identifying smaller targets [5,8,9]. Enhancing image resolution therefore remains a vital area of inquiry [10].

A high-resolution image contains more pixels per unit of area, and more pixels means the image can carry more visual information and reproduce the real scene more faithfully [11,12]. Despite efforts to optimize image acquisition, transmission, and processing, external factors beyond our control may cause varying degrees of quality degradation. Although higher-quality imaging equipment can often improve resolution, this is not feasible in every situation because of cost and other insurmountable constraints, such as historical images that cannot be reacquired or environments where better equipment cannot be deployed [32]. Improving hardware alone is therefore not a sufficient answer to the resolution problem, and software algorithms for image super-resolution reconstruction are needed [13]. Image super-resolution does not depend on acquiring superior capture equipment and is cost-effective and widely applicable.

Since the concept of image super-resolution was introduced, numerous approaches have been proposed in this field. Traditional super-resolution algorithms can be categorized into three types: interpolation-based methods, reconstruction-based methods, and learning-based methods. However, traditional approaches suffer from insufficient detail restoration, reliance on unrealistic prior assumptions, and low computational efficiency [35,36]. These inherent constraints have catalyzed the emergence of deep learning paradigms in super-resolution, whereby self-learning systems deploy data-driven mechanisms to establish sophisticated nonlinear mappings between low- and high-resolution domains. This technological evolution has considerably enhanced both the perceptual fidelity and functional applicability of super-resolution methods. Modern neural architectures exhibit marked improvements in reconstruction quality relative to conventional algorithms, primarily due to their ability to (1) reduce reliance on human-engineered constraints by autonomously optimizing feature representations, and (2) uncover latent patterns in large datasets that surpass human perceptual capabilities. These data-derived insights facilitate the preservation of photorealistic textures and semantically consistent structures during upscaling operations, representing a paradigm shift in computer vision methodologies [35,37]. Deep learning methods are a hot research topic in image super-resolution. Despite achieving excellent results, current deep learning-based methods still face challenges including ineffective multi-scale feature fusion, a lack of feedback adjustment information, and slow running speed due to large models with many parameters. We propose an error-driven and multi-scale based dense residual network (EMDN) to address these issues. The main contributions of this work are as follows.

(1) Whereas existing super-resolution algorithms often rely on a single type of multi-scale feature, we design a new module that extracts multi-scale features from both multi-scale inputs and multi-scale convolution kernels.

(2) Built around an error-driven, multi-scale dense residual network, our method efficiently reconstructs super-resolved images. The network leverages two different kinds of multi-scale features: one extracted with convolutional kernels of various sizes, and another extracted from inputs of different sizes.

(3) The proposed network performs well at various magnification ratios, delivering superior results in both subjective and objective assessments compared with other methods.

The remainder of this paper is organized as follows. In Section Related works, we provide a systematic review of super-resolution reconstruction research. Section Materials and methods details our proposed methodology and model architecture. Comprehensive experimental evaluations and ablation studies are presented in Section Experiments and analysis. Finally, Section Conclusion concludes the paper and outlines directions for future research.

Related works

Many image super-resolution algorithms have been developed. Single-image super-resolution reconstruction presents a significant challenge [14]. Deep learning has dramatically improved super-resolution reconstruction, turning it into a thriving research area [15]. The structure of deep learning-based super-resolution reconstruction is usually designed as an end-to-end approach to generate features, discover mapping connections, and construct high-resolution images automatically.

Dong et al. [15] put forward a model named SRCNN, which was a significant advancement in the field. Its architecture contains three convolutional layers. The process begins by extracting features from a low-resolution image using convolution, which establishes the mapping relationships. Compared with conventional super-resolution techniques, SRCNN, based on convolutional neural networks, delivers superior reconstruction quality. Later, Dong et al. proposed FSRCNN, adding transpose convolution to the original method. Unlike SRCNN, FSRCNN takes the low-resolution image at its original size rather than an interpolated one as input; transpose convolution then reconstructs the high-resolution image, significantly improving reconstruction speed. However, images enlarged with transpose convolution tend to exhibit a checkerboard effect in the reconstructed high-resolution image. ESPCN [17] reorganizes individual pixels from multiple low-resolution feature channels into a single high-resolution feature channel, an operation called pixel shuffle, which avoids the checkerboard artifacts of transpose convolution while maintaining fast reconstruction speed.

Since He et al. proposed ResNet, the residual structure has been widely used; adding residual connections can further deepen networks and enhance feature extraction [18–23]. Kim et al. [18] used residual connections for super-resolution in VDSR, where global residual connections mitigate gradient vanishing and enable deeper networks that significantly improve reconstruction. Kim et al. [24] applied a recursive structure to super-resolution in DRCN, benefiting from parameter sharing and fewer model parameters. These methods reconstruct well at lower magnifications but struggle at higher ones. LapSRN [25] uses a stepwise amplification approach and a pyramid-shaped network structure, which reduces complexity and yields better high-magnification results. Lim et al. [21] removed batch normalization (BN) from their EDSR model to further expand the network and achieve excellent reconstruction, although oversized models are computationally expensive and harder to train. DBPN [26] adapted the traditional iterative back-projection method to deep learning, using up-projection and down-projection modules for scaling and self-correction, and achieves strong reconstruction results.

In recent years, significant progress has been made in super-resolution technology through advancements in model architecture optimization, degradation modeling, and cross-domain applications. The diffusion model-based MRKD [33] addresses the issues of poor detail consistency and slow sampling speed in traditional DDPM by incorporating multimodal constraints and knowledge distillation. The lightweight hybrid architecture ESRT [34] combines CNN’s local feature extraction with Transformer’s global modeling capabilities, reducing computational costs while maintaining performance advantages. For few-shot scenarios, DESRGAN [35] employs dual-stream feature extraction and artifact suppression loss functions to mitigate overfitting caused by insufficient data.

In degradation modeling and perceptual optimization, the frequency-domain loss function based on the DCT domain [39] significantly enhances the visual quality of reconstructed images by weighting high-frequency information through quantization matrices. The RSISR survey [36] systematically summarizes challenges in real-world degradation modeling and proposes domain adaptation and self-learning methods to bridge the gap between synthetic and real data. SSIR [37] innovatively introduces spatial shuffle multi-head self-attention (SS-MSA), achieving efficient global-local feature fusion while reducing parameters by 40% and improving reconstruction accuracy.

Cross-domain technology migration provides new insights for SISR: the dual-stream convolution and attention modules in the hyperspectral detection framework HCD-Net [40] could inspire multimodal super-resolution designs. The global-local contrast optimization strategy from the medical imaging enhancement algorithm G-CLAHE [42] may improve super-resolution preprocessing quality. The ensemble learning model in COVID-19 diagnosis [41] demonstrates the robustness of multi-model collaboration, offering references for joint optimization of complex degradations in super-resolution tasks. Additionally, the dynamic resource allocation concept from supply chain AI research [38] could be adapted for the lightweight deployment of super-resolution models.

Super-resolution networks typically use feedforward architectures to map low-resolution images to high-resolution ones. This work builds on that line of research and addresses the challenges outlined above: we introduce a method that merges error-driven and multi-scale strategies to achieve notable results in image super-resolution reconstruction.

Materials and methods

Multi-scale feature extraction block

Multi-scale features, interpreted as signal sampling at varying granularities, allow different features to be observed at different scales. Fig 1 shows the extraction process, which applies convolution kernels at multiple scales and fuses the resulting multi-scale features.

Multi-scale feature extraction is achieved with convolutional kernels of varying sizes, arranged in two separate branches. While larger kernels capture a broader receptive field, they also require more parameters, which can slow the network.
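For concreteness, the following is a minimal PyTorch sketch of such a two-branch multi-scale extraction block; it is not the authors' released code, and the 3×3/5×5 kernel sizes and 64-channel width are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleExtraction(nn.Module):
    """Two parallel branches with different kernel sizes, fused by a 1x1 conv."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.branch_small = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Wider receptive field, but more parameters and slower.
        self.branch_large = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f_small = self.act(self.branch_small(x))
        f_large = self.act(self.branch_large(x))
        # Concatenate the two scales and compress back to the original width.
        return self.fuse(torch.cat([f_small, f_large], dim=1))

# Spatial size is preserved: (1, 64, 48, 48) -> (1, 64, 48, 48).
features = MultiScaleExtraction()(torch.randn(1, 64, 48, 48))
```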

Error-driven fusion block

The error-driven approach, as featured in DBPN [27], eliminates the need for a cyclic structure to perform feedback. It uses a feedforward operation to fuse feedback information with the original features: two feature maps are subtracted, error information is extracted from the difference via convolution, and the result is added back to the original feature map. In DBPN, the up-projection and down-projection modules use this methodology to determine reconstruction errors and extract features. The present study builds on these ideas to formulate an error-driven mechanism for feature fusion. In contrast to the prior error-driven approach, the proposed method does not alter the feature map size and is better suited to parallel multi-branch and serial multi-scale feature fusion. The error-driven mechanism, illustrated in Fig 2, is described by Eq (1):

F_{out} = F_1 + f_{k \times k}(F_1 - F_2)    (1)

where f_{k \times k} is a convolution layer with kernel size k \times k, F_1 and F_2 are the two feature maps, and F_{out} represents the fused features.
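As a reading aid, a hedged PyTorch sketch of the fusion in Eq (1) follows; the kernel size and channel width are assumptions rather than values taken from the paper.

```python
import torch
import torch.nn as nn

class ErrorDrivenFusion(nn.Module):
    """Eq (1): subtract two feature maps, convolve the difference, add it back."""
    def __init__(self, channels: int = 64, k: int = 3):
        super().__init__()
        self.err_conv = nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2)

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        error = self.err_conv(f1 - f2)  # extract error information from the difference
        return f1 + error               # feed the error back into the original features

# The feature map size is unchanged, so the same block can serve both the
# parallel multi-branch fusion and the serial multi-scale fusion.
fused = ErrorDrivenFusion()(torch.randn(1, 64, 48, 48), torch.randn(1, 64, 48, 48))
```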

The overall structure

As seen in Fig 3, the proposed error-driven and multi-scale based dense residual network consists of three components: a preliminary feature extraction module, an error-driven and multi-scale feature extraction module, and an image reconstruction module. The low-resolution image first passes through preliminary feature extraction; the resulting features enter the error-driven, multi-scale feature extraction modules, where dense residual connections are used to extract multi-scale information and pass it to subsequent modules. The reconstruction module then generates the high-resolution image from the output features of the preceding modules.

Fig 3. Overall structure of the error-driven and multi-scale based dense residual network. The whole structure contains a preliminary feature extraction module, an error-driven and multi-scale feature extraction module, and an image reconstruction module.

https://doi.org/10.1371/journal.pone.0330615.g003

The low-resolution image's single-scale features must be extracted before entering the error-driven, multi-scale feature extraction module. There are two input low-resolution images: one at the original size and one interpolated to the enlarged size. Both are first passed through two convolutional layers to extract preliminary features, which are then fed to the multi-scale feature extraction module for further multi-scale feature extraction. The complete procedure is formulated in Eq (2):

F_O = f_{k \times k}(f_{k \times k}(I_{LR})),    F_L = f_{k \times k}(f_{k \times k}(I_{LR}^{\uparrow}))    (2)

where f_{k \times k} is a convolution layer with kernel size k \times k, F_O is the output feature map at the original size, and F_L is the output feature map at the enlarged size.
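The sketch below illustrates this preliminary stage under stated assumptions (3×3 kernels, 64 channels, bicubic interpolation for the enlarged input); it is a rough reading of the description above, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreliminaryExtraction(nn.Module):
    """Two convolutional layers on each of the two inputs (original and enlarged)."""
    def __init__(self, in_ch: int = 3, channels: int = 64, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.head_original = nn.Sequential(
            nn.Conv2d(in_ch, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.head_large = nn.Sequential(
            nn.Conv2d(in_ch, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, lr: torch.Tensor):
        # Enlarged copy of the low-resolution input (assumed bicubic).
        lr_up = F.interpolate(lr, scale_factor=self.scale, mode='bicubic',
                              align_corners=False)
        f_original = self.head_original(lr)  # features at the original size
        f_large = self.head_large(lr_up)     # features at the enlarged size
        return f_original, f_large
```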

The incoming feature map first undergoes dimensionality reduction: a bottleneck convolutional layer compresses the feature map to a smaller number of channels, as formulated in Eq (3):

(3)

Eq (3) includes a feature fusion operation. The compressed original-size features then enter two parallel convolution layers with convolution kernels of different sizes, as expressed in Eq (4):

(4)

The outputs of these two branches are the extracted features.

The two extracted features are each fused with the compressed shallow features. The resulting feature maps are then subtracted, and the difference is used to extract error information, which is added back to the original feature map. The procedure is given by Eqs (5) and (6).

(5)(6)

Unlike previous methods that rely on residual connections for feature fusion, error-driven fusion is used here. After the shallow feature information has been fused, the two branches are fused once with the error-driven mechanism to exchange information between them: the feature maps of the two paths are subtracted, error information is extracted with a convolutional layer, and the error information is added back to the original features. This process is expressed in Eq (7):

(7)

After exchanging information, the fused feature maps undergo one more round of feature extraction along the parallel paths, using convolutional layers with kernels of different sizes. The extracted features are then up-sampled, with the up-sampling factor matching the scaling factor. This process is described by Eqs (8) and (9).

(8)(9)

The up-sampling operations in Eqs (8) and (9) use PixelShuffle [22]. PixelShuffle converts a feature map with relatively low resolution into one with significantly higher resolution by combining convolution with a rearrangement of pixels across channels.
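The snippet below illustrates PixelShuffle-based up-sampling as used here; the channel width and scale factor are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

scale, channels = 2, 64
upsample = nn.Sequential(
    # Expand channels by scale^2 so PixelShuffle has pixels to redistribute.
    nn.Conv2d(channels, channels * scale ** 2, kernel_size=3, padding=1),
    # Rearrange (C*r^2, H, W) -> (C, H*r, W*r); avoids the checkerboard
    # artifacts associated with transpose convolution.
    nn.PixelShuffle(scale),
)

x = torch.randn(1, channels, 24, 24)
print(upsample(x).shape)  # torch.Size([1, 64, 48, 48])
```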

The fusion of the two enlarged features with the input compressed large-size feature, denoted as FL, is performed using the same error-driven fusion method previously applied to the original-size features. This fusion process is mathematically expressed in Eqs (10) and (11):

(10)(11)

After fusing the shallow, large-scale features, the parallel features are integrated. Specifically, the two large-scale features are combined into a single representation using an error-driven approach, as formulated in Eq (12).

(12)

The result is the large-size feature of the module, which is used both for reconstructing the final high-resolution image and as a shallow feature passed into the deeper modules. Because the interpolated low-resolution images, and the features derived from them, consume considerable computational resources, the large-size features are down-sampled back to the original size to reduce the overall cost of the model, yielding a feature map suitable for further multi-scale feature extraction. This down-sampling is expressed in Eq (13):

(13)
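One possible realization of this down-sampling, sketched below, uses a strided convolution whose stride equals the scaling factor; the exact operator is an assumption, since the text does not pin it down.

```python
import torch
import torch.nn as nn

scale, channels = 2, 64
# Kernel 4, stride 2, padding 1 halves the spatial size for a 2x model.
downsample = nn.Conv2d(channels, channels, kernel_size=scale + 2,
                       stride=scale, padding=1)

f_large = torch.randn(1, channels, 96, 96)
print(downsample(f_large).shape)  # torch.Size([1, 64, 48, 48])
```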

The output features are concatenated with the features output by the previous module and passed to the next module through dense residual connections. The input to the reconstruction module is the output of the error-driven multi-scale feature extraction modules together with the output of the preliminary feature extraction module. High-resolution reconstruction is performed according to Eq (14):

(14)

where SR denotes the final generated high-resolution image and Y0 denotes the global residual connection.
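A hedged sketch of this reconstruction step follows; it assumes the concatenated dense features are compressed by a 1×1 convolution, up-sampled with PixelShuffle, projected to RGB, and added to a global residual Y0 taken here to be the bicubic-interpolated input. The exact form of Y0 and the layer choices are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Reconstruction(nn.Module):
    """Sketch of Eq (14): SR = reconstruction(features) + Y0 (global residual)."""
    def __init__(self, in_ch: int, channels: int = 64, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.compress = nn.Conv2d(in_ch, channels, kernel_size=1)
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale))
        self.to_rgb = nn.Conv2d(channels, 3, kernel_size=3, padding=1)

    def forward(self, dense_feats: torch.Tensor, lr: torch.Tensor) -> torch.Tensor:
        y0 = F.interpolate(lr, scale_factor=self.scale, mode='bicubic',
                           align_corners=False)        # assumed global residual Y0
        sr = self.to_rgb(self.upsample(self.compress(dense_feats)))
        return sr + y0
```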

Experiments and analysis

The experiment setup and results analysis

This work was performed on Ubuntu 20.04, based on PyTorch 1.8.1, with an NVIDIA GeForce RTX 3090 GPU. The initial learning rate was set to 0.0001 and the number of training iterations to 300, with the learning rate halved after 200 iterations; the Adam optimizer was used. The batch size was 16, and the loss function was the mean squared error between the reconstructed image and the real image.
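For reference, a self-contained sketch of this training configuration is given below; the tiny stand-in model and synthetic batches only mark where the EMDN and the DIV2K patch loader would go.

```python
import torch
import torch.nn as nn

# Stand-in 2x model so the snippet runs; substitute the EMDN here.
model = nn.Sequential(nn.Conv2d(3, 3 * 2 ** 2, 3, padding=1), nn.PixelShuffle(2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)        # initial lr 0.0001
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)  # halved after 200
criterion = nn.MSELoss()                                         # MSE reconstruction loss

for epoch in range(300):
    # One synthetic batch of 16 patches stands in for the DIV2K loader.
    lr_batch = torch.randn(16, 3, 48, 48)
    hr_batch = torch.randn(16, 3, 96, 96)
    loss = criterion(model(lr_batch), hr_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```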

This study uses the first 800 images of the DIV2K dataset as the training set and five separate test sets: Set5 [23], Set14 [24], BSDS100, Manga109 and Urban100. Together these datasets form a standard evaluation framework for image super-resolution, covering a diverse range of scene types (natural, urban, and artistic), challenge dimensions (fine details, edges, and stylistic elements), and evaluation requirements (from rapid testing to rigorous validation); their value lies both in the consistency of their task objectives and in their complementary nature, which enables a comprehensive assessment of algorithm performance [43,44]. During training on DIV2K, the images were divided into patches and trained in the RGB color space; the low-resolution images were obtained from their high-resolution counterparts through bicubic degradation, a process commonly used to reduce image resolution. We take peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as evaluation metrics. These metrics are well suited to measuring the similarity between two images: they rely exclusively on simple mathematical operations (e.g., mean squared error and local pixel statistics) without requiring complex models or specialized hardware, which makes them particularly suitable for rapid validation in large-scale experiments, and their balance of efficiency, versatility, and interpretability keeps them an indispensable standard in super-resolution [45–47]. In addition, reference-free image quality metrics, specifically Spatial-Spectral Entropy-based Quality (SSEQ) and the Natural Image Quality Evaluator (NIQE), were used to appraise the quality of super-resolved images in real-world settings, because traditional metrics such as PSNR and SSIM require an ideal high-resolution reference that is unavailable for authentic data; SSEQ and NIQE therefore offer a robust alternative in this case [48,49].
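A hedged sketch of the two full-reference metrics follows: PSNR computed directly from the mean squared error, and SSIM via scikit-image's implementation. Images are assumed to be float arrays in [0, 1]; the paper's exact cropping and color-channel conventions may differ.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(sr: np.ndarray, hr: np.ndarray, data_range: float = 1.0) -> float:
    # PSNR = 10 * log10(peak^2 / MSE)
    mse = np.mean((sr.astype(np.float64) - hr.astype(np.float64)) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def ssim(sr: np.ndarray, hr: np.ndarray, data_range: float = 1.0) -> float:
    # channel_axis=-1 for H x W x C color images (scikit-image >= 0.19).
    return structural_similarity(hr, sr, data_range=data_range, channel_axis=-1)
```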

The present study compares the proposed algorithm with those of other researchers using two objective metrics, PSNR and SSIM. The compared algorithms are Bicubic [14], SRCNN [15], VDSR [18], LapSRN [20], MSRN [22], EDSR [21], MDCN [25], SeaNet [26], HDRN [27] and HBPN [28]. All comparisons are carried out on the test sets mentioned above at magnifications of 2×, 3×, 4×, and 8×: each image is first downsampled by the corresponding factor and then restored with a super-resolution algorithm at the same factor, and the reconstruction quality is evaluated by computing the PSNR and SSIM between the reconstructed image and the original. The error-driven and multi-scale based dense residual network in this paper contains 14 multi-scale feature extraction modules. The comparison results are shown in Tables 1–4.

Table 1. Quantitative comparison of different image super-resolution algorithms at 2× magnification.

https://doi.org/10.1371/journal.pone.0330615.t001

Table 2. Quantitative comparison of different image super-resolution algorithms at 3× magnification.

https://doi.org/10.1371/journal.pone.0330615.t002

Table 3. Quantitative comparison of different image super-resolution algorithms at 4× magnification.

https://doi.org/10.1371/journal.pone.0330615.t003

Table 4. Quantitative comparison of different image super-resolution algorithms at 8× magnification.

https://doi.org/10.1371/journal.pone.0330615.t004

In contrast to traditional methods such as Bicubic, the utilization of deep learning techniques confers significant benefits. Furthermore, the proposed method exhibits marked advantages when compared to other deep learning methods. Notably, disparities in reconstruction efficacy between the proposed approach and other methods are not apparent when zooming in by a factor of two. This is due to the relatively low difficulty of the zooming task, which fails to provide a clear reflection of differences between various methods. However, the proposed EMDN achieves optimal results across all datasets when zooming in by a factor of three. In addition, EMDN exhibits optimal reconstruction effects on two datasets when zooming in by a factor of four, suboptimal and near-optimal results on two datasets, and optimal results on three datasets when zooming in by a factor of eight.

In our experiments, low-resolution images at different magnifications are obtained by downsampling, and different downsampling factors cause different degrees of information loss: the larger the downsampling factor, the more image information is lost. Downsampling at a small factor may remove only a small amount of high-frequency detail, whereas downsampling at a large factor may remove a large amount of detail, making it difficult for the super-resolution algorithm to recover all of the original information. This corresponds to the results in Tables 1–4.

The present study displays the visual effect plots of butterfly images magnified 2 times on the Set5 dataset, implemented by various models, as presented in Fig 4. In terms of subjective assessments, the compared methods encompass Bicubic, SRCNN, DBPN, EDSR, MSRN, RDN, and the newly-proposed EMDN, where GT signifies the initial high-resolution image.

Fig 4. Reconstruction results of each method on the Set14 dataset at a magnification of 2×.

https://doi.org/10.1371/journal.pone.0330615.g004

The study presents a comparison of various super-resolution techniques for enhancing building images in Urban100. The comparison was conducted at 4× magnification, which is more challenging than lower magnifications. The results, depicted in Fig 5, indicate that images reconstructed by Bicubic and SRCNN appear blurred, while those generated by other super-resolution methods show obvious distortion in their patterns. In contrast, the proposed technique produces results with a high degree of similarity to the original image, without significant distortion of shape. A closer examination of the other methods' reconstructions reveals large non-existent slashes, errors in line direction, and inadequate restoration of straight lines. The proposed method avoids these limitations and reconstructs high-resolution images with richer, more intricate details that are markedly closer to the original.

Fig 5. Reconstruction effect of each method on Urban100 dataset at 4× magnification.

https://doi.org/10.1371/journal.pone.0330615.g005

Concurrently, when authentic images lack associated high-resolution counterparts, reference-free methods of assessing image quality must be used. In this research, two reference-free image quality metrics, NIQE and SSEQ, were used for comparative analysis. The Bicubic, SRCNN, RDN, DBPN, and EDSR methods were included in the comparison, and the outcomes are presented in Table 5. The method presented in this paper achieves good results in the reconstruction of real images.

Table 5. Comparison of NIQE and SSEQ indicators at 4× magnification.

https://doi.org/10.1371/journal.pone.0330615.t005

The proposed model is also compared with several prevalent methods in terms of computational cost. The comparison covers the number of parameters, the number of multiply-accumulate operations (Multi-Adds) computed on images of the same size, and the running time. The compared methods include SRCNN, VDSR, DRRN, SAN, RDN, and DBPN; Table 6 shows the comparison of the models on the Set5 dataset.

Table 6. Comparison of individual models on the Set5 dataset.

https://doi.org/10.1371/journal.pone.0330615.t006

Compared with earlier image super-resolution algorithms, recent methods have grown in parameter count, computation, and runtime, although the reconstructed images are clearer. Compared with current outstanding methods, the method proposed in this article adds fewer parameters and greatly reduces computation while achieving comparable or even better results.


Ablation analyses

To evaluate the effectiveness of the error feedback mechanism, we replace the error feedback fusion in the model with traditional cascade fusion; the resulting network is called the multi-scale feature dense residual network (MDN). Comparisons are made at the same magnification factor, and the degraded network has the same number of layers as EMDN, ensuring roughly equivalent parameter counts. The comparison results are shown in Table 7. Error feedback feature fusion outperforms traditional cascade fusion across all three datasets; compared with MDN, our EMDN consistently improves SSIM by over 0.1%, demonstrating that error feedback feature fusion significantly enhances image reconstruction quality.
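For context, the cascade-fusion baseline can be sketched as below: the two inputs are concatenated and compressed by a convolution instead of being combined by the error-driven block. The channel width and kernel size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CascadeFusion(nn.Module):
    """Ablation baseline: concatenate-and-compress instead of error-driven fusion."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([f1, f2], dim=1))
```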

Table 7. Comparison of feedback fusion and cascade fusion.

https://doi.org/10.1371/journal.pone.0330615.t007

Next, we explore the role of serial multi-scale features in the network by removing dense residual connections from the original network to create feedback multi-scale networks (FMN). Since the network contains two types of serial multi-scale feature transfer, one for small feature maps and one for large feature maps, three degraded networks are designed: FMN-s removes the dense residual connections for small feature maps, FMN-l removes them for large feature maps, and FMN-non removes all of these residual connections. For comparison at the same magnification, all of the above networks have the same size, and the comparison results are shown in Table 8.

Table 8. Comparing EMDN with networks that remove serial multi-scale feature connection.

https://doi.org/10.1371/journal.pone.0330615.t008

Table 8 indicates that the proposed EMDN outperforms all competing methods across every dataset and metric. Notably, it achieves the most significant improvements on the Urban100 dataset for PSNR (+0.385%) and for SSIM (+0.191%). Ablation studies further demonstrate that removing any component from the serial jump-connected feature fusion architecture degrades the overall reconstruction capability. In particular, the complete exclusion of dense residual connections results in the most substantial decline in image quality, thereby substantiating the critical role of the proposed multi-scale feature fusion framework based on dense residual connections.

The proposed method is experimentally demonstrated to achieve excellent reconstruction results, and ablation experiments verify the effectiveness of the designed structure.

Generalization evaluation

We collected CT images from The Cancer Imaging Archive (TCIA), a publicly accessible repository hosting a large number of CT scans. Our study utilized three datasets from TCIA: TCGA-COAD (colon adenocarcinoma CT images), TCGA-STAD (stomach adenocarcinoma CT images), and TCGA-ESCA (esophageal carcinoma CT images). From these datasets, 600 CT images were selected to form the training set, with each dataset contributing one-third of the total training data (i.e., 200 images per dataset). Additionally, 20 images per dataset were combined, shuffled, and partitioned into three independent test sets, each comprising 20 CT images. Following the experimental protocols described previously, these images were downsampled at several scaling factors to evaluate the generalizability of the proposed method.

We evaluate our method by comparing it with several alternative approaches on the custom-constructed test sets. The techniques under comparison include Bicubic interpolation, SRCNN, VDSR, and DBPN, and the reconstruction quality of the CT images is quantified at each upscaling factor using the objective metrics PSNR and SSIM. As shown in Table 9, deep learning-based methods exhibit significant advantages over traditional approaches such as Bicubic interpolation, and our proposed method further outperforms the other deep learning techniques. In addition to the quantitative evaluation, we conducted a subjective assessment by visually examining the CT images reconstructed by our method at 4× and 8× magnification (see Fig 6). The visual comparisons confirm that our approach produces CT images with a richer representation of details than the alternative methods.

Table 9. Objective comparison of various super-resolution algorithms across different scaling factors.

https://doi.org/10.1371/journal.pone.0330615.t009

Fig 6. Subjective comparison of various super-resolution reconstruction methods, with the first row displaying 4× upscaling and the second row showing 8× upscaling.

https://doi.org/10.1371/journal.pone.0330615.g006

The proposed error-driven multi-scale dense residual network (EMDN) substantially improves detail recovery in image super-resolution reconstruction by integrating multi-scale feature extraction with an error feedback mechanism. Experimental results demonstrate that EMDN outperforms state-of-the-art approaches (e.g., EDSR, RDN) in PSNR and SSIM across the tested magnification factors, notably reducing artifacts in complex texture reconstruction while maintaining efficiency with 17 million parameters and 490.4G multiply-add operations. In practical applications, EMDN can be integrated into medical imaging systems to enhance CT/MRI resolution for detecting subtle lesions, embedded in surveillance devices to improve low-light face recognition accuracy, and deployed on satellite remote sensing platforms to optimize environmental monitoring. For system deployment, scenario-specific input preprocessing (e.g., bicubic downsampling alignment) and TensorRT acceleration, which enables real-time processing at around 67 ms per image, are recommended. Future research may involve compressing the model via knowledge distillation for mobile deployment or integrating it with denoising modules to construct end-to-end enhancement pipelines, thereby extending its applicability to autonomous driving and cultural heritage restoration.

Conclusion

This study introduces an error-driven, multi-scale dense residual network designed to address the prevalent limitation of inadequate high-frequency details in super-resolution images. The proposed single-image super-resolution network strategically integrates error-driven and multi-scale feature fusion by leveraging both the original low-resolution image, which contains essential base information, and its interpolated counterpart, which provides supplementary detail. Experimental results, obtained through rigorous testing, attest to the superior reconstruction performance of the proposed method relative to its contemporaries. Comparative evaluations across various scaling factors indicate that our approach outperforms existing methods in both subjective and objective assessments. Notably, compared to a baseline feed-forward network, our model achieves improvements of up to 0.385% in peak signal-to-noise ratio and 0.191% in structural similarity index measure. Future research will explore the integration of predictive coding techniques into image super-resolution, a promising strategy that may further enhance the accuracy and detail of super-resolved images and push the boundaries of this rapidly evolving field.

References

1. Dorr F. Satellite image multi-frame super resolution using 3D wide-activation neural networks. Remote Sensing. 2020;12(22):3812.
2. Xu Q-H, Li B. Multiscale fusion for spatially decoupled multimodal MRI super-resolution reconstruction. IEEE Trans Instrum Meas. 2025;74:1–12.
3. Sun J, Zeng X, Lei X, Gao M, Li Q, Zhang H, et al. Medical image super-resolution via transformer-based hierarchical encoder–decoder network. Netw Model Anal Health Inform Bioinforma. 2024;13(1).
4. Han X, Gong X. Medical image super-resolution algorithm based on multi-scale feature aggregation. In: 2024 IEEE International Conference on Medical Artificial Intelligence (MedAI). IEEE; 2024. p. 288–93. https://doi.org/10.1109/medai62885.2024.00044
5. Zhang X, Feng C, Wang A, Yang L, Hao Y. CT super-resolution using multiple dense residual block based GAN. Signal, Image and Video Processing. 2021;15:725–33.
6. Berardini D, Migliorelli L, Galdelli A, Marín-Jiménez MJ. Edge artificial intelligence and super-resolution for enhanced weapon detection in video surveillance. Engineering Applications of Artificial Intelligence. 2025;140:109684.
7. AlHalawani S, Benjdira B, Ammar A, Koubaa A, Ali AM. DiffPlate: a diffusion model for super-resolution of license plate images. Electronics. 2024;13(13):2670.
8. Courtrai L, Pham M-T, Lefèvre S. Small object detection in remote sensing images based on super-resolution with auxiliary generative adversarial networks. Remote Sensing. 2020;12(19):3152.
9. Dai Y, Wu Y, Zhou F, Barnard K. Asymmetric contextual modulation for infrared small target detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2021. p. 950–9.
10. Huang Y, Miyazaki T, Liu X, Jiang K, Tang Z, Omachi S. Learn from orientation prior for radiograph super-resolution: orientation operator transformer. Comput Methods Programs Biomed. 2024;245:108000. pmid:38237449
11. Li Y, Zhang Y, Timofte R, Van Gool L, Yu L, Li Y, et al. NTIRE 2023 challenge on efficient super-resolution: methods and results. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2023. p. 1921–59.
12. Yang L, Zhang Z, Song Y, Hong S, Xu R, Zhao Y, et al. Diffusion models: a comprehensive survey of methods and applications. ACM Comput Surv. 2023;56(4):1–39.
13. Dudhane A, Zamir SW, Khan S, Khan FS, Yang MH. Burst image restoration and enhancement. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. p. 5759–68.
14. Foley TA. Weighted bicubic spline interpolation to rapidly varying data. ACM Trans Graph. 1987;6(1):1–18.
15. Dong C, Loy CC, He K, Tang X. Learning a deep convolutional network for image super-resolution. In: Computer Vision–ECCV 2014: 13th European Conference. 2014. p. 184–99.
16. Chen X, Wang X, Zhou J, Qiao Y, Dong C. Activating more pixels in image super-resolution transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2023. p. 22367–77.
17. Wang P, Bayram B, Sertel E. A comprehensive review on deep learning based remote sensing image super-resolution methods. Earth-Science Reviews. 2022;232:104110.
18. Kim J, Lee JK, Lee KM. Accurate image super-resolution using very deep convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016. p. 1646–54.
19. Saharia C, Ho J, Chan W, Salimans T, Fleet DJ, Norouzi M. Image super-resolution via iterative refinement. IEEE Trans Pattern Anal Mach Intell. 2023;45(4):4713–26. pmid:36094974
20. Zhang Q, Xu Y, Zhang J, Tao D. ViTAEv2: vision transformer advanced by exploring inductive bias for image recognition and beyond. Int J Comput Vis. 2023;131(5):1141–62.
21. Lim B, Son S, Kim H, Nah S, Mu Lee K. Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017. p. 136–44.
22. Li J, Fang F, Mei K, Zhang G. Multi-scale residual network for image super-resolution. In: Proceedings of the European Conference on Computer Vision. 2018. p. 517–32.
23. Chen H, He X, Qing L, Wu Y, Ren C, Sheriff RE, et al. Real-world single image super-resolution: a brief review. Information Fusion. 2022;79:124–45.
24. Kong F, Li M, Liu S, Liu D, He J, Bai Y, et al. Residual local feature network for efficient super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2022. p. 766–76.
25. Li J, Fang F, Li J, Mei K, Zhang G. MDCN: multi-scale dense cross network for image super-resolution. IEEE Trans Circuits Syst Video Technol. 2021;31(7):2547–61.
26. Fang F, Li J, Zeng T. Soft-edge assisted network for single image super-resolution. IEEE Trans on Image Process. 2020;29:4656–68.
27. Jiang K, Wang Z, Yi P, Jiang J. Hierarchical dense recursive network for image super-resolution. Pattern Recognition. 2020;107:107475.
28. Liu ZS, Wang LW, Li CT, Siu WC. Hierarchical back projection network for image super-resolution. arXiv preprint 2019. https://arxiv.org/abs/1906.06874
29. Tai Y, Yang J, Liu X. Image super-resolution via deep recursive residual network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 3147–55.
30. Dai T, Cai J, Zhang Y, Xia ST, Zhang L. Second-order attention network for single image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019. p. 11065–74.
31. Zhang Y, Tian Y, Kong Y, Zhong B, Fu Y. Residual dense network for image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 2472–81.
32. Wan Z, Zhang B, Chen D, Zhang P, Chen D, Wen F, et al. Old photo restoration via deep latent space translation. IEEE Trans Pattern Anal Mach Intell. 2023;45(2):2071–87. pmid:35349432
33. Yan J, Wang Q, Cheng Y, Su Z, Zhang F, Zhong M, et al. Optimized single-image super-resolution reconstruction: a multimodal approach based on reversible guidance and cyclical knowledge distillation. Engineering Applications of Artificial Intelligence. 2024;133:108496.
34. Lu Z, Li J, Liu H, Huang C, Zhang L, Zeng T. Transformer for single image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
35. Ma C, Mi J, Gao W, Tao S. DESRGAN: detail-enhanced generative adversarial networks for small sample single image super-resolution. Neurocomputing. 2025;617:129121.
36. Chen H, He X, Qing L, Wu Y, Ren C, Sheriff RE, et al. Real-world single image super-resolution: a brief review. Information Fusion. 2022;79:124–45.
37. Zhao L, Gao J, Deng D, Li X. SSIR: spatial shuffle multi-head self-attention for single image super-resolution. Pattern Recognition. 2024;148:110195.
38. Ahmadirad Z. The role of AI and machine learning in supply chain optimization. International Journal of Modern Achievement in Science, Engineering and Technology. 2025;2(2):1–8.
39. Arezoomand A, Cheraaqee P, Mansouri A. Perceptually optimized loss function for image super-resolution. In: 2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS). 2021. p. 1–5. https://doi.org/10.1109/icspis54653.2021.9729334
40. Seydi ST, Boueshagh M, Namjoo F, Minouei SM, Nikraftar Z, Amani M. A Hyperspectral Change Detection (HCD-Net) framework based on double stream convolutional neural networks and an attention module. Remote Sensing. 2024;16(5):827.
41. Yousefpanah K, Ebadi MJ, Sabzekar S, Zakaria NH, Osman NA, Ahmadian A. An emerging network for COVID-19 CT-scan classification using an ensemble deep transfer learning model. Acta Trop. 2024;257:107277. pmid:38878849
42. Nia SN, Shih FY. Medical X-ray image enhancement using global contrast-limited adaptive histogram equalization. arXiv preprint 2024. arXiv:2411.01373
43. Niu A, Pham TX, Zhang K, Sun J, Zhu Y, Yan Q, et al. ACDMSR: accelerated conditional diffusion models for single image super-resolution. IEEE Trans on Broadcast. 2024;70(2):492–504.
44. Zhang L, Li Y, Zhou X, Zhao X, Gu S. Transcending the limit of local window: advanced super-resolution transformer with adaptive token dictionary. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2024.
45. Martini MG. Measuring objective image and video quality: on the relationship between SSIM and PSNR for DCT-based compressed images. IEEE Trans Instrum Meas. 2025;74:1–13.
46. Zhao H, Tian L, Xiao X, Hu P, Gou Y, Peng X. AverNet: all-in-one video restoration for time-varying unknown degradations. Advances in Neural Information Processing Systems. 2024;37:127296–316.
47. Lei X, Zhang W, Cao W. DVMSR: distillated vision mamba for efficient super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2024.
48. Bouhamed SA, Kallel IK, Bossé É, Solaiman B. Two no-reference image quality assessment methods based on possibilistic Choquet integral and entropy: application to automatic fingerprint identification systems. Expert Systems with Applications. 2023;224:119926.
49. Sahu G, Seal A, Jaworek-Korjakowska J, Krejcar O. Single image dehazing via fusion of multilevel attention network for vision-based measurement applications. IEEE Trans Instrum Meas. 2023;72:1–15.