Abstract
High-resolution Magnetic Resonance Imaging (MRI) plays an important role in clinical diagnosis and pathological assessment, due to its non-invasive nature and lack of ionizing radiation. However, the acquisition of high-resolution MRI is often constrained by hardware limitations and a prolonged scanning duration. To address these limitations, super-resolution (SR) techniques have been introduced to reconstruct high-resolution images from low-resolution inputs. However, despite these advances, existing methods often struggle to effectively extract shallow features, model complex contextual dependencies, and preserve fine anatomical details. To address these limitations, we propose a Hybrid Attention and Channel Retention Network (HACR-Net) for MRI image SR. HACR-Net incorporates a Hybrid Attention Module (HAM) to mitigate information loss during shallow feature extraction by jointly leveraging channel and spatial attention, enhancing informative features and preserving spatially significant regions. A Multiscale Feature Aggregation Block (MFAB) is incorporated to capture global structure, local texture, and high-frequency details. Complementing MFAB, the Channel Retention Attention Block (CRAB) enhances the recovery of fine contextual detail through a bottleneck design crafted to maintain a wider channel width and reduce information loss during feature compression. Extensive experiments on two benchmark datasets, IXI and BraTS2018, demonstrate that HACR-Net achieves high-performance reconstruction with only 1.67M parameters and 81.3G FLOPs, offering significant reductions in model size and computational cost compared to existing methods.
Citation: Muhammad A, Hajian A, Achakulvisut T, Aramvith S (2026) HACR-Net: An Efficient hybrid attention network for MRI image super-resolution. PLoS One 21(4): e0345637. https://doi.org/10.1371/journal.pone.0345637
Editor: Pradeep Kumar Gupta, Jaypee University of Information Technology, INDIA
Received: September 12, 2025; Accepted: March 8, 2026; Published: April 8, 2026
Copyright: © 2026 Muhammad et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data used in this study are publicly available. The IXI dataset is available from the IXI Dataset repository (https://brain-development.org/ixi-dataset/). The BraTS 2018 dataset is available from the BraTS Challenge data portal (https://www.med.upenn.edu/cbica/brats2018/data.html). The source code for the proposed HACR-Net, evaluation scripts, preprocessing scripts, and the subject ID lists used for the train/validation/test splits can be accessed via the GitHub repository at: https://github.com/cuee-mdap/hcar-net.
Funding: This research is funded by the Graduate Scholarship Programme for ASEAN or Non-ASEAN Countries and the Thailand Science Research and Innovation Fund, Chulalongkorn University (IND_FF_69_105_2100_018).
Competing interests: The authors have declared that no competing interests exist.
Introduction
Medical imaging plays an important role in modern healthcare, facilitating early disease detection, precise treatment planning, and ongoing monitoring of various conditions. Commonly used modalities such as ultrasound, Positron Emission Tomography (PET), Computed Tomography (CT), and Optical Coherence Tomography (OCT) provide important diagnostic information [1–3]. However, each modality involves inherent trade-offs between spatial resolution, penetration depth, patient safety, and radiation exposure [4–6]. High-resolution (HR) Magnetic Resonance Imaging (MRI) offers a non-invasive and radiation-free alternative, making it valuable for clinical evaluation and pathological analysis [1,2,7]. However, acquiring HR MRI images remains challenging due to hardware limitations, prolonged acquisition times, the need for a sufficient signal-to-noise ratio (SNR), and patient discomfort during scanning [8–10]. These persistent limitations underscore the urgent need for strategies that can improve MRI image quality without further burdening patients or hardware systems. While this improvement could be pursued through costly equipment upgrades or extended scanning durations, both options present practical and economic constraints. A cost-effective alternative is the use of super-resolution (SR) techniques, which reconstruct HR images from low-resolution (LR) inputs [11,12]. SR offers a clinically viable solution and has become a central focus in medical image enhancement research [13].
Image SR is an ill-posed inverse problem that does not have a single definite solution for generating accurate, high-perceptual-fidelity super-resolved images [14,15]. Recently, there has been substantial growth in deep learning-based models, driven by the powerful representational capabilities of convolutional neural networks (CNNs) and their efficient implementation for both forward and backward computations [16,17]. Dong et al. [18] were the first to introduce CNNs for the image SR task through their Super-Resolution Convolutional Neural Network (SRCNN), which consists of only three convolutional layers. However, SRCNN is constrained by its shallow depth, leading to poor reconstruction of fine textures and high-frequency details. Kim et al. increased the depth of the network using residual learning in VDSR [19]; however, VDSR experiences training instability and model degradation with increasing network depth. Advanced strategies, such as residual learning and dense connections, have been employed to address these limitations.
Zhang et al. [20] improved SR performance by proposing the Residual Dense Network (RDN), which effectively utilizes the hierarchical characteristics of a deeper network architecture. Similarly, Li et al. [21] introduced a multiscale residual network, designed to leverage features across multiple scales, thus mitigating the loss of image details. These approaches emphasize the importance of leveraging hierarchical and multiscale features for enhanced image reconstruction quality. Feng et al. [22] implemented a multistage aggregation network for SR reconstruction of multi-contrast MRI images. Still, this strategy might require extensive datasets and perform poorly with single-contrast images. Meanwhile, Weng et al. [23] developed a high-frequency-focused network designed to selectively enhance high-frequency details, which could potentially overlook the low-frequency information that is essential for maintaining overall image coherence.
CNN-based SR methods rely on standard convolutional operations that treat all spatial locations and feature channels uniformly, thereby overlooking the varying diagnostic importance of different anatomical regions. Because convolutions focus primarily on local neighborhoods, these models struggle to capture the long-range contextual dependencies that are essential for accurate medical image reconstruction. Consequently, their early feature representations often lack sufficient emphasis on critical structures such as tissue boundaries, interfaces, and fine-grained pathological regions, ultimately limiting reconstruction quality [24,25]. These models thus prioritize local feature extraction while neglecting the global spatial correlations necessary for reconstructing clinically relevant details. This shortcoming is particularly evident in regions with complex tissue morphology. In addition, most of these methods treat all spatial pixels equally, an approach that conflicts with the nature of MRI data, where complex anatomical structures, diverse textures, and extensive background regions require adaptive spatial attention.
The spatial position of tissue textures is highly correlated with their complexity, so treating all pixels equally narrows the network’s ability to identify the most important regions for accurate reconstruction [26]. Hongbi et al. [27] introduced MFER to restore degraded high-resolution details through multi-level feature extraction and reconstruction modules. It maps features from each level to the high-resolution space using deconvolution layers. Although MFER performs well in capturing global information, its local information recovery capabilities remain limited. Transformer models, with their self-attention mechanisms, capture long-range dependencies that are often overlooked by CNNs; however, they can be less effective in preserving the fine-grained textural details essential for accurate reconstruction. Their ability to retain fine-grained local features depends heavily on architectural design, and some variants may not achieve an optimal balance between global context and local detail [28,29].
While SR techniques have shown promising quantitative gains in MRI, their adoption in routine clinical practice remains limited [30]. Many deep learning–based SR methods prioritize pixel-wise accuracy or perceptual quality, often at the expense of preserving fine anatomical details and tissue contrasts that are essential for clinical interpretation [31]. Additionally, the high computational complexity and multi-stage inference pipelines of many state-of-the-art methods hinder real-time processing and seamless integration into clinical workflows [32]. These limitations highlight the need for SR approaches that balance reconstruction accuracy, structural fidelity, and computational efficiency for clinical use.
Existing traditional methods [18–20] employ simple, shallow feature extraction techniques, which are insufficient to capture the complex hierarchical relationships inherent in LR brain MRI images. Moreover, the lack of effective multi-feature aggregation constrains the network’s ability to preserve global anatomical coherence, such as the organization of ventricles and major white matter tracts, as well as fine-grained details. As a result, reconstructed images often face an inherent trade-off: sacrificing global structural integrity for local detail preservation, or losing fine textures in favor of broader contextual accuracy. Additionally, traditional SR architectures [33,34] also experience progressive information degradation in their processing pipelines, where fine-grained features are lost during successive feature transformation stages. The conventional approach of using narrow channel widths and aggressive dimensionality reduction results in the loss of minute but clinically significant details such as small vessel structures, cortical folding patterns, and early-stage pathological changes [27,35,36]. This information loss is compounded by the lack of effective feature retention mechanisms that can preserve and recover fine contextual information throughout the network’s forward pass.
To address these limitations, we propose Hybrid Attention and Channel Retention Network (HACR-Net) for MRI image SR. As illustrated in Fig 1, our framework integrates a Hybrid Attention Module that combines channel and spatial attention, enabling comprehensive and hierarchical extraction of shallow features. A robust multi-feature aggregation is achieved via parallel convolutions with varying kernel sizes, followed by adaptive channel refinement to model inter-channel dependencies and generate a globally consistent representation. In addition, the channel retention attention block mitigates progressive information degradation by maintaining a wider channel width and employing a bottleneck design to prevent information loss after feature reduction.
The main contributions of this research work are summarized as follows.
- We propose a Hybrid Attention Module (HAM) that integrates channel and spatial attention to emphasize the most informative feature maps and spatially significant regions. This integration enables a comprehensive and hierarchical extraction of shallow features.
- We design a Multiscale Feature Aggregation Block (MFAB) to extract and integrate features across diverse receptive fields, enabling the simultaneous capture of fine-grained details and high-frequency information, thereby enhancing reconstruction fidelity across multiple scales in MRI data.
- We propose a Channel-Retention Attention Block (CRAB) to enhance the recovery of fine contextual details. It employs a bottleneck design that maintains a broader channel width, thereby mitigating information loss during feature compression.
- Extensive evaluations on the IXI and BraTS 2018 datasets demonstrate that our proposed HACR-Net outperforms the state-of-the-art methods in both quantitative and qualitative evaluations.
The remainder of this paper is organized as follows. The Related Work section reviews existing studies relevant to this research. The Methodology section describes the proposed method and its architecture. The Experiments section provides a comprehensive evaluation of the proposed approach, including datasets, implementation details, evaluation metrics, results and discussion, complexity analysis, and ablation studies. Finally, the Conclusion summarizes the main findings, and Future Work outlines potential directions for further research.
Related work
Over the past decade, the field of computer vision has experienced substantial growth, with a diverse range of techniques being developed. Notably, the debut of the SRCNN [18] framework catalyzed numerous transformative advances in SR research. Subsequent research has focused on enhancing the architecture and training approaches of CNNs to improve their performance. For example, Kim et al. [19] introduced the VDSR model, employing deeper CNN layers and residual learning to enhance SR. Zhang et al. [20] developed a Residual Dense Network (RDN) that uses residual dense blocks to capture local features for high SR image reconstruction. However, the RDN overlooks fine details between channels and pixels.
Kim et al. proposed DRCN [37] and Tai et al. later proposed DRRN [38], both of which use recursive learning to achieve greater depth and efficiency with fewer parameters. However, their reliance on recursive structures can lead to challenges in training stability and requires careful parameter initialization to avoid vanishing gradients. To address this, Li et al. [21] proposed a Multiscale Residual Network (MSRN) to extract fine features. At the same time, AWSRN [39] applies adaptive weighted learning to efficiently balance performance and computational demands for lightweight image SR. However, these methods rely on many scale-rate calculations and focus on feature-level attention. Dai et al. [40] replaced global average pooling with second-order statistics to capture channel relationships but overlooked texture reconstruction. To address this, Tian et al. [41] introduced a coarse-to-fine convolutional neural network (CNN) that aggregates complementary information to stabilize the training process.
Fang et al. [42] developed a soft-edge-assisted network that enhances edge textures, thereby improving reconstruction of fine detail. Sun et al. [43] proposed a weighted multiscale residual network to strengthen textural detail recovery and preserve high-frequency information. However, these models exhibit insufficiently diverse feature-context modeling, which constrains their reconstruction abilities. Sun et al. [44] employed a large depth-wise convolution to enhance fine-grained details. However, these approaches struggle with effective learning and fail to address the relationships between feature-level and channel-level attention characteristics.
Attention mechanisms [45] are pivotal in improving feature representation in Image SR models by enhancing focus on key image features. Meanwhile, recent advancements in neural networks underscore the importance of capturing spatial correlations, suggesting that integrating suitable learning strategies can further enhance feature extraction capabilities [46]. Channel attention assigns importance to different channels through weighting, effectively guiding the model’s focus. Building on this foundation, Anwar et al. [33] introduced multiscale Laplacian pyramid attention, Dai et al. [40] introduced second-order channel attention, and Liu et al. [47] introduced spatial attention mechanisms.
Hu et al. [35] pioneered the Squeeze-and-Excitation Network (SENet), which introduced an efficient channel attention mechanism by compressing 2D spatial features into channel weights to explicitly capture interdependencies. While this design significantly improved performance with minimal computational overhead, it relied on aggressive dimensionality reduction that risks discarding fine-grained and clinically important structural features. Building on this foundation, Zhang et al. [36] proposed the Residual Channel Attention Network (RCAN), highlighting the efficacy of channel attention mechanisms in improving feature representation. RCAN was the first to incorporate channel attention mechanisms into residual SR networks, enabling adaptive processing of low- and high-frequency information through the learning of channel-wise interdependence in feature maps. However, its reliance on dimensionality reduction similarly limited the retention of subtle anatomical details. To alleviate this, Wang et al. [48] introduced ECA-Net (Efficient Channel Attention), which removed explicit dimensionality reduction and used a lightweight 1D convolution to capture local cross-channel dependencies while preserving channel information and reducing computational cost. However, the localized design of ECA-Net restricts its ability to capture complex and long-range dependencies that are critical to preserve delicate structural patterns such as early pathological changes.
More recently, Zhang et al. [49] proposed the Squeeze-and-Excitation Reasoning Attention Network (SERAN) that combines channel recalibration with contextual reasoning to enhance the recovery of structural details and improve the accuracy of MRI images. Although SERAN advanced both accuracy and visual quality, it remained limited in addressing high-frequency information loss and complex anatomical structures.
SwinIR [50], a Transformer-based model, has shown promise in MRI image SR due to its ability to handle complex anatomical structures and preserve diagnostic features. However, despite its strength in capturing long-range contextual dependencies, the model can struggle to accurately reconstruct fine local details and high-frequency information, which are essential for preserving minute structural boundaries. Restormer, proposed by Zamir et al. [51], is a Transformer-based architecture employing multi-Dconv head transposed attention (MDTA) and gated-Dconv feed-forward networks (GDFN) to capture long-range dependencies. However, its high computational and memory costs, as well as its limited fine-grained structural recovery in highly textured anatomical regions, pose persistent challenges for MRI SR.
MHAN [52] employs a multi-stage spatial–channel attention cascade to suppress aliasing artifacts and recover high-frequency structural details. However, its limited ability to model diverse features and context significantly reduces its overall performance. In 2024, Hua et al. [53] introduced TDAFD, which incorporates Multi-Scale Feature Distillation (MSFD) blocks coupled with dual-attention mechanisms to enable efficient multi-layer feature extraction. However, its performance is constrained by insufficient feature-retention mechanisms, which limit the preservation and recovery of fine contextual information throughout the network’s forward pass.
Hongbi et al. [27] introduced MFER, a model designed to restore degraded high-resolution details through multi-level feature extraction and reconstruction modules, mapping features from each level to the high-resolution space via deconvolution layers. However, its multi-level feature extraction scheme can lead to feature redundancy and ineffective cross-scale feature fusion, and lacks adequate emphasis on informative features during the initial feature extraction stage. Further optimization of the feature extraction module could improve its ability to preserve fine structural details, improve edge sharpness, and minimize artifacts in the reconstructed images.
In 2025, He et al. [54] proposed a dual-channel enhancement model for MRI image SR, using complementary information from different image characteristics. One channel focuses on low-frequency information, capturing overall structure and context, while the other focuses on high-frequency details to enhance edges and textures. However, this separation can lead to information bottlenecks where either global structural coherence is sacrificed for local detail preservation, or fine-grained textures are lost in favor of broader contextual understanding.
The integration of CNNs with specialized attention mechanisms has become a dominant paradigm in biomedical engineering beyond image reconstruction. Prior studies [55,56] showed that hybrid attention–based CNNs significantly improve brain tumor classification by effectively separating pathological features from background noise. In temporal biomedical signal analysis, multi-feature fusion CNNs have been used for epileptic seizure prediction [57,58], highlighting the importance of multi-receptive-field structures. Hybrid architectures have also demonstrated strong robustness in physiological signal tasks, such as ECG authentication (99.7% accuracy) [59] and EEG-based mental-state classification [60].
Algorithm 1 HACR-Net Architecture
1: function HACR-Net(x) ▷ Input: LR image x; δ: PReLU nonlinearity; σ: sigmoid
2: F ← Conv3×3(x)
3: Fc ← σ(W2 δ(W1 GAP(F))) ⊗ F ▷ Channel attention
4: Ms ← σ(Conv7×7([AvgPool(Fc); MaxPool(Fc)])) ▷ Spatial attention
5: F1 ← Ms ⊗ Fc ▷ Shallow features (HAM)
6: Fi ← F1
7: for i = 1–10 do
8: Fres ← Fi ▷ Store CAAM input for residual
9: ▷ MFAB Block
10: B1 ← δ(Conv3×3(δ(Conv3×3(Fi))))
11: B2 ← δ(Conv1×1(δ(Conv3×3(Fi))))
12: Fm ← Refine(B1 + B2) ▷ Adaptive channel refinement
13: Fm ← δ(Conv1×1(Fm))
14: ▷ CRAB Block
15: FA ← Conv3×3(Fi)
16: g ← σ(W5 δ(W4 δ(W3 δ(W2 GAP(FA)))))
17: Fr ← FA + FA ⊗ g
18: ▷ Feature Fusion and Residual Connection
19: Fi ← Conv1×1([Fm; Fr]) ▷ Concatenate MFAB and CRAB outputs
20: Fi ← Fi + Fres ▷ Add CAAM input residual
21: end for
22: ISR ← PixelShuffle(Conv3×3(F1 + Fi)) ▷ Output
23: return ISR
24: end function
Methodology
This section provides a comprehensive overview of the proposed HACR-Net architecture. It begins with an outline of the overall network structure, followed by descriptions of its core components (HAM, MFAB, and CRAB) and the training objectives.
Overview of HACR-Net
Fig 1 illustrates the overall architecture of our proposed HACR-Net, which consists of three subnetworks: the shallow feature extraction subnetwork, the deep feature extraction subnetwork, and the reconstruction subnetwork.
Let I_LR ∈ ℝ^{C×H×W} denote the input LR MRI image, where C is the number of channels, and H and W are the height and width of the image. The shallow features F_1 are extracted using HAM, expressed as:

F_1 = H_SF(I_LR)   (1)

where H_SF(·) denotes the operation of shallow feature extraction, and F_1 represents the extracted shallow features. After obtaining the shallow feature representation F_1, we further develop CAAM to extract the deep features, denoted as F_d. This module is designed to capture richer textures, sharper edges, and semantically coherent features. It integrates a residual block, the MFAB, and the CRAB, which collaboratively extract and aggregate features from multiple layers and receptive fields to enhance the overall feature representation, as expressed in the following equation:

F_d = H_CAAM,k(H_CAAM,k−1(⋯ H_CAAM,1(F_1))) + F_1   (2)

where F_d refers to the high-level MRI feature descriptors extracted by the deep feature subnetwork. The features learned by the CAAMs are denoted by H_CAAM,k, where k is the total number of CAAM modules. The symbol ’+’ denotes the summation operation employed to fuse low-level shallow features with high-level features, thereby contributing to more stable and effective training. In the reconstruction subnetwork, the low-level spatial features F_1 are combined with the high-level semantic features F_d to produce the final super-resolved MRI image. A convolutional layer first processes these fused features, and the resulting feature maps are then upsampled using a PixelShuffle operation, formulated as:

I_SR = H_UP(H_Conv(F_1 + F_d))   (3)

where I_SR is the reconstructed SR MRI image, and H_UP(·) denotes the PixelShuffle layer operation.
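To make the upsampling step concrete, the following NumPy sketch rearranges r²·C feature channels into an r×-larger spatial grid, mirroring the sub-pixel (PixelShuffle) convention with channels-first ordering. This is an illustrative implementation under assumed shapes, not the authors' code:

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r^2, H, W) tensor into (C, H*r, W*r).

    Follows the channels-first sub-pixel convolution convention:
    channel c*r^2 + i*r + j supplies output pixel (c, h*r+i, w*r+j).
    """
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)       # split channels into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)     # interleave: (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# Scale-2 example: 4 channels collapse into 1 channel at twice the size.
lr = np.arange(16, dtype=np.float32).reshape(4, 2, 2)
sr = pixel_shuffle(lr, r=2)
print(sr.shape)  # (1, 4, 4)
```

Each 2×2 output neighborhood draws one value from each of the four input channels at the same LR location, which is how high-frequency detail is distributed spatially.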
Hybrid attention module (HAM)
To obtain comprehensive and hierarchical shallow features, we designed HAM using channel and spatial attention mechanisms. The channel attention enhances feature maps by adaptively assigning weights to each channel based on importance, while spatial attention directs the network’s focus to spatially significant regions. This design is motivated by the nature of brain MRI images, where anatomical structures and tissue boundaries often span larger areas than in natural images. The larger kernel (7×7) used in the spatial attention provides a wider receptive field, enabling the model to capture broader spatial relationships. By positioning HAM at the shallow feature extraction stage, the module preconditions the input representation, ensuring that salient structural boundaries, tissue interfaces, and subtle pathological regions are emphasized before deeper processing. This early refinement reduces the propagation of irrelevant background features, thereby enhancing the efficiency and accuracy of subsequent CAAMs.
Let I_LR denote the LR input image, processed through a 3×3 convolution layer to extract the feature F:

F = H_Conv3×3(I_LR)   (4)

which is subsequently fed into a channel attention block to obtain the channel attention map F_c, computed as:

F_c = σ(W_2 δ(W_1 GAP(F)))   (5)

where GAP(·) denotes global average pooling, W_1 and W_2 represent the first and second 1×1 convolutional layers, respectively, δ is the PReLU activation function, and σ is the sigmoid activation function used to normalize the attention weights. The GAP layer extracts channel-wise descriptors by averaging spatial information, capturing the general characteristics of each channel. The W_1 layer then reduces the channels to C/r, followed by PReLU for non-linearity. W_2 restores the channels to C, and a sigmoid activation normalizes the attention weights to obtain F_c. The refined feature map F′ is obtained by multiplying F_c and F, as shown in equation (6):

F′ = F_c ⊗ F   (6)

This helps the model focus on the most informative feature channels. The spatial attention block models spatial dependencies in feature maps, highlighting informative regions to improve fine detail. The spatial attention map M_s is given by Equation (7) as:

M_s = σ(f^{7×7}([AvgPool(F′); MaxPool(F′)]))   (7)

where f^{7×7} denotes a 7×7 convolution over the concatenated channel-wise average- and max-pooled maps. The refined feature map F_1 = M_s ⊗ F′ is obtained by multiplying M_s by F′, integrating spatial and channel-wise attention. This enables a comprehensive and hierarchical extraction of shallow features.
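The HAM chain above can be sketched in NumPy as follows. The weight matrices W1, W2 and the 7×7 kernel are random stand-ins for learned parameters, so this illustrates only the data flow (squeeze, gate, spatial reweighting), not trained behavior:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def prelu(z, a=0.25):
    return np.where(z > 0, z, a * z)

def channel_attention(F, W1, W2):
    """GAP squeeze -> reduce (W1) -> PReLU -> expand (W2) -> sigmoid gate."""
    d = F.mean(axis=(1, 2))              # channel descriptor, shape (C,)
    w = sigmoid(W2 @ prelu(W1 @ d))      # per-channel weights, shape (C,)
    return F * w[:, None, None]

def spatial_attention(Fp, K):
    """Channel-wise avg/max pooling, 7x7 'same' convolution, sigmoid map."""
    pooled = np.stack([Fp.mean(axis=0), Fp.max(axis=0)])   # (2, H, W)
    k = K.shape[-1]
    pad = k // 2
    p = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    H, W = Fp.shape[1:]
    Ms = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            Ms[i, j] = sigmoid((K * p[:, i:i + k, j:j + k]).sum())
    return Fp * Ms[None]

rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 4
F = rng.standard_normal((C, H, W))
# Random stand-ins for the learned W1 (C -> C/r) and W2 (C/r -> C) layers.
F_channel = channel_attention(F, rng.standard_normal((C // r, C)),
                                 rng.standard_normal((C, C // r)))
F1 = spatial_attention(F_channel, rng.standard_normal((2, 7, 7)))
print(F1.shape)  # (8, 16, 16)
```

Because both gates are sigmoids in (0, 1), the output magnitudes never exceed those of the input features; attention only suppresses, never amplifies.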
Context-aware aggregation module (CAAM)
CAAM integrates MFAB and CRAB with a residual block for deep feature reuse and efficient gradient flow. Together, these components capture and aggregate features at multiple scales while preserving essential structural and semantic information. MFAB processes inputs through parallel convolutional branches with varying kernel sizes to capture features across different receptive fields, combining fine-grained local details with broader structural patterns. CRAB complements this process by employing channel retention and attention strategies that maintain broader channel width, thereby enhancing feature channels and preserving critical information. The outputs of MFAB and CRAB are adaptively fused, ensuring effective integration of low- and high-level information.
Let F_{n−1} denote the input to the nth CAAM, with N representing the number of CAAM modules; the output of the nth CAAM can be expressed as:

F_n = H_FF(H_MFAB(F_{n−1}), H_CRAB(F_{n−1})) + F_{n−1}   (8)

where H_MFAB(·) denotes the function of feature extraction through the MFAB, H_CRAB(·) denotes the function of feature extraction through the CRAB, and H_FF(·) denotes feature fusion. F_N represents the feature obtained by the operation of the Nth CAAM.
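The fusion step (concatenate the MFAB and CRAB outputs along the channel axis, mix them with a 1×1 convolution, and add the block input as a residual) can be sketched in NumPy, treating the 1×1 convolution as a matrix product over the channel dimension. Wf here is an illustrative random mixing matrix, not a trained layer:

```python
import numpy as np

def fuse(fm, fr, Wf):
    """Concatenate two (C, H, W) feature maps along channels and mix them
    with a 1x1 convolution, i.e. a (C, 2C) matmul applied per pixel."""
    cat = np.concatenate([fm, fr], axis=0)              # (2C, H, W)
    C2, H, W = cat.shape
    return (Wf @ cat.reshape(C2, H * W)).reshape(-1, H, W)

rng = np.random.default_rng(3)
C, H, W = 8, 16, 16
fm = rng.standard_normal((C, H, W))    # stand-in for the MFAB output
fr = rng.standard_normal((C, H, W))    # stand-in for the CRAB output
Fin = rng.standard_normal((C, H, W))   # stand-in for the CAAM input
Wf = rng.standard_normal((C, 2 * C)) / np.sqrt(2 * C)
Fout = fuse(fm, fr, Wf) + Fin          # residual connection to the input
print(Fout.shape)  # (8, 16, 16)
```

The residual addition keeps gradient flow short across the ten stacked CAAMs, which is the stability argument made in the overview.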
Multiscale feature aggregation block (MFAB)
The Multiscale Feature Aggregation Block (MFAB), illustrated in Fig 2, is designed to extract and aggregate features using different convolution layers with varying receptive fields. This design enables the network to capture and integrate essential features across multiple spatial scales. To achieve this, MFAB employs convolution with kernels of different sizes to extract multiscale features, which are subsequently combined by element-wise summation. An adaptive channel refinement step then dynamically re-weights the aggregated features according to their relative importance. This refinement is particularly advantageous for MRI data, where tissue structures such as white matter and gray matter exhibit varying levels of textural complexity. In contrast to conventional methods such as MSRN [21] and RDN [20], which rely on simple concatenation or summation, the proposed adaptive refinement prioritizes the most salient features for reconstructing fine anatomical details, making the aggregation process more effective and context-aware.
Initially, two sequential convolution layers with 3×3 kernels and PReLU activation are applied to capture the fundamental spatial features of the input feature map.
Afterward, the process continues with two additional convolutional layers: one with a 3×3 kernel and another with a 1×1 kernel, both of which have PReLU activation, further facilitating the extraction of multiscale information.
The outputs from these pathways are fused through element-wise summation, ensuring effective feature integration across different receptive fields. To further refine feature representations, we incorporate an adaptive channel refinement to enable the network to emphasize the most relevant information. Furthermore, a 1×1 point-wise convolution layer with PReLU activation is employed to capture fine-grained details and enhance high-frequency information.
By integrating convolution layers with varying receptive fields and an adaptive channel refinement, MFAB enables effective multiscale feature learning.
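A minimal NumPy sketch of the fusion stage follows, assuming the parallel branches have already produced same-shaped feature maps. The gating here is a simplified, parameter-free stand-in for the learned adaptive channel refinement:

```python
import numpy as np

def refine_channels(F):
    """Simplified adaptive channel refinement: re-weight each channel by a
    sigmoid gate computed from its global (GAP) descriptor."""
    d = F.mean(axis=(1, 2))               # per-channel GAP descriptor, (C,)
    w = 1.0 / (1.0 + np.exp(-d))          # gate in (0, 1) per channel
    return F * w[:, None, None]

def mfab_fuse(branches):
    """Element-wise summation of parallel-branch outputs, then refinement."""
    fused = np.sum(branches, axis=0)
    return refine_channels(fused)

rng = np.random.default_rng(1)
# Stand-ins for the 3x3 / 3x3 / 1x1 branch outputs (same shape by design).
b1, b2, b3 = (rng.standard_normal((8, 16, 16)) for _ in range(3))
out = mfab_fuse(np.stack([b1, b2, b3]))
print(out.shape)  # (8, 16, 16)
```

In the real block the gate is produced by learned layers; the point of the sketch is that summation keeps the channel count fixed while refinement rescales channels by importance.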
Channel retention attention block (CRAB)
As illustrated in Fig 3, the Channel-Retention Attention Block (CRAB) is designed to mitigate the loss of fine-grained anatomical information that often occurs during channel compression in deep networks. Conventional attention modules exacerbate this issue by applying aggressive compression that discards subtle features. To address this, CRAB employs a channel retention strategy that preserves contextual details by maintaining a wider intermediate channel width.
Let the input feature map be denoted as F_in ∈ ℝ^{C×H×W}, where C, H, and W represent the number of channels, height, and width, respectively. CRAB begins by applying a 3×3 convolution, W_1, to extract local spatial features, F_A = W_1 * F_in. These features are then globally aggregated using GAP to form a compact channel descriptor that captures the global context of each feature channel. To model inter-channel dependencies while preserving representational richness, the descriptor is passed through a reduction–expansion bottleneck structure. Unlike classical SE blocks [35,61], CRAB avoids overly narrow intermediate projections. The W_3 and W_4 layers maintain a relatively wide channel width, reducing the risk of suppressing subtle but informative anatomical features. Attention weights are normalized using a sigmoid function, enabling independent reweighting of each channel, computed in Eq. (15) as:

F_CRAB = F_A + F_A ⊗ σ(W_5 δ(W_4 δ(W_3 δ(W_2 GAP(F_A)))))   (15)

where W_1 extracts spatial features, W_2 reduces dimensionality, W_3 and W_4 act as retained-width bottlenecks, and W_5 restores the original channel dimension. δ denotes the PReLU activation, and σ represents the sigmoid activation. The residual addition with F_A ensures that the original semantic information is preserved while the attention mechanism refines salient features.
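The retained-width gating can be sketched in NumPy under illustrative shapes: the descriptor passes through a mild C → C/2 reduction (rather than the aggressive C/16 reduction typical of SE blocks) before the sigmoid gate and residual addition. The weight matrices are random stand-ins for learned layers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def prelu(z, a=0.25):
    return np.where(z > 0, z, a * z)

def crab_gate(FA, Ws):
    """Channel-retention attention: GAP descriptor -> wide bottleneck
    (mild C -> C/2 reduction) -> sigmoid gate, applied with a residual
    connection that preserves the original features."""
    d = FA.mean(axis=(1, 2))              # global channel descriptor, (C,)
    for W in Ws[:-1]:
        d = prelu(W @ d)                  # retained-width projections
    gate = sigmoid(Ws[-1] @ d)            # restore C channels, normalize
    return FA + FA * gate[:, None, None]  # residual + channel reweighting

rng = np.random.default_rng(2)
C = 8
FA = rng.standard_normal((C, 16, 16))
# Illustrative wide bottleneck: C -> C/2 -> C/2 -> C.
Ws = [rng.standard_normal(s)
      for s in [(C // 2, C), (C // 2, C // 2), (C, C // 2)]]
out = crab_gate(FA, Ws)
print(out.shape)  # (8, 16, 16)
```

Because the gate lies in (0, 1) and is added on top of the identity path, every channel keeps at least its original magnitude, which is the "retention" property the block is named for.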
Training objectives
HACR-Net aims to establish an end-to-end mapping , between the LR image and the HR image. This is achieved by optimizing the model’s parameters to minimize the loss between the reconstructed image and the ground truth. Using a training dataset
, where N represents the total number of images in the training set, the most commonly used loss functions in SR are the mean square error (MSE), L1 and the mean absolute error (MAE), L2. Although these approaches can enhance PSNR, they often result in overly smoothed high-frequency details, adversely affecting the visual quality of the reconstructed image. Advanced loss functions have been introduced in natural image SR to enhance performance. However, when applied to medical images, methods such as perceptual loss [62], Charbonnier loss [63], and adversarial loss [64] can cause distortions in texture and structure, potentially compromising diagnostic accuracy and further analysis. To ensure a fair comparison with prior methods, the L1 loss function was adopted to guide model optimization.
L(θ) = (1/N) Σ_{i=1}^{N} ‖ H_HACR-Net(I_LR^(i); θ) − I_HR^(i) ‖₁  (16)
where θ represents the learnable parameters of our network, I_HR^(i) is the ground truth corresponding to I_LR^(i), and H_HACR-Net(·) refers to the overall function of our proposed HACR-Net network. In the experiment, training with degraded MRI samples under the L1 loss yields faster and more stable convergence during the training process.
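The L1 objective reduces to a mean absolute error over all pixels; a minimal numpy sketch (the 2×2 example images are arbitrary illustrative values):

```python
import numpy as np

def l1_loss(sr, hr):
    """Mean absolute error between the reconstruction and the ground truth."""
    return np.abs(sr - hr).mean()

hr = np.array([[0.2, 0.8], [0.5, 0.1]])   # ground-truth patch (toy values)
sr = np.array([[0.3, 0.6], [0.5, 0.2]])   # reconstructed patch (toy values)
# per-pixel absolute errors 0.1, 0.2, 0.0, 0.1 -> mean approximately 0.1
assert np.isclose(l1_loss(sr, hr), 0.1)
```

Averaging this quantity over the N training pairs gives the dataset-level loss in Eq. (16).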
Experiment
This section presents a comprehensive evaluation of the proposed method. The datasets used in the experiments were first introduced, followed by a detailed description of the implementation procedures and the evaluation metrics employed. The performance of our proposed method is compared with several state-of-the-art approaches to highlight its effectiveness and practical relevance. Finally, an ablation study is presented that investigates the contribution of each component in the network.
Dataset
To ensure the robustness and generalizability of the proposed approach, two benchmark datasets were utilized. Each dataset is partitioned into 70% for training, 10% for validation, and 20% for testing.
- IXI Dataset: Available from the IXI Dataset website, it comprises 578 PD volumes, 581 T1 volumes, and 578 T2 volumes. Each volume has dimensions of 256×256×96 (height, width, depth), where 96 is the number of slices in each MRI volume.
- BraTS2018: The Brain Tumor Segmentation 2018 Challenge dataset [65,66] consists of 285 cases: 210 of high-grade glioma (HGG) and 75 of low-grade glioma (LGG). Each MRI sequence has dimensions of 240 × 240 pixels, with 155 slices per volume.
The IXI dataset includes scans acquired across three hospitals using scanners from different manufacturers at both 1.5T and 3T field strengths, capturing variability in acquisition conditions. BraTS2018 comprises multi-institutional data with heterogeneous scanner protocols, and importantly, includes pathological cases (brain tumors), reflecting real-world clinical scenarios. Together, these datasets cover healthy and diseased tissue, multiple field strengths, diverse acquisition protocols, and multi-institutional variability, supporting the representativeness of our evaluation for real-world MRI scans.
The LR images are obtained by applying a 3×3 Gaussian filter with a standard deviation of 1 to the HR images, followed by bicubic downsampling with scale factors of ×2 and ×4. This degradation procedure aligns with the approach adopted in [67,68], which employed a similar strategy to generate LR MRI brain images in the spatial domain. Before training, all intensity values in the datasets are normalized to the range [0, 1]. To ensure rigorous and fair comparison, we adhered to a standardized evaluation protocol: fixed train/validation/test splits were applied consistently across all methods, and the quantitative and qualitative results were obtained on the same independent test set with identical evaluation scripts for every published model compared. This uniform setup ensures that performance differences reflect the intrinsic capabilities of each method under identical testing conditions. The subject and volume IDs for the splits are provided in our repository.
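The blur-then-downsample degradation can be sketched in plain numpy as follows. Plain decimation stands in for bicubic downsampling so the sketch is dependency-free; the actual pipeline uses bicubic interpolation, and edge padding is an assumption of this sketch.

```python
import numpy as np

def gaussian_kernel3(sigma=1.0):
    # 3x3 Gaussian kernel with the given standard deviation, normalized to sum to 1
    ax = np.array([-1.0, 0.0, 1.0])
    g = np.exp(-ax**2 / (2 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()

def degrade(hr, scale=2, sigma=1.0):
    """Blur an HR slice with a 3x3 Gaussian, then downsample by `scale`.

    Decimation replaces bicubic downsampling here to keep the sketch
    self-contained; it is not the paper's exact resampling.
    """
    k = gaussian_kernel3(sigma)
    pad = np.pad(hr, 1, mode="edge")
    blurred = np.zeros_like(hr)
    for i in range(3):                      # explicit 3x3 convolution
        for j in range(3):
            blurred += k[i, j] * pad[i:i + hr.shape[0], j:j + hr.shape[1]]
    return blurred[::scale, ::scale]

hr = np.random.default_rng(0).random((256, 256))  # stand-in for a 256x256 HR slice
lr = degrade(hr, scale=2)
assert lr.shape == (128, 128)
```

With `scale=4`, the same routine produces the ×4 inputs (64×64 from a 256×256 slice).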
Implementation details and evaluation metrics
The proposed HACR-Net was implemented in the PyTorch 2.0 framework and trained on an NVIDIA RTX 3090 Ti GPU. The Adam optimizer [69] was employed with momentum parameters β₁ and β₂, stability constant ε, and a fixed initial learning rate. The HACR-Net architecture comprises 10 CAAMs, each containing one CRAB and one MFAB. Training was conducted for 250 epochs with a batch size of 32.
The model was quantitatively evaluated using two widely adopted image quality metrics: peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). PSNR measures the ratio, on a logarithmic scale, between the maximum possible pixel value and the mean squared error (MSE) of the reconstructed image with respect to the ground-truth, as expressed in equation (17). Higher PSNR values indicate lower reconstruction error and greater fidelity to the reference image. SSIM complements PSNR by evaluating perceptual image quality through the joint assessment of luminance, contrast, and structural information [70]. Unlike pixel-wise error metrics, SSIM captures spatial dependencies between pixels, yielding a measure that is more consistent with human visual perception. Both metrics were computed across all test samples, and the results are reported as mean values to ensure a fair comparison.
PSNR and SSIM often reflect only statistical similarity between the reconstructed image and the ground truth. To address this limitation, we additionally employed the LPIPS metric, which provides a perceptual similarity assessment aligned with human visual judgment [71]. LPIPS computes deep feature embeddings from networks pre‑trained on large‑scale image datasets, capturing high‑level perceptual attributes that better reflect the visibility of anatomical structures and clinically relevant details [72]. This makes the combined metrics more representative of real‑world diagnostic relevance.
PSNR = 10 log₁₀((2^n − 1)^2 / MSE)  (17)
where n represents the image’s bit depth (e.g., 8 for an 8-bit image), so that (2^n − 1)^2 corresponds to the square of the maximum possible pixel value, and MSE (mean squared error) measures the reconstruction error, with smaller values indicating a closer match between the reconstructed image and the ground truth. Higher PSNR values indicate better image reconstruction quality.
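The PSNR computation is a one-liner over the error image; a numpy sketch for 8-bit intensities:

```python
import numpy as np

def psnr(recon, ref, n_bits=8):
    """PSNR in dB for images with n_bits of dynamic range."""
    max_val = 2**n_bits - 1
    mse = np.mean((recon.astype(np.float64) - ref.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                 # identical images
    return 10.0 * np.log10(max_val**2 / mse)

ref = np.full((4, 4), 100.0)
recon = ref + 5.0                           # constant error of 5 -> MSE = 25
print(round(psnr(recon, ref), 2))           # 10*log10(255^2 / 25) ≈ 34.15
```

A 0.73 dB gain, as reported on BraTS 2018 at ×2, corresponds to roughly an 18% reduction in MSE on this logarithmic scale.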
SSIM(x, y) = ((2 μ_x μ_y + C1)(2 σ_xy + C2)) / ((μ_x^2 + μ_y^2 + C1)(σ_x^2 + σ_y^2 + C2))  (18)
where μ_x and μ_y are the local means, σ_x and σ_y are the standard deviations, and σ_xy is the cross-covariance of images x and y, respectively, and C1 and C2 are constant terms.
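The SSIM formula can be sketched with a single global window; the standard metric instead averages this quantity over local (e.g., 11×11 Gaussian-weighted) windows, so this is a simplified illustration, with the usual C1 and C2 defaults for 8-bit data assumed.

```python
import numpy as np

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window SSIM over the whole image (illustrative simplification)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()                  # sigma_x^2, sigma_y^2
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()        # sigma_xy
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    )

rng = np.random.default_rng(0)
x = rng.random((32, 32)) * 255
y = np.clip(x + rng.normal(0, 10, x.shape), 0, 255)  # noisy copy of x
assert np.isclose(ssim_global(x, x), 1.0)            # identical images score 1
assert ssim_global(x, y) < 1.0
```

The local-window average is what makes SSIM sensitive to spatial structure rather than only to global statistics.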
LPIPS(x, y) = Σ_l (1 / (H_l W_l)) Σ_{h,w} ‖ w_l ⊙ (f̂_hw^l(x) − f̂_hw^l(y)) ‖₂^2  (19)
where H_l and W_l denote the height and width of the input feature at layer l in the pretrained network, the term w_l corresponds to the learned weight assigned to each channel in layer l, and f̂_hw^l(x) and f̂_hw^l(y) represent the normalized feature vectors at spatial location (h, w) for images x and y, respectively. The operator ⊙ indicates element-wise multiplication between the feature vectors and their associated weights.
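The per-layer LPIPS distance can be sketched in numpy. Real LPIPS extracts the feature maps from a pretrained network (e.g., AlexNet or VGG); the random features and weights below are stand-ins used only to exercise the formula.

```python
import numpy as np

def lpips_layer_distance(fx, fy, w):
    """Weighted distance between unit-normalized deep features of one layer.

    fx, fy : (C, H, W) feature maps for images x and y (from a pretrained net)
    w      : (C,) learned per-channel weights
    """
    def unit_norm(f):
        # Normalize each spatial feature vector to unit length along channels
        return f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-10)

    diff = w[:, None, None] * (unit_norm(fx) - unit_norm(fy))
    # Average the squared L2 distance over all H*W spatial locations
    return (diff**2).sum(axis=0).mean()

rng = np.random.default_rng(0)
fx = rng.standard_normal((8, 4, 4))          # stand-in features for image x
fy = fx + 0.1 * rng.standard_normal((8, 4, 4))  # slightly perturbed features
w = np.abs(rng.standard_normal(8))           # stand-in learned channel weights
d_same = lpips_layer_distance(fx, fx, w)     # identical features -> 0
d_diff = lpips_layer_distance(fx, fy, w)
assert d_same == 0 and d_diff > 0
```

Summing this quantity over the selected layers l gives the full LPIPS score; lower values indicate closer perceptual similarity.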
Results and discussion
This subsection presents a comprehensive evaluation of the proposed method through quantitative and qualitative analyses. The quantitative assessment uses PSNR, SSIM, and LPIPS to compare the performance of the proposed method with existing approaches. The qualitative analysis involves visual inspection, focusing on perceptual quality and structural fidelity.
Quantitative comparison
In this subsection, we provide a comprehensive quantitative evaluation of the proposed HACR-Net and compare its performance with that of state-of-the-art SR methods. The compared methods include SRCNN [18], VDSR [19], EDSR [73], RDN [20], RCAN [36], CSN [14], SERAN [49], SwinIR [50], Restormer [51], MFER [27], and SenseSR [54]. Table 1 presents the quantitative results of the compared methods on the BraTS 2018 dataset, while Table 2 presents the results on the IXI dataset. The proposed HACR-Net consistently outperforms state-of-the-art methods, achieving superior SR performance across the two datasets at scaling factors of ×2 and ×4.
On the BraTS 2018 dataset, HACR-Net demonstrates strong robustness, providing high-fidelity reconstructions, and maintaining competitive accuracy at the ×4 scale. Figs 4–6 present the PSNR, SSIM, and LPIPS comparisons between the proposed method and various state-of-the-art approaches on the IXI dataset. At the ×4 scaling factor, HACR-Net achieves PSNR values of 33.88 dB, 32.71 dB, and 32.89 dB for PD, T1, and T2, respectively, surpassing the second-best method, SenseSR, which records 33.80 dB, 32.34 dB, and 32.75 dB on the same modalities. This improvement highlights HACR-Net’s robustness in reconstructing fine anatomical details and complex textures under challenging upscaling conditions. For the BraTS 2018 dataset, HACR-Net achieves a PSNR of 43.51 dB at ×2 upscaling, surpassing the second-best method, SenseSR, which attains 42.78 dB. Similar improvements are evident at ×4 upscaling, where HACR-Net attains 34.80 dB, outperforming the second-best method’s 33.91 dB. In Tables 1 and 2, the best results are shown in bold and the second-best in italics.
HACR-Net achieves a 0.73 dB improvement in PSNR over the second-best method, SenseSR, on the BraTS 2018 dataset at ×2 (Table 1). This represents a meaningful reduction in reconstruction error, not a marginal gain. The LPIPS score of 0.0822, compared to SenseSR’s 0.0903, further demonstrates that HACR-Net produces reconstructions that are perceptually closer to the ground truth. This is especially important in clinical applications, where diagnostic reliability depends on both pixel-level accuracy and perceptual fidelity.
As shown in Table 2, for a ×2 scaling factor, HACR-Net achieves PSNR/SSIM scores of 42.10 dB / 0.9907 for the PD modality, 39.34 dB / 0.9867 for the T1 modality, and 41.15 dB / 0.9891 for the T2 modality on the IXI dataset with a low LPIPS score. Compared to Restormer [51], MFER [27], and SenseSR [54], HACR-Net shows consistent improvements in PSNR and SSIM in all the magnification factors evaluated. This performance is attributed to the integration of the HAM module in the shallow layers and the CAAM module in the deep feature extraction stage, which enables the preservation of diagnostically relevant information and enhances the reconstruction quality of fine anatomical structures, including cortical boundaries and white matter tracts.
Qualitative comparison
The qualitative results presented in Fig 7 provide direct visual validation of the quantitative results reported in Tables 1 and 2. The error maps generated by the proposed HACR-Net exhibit smoother distributions compared to those of the competing methods. This smoothness is consistent with the higher PSNR and SSIM scores achieved across all test cases. The reconstruction improvements are clearly visible along complex structural edges and low-contrast boundary regions.
The corresponding error maps highlight differences in reconstruction accuracy.
Fig 7 shows the visual comparison of our method with other state-of-the-art approaches, along with their corresponding error maps. Regions with more texture indicate higher reconstruction errors, whereas smoother areas reflect improved accuracy. The competing methods concentrate errors around high-frequency structural edges and low-contrast boundaries, whereas HACR-Net minimizes reconstruction errors and maintains structural fidelity in these critical regions. This improvement is driven by the roles of MFAB, which aggregates features across multiple receptive fields to capture fine-grained and global structural patterns, and CRAB, which preserves anatomical context while mitigating information loss from aggressive dimensionality compression, unlike SERAN, which visibly shows distortion around fine anatomical regions.
As shown in Fig 8, HACR-Net appears to preserve the fine cortical folding of the frontal lobe (green arrow) more effectively and maintains the connectivity of nearby ridges by addressing suboptimal feature representations at the initial stage using HAM. In contrast, other methods, such as SwinIR and Restormer, exhibit partial discontinuities and smoothing in these regions. Fig 9 shows that HACR-Net maintains a clear separation of adjacent tissue types and preserves structural transitions, whereas other approaches, such as Restormer and MFER, tend to merge or blur tissue boundaries. Extending this strength to the pathological regions shown in Fig 10, HACR-Net delineates tumor boundaries more sharply while suppressing the halo artifacts observed in competing methods.
The green box indicates the zoomed-in region, where local textures and fine anatomical structures can be clearly assessed.
The green box highlights the zoomed-in region, enabling evaluation of structural fidelity and reconstruction detail.
The green box highlights the zoomed-in region, where edge sharpness and boundary preservation can be clearly observed.
This performance stems from HACR-Net’s enhanced feature aggregation, which enables the recovery of high-frequency edge information while preserving global anatomical coherence. The visual comparisons further confirm that the observed improvements in reconstructed images are both meaningful and directly attributable to the network’s specialized architecture. Specifically, the HAM module effectively preserves shallow structural features (Fig 8, green arrow), while the CRAB module enhances fine contextual details (Fig 9, clear tissue separation). Moreover, these components jointly contribute to greater robustness against input degradation by emphasizing anatomically relevant regions and stabilizing intermediate feature representations. These qualitative improvements are consistent with the quantitative gains summarized in Tables 1 and 2, collectively validating the method’s performance and resilience under adverse imaging conditions.
Complexity analysis
HACR-Net demonstrates exceptional computational efficiency and scalability. As shown in Table 3, it requires only 1,674k parameters, 81.3G FLOPs, and 1.25 s of inference time on T2-weighted IXI scans. Its modular CAAM architecture further enables flexible scaling: reducing the number of CAAMs from 10 to 7 results in minimal PSNR degradation (32.61 vs. 32.89 dB) while proportionally reducing parameters, with computational cost scaling linearly with network depth (Table 4). This demonstrates that HACR-Net preserves reconstruction quality under constrained computational budgets and can be flexibly adapted to varying input resolutions. Performance and complexity trade-offs on the BraTS 2018 dataset are further illustrated in Fig 11 and Fig 12.
In comparison with existing methods, HACR-Net exhibits a consistently favorable efficiency–performance trade-off. For instance, although VDSR [19] employs fewer parameters (665k), it incurs a substantially higher computational burden (1,225.2G FLOPs), whereas RCAN [36] significantly increases the parameter count (12,460k) to achieve lower FLOPs (24.5G). Similarly, despite having a parameter scale comparable to MFER [27], HACR-Net reduces computational cost and achieves faster inference (1.25 s vs. 1.92 s), also outperforming CSN [14] and SwinIR [50] in runtime. HACR-Net occupies a middle ground between ultralightweight architectures such as SRCNN [18] and large-scale networks such as RDN [20], while maintaining reasonable computational efficiency. More notably, compared with the transformer-based Restormer [51], HACR-Net achieves comparable FLOPs with fewer parameters.
Ablation study
We conduct a series of ablation experiments to evaluate the impact of key components of HACR-Net, including HAM, the number of CAAMs, and its two blocks (MFAB and CRAB).
Ablation of the number of CAAM.
To examine the trade-off between reconstruction quality and computational efficiency, and to assess the impact of CAAM quantity, we conducted an ablation study by varying the number of CAAM modules. As shown in Table 4, performance consistently improves as the number of CAAMs increases, but the gains in PSNR, SSIM, and LPIPS become negligible beyond 10 modules. During this increase, the parameter count rises only slightly from 1.67M at 7 CAAMs to 1.68M at 12 CAAMs, which reflects a clear balance between quality and complexity. Using 10 CAAMs provides the optimal configuration, yielding the highest PSNR and SSIM with a lower LPIPS while maintaining a moderate parameter count. Additional stacking leads to diminishing returns with increased computational overhead, whereas reducing to 7 CAAMs minimally lowers parameters but results in a measurable 0.28 dB drop in PSNR. Overall, the ablation confirms that HACR‑Net is structured to maximize reconstruction fidelity while retaining practical efficiency.
Our architectural choices were guided by systematic ablation studies that explicitly balance reconstruction fidelity and computational efficiency. Convolutional kernel sizes across HACR-Net were selected based on their optimal performance–parameter trade-off, achieving strong reconstruction quality without unnecessary model complexity. The placement of the HAM in the shallow layers prior to deep feature extraction was empirically validated through ablation experiments, yielding improvement over alternative placements. This indicates that early attention-based filtering suppresses irrelevant features before they propagate through subsequent CAAMs. The 7×7 kernel used in the spatial attention branch was chosen to effectively capture broader anatomical context while maintaining computational efficiency. The number of CAAMs (10 modules) and the bottleneck width of the CRAB were determined based on the performance saturation trends observed in Tables 4 and 5, beyond which further architectural complexity resulted in diminishing gains. Collectively, these design decisions enable HACR-Net to achieve state-of-the-art reconstruction performance with only 1.67M parameters and 81.3G FLOPs.
Ablation study on CRAB with and without a bottleneck.
To evaluate the effectiveness of the CRAB module, we performed an ablation study comparing variants with and without the bottleneck design. As shown in Table 5, incorporating the bottleneck consistently improves reconstruction quality across the modalities at ×2 and ×4 scaling factors. For example, in the PD case at ×2, the PSNR increased from 41.79 dB to 42.10 dB and the LPIPS decreased from 0.0963 to 0.0835. Meanwhile, at ×4, the SSIM improved from 0.9544 to 0.9583. Similarly, in the T1 modality, the bottleneck yielded a gain of nearly 0.7 dB in PSNR (from 38.66 to 39.34 dB) at ×2 and reduced LPIPS from 0.2221 to 0.2014 at ×4. These results confirm that the bottleneck design, which maintains a wide channel width, effectively mitigates information loss during feature reduction and facilitates the recovery of fine contextual details.
Ablation study on the effectiveness of HAM, MFAB, and CRAB.
We conducted an extensive ablation study to assess the individual and joint contributions of the three core modules, HAM, MFAB, and CRAB, to the overall performance of HACR-Net. The experiments were performed on the IXI (PD) dataset at a ×2 up-sampling factor as summarized in Table 6. The baseline model with all three modules removed achieves a PSNR of 40.50 dB. When the modules are added individually, MFAB and CRAB provide incremental gains (40.62 dB and 40.69 dB, respectively), with CRAB offering the largest uplift (+0.19 dB), validating its effectiveness in channel retention and deep-feature refinement.
Beyond the individual performance gains, a clear synergistic effect is observed among the proposed modules. Pairwise combinations such as HAM+MFAB (41.82 dB) and HAM+CRAB (41.73 dB), show substantial enhancements over single-module variants, indicating complementary roles. HAM facilitates shallow-feature preservation, MFAB enhances multi-scale feature aggregation, and CRAB reinforces deep-feature retention. Integrating all three components yields the best performance, with the full model achieving 42.10 dB/0.9907/0.0835 (PSNR/SSIM/LPIPS). This represents a 0.28–0.37 dB improvement over any two-module configuration, demonstrating that the modules are not redundant but instead work collaboratively to optimize the feature space.
We further examine the trade-off between model complexity and performance across individual modules and their combined configuration, as summarized in Table 6. Although incorporating HAM, MFAB, and CRAB increases the parameter count and computational cost, their combination yields consistent improvements in PSNR, SSIM, and LPIPS. These results demonstrate that the additional complexity is justified by the corresponding quantitative gains.
To ensure statistical rigor, Table 7 reports the mean and standard deviation of PSNR and SSIM for the PD, T1, and T2 modalities, computed over 11,136 test images. The standard deviation was calculated across all individual test slices to reflect subject-level variability. These results offer a more comprehensive view of the model’s consistency and robustness across the full benchmark dataset.
Conclusion
In this work, we present HACR-Net, a model designed to address key challenges in brain MRI SR. The framework effectively preserves structural boundaries, tissue interfaces, and multiscale features while maintaining a compact and computationally efficient architecture. The integration of HAM strengthens shallow feature extraction by prioritizing informative channels and spatially significant regions. At the same time, the CAAM module, comprising the MFAB and CRAB blocks, synergistically enhances reconstruction fidelity by capturing fine-grained details and preserving contextual information with minimal feature loss. Evaluations on the IXI and BraTS 2018 benchmark datasets confirm that HACR-Net achieves consistent improvements over state-of-the-art methods at scaling factors of ×2 and ×4. By combining competitive reconstruction accuracy with substantially reduced computational cost, HACR-Net demonstrates a balanced advancement in both performance and efficiency.
Future work
Future extensions of HACR-Net will focus on improving anatomical fidelity, enhancing efficiency, and advancing clinical applicability. Feature representation across spatial hierarchies will be strengthened by incorporating multi-scale attention-guided fusion strategies, while edge-aware loss functions such as gradient consistency and structural dissimilarity will be explored to improve boundary precision and high-frequency detail reconstruction. To address the computational demands of real-time clinical deployment, we will investigate architectural optimizations including depthwise separable convolutions, group convolutions, structured pruning, and quantization-aware training, with the goal of reducing inference time and model size without compromising image quality. To help bridge the gap between synthetic LR data and real clinical conditions, future work will broaden the evaluation by incorporating additional MRI degradation scenarios, such as k-space undersampling or anisotropic resolution, enabling future studies to extend the evaluation to a wider range of clinically relevant conditions.
Acknowledgments
The authors express their sincere gratitude to the Department of Electrical Engineering, Faculty of Engineering, Chulalongkorn University, for their support.
References
- 1. Hussain S, Mubeen I, Ullah N, Shah SSUD, Khan BA, Zahoor M, et al. Modern Diagnostic Imaging Technique Applications and Risk Factors in the Medical Field: A Review. Biomed Res Int. 2022;2022:5164970. pmid:35707373
- 2. Qiu D, Zhang S, Liu Y, Zhu J, Zheng L. Super-resolution reconstruction of knee magnetic resonance imaging based on deep learning. Comput Methods Programs Biomed. 2020;187:105059. pmid:31582263
- 3. Ren J, Li J, Chen S, Liu Y, Ta D. Unveiling the potential of ultrasound in brain imaging: Innovations, challenges, and prospects. Ultrasonics. 2025;145:107465. pmid:39305556
- 4. Kennedy JA, Israel O, Frenkel A, Bar-Shalom R, Azhari H. Super-resolution in PET imaging. IEEE Trans Med Imaging. 2006;25(2):137–47. pmid:16468448
- 5. Tsapaki V. Radiation dose optimization in diagnostic and interventional radiology: Current issues and future perspectives. Phys Med. 2020;79:16–21. pmid:33035737
- 6. Koutsiaris AG, Batis V, Liakopoulou G, Tachmitzi SV, Detorakis ET, Tsironi EE. Optical Coherence Tomography Angiography (OCTA) of the eye: A review on basic principles, advantages, disadvantages and device specifications. Clin Hemorheol Microcirc. 2023;83(3):247–71. pmid:36502308
- 7. Xu L, Li G, Chen Q. Accurate and lightweight MRI super-resolution via multi-scale bidirectional fusion attention network. PLoS One. 2022;17(12):e0277862. pmid:36520931
- 8. Pfaehler E, Pflugfelder D, Scharr H. Untrained perceptual loss for image denoising of line-like structures in MR images. PLoS One. 2025;20(2):e0318992. pmid:40009630
- 9. Plenge E, Poot DHJ, Bernsen M, Kotek G, Houston G, Wielopolski P, et al. Super-resolution methods in MRI: can they improve the trade-off between resolution, signal-to-noise ratio, and acquisition time?. Magn Reson Med. 2012;68(6):1983–93. pmid:22298247
- 10. Muhammad A, Aramvith S, Duangchaemkarn K, Sun M-T. Brain MRI Image Super-Resolution Reconstruction: A Systematic Review. IEEE Access. 2024;12:156347–62.
- 11.
Fang C, Zhang D, Wang L, Zhang Y, Cheng L, Han J. Cross-Modality High-Frequency Transformer for MR Image Super-Resolution. In: Proceedings of the 30th ACM International Conference on Multimedia, 2022. 1584–92. https://doi.org/10.1145/3503161.3547804
- 12. You S, Lei B, Wang S, Chui CK, Cheung AC, Liu Y, et al. Fine Perceptive GANs for Brain MR Image Super-Resolution in Wavelet Domain. IEEE Trans Neural Netw Learn Syst. 2023;34(11):8802–14. pmid:35254996
- 13. Chen J, Wu F, Wang W. Joint MR image reconstruction and super-resolution via mutual co-attention network. Journal of Computational Design and Engineering. 2023;11(1):288–304.
- 14. Zhao X, Zhang Y, Zhang T, Zou X. Channel Splitting Network for Single MR Image Super-Resolution. IEEE Trans Image Process. 2019;28(11):5649–62. pmid:31217110
- 15. Hajian A, Aramvith S. AERU-Net: Adaptive Edge Recovery and Attention U-Shaped Network for Remote Sensing Image Super-Resolution. IEEE Access. 2025;13:59177–97.
- 16. Luo X, Ai Z, Liang Q, Liu D, Xie Y, Qu Y, et al. AdaFormer: Efficient Transformer with Adaptive Token Sparsification for Image Super-resolution. AAAI. 2024;38(5):4009–16.
- 17. Wang S, Liu J, Wan B, Li W. Hybrid feature fusion neural network integrating transformer for DCE-MRI super resolution. Biomedical Signal Processing and Control. 2023;86:105342.
- 18. Dong C, Loy CC, He K, Tang X. Image Super-Resolution Using Deep Convolutional Networks. IEEE Trans Pattern Anal Mach Intell. 2016;38(2):295–307. pmid:26761735
- 19.
Kim J, Lee JK, Lee KM. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 1646–54. https://doi.org/10.1109/cvpr.2016.182
- 20.
Zhang Y, Tian Y, Kong Y, Zhong B, Fu Y. Residual Dense Network for Image Super-Resolution. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. 2472–81. https://doi.org/10.1109/cvpr.2018.00262
- 21.
Li J, Fang F, Mei K, Zhang G. Multi-scale Residual Network for Image Super-Resolution. Lecture Notes in Computer Science. Springer International Publishing. 2018. p. 527–42. https://doi.org/10.1007/978-3-030-01237-3_32
- 22.
Feng CM, Fu H, Yuan S, Xu Y. Multi-contrast MRI super-resolution via a multi-stage integration network. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, 2021. 140–9.
- 23.
Weng X, Chen Y, Zheng Z, Gu Y, Zhou J, Zhang Y. A high-frequency focused network for lightweight single image super-resolution. 2023. https://arxiv.org/abs/2303.11701
- 24. Yang Y, Qi Y. Hierarchical accumulation network with grid attention for image super-resolution. Knowledge-Based Systems. 2021;233:107520.
- 25. Zou B, Ji Z, Zhu C, Dai Y, Zhang W, Kui X. Multi-scale deformable transformer for multi-contrast knee MRI super-resolution. Biomedical Signal Processing and Control. 2023;79:104154.
- 26. Wang H, Hu X, Zhao X, Zhang Y. Wide Weighted Attention Multi-Scale Network for Accurate MR Image Super-Resolution. IEEE Trans Circuits Syst Video Technol. 2022;32(3):962–75.
- 27. Li H, Jia Y, Zhu H, Han B, Du J, Liu Y. Multi-level feature extraction and reconstruction for 3D MRI image super-resolution. Comput Biol Med. 2024;171:108151. pmid:38387383
- 28. Puttagunta M, Subban R, Babu C NK. SwinIR Transformer Applied for Medical Image Super-Resolution. Procedia Computer Science. 2022;204:907–13.
- 29.
Chen K, Li L, Liu H, Li Y, Tang C, Chen J. SwinFSR: Stereo Image Super-Resolution using SwinIR and Frequency Domain Knowledge. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2023. 1764–74. https://doi.org/10.1109/cvprw59228.2023.00177
- 30. Ikebe Y, Fujima N, Kameda H, Harada T, Shimizu Y, Kwon J, et al. Ultra-fast whole-brain T2-weighted imaging in 7 seconds using dual-type deep learning reconstruction with single-shot acquisition: clinical feasibility and comparison with conventional methods. Jpn J Radiol. 2026;44(1):35–42. pmid:41003971
- 31. Sherif FM, Elmogy SA, Denewar FA. Utility of deep learning reconstruction to reach the magic triangle in brain MRI. Egypt J Radiol Nucl Med. 2025;56(1).
- 32. Liu X, Huang C, Meng J, Chen Q, Ji W, Wang Q. Super-Resolution Reconstruction Approach for MRI Images Based on Transformer Network. AI. 2025;6(11):291.
- 33. Anwar S, Barnes N. Densely Residual Laplacian Super-Resolution. IEEE Trans Pattern Anal Mach Intell. 2022;44(3):1192–204. pmid:32877331
- 34. Suryanarayana G, Nimmagadda SM, Nageswara Rao S, Y Mahnashi AM, Kondamuri SR, Hussain Ahmadini AA, et al. Enhanced MRI-PET fusion using Laplacian pyramid and empirical mode decomposition for improved oncology imaging. PLoS One. 2025;20(5):e0322443. pmid:40388483
- 35.
Hu J, Shen L, Sun G. Squeeze-and-Excitation Networks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. 7132–41. https://doi.org/10.1109/cvpr.2018.00745
- 36.
Zhang Y, Li K, Li K, Wang L, Zhong B, Fu Y. Image Super-Resolution Using Very Deep Residual Channel Attention Networks. Lecture Notes in Computer Science. Springer International Publishing. 2018. p. 294–310. https://doi.org/10.1007/978-3-030-01234-2_18
- 37.
Kim J, Lee JK, Lee KM. Deeply-Recursive Convolutional Network for Image Super-Resolution. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 1637–45. https://doi.org/10.1109/cvpr.2016.181
- 38.
Tai Y, Yang J, Liu X. Image Super-Resolution via Deep Recursive Residual Network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 2790–8. http://doi.org/10.1109/cvpr.2017.298
- 39.
Wang C, Li Z, Shi J. Lightweight image super-resolution with adaptive weighted learning network. arXiv preprint arXiv:190402358. 2019.
- 40.
Dai T, Cai J, Zhang Y, Xia S-T, Zhang L. Second-Order Attention Network for Single Image Super-Resolution. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 11057–66. https://doi.org/10.1109/cvpr.2019.01132
- 41. Tian C, Xu Y, Zuo W, Zhang B, Fei L, Lin C-W. Coarse-to-Fine CNN for Image Super-Resolution. IEEE Trans Multimedia. 2021;23:1489–502.
- 42. Fang F, Li J, Zeng T. Soft-edge Assisted Network for Single Image Super-Resolution. IEEE Trans Image Process. 2020. pmid:32092001
- 43. Sun L, Liu Z, Sun X, Liu L, Lan R, Luo X. Lightweight Image Super-Resolution via Weighted Multi-Scale Residual Network. IEEE/CAA J Autom Sinica. 2021;8(7):1271–80.
- 44. Sun L, Pan J, Tang J. Shufflemixer: An efficient convnet for image super-resolution. Advances in Neural Information Processing Systems. 2022;35:17314–26.
- 45.
Woo S, Park J, Lee J-Y, Kweon IS. CBAM: Convolutional Block Attention Module. Lecture Notes in Computer Science. Springer International Publishing. 2018. p. 3–19. https://doi.org/10.1007/978-3-030-01234-2_1
- 46. Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Machine Intell. 1998;20(11):1254–9.
- 47.
Liu J, Zhang W, Tang Y, Tang J, Wu G. Residual Feature Aggregation Network for Image Super-Resolution. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. 2356–65. https://doi.org/10.1109/cvpr42600.2020.00243
- 48.
Wang Q, Wu B, Zhu P, Li P, Zuo W, Hu Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition; 2020. p. 11534–42.
- 49.
Zhang Y, Li K, Li K, Fu Y. MR Image Super-Resolution with Squeeze and Excitation Reasoning Attention Network. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. 13420–9. https://doi.org/10.1109/cvpr46437.2021.01322
- 50. Liang J, Cao J, Sun G, Zhang K, Van Gool L, Timofte R. SwinIR: Image restoration using Swin Transformer. In: Proceedings of the IEEE/CVF international conference on computer vision; 2021. p. 1833–44.
- 51. Zamir SW, Arora A, Khan S, Hayat M, Khan FS, Yang MH. Restormer: Efficient transformer for high-resolution image restoration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition; 2022. p. 5728–39.
- 52. Wang W, Shen H, Chen J, Xing F. MHAN: Multi-Stage Hybrid Attention Network for MRI reconstruction and super-resolution. Comput Biol Med. 2023;163:107181. pmid:37352637
- 53. Hua X, Du Z, Ma J, Yu H. Multi kernel cross sparse graph attention convolutional neural network for brain magnetic resonance imaging super-resolution. Biomedical Signal Processing and Control. 2024;96:106444.
- 54. He C, Liu H, Shen Y, Zhou D, Wu L, Ma H, et al. Improving the magnetic resonance images super-resolution with a dual-channel enhancement model incorporating complementary information. Engineering Applications of Artificial Intelligence. 2025;148:110359.
- 55. Rasheed Z, Ma Y-K, Ullah I, Al-Khasawneh M, Almutairi SS, Abohashrh M. Integrating Convolutional Neural Networks with Attention Mechanisms for Magnetic Resonance Imaging-Based Classification of Brain Tumors. Bioengineering (Basel). 2024;11(7):701. pmid:39061782
- 56. Rasheed Z, Ma YK, Bharany S, Shandilya G, Ullah I, Ali F. Classification of MRI brain tumor with hybrid VGG19 and ensemble classifier approach. In: 2024 First International Conference on Innovations in Communications, Electrical and Computer Engineering (ICICEC), 2024. p. 1–7.
- 57. Ahmad I, Liu Z, Li L, Ullah I, Aboyeji ST, Wang X, et al. Robust Epileptic Seizure Detection Based on Biomedical Signals Using an Advanced Multi-View Deep Feature Learning Approach. IEEE J Biomed Health Inform. 2024;28(10):5742–54. pmid:38696293
- 58. Ahmad I, Zhu M, Liu Z, Shabaz M, Ullah I, Tong MCF, et al. Multi-Feature Fusion-Based Convolutional Neural Networks for EEG Epileptic Seizure Prediction in Consumer Internet of Things. IEEE Trans Consumer Electron. 2024;70(3):5631–43.
- 59. Ahmed MJ, Afridi U, Shah HA, Khan H, Bhatt MW, Alwabli A, et al. CardioGuard: AI-driven ECG authentication hybrid neural network for predictive health monitoring in telehealth systems. SLAS Technol. 2024;29(5):100193. pmid:39307457
- 60. Rahman AU, Ali S, Wason R, Aggarwal S, Abohashrh M, Daradkeh YI, et al. Emotion‐Based Mental State Classification Using EEG for Brain‐Computer Interface Applications. Computational Intelligence. 2025;41(4).
- 61. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. p. 770–8.
- 62. Johnson J, Alahi A, Fei-Fei L. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. In: Lecture Notes in Computer Science. Springer International Publishing; 2016. p. 694–711. https://doi.org/10.1007/978-3-319-46475-6_43
- 63. Charbonnier P, Blanc-Feraud L, Aubert G, Barlaud M. Two deterministic half-quadratic regularization algorithms for computed imaging. In: Proceedings of 1st International Conference on Image Processing, 1994. p. 168–72. https://doi.org/10.1109/icip.1994.413553
- 64. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. Advances in Neural Information Processing Systems. 2014;27.
- 65. Menze BH, Jakab A, Bauer S, Kalpathy-Cramer J, Farahani K, Kirby J, et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans Med Imaging. 2015;34(10):1993–2024. pmid:25494501
- 66. Bakas S, Akbari H, Sotiras A, Bilello M, Rozycki M, Kirby JS, et al. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci Data. 2017;4:170117. pmid:28872634
- 67. Shi F, Cheng J, Wang L, Yap P-T, Shen D. LRTV: MR Image Super-Resolution With Low-Rank and Total Variation Regularizations. IEEE Trans Med Imaging. 2015;34(12):2459–66. pmid:26641727
- 68. Shi J, Li Z, Ying S, Wang C, Liu Q, Zhang Q, et al. MR Image Super-Resolution via Wide Residual Networks With Fixed Skip Connection. IEEE J Biomed Health Inform. 2019;23(3):1129–40. pmid:29993565
- 69. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.
- 70. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process. 2004;13(4):600–12. pmid:15376593
- 71. Zhang R, Isola P, Efros AA, Shechtman E, Wang O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. p. 586–95. https://doi.org/10.1109/cvpr.2018.00068
- 72. Ma C, Yang C-Y, Yang X, Yang M-H. Learning a no-reference quality metric for single-image super-resolution. Computer Vision and Image Understanding. 2017;158:1–16.
- 73. Lim B, Son S, Kim H, Nah S, Mu Lee K. Enhanced Deep Residual Networks for Single Image Super-Resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017. p. 136–44.