Abstract
Pansharpening aims to combine the spatial information from high-resolution panchromatic (PAN) images with the spectral information from low-resolution multispectral (LRMS) images to generate high-resolution multispectral (HRMS) images. While convolutional neural networks (CNNs) have shown impressive performance in pansharpening tasks, their tendency to focus on low-frequency information leads to suboptimal preservation of high-frequency details, which are crucial for producing HRMS images. Recent studies have highlighted the significance of frequency-domain information in pansharpening, but existing methods often treat the network as a whole, overlooking the differing abilities of individual layers to capture high-frequency components. This oversight can result in the loss of fine details and limit overall pansharpening performance. To overcome these limitations, we propose FIAN, a novel frequency information-adaptive network designed specifically for spatial-frequency domain pansharpening. FIAN introduces an innovative frequency information-adaptive filter module that dynamically extracts frequency-domain information at various frequencies, enabling the network to better capture and preserve high-frequency details during the pansharpening process. Furthermore, we develop a frequency feature selection strategy to accurately extract the most relevant frequency-domain information, enhancing the network’s representational power. Lastly, we present a multi-frequency information fusion module that effectively combines the frequency-domain information extracted by the filter at different frequencies with the spatial-domain information. We conducted extensive experiments on multiple benchmark datasets to evaluate the effectiveness of the proposed method. The experimental results demonstrate that our approach achieves competitive performance compared to state-of-the-art pansharpening methods.
Citation: Liu Y, Wang W, Li W (2025) FIAN: A frequency information-adaptive network for spatial-frequency domain pansharpening. PLoS One 20(6): e0324236. https://doi.org/10.1371/journal.pone.0324236
Editor: Hirenkumar Kantilal Mewada, Prince Mohammad Bin Fahd University, SAUDI ARABIA
Received: December 13, 2024; Accepted: April 23, 2025; Published: June 3, 2025
Copyright: © 2025 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are available from the GitHub repository at https://github.com/liangjiandeng/PanCollection.
Funding: This study was funded by the Central Science and Technology Development Project led by Hubei Province (Grant Number: 2023EGA001), which supported author WL. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
High-resolution multispectral (HRMS) images are vital across a range of sectors, including agriculture, industry, and civil infrastructure. Due to the limitations of hardware equipment, HRMS images cannot be directly acquired. Pansharpening overcomes this obstacle by effectively fusing high-resolution panchromatic (PAN) images with low-resolution multispectral (LRMS) images, thus synthesizing HRMS images [8, 76] and providing better data support for the subsequent analysis and application of remote sensing images [2, 7, 12, 14, 20, 63].
Masi et al. [26] developed and trained PNN, the pioneering pansharpening network based on convolutional neural networks (CNNs), which employs a three-layer convolution structure. The success of PNN has spurred extensive research into CNN-based pansharpening methods, leading to substantial advancements in the field [3, 9, 29, 31, 33, 34, 37, 40–42, 45, 46].
Further investigation into the characteristics of CNN layers reveals notable frequency-related properties. As shown in Fig 1(a), we perform a frequency analysis on the features extracted by each layer of a representative nine-layer CNN at various stages of training. This study employs the discrete gradient as a measure to assess the frequency composition of layer features; larger gradient values suggest an increased proportion of high-frequency components.
Distribution of discrete gradients of network features at different training stages. (a): The evolution of discrete gradient values for each layer in the network as training progresses (epochs). (b): The changes in discrete gradient values for network layers with skip connections throughout the training iterations (epochs).
Fig 1(a) shows that during the initial phase of training (e.g., epoch = 1), the gradients of shallow-layer features substantially exceed those of deeper layers, indicating that the CNN primarily captures low-frequency image content. As training progresses (e.g., epoch = 350), the gradients of deep-layer features gradually increase, eventually exceeding those of the shallower layers. This suggests that the CNN begins to capture high-frequency details. By epoch 1000, the gradients of deep-layer features significantly outweigh those of shallow layers, implying that deeper layers predominantly focus on high-frequency image information. This observation highlights the frequency-selective nature of CNNs, which exhibit a transition from low-frequency to high-frequency feature extraction. Consequently, this frequency bias impacts pansharpening performance in two ways: first, it slows the convergence of CNNs on the pansharpening task; second, the emphasis on low-frequency information in shallow layers may cause the loss of crucial high-frequency details during feature extraction.
To mitigate these challenges, researchers have explored the incorporation of skip connections, such as in DiCNN [50]. Fig 1(b) depicts the distribution characteristics of the discrete gradients across CNN layers after introducing skip connections. It is evident that odd and even layers exhibit distinct gradient distribution patterns, collectively forming a “W” shape. This suggests that skip connections modulate the frequency extraction behavior of CNNs to a certain degree. By propagating the output features of the current layer along with those of the previous layer to the subsequent layer, skip connections help alleviate the loss of high-frequency information that arises from the limited high-frequency extraction capability of the current layer. However, closer examination reveals that within the odd and even layer groups, the frequency extraction characteristics still align with those observed in Fig 1(a). This implies that while skip connections provide some relief for high-frequency loss, they do not fundamentally resolve the frequency extraction bias inherent to CNNs [22].
In an effort to further enhance pansharpening performance, researchers have begun to investigate methods that synergize frequency and spatial domain information. However, existing approaches often treat CNNs as monolithic entities, disregarding the varying capacities of different network layers in extracting high-frequency information. Naively fusing frequency domain information with spatial domain features may fail to provide the most appropriate frequency components for the current CNN layer, exacerbating the loss of high-frequency details and compromising the effectiveness of dual-domain information fusion.
To tackle these challenges, we introduce FIAN, an innovative frequency information-adaptive network designed specifically for spatial-frequency domain pansharpening. At the heart of FIAN lies the frequency information-adaptive filter module, which intelligently extracts vital frequency-domain information for features at various network layers by leveraging their gradient information. Additionally, we have developed a frequency feature selection strategy that judiciously extracts particular frequency information based on the gradient distribution of features. Lastly, to seamlessly merge the frequency-domain information at different frequencies with the spatial-domain information, we present a multi-frequency information fusion module, which significantly enhances the performance of pansharpening. Comprehensive evaluations performed on remote sensing image datasets highlight the exceptional performance of our proposed approach when contrasted with the most advanced pansharpening methods currently available.
The primary contributions of this paper are as follows:
- We introduce a novel frequency information-adaptive filter module that intelligently extracts essential frequency-domain information for features at different network layers by leveraging their gradient information.
- We develop a judicious frequency feature selection strategy that selectively extracts specific frequency information based on the gradient distribution of features.
- We present a multi-frequency information fusion module to seamlessly integrate the frequency-domain information at different frequencies with the spatial-domain information.
Related work
In recent decades, researchers have extensively explored pansharpening, leading to the development of four primary methods [23]: Component Substitution (CS) [30], Multiresolution Analysis (MRA) [35], Variational Optimization (VO), and Deep Learning (DL). CS methods [15, 16, 19, 67, 68] focus on substituting the spatial component of LRMS images with the detailed spatial information from PAN images, aiming to preserve the original spectral content to the greatest extent within the transform domain. MRA techniques [8, 10, 11, 35] involve decomposing LRMS images into multi-resolution components and enhancing them with the high-frequency, detail-rich aspects of PAN images. VO approaches [53, 55–58] leverage prior knowledge to formulate constraints that guide the model, achieving pansharpening through an optimized algorithmic solution.
Pansharpening methodologies grounded in deep learning have captured widespread attention and marked significant advancements. Convolutional neural networks (CNNs), renowned for their exceptional proficiency in extracting both spectral and spatial information, have ascended to a pivotal role in pansharpening research [5, 6, 29, 31, 33, 34, 37, 39–42, 44–46]. Consequently, scholars have been delving into diverse strategies to augment the efficacy of pansharpening techniques within the spatial domain. However, despite the commendable performance of CNNs in this realm, they still grapple with challenges in capturing high-frequency details [22], often culminating in high-resolution multispectral images (HRMS) with inadequate texture details [22]. To mitigate this limitation, some researchers have begun to pivot towards frequency domain methods [13, 24, 38, 43, 49]. These approaches can more adeptly capture and preserve high-frequency information, thereby complementing the limitations inherent in spatial domain techniques. By extracting and amalgamating information within the frequency domain, researchers aspire to elevate the overall effectiveness of pansharpening, thus achieving superior image detail restoration throughout the pansharpening process.
CNN-based pansharpening methods
Masi et al. [26] pioneered this domain with the CNN-based PNN method, which employs a convolutional network to extract spatial features from PAN and LRMS images, subsequently integrating them into the upsampled LRMS image. However, the relatively simple architecture of PNN, consisting of merely three convolutional layers, results in slower convergence speeds. To address this, the DiCNN [50] method introduces skip connections between various layer feature maps, thereby mitigating gradient explosion and accelerating network convergence. Nevertheless, its simplistic network structure and limited number of parameters constrain its feature extraction capabilities. Zhang et al. [44] proposed the BDPN method, which enhances the accuracy of multi-scale detail information extraction in multispectral images. BDPN, inspired by traditional MRA methods, leverages a pyramid-structured bidirectional network to process both LRMS and PAN images. However, the extensive parameter size of BDPN complicates network training and limits the model’s generalization ability. MSDCNN [39] adopts a different strategy, utilizing two branches to extract information from both deep and shallow image layers. Each branch employs three convolution kernels of varying sizes to capture multi-scale features. By integrating features from different receptive fields, MSDCNN improves feature extraction accuracy. However, the versatility of its convolution kernels introduces increased uncertainty in learning deep and shallow features. FusionNet [51], another notable advancement, estimates a nonlinear injection model for detailed information using a deep network structure. It directly processes the original upsampled multispectral and PAN images for differential detail extraction. The extracted details are then refined through multiple residual network blocks for further feature extraction and learning. The final output, when added to the upsampled multispectral image, yields the fusion image.
This approach, while innovative, contends with the challenge of managing the complex interactions between different network layers.
Dual-domain pansharpening methods based on spatial and frequency domains
In light of advancements in CNN-based spatial domain pansharpening methods, researchers have explored incorporating frequency domain information to further enhance the quality of fused HRMS images. Man Zhou et al. [24] proposed a pansharpening method that fuses both domains, designing the Spatial-Frequency Information Integration Network (SFIIN), which extracts local spatial features from PAN and LRMS images while integrating frequency domain information to enhance their global context. This dual-domain fusion significantly boosts pansharpening performance. Subsequently, Yuan et al. [13] constructed the Pyramid-based Dual-Domain Network (PYDDN), injecting multi-scale frequency domain spatial details from the PAN image into multi-scale spatial information through a frequency feature pyramid. Hou et al. [38] proposed the BIM method, comprising two branches: band-aware local specificity modeling and Fourier global detail reconstruction. The local specificity modeling uses adaptive convolution kernels to process local discrepancies across spectral bands, while the global detail reconstruction branch leverages Fourier domain global modeling capabilities to recover lost global details. BIM exhibits exceptional local-to-global representation learning ability through the combination of spatial and frequency domains. Furthermore, Zhou et al. [43] introduced the Spatial-Frequency Information Integration Network (SFINet) and its improved version, SFINet++. The core component of SFINet, SFIB, fuses a spatial domain branch for processing local information, a frequency domain branch capturing global information using discrete Fourier transform, and a learning mechanism facilitating cross-domain information interaction. SFINet++ achieves significant performance improvements by incorporating lossless information fusion via reversible neural operators. Zhang et al. [49] proposed the Wavelet Domain Network (WINet), leveraging wavelet domain processing for efficient pansharpening. 
The Wavelet-Inspired Fusion Block (WFB) in WINet aims to achieve lossless information fusion through inter-subband interaction, while the High-frequency Enhancement Block (HEB) integrates subband information to enhance high-frequency features. By combining wavelet and CNN techniques, WINet effectively synthesizes high-quality HRMS images from spatial and frequency domain information. Lastly, Wang et al. [54] introduced an innovative pansharpening architecture where the Feature Extraction and Enhancement Module (FEM) utilizes fast Fourier convolution and attention mechanisms to form a hybrid of global and local receptive fields in the high-frequency domain. The Implicit Neural Alignment (INA) aligns multi-scale high-frequency features through accurate implicit neural representations, while the Pre-align Module develops an efficient trainable upsampling operator to address the inherent alignment challenge between PAN and MS images.
The significance of integrating frequency domain information with spatial domain features in dual-domain pansharpening methods is increasingly evident. This integration enables the production of high-quality HRMS images. By leveraging the complementary strengths of both spatial and frequency domains, these approaches demonstrate superior performance compared to traditional CNN-based spatial domain methods. However, existing methodologies often consider the CNN network in its entirety when integrating dual-domain information, overlooking the fact that different layers within a CNN network process frequency domain information of varying frequencies. This oversight potentially hampers the quality of dual-domain information fusion. Treating the CNN network as a monolith for frequency domain information fusion may not fully harness the capabilities of each layer, resulting in suboptimal dual-domain information fusion. To improve the effectiveness of dual-domain information fusion, we introduce FIAN, an innovative frequency information-adaptive network designed specifically for spatial-frequency domain pansharpening.
Materials and methods
Fig 2 presents the overall framework of FIAN. The process begins by upsampling a multispectral (MS) image to create a low-resolution multispectral (LRMS) image. This LRMS image, combined with a panchromatic (PAN) image, is fed into a convolutional neural network (CNN). In this framework, we propose a frequency information-adaptive filter module to extract frequency information from different features.
The overall framework of the suggested frequency information-adaptive network for spatial-frequency domain pansharpening (FIAN). FIAFM stands for frequency information-adaptive filter module. Reprinted from https://github.com/liangjiandeng/PanCollection under a CC BY license, with permission from Liangjian Deng, original copyright 2025.
The frequency information-adaptive filter module computes the discrete gradient of the CNN features, which, together with the PAN features, directs the filter extraction. These filters exhibit varying frequencies due to differences in the layer outputs. Subsequently, the frequency feature selection strategy identifies suitable features based on the gradient range. The frequency information-adaptive filter module extracts the frequency information from the selected features. The frequency information at various frequencies and their corresponding gradient calculations are then processed in the multi-frequency information fusion module. To mitigate the gradient vanishing problem commonly observed in convolutional neural networks, we have incorporated the Rectified Linear Unit (ReLU) activation function after every CNN layer in our network architecture.
Frequency information-adaptive filter module
The filter generation process in the frequency information-adaptive filter module comprises two crucial stages: the computation of discrete gradients from the features and the subsequent creation of frequency-discriminating filters corresponding to these gradients.
Discrete gradient.
To calculate the discrete gradients of CNN features, we apply a sequence of mathematical computations. Let $F_i \in \mathbb{R}^{B \times C \times H \times W}$ be the feature tensor of the i-th selected layer of the CNN, where B, C, H, and W denote the batch size, number of channels, height, and width of the feature maps, respectively. The gradient computation process for $F_i$ is as follows:

First, we calculate the absolute differences between adjacent pixels in both horizontal and vertical directions to obtain the gradient tensors $G_x$ and $G_y$:

$$G_x(b,c,h,w) = \lvert F_i(b,c,h,w+1) - F_i(b,c,h,w) \rvert, \quad G_y(b,c,h,w) = \lvert F_i(b,c,h+1,w) - F_i(b,c,h,w) \rvert \tag{1}$$

Here, $G_x$ has dimensions [B, C, H, (W–1)] and $G_y$ has dimensions [B, C, (H–1), W].

To ensure the gradient tensors have the same dimensions as the original feature tensor, we apply zero-padding to $G_x$ and $G_y$:

$$\tilde{G}_x = P_x(G_x), \quad \tilde{G}_y = P_y(G_y) \tag{2}$$

After padding, both $\tilde{G}_x$ and $\tilde{G}_y$ have dimensions [B, C, H, W]. Next, we compute the gradient magnitude using the Euclidean norm:

$$G = \sqrt{\tilde{G}_x^2 + \tilde{G}_y^2} \tag{3}$$

where $G$ has dimensions [B, C, H, W]. Finally, we perform global average pooling on the gradient magnitude tensor $G$ to obtain the mean scalar gradient $g_i$:

$$g_i = \mathrm{GAP}(G) = \frac{1}{BCHW} \sum_{b,c,h,w} G(b,c,h,w) \tag{4}$$

The gradient computation process for the features of the i-th selected layer is summarized as:

$$g_i = \mathrm{GAP}\!\left(\sqrt{P_x(G_x(F_i))^2 + P_y(G_y(F_i))^2}\right) \tag{5}$$

where $G_x$ and $G_y$ represent the computation process in Eq 1, while $P_x$ and $P_y$ represent the padding process in Eq 2.
The computed value $g_i$ represents the gradient magnitude of the current CNN feature. A higher value of $g_i$ indicates a higher proportion of high-frequency information in the network feature. By examining the variations in $g_i$ across different network layers, this approach quantitatively directs the extraction process of the frequency-discriminating filter.
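As an illustration, the per-layer discrete gradient described above can be sketched in a few lines of NumPy (a minimal stand-in for the actual PyTorch implementation; the function and variable names are ours):

```python
import numpy as np

def discrete_gradient(feat):
    """Mean discrete gradient of a feature tensor shaped (B, C, H, W).

    Absolute forward differences along width and height (Eq 1) are
    zero-padded back to (B, C, H, W) (Eq 2), combined by the Euclidean
    norm, and globally average-pooled into one scalar.
    """
    gx = np.abs(feat[..., :, 1:] - feat[..., :, :-1])   # shape (B, C, H, W-1)
    gy = np.abs(feat[..., 1:, :] - feat[..., :-1, :])   # shape (B, C, H-1, W)
    gx = np.pad(gx, ((0, 0), (0, 0), (0, 0), (0, 1)))   # pad last column with zeros
    gy = np.pad(gy, ((0, 0), (0, 0), (0, 1), (0, 0)))   # pad last row with zeros
    return float(np.sqrt(gx ** 2 + gy ** 2).mean())     # global average pooling
```

A flat feature map yields a gradient of zero, while a horizontal ramp whose pixels increase by one per column yields 0.75 on a 4×4 map, since the zero-padded last column contributes nothing to the mean.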
Fig 2 depicts the discrete gradient calculation procedure utilized in the frequency information-adaptive filter components. The gradient computation result for the i-th module is denoted by $g_i$. The aggregated gradient result from the previous (i–1)-th module is represented by $V_{i-1}$, while the aggregated result passed to the next (i+1)-th module is represented by $V_i$. The mathematical formulation of $V_i$ is provided in Eq 6:

$$V_i = [V_{i-1}, g_i] \tag{6}$$

This expression represents the process of gathering discrete gradient information layer by layer during the network’s forward pass. $V_{i-1}$ is a vector that aggregates the gradient computation results from the first frequency information-adaptive filter module up to the (i–1)-th module. $V_i$ is obtained by appending the gradient computation result $g_i$ of the current layer to the end of $V_{i-1}$.
Frequency-discriminating filter.
This study presents an innovative frequency-discriminating filter that adaptively captures frequency information from the current feature based on its gradient $g_i$. In particular, the cutoff frequency of the frequency-discriminating filter is defined as k times the gradient $g_i$, expressed as $D_0 = k \cdot g_i$. Furthermore, we introduce an enhancement factor $\alpha$ and a decay factor $\beta$. These factors are employed to modify the upper and lower cutoff frequencies of the frequency-discriminating filter, respectively, as elaborated in Eq 7:

$$D_H = \alpha \cdot D_0, \quad D_L = \beta \cdot D_0 \tag{7}$$

Here, $D_H$ and $D_L$ represent the enhanced upper cutoff frequency and the decayed lower cutoff frequency, respectively.

We develop a frequency-discriminating filter $\mathcal{H}$. Eq 8 presents the mathematical expression of the filter $\mathcal{H}_{b,c}(u, v)$ for each batch and channel:

$$\mathcal{H}_{b,c}(u, v) = \begin{cases} 1, & D_L \le D(u, v) \le D_H \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

where $D(u, v)$ represents the Euclidean distance from the point (u, v) to the center of the spectrum, defined by the formula $D(u, v) = \sqrt{(u - H/2)^2 + (v - W/2)^2}$. Aggregating $\mathcal{H}_{b,c}$ across all batches and channels results in $\mathcal{H}$.
The two-dimensional Fast Fourier Transform (FFT) is first performed on the PAN image, as shown in Eq 9. The PAN input can be viewed as a collection of B three-dimensional tensors with dimensions (C, H, W). For the two-dimensional image $P_{b,c}$ in the c-th channel of the b-th batch, a two-dimensional FFT is applied to obtain its spectrum $S_{b,c}$:

$$S_{b,c}(u, v) = \mathcal{F}(P_{b,c}) \tag{9}$$

Next, the amplitude spectrum $A_{b,c}$ and phase spectrum $\Phi_{b,c}$ are extracted separately, as shown in Eq 10:

$$A_{b,c}(u, v) = \lvert S_{b,c}(u, v) \rvert, \quad \Phi_{b,c}(u, v) = \angle S_{b,c}(u, v) \tag{10}$$

where $\lvert \cdot \rvert$ and $\angle(\cdot)$ represent the modulus and argument of a complex number, respectively. The amplitude spectrum $A_{b,c}$ represents the intensity of each frequency component, while the phase spectrum $\Phi_{b,c}$ contains the phase information of each frequency component.

After performing the Fourier transform on all batches and channels of the images, we obtain the representation of the PAN image in the frequency domain. The amplitude spectrum tensor $A$ and the phase spectrum tensor $\Phi$ are composed of all $A_{b,c}$ and $\Phi_{b,c}$, respectively.
Finally, the filter $\mathcal{H}$ is multiplied element-wise with the amplitude spectrum $A$ to obtain the filtered amplitude spectrum $\hat{A}$, while the phase spectrum tensor $\Phi$ is left unchanged, as shown in Eq 11:

$$\hat{A} = \mathcal{H} \odot A \tag{11}$$

The following is a concise summary of the frequency-discriminating filter process:

$$\hat{A} = \mathcal{H} \odot \lvert \mathcal{F}(P) \rvert, \quad \Phi = \angle \mathcal{F}(P) \tag{12}$$

The coefficients k, $\alpha$, and $\beta$ are empirically determined to be 5.0, 1.5, and 0.8, respectively.
The frequency-discriminating filter introduced in this work adaptively modifies the cutoff frequencies according to the discrete gradients derived from the network layer features. This adaptive approach fully leverages the characteristics of the current network layer features, enabling the effective extraction of frequency domain information from PAN images within the appropriate range. Consequently, this enhances the accuracy and adaptability of high-frequency detail extraction.
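With the stated coefficients (k = 5.0, α = 1.5, β = 0.8), the filter and its application to a single PAN channel can be sketched as follows. Note that the centered-spectrum convention (fftshift) is our assumption, since the text only specifies the distance-to-center rule:

```python
import numpy as np

K, ALPHA, BETA = 5.0, 1.5, 0.8  # coefficients reported in the paper

def band_pass_mask(h, w, grad):
    """Ideal band-pass mask keeping D_L <= D(u, v) <= D_H (Eq 7-8)."""
    d0 = K * grad                        # cutoff proportional to the feature gradient
    d_hi, d_lo = ALPHA * d0, BETA * d0   # enhanced upper / decayed lower cutoffs
    u = np.arange(h)[:, None] - h / 2    # distance of (u, v) to the spectrum center
    v = np.arange(w)[None, :] - w / 2
    dist = np.sqrt(u ** 2 + v ** 2)
    return ((dist >= d_lo) & (dist <= d_hi)).astype(float)

def filter_pan_channel(pan, grad):
    """Filter one PAN channel in the Fourier domain (Eq 9-11).

    Only the amplitude spectrum is masked; the phase is kept intact.
    """
    spec = np.fft.fftshift(np.fft.fft2(pan))        # centered spectrum
    amp, phase = np.abs(spec), np.angle(spec)       # Eq 10
    amp = amp * band_pass_mask(*pan.shape, grad)    # Eq 11: element-wise masking
    filtered = amp * np.exp(1j * phase)             # recombine amplitude and phase
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))
```

Because the DC component sits at the spectrum center (distance 0, below the lower cutoff), a constant image is filtered to zero, matching the band-pass intent of suppressing the lowest frequencies.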
Frequency feature selection strategy
As shown in Fig 1, the gradient information contained in the features of each CNN layer varies, reflecting information from the frequency domain at different frequencies. Existing pansharpening methods that combine spatial and frequency domains have not fully considered this characteristic. Instead, they treat the CNN network as a whole for frequency domain information fusion, which makes it difficult to fully exploit the unique advantages of each network layer, thereby affecting the performance of dual-domain information fusion.
To thoroughly analyze this problem, we randomly selected 10 samples from the training data and calculated the gradients of their ground truth (GT) images. The results indicate that the gradient range of the GT images is 45.5436-114.6202, as shown in Fig 3(a). Correspondingly, we performed the same gradient calculation on the features of a nine-layer CNN network. The gradient range of the nine-layer network features is 18.1816-82.3504, and as the network layers deepen, the feature gradients exhibit an upward trend. This suggests that when extracting image features, the CNN network prioritizes low-frequency information, which may lead to the loss of high-frequency details during training.
Schematic illustration of the frequency feature selection approach. (a) The spectrum of discrete gradients for ten randomly chosen GT data. (b) The procedure of identifying features that correspond to the numerical range of the GT discrete gradients and subsequently inputting these selected features into the frequency information-adaptive filter module for additional processing. FIAFM stands for frequency information-adaptive filter module. Reprinted from https://github.com/liangjiandeng/PanCollection under a CC BY license, with permission from Liangjian Deng, original copyright 2025.
To mitigate the issue of high-frequency information loss, we earlier proposed a frequency information-adaptive filter module. This filter extracts information from different frequency ranges for different network layers and fuses it with spatial domain features. However, for certain CNN network layers (such as the first layer), the feature gradients are relatively low (only 18.1816), limiting the extracted frequency information. Moreover, this low-frequency information can already be well-extracted by the CNN network. Accordingly, this paper puts forward a frequency feature selection strategy founded on feature gradient intervals to enhance the network’s capacity for representation.
As depicted in Fig 3(b), the gradient values of the network layer features reveal that the frequency information contained in layers five through nine displays a higher degree of correspondence to the frequency information present in the GT. This indicates that the frequency information contained within the network layer features from the fifth to the ninth layers substantially affects the high-frequency details of the generated HRMS images. Consequently, we utilize this frequency information to refine and enhance the intricate high-frequency components of the HRMS images. This targeted feature selection strategy is designed to better supplement and enhance high-frequency details while preserving low-frequency information, ultimately improving the overall performance of dual-domain information fusion. We refer to this process as the range matching of network feature gradients, which involves aligning the numerical span of gradients derived from the CNN features with those obtained from the ground truth data.
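This range matching can be sketched with the gradient figures reported above (the helper name is ours):

```python
# Gradient interval measured on the ten ground-truth samples in the paper.
GT_RANGE = (45.5436, 114.6202)

def select_layers(layer_grads, gt_range=GT_RANGE):
    """Indices of layers whose mean discrete gradient falls inside the
    ground-truth gradient interval (range matching)."""
    lo, hi = gt_range
    return [i for i, g in enumerate(layer_grads) if lo <= g <= hi]
```

For nine layers with gradients rising roughly linearly from 18.18 to 82.35, only layers five through nine (zero-based indices 4 to 8) fall inside the ground-truth interval, matching the selection shown in Fig 3(b).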
Multi-frequency information fusion module
The multi-frequency information fusion module, depicted in Fig 4, integrates the amplitude spectra $\hat{A}_j$ and phase spectra $\Phi_j$ (j = 1, …, 5) obtained from five distinct frequency-discriminating filters using a weighted combination, as formulated in Eq 13:

$$\bar{A} = \sum_{j=1}^{5} w_j \hat{A}_j, \quad \bar{\Phi} = \sum_{j=1}^{5} w_j \Phi_j \tag{13}$$
The multi-frequency information fusion module amalgamates the results from five distinct frequency-discriminating filters with the feature output derived by the CNN network. In the schematic diagram, “OUT” represents the output features of the CNN network. FFT stands for Fast Fourier Transform, while IFFT represents the Inverse Fourier Transform.
The normalization of the discrete gradients yields the weight vector $w$, as detailed in Eq 14:

$$w_j = \frac{g_j}{\sum_{m=1}^{5} g_m} \tag{14}$$

The weighted fusion yields the amplitude spectrum $\bar{A}$ and phase spectrum $\bar{\Phi}$. Simultaneously, we perform a fast Fourier transform (FFT) on the output features of the CNN network, denoted OUT, to obtain its amplitude spectrum $A_{\mathrm{OUT}}$ and phase spectrum $\Phi_{\mathrm{OUT}}$, a process previously described in Eq 9 and Eq 10.
Next, the fused amplitude spectrum $\bar{A}$, phase spectrum $\bar{\Phi}$, and OUT’s amplitude spectrum $A_{\mathrm{OUT}}$ and phase spectrum $\Phi_{\mathrm{OUT}}$ are respectively fed into two 1×1 convolutions, and the ReLU activation function is applied to the convolved features. Then, the two sets of activated amplitude spectrum features and phase spectrum features are concatenated separately, and finally, a 1×1 convolution is used to obtain the fused amplitude spectrum A and phase spectrum P.

Lastly, the fused amplitude spectrum A and phase spectrum P are input into the inverse Fourier transform (IFFT) to obtain the spatial-domain multi-frequency band fusion feature $F_{\mathrm{fused}}$.
The multi-frequency information fusion module integrates data from various frequency ranges by performing a weighted combination of the amplitude and phase spectra, along with the spectral information of the LRMS image.
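The gradient-weighted combination of Eq 13 and Eq 14 and the final inverse transform can be sketched as follows (a simplified view that omits the 1×1 convolutions and ReLU of the full module):

```python
import numpy as np

def fuse_spectra(amps, phases, grads):
    """Gradient-weighted combination of amplitude and phase spectra
    from several frequency-discriminating filters (Eq 13-14)."""
    w = np.asarray(grads, dtype=float)
    w = w / w.sum()                                 # Eq 14: normalized gradients
    amp = sum(wi * a for wi, a in zip(w, amps))     # Eq 13: weighted sums
    phase = sum(wi * p for wi, p in zip(w, phases))
    return amp, phase

def spectra_to_spatial(amp, phase):
    """Recombine amplitude and phase and invert the FFT (spatial output)."""
    return np.real(np.fft.ifft2(amp * np.exp(1j * phase)))
```

Identical input spectra are left unchanged by the weighting, since the normalized weights always sum to one.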
Result analysis and discussion
In this section, we shall evaluate the proposed method and juxtapose its results with those of recent, competitive approaches. The data employed in these evaluations originate from datasets procured from the WorldView-3 (WV3), WorldView-2 (WV2), and QuickBird (QB) satellites. Initially, we will delineate the specifics of the datasets utilized for this analysis in greater detail. Following this, we will outline the evaluation metrics and the particulars of the training process. Subsequently, we will exhibit the test results of the proposed method alongside those of the comparative methods across the three datasets, incorporating both quantitative metrics and visual illustrations under low-resolution and full-resolution conditions, to affirm the effectiveness of the proposed FIAN. Ultimately, the efficacy of the suggested FIAN and its component modules will be meticulously examined via an extensive series of ablation studies.
Dataset
To evaluate our pansharpening method, we used an extensive dataset, including 4-band datasets from QB and 8-band datasets from WV3 and WV2. The 4-band sets include standard colors: red, green, blue, and near-infrared; the 8-band sets add coastal, yellow, red-edge, and an additional near-infrared band. The spatial resolution ratio between PAN (panchromatic) and MS (multispectral) images is 4:1. Owing to the absence of ground truth images, we spatially degraded the datasets using Wald's protocol [47], which involves downsampling the original images by a factor of 4 through filters aligned with each satellite's modulation transfer function (MTF). In our network, the simulated datasets' panchromatic and multispectral images serve as inputs, while the original multispectral images act as reference data for training. The datasets used in this study are from the PanCollection dataset [17], which contains detailed data descriptions. For example, in the WV3 dataset, we used 9,714 pairs of PAN-LRMS-GT data for network training, where the PAN images have a size of 64×64×1, the LRMS images have a size of 16×16×8, and the GT images have a size of 64×64×8. For low-resolution evaluation, we selected 20 pairs of PAN-LRMS-GT data for testing, where the PAN images have a size of 256×256×1, the LRMS images have a size of 64×64×8, and the GT images have a size of 256×256×8. For full-resolution evaluation, we selected 20 pairs of PAN-LRMS images for testing, where the PAN images have a size of 512×512×1 and the LRMS images have a size of 128×128×8.
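The MTF-matched degradation of Wald's protocol described above can be sketched as follows. This is a minimal NumPy illustration assuming a Gaussian approximation to the sensor MTF; the Nyquist gain of 0.3 is illustrative only (actual gains are sensor- and band-specific), and the σ formula follows the common convention of matching the filter's frequency response to that gain at the low-resolution Nyquist frequency.

```python
import numpy as np

def _gaussian_kernel1d(sigma: float) -> np.ndarray:
    # Normalized 1-D Gaussian kernel truncated at ~3 sigma.
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def mtf_sigma(gain_nyquist: float, ratio: int) -> float:
    # Std of a Gaussian whose frequency response equals gain_nyquist at the
    # Nyquist frequency of the low-resolution grid (f = 1 / (2 * ratio)).
    return ratio * np.sqrt(-2.0 * np.log(gain_nyquist)) / np.pi

def wald_degrade(img: np.ndarray, ratio: int = 4,
                 gain_nyquist: float = 0.3) -> np.ndarray:
    """Simulate a low-resolution image per Wald's protocol:
    MTF-matched Gaussian blur followed by decimation by `ratio`.
    `img` is (H, W) or (H, W, bands)."""
    k = _gaussian_kernel1d(mtf_sigma(gain_nyquist, ratio))
    out = img.astype(np.float64)
    for axis in (0, 1):  # separable blur along rows, then columns
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="same"), axis, out)
    return out[::ratio, ::ratio]

# Degrade a 64x64x8 reference patch to a 16x16x8 LRMS input.
gt = np.random.rand(64, 64, 8)
lrms = wald_degrade(gt, ratio=4)  # shape (16, 16, 8)
```

Feeding a 64×64×8 reference patch through this routine yields the 16×16×8 LRMS input size described above, and a 256×256 PAN patch yields a 64×64 one.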
Training details and parameters
In this study, we developed our network using the PyTorch 2.0 framework on a Linux system, with a single NVIDIA GeForce RTX 3090 GPU supporting our hardware needs. Our software setup also included Python 3.10. Throughout the training phase, we conducted 1000 epochs. The learning rate started at 3×10−4 for the initial 500 epochs, then was reduced to one-tenth of that rate for the latter half. We utilized the ADAM optimizer [52] to manage parameter optimization, maintaining a batch size of 32 and setting β1 to 0.9 and β2 to 0.999.
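The step learning-rate schedule can be written as a small helper (a stand-alone sketch of the schedule described above; in a PyTorch training loop this would typically be realized with a scheduler such as `torch.optim.lr_scheduler.MultiStepLR`):

```python
def learning_rate(epoch: int, base_lr: float = 3e-4,
                  drop_epoch: int = 500, factor: float = 0.1) -> float:
    # Step schedule: base_lr for the first 500 epochs,
    # then one-tenth of it for epochs 500-999.
    return base_lr if epoch < drop_epoch else base_lr * factor
```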
Comparison methods and quantitative metrics
To assess the performance of our proposed method, we performed both qualitative and quantitative comparisons against current state-of-the-art pansharpening techniques. We compared our method with two main types of approaches: CNN-based spatial domain methods (including PNN [26], MSDCNN [39], DRPNN [31], DiCNN [50], BDPN [44], and FusionNet [51]) and dual-domain methods that integrate spatial and frequency domains (including WINet [49], PYDDN [13], SFINet [43], and BIM [38]). All competing methods were trained on identical training sets, using parameters specified in their respective original publications. To ensure fairness and consistency in our evaluations, all methods were tested under the same hardware and software conditions.
This paper provides a comprehensive assessment of the performance of the methods mentioned earlier, using both reduced-resolution and full-resolution datasets. For the reduced-resolution tests, we employ several metrics: ERGAS [21], SAM [4], SCC [32], and Q2n [36]. These metrics quantify the quality of the results. For the full-resolution tests, we use the Dλ, DS, and QNR [1] metrics to evaluate the performance of all methods.
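As a reference for how two of the reduced-resolution metrics behave, below is a minimal NumPy sketch of SAM (mean spectral angle, in degrees) and ERGAS; the exact implementations behind the reported numbers may differ in details such as masking or band handling.

```python
import numpy as np

def sam_degrees(ref: np.ndarray, est: np.ndarray,
                eps: float = 1e-12) -> float:
    """Mean spectral angle (degrees) between reference and estimate,
    both shaped (H, W, bands). 0 means identical spectral directions."""
    dot = np.sum(ref * est, axis=-1)
    norm = np.linalg.norm(ref, axis=-1) * np.linalg.norm(est, axis=-1)
    angles = np.arccos(np.clip(dot / (norm + eps), -1.0, 1.0))
    return float(np.degrees(angles).mean())

def ergas(ref: np.ndarray, est: np.ndarray, ratio: int = 4) -> float:
    """Erreur relative globale adimensionnelle de synthese:
    100/ratio * sqrt(mean over bands of (RMSE_b / mean_b)^2)."""
    rmse_sq = np.mean((ref - est) ** 2, axis=(0, 1))
    means_sq = np.mean(ref, axis=(0, 1)) ** 2
    return float(100.0 / ratio * np.sqrt(np.mean(rmse_sq / means_sq)))
```

For both metrics, a perfect reconstruction scores (numerically) zero, and larger values indicate greater distortion.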
Performance comparison with WV3 data
In this section, we conduct a comprehensive evaluation and comparison of the proposed method and benchmark methods on the WV3 dataset, considering both reduced-resolution and full-resolution aspects.
First, we assess the differences between the predicted pansharpening images and the ground truth (GT) images through reduced-resolution evaluation. Table 1 presents the objective assessment metric results for all compared methods and our proposed model across 20 WV3 test samples. The table demonstrates that our model achieves the best performance across all metrics, showcasing the effectiveness and superiority of our approach. Compared to benchmark pansharpening methods, our model better preserves the spectral information of multispectral images while effectively enhancing spatial resolution, resulting in high-quality image fusion.
Fig 5 provides a visual comparison of the pansharpening results from all compared methods to further illustrate the performance differences. We selected a representative area of the image (marked with a green box), enlarged it, and positioned it in the upper left corner of the original image for detailed observation. Close examination of the enlarged area reveals that the fusion results produced by our model closely resemble the ground truth image, exhibiting excellent performance in texture, edges, and color reproduction.
The visual comparisons of fusion results from various methods applied to reduced-resolution data from the WorldView-3 satellite. Reprinted from https://github.com/liangjiandeng/PanCollection under a CC BY license, with permission from Liangjian Deng, original copyright 2025.
Secondly, we assess the pansharpening performance of various methods on actual data through full-resolution evaluation. Table 2 displays the objective assessment metric results for all compared methods and our proposed model across 20 full-resolution WV3 test samples. The results in the table demonstrate that our model achieves the best performance across all metrics, highlighting the effectiveness and superiority of our approach.
Fig 6 provides a visual comparison of the pansharpening results from all methods evaluated, further illustrating performance differences. We selected a representative area of the image (marked with a green box), enlarged it, and placed it in the lower left corner of the original image for detailed observation. Upon close examination of the enlarged area, it is evident that the fusion results produced by our model excel in texture, edge definition, and color reproduction.
The visual comparisons of fusion results from various methods applied to full-resolution data from the WorldView-3 satellite. Reprinted from https://github.com/liangjiandeng/PanCollection under a CC BY license, with permission from Liangjian Deng, original copyright 2025.
Performance comparison with WV2 data
In this section, we conduct a comprehensive evaluation and comparison of the proposed method and benchmark methods on the WV2 dataset, considering both reduced-resolution and full-resolution aspects.
First, we assess the differences between the predicted pansharpening images and the ground truth (GT) images through reduced-resolution evaluation. Table 3 presents the objective assessment metric results for all compared methods and our proposed model across 20 WV2 reduced-resolution test samples. The table demonstrates that our model achieves the best performance across all metrics, showcasing the effectiveness and superiority of our approach. Compared to benchmark pansharpening methods, our model better preserves the spectral information of multispectral images while effectively enhancing spatial resolution, resulting in high-quality image fusion.
Fig 7 provides a visual comparison of the pansharpening results from all compared methods to further illustrate the performance differences. We selected a representative area of the image (marked with a green box), enlarged it, and positioned it in the lower left corner of the original image for detailed observation. Close examination of the enlarged area reveals that the fusion results produced by our model closely resemble the ground truth image, exhibiting excellent performance in texture, edges, and color reproduction.
The visual comparisons of fusion results from various methods applied to reduced-resolution data from the WV2 satellite. Reprinted from https://github.com/liangjiandeng/PanCollection under a CC BY license, with permission from Liangjian Deng, original copyright 2025.
Secondly, we assess the pansharpening performance of various methods on actual data through full-resolution evaluation. Table 4 displays the objective assessment metric results for all compared methods and our proposed model across 20 full-resolution WV2 test samples. The results in the table demonstrate that our model achieves the best performance across all metrics, highlighting the effectiveness and superiority of our approach.
Fig 8 provides a visual comparison of the pansharpening results from all methods evaluated, further illustrating performance differences. We selected a representative area of the image (marked with a green box), enlarged it, and placed it in the lower left corner of the original image for detailed observation. Upon close examination of the enlarged area, it is evident that the fusion results produced by our model excel in texture, edge definition, and color reproduction.
The visual comparisons of fusion results from various methods applied to full-resolution data from the WV2 satellite. Reprinted from https://github.com/liangjiandeng/PanCollection under a CC BY license, with permission from Liangjian Deng, original copyright 2025.
Performance comparison with QB data
In this section, we conduct a comprehensive evaluation and comparison of the proposed method and benchmark methods on the QB dataset, considering both reduced-resolution and full-resolution aspects.
First, we assess the differences between the predicted pansharpening images and the ground truth (GT) images through reduced-resolution evaluation. Table 5 presents the objective assessment metric results for all compared methods and our proposed model across 20 QB reduced-resolution test samples. The table demonstrates that our model achieves the best performance across all metrics, showcasing the effectiveness and superiority of our approach. Compared to benchmark pansharpening methods, our model better preserves the spectral information of multispectral images while effectively enhancing spatial resolution, resulting in high-quality image fusion.
Fig 9 provides a visual comparison of the pansharpening results from all compared methods to further illustrate the performance differences. We selected a representative area of the image (marked with a green box), enlarged it, and positioned it in the lower left corner of the original image for detailed observation. Close examination of the enlarged area reveals that the fusion results produced by our model closely resemble the ground truth image, exhibiting excellent performance in texture, edges, and color reproduction.
The visual comparisons of fusion results from various methods applied to reduced-resolution data from the QB satellite. Reprinted from https://github.com/liangjiandeng/PanCollection under a CC BY license, with permission from Liangjian Deng, original copyright 2025.
Secondly, we assess the pansharpening performance of various methods on actual data through full-resolution evaluation. Table 6 displays the objective assessment metric results for all compared methods and our proposed model across 20 full-resolution QB test samples. The results in the table demonstrate that our model achieves the best performance across all metrics, highlighting the effectiveness and superiority of our approach.
Fig 10 provides a visual comparison of the pansharpening results from all methods evaluated, further illustrating performance differences. We selected a representative area of the image (marked with a green box), enlarged it, and placed it in the lower left corner of the original image for detailed observation. Upon close examination of the enlarged area, it is evident that the fusion results produced by our model excel in texture, edge definition, and color reproduction.
The visual comparisons of fusion results from various methods applied to full-resolution data from the QB satellite. Reprinted from https://github.com/liangjiandeng/PanCollection under a CC BY license, with permission from Liangjian Deng, original copyright 2025.
Ablation study
To further substantiate the efficacy of FIAN, we performed ablation experiments. This subsection delves into the importance of the frequency feature selection strategy, the multi-frequency information fusion module, and the frequency information-adaptive filter module.
Table 7 presents seven ablation experimental comparison strategies for the aforementioned modules. First, an ablation experiment was conducted on the frequency feature selection strategy. In FIAN, the last five network layers were selected as input for the frequency information-adaptive filter module by matching the gradient range of the ground truth (GT) with that of the nine-layer CNN network features. The ablation of the frequency feature selection strategy involved testing the impact of the first five and the last five CNN network layers on pansharpening performance, presented in Table 8 as (I), (II), (III), and (IV), with (IV) representing our proposed FIAN method.
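The gradient-matching idea behind the frequency feature selection strategy can be illustrated with a hypothetical sketch: the helper below scores each candidate feature map by a simple first-difference gradient-energy proxy and keeps the layers whose energy is closest to the ground truth's. The function names and the specific proxy are illustrative stand-ins, not FIAN's exact matching rule.

```python
import numpy as np

def grad_energy(x: np.ndarray) -> float:
    # Mean first-difference gradient magnitude: a rough proxy
    # for the high-frequency content of a 2-D map.
    gy = np.diff(x, axis=0)
    gx = np.diff(x, axis=1)
    return float(np.abs(gy).mean() + np.abs(gx).mean())

def select_layers(features, gt, k: int = 5):
    """Return indices of the k feature maps whose gradient energy is
    closest to the ground truth's (hypothetical selection rule)."""
    target = grad_energy(gt)
    scores = [abs(grad_energy(f) - target) for f in features]
    order = np.argsort(scores)
    return sorted(order[:k].tolist())
```

Under this proxy, a feature map identical to the GT would always be selected, while near-constant (low-gradient) maps would be rejected.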
Following this, ablation experiments were conducted on the frequency information-adaptive filter module (FIAFM) and the multi-frequency information fusion module (MFIFM), namely strategies (V), (VI), and (VII). Notably, (VII) employs the same strategy as (I).
We conducted a series of ablation experiments on reduced-resolution data from the WorldView-3 satellite, with the objective metrics presented in Table 8. By comparing (I), (II), and (III) with (IV), we validated the effectiveness of the proposed frequency feature selection strategy; notably, (IV) achieved the best results across all metrics. The comparison between (V), (VI), and (VII) with (IV) further confirmed the efficacy of the frequency information-adaptive filter module (FIAFM) and the multi-frequency information fusion module (MFIFM).
Fig 11 provides a visual comparison of the pansharpening results from all methods. Our proposed FIAN method (IV) most closely resembles the ground truth (GT) images, displaying superior spatial texture, edge clarity, and color reproduction compared to other scenarios.
Visual comparisons of the effects from all ablation methods applied to reduced-resolution data from the WorldView-3 satellite. Reprinted from https://github.com/liangjiandeng/PanCollection under a CC BY license, with permission from Liangjian Deng, original copyright 2025.
Conclusions
This study introduces the frequency information-adaptive network (FIAN), a novel pansharpening technique that combines information from both the spatial and frequency domains. FIAN incorporates a frequency information-adaptive filter module that adaptively creates filters with varied frequency attributes by leveraging the gradients of distinct CNN features. FIAN also incorporates a frequency feature selection strategy that complements and enhances high-frequency details while retaining low-frequency information; this targeted strategy improves dual-domain information fusion by balancing the preservation of crucial low-frequency components with the enrichment of high-frequency detail. Moreover, the multi-frequency information fusion module combines data from diverse frequency ranges by performing a weighted combination of the amplitude and phase spectra, in conjunction with the spectral information derived from the LRMS image. This integrated approach ensures a comprehensive assimilation of information spanning the entire frequency spectrum, enhancing the richness and fidelity of the fused output. Comprehensive evaluations on both reduced-resolution and full-resolution datasets show FIAN's superior performance compared to state-of-the-art pansharpening techniques. The proposed method performs better on quantitative metrics and produces visually appealing pansharpened images that preserve spatial details and spectral fidelity. While the proposed FIAN method demonstrates significant advantages in remote sensing pansharpening, the underlying adaptive frequency selection strategy can also be extended to other imaging domains, such as digital photography and image super-resolution.
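As a hedged illustration of the weighted amplitude/phase combination mentioned above, the NumPy sketch below fuses two single-channel feature maps in the Fourier domain. The fixed scalar weights `w_amp` and `w_phase` are stand-ins for FIAN's learned weighting, and real phase combination must also handle angle wrap-around, which this sketch ignores.

```python
import numpy as np

def fuse_amplitude_phase(feat_a: np.ndarray, feat_b: np.ndarray,
                         w_amp: float = 0.5,
                         w_phase: float = 0.5) -> np.ndarray:
    """Weighted combination of amplitude and phase spectra of two maps
    (illustrative sketch; FIAN learns the weights rather than fixing them)."""
    fa, fb = np.fft.fft2(feat_a), np.fft.fft2(feat_b)
    # Blend amplitude and phase separately, then recombine.
    amp = w_amp * np.abs(fa) + (1.0 - w_amp) * np.abs(fb)
    phase = w_phase * np.angle(fa) + (1.0 - w_phase) * np.angle(fb)
    fused = amp * np.exp(1j * phase)
    return np.real(np.fft.ifft2(fused))
```

A quick sanity check on the design: when both inputs coincide, the blended amplitude and phase equal the input's own spectra, so the fusion returns the input unchanged for any choice of weights.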
Despite the demonstrated effectiveness of our method, certain limitations remain. First, the proposed neural network architecture is relatively complex, leading to longer training times and higher computational demands. Second, while the adaptive band-pass filtering strategy effectively suppresses noise in general, it currently employs a relatively simple model for frequency band selection. Future research could include more targeted modeling of noise characteristics to further enhance robustness. These limitations represent important directions for future improvements.
References
- 1. Vivone G, Alparone L, Chanussot J, Dalla Mura M, Garzelli A, Licciardi G, et al. A critical comparison among pansharpening algorithms. IEEE Trans Geosci Remote Sens. 2014;53:2565–86.
- 2. Li X, Yan H, Xie W, Kang L, Tian Y. An improved pulse-coupled neural network model for pansharpening. Sensors (Basel). 2020;20(10):2764. pmid:32408666
- 3. Liu H, Deng L, Dou Y, Zhong X, Qian Y. Pansharpening model of transferable remote sensing images based on feature fusion and attention modules. Sensors (Basel). 2023;23(6):3275. pmid:36991987
- 4. Yuhas R, Goetz A, Boardman J. Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm. In: JPL Summaries of the Third Annual JPL Airborne Geoscience Workshop. 1992. p. 147–9.
- 5. Jin X, Feng Y, Jiang Q, Miao S, Chu X, Zheng H, et al. Upgan: an unsupervised generative adversarial network based on u-shaped structure for pansharpening. ISPRS Int J Geo-Inf. 2024;13(222).
- 6. Huang W, Feng J, Wang H, Sun L. A new architecture of densely connected convolutional networks for pan-sharpening. ISPRS Int J Geo-Inf. 2020;9(242).
- 7. Li H, Jing L, Tang Y. Assessment of pansharpening methods applied to WorldView-2 imagery fusion. Sensors (Basel). 2017;17(1):89. pmid:28067770
- 8. Aiazzi B, Alparone L, Baronti S, Garzelli A, Selva M. Mtf-tailored multiscale fusion of high-resolution ms and pan imagery. Photogramm Eng Remote Sens. 2006;72:591–6.
- 9. Siok K, Ewiak I, Jenerowicz A. Multi-sensor fusion: a simulation approach to pansharpening aerial and satellite images. Sensors (Basel). 2020;20(24):7100. pmid:33322345
- 10. Aiazzi B, Alparone L, Baronti S, Garzelli A. Context-driven fusion of high spatial and spectral resolution images based on oversampled multiresolution analysis. IEEE Trans Geosci Remote Sens. 2002;40:2300–12.
- 11. Shah V, Younan N, King R. An efficient pan-sharpening method via a combined adaptive PCA approach and contourlets. IEEE Trans Geosci Remote Sens. 2008;46:1323–35.
- 12. Souza Jr C, Firestone L, Silva L, Roberts D. Mapping forest degradation in the eastern Amazon from SPOT 4 through spectral mixture models. Remote Sens Environ. 2003;87:494–506.
- 13. He X, Yan K, Li R, Xie C, Zhang J, Zhou M. Pyramid dual domain injection network for pan-sharpening. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023. p. 12908–17.
- 14. Li H, Jing L, Tang Y, Ding H. An improved pansharpening method for misaligned panchromatic and multispectral data. Sensors (Basel). 2018;18(2):557. pmid:29439502
- 15. Tu T, Su S, Shyu H, Huang P. A new look at IHS-like image fusion methods. Inf Fusion. 2001;2:177–86.
- 16. Garzelli A, Nencini F, Capobianco L. Optimal mmse pan sharpening of very high resolution multispectral images. IEEE Trans Geosci Remote Sens. 2007;46:228–36.
- 17. Deng L, Vivone G, Paoletti M, Scarpa G, He J, Zhang Y, et al. Machine learning in pansharpening: a benchmark, from shallow to deep networks. IEEE Geosci Remote Sens Mag. 2022;10:279–315.
- 18. Vivone G, Dalla Mura M, Garzelli A, Restaino R, Scarpa G, Ulfarsson M, et al. A new benchmark based on recent advances in multispectral pansharpening: revisiting pansharpening with classical and emerging pansharpening methods. IEEE Geosci Remote Sens Mag. 2020;9:53–81.
- 19. Kwarteng P, Chavez A. Extracting spectral contrast in landsat thematic mapper image data using selective principal component analysis. Photogramm Eng Remote Sens. 1989;55:339–48.
- 20. Wu C, Du B, Cui X, Zhang L. A post-classification change detection method based on iterative slow feature analysis and Bayesian soft fusion. Remote Sens Environ. 2017;199:241–55.
- 21. Wald L. Data fusion: definitions and architectures: fusion of images of different spatial resolutions. Presses des MINES. 2002.
- 22. Magid SA, Zhang Y, Wei D, Jang W-D, Lin Z, Fu Y, et al. Dynamic high-pass filtering and multi-spectral attention for image super-resolution. Proc IEEE Int Conf Comput Vis. 2021;2021:4268–77. pmid:35368831
- 23. Zhang T-J, Deng L-J, Huang T-Z, Chanussot J, Vivone G. A triple-double convolutional neural network for panchromatic sharpening. IEEE Trans Neural Netw Learn Syst. 2023;34(11):9088–101. pmid:35263264
- 24. Zhou M, Huang J, Yan K, Yu H, Fu X, Liu A, et al. Spatial-frequency domain information integration for pan-sharpening. In: European Conference on Computer Vision. 2022. p. 274–91.
- 25. Huang W, Xiao L, Wei Z, Liu H, Tang S. A new pan-sharpening method with deep neural networks. IEEE Geosci Remote Sens Lett. 2015;12:1037–41.
- 26. Masi G, Cozzolino D, Verdoliva L, Scarpa G. Pansharpening by convolutional neural networks. Remote Sens. 2016;8:594.
- 27. Zhou M, Yan K, Fu X, Liu A, Xie C. Pan-guided band-aware multi-spectral feature enhancement for pan-sharpening. IEEE Trans Comput Imaging. 2023;9:238–49.
- 28. Yang J, Fu X, Hu Y, Huang Y, Ding X, Paisley J. PanNet: A deep network architecture for pan-sharpening. In: Proceedings of the IEEE International Conference on Computer Vision. 2017. p. 5449–57.
- 29. Wei Y, Yuan Q. Deep residual learning for remote sensed imagery pansharpening. In: 2017 International Workshop on Remote Sensing with Intelligent Processing (RSIP). 2017. p. 1–4.
- 30. Shettigara V. A generalized component substitution technique for spatial enhancement of multispectral images using a higher resolution data set. Photogramm Eng Remote Sens. 1992;58:561–7.
- 31. Wei Y, Yuan Q, Shen H, Zhang L. Boosting the accuracy of multispectral image pansharpening by learning a deep residual network. IEEE Geosci Remote Sens Lett. 2017;14:1795–9.
- 32. Zhou J, Civco D, Silander J. A wavelet transform method to merge Landsat TM and SPOT panchromatic data. Int J Remote Sens. 1998;19:743–57.
- 33. Rao Y, He L, Zhu J. A residual convolutional neural network for pan-shaprening. In: 2017 International Workshop on Remote Sensing with Intelligent Processing (RSIP). 2017. p. 1–4.
- 34. Masi G, Cozzolino D, Verdoliva L, Scarpa G. CNN-based pansharpening of multi-resolution remote-sensing images. In: 2017 Joint Urban Remote Sensing Event (JURSE). 2017. p. 1–4.
- 35. Ranchin T, Wald L. Fusion of high spatial and spectral resolution images: the ARSIS concept and its implementation. Photogramm Eng Remote Sens. 2000;66:49–61.
- 36. Garzelli A, Nencini F. Hypercomplex quality assessment of multi/hyperspectral images. IEEE Geosci Remote Sens Lett. 2009;6:662–5.
- 37. Azarang A, Ghassemian H. A new pansharpening method using multi resolution analysis framework and deep neural networks. In: 2017 3rd International Conference on Pattern Recognition and Image Analysis (IPRIA). 2017. p. 1–6.
- 38. Hou J, Cao Q, Ran R, Liu C, Li J, Deng L. Bidomain modeling paradigm for pansharpening. In: Proceedings of the 31st ACM International Conference on Multimedia. 2023. p. 347–57.
- 39. Yuan Q, Wei Y, Meng X, Shen H, Zhang L. A multiscale and multidepth convolutional neural network for remote sensing imagery pan-sharpening. IEEE J Sel Top Appl Earth Obs Remote Sens. 2018;11:978–89.
- 40. Liu Q, Zhou H, Xu Q, Liu X, Wang Y. Psgan: A generative adversarial network for remote sensing image pan-sharpening. IEEE Trans Geosci Remote Sens. 2020;59:10227–42.
- 41. Shao Z, Cai J. Remote sensing image fusion with deep convolutional neural network. IEEE J Sel Top Appl Earth Obs Remote Sens. 2018;11:1656–69.
- 42. Vitale S, Ferraioli G, Scarpa G. A CNN-based model for pansharpening of worldview-3 images. In: IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium. 2018. p. 5108–11.
- 43. Zhou M, Huang J, Yan K, Hong D, Jia X, Chanussot J, et al. A general spatial-frequency learning framework for multimodal image fusion. IEEE Trans Pattern Anal Mach Intell. 2024.
- 44. Zhang Y, Liu C, Sun M, Ou Y. Pan-sharpening using an efficient bidirectional pyramid network. IEEE Trans Geosci Remote Sens. 2019;57:5549–63.
- 45. Dong W, Zhang T, Qu J, Xiao S, Liang J, Li Y. Laplacian pyramid dense network for hyperspectral pansharpening. IEEE Trans Geosci Remote Sens. 2021;60:1–13.
- 46. Dong W, Hou S, Xiao S, Qu J, Du Q, Li Y. Generative dual-adversarial network with spectral fidelity and spatial enhancement for hyperspectral pansharpening. IEEE Trans Neural Netw Learn Syst. 2022;33(12):7303–17. pmid:34111007
- 47. Wald L, Ranchin T, Mangolini M. Fusion of satellite images of different spatial resolutions: assessing the quality of resulting images. Photogramm Eng Remote Sens. 1997;63:691–9.
- 48. Ciotola M, Vitale S, Mazza A, Poggi G, Scarpa G. Pansharpening by convolutional neural networks in the full resolution framework. IEEE Trans Geosci Remote Sens. 2022;60:1–17.
- 49. Zhang J, He X, Yan K, Cao K, Li R, Xie C, et al. Pan-sharpening with wavelet-enhanced high-frequency information. IEEE Trans Geosci Remote Sens. 2024;62:1–14.
- 50. He L, Rao Y, Li J, Chanussot J, Plaza A, Zhu J, et al. Pansharpening via detail injection based convolutional neural networks. IEEE J Sel Top Appl Earth Obs Remote Sens. 2019;12:1188–204.
- 51. Deng L, Vivone G, Jin C, Chanussot J. Detail injection-based deep convolutional neural networks for pansharpening. IEEE Trans Geosci Remote Sens. 2020;59:6995–7010.
- 52. Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv preprint 2014. https://arxiv.org/abs/1412.6980
- 53. Vivone G, Simões M, Dalla Mura M, Restaino R, Bioucas-Dias J, Licciardi G, et al. Pansharpening based on semiblind deconvolution. IEEE Trans Geosci Remote Sens. 2014;53:1997–2010.
- 54. Wang Y, Lin Y, Meng G, Fu Z, Dong Y, Fan L, et al. Learning high-frequency feature enhancement and alignment for pan-sharpening. In: Proceedings of the 31st ACM International Conference on Multimedia. 2023. p. 358–67.
- 55. Vicinanza M, Restaino R, Vivone G, Dalla Mura M, Chanussot J. A pansharpening method based on the sparse representation of injected details. IEEE Geosci Remote Sens Lett. 2014;12:180–4.
- 56. Palsson F, Sveinsson J, Ulfarsson M. A new pansharpening algorithm based on total variation. IEEE Geosci Remote Sens Lett. 2013;11:318–22.
- 57. Fasbender D, Radoux J, Bogaert P. Bayesian data fusion for adaptable image pansharpening. IEEE Trans Geosci Remote Sens. 2008;46:1847–57.
- 58. Yokoya N, Yairi T, Iwasaki A. Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion. IEEE Trans Geosci Remote Sens. 2011;50:528–37.
- 59. Ma J, Yu W, Chen C, Liang P, Guo X, Jiang J. Pan-GAN: An unsupervised pan-sharpening method for remote sensing image fusion. Inf Fusion. 2020;62:110–20.
- 60. Zhou C, Zhang J, Liu J, Zhang C, Fei R, Xu S. PercepPan: towards unsupervised pan-sharpening based on perceptual loss. Remote Sens. 2020;12(2318).
- 61. Zhou H, Liu Q, Wang Y. Pgman: An unsupervised generative multiadversarial network for pansharpening. IEEE J Sel Top Appl Earth Observ Remote Sens. 2021;14:6316–27.
- 62. Qu Y, Baghbaderani R, Qi H, Kwan C. Unsupervised pansharpening based on self-attention mechanism. IEEE Trans Geosci Remote Sens. 2020;59:3192–208.
- 63. Yuan X, Xu X, Wang X, Zhang K, Liao L, Wang Z, et al. Osap-loss: efficient optimization of average precision via involving samples after positive ones towards remote sensing image retrieval. CAAI Trans Intell Technol. 2023. p. 1–22.
- 64. Thomas C, Ranchin T, Wald L, Chanussot J. Synthesis of multispectral images to high spatial resolution: a critical review of fusion methods based on remote sensing physics. IEEE Trans Geosci Remote Sens. 2008;46:1301–12.
- 65. Alparone L, Baronti S, Garzelli A, Nencini F. A global quality measurement of pan-sharpened multispectral imagery. IEEE Geosci Remote Sens Lett. 2004;1:313–7.
- 66. Pradhan P, King R, Younan N, Holcomb D. Estimation of the number of decomposition levels for a wavelet-based multiresolution multisensor image fusion. IEEE Trans Geosci Remote Sens. 2006;44:3674–86.
- 67. Li H, Wang W, Wang X, Yuan X, Xu X. Blind 3d video stabilization with spatio-temporally varying motion blur. ACM Trans Multimedia Comput Commun Appl. 2024;20(11):23.
- 68. Wu W, Wang W, Wang Z, Jiang K, Xu X. From generation to suppression: towards effective irregular glow removal for nighttime visibility enhancement. In: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI '23). IJCAI. p. 1533–41.
- 69. Meng X, Bao K, Shu J, Zhou B, Shao F, Sun W, et al. A blind full-resolution quality evaluation method for pansharpening. IEEE Trans Geosci Remote Sens. 2021;60:1–16.
- 70. Scarpa G, Ciotola M. Full-resolution quality assessment for pansharpening. Remote Sens. 2022;14(1808).
- 71. Wang Z, Yan Z, Yang J. Sgnet: structure guided network via gradient-frequency awareness for depth map super-resolution. arXiv preprint 2023. https://arxiv.org/abs/2312.05799
- 72. Khan M, Alparone L, Chanussot J. Pansharpening quality assessment using the modulation transfer functions of instruments. IEEE Trans Geosci Remote Sens. 2009;47:3880–91.
- 73. Alparone L, Aiazzi B, Baronti S, Garzelli A, Nencini F, Selva M. Multispectral and panchromatic data fusion assessment without reference. Photogramm Eng Remote Sens. 2008;74:193–200.
- 74. Xian Y, Lampert CH, Schiele B, Akata Z. Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly. IEEE Trans Pattern Anal Mach Intell. 2019;41(9):2251–65. pmid:30028691
- 75. Scarpa G, Vitale S, Cozzolino D. Target-adaptive CNN-based pansharpening. IEEE Trans Geosci Remote Sens. 2018;56:5443–57.
- 76. He Y, Wang W, Wu W, Jiang K. Disentangle nighttime lens flares: self-supervised generation-based lens flare removal. Proc AAAI Conf Artif Intell. 2025;39(3):3464–72.