
A lightweight hyperspectral image multi-layer feature fusion classification method based on spatial and channel reconstruction

  • Yuping Yin ,

    Contributed equally to this work with: Yuping Yin, Haodong Zhu

    Roles Funding acquisition, Supervision, Writing – review & editing

    Affiliation Faculty of Electrical and Control Engineering, Liaoning Technical University, Huludao, Liaoning, China

  • Haodong Zhu ,

    Contributed equally to this work with: Yuping Yin, Haodong Zhu

    Roles Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    1301106050@qq.com

    Affiliation Faculty of Electrical and Control Engineering, Liaoning Technical University, Huludao, Liaoning, China

  • Lin Wei

    Roles Funding acquisition, Supervision, Writing – review & editing

    Affiliation The Department of Basic Education, Liaoning Technical University, Huludao, Liaoning, China

Abstract

Hyperspectral image (HSI) classification tasks are commonly tackled with convolutional neural networks (CNNs). However, the majority of models that use traditional convolutions for HSI classification extract redundant information in their convolution layers, which causes the subsequent network structure to produce a large number of parameters and heavy computations, limiting their classification effectiveness, particularly under constraints on computational power and storage capacity. To address these issues, this paper proposes a lightweight multi-layer feature fusion classification method for hyperspectral images based on spatial and channel reconstruction (SCNet). Firstly, this method reduces redundant computation of spatial and spectral features by introducing Spatial and Channel Reconstruction Convolution (SCConv), a novel convolutional compression method. Secondly, the proposed network backbone is stacked from multiple SCConv modules, which allows the network to capture spatial and spectral features that are more beneficial for hyperspectral image classification. Finally, to effectively utilize the multi-layer feature information generated by the SCConv modules, a multi-layer feature fusion (MLFF) unit is designed to connect multiple feature maps at different depths, thereby obtaining a more robust feature representation. Experimental results on four benchmark datasets demonstrate that, compared with seven other hyperspectral image classification methods, the proposed network has significant advantages in terms of parameter count, model complexity, and testing time.

1 Introduction

A spectral imager captures hyperspectral remote sensing images by simultaneously imaging target objects across multiple contiguous spectral bands. As a result, HSI consists of hundreds of continuous spectral bands ranging from the visible to the infrared spectrum. These images not only acquire spatial information such as the structure, positional relationships, and shapes of objects but also contain spectral information that characterizes the physical properties of the materials. This unique “image-spectrum integration” characteristic enables hyperspectral images to play a crucial role in various fields, including environmental monitoring [1], urban planning [2], and smart agriculture [3].

When capturing images, spectral imagers are influenced by their inherent hardware limitations and environmental factors, resulting in various types of noise that affect the quality and analytical precision of hyperspectral images. With the advancement of machine learning, spatial filtering techniques such as mean filtering [4], median filtering [5], and Gaussian filtering [6] have been employed to smooth hyperspectral images, thereby reducing noise. Moreover, the noise in hyperspectral images is distributed across all bands but is often concentrated in specific directions in high-dimensional space. Aiming at this characteristic, several feature extraction methods have been proposed, such as Principal Component Analysis (PCA) [7], Linear Discriminant Analysis (LDA) [8], and Independent Component Analysis (ICA) [9]. Subsequently, more efficient dimensionality reduction techniques have also been explored, including Autoencoder [10], Locally Linear Embedding (LLE) [11], and t-distributed Stochastic Neighbor Embedding (t-SNE) [12]. However, a major drawback of dimensionality reduction is the inevitable loss of information. When reducing high-dimensional data to lower dimensions, some detailed information and spectral features may be lost, potentially affecting the accuracy of classification tasks. Therefore, during the preprocessing of hyperspectral images, it is crucial to preserve spectral information as much as possible and avoid excessive processing that leads to information loss. Zero-phase Component Analysis (ZCA) [13], an extension of PCA, can effectively improve the signal-to-noise ratio of the data while retaining all band information. ZCA achieves this by rotating the PCA-transformed data back to the original feature space, making the processed data more closely resemble the original input data.

In recent years, CNNs have been extensively applied in computer vision tasks due to their remarkable multi-level feature learning capabilities. Consequently, various CNN-driven HSI classification methods have been proposed, which can be broadly categorized into 1D convolutional neural networks (1D-CNN) [14], 2D convolutional neural networks (2D-CNN) [15], 3D convolutional neural networks (3D-CNN) [16], and hybrid methods. The 1D-CNN focuses on spectral feature extraction, treating each pixel’s spectral curve as a one-dimensional signal for convolution operations. The 2D-CNN emphasizes spatial feature extraction, capturing spatial features by performing convolutions on the two-dimensional image of each spectral band. The 3D-CNN conducts convolutions on the three-dimensional data cube of hyperspectral images, simultaneously capturing joint spectral and spatial features. In 2019, Roy et al. [17] proposed the HybridSN model, which combines 3D-CNN and 2D-CNN, leveraging the complementary information from both spatial-spectral and spectral domains for HSI classification.

However, the aforementioned algorithms based on deep convolutional neural networks rely on extensive computational and storage resources, posing significant challenges for efficient deployment in resource-constrained environments. To overcome these challenges, various network architecture designs have been explored to enhance network efficiency, aiming to reduce the inherent redundancy of model parameters and further achieve lightweight network models. For instance, ResNet [18] and DenseNet [19] employ efficient shortcut connections to improve network topology, connecting all preceding feature maps to reduce redundant parameters while alleviating the training difficulties of deep networks. Inspired by ResNet, Zhong et al. [20] proposed a Spectral-Spatial Residual Network (SSRN), where residual blocks are connected via identity mappings to every other three-dimensional convolution layer, facilitating gradient backpropagation and reducing model parameters while mitigating accuracy degradation. Li et al. [21] introduced DenseNet into the Dual-Branch Dual-Attention Mechanism Network (DBDA), designing dense spectral blocks and dense spatial blocks to learn deeper spectral and spatial features of hyperspectral images. The densely connected arrangement of dense blocks deepens the network, reduces gradient vanishing, and effectively compresses the model. Xue et al. [22] designed a novel Spectral-Spatial Siamese Network (S3Net). This network consists of a lightweight Spectral-Spatial Network (SSN) composed of one-dimensional and two-dimensional convolutions to extract spectral-spatial features. They constructed a dual-branch SSN, which expands the training set by inputting sample pairs into each branch, thereby improving classification performance in small sample scenarios.

To further reduce model parameters and FLOPs, researchers often use various efficient convolution operations to replace traditional convolutions. For example, MADANet [23] utilizes depth-wise separable convolutions to extract and aggregate multi-scale features, effectively capturing local contextual information and achieving outstanding classification accuracy with fewer parameters. GhostNet [24], considering redundancy among feature maps, generates primary feature maps using a small number of standard convolutions and produces multiple ghost feature maps through simple linear transformations (such as pointwise and depth-wise convolutions). This approach reduces the computational cost of standard convolutions while maintaining high representational capability. Similarly, OctConv [25] introduces octave convolution, dividing input features into high-frequency and low-frequency channels, where the latter are processed with reduced spatial resolution to alleviate spatial redundancy, cutting computation without increasing the parameter count. However, the aforementioned methods either focus on reducing redundancy in the channel dimension or the spatial dimension, leaving the network still facing the problem of feature redundancy.

Inspired by previous models aimed at reducing the number of parameters and computational costs of convolutional layers, researchers have proposed a novel CNN compression method known as Spatial and Channel Reconstruction Convolutions (SCConv) [26]. This module consists of Spatial Reconstruction Units (SRU) and Channel Reconstruction Units (CRU). SRU separates and reconstructs features based on weights to suppress spatial redundancy and enhance feature representation. CRU employs split, transformation, and fusion strategies to reduce channel redundancy while simultaneously lowering computational costs and storage requirements. By sequentially arranging SRU and CRU to replace standard convolutions, the combined approach reduces both spatial and channel redundancy in convolutional layers, significantly cutting computational costs and enhancing the performance of deep models.

Harnessing these recent technological advancements, this paper proposes a novel lightweight HSI classification network architecture. In this approach, the original HSI data is first preprocessed using ZCA, which retains all band information and reduces the impact of noise in the raw data. Subsequently, by introducing SCConv to replace standard convolutions, the redundancy in the spatial and spectral dimensions of intermediate feature maps is reduced, leading to a decrease in the number of parameters and computational complexity. Additionally, to fully exploit the multiscale differences of feature maps at different depths, a multi-layer feature fusion (MLFF) unit was proposed to generate more representative features. Finally, experimental results demonstrate the effectiveness of the proposed network architecture.

In general, the main contributions of this article can be enumerated as follows:

(1) This paper proposes a lightweight network structure with fewer layers for overall design. This structure maintains considerable classification performance with a reduced number of parameters and lower computational costs.

(2) To overcome the drawbacks of spatial and channel redundancy produced by conventional convolutional layers, this study introduces SCConv, composed of SRU and CRU. This not only significantly reduces computational costs but also enhances the performance of deep models.

(3) Because different layers of SCNet carry different feature information, an MLFF unit was designed to fuse these features from different depths. The proposed MLFF unit further improves the classification accuracy of the network.

2 Related works

2.1 Group normalization

Group normalization (GN) [27] is a novel normalization technique in deep learning. Compared to batch normalization (BN) [28], GN is independent of batch size, leading to enhanced stability across varying batch sizes. BN normalizes each channel dimension across the entire batch, calculating the mean and variance along this dimension. Suppose BN is applied to a data set X ∈ ℝ^(N×C×H×W), where N is the batch size, C is the channel dimension, and H and W are the height and width. For each channel i, BN computes the mean and variance along the (N, H, W) axes as follows:

μ_i = (1 / (N·H·W)) Σ_{n=1}^{N} Σ_{h=1}^{H} Σ_{w=1}^{W} x_{n,i,h,w} (1)

σ_i² = (1 / (N·H·W)) Σ_{n=1}^{N} Σ_{h=1}^{H} Σ_{w=1}^{W} (x_{n,i,h,w} − μ_i)² (2)

The calculation of mean and variance in GN differs from that in BN. GN first divides the channels into multiple groups and normalizes the features within each group. Consequently, the mean and variance are computed based on the features within each group. If the channels of X are divided into G groups, with each group containing C/G channels, the mean and variance are calculated as follows:

μ_g = (1 / ((C/G)·H·W)) Σ_{i∈S_g} Σ_{h=1}^{H} Σ_{w=1}^{W} x_{i,h,w} (3)

σ_g² = (1 / ((C/G)·H·W)) Σ_{i∈S_g} Σ_{h=1}^{H} Σ_{w=1}^{W} (x_{i,h,w} − μ_g)² (4)

where S_g denotes the set of channels belonging to the g-th group.

For the data X, normalization can be performed by subtracting the mean μ and dividing by the standard deviation σ, followed by a learnable affine transformation, as shown below:

X̂ = γ · (X − μ) / √(σ² + ε) + β (5)

where μ and σ are the mean and standard deviation of X, ε is a tiny constant added for division stability, and γ and β are trainable scale and shift parameters. The illustrations of BN and GN with G = 2 are shown in Figs 1 and 2. The μ and σ are computed from the pixel values within the same-color feature map.

Additionally, there are two normalization methods similar to GN in terms of computation: Layer Normalization (LN) and Instance Normalization (IN). Their main difference lies in the dimension over which normalization is performed. LN normalizes along the channel dimension (C) of a single sample, computing the mean and variance across all pixels within the entire channel. In contrast, IN normalizes each channel separately at the spatial level (H, W), computing the mean and variance independently for each channel. GN falls between LN and IN, with its normalization scope determined by the number of groups G. When G = 1, GN degrades to LN, performing normalization across the entire channel dimension. Conversely, when G = C, GN becomes equivalent to IN, normalizing each channel independently.

In terms of normalization relationships, LN, IN, and GN are all batch-independent normalization methods. Unlike BN, which requires computing the mean and variance across multiple samples, these methods are more adaptable to small-batch training environments. They help avoid the issue of unstable statistics in BN during small-batch training, thereby improving model stability and generalization in memory-constrained or dynamically changing input scenarios.
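The relationship between GN, LN, and IN described above can be checked with a minimal NumPy sketch (an illustrative implementation for a single sample, not the authors' code); setting G = 1 reproduces layer normalization, while G = C reproduces instance normalization:

```python
import numpy as np

def group_norm(x, G, eps=1e-5):
    """Group normalization of a single sample x with shape (C, H, W).

    Channels are split into G groups; the mean and variance are
    computed per group over (C/G, H, W), matching Eqs. (3)-(5)
    (affine parameters gamma/beta omitted for brevity)."""
    C, H, W = x.shape
    assert C % G == 0, "C must be divisible by G"
    g = x.reshape(G, C // G, H, W)
    mu = g.mean(axis=(1, 2, 3), keepdims=True)
    var = g.var(axis=(1, 2, 3), keepdims=True)
    return ((g - mu) / np.sqrt(var + eps)).reshape(C, H, W)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 3))

ln_like = group_norm(x, G=1)   # G = 1  -> layer normalization
in_like = group_norm(x, G=4)   # G = C  -> instance normalization
gn = group_norm(x, G=2)        # intermediate grouping
```

Because each group is standardized independently, every group of `gn` has (numerically) zero mean, which is the batch-independence property exploited in small-batch training.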

2.2 PCA and ZCA

Principal Component Analysis (PCA) [7] is one of the most commonly used and effective dimensionality reduction algorithms. PCA transforms all the bands of the original image into a new feature space in which the components are pairwise orthogonal, extracting the main features while preserving as much of the original spectral information as possible to ensure the accuracy of subsequent training. PCA whitening is an extension of PCA: after making the bands pairwise orthogonal, the variance of each band feature is scaled to 1 before the remaining PCA operations are performed. This prevents the model from being biased towards certain features during training, making it fairer and more balanced. Zero-phase Component Analysis (ZCA) [13] whitening additionally rotates the PCA-whitened data back to the original feature space. For two-dimensional data, the original data, the PCA-whitened data, and the ZCA-whitened data are shown in Fig 3. It is worth noting that ZCA whitening is not a dimensionality reduction algorithm; instead, it retains all the band features, making the transformed data closer to the original input data. The ZCA algorithm process is as follows:

Fig 3. Visualization of whitening effectiveness.

(a) Original data. (b) PCA whiten data. (c) ZCA whiten data.

https://doi.org/10.1371/journal.pone.0322345.g003

Algorithm 1 ZCA

Input: data matrix X ∈ ℝ^(N×D)

Output: whitened data X_ZCA ∈ ℝ^(N×D)

1: μ ← (1/N) Σ_{i=1}^{N} x_i //Zero Mean Normalization

2: X ← X − μ

3: Σ ← (1/N) XᵀX = U Λ Uᵀ //Orthogonal Decomposition

4: W_PCA ← (Λ + εI)^(−1/2) Uᵀ

5: W_ZCA ← U (Λ + εI)^(−1/2) Uᵀ

6: X_ZCA ← X W_ZCA
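Algorithm 1 can be sketched in NumPy as follows (an illustrative sketch; ε is the small stabilizing constant from the text, and the function/variable names are our own):

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA whitening of X with shape (n_samples, n_features).

    Follows Algorithm 1: zero-mean the data, eigendecompose the
    covariance, whiten, and rotate back with U so the result stays
    in the original feature space (no bands are discarded)."""
    Xc = X - X.mean(axis=0)                       # zero-mean normalization
    cov = Xc.T @ Xc / Xc.shape[0]                 # covariance matrix
    eigvals, U = np.linalg.eigh(cov)              # orthogonal decomposition
    W = U @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ U.T
    return Xc @ W

# Toy data with correlated "bands"
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))
Xz = zca_whiten(X)
```

After whitening, the band covariance of `Xz` is approximately the identity matrix while the data keeps its original dimensionality, which is exactly the property that distinguishes ZCA from dimensionality-reducing PCA.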

3 Methodology

3.1 Proposed model architecture for HSI classification

The overall network structure of SCNet is illustrated in Fig 4. Firstly, ZCA whitening is used to eliminate the correlation between the bands of the original HSI. Subsequently, the whitened data is divided into smaller 3D patches, each labeled with the ground truth of its central pixel. Taking the Indian Pines dataset as an example, the spatial size of the patches is set to 9 × 9 as the input to the entire network.
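The patch-extraction step described above can be sketched as follows (illustrative NumPy; the padding mode and the background-label convention are assumptions, not specified by the paper):

```python
import numpy as np

def extract_patches(cube, gt, patch=9):
    """Split an (H, W, B) hyperspectral cube into patch x patch 3-D
    blocks, each labeled with the ground truth of its center pixel.
    Unlabeled pixels (gt == 0) are skipped, a common convention."""
    m = patch // 2
    # Reflect-pad the borders so every pixel gets a full patch
    padded = np.pad(cube, ((m, m), (m, m), (0, 0)), mode="reflect")
    X, y = [], []
    for i in range(cube.shape[0]):
        for j in range(cube.shape[1]):
            if gt[i, j] == 0:          # background pixel, no label
                continue
            X.append(padded[i:i + patch, j:j + patch, :])
            y.append(gt[i, j] - 1)     # shift labels to start at 0
    return np.stack(X), np.array(y)

# Toy cube standing in for a whitened HSI
cube = np.random.default_rng(0).normal(size=(12, 10, 30))
gt = np.random.default_rng(1).integers(0, 4, size=(12, 10))
X, y = extract_patches(cube, gt, patch=9)
```

Each patch keeps the full spectral depth, so the network's first convolution sees a 9 × 9 × B block per labeled pixel.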

As observed in Fig 4, the proposed network implements a straightforward backbone that first uses a convolution to expand the dimensionality of the input 3D patch, satisfying the channel requirements of the subsequent SCConv layers. By employing more convolutional kernels, the network can extract features comprehensively and thus better learn the critical information in the data. Then, three stacked SCConv modules reduce the spatial and channel redundancy within the 3D patches. To fully utilize the features from different depths, the MLFF unit is designed to connect multiple feature maps along the spectral dimension; this unit not only stabilizes the network but also makes it easier to train. Furthermore, a convolution compresses the fused feature maps, extracting high-level HSI features and reducing redundant information. Ultimately, all features are collected and vectorized by a final convolution and pooling block before the output is sent to the classifier, a fully connected (FC) multilayer perceptron (MLP). It should be noted that instead of direct prediction, global residual learning is used to construct the output, which provides a smoother hypersurface for gradient descent in SCNet and effectively reduces the risk of network degradation. Additionally, the proposed topology greatly decreases the number of bottleneck blocks required by the HSI classification network, which reduces model overfitting and avoids data degradation and gradient vanishing problems during forward and backward propagation.

3.2 The architecture of SCConv

The SCConv module, depicted in Fig 5, comprises two units: the Spatial Reconstruction Unit (SRU) and the Channel Reconstruction Unit (CRU). Specifically, the SCConv module applies the SRU operation to the input features X in the bottleneck residual block to obtain the spatial-refined features XS. Subsequently, the channel-refined features XC are obtained using the CRU operation. This approach exploits the spatial and channel redundancy present in the features within the SCConv module. As a result, it decreases redundant information among intermediate feature maps and improves the feature representation capability of the CNN.

3.2.1. SRU for spatial redundancy.

To exploit the spatial redundancy of features, the SCConv module employs the Spatial Reconstruction Unit (SRU) shown in Fig 6, which consists of two steps: Separate and Reconstruct.

Separate: The objective of the separation operation is to distinguish feature maps rich in spatial information from feature maps carrying little information. For a given intermediate feature map X, GN is first applied:

X_out = GN(X) = γ · (X − μ) / √(σ² + ε) + β (6)

The trainable scale parameters γ in GN are used to evaluate the richness of information in the different feature maps. Richer spatial information reflects more variation among spatial pixels, leading to larger values of γ. By normalizing γ, the weights W_γ = {w_i} indicating the information richness of the different feature maps are obtained [26]:

w_i = γ_i / Σ_{j=1}^{C} γ_j, i, j = 1, 2, …, C (7)

Then, the GN feature maps are re-weighted by W_γ, and a Sigmoid function is used to map the values to the (0, 1) range. Subsequently, a threshold is set for gating (the experiments set the threshold to 0.5). Weights exceeding the threshold are set to 1 to obtain the rich-information weights W1, while weights below the threshold are set to 0 to obtain the sparse-information weights W2. The entire process of obtaining W can be expressed by the following equation, where ⊗ is element-wise multiplication [26]:

W = Gate(Sigmoid(W_γ ⊗ GN(X))) (8)

Finally, the input feature X is multiplied with W1 and W2 respectively, resulting in two weighted features: the informative feature X₁ʷ = W1 ⊗ X and the less informative feature X₂ʷ = W2 ⊗ X. X₁ʷ contains rich and expressive spatial content, while X₂ʷ carries little or no information and is regarded as redundant content.

Reconstruct: To reduce the spatial redundancy, a further reconstruction operation is proposed, which adds the information-rich features to the less informative ones, generating features that are more informative while occupying less space. Note that rather than directly adding the two parts of the features, a cross-reconstruction operation is used to fully combine the two weighted features with different information and enhance the information flow between them. Finally, the cross-reconstructed features Xʷ¹ and Xʷ² are concatenated together to form the spatial-refined feature maps XS. The entire reconstruction process can be expressed as follows:

Xʷ¹ = X₁₁ʷ ⊕ X₂₂ʷ, Xʷ² = X₂₁ʷ ⊕ X₁₂ʷ, XS = Xʷ¹ ∪ Xʷ² (9)

where ⊕ is element-wise summation, ∪ is concatenation, and X₁ʷ and X₂ʷ are each split along the channel dimension into two halves, (X₁₁ʷ, X₁₂ʷ) and (X₂₁ʷ, X₂₂ʷ), before the cross-wise addition. By applying the SRU on the input feature X, not only are the rich-information features separated from the less informative ones, but their representative characteristics are also enhanced through the reconstruction operation, suppressing redundancy in the spatial dimension. Nevertheless, the spatial-refined feature maps XS still contain redundancy along the channel dimension.
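The Separate and Reconstruct steps above can be sketched in NumPy (an illustrative single-sample sketch assuming one GN group and the paper's 0.5 gating threshold; not the authors' implementation):

```python
import numpy as np

def sru(x, gamma, beta, thresh=0.5, eps=1e-5):
    """Sketch of the Spatial Reconstruction Unit (Eqs. 6-9) for one
    sample x of shape (C, H, W); gamma/beta are the GN scale/shift."""
    # Eq. (6): group normalization (a single group here, for brevity)
    mu, var = x.mean(), x.var()
    xn = gamma[:, None, None] * (x - mu) / np.sqrt(var + eps) + beta[:, None, None]
    # Eq. (7): normalized gamma weights measure information richness
    w_gamma = gamma / gamma.sum()
    # Eq. (8): sigmoid re-weighting, then hard gating at the threshold
    s = 1.0 / (1.0 + np.exp(-(w_gamma[:, None, None] * xn)))
    W1 = (s >= thresh).astype(x.dtype)
    W2 = (s < thresh).astype(x.dtype)
    x1, x2 = W1 * x, W2 * x            # informative / redundant parts
    # Eq. (9): cross-reconstruction, then channel-wise concatenation
    x11, x12 = np.split(x1, 2, axis=0)
    x21, x22 = np.split(x2, 2, axis=0)
    return np.concatenate([x11 + x22, x21 + x12], axis=0)

x = np.random.default_rng(0).normal(size=(8, 5, 5))
gamma = np.abs(np.random.default_rng(1).normal(size=8))
xs = sru(x, gamma, np.zeros(8))
```

Because W1 and W2 partition every position, the cross-reconstruction rearranges rather than discards information: the output has the same shape (and the same total activation mass) as the input.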

3.2.2 CRU for channel redundancy.

To exploit the channel redundancy of features, the SCConv module introduces the Channel Reconstruction Unit (CRU) shown in Fig 7, which applies Split, Transform, and Fuse strategies.

Split: For a given spatial-refined feature XS, CRU first splits XS into two parts along the channel dimension, containing αC and (1 − α)C channels, respectively, as illustrated in the split section of Fig 7, where α represents the split ratio (α is set to 0.5 to better balance performance and efficiency). Subsequently, 1 × 1 convolutions are employed to reduce the channels of the feature maps to improve computational efficiency. Here, a compression ratio r is introduced to control the number of feature channels and balance the computational cost of the CRU (in the experiments, r is set to 2). After the split and compression operations, the spatial-refined feature XS is divided into an upper part X_up and a lower part X_low.

Transform: X_up is input to the upper transformation stage, where group-wise convolution (GWC, with the group size g assigned to 2 in this study) and point-wise convolution (PWC) are applied to X_up in parallel. The outputs are then summed to form a merged representative feature map Y₁, as shown in the transformation part of Fig 7. By employing efficient convolution operations (i.e., GWC and PWC) instead of the costly standard convolution, this stage not only extracts high-level representative information but also reduces the computational cost. GWC decreases the number of parameters and the computational requirements thanks to its sparse convolutional connections, but it cuts off the information flow between different channel groups. PWC compensates for this information loss and helps information transfer between the channel groups. The upper transformation stage can be expressed as [26]:

Y₁ = M^G · X_up + M^P1 · X_up (10)

where M^G and M^P1 denote the learnable weight matrices of the GWC and PWC, and X_up and Y₁ are the upper input and output feature maps, respectively. In short, the upper transformation stage combines GWC and PWC on the same feature map to extract rich representative features at low computational cost.

X_low is fed into the lower transformation stage, where an economical PWC operation is employed to generate feature maps containing shallow hidden details. Additionally, the feature X_low is reused to obtain more feature maps without incurring extra cost. Finally, the generated and reused feature maps are concatenated to form the output of the lower stage, Y₂, as follows [26]:

Y₂ = (M^P2 · X_low) ∪ X_low (11)

where M^P2 represents the learnable weight matrix of the PWC, ∪ denotes the concatenation operation, and X_low and Y₂ are the input and output feature maps of the lower transformation stage, respectively. In summary, the lower transformation stage reuses the preceding feature X_low and employs the economical PWC to obtain the supplementary detailed features Y₂.

Fuse: After the transformation, the two types of features are not directly concatenated or combined. Instead, the output features from the upper and lower transformation stages, Y₁ and Y₂, are adaptively fused, as illustrated in the fusion part of Fig 7. First, global average pooling (GAP) is utilized to gather global spatial information as channel statistics, as expressed by the following formula:

S_m = Pooling(Y_m) = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} Y_m(i, j), m = 1, 2 (12)

Next, the upper and lower global channel descriptors, S1 and S2, are stacked together, and a channel-wise soft attention operation is used to generate the feature importance vectors β₁ and β₂, as follows:

β₁ = e^{S1} / (e^{S1} + e^{S2}), β₂ = e^{S2} / (e^{S1} + e^{S2}), β₁ + β₂ = 1 (13)

Finally, guided by the feature importance vectors β₁ and β₂, the channel-refined features XC can be obtained by merging the upper features Y₁ and the lower features Y₂ in a channel-wise manner as follows [26]:

XC = β₁ · Y₁ + β₂ · Y₂ (14)

In summary, by employing the CRU, the redundancy of the spatial-refined feature map XS along the channel dimension is further reduced. Additionally, the CRU extracts rich representative features by employing lightweight convolution operations while handling redundant features using cost-effective operations and feature reuse schemes. Subsequent experiments demonstrated that the sequential spatial-channel combination (SRU + CRU) achieved better performance than the sequential channel-spatial combination (CRU + SRU), the parallel use of the two units (SRU ∥ CRU), and the individual use of either unit (SRU only or CRU only).
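The Fuse step (Eqs. 12-14) can be sketched as follows (illustrative NumPy, assuming the two branch outputs Y1 and Y2 already share the same shape):

```python
import numpy as np

def cru_fuse(Y1, Y2):
    """Sketch of the CRU fusion step: global average pooling gives
    channel descriptors S1, S2 (Eq. 12); a channel-wise softmax
    yields importance weights beta1, beta2 (Eq. 13); the branch
    outputs are merged channel-wise (Eq. 14)."""
    S1 = Y1.mean(axis=(1, 2))                  # Eq. (12): GAP
    S2 = Y2.mean(axis=(1, 2))
    e1, e2 = np.exp(S1), np.exp(S2)            # Eq. (13): soft attention
    b1, b2 = e1 / (e1 + e2), e2 / (e1 + e2)
    # Eq. (14): channel-wise weighted merge of the two branches
    return b1[:, None, None] * Y1 + b2[:, None, None] * Y2

rng = np.random.default_rng(0)
Y1 = rng.normal(size=(16, 9, 9))
Y2 = rng.normal(size=(16, 9, 9))
Xc = cru_fuse(Y1, Y2)
```

Since β₁ + β₂ = 1 per channel, the fused output is a convex combination of the two branches, so no channel is ever amplified beyond what either branch produced.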

3.2.3. Parameters analysis.

The number of parameters of a standard convolution with a kernel size of k × k can be determined using the formula:

P_standard = k × k × C₁ × C₂ (15)

In the SCConv module, all parameters are concentrated in the split and transform stages of the CRU. The split stage includes two 1 × 1 convolutions, and in the transform stage the parameters are mainly concentrated in the group-wise convolution and the two point-wise convolutions. Therefore, the total number of parameters in the SCConv module consists of five parts and can be calculated by the following formula:

P_SCConv = αC₁ · (αC₁ / r) + (1 − α)C₁ · ((1 − α)C₁ / r) + k × k × (αC₁ / r) × (C₂ / g) + (αC₁ / r) × C₂ + ((1 − α)C₁ / r) × (C₂ − (1 − α)C₁ / r) (16)

where α represents the split ratio, r denotes the compression ratio, g is the group size of the GWC, and C₁ and C₂ are the numbers of input and output feature channels, respectively. Here, this paper provides a comparison to demonstrate the effectiveness of the newly introduced SCConv. With the parameter setting α = 1/2, r = 2, g = 2, k = 3, and C₁ = C₂, the number of parameters is lowered by a factor of about 5 (P_standard / P_SCConv ≈ 5), while the module obtains superior performance compared to a normal convolution.
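The factor-of-5 claim can be checked arithmetically with the five-part count of Eq. (16) under the stated setting (a sketch of the bookkeeping, not the authors' code):

```python
# Parameter counts for a standard k x k convolution versus the SCConv
# CRU stages (Eqs. 15-16), using the paper's setting: alpha = 1/2,
# r = 2, g = 2, k = 3 and C1 = C2 = C.

def standard_params(k, c1, c2):
    return k * k * c1 * c2                       # Eq. (15)

def scconv_params(k, c1, c2, alpha=0.5, r=2, g=2):
    up, low = alpha * c1, (1 - alpha) * c1       # split channels
    squeeze = up * (up / r) + low * (low / r)    # two 1x1 squeeze convs
    gwc = k * k * (up / r) * c2 / g              # group-wise conv
    pwc_up = (up / r) * c2                       # upper point-wise conv
    pwc_low = (low / r) * (c2 - low / r)         # lower point-wise conv
    return squeeze + gwc + pwc_up + pwc_low      # Eq. (16), five parts

C = 64
ratio = standard_params(3, C, C) / scconv_params(3, C, C)
print(ratio)   # close to 5, i.e. roughly 5x fewer parameters
```

For C₁ = C₂ the ratio is independent of C, since both counts scale with C², which is why the compression factor holds across layer widths.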

3.3 Multi-layer feature fusion

As shown in Fig 8, different depth layers of the proposed SCNet have different levels of feature information. To effectively utilize these different features between SCConv modules, this study proposes an MLFF unit to connect feature maps at different layers. The MLFF unit in the proposed model is used to combine multiple feature maps belonging to different SCConv groups, as shown in Fig 4. Moreover, the MLFF unit can be regarded as a group of skip connections, which have been demonstrated to effectively address the issues of vanishing gradient and exploding gradient. In the realm of computer vision, the concatenation of multi-layer feature maps can achieve better performance. Yuan et al. [29] used cascade representation for recovery and achieved outstanding results in the HSI denoising challenge. In the HSI reconstruction model, Zou et al. [30] used the multi-layer fusion module to merge the hierarchical information to generate more representative features, which effectively improved the reconstruction accuracy of HSI. The proposed MLFF unit can be expressed as follows:

Fig 8. Different depths contain varying quantities of feature information.

(a) Feature maps of the Conv layer. (b) Feature maps of the first SCConv module. (c) Feature maps of the second SCConv module. (d) Feature maps of the third SCConv module.

https://doi.org/10.1371/journal.pone.0322345.g008

F = F₂ ∪ F₃ ∪ F₄ (17)

where ∪ denotes concatenation along the spectral (channel) dimension, and F₂, F₃, F₄ represent the feature maps of the different layers in Fig 8, respectively.

Let C represent the number of feature channels; the combined feature F is restored to its original size by a convolutional layer:

F_C = W_C * F + b_C (18)

where P is the feature map size, * represents the convolution operation, and W_C and b_C represent the weight parameter and bias parameter of the later convolution layer, respectively.
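Eqs. (17)-(18) can be sketched as follows (illustrative NumPy, assuming the three feature maps share C channels and spatial size P × P, and expressing the channel-restoring convolution as a 1 × 1 per-pixel linear map; kernel size is our assumption):

```python
import numpy as np

def mlff(feats, Wc, bc):
    """Sketch of the MLFF unit: concatenate feature maps from
    different depths along the channel (spectral) axis (Eq. 17),
    then restore the channel count with a 1x1 convolution,
    i.e. a linear map over the channel dimension (Eq. 18)."""
    F = np.concatenate(feats, axis=0)            # Eq. (17): (3C, P, P)
    # Eq. (18): 1x1 conv == per-pixel matrix multiply over channels
    return np.tensordot(Wc, F, axes=([1], [0])) + bc[:, None, None]

rng = np.random.default_rng(0)
C, P = 16, 9
F2, F3, F4 = (rng.normal(size=(C, P, P)) for _ in range(3))
Wc = rng.normal(size=(C, 3 * C)) * 0.1           # 1x1 conv weights
bc = np.zeros(C)
fused = mlff([F2, F3, F4], Wc, bc)
```

The fusion acts like a set of learned skip connections: shallow and deep feature maps both contribute to every output channel, which is what stabilizes gradients across depths.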

4 Experiment

To assess the effectiveness of SCNet, experiments are conducted on four different datasets. These experiments are designed to compare and confirm the accuracy and efficiency of the proposed network relative to other methods. Three quantitative metrics, overall accuracy (OA), average accuracy (AA), and the Kappa coefficient, are used to measure the accuracy of each method. Specifically, OA represents the proportion of all pixels correctly classified, AA represents the average accuracy over all categories, and the Kappa coefficient reflects the agreement between the ground truth and the classification results. Higher values of the three measures indicate better classification results. In addition, the number of model parameters (Para), floating-point operations (FLOPs), and the training and testing times are used to comprehensively evaluate the computational complexity and efficiency of the model.

4.1 Datasets

The Pavia University (UP) dataset was acquired by a ROSIS sensor over the University of Pavia in Northern Italy. It contains 115 bands in the spectral region of 430 nm to 860 nm. Because spectral mixing caused by different materials (such as buildings, roads, and vegetation) appearing within the same pixel is common in urban environments, the number of bands was reduced to 103 after removing irrelevant bands. The dataset has a spatial resolution of 1.3 m and a spatial size of 610 × 340 pixels, and includes a total of 42,776 labeled pixels categorized into 9 classes.

The Kennedy Space Center (KSC) dataset was collected by the AVIRIS instrument over the Kennedy Space Center in Florida. It contains 224 bands in the 400 nm to 2500 nm spectral region. Due to the high correlation among certain bands of the AVIRIS sensor, redundant bands need to be removed to optimize the information representation; after eliminating irrelevant bands, a total of 176 bands were retained. The dataset has a spatial resolution of 18 m, a spatial size of 512 × 614 pixels, a total of 5,211 labeled pixels, and 13 categories.

The WHU-Hi-LongKou (WHLK) dataset [31] was obtained using a Headwall Nano-Hyperspec imaging sensor with an 8 mm focal length, mounted on a DJI Matrice 600 Pro (DJI M600 Pro) UAV platform [32] in Longkou Town, Hubei Province, China. The study area is a simple agricultural scene containing six crop species: corn, cotton, sesame, broad-leaf soybean, narrow-leaf soybean, and rice. The dataset contains 274 bands from 400 nm to 1000 nm with a spatial size of 550 × 400 pixels, a total of 204,542 labeled pixels, 9 categories, and a spatial resolution of about 0.463 m.

The Indian Pines (IP) dataset was gathered by the AVIRIS sensor over the Indian Pines test site in North-western Indiana. It originally covers the 400 nm to 2500 nm spectral region; after removing invalid bands such as those affected by water vapor absorption, 200 bands remain. The dataset has a spatial resolution of 20 m, a spatial size of 145 × 145 pixels, and a total of 10,249 labeled pixels. There are 16 unevenly distributed categories in the image.

Deep learning algorithms are data-driven and rely on a large number of labeled training samples: the more labeled data fed into training, the higher the resulting accuracy. Nevertheless, an increase in data leads to a corresponding increase in training time and computational complexity. Notably, even with only a few training examples, the proposed SCNet continues to perform at an exceptional level. Therefore, the sizes of the training and validation sets in the experiments are kept to a minimum. For UP, 1% of the samples were selected for training and another 1% for validation. Both KSC and IP use a 5% sample size for training and validation. For WHLK, only 0.2% of the samples are chosen for training and another 0.2% for validation. Tables 1–4 list the training, validation, and test samples for the four datasets. To mitigate the issue of class imbalance in the IP dataset, we employed an oversampling approach for the minority classes (Classes 1, 7, 9, and 16) to increase their training sample sizes.
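The minority-class oversampling mentioned above can be sketched as follows (one plausible implementation with replacement; the labels and target count here are hypothetical toy values, not the paper's actual IP class sizes):

```python
import numpy as np

def oversample(y, minority, target, seed=0):
    """Return training indices where each class in `minority` is
    re-drawn with replacement until it has `target` samples."""
    idx = np.arange(len(y))
    rng = np.random.default_rng(seed)
    extra = []
    for c in minority:
        cls = idx[y == c]
        if len(cls) < target:
            # duplicate randomly chosen samples of the rare class
            extra.append(rng.choice(cls, target - len(cls), replace=True))
    return np.concatenate([idx] + extra)

# Toy label vector: classes 0 and 6 are rare
y = np.array([0] * 3 + [1] * 40 + [6] * 2 + [8] * 50)
train_idx = oversample(y, minority=[0, 6], target=10)
```

Duplicating rare-class patches this way only rebalances the loss contributions; it adds no new information, which is why it is applied to the training split alone.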

Table 1. The class names of UP dataset along with the number of training, validation, and test samples for each class.

https://doi.org/10.1371/journal.pone.0322345.t001

Table 2. The class names of KSC dataset along with the number of training, validation, and test samples for each class.

https://doi.org/10.1371/journal.pone.0322345.t002

Table 3. The class names of WHLK dataset along with the number of training, validation, and test samples for each class.

https://doi.org/10.1371/journal.pone.0322345.t003

Table 4. The class names of IP dataset along with the number of training, validation, and test samples for each class.

https://doi.org/10.1371/journal.pone.0322345.t004

4.2 Experiment setting

For this work, to compare the training and testing time consumption, all experiments were performed on a computer with 12 GB of memory and an NVIDIA GeForce RTX 3060 GPU; the programming environment is Python 3.8.5 with the PyTorch 1.10.0 framework. The experimental results are reported as the average and standard deviation of 10 runs.

4.2.1 Split ratio.

To explore the effect of different split ratios in the CRU module, the split ratio was adjusted incrementally from 1/8 to 7/8, and the overall accuracy and FLOPs were evaluated on the four datasets. As shown in Fig 9, the accuracy of SCNet rises as the split ratio increases. A higher split ratio allows the model to capture more comprehensive feature information during the CRU’s transformation phase, thereby improving the model’s overall performance. When the split ratio is set to 1/2, the network achieves the best trade-off between FLOPs and accuracy. Consequently, 1/2 is chosen as the split ratio for SCConv in subsequent experiments to ensure a balanced compromise between performance and efficiency.
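The role of the split ratio in the CRU can be illustrated with a minimal sketch: it simply determines how many channels of a feature map enter the rich transform branch versus the cheap branch. The function name and shapes below are illustrative, not taken from the paper's implementation:

```python
import numpy as np

def cru_split(x, alpha):
    """Partition a (C, H, W) feature map along the channel axis at split
    ratio alpha: the upper alpha*C channels feed the rich transform branch,
    the remaining (1 - alpha)*C channels feed the cheap branch."""
    c_up = int(round(x.shape[0] * alpha))
    return x[:c_up], x[c_up:]

x = np.zeros((64, 9, 9))          # illustrative 64-channel feature map
up, low = cru_split(x, 0.5)       # the paper's chosen ratio of 1/2
```

A larger `alpha` routes more channels through the expensive branch, which is why both accuracy and FLOPs grow with the split ratio.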

Fig 9. The FLOPs and OA on four datasets with different split ratios in SCConv.

https://doi.org/10.1371/journal.pone.0322345.g009

4.2.2 Patch size selection.

To utilize spatial information for spectral-spatial classification, a 3D cube is used as the model input, preserving all bands of the original image in the spectral dimension. Since the cube size can impact the HSI classification results, a series of experiments was conducted over patch sizes in {3, 5, 7, 9, 11, 13, 15, 17} to find the best one. Fig 10 shows the overall accuracy of SCNet on the four hyperspectral datasets for different spatial sizes. As the patch size increases, the classification performance of the model first improves and then degrades. A larger patch lets the model learn more spatial information, so classification accuracy improves to some extent on all four datasets. However, beyond a certain size, accuracy starts to decrease because the additional neighboring pixels introduce noise and restrict the model’s ability to extract distinctive features for the center pixel. A patch size of 9 achieves high accuracy on all four datasets. In addition, the FLOPs of the model rise gradually as the patch grows. Therefore, to balance classification performance and efficiency, the patch size is set to 9.
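Patch extraction around a center pixel, with zero padding so that border pixels also yield full 9×9 cubes, can be sketched as follows; the padding strategy is an assumed implementation detail, since the paper does not specify it:

```python
import numpy as np

def extract_patch(cube, row, col, patch=9):
    """Return the patch x patch x B neighborhood centered on pixel
    (row, col); zero padding gives border pixels a full-size patch too."""
    r = patch // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="constant")
    return padded[row:row + patch, col:col + patch, :]

hsi = np.random.default_rng(0).random((145, 145, 200))  # IP-sized toy cube
p = extract_patch(hsi, 0, 0)      # a corner pixel still yields 9 x 9 x 200
```

Note that all 200 spectral bands are kept intact; only the spatial neighborhood is windowed.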

Fig 10. Classification results for varying patch sizes on four datasets.

https://doi.org/10.1371/journal.pone.0322345.g010

4.2.3 Parameters configuration.

Table 5 provides the implementation details for each layer. Note that all average pooling layers are implemented with adaptive average pooling, which adapts to the input size so that the multidimensional input array can be vectorized into a 1-D array.
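A minimal NumPy sketch of what adaptive average pooling to a 1×1 output does, namely reducing each channel to its mean regardless of the input's spatial size (illustrative only, not the PyTorch layer itself):

```python
import numpy as np

def adaptive_avg_pool_1(x):
    """Adaptive average pooling to a 1x1 output: each channel of a
    (C, H, W) array is reduced to its mean, so the resulting 1-D vector
    has length C for any spatial input size."""
    return x.mean(axis=(1, 2))

f = np.arange(24.0).reshape(2, 3, 4)   # 2 channels, 3x4 spatial grid
v = adaptive_avg_pool_1(f)             # -> vector of length 2
```

Because the output length depends only on the channel count, the same classifier head can follow feature maps of any spatial size.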

Table 5. Network configuration of the SCNet model on Indian Pines dataset.

https://doi.org/10.1371/journal.pone.0322345.t005

4.2.4 Comparison methods.

To fully evaluate the effectiveness of SCNet, this paper selected several state-of-the-art deep learning methods for comparison. These include CNN-based methods (SSRN [20], HybridSN [17], S3Net [22]), a generative adversarial network (CA-GAN [33]), and an attention-based method (DBDA [21]). In addition, an SVM with an RBF kernel is considered [34]. The patch size of each classification method follows the specifications in its original paper. The methods are briefly introduced below.

SSRN: The Spectral and Spatial Residual Network (SSRN) uses spectral and spatial residual blocks to continuously learn rich spectral features and spatial context features from hyperspectral images, connecting 3D convolutional layers through identity mappings. The input space for SSRN is of size , where c represents the number of spectral bands.

HybridSN: HybridSN integrates spectral-spatial 3D-CNN with spatial 2D-CNN to utilize joint feature information and further learn more spatial representations. The size of the input space is .

S3Net: S3Net consists of two lightweight spectral-spatial networks in a dual-branch structure. Each branch consists of a 1D-CNN and a 2D-CNN to extract spectral-spatial features. The size of the input space is .

CA-GAN: A generative adversarial network for HSI classification based on collaborative learning and attention mechanisms. The size of the input space is .

DBDA: A two-branch CNN architecture with a dual attention mechanism, used for HSI classification to efficiently capture both spatial and spectral information. The size of the input space is .

For SSRN, HybridSN, S3Net, CA-GAN, DBDA, and the proposed method, the batch size is set to 64, the optimizer is set to Adam, the learning rate is 0.0005, and all are trained for about 200 epochs.

4.3 Experiment results and discussion

4.3.1 Classification performance.

Table 6 displays the experimental results of each approach on the UP dataset, including OA, AA, Kappa, and the accuracy of each category; the maximum value for each item is highlighted in bold. In addition, the RGB image of this dataset, the ground-truth map, and the classification map of each method are shown in Fig 11. Because spatial information is not considered and only the spectral features of the original HSI are used to train the classifier, SVM performs poorly: its classification map contains many misclassifications and noise-like artifacts in every category. Although HybridSN utilizes both the spatial and spectral features of HSI, it cannot learn effective features with limited training samples, so its classification performance is poor and its maps show quite obvious errors. S3Net, designed for limited-sample classification, obtains a high OA but a low AA owing to its poor performance on the fifth category, which produces block-shaped errors in its classification map. In general, SSRN and DBDA exhibit good classification performance with fewer noisy pixels in their maps, and among these comparison methods, CA-GAN achieves excellent performance in most classes by leveraging high-quality generated samples, especially for classes with few samples. Still, their OA and Kappa coefficients are lower than those of our SCNet. We conclude that the proposed method can still extract effective features while reducing channel and spatial redundancy when training samples are limited, which not only reduces the number of parameters and the computation of the model but also achieves better classification performance.
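For reference, the three reported metrics can be computed from a confusion matrix as in the following sketch; these are the standard definitions, not code from the paper:

```python
import numpy as np

def oa_aa_kappa(y_true, y_pred, n_classes):
    """OA, AA, and Cohen's kappa from a confusion matrix: OA is the global
    hit rate, AA is the mean of the per-class recalls, and kappa corrects
    OA for chance agreement."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    n = cm.sum()
    oa = np.trace(cm) / n
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

oa, aa, kappa = oa_aa_kappa([0, 0, 1, 1], [0, 0, 1, 0], 2)
```

AA weights every class equally, which is why a single badly classified class (as for S3Net above) can depress AA while OA stays high.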

Fig 11. Classification maps for the UP dataset using 1% training samples.

(a) RGB image. (b) Ground-truth (GT). (c–h) The classification maps with disparate algorithms.

https://doi.org/10.1371/journal.pone.0322345.g011

Table 6. The classification results for the UP dataset with 1% training samples.

https://doi.org/10.1371/journal.pone.0322345.t006

Table 7 records the results of all metrics on the KSC dataset. Our method clearly achieves the best classification performance among all the compared methods, with OA, AA, and Kappa exceeding 98%, 97%, and 98%, respectively. Specifically, owing to the limited number of training samples, the SVM and HybridSN models show large differences in accuracy across categories: in the first and 13th categories their results are all above 91%, but their accuracy is very low in the fourth and fifth categories. Although SSRN, CA-GAN, and DBDA all exhibit excellent OA, they have comparatively low AA, with less than 83% accuracy on multiple categories. S3Net performs well overall, but scores below 90% on more than one class. In contrast, our SCNet demonstrates excellent classification performance, with over 92% accuracy across all 13 categories. Fig 12 illustrates the false-color map of the KSC dataset, the ground-truth map, and the classification maps of the different models. Clearly, the classification maps of the other methods contain many error points, whereas our approach yields highly precise classification results while minimizing the number of misclassified pixels. This also shows that the proposed method obtains better feature representations with limited training samples and thus produces a more accurate classification map.

Fig 12. Classification maps for the KSC dataset using 5% training samples.

(a) False-color image. (b) Ground-truth (GT). (c–h) The classification maps with disparate algorithms.

https://doi.org/10.1371/journal.pone.0322345.g012

Table 7. The classification results for the KSC dataset with 5% training samples.

https://doi.org/10.1371/journal.pone.0322345.t007

The experimental results for the WHLK dataset are shown in Table 8 and Fig 13. Our method still maintains high classification accuracy and outperforms the other methods. SCNet achieves more than 90% accuracy in all nine categories, with well-balanced results; in contrast, every other compared method scores below 90% on more than one category. This is because the proposed method can still extract effective features while reducing channel and spatial redundancy, and can effectively fuse features across layers, so that feature differences within the same category become small and the phenomenon of “different objects with the same spectrum, and the same object with different spectra” is avoided as far as possible. This not only improves classification accuracy but also reduces the accuracy differences between categories. Moreover, comparison with the ground-truth images shows that the classification maps generated by our method are the most accurate and smooth. Precise classification of each ground object is crucial for practical applications, and SCNet has great potential and advantages in achieving this goal.

Fig 13. Classification maps for the WHLK dataset using 0.2% training samples.

(a) False-color image. (b) Ground-truth (GT). (c–h) The classification maps with disparate algorithms.

https://doi.org/10.1371/journal.pone.0322345.g013

Table 8. The classification results for the WHLK dataset with 0.2% training samples.

https://doi.org/10.1371/journal.pone.0322345.t008

Table 9 shows the classification results of the different methods on the IP dataset, and the classification maps of the different methods, together with the ground truth, are shown in Fig 14. Since the IP dataset suffers from inter-class imbalance, the classification performance of most compared methods is limited. SVM considers only spectral information, so its classification performance is poor and its classification map is noisy. Although HybridSN exploits the spatial-spectral information of hyperspectral images, it still makes obvious misclassifications on the 1st, 7th, and 16th categories, which have few training samples. SSRN and the attention-based DBDA perform relatively well but still classify individual categories poorly. CA-GAN, which uses generative adversarial techniques, and S3Net, which uses sample pairs as training input, achieve 98.12% and 98.57% on the 16th category, which shows the superiority of these two techniques for small-sample classification. Our proposed SCNet reaches 100%, 100%, and 98.42% accuracy on the 1st, 7th, and 16th categories, which further demonstrates its effectiveness with limited training samples.

Fig 14. Classification maps for the IP dataset using 5% training samples.

(a) False-color image. (b) Ground-truth (GT). (c–h) The classification maps with disparate algorithms.

https://doi.org/10.1371/journal.pone.0322345.g014

Table 9. The classification results for the IP dataset with 5% training samples.

https://doi.org/10.1371/journal.pone.0322345.t009

4.3.2 Model complexity analysis.

Similarly, Table 10 records the model complexity, training time, and testing time, demonstrating that the proposed SCNet achieves a highly lightweight design in terms of parameter count and FLOPs. On the UP dataset, SCNet requires only 43,425 parameters and 2.697 FLOPs to achieve an overall accuracy (OA) of 98%, whereas the other comparison methods typically require 5–10 times more parameters and 10–70 times more FLOPs to reach a similar accuracy; some of them still exhibit significantly lower classification performance than SCNet. This indicates that our approach effectively reduces memory usage and initialization costs, making it particularly suitable for devices with limited storage resources. However, the actual testing time does not fully align with the theoretical advantage in FLOPs. This discrepancy primarily arises from the model’s use of 1×1 convolutions and lightweight operations (such as GWC) to reduce the parameter count, which alters the computational pattern. Additionally, on GPUs, standard 3×3 convolutions typically benefit more from cuDNN optimizations, whereas lightweight operations like GWC may not fully exploit the parallel computing capabilities of the hardware. As a result, although SCNet demonstrates a clear advantage in theoretical computational efficiency, its actual testing time does not decrease proportionally with FLOPs. Nevertheless, in terms of testing speed, SCNet is not inferior to the other comparison methods. Therefore, in resource-constrained environments, this approach remains highly practical and efficient.
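The parameter savings of group-wise (GWC) and point-wise (1×1) convolutions over a standard 3×3 convolution are easy to verify with a weight-count formula; bias terms are omitted, and the channel counts below are illustrative, not SCNet's actual layer sizes:

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a 2-D convolution (bias omitted): each of the
    `groups` groups maps c_in/groups input channels to c_out/groups
    output channels with a k x k kernel."""
    return (c_in // groups) * (c_out // groups) * k * k * groups

std = conv_params(64, 64, 3)             # standard 3x3 convolution
gwc = conv_params(64, 64, 3, groups=4)   # group-wise convolution, 4 groups
pwc = conv_params(64, 64, 1)             # point-wise 1x1 convolution
```

With `groups` groups, the weight count drops by a factor of `groups`, which is the source of the parameter savings; the trade-off is the less cuDNN-friendly computational pattern discussed above.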

Table 10. Parameters, FLOPs and running time (s) of different methods for the four datasets.

https://doi.org/10.1371/journal.pone.0322345.t010

Since SVM is a pixel-based model, it takes less time than the 3D-cube-based models in most cases. Our model preserves all bands of the network input, so its parameters and FLOPs increase on the WHLK dataset compared to the UP dataset. Nevertheless, SCNet still maintains efficient training and testing speeds over all pixels of this image, demonstrating good hardware adaptability, and its model complexity still outperforms the other algorithms. Despite the input patch sizes of HybridSN, SSRN, CA-GAN, DBDA, and S3Net being the same as or even smaller than ours, SCNet still has the smallest FLOPs. Although the testing times of HybridSN, SSRN, and S3Net on the WHLK dataset are comparable to ours, our method achieves higher accuracy, meaning it better balances accuracy and efficiency.

4.3.3 Analysis of proportion of training samples.

As mentioned before, deep learning is a data-driven approach that relies heavily on a substantial amount of high-quality labeled data. Consequently, this part explores the effect of different proportions of training samples on the classification results for the UP dataset, as shown in Fig 15. As anticipated, accuracy improves as the number of training samples increases. All deep learning-based comparison approaches and our proposed framework exhibit near-perfect performance when provided with sufficient samples, approximately 10% of the entire dataset, and as the number of training samples grows, the performance gap between the models diminishes. However, even when the training samples are insufficient, our method consistently outperforms the other approaches. Given that labeling datasets incurs significant costs in manpower and resources, our proposed method offers potential savings.

Fig 15. Accuracy of different methods with different numbers of training samples on UP.

(a) OA; (b) AA; (c) Kappa.

https://doi.org/10.1371/journal.pone.0322345.g015

4.4 Ablation studies

To verify the effectiveness of the sequential spatial-channel combination (SRU+CRU), we designed the sequential channel-spatial combination (CRU+SRU), the parallel use of the two units, and the separate use of each unit, and compared the performance of these combinations with the sequential spatial-channel combination. As can be seen in Table 11, the sequential spatial-channel combination outperforms the other combinations on the UP dataset in terms of OA, AA, and Kappa. Therefore, the sequential spatial-channel (SRU+CRU) strategy was adopted to compose SCConv and further improve model performance.

Table 11. Ablation experimental results of different sequential spatial channel combinations on the UP dataset.

https://doi.org/10.1371/journal.pone.0322345.t011

To assess the effectiveness of ZCA, we employed PCA to select 20, 40, and 60 spectral bands, respectively, for the ablation experiments. The results are presented in Table 12. Using ZCA to retain all band information significantly enhances classification performance compared with using PCA to select the principal bands, which fully demonstrates its validity and superiority. Although ZCA introduces more parameters and computation, the increase is relatively minor and acceptable.
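A minimal NumPy sketch of ZCA whitening, assuming the HSI pixels are flattened to an (N, B) matrix of N samples by B bands; the `eps` regularizer is an assumed numerical detail, not taken from the paper:

```python
import numpy as np

def zca_whiten(x, eps=1e-5):
    """ZCA whitening of an (N, B) matrix of N pixels with B bands:
    decorrelates the bands while staying as close as possible to the
    input (unlike PCA, no bands are discarded and the result remains
    in the original band coordinates)."""
    xc = x - x.mean(axis=0)
    cov = xc.T @ xc / (x.shape[0] - 1)
    u, s, _ = np.linalg.svd(cov)
    w = u @ np.diag(1.0 / np.sqrt(s + eps)) @ u.T   # symmetric ZCA transform
    return xc @ w

pixels = np.random.default_rng(0).random((500, 30))  # toy pixels x bands
white = zca_whiten(pixels)
```

The symmetric transform `U diag(1/sqrt(s)) U^T` is what distinguishes ZCA from PCA whitening: the whitened data stays aligned with the original bands, which matches the paper's motivation for preserving all band features.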

Table 12. Ablation experimental results of different numbers of channels on the UP dataset.

https://doi.org/10.1371/journal.pone.0322345.t012

Furthermore, to validate the hypothesis that normalization methods other than BN are more stable under smaller batch sizes, and to further explore their impact on the model, we replaced GN in the SRU with BN, LN, IN, and SN and recorded the training accuracy and loss curves on the UP dataset for batch sizes of {2, 4, 8, 16, 32, 64}, as shown in Fig 16. When the batch size is 2, 4, 8, or 16, the curves for GN, LN, IN, and SN converge faster, whereas the BN curves oscillate significantly and lack smoothness. In contrast, when the batch size is 32 or 64, the BN curves become smoother and converge by the 50th epoch, while the GN, LN, IN, and SN curves converge more slowly and oscillate. This behavior can be attributed to the fact that LN, IN, SN, and GN share a similar normalization computation approach, as they are all batch-independent normalization methods. Unlike BN, which relies on cross-sample statistics to compute the mean and variance, these methods normalize based on individual samples or subsets of features, making them more stable in small-batch training or in scenarios with significant distribution variations.
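The batch independence of GN can be seen directly from a per-sample sketch: the statistics are computed within channel groups of a single sample, so the output never depends on which other samples share the batch (the learnable affine parameters are omitted for brevity):

```python
import numpy as np

def group_norm(x, groups, eps=1e-5):
    """Group normalization of one (C, H, W) sample: mean and variance are
    taken over each group of C/groups channels within the sample itself,
    so the output is independent of the batch size (unlike BN)."""
    c, h, w = x.shape
    g = x.reshape(groups, c // groups, h, w)
    mean = g.mean(axis=(1, 2, 3), keepdims=True)
    var = g.var(axis=(1, 2, 3), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(c, h, w)

sample = np.random.default_rng(0).random((8, 4, 4))
out = group_norm(sample, groups=2)
```

LN and IN are the two extremes of this scheme (one group spanning all channels, and one group per channel, respectively), which is why all three behave similarly under small batches.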

Fig 16. The training accuracy and loss curves of SRU using different normalization methods on the UP dataset.

(a,b) BN. (c,d) LN. (e,f) IN. (g,h) SN. (i,j) GN.

https://doi.org/10.1371/journal.pone.0322345.g016

As shown in Table 13, training with smaller batch sizes requires more time. This is because, during each backpropagation step, model weights are updated based on the loss of the current batch. A smaller batch size results in more frequent parameter updates, leading to an increase in overall training time. Although replacing GN with LN, IN, or SN also provides good stability and performance, the training time is slightly longer compared to using GN. This is because LN and IN involve more fine-grained computations when calculating the mean and variance, whereas GN, benefiting from its grouped normalization approach, is often more efficient under optimizations provided by modern deep learning libraries such as cuDNN. Additionally, SN requires weighted computations across different normalization methods, further increasing computational complexity.

Table 13. The training time of SRU using different normalization methods on the UP dataset.

https://doi.org/10.1371/journal.pone.0322345.t013

Therefore, considering both normalization stability and computational efficiency, we selected GN as the normalization method and set the batch size to 16 as the training standard, achieving a balance between shorter training time and better model stability.

5 Conclusion

This paper proposes SCNet, a novel lightweight model for hyperspectral image (HSI) classification. Firstly, unlike most classification methods that employ PCA dimensionality reduction to process the original 3D pixel data, this paper uses a ZCA whitening operation that not only preserves all band features but also keeps the transformed data closer to the original input. Secondly, the introduced Spatial Reconstruction Unit (SRU) and Channel Reconstruction Unit (CRU) reduce spatial and channel redundancy in the convolutional layers while implementing a lightweight strategy; ablation experiments demonstrate that arranging the SRU and CRU sequentially to form SCConv blocks yields the best model performance. Moreover, considering the multi-level disparity of feature maps at different depths, the MLFF unit is designed to aggregate hierarchical information and generate more representative features. Our approach exhibits several advantages over other state-of-the-art deep learning algorithms and lightweight networks, including enhanced classification performance and improved model efficiency.

Furthermore, the comparative experiments demonstrate the efficacy and superiority of the proposed SCNet. However, certain areas for improvement have also been identified. For instance, our testing time does not show an advantage comparable to our FLOPs advantage. This discrepancy may be attributed to frequent data accesses and memory transfers that prevent the computational pipeline from achieving its full performance as the sample size increases. Consequently, future work will focus on integrating hardware-architecture considerations to further optimize our approach and minimize model testing time, while identifying a network better suited for hardware.

References

  1. Kang X, Wang Z, Duan P, Wei X. The potential of hyperspectral image classification for oil spill mapping. IEEE Trans Geosci Remote Sens. 2022;60:1–15.
  2. Nisha A, Anitha A. Current advances in hyperspectral remote sensing in urban planning. In: 2022 Third international conference on intelligent computing instrumentation and control technologies (ICICICT). IEEE. 2022. p. 94–98.
  3. Barbedo JGA. A review on the combination of deep learning techniques with proximal hyperspectral images in agriculture. Comput Electron Agric. 2023;210:107920.
  4. Shang W, Wu Z, Xu Y, Zhang Y, Wei Z. Hyperspectral supervised classification using mean filtering based kernel extreme learning machine. In: 2018 Fifth international workshop on earth observation and remote sensing applications (EORSA). IEEE. 2018. p. 1–4.
  5. Deborah H, Richard N, Hardeberg JY. Spectral ordering assessment using spectral median filters. In: Mathematical morphology and its applications to signal and image processing: 12th International symposium, ISMM 2015. Berlin: Springer; 2015. p. 387–397.
  6. Dong W, Xiao S, Li Y. Hyperspectral pansharpening based on guided filter and Gaussian filter. J Vis Commun Image Represent. 2018;53:171–9.
  7. Licciardi G, Marpu PR, Chanussot J, Benediktsson JA. Linear versus nonlinear PCA for the classification of hyperspectral data based on the extended morphological profiles. IEEE Geosci Remote Sens Lett. 2011;9(3):447–51.
  8. Ye Q, Yang J, Liu F, Zhao C, Ye N, Yin T. L1-norm distance linear discriminant analysis based on an effective iterative algorithm. IEEE Trans Circuits Syst Video Technol. 2016;28(1):114–29.
  9. Falco N, Benediktsson JA, Bruzzone L. Spectral and spatial classification of hyperspectral images based on ICA and reduced morphological attribute profiles. IEEE Trans Geosci Remote Sens. 2015;53(11):6223–40.
  10. Cai Y, Zhang Z, Cai Z, Liu X, Jiang X. Hypergraph-structured autoencoder for unsupervised and semisupervised classification of hyperspectral image. IEEE Geosci Remote Sens Lett. 2021;19:1–5.
  11. Kim DH, Finkel LH. Hyperspectral image processing using locally linear embedding. In: First International IEEE EMBS conference on neural engineering, 2003 conference proceedings. IEEE. 2003. p. 316–319.
  12. Melit Devassy B, George S. Dimensionality reduction and visualisation of hyperspectral ink data using t-SNE. Forensic Sci Int. 2020;311:110194. pmid:32251968
  13. Kessy A, Lewin A, Strimmer K. Optimal whitening and decorrelation. Am Stat. 2018;72(4):309–14.
  14. Hu W, Huang Y, Wei L, Zhang F, Li H. Deep convolutional neural networks for hyperspectral image classification. J Sens. 2015;2015:1–12.
  15. Makantasis K, Karantzalos K, Doulamis A, Doulamis N. Deep supervised learning for hyperspectral data classification through convolutional neural networks. In: 2015 IEEE international geoscience and remote sensing symposium (IGARSS). IEEE. 2015. p. 4959–4962.
  16. Hamida AB, Benoit A, Lambert P, Amar CB. 3-D deep learning approach for remote sensing image classification. IEEE Trans Geosci Remote Sens. 2018;56(8):4420–34.
  17. Roy S, Krishna G, Dubey SR, Chaudhuri BB. HybridSN: Exploring 3-D–2-D CNN feature hierarchy for hyperspectral image classification. IEEE Geosci Remote Sens Lett. 2019;17(2):277–81.
  18. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. p. 770–778.
  19. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2017. p. 4700–4708.
  20. Zhong Z, Li J, Luo Z, Chapman M. Spectral–spatial residual network for hyperspectral image classification: A 3-D deep learning framework. IEEE Trans Geosci Remote Sens. 2017;56(2):847–58.
  21. Li R, Zheng S, Duan C, Yang Y, Wang X. Classification of hyperspectral image based on double-branch dual-attention mechanism network. Remote Sens. 2020;12(3):582.
  22. Xue Z, Zhou Y, Du P. S3Net: Spectral–spatial Siamese network for few-shot hyperspectral image classification. IEEE Trans Geosci Remote Sens. 2022;60:1–19.
  23. Cui B, Wen J, Song X, He J. MADANet: A lightweight hyperspectral image classification network with multiscale feature aggregation and a dual attention mechanism. Remote Sens. 2023;15(21):5222.
  24. Paoletti ME, Haut JM, Pereira NS, Plaza J, Plaza A. Ghostnet for hyperspectral image classification. IEEE Trans Geosci Remote Sens. 2021;59(12):10378–93.
  25. Chen Y. Drop an octave: Reducing spatial redundancy in convolutional neural networks with octave convolution. In: Proceedings of the IEEE/CVF international conference on computer vision. 2019. p. 3435–3444.
  26. Li J, Wen Y, He L. SCConv: Spatial and channel reconstruction convolution for feature redundancy. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023. p. 6153–6162.
  27. Wu Y, He K. Group normalization. In: Proceedings of the European conference on computer vision (ECCV). 2018. p. 3–19.
  28. Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning. 2015. p. 448–456.
  29. Yuan Q, Zhang Q, Li J, Shen H, Zhang L. Hyperspectral image denoising employing a spatial–spectral deep residual convolutional neural network. IEEE Trans Geosci Remote Sens. 2018;57(2):1205–18.
  30. Zou C, Zhang C, Wei M, Zou C. Enhanced channel attention network with cross-layer feature fusion for spectral reconstruction in the presence of Gaussian noise. IEEE J Sel Top Appl Earth Obs Remote Sens. 2022;15:9497–508.
  31. Zhong Y, Hu X, Luo C, Wang X, Zhao J, Zhang L. WHU-Hi: UAV-borne hyperspectral with high spatial resolution (H2) benchmark datasets and classifier for precise crop identification based on deep convolutional neural network with CRF. Remote Sens Environ. 2020;250:112012.
  32. Zhong Y. Mini-UAV-borne hyperspectral remote sensing: From observation and processing to applications. IEEE Geosci Remote Sens Mag. 2018;6(4):46–62.
  33. Feng J, Feng X, Chen J. Generative adversarial networks based on collaborative learning and attention mechanism for hyperspectral image classification. Remote Sens. 2020;12(7):1149.
  34. Melgani F, Bruzzone L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans Geosci Remote Sens. 2004;42(8):1778–90.