Hyperspectral image spectral-spatial classification via weighted Laplacian smoothing constraint-based sparse representation

As a powerful tool in hyperspectral image (HSI) classification, sparse representation has gained much attention in recent years owing to its detailed representation of features. In particular, the joint use of spatial and spectral information has been widely applied to HSI classification. However, dealing with the spatial relationship between pixels is a nontrivial task. This paper proposes a new spatial-spectral combined classification method that considers the boundaries of adjacent features in the HSI. In the proposed method, a smoothing-constraint Laplacian vector is constructed, which consists of the pixel of interest and its four nearest neighbors combined through their weighting factors. Then, a novel large block sparse dictionary is developed for simultaneous orthogonal matching pursuit. On three real HSI datasets, the proposed method obtains better classification accuracy than existing spectral-spatial HSI classifiers. Finally, the experimental results are presented to verify the effectiveness and superiority of the proposed method.


Introduction
Remote sensing is of paramount importance for several application fields, including environmental monitoring, urban planning, ecosystem-oriented natural resource management, urban change detection, and agricultural region monitoring [1,2]. Hyperspectral images (HSIs), whose structure consists of two spatial dimensions and one spectral dimension [3,4], are generally characterized by hundreds or thousands of contiguous observation bands throughout the electromagnetic spectrum with high spectral resolution. The abundance of spectral information in an HSI provides an opportunity for the precise classification of ground objects [5,6]. HSI classification, as one of the main challenges in remote sensing technology, has opened new avenues in remote sensing [7][8][9][10]. As powerful image-processing tools, the support vector machine (SVM) [11][12][13][14] and the sparse representation (SR) model [15,16] and its derivatives have attracted much attention for HSI classification [17][18][19][20]. However, the noise and mixed spectral information in HSIs cause several theoretical and practical challenges for pixel-wise classification [21][22][23]. A large number of spatial-spectral combined HSI classifiers have therefore been developed in recent decades to incorporate spatial information into the classification. Reference [24] proposed an image patch distance (IPD) that uses the observed pixels and their spatial neighbors to measure pixel patch-wise similarity. Reference [17] presented a joint sparse representation (JSR) model, which first defines a local region of fixed size for each test pixel. Reference [25] reported that a multiscale adaptive sparse representation (MASR) model, which considers regions of different scales for classification, can further improve classification performance.
Reference [26] presented a class-dependent SR classifier for HSI classification, which effectively combines the SR and k-NN classifier models in a class-wise manner to exploit both the correlation and the Euclidean distance between training and test data. The traditional joint k-NN algorithm assumes that the weight of each test sample in a local region is identical, which is not reasonable because each test sample may have a different importance and distribution. To solve this problem, Reference [27] recommended a weighted joint nearest neighbor and sparse representation method, named WJNN-JSR, which achieves better performance than several traditional joint k-NN methods. More recent HSI classification techniques can be found in references [28][29][30][31][32][33][34][35]. From the above descriptions, it can be concluded that all these methods ignore the boundary information between different features in the HSI. This common shortcoming hinders the achievement of more satisfactory classification accuracy.
Motivated by the above discussions, we propose a new classification algorithm for HSI that combines spectral and spatial information, termed the weighted Laplacian smoothing constraint-based sparse representation (WLSC-SR) classifier. The primary contributions of this study are as follows.
(1) Inspired by the existing ordinary vector Laplacian, a smoothing-constraint weighted Laplacian vector is constructed, which consists of the interest pixel and its four nearest neighbors through their weighting factors [36][37][38].
(2) By forcing the weighted Laplacian vector of the pixel of interest to be zero, a new large block sparse dictionary for sparse representation is developed.
(3) In contrast to several earlier studies, the boundary characteristics of the HSI are fully exploited.
Experiments on three real HSI datasets were conducted, and the proposed WLSC-SR method was compared with several state-of-the-art spectral-spatial HSI classifiers to evaluate its performance. The results show that WLSC-SR can substantially improve the accuracy of HSI classification.

Related works
In this section, we outline the basic theory for WLSC-SR.

Sparse representation
The mathematical essence of sparse representation (SR) is signal decomposition under sparse normalization constraints [39][40][41]. Using a super-complete dictionary of redundant functions as the basis, a few atoms whose linear combination best represents a signal are found in the dictionary. In this way, an unknown test pixel in an HSI can be represented as a linear combination of training pixels from all classes.
Let $y \in R^h$ be a test sample in the HSI, and let the label set of the whole training set be $T = \{1, 2, 3, \cdots, s\}$, where $h$ denotes the spectral dimension of the HSI and $s$ is the number of training samples. A dictionary $D \in R^{h \times s}$ can therefore be constructed from the spectra of the training set. Each column of the redundant dictionary $D$ is called an atom, and $y$ can be represented by a linear combination of atoms of $D$. However, such a linear combination is unlikely to be unique; the sparsest coefficient vector helps us find the best one. Assuming that there is no noise in the HSI, the SR model of the clean sample $y$ is defined as:

$\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad y = D\alpha, \qquad (1)$

However, noise is inevitably present in the HSI, and the SR model for noisy data can be defined as:

$\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad \|y - D\alpha\|_2 \le \sigma, \qquad (2)$

Using the Lagrangian multiplier method, the SR model can be regularized as:

$\min_{\alpha} \|y - D\alpha\|_2^2 + \gamma \|\alpha\|_0, \qquad (3)$

where $\sigma$ is the error tolerance and $\gamma$ denotes the regularization parameter. Generally, the orthogonal matching pursuit (OMP) or simultaneous orthogonal matching pursuit (SOMP) algorithm is used to solve formula (3). When the original signal consists of a single pixel, SOMP reduces to OMP; SOMP is selected here according to the characteristics of the WLSC-SR algorithm.
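As a sketch of how the greedy pursuit behind formula (3) works, the following is a minimal OMP implementation in NumPy. The function name and the simple fixed-sparsity stopping rule are illustrative, not the exact routine used in the paper:

```python
import numpy as np

def omp(D, y, sparsity):
    """Greedy orthogonal matching pursuit: select at most `sparsity` atoms
    of the (column-normalized) dictionary D to approximate y."""
    residual = y.copy()
    support = []
    for _ in range(sparsity):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit y on all selected atoms by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha
```

SOMP generalizes this loop by selecting, at each step, the atom that best correlates with the residuals of all pixels in a window simultaneously, so the pixels share a common support.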

Image patch distance
In hyperspectral imagery, the pixels within a small neighborhood usually consist of similar materials whose spectral characteristics are highly correlated. Based on this fact, the image patch distance (IPD) exploits the observed pixels and their corresponding spatial neighbors to measure pixel patch-wise similarity.
For the observed pixel $x_{ij}$, its $\omega^2$ neighbors in the $\omega \times \omega$ spatial neighborhood can be defined as:

$O(x_{ij}) = \{ x_{pq} : |p - i| \le r, \ |q - j| \le r \}, \qquad (4)$

in which $r = (\omega - 1)/2$.
Let $a_l$ and $b_l$ be the $l$th elements of the pixel sets $O(x_{ij})$ and $O(x_{pq})$, respectively. The distance between $a_l$ and the spatial neighborhood $O(x_{pq})$ is defined as:

$d(a_l, O(x_{pq})) = \min_{b \in O(x_{pq})} d(a_l, b), \qquad (5)$

and the undirected distance between the two pixels $a_l$ and $b_l$ can be defined as follows:

$\bar{d}(a_l, b_l) = \frac{1}{2} \left[ d(a_l, O(x_{pq})) + d(b_l, O(x_{ij})) \right], \qquad (6)$

where $d(\cdot)$ is a spectral similarity function, such as the Minkowski distance (MD) or the spectral cosine distance (SCD).
Then, the similarity between the observed pixels $x_{ij}$ and $x_{pq}$ can be defined as follows:

$\mathrm{IPD}(x_{ij}, x_{pq}) = \frac{1}{\omega^2} \sum_{l=1}^{\omega^2} \bar{d}(a_l, b_l). \qquad (7)$

This spatial-spectral similarity measure combines the spatial and spectral features into a single distance, which improves the classification accuracy.
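To make the patch-wise measure concrete, here is a small sketch of the IPD computation, assuming $d(\cdot)$ is the Euclidean distance and that each patch has been flattened to a $(\omega^2, h)$ array; the helper name `ipd` is illustrative:

```python
import numpy as np

def ipd(patch_a, patch_b):
    """Image patch distance between two pixel sets of shape (w*w, h):
    for each pixel of one patch take the distance to the closest pixel
    of the other patch, symmetrize, and average over the patch."""
    # Pairwise Euclidean distances between all pixels of the two patches.
    diff = patch_a[:, None, :] - patch_b[None, :, :]
    pair = np.linalg.norm(diff, axis=2)      # shape (w*w, w*w)
    d_ab = pair.min(axis=1)                  # a_l -> nearest pixel of O(x_pq)
    d_ba = pair.min(axis=0)                  # b_l -> nearest pixel of O(x_ij)
    return 0.5 * (d_ab.mean() + d_ba.mean())
```

By construction the measure is symmetric, and two identical patches have distance zero.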

Proposed classifier
The pixels in an HSI dataset are high-dimensional vectors that reflect the spectra of the ground objects. Spectrum vectors with the same class label are more likely to be similar to each other than to those with different class labels. Based on this assumption, the WLSC-SR method exploits the spatial neighborhood to extract spatial-spectral information.

Procedure of WLSC-SR
The WLSC-SR method attempts to construct a smoothing-constraint Laplacian vector, which is forced to be zero. The vector consists of the sparse vector of the pixel of interest and those of its four nearest neighbors combined through their weighting factors, by which the aggregation of homogeneous data and the separability of heterogeneous data in the HSI can be effectively enhanced. Based on the smoothing-constraint Laplacian vector, a new large block sparse dictionary for SOMP is constructed, with six times as many rows and five times as many columns as the original dictionary. Furthermore, the WLSC-SR classifier distinguishes the boundaries between adjacent types of ground objects in the HSI, which is beneficial for HSI classification. The procedure of the proposed method is shown in Fig 1.

Weighted Laplacian smoothing constraint
We assume that the size of the HSI is $k \times l \times h$. The spectrum vector $x_{ij} \in R^h$ represents the pixel in row $i$ and column $j$ of the HSI. First, we choose a spatial window parameter $\omega$, an odd positive integer, and construct an $\omega \times \omega$ spatial window centered at pixel $x_{ij}$. At the same time, the HSI boundaries are extended by $(\omega - 1)/2$ pixels in a mirror manner, which is convenient for processing pixels at the edge or corner of the image.
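The mirror extension and window extraction can be sketched with `np.pad` in `symmetric` mode; the helper names `mirror_pad` and `spatial_window` are illustrative:

```python
import numpy as np

def mirror_pad(hsi, window):
    """Extend the two spatial dimensions of a (k, l, h) HSI cube by
    (window - 1) // 2 pixels in a mirror (symmetric) manner."""
    r = (window - 1) // 2
    return np.pad(hsi, ((r, r), (r, r), (0, 0)), mode="symmetric")

def spatial_window(padded, i, j, window):
    """The w x w neighborhood of original pixel (i, j), flattened to (w*w, h)."""
    patch = padded[i:i + window, j:j + window, :]  # offsets absorb the padding
    return patch.reshape(window * window, -1)
```

With this padding, the window of a corner pixel such as (0, 0) is well defined, and its central element is the corner pixel itself.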
We can then obtain the IPD between the pixel sets $O(x_{ij})$ and $O(x_{pq})$. However, the IPD method generally requires a large spatial window to exploit the spatial information in the HSI, and the time-consuming steps in its iteration process limit its practical application.
To simplify the IPD calculation, the IPD between the pixel sets $O(x_{ij})$ and $O(x_{pq})$ is replaced by the distance between the central pixels of the sets. The resulting image patch distance-based center (IPDC) calculation method is defined as follows:

$\mathrm{IPDC}(x_{ij}, x_{pq}) = d(\bar{x}_{ij}, \bar{x}_{pq}), \qquad (8)$

Thus, the weighting factor can be defined as:

$w_{pq} = \exp\left( -\mathrm{IPDC}(x_{ij}, x_{pq})^2 / t \right), \qquad (9)$

where the trade-off parameter $t > 0$ controls the proportion of spatial information, and $x_{ij}$ and $x_{pq}$ are normalized to $\bar{x}_{ij}$ and $\bar{x}_{pq}$, respectively. These factors determine the value of the reconstruction weight $W$.
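A minimal sketch of the weighting factor follows, assuming a Gaussian kernel over the Euclidean distance of L2-normalized spectra (the exact kernel and normalization in the original are an assumption here):

```python
import numpy as np

def ipdc_weight(x_ij, x_pq, t=1.0):
    """Gaussian weighting factor from the center-pixel (IPDC) distance.
    Spectra are L2-normalized first; t > 0 is the trade-off parameter."""
    a = x_ij / np.linalg.norm(x_ij)
    b = x_pq / np.linalg.norm(x_pq)
    return float(np.exp(-np.sum((a - b) ** 2) / t))
```

Identical spectra give a weight of 1, and the weight decays toward 0 as the spectra diverge; larger `t` makes the decay slower.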
Using the obtained weights, we construct the weighted Laplacian vector. Let $x_{st}$ denote the four nearest neighbors of $x_{ij}$, where $s = i - 1, i + 1$ and $t = j - 1, j + 1$, as shown in Fig 2, and denote their sparse vectors by $\alpha_2, \ldots, \alpha_5$. Let $\alpha_{ij}$ be the sparse vector associated with $x_{ij}$ (i.e., $D\alpha_{ij} = x_{ij}$). Then, we construct the weighted Laplacian vector at the pixel $x_{ij}$ as:

$\mathcal{L}(x_{ij}) = w_1 \alpha_{ij} - \sum_{i=2}^{5} w_i \alpha_i, \qquad (10)$

where $w_1 = \sum_{i=2}^{5} w_i$. To incorporate the smoothness across the neighboring spectral pixels, $\mathcal{L}(x_{ij})$ is set to zero, based on which a new large block sparse dictionary for SOMP is constructed, with six times as many rows and five times as many columns as the original dictionary. Taking the Indian Pines dataset with 10% training samples as an example, the dimension of the original dictionary is 200 × 1027, and the dimension of the block sparse matrix we built is 1200 × 5135. Then, the optimization problem in (1) can be redefined as a new sparse recovery problem with the Laplacian smoothing constraint, formulated as:

$\min_{\tilde{\alpha}} \|\tilde{\alpha}\|_0 \quad \text{s.t.} \quad D_{\mathrm{LargeBlockSparse}} \, \tilde{\alpha} = \tilde{y}, \qquad (11)$

where $\tilde{y} = [x_{ij}; x_2; x_3; x_4; x_5; 0]$ stacks the pixel of interest, its four neighbors, and the zero Laplacian constraint, $\tilde{\alpha} = [\alpha_{ij}; \alpha_2; \alpha_3; \alpha_4; \alpha_5]$, and

$D_{\mathrm{LargeBlockSparse}} = \begin{pmatrix} D & 0 & 0 & 0 & 0 \\ 0 & D & 0 & 0 & 0 \\ 0 & 0 & D & 0 & 0 \\ 0 & 0 & 0 & D & 0 \\ 0 & 0 & 0 & 0 & D \\ w_1 D & -w_2 D & -w_3 D & -w_4 D & -w_5 D \end{pmatrix}. \qquad (13)$

Normally, we can assign $w_1 = 1$, and $w_i$ $(i = 2, \ldots, 5)$ are normalized as:

$\bar{w}_i = w_i \Big/ \sum_{j=2}^{5} w_j, \qquad (14)$

For the pixels at the center of the image, all weights are present. However, for the pixels on an edge or corner, some weights are absent, which causes an imbalance. To avoid this imbalance, we assign $w_i = 0.25$ $(i = 2, \ldots, 5)$ in those cases.
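The 6h × 5s block dictionary described above (six row blocks, five column blocks) can be assembled as follows. The layout, with five identity-coupled diagonal blocks for the center pixel and its neighbors plus one weighted Laplacian row block, is a reconstruction consistent with the stated 1200 × 5135 dimension for a 200 × 1027 dictionary:

```python
import numpy as np

def large_block_dictionary(D, w):
    """Stack the dictionary D (h x s) into the 6h x 5s block matrix that
    couples the center pixel, its four neighbors, and the zero-Laplacian row.
    w = [w1, w2, w3, w4, w5] with w1 the center weight."""
    blocks = np.kron(np.eye(5), D)  # block-diagonal part, shape (5h, 5s)
    # Weighted Laplacian row block: [w1*D, -w2*D, -w3*D, -w4*D, -w5*D].
    lap = np.hstack([w[0] * D] + [-w[i] * D for i in range(1, 5)])
    return np.vstack([blocks, lap])  # shape (6h, 5s)
```

For an h × s dictionary this yields a 6h × 5s matrix, so a 200 × 1027 dictionary gives exactly the 1200 × 5135 block matrix quoted in the text.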
It is well known that minimizing the L0 norm is a non-deterministic polynomial-time hard (NP-hard) problem, while the L1 norm is the tightest convex approximation of the L0 norm and is easier to solve. Additionally, the equality constraint in (11) cannot be satisfied completely, so an approximation error is allowed, and the problem can be written as:

$\min_{\tilde{\alpha}} \|\tilde{y} - D_{\mathrm{LargeBlockSparse}} \, \tilde{\alpha}\|_2^2 + \gamma \|\tilde{\alpha}\|_1, \qquad (15)$

where $\gamma$ denotes the regularization parameter. The problem in (15) is a standard sparse recovery problem, and SOMP can be implemented to solve it. Once the problem in (15) is solved, the total class-dependent reconstruction residuals between the original test samples and the approximations obtained from each of the $K$ class sub-dictionaries can be calculated as:

$r_k(x_{ij}) = \|\tilde{y} - D_{\mathrm{LargeBlockSparse}} \, \delta_k(\tilde{\alpha})\|_2, \quad k = 1, \ldots, K, \qquad (16)$

where $\delta_k(\tilde{\alpha})$ retains only the coefficients associated with the atoms of class $k$.

PLOS ONE
The test sample $x_{ij}$ is then assigned to the class that minimizes the residual:

$\mathrm{class}(x_{ij}) = \arg\min_{k = 1, \ldots, K} r_k(x_{ij}). \qquad (17)$
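The minimal-residual decision rule can be sketched as follows, where `class_labels` marks the class of each dictionary atom (the function and argument names are illustrative):

```python
import numpy as np

def classify_by_residual(D, class_labels, y, alpha):
    """Assign y to the class whose sub-dictionary best reconstructs it:
    the class-k residual keeps only the coefficients of class-k atoms."""
    classes = np.unique(class_labels)
    residuals = []
    for k in classes:
        mask = (class_labels == k)
        residuals.append(np.linalg.norm(y - D[:, mask] @ alpha[mask]))
    return classes[int(np.argmin(residuals))]
```

In WLSC-SR the same rule would be applied with the stacked signal and the large block sparse dictionary in place of `y` and `D`.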

Datasets
Three well-known publicly available HSI datasets, namely Indian Pines, Pavia University, and Salinas, were used to evaluate the performance of WLSC-SR in this study. The numbers of samples in the Indian Pines, Pavia University, and Salinas scenes are shown in Table 1, in which background colors are used to distinguish the different classes.

Quantitative metrics
Normally, the overall accuracy (OA), average accuracy (AA), class accuracy (CA), and Kappa coefficient are adopted to evaluate the quality of HSI classification results. OA refers to the ratio between the number of correctly classified pixels and the total number of pixels, AA represents the mean of the percentages of correctly classified pixels for each class, CA measures the separate classification accuracy of the various ground objects in the dataset, and the Kappa coefficient estimates the proportion of correctly classified pixels corrected for the agreement that would be expected purely by chance. The classification performance of a classifier is considered good when the Kappa coefficient is greater than 0.75; when the Kappa coefficient is less than 0.40, the performance is poor [10,42].
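These metrics can all be computed from a confusion matrix; a minimal sketch with integer class labels:

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """OA, AA, and Kappa from a confusion matrix of integer labels."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    n = cm.sum()
    oa = np.trace(cm) / n                           # overall accuracy
    ca = np.diag(cm) / cm.sum(axis=1)               # per-class accuracy (CA)
    aa = ca.mean()                                  # average accuracy
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n**2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

A perfect prediction yields OA = AA = Kappa = 1, while Kappa drops toward 0 as agreement approaches what chance alone would produce.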

Parameter analysis
In the proposed classification method, there are two primary parameters: the sparsity level Sl and the test region scale Sc of the WLSC-SR model, which affect the classification performance from different aspects. Experiments on Indian Pines, Pavia University, and Salinas measured the OAs for different values of Sl and Sc, from which the optimal parameters were determined. Fig 3A-3C show the effects of Sl and Sc on the three datasets, respectively, with the optimal classification result marked in each graph. As shown in Fig 3, when the value of the test region scale Sc is fixed, the OA for Indian Pines, Pavia University, and Salinas consistently achieves the best performance when the sparsity level Sl is 1 or 2. As Sl increases, the solution of (15) converges to the pseudo-inverse solution, which is no longer sparse and deteriorates the classification performance. Additionally, when the sparsity level Sl is small and Sc is too large, the neighboring pixels cannot be faithfully approximated by a few training samples; in other words, the OA is reduced. Moreover, the optimal value of Sc is relatively large for a large dataset, and vice versa. For example, when Sl for Indian Pines, Pavia University, and Salinas is 1, 2, and 3, respectively, the OA reaches its best value when Sc is 5, 7, and 5, respectively. In contrast, the experimental results show that when Sc is equal to 40, the classification performance is far worse than the best value.
Additionally, since the Pavia University image is larger than the Indian Pines and Salinas Scene image, OA obtains the best value when Sc = 7 in Pavia University image classification. For Indian Pines and Salinas Scene images, the corresponding Sc is equal to 5.

Comparison of different classifiers
In this section, the proposed method is compared with the SVM method [5], the JSR classification method [17], the SR classification method [16], and the sparse representation nearest neighbor (NN-SR) classification method [18]. Additionally, the multiscale adaptive sparse representation (MASR) method [19] and the joint nearest neighbor-joint sparse representation (JNN-JSR) method [27] are compared, as is the weighted joint nearest neighbor-joint sparse representation (WJNN-JSR) method [27]. All of these classification methods were implemented using their optimal parameters.
Three different experiments were conducted on three different datasets: Indian Pines, Pavia University, and Salinas. For each class of every dataset, 30% of the labeled pixels were randomly sampled for training, while the remaining 70% were used to test the classifiers. Figs 4-6 illustrate different classification maps obtained by different methods on different datasets.
The first experiment was performed on the Indian Pines dataset. Table 2 shows the classification performance with the corresponding OA, AA, and Kappa values; the bold values indicate the best classification accuracy. As can be observed, the classification maps of the SVM and SR methods have a very noisy appearance. By considering the spatial context, the JSR, MASR, NN-SR, and WJNN-JSR algorithms deliver comparatively smooth results but fail to detect meaningful regions. Although the JNN-JSR algorithm shows improvements in detecting details, some noisy behavior still exists in the classification maps obtained by these approaches. In contrast, although the proposed WLSC-SR algorithm offers only a limited improvement in average classification accuracy, noise and misclassification at the edges of ground objects and across the overall scene are significantly reduced. Therefore, according to the classification results, the proposed method retains advantages in terms of OA and Kappa values. For example, compared with the other methods, the WLSC-SR algorithm achieves the highest classification accuracy in classes 2, 4, 8, 9, 11, and 12, and its OA and Kappa reach the highest values.

The second and third experiments were conducted on the Pavia University and Salinas datasets, respectively. The training sample selection was the same as in the first experiment. Table 3 presents the classification performance with the corresponding OA, AA, and Kappa values for Pavia University. As shown in the table, the proposed WLSC-SR algorithm obtains higher accuracy than the other compared methods in terms of OA, AA, and Kappa. The spectral-spatial joint algorithms, such as JSR, MASR, NN-SR, JNN-JSR, WJNN-JSR, and WLSC-SR, perform better than SVM and SR, which use only spectral information. For example, the OA of the SR algorithm is only 72.01%, and compared with the SR algorithm, the OAs of the WJNN-JSR and WLSC-SR algorithms are improved by 25.41% and 27.17%, respectively.
Table 4 presents the classification performance with the corresponding OA, AA, and Kappa values for the Salinas image. It can be seen that, because WJNN-JSR effectively utilizes multiscale spatial information through an adaptive sparse strategy, its AA is significantly improved, but some noise still exists around the boundaries of different ground objects.

Computational complexity
Experiments were performed using MATLAB 2018b on a computer with an Intel 2.60 GHz CPU, 16 GB of memory, and a 64-bit Windows 7 system. On the three real HSI datasets, a complete execution of our algorithm may take from several minutes to several hours, which is longer than the other methods compared in this study. Specifically, the main computational cost of the method lies in computing the weighting parameters and in running SOMP with the large block sparse dictionary. With the development of computing hardware and cloud computing technology, we believe that this time consumption will be significantly reduced. Additionally, the optimal parameters or hyperparameters used by the various algorithms to obtain the results are listed in Table 5.

Conclusions
In this paper, we proposed a new classification method for HSI. The proposed WLSC-SR strengthens the spatial relationship between the center pixel and its four nearest neighbor pixels by constructing a smoothing-constraint Laplacian vector, which enables the classifier to handle the boundaries between adjacent ground objects in the HSI. Experiments on three real HSI datasets revealed that the proposed WLSC-SR method outperforms several other well-known classifiers in terms of OA, AA, Kappa, and the visual quality of the classification maps, verifying the effectiveness and superiority of WLSC-SR. In future work, we will explore discriminative learning algorithms and optimized dictionary structures to further improve the classification accuracy, and we will investigate more efficient solutions to reduce the computational cost of the method.