A landslide susceptibility map based on spatial scale segmentation: A case study at Zigui-Badong in the Three Gorges Reservoir Area, China

Xianyu Yu; Huachen Gao

doi:10.1371/journal.pone.0229818

Abstract

China experiences frequent landslides, and therefore there is a need for landslide susceptibility maps (LSMs) to effectively analyze and predict regional landslides. However, the traditional methods of producing an LSM are unable to account for different spatial scales, resulting in spatial imbalances. In this study, Zigui-Badong in the Three Gorges Reservoir Area was used as a case study, and data were obtained from remote sensing images, a digital elevation model, geological and topographic maps, and landslide surveys. A geographically weighted regression (GWR) was applied to segment the study area into different spatial scales, with three basic principles followed when the GWR model was applied for this purpose. As a result, 58 environmental factors were extracted, and 18 factors were selected as LSM factors. Three of the most important factors (channel network basic level, elevation, and distance to river) were used as segmentation factors to segment the study area into 18 prediction regions. The particle swarm optimization (PSO) algorithm was used to optimize the parameters of a support vector machine (SVM) model for each prediction region. All of the prediction regions were merged to construct a GWR-PSO-SVM coupled model and finally, an LSM of the study area was produced. To verify the effectiveness of the proposed method, the outcomes of the GWR-PSO-SVM coupled model and the PSO-SVM coupled model were compared using three evaluation methods: specific category accuracy analysis, overall prediction accuracy analysis, and area under the curve analysis. The results for the GWR-PSO-SVM coupled model for these three evaluation methods were 85.75%, 87.86%, and 0.965, respectively, while the results for the traditional PSO-SVM coupled model were 68.35%, 84.44%, and 0.944, respectively. The method proposed in this study based on spatial scale segmentation therefore achieved good results.

Citation: Yu X, Gao H (2020) A landslide susceptibility map based on spatial scale segmentation: A case study at Zigui-Badong in the Three Gorges Reservoir Area, China. PLoS ONE 15(3): e0229818. https://doi.org/10.1371/journal.pone.0229818

Editor: Claudionor Ribeiro da Silva, Universidade Federal de Uberlandia, BRAZIL

Received: September 30, 2019; Accepted: February 16, 2020; Published: March 11, 2020

Copyright: © 2020 Yu, Gao. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The original map data used in this paper, including the 1:10,000-scale landslide hazard map, the 1:50,000-scale topographic map, and the 1:50,000-scale geological map, are confidential in China and cannot be made publicly available online. However, after declassification processing, the map data for this study area can be obtained by contacting the corresponding author and Youjian Hu, professor and doctoral tutor at the School of Geography and Information Engineering, China University of Geosciences (Wuhan) (imxg@cug.edu.cn). All other data (Landsat-8 data, ASTER GDEM data, seismic activity data and atmospheric rainfall data) can be downloaded through the links given in the paper.

Funding: Funded by X.Y. 41807297; Q20171410. National Natural Science Foundation of China; Hubei Provincial Department of Education. http://www.nsfc.gov.cn/; http://jyt.hubei.gov.cn/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Located on the eastern edge of the Asian continent, China, with active geological tectonic movements and a complex geological environment, is a country that experiences frequent geological disasters such as landslides. During the 15 years from 2002 to 2016, 275,787 landslides occurred in China, accounting for 73.0% of all geological disasters in the country. In 2018 alone, there were 2,966 geological disasters around the country, including 1,631 landslides, i.e., 55.0% of all geological disasters that year.

A landslide susceptibility map (LSM), as a non-deterministic method of prediction, is currently the main method used for the prediction of regional landslides. Using the engineering geological analogy method, an LSM can be obtained through the use of a mathematical model to determine and assign the degree of importance of the LSM factors that cause landslides. Mondal and Mandal used a logistic regression (LR) model to evaluate landslide susceptibility in the Balason River Basin in the Indian Darjeeling region of the Himalayas. The result showed that the LR model can be used for landslide hazard research and decision making [1]. Wang et al. compared several methods for constructing an LSM, such as the frequency ratio (FR), LR, decision tree (DT), weights of evidence (WE), and artificial neural network (ANN) in a study in Mizunami, Japan, and found that LR had the best area under the curve (AUC) value [2]. Aditian, Kubota and Shinohara used three methods, FR, LR, and ANN, in a study of landslides triggered by heavy rains in the Ambon region of Indonesia: the study showed that the ANN had the best results among these three methods, and was the best method for interpreting the relationship between landslide and LSM factors [3]. Saro et al. used two methods, LR and ANN, for the construction of an LSM in Inje City, South Korea, with the results indicating that the accuracy of the ANN was higher than that of the LR model [4]. Hong et al. compared the effects of four support vector machine (SVM) models based on different kernel functions in the LSM by taking Luxi City, Jiangxi Province, as a study area. The results showed that the SVM models using these four different kernel functions achieved good results, with the model using a radial basis function (RBF) as the kernel function having the best effect, regardless of the success rate or prediction rate [5]. Pham et al. conducted an LSM study in Pauri Garhwal, India, and compared the SVM model with four Bayesian algorithms: the naive Bayes tree, Bayes network, naive Bayes, and decision table naive Bayes models. The analysis results showed that the SVM model had the best predictive performance [6]. Despite having achieved acceptable results in their application, such methods tend to ignore the spatial distribution of landslide hazards and are extended to the entire study area without considering the spatial applicability of the models. This affects the selection and assignment of important evaluation factors, and thus reduces the accuracy of the LSM.

To overcome the above problems, LSM methods that consider the spatial scale of landslides have emerged. About 20 years ago, Fell et al. published LSM guidelines. The authors believed that landslides of different scales should be evaluated at the corresponding spatial scale, and that the selection of the LSM factors should have a scale that is compatible with the spatial scale [7]. In the same year, Cascini affirmed the guidelines proposed by Fell et al. and focused on the applicability of the susceptibility and hazard zoning of landslides at different scales. In this study, according to the scales and applications of landslide zoning, landslides were divided into two categories: small & medium scales and large & detailed scales, and the results indicated that the guidelines were a “powerful tool for landslide and hazard zoning at different scales” [8]. Paudel, Oguchi and Hayakawa extracted the best scale of each LSM factor using the random forest model, and then constructed an LSM. Their experimental results for Niigata and Ehime prefectures in Japan showed that a multi-scale LSM model was superior to a traditional model [9]. Schlögel et al. extracted LSM factors using a digital elevation model (DEM) with different precisions (5, 10, and 25 m), and the experiments found that the LSM factors for the DEM with 10-m precision were the best data combination for acquiring the LSM [10]. These methods explore the relationship between the spatial scale of the LSM and the selected data accuracy, sampling accuracy, and applicable range, and promote research on the spatial scale of LSMs. However, these methods weaken the concept of spatial scale in the scale or resolution of an LSM, and do not analyze the differences between the LSM at different spatial scales or consider the essential importance of such differences. They also ignore the definitiveness of spatial scale to the production of an LSM.

Some researchers have used the geographically weighted regression (GWR) model to overcome these problems. Zhang et al. employed the GWR model and compared it with the traditional LR models when producing an LSM of the Three Gorges Reservoir Area. Among the six evaluation indicators considered in their study, the GWR model achieved the best outcome [11]. In the following year, Hong et al. plotted a zoning plan for an LSM in Xingguo County, Shanxi Province using the GWR model and compared it with the traditional LR and SVM models. The results indicated that the GWR model had the highest success rate and prediction accuracy [12]. In the same year, Matsche studied the western part of Oregon and determined that the precision of the GWR model was 6.2% higher than that of the LR model [13]. The use of only the GWR model as an ordinary LSM prediction model improved the LSM to some extent, and enabled spatial scale problems to be considered in an LSM study, but it failed to reveal the essence of the spatial imbalance of the LSM.

In this study, we quantitatively expressed the spatial scale concept of an LSM when studying the spatial scale problem, introduced the concept of spatial scale into the study of LSMs, and built a GWR—particle swarm optimization—SVM (GWR-PSO-SVM) coupled model, to determine the root cause of the impact of spatial scale on an LSM. The aim was to explain the spatial imbalance problem of an LSM, and improve its scientific applicability, accuracy, and reliability.

The remainder of this paper is organized as follows. Section 2 describes the study area and data used in this work. Section 3 reviews the algorithms and model used in this work. Section 4 presents the process used to establish the GWR-PSO-SVM coupled model. Section 5 reports the experimental results, including a comparison between the traditional PSO-SVM coupled models and our new model. Section 6 is a discussion of our model and the final section presents our concluding remarks.

Study area and data sources

Study area

In this work, Zigui-Badong in the Three Gorges Reservoir Area was used as the study area (Fig 1). In terms of topography and geomorphology, the study area is located in the eastern part of the two natural geography units of the Three Gorges Reservoir Area. The area is a basin, and the topography along the river is low in the middle and high on both banks. In terms of geology, the strata in the study area are fully developed, and only the Lower Devonian, the Upper Silurian and Carboniferous, most of the Cretaceous and a small amount of Tertiary strata are deficient (Fig 2) [14]. Geological disasters occur frequently in the study area, with landslides being the most prominent type of geological disaster. There have been 202 proven landslides in the study area, covering a total area of 23.4 km², accounting for 6.03% of the entire study area [15].

Fig 1. Geographical location of the study area.

https://doi.org/10.1371/journal.pone.0229818.g001

Fig 2. Geological map of the study area.

https://doi.org/10.1371/journal.pone.0229818.g002

Data Source

The following data were used in this study:

➢. 1:10,000-scale landslide hazard map [15];
➢. Landsat-8 operational land imager (OLI+) sensor data, acquired on 15 September 2013, with a path/row number of 127/39 (https://earthexplorer.usgs.gov/);
➢. Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM) data (https://lpdaac.usgs.gov/tools/data-pool/);
➢. 1:50,000-scale topographic map and 1:50,000-scale geological map [14];
➢. Seismic activity data and atmospheric rainfall data from the China Earthquake Administration (CEA, http://www.ceic.ac.cn/history) and the China Meteorological Administration (CMA, http://data.cma.cn/data/cdcdetail/dataCode/SURF_CLI_CHN_MUL_DAY_V3.0.html).

The spatial resolution of the remote sensing (RS) data and the GDEM data was 30 m, and the 1:10,000-scale landslide hazard map and the 1:50,000-scale topographic and geological maps could match these data in terms of spatial resolution. The seismic activity and atmospheric rainfall data were point data, which had a temporal resolution but no spatial resolution.

Methods

The GWR model

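Fotheringham et al. first proposed GWR as a method to study the quantitative relationship between two or more variables with spatial distribution characteristics using the regression principle [16]. Local features are used as weights to change the multicollinearity in the global regression model [17, 18]. The related functions are defined as follows:

$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{Q} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, L \tag{1}$$

where (u_i, v_i) are the spatial coordinates of the i-th sample; L and Q are the number of samples and regression coefficients, respectively; y_i is the dependent variable of the function at point i; x_ik is the value of the k-th explanatory variable of point i; β_k(u_i,v_i) is the local regression parameter of the k-th explanatory variable of point i; β₀(u_i,v_i) is the intercept parameter of point i; and ε_i is the random error at point i. The least squares estimate for β_i is as follows:

$$\hat{\beta}(u_i, v_i) = (X^{T} W_i X)^{-1} X^{T} W_i\, y \tag{2}$$

where X is the matrix of explanatory variables (with a leading column of ones for the intercept) and y is the vector of observed values of the dependent variable.

The variance is:

$$\operatorname{Var}\big[\hat{\beta}(u_i, v_i)\big] = \sigma^{2}\, (X^{T} W_i X)^{-1} X^{T} W_i^{2} X\, (X^{T} W_i X)^{-1} \tag{3}$$

where σ² is the variance of the random error and W_i is a diagonal matrix of dimension L, which is called the spatial weight matrix and is the core of the GWR model. The value on the diagonal is the geographic weight:

$$W_i = \operatorname{diag}(w_{i1}, w_{i2}, \ldots, w_{iL}) \tag{4}$$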

The W_i is chosen based on the choice of kernel function, and the selection of the spatial weight function has a large influence on the parameter estimation of the GWR model.
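As a minimal sketch of how the spatial weights localise the least squares estimate in formula (2), the example below fits a weighted regression with one explanatory variable at two different locations. The Gaussian kernel, the bandwidth value, the function names, and the toy data are all assumptions made for illustration; they are not taken from the study.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian distance-decay weight (an assumed kernel choice)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_fit(target, coords, x, y, bandwidth):
    """Weighted least squares at one location, single explanatory variable.

    Returns (intercept, slope), i.e. the local beta_0 and beta_1 at `target`.
    """
    weights = [gaussian_weight(math.dist(target, c), bandwidth) for c in coords]
    total = sum(weights)
    x_mean = sum(w * xi for w, xi in zip(weights, x)) / total
    y_mean = sum(w * yi for w, yi in zip(weights, y)) / total
    s_xy = sum(w * (xi - x_mean) * (yi - y_mean)
               for w, xi, yi in zip(weights, x, y))
    s_xx = sum(w * (xi - x_mean) ** 2 for w, xi in zip(weights, x))
    slope = s_xy / s_xx
    return y_mean - slope * x_mean, slope

# Hypothetical data: y = 1 * x in the west cluster, y = 3 * x in the east cluster.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
          (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 3.0, 6.0, 9.0]

west = local_fit((1.0, 0.0), coords, x, y, bandwidth=1.0)
east = local_fit((11.0, 0.0), coords, x, y, bandwidth=1.0)
print("west slope:", west[1], "east slope:", east[1])
```

A single global regression over these six points would report one compromise slope, whereas the local fits recover a different coefficient at each location. This spatial variation in the coefficients is the property the segmentation step relies on.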

The SVM model

The SVM model was first proposed by Vapnik [19]. The model, established on the basis of the Vapnik–Chervonenkis dimension theory and structural risk minimization principle, has many unique advantages in solving small sample, nonlinear, and high-dimensional pattern recognition problems [20, 21]. Its function is defined as follows:

$$\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2} \quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1, \quad i = 1, 2, \ldots, R \tag{5}$$

where x_i is a sample point; y_i is the classification marker, i = 1,2,⋯,R; R is the number of samples; w is a vector perpendicular to the hyperplane; b is a constant that is applied to prevent the hyperplane from passing through the origin of the coordinate axis; and ‖w‖ is the 2-norm of w. When a non-negative slack variable ξ_i, which represents the distance from a misclassified point to its correct position, is introduced into formula (5), a penalty factor C must also be introduced to weight these violations. Therefore, formula (5) can be expressed as:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{R} \xi_i \quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \tag{6}$$

The RBF can be selected as the kernel function of the SVM, and is used to map the vector of the low-dimensional space into the high-dimensional characteristic space for classification. The function is expressed as:

$$K(x_i, x_j) = \exp\!\left(-\gamma\, \lVert x_i - x_j \rVert^{2}\right) \tag{7}$$

where γ is the kernel parameter of the RBF.
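The role of the kernel parameter γ can be illustrated with a short sketch. It assumes the common form of the RBF kernel, exp(-γ‖a - b‖²); the function name and the sample vectors are hypothetical.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF similarity between two feature vectors (assumed form exp(-gamma * d^2))."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)

unit_a = (0.2, 0.5, 0.1)   # hypothetical normalised factor values of one slope unit
unit_b = (0.3, 0.4, 0.6)   # hypothetical values of another slope unit

for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(unit_a, unit_b, gamma))
```

A larger γ makes the similarity decay faster with distance, so the decision boundary follows individual training samples more closely. This is why γ is tuned together with the penalty factor C.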

The PSO algorithm

The performance of the SVM model relies heavily on two parameters, the penalty factor C and the kernel parameter γ. A commonly used method for selecting these two parameters is to use the PSO algorithm to find the optimal solution of the model. Eberhart and Kennedy first proposed the PSO as an intelligent optimization algorithm that mimics bird foraging [22–24]. Its function form is:

$$v_i^{\,n+1} = t\, v_i^{\,n} + c_1 r_1 \left(p_i^{\,n} - x_i^{\,n}\right) + c_2 r_2 \left(g^{\,n} - x_i^{\,n}\right), \qquad x_i^{\,n+1} = x_i^{\,n} + v_i^{\,n+1} \tag{8}$$

where i = 1,2,⋯,K; K is the number of particles; n is the current number of iterations; t is the inertia weight; p_i^n is the individual optimal position of the i-th particle; g^n is the optimal position of all particles in the n-th iteration; v_i^n and x_i^n are the velocity and position of the i-th particle in the n-th iteration; v_i^{n+1} and x_i^{n+1} are the speed and position at which the i-th particle is updated in the (n+1)-th iteration, respectively; c₁ and c₂ are learning factors; and r₁ and r₂ are two random numbers between 0 and 1.

Evaluation models

Specific category accuracy analysis.

The specific category accuracy analysis method is an improved quantitative analysis method [25]. In this study, the specific category accuracy method considers the number of slope units in the prediction regions. It can be expressed as:

$$P_i = \frac{A_i}{B_i} \times 100\% \tag{9}$$

where i = 1,2,⋯,n; n is the number of landslide susceptibility zoning classifications; A_i is the number of slope units occupied by landslides in the i-th landslide susceptibility zoning classification; B_i is the number of slope units in the i-th landslide susceptibility zoning classification; and P_i is the specific category accuracy in the i-th landslide susceptibility zoning classification.
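As a small worked sketch of formula (9), the function below divides, for each susceptibility class, the number of landslide slope units by the total number of slope units in that class. The reading of (9) as a simple ratio A_i / B_i follows from the definitions above but is an assumption, and the counts are invented for illustration only.

```python
def specific_category_accuracy(landslide_units, total_units):
    """Percentage of slope units in each class that are occupied by landslides."""
    return {
        category: 100.0 * landslide_units[category] / total_units[category]
        for category in total_units
        if total_units[category] > 0  # skip empty classes to avoid dividing by zero
    }

# Hypothetical counts, not values from the study.
landslide_units = {"very low": 5, "medium": 30, "very high": 80}
total_units = {"very low": 200, "medium": 120, "very high": 100}

accuracy = specific_category_accuracy(landslide_units, total_units)
print(accuracy)
```

A well-behaved susceptibility map should show this percentage rising from the lowest class to the highest class.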

Overall prediction accuracy analysis.

The overall prediction accuracy analysis is a commonly used evaluation method for the construction of an LSM. In this study, the original formula was rewritten because there were no landslides in some prediction regions. It was expressed as:

$$P_o = \frac{\sum_{i=1}^{n_{pr}} (a_i + b_i)}{\sum_{i=1}^{n_{pr}} S_i} \times 100\% \tag{10}$$

where P_o is the overall prediction accuracy; i = 1,2,⋯,n_pr; n_pr is the number of prediction regions; a_i is the number of slope units correctly predicted as landslides in the i-th prediction region; b_i is the number of slope units correctly predicted as non-landslide areas in the i-th prediction region; and S_i is the number of total slope units in the i-th prediction region.
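The following sketch shows one way formula (10) can be evaluated. It assumes the formula pools the correctly predicted slope units over all prediction regions before dividing by the total number of slope units, which keeps regions without landslides (a_i = 0) well defined. The region counts are hypothetical.

```python
def overall_accuracy(regions):
    """Pooled accuracy (%) from (a_i, b_i, S_i) tuples, one per prediction region."""
    correct = sum(a + b for a, b, _ in regions)
    total = sum(s for _, _, s in regions)
    return 100.0 * correct / total

# (correct landslide units, correct non-landslide units, total units); invented values.
regions = [
    (10, 80, 100),  # a region that contains landslides
    (0, 45, 50),    # a region without landslides still contributes b_i and S_i
]

print(overall_accuracy(regions))
```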

Receiver operating characteristic (ROC) curve analysis.

Each point on the ROC curve reflects the sensitivity to the same signal stimulus, with the X-axis representing the false positive rate (1 − specificity) and the Y-axis representing the true positive rate (sensitivity) [26, 27]. There are four possible cases for a binary classification problem, as shown in Table 1.

Table 1. The four cases for a binary classification problem.

https://doi.org/10.1371/journal.pone.0229818.t001

The AUC refers to the area under the ROC curve. It ranges from 0 to 1, and its value can be used to intuitively evaluate the quality of the classifier: the closer the value is to 1, the better the classifier separates the two classes.
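The AUC can also be read as the probability that a randomly chosen landslide unit receives a higher score than a randomly chosen non-landslide unit. The sketch below computes it directly from that pairwise definition, which is equivalent to the area under the ROC curve; the function name and the four toy samples are illustrative assumptions.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

labels = [0, 0, 1, 1]             # 1 = landslide unit, 0 = non-landslide unit
scores = [0.1, 0.4, 0.35, 0.8]    # hypothetical susceptibility index values

print(auc(labels, scores))  # 3 of the 4 pairs are ordered correctly -> 0.75
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a few thousand slope units; a rank-based formulation would be preferable for much larger data sets.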

The proposed model

Coupled model for the LSM based on spatial scale segmentation

By taking the spatial autocorrelation of LSM factors as the breakthrough point, this study regarded the GWR coefficients of the LSM factors as the mathematical basis for the segmentation of the study area. Three basic principles were followed to ensure the rationality of segmentation. First, 58 environmental factors were extracted from the data sources, 18 factors were selected as LSM factors after factor screening, and three of the most important factors were used as segmentation factors to segment the study area into 18 prediction regions. Then, the SVM parameters were optimized by the PSO algorithm, and an LSM for each prediction region was obtained. Finally, all the prediction regions were integrated to establish the LSM model with spatial scale segmentation. A flowchart of the coupled model for the LSM based on a spatial scale analysis was established, as shown in Fig 3.

Fig 3. A flowchart of the establishment of the coupled model for the LSM based on spatial scale segmentation.

Abbreviations in this figure: GWR = geographically weighted regression, PSO = particle swarm optimization, SVM = support vector machine, LSM = landslide susceptibility map, ROC = receiver operating characteristic, DEM = digital elevation model, CA = catchment area, FPL = flow path length, TWI = topographic wetness index, ELA = engineering lithologic assemblage, DF = distance to fault, BS = bedding structure, ARVI = atmospherically resistant vegetation index, DVI = difference vegetation index, GVI = green vegetation index, EWI = enhanced water index, MNDWI = modified normalized difference water index, NDMI = normalized difference moisture index, MNDBI = modified normalized difference building index, ULI = urban land-use index, AAR = average annual rainfall, SM = slope morphology, SL = slope length, CNBL = channel network basic level, DR = distance to river.

https://doi.org/10.1371/journal.pone.0229818.g003

Selection of LSM calculation units

According to Guzzetti et al., all LSM calculation units can be summarized as either grid cells, geographic units, unique conditional units, slope units, or sub-basin units [28]. In this study, the slope unit was selected as the LSM calculation unit. After calculation and modification, a total of 2,790 slope units were obtained in the study area, of which the smallest area was 11,823.9 m² and the largest was 819,444 m².

Screening of LSM factors

In this work, through an analysis of historical landslide data and a summary of previous research in the study area, the LSM factors were divided into two categories: control factors and influencing factors. The control factors included geomorphological, geological, and hydrological factors. The influencing factors included surface cover index, geophysical, and meteorological factors. In this study, based on the geological and topographic maps, RS image data, field survey reports, and other data, a total of 58 LSM factors in two categories and eight sub-categories were extracted by RS and geographic information system. This is summarized in Table 2.

Table 2. The initial landslide susceptibility evaluation factors.

https://doi.org/10.1371/journal.pone.0229818.t002

Some of these 58 LSM factors were derived from the DEM and were strongly correlated with each other. Therefore, not all 58 factors were used in the modeling and calculation of the LSM; they first had to be analyzed and screened further. There were two main steps in the analysis.

Pearson product-moment correlation coefficient (PPMCC) analysis and principal component analysis (PCA).

In this study, a PPMCC analysis was used to analyze the correlations among the LSM factors of five sub-categories (geomorphology, hydrology, vegetation index, wetness index, and building index), and the factors with a significant correlation were deleted [29].

In the three sub-categories other than geomorphology and hydrology, there were strong correlations among multiple factors. In the geomorphology sub-category, the profile curvature, topographic position index (TPI), TPI based landform classification, cross-sectional curvature, general curvature, longitudinal curvature, maximum curvature and minimum curvature factors had strong correlations, and formed a curvature factor combination. The same phenomenon occurred in the vegetation index, wetness index and the building index sub-categories, and formed the vegetation index factor combination, the wetness index factor combination, and the building index factor combination.
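The correlation screening can be sketched as follows: compute the PPMCC for every pair of factors and flag the pairs whose absolute coefficient exceeds a threshold. The threshold of 0.8, the factor names, and the values are assumptions made for this illustration; the study does not state the cut-off it used.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def correlated_pairs(factors, threshold):
    """Return the factor pairs with |r| at or above `threshold`."""
    names = sorted(factors)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(factors[a], factors[b])) >= threshold
    ]

# Hypothetical factor values for five slope units.
factors = {
    "general_curvature": [1.0, 2.0, 3.0, 4.0, 5.0],
    "profile_curvature": [2.0, 4.0, 6.0, 8.0, 10.5],
    "slope": [5.0, 1.0, 4.0, 2.0, 3.0],
}

print(correlated_pairs(factors, threshold=0.8))
```

Pairs flagged this way are candidates for deletion or, when several factors cluster together, for merging into a factor combination.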

To retain the effective information of multiple factors while removing the linear correlation within these factor combinations, the PCA method was used [30]. In this study, the first, second, and third principal components of the curvature factor combination (PCCFC-1, 2, 3), the first principal component of the vegetation index factor combination (PCVIFC-1), the first principal component of the wetness index factor combination (PCWIFC-1), and the first principal component of the building index factor combination (PCBIFC-1) were retained. After the PPMCC analysis and PCA, there were 32 factors remaining.

Factor importance screening based on the SVM model.

In this study, the SVM was used as the prediction model for the LSM. The model can determine the importance of each factor according to the degree of contribution of the LSM. Based on this, this study removed the unimportant factors to improve the efficiency and accuracy of the LSM. After repeated experiments and comparisons, in combination with previous research results and based on the LSM factors that played a major role in most landslide studies, the importance threshold of the LSM factors was determined (0.005). Finally, 18 LSM factors were obtained. This is summarized in Table 3.

Table 3. Final landslide susceptibility evaluation factors after screening.

https://doi.org/10.1371/journal.pone.0229818.t003

The GWR-based segmentation of the study area

Based on the calculation of the GWR coefficients of each LSM factor, the natural breakpoint method was used for classification in this study [32]. To segment the study area, theoretically, the classification results of all the LSM factors should be superimposed to reduce the spatial autocorrelation for each LSM factor. However, due to the excessive number of LSM factors, the superposition of all LSM factor classification results may generate too many small areas, and have a great impact on the subsequent steps. Moreover, too many segmentation areas may also make the spatial distance between the areas smaller, in turn increasing the spatial autocorrelation of the LSM factors. After repeated studies, three basic principles were identified that should be followed when the GWR model was used for spatial scale segmentation:

➢. Select the same appropriate number of classifications for all LSM factors;
➢. Select only the most important LSM factors as the segmentation factors to segment the study area;
➢. In light of the results of spatial scale segmentation, regions that are too small should be merged into adjacent regions, and the integrity of the landslide surface should be guaranteed.

Based on principles 1 and 2, we selected the three most important LSM factors in the SVM model (channel network basic level (CNBL), elevation, and distance to river) for use as the segmentation factors. Each segmentation factor was then divided into two categories by the natural breakpoint method. The classification results were superimposed and processed according to principle 3, giving a total of 18 small areas, which were called the 18 prediction regions. The segmentation process is shown in Fig 4.
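The classification and superposition steps can be sketched as below. For two classes, the natural breakpoint is taken here as the split of the sorted values that minimises the within-class sum of squared deviations, and each slope unit then receives a code made of its class in every segmentation factor. The coefficient values and factor keys are hypothetical, and the sketch stops before principle 3: it does not split codes into spatially contiguous regions or merge regions that are too small.

```python
def two_class_break(values):
    """Smallest value of the upper class in a two-class natural-breaks split."""
    ordered = sorted(values)

    def squared_deviation(group):
        mean = sum(group) / len(group)
        return sum((v - mean) ** 2 for v in group)

    best_split = min(
        range(1, len(ordered)),
        key=lambda k: squared_deviation(ordered[:k]) + squared_deviation(ordered[k:]),
    )
    return ordered[best_split]

def region_codes(coefficients):
    """Overlay the two-class maps: one tuple of class labels per slope unit."""
    names = sorted(coefficients)
    breaks = {name: two_class_break(coefficients[name]) for name in names}
    n_units = len(coefficients[names[0]])
    return [
        tuple(int(coefficients[name][unit] >= breaks[name]) for name in names)
        for unit in range(n_units)
    ]

# Hypothetical GWR coefficients of the three segmentation factors for four slope units.
coefficients = {
    "cnbl": [0.1, 0.2, 0.9, 1.0],
    "distance_to_river": [1.0, 1.1, 1.2, 5.0],
    "elevation": [0.5, -0.5, 0.6, -0.4],
}

codes = region_codes(coefficients)
print(codes)
```

Three factors with two classes each yield at most eight distinct codes. A larger number of regions, such as the 18 reported here, can arise when units sharing a code form spatially separate patches that are treated as separate regions.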

Fig 4. The process of spatial scale segmentation based on GWR.

https://doi.org/10.1371/journal.pone.0229818.g004

Establishment of an LSM model based on GWR

For each prediction region, all the landslide slope units and an equal number of randomly selected non-landslide slope units (a 1:1 ratio) constituted the training data set used to train the PSO-SVM coupled model. All the slope units in the region, as the verification sample data set, were then input into the trained coupled model, and an LSM was obtained for each prediction region. The optimal solution of the PSO-SVM coupled model for each prediction region is shown in Table 4.
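A minimal sketch of the 1:1 training set construction is given below: every landslide slope unit is kept, and the same number of non-landslide units is drawn at random without replacement. The unit identifiers, the labels, and the fixed random seed are assumptions for illustration.

```python
import random

def balanced_training_set(unit_ids, is_landslide, seed=0):
    """All landslide units plus an equal-sized random sample of non-landslide units."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    landslide = [u for u in unit_ids if is_landslide[u]]
    stable = [u for u in unit_ids if not is_landslide[u]]
    sampled = rng.sample(stable, k=min(len(landslide), len(stable)))
    return landslide + sampled

# Hypothetical prediction region with 10 slope units, 3 of them landslide units.
unit_ids = list(range(10))
is_landslide = {u: u in (1, 4, 7) for u in unit_ids}

training_units = balanced_training_set(unit_ids, is_landslide)
print(training_units)
```

Because the full region is used afterwards for verification, the training units also appear in the verification set, which should be kept in mind when interpreting the accuracy figures.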

Table 4. Optimal solutions of the PSO-SVM coupled model for each prediction region.

https://doi.org/10.1371/journal.pone.0229818.t004

Results

Experimental results of the GWR-PSO-SVM coupled model

The LSMs of all prediction regions were combined to obtain an LSM based on spatial scale segmentation, i.e., the LSM of the GWR-PSO-SVM coupled model. The landslide susceptibility index (LSI) is a form of LSM, which is a continuous value from 0 to 1. This is shown in Fig 5.

Fig 5. The landslide susceptibility index (LSI) produced by the GWR-PSO-SVM coupled model.

https://doi.org/10.1371/journal.pone.0229818.g005

Establishment of a comparative experiment to test the PSO-SVM coupled model

To assess the precision and accuracy of the GWR-PSO-SVM coupled model proposed in this study, and especially to verify the correctness of the spatial scale segmentation of the study area using the GWR method, a comparative experiment was conducted. To verify the influence of spatial imbalance on the LSM, the PSO-SVM coupled model was used in the comparative experiment. The operational process of the PSO-SVM coupled model was basically the same as that of the GWR-PSO-SVM coupled model, with only the spatial scale segmentation using GWR coefficients removed, and the selection of the LSM factors was consistent with that of the GWR-PSO-SVM coupled model. The PSO algorithm determined that the optimal solutions for C and γ in the SVM model were 4 and 1, respectively, and the LSM for the PSO-SVM coupled model was obtained. This is shown in Fig 6.

Fig 6. The LSI produced by the PSO-SVM coupled model.

https://doi.org/10.1371/journal.pone.0229818.g006

To increase the readability of the LSM, the fixed threshold method was used in this study. Values of 0.1, 0.3, 0.7, and 0.9 were selected as the classification thresholds. The LSI was divided into five categories to obtain the landslide susceptibility zoning (LSZ): very low susceptibility areas, low susceptibility areas, medium susceptibility areas, high susceptibility areas, and very high susceptibility areas. The LSZs from the two experiments are shown in Fig 7 and Fig 8.
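The fixed threshold classification can be sketched with a sorted lookup. The thresholds and category names come from the text above; the treatment of a value that falls exactly on a threshold (assigned to the higher class here) is an assumption, since the text does not specify it.

```python
from bisect import bisect_right

THRESHOLDS = (0.1, 0.3, 0.7, 0.9)
CATEGORIES = ("very low", "low", "medium", "high", "very high")

def susceptibility_zone(lsi):
    """Map a landslide susceptibility index in [0, 1] to one of five zones."""
    return CATEGORIES[bisect_right(THRESHOLDS, lsi)]

for value in (0.05, 0.25, 0.5, 0.85, 0.95):
    print(value, susceptibility_zone(value))
```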

Fig 7. The landslide susceptibility zoning (LSZ) based on the GWR-PSO-SVM coupled model.

https://doi.org/10.1371/journal.pone.0229818.g007

Fig 8. The landslide susceptibility zoning (LSZ) based on the PSO-SVM coupled model.

https://doi.org/10.1371/journal.pone.0229818.g008

Evaluation model results and analysis

Specific category accuracy analysis.

The specific category accuracy results of the two experiments were calculated using formula (9) and are shown in Table 5.

Table 5. Specific category accuracy analysis results for the two experiments.

https://doi.org/10.1371/journal.pone.0229818.t005

The results in Table 5 show that the GWR-PSO-SVM coupled model identified more slope units in the “Very High” LSZ category (85.75%) than the PSO-SVM coupled model (68.35%). The GWR-PSO-SVM coupled model was significantly superior to the PSO-SVM coupled model.

Overall prediction accuracy analysis.

The overall prediction accuracy analysis results of the two experiments are shown in Table 6.

Table 6. Overall prediction accuracy analysis results for the two experiments.

https://doi.org/10.1371/journal.pone.0229818.t006

It can be clearly seen from Table 6 that the overall prediction accuracy of the PSO-SVM coupled model was 84.44%. In the GWR-PSO-SVM coupled model, the prediction accuracy of most prediction regions was greater than that of the PSO-SVM coupled model. The overall prediction accuracy of the GWR-PSO-SVM coupled model was 87.86%, which was more accurate than the PSO-SVM coupled model.

The ROC curve analysis.

In this study, the ROC curve was constructed using the real data of each slope unit as the state variable, and the LSMs at different spatial scales as the test variable, as shown in Fig 9.

Fig 9. The ROC curves for the two experiments.

https://doi.org/10.1371/journal.pone.0229818.g009

It can be seen from Fig 9 that both of the experiments produced good results. However, in the ROC curve of the GWR-PSO-SVM coupled model, the closest point to the upper left corner was farther from the reference line than for the PSO-SVM coupled model. A qualitative analysis showed that the result of the GWR-PSO-SVM coupled model was better.

For a quantitative analysis, the AUC calculations for the two experiments are shown in Table 7.

Table 7. Area under the curve (AUC) analysis for the two experiments.

https://doi.org/10.1371/journal.pone.0229818.t007

As shown in Table 7, the AUC value of the GWR-PSO-SVM coupled model was 0.965, i.e., greater than the value of 0.944 for the PSO-SVM coupled model, indicating that in the ROC curve analysis, the result for the GWR-PSO-SVM coupled model was better than that for the PSO-SVM coupled model.

Discussion

Based on previous analyses and LSM characteristics, there were four main reasons for the differences among LSMs: (1) the spatial scale of LSMs; (2) the factors used in the construction of LSMs; (3) the calculation unit used to construct the LSMs; and (4) the prediction model used to construct the LSMs. In most LSM studies, the factors, calculation units, and prediction models have been considered to be the main reasons for the differences among LSMs. Although these are not the same points considered in this work, this is not a contradictory position.

At the same spatial scale, the main reasons leading to the differences among LSMs were derived from the factors, calculation units, and prediction models. However, as research has intensified and with the introduction of spatial scale problems, the spatial scale, factors, calculation units, and prediction models have been identified as the root causes of the differences among LSMs.

In the actual experiments, the LSMs obtained from large areas were often different from and even the opposite of those obtained from a smaller area inside the large area when the same factors, calculation unit, and prediction model were used.

However, many of the LSM prediction models used previously did not originate in geology or geography, but evolved from economics, statistics, and other disciplines. These prediction models have been verified repeatedly over several years or even decades, and have been shown to be objective, applicable, and stable. Although there are five kinds of calculation unit, they are all fundamentally based on the grid unit. The grid unit is determined by the mathematical and physical properties of remote sensing satellite images, and is therefore also objective. For these reasons, this study focused on the LSM factors and spatial scale.

A total of 13 experiments using the GWR-PSO-SVM coupled model were completed in this study, and in each experiment, each LSM factor had a different importance, as shown in Fig 10. For the convenience of comparison, the order of factors in the legend was arranged from high (0.241) to low (0.005) according to the importance score of LSM factors in the SVM model.

Fig 10. Schematic diagram of the changes in the significance of LSM factors in each prediction region based on the GWR-PSO-SVM coupled model.

https://doi.org/10.1371/journal.pone.0229818.g010

The following results can be observed from Fig 10:

(1) Factors that were important in the PSO-SVM coupled model had no importance (i.e., a value of 0) in some prediction regions of the GWR-PSO-SVM coupled model. In prediction region 5, for instance, the most important LSM factor in the PSO-SVM coupled model (CNBL) had no importance. The importance of LSM factors therefore differed significantly between spatial scales.
(2) After the study area was segmented, even the adjacent regions 2 and 3 had different importance rankings for the LSM factors. This indicates that the importance of the LSM factors varied between prediction regions, and illustrates that the LSM factors had regional characteristics.
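
The two observations above amount to comparing per-region importance tables. The following sketch shows one way such a comparison could be made. All importance values and the region names `region_A` and `region_B` are hypothetical placeholders invented for illustration; they are not the values plotted in Fig 10, and only the factor names come from the study.

```python
def rank_factors(importance):
    """Return factor names ordered from most to least important."""
    return sorted(importance, key=importance.get, reverse=True)


def zero_importance(importance):
    """Return the factors whose importance score is exactly 0."""
    return sorted(name for name, value in importance.items() if value == 0)


# Hypothetical importance scores for two prediction regions (made-up values).
regions = {
    "region_A": {"CNBL": 0.0, "elevation": 0.30, "distance_to_river": 0.10},
    "region_B": {"CNBL": 0.25, "elevation": 0.05, "distance_to_river": 0.20},
}

rankings = {name: rank_factors(scores) for name, scores in regions.items()}
unused = {name: zero_importance(scores) for name, scores in regions.items()}
same_ranking = rankings["region_A"] == rankings["region_B"]
```

In this toy example the two regions rank the factors differently, and one factor carries no importance in `region_A`, which mirrors the kind of regional variation described in the text.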

Conclusion

Using Zigui-Badong in the Three Gorges Reservoir Area as a case study, the GWR model was coupled with the PSO-SVM model to utilize the advantages of GWR in the processing of spatial heterogeneity. According to the GWR coefficients of LSM factors, the study area was divided into several prediction regions to solve the problem of spatial imbalances in an LSM. To verify the effectiveness of the proposed method, the outcomes of the GWR-PSO-SVM coupled model and the PSO-SVM coupled model were compared using three evaluation methods: specific category accuracy analysis, overall prediction accuracy analysis, and AUC analysis. The results for the GWR-PSO-SVM coupled model for these three evaluation methods were 85.75%, 87.86%, and 0.965, respectively, while the results for the traditional PSO-SVM coupled model were 68.35%, 84.44%, and 0.944, respectively. Across the three evaluation methods, the results for the GWR-PSO-SVM coupled model were therefore 17.40 percentage points, 3.42 percentage points, and 0.021 higher than those of the PSO-SVM coupled model, respectively, and the proposed model clearly outperformed the traditional model.
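
The differences quoted above follow directly from the reported results, as the short check below shows. The dictionary keys are names chosen here for illustration; the numbers are those reported in the text, and the two accuracy differences are in percentage points.

```python
# Results reported for the two coupled models.
gwr_pso_svm = {
    "specific_category_accuracy_pct": 85.75,
    "overall_accuracy_pct": 87.86,
    "auc": 0.965,
}
pso_svm = {
    "specific_category_accuracy_pct": 68.35,
    "overall_accuracy_pct": 84.44,
    "auc": 0.944,
}

# Absolute improvement of the GWR-PSO-SVM model for each evaluation method.
# Rounding removes floating-point noise from the subtraction.
improvement = {
    name: round(gwr_pso_svm[name] - pso_svm[name], 3) for name in gwr_pso_svm
}
```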

It was found that the importance of LSM factors differed between areas. Calculating the LSM factors statistically and assigning them a single set of weights from the prediction model for the complete study area is therefore questionable.

The spatial scale of the study area essentially affects the importance of LSM factors. The spatial scale segmentation method developed in this study is based on the LSM factors and the GWR model, and consists of the selection of regional segmentation factors, the calculation and classification of GWR coefficients, the superposition of the classification results, and human-computer interaction modification. It was an effective method to solve this problem.

Acknowledgments

We are grateful to the Headquarters of Prevention and Control of Geo-Hazards in the Area of the Three Gorges Reservoir for providing data and material. We also thank the editor and anonymous referees for their comments.

References

1. Mondal S, Mandal S. RS & GIS-based landslide susceptibility mapping of the Balason River basin, Darjeeling Himalaya, using logistic regression (LR) model. Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards. 2018;12(1):29–44.
2. Wang L-J, Guo M, Sawada K, Lin J, Zhang J. A comparative study of landslide susceptibility maps using logistic regression, frequency ratio, decision tree, weights of evidence and artificial neural network. Geosciences Journal. 2016;20(1):117–36.
3. Aditian A, Kubota T, Shinohara Y. Comparison of GIS-based landslide susceptibility models using frequency ratio, logistic regression, and artificial neural network in a tertiary region of Ambon, Indonesia. Geomorphology. 2018;318:101–11.
4. Saro L, Woo JS, Kwan-Young O, Moung-Jin L. The spatial prediction of landslide susceptibility applying artificial neural network and logistic regression models: A case study of Inje, Korea. Open Geosciences. 2016;8(1):117–32.
5. Hong H, Pradhan B, Jebur MN, Bui DT, Xu C, Akgun A. Spatial prediction of landslide hazard at the Luxi area (China) using support vector machines. Environmental Earth Sciences. 2016;75(1):40.
6. Pham BT, Prakash I, Khosravi K, Chapi K, Trinh PT, Ngo TQ, et al. A comparison of Support Vector Machines and Bayesian algorithms for landslide susceptibility modelling. Geocarto International. 2018:1–23.
7. Fell R, Corominas J, Bonnard C, Cascini L, Leroi E, Savage WZ. Guidelines for landslide susceptibility, hazard and risk zoning for land use planning. Engineering Geology. 2008;102(3):85–98.
8. Cascini L. Applicability of landslide susceptibility and hazard zoning at different scales. Engineering Geology. 2008;102(3–4):164–77.
9. Paudel U, Oguchi T, Hayakawa Y. Multi-resolution landslide susceptibility analysis using a DEM and random forest. International Journal of Geosciences. 2016;7(05):726.
10. Schlögel R, Marchesini I, Alvioli M, Reichenbach P, Rossi M, Malet J-P. Optimizing landslide susceptibility zonation: Effects of DEM spatial resolution and slope unit delineation on logistic regression models. Geomorphology. 2018;301:10–20.
11. Zhang M, Cao X, Peng L, Niu R. Landslide susceptibility mapping based on global and local logistic regression models in Three Gorges Reservoir area, China. Environmental Earth Sciences. 2016;75(11):958.
12. Hong H, Pradhan B, Sameen MI, Chen W, Xu C. Spatial prediction of rotational landslide using geographically weighted regression, logistic regression, and support vector machine models in Xing Guo area (China). Geomatics, Natural Hazards and Risk. 2017;8(2):1997–2022.
13. Matsche DT. A Geographically Weighted Regression Approach to Landslide Susceptibility Modeling: University of Idaho; 2017.
14. Hubei Province Geological Survey, cartographer. Geological Map of Zigui and Badong County (1:50,000). Wuhan, China: Hubei Province Geological Survey Press; 1997.
15. Headquarters of Prevention and Control of Geo-Hazards in the Area of the Three Gorges Reservoir, cartographer. 1:10,000 geological hazard mapping database. 2011.
16. Fotheringham AS, Charlton ME, Brunsdon C. Geographically weighted regression: a natural evolution of the expansion method for spatial data analysis. Environment and Planning A. 1998;30(11):1905–27.
17. Hong H, Pradhan B, Sameen MI, Kalantar B, Zhu A, Chen W. Improving the accuracy of landslide susceptibility model using a novel region-partitioning approach. Landslides. 2018;15(4):753–72.
18. Jiang W, Rao P, Cao R, Tang Z, Chen K. Comparative evaluation of geological disaster susceptibility using multi-regression methods and spatial accuracy validation. Journal of Geographical Sciences. 2017;27(4):439–62.
19. Vapnik V. The nature of statistical learning theory: Springer Science & Business Media; 2013.
20. Pham BT, Jaafari A, Prakash I, Bui DT. A novel hybrid intelligent model of support vector machines and the MultiBoost ensemble for landslide susceptibility modeling. Bulletin of Engineering Geology and the Environment. 2019;78(4):2865–86.
21. Mandal S, Mondal S. Machine Learning Models and Spatial Distribution of Landslide Susceptibility. Geoinformatics and Modelling of Landslide Susceptibility and Risk: Springer; 2019. p. 165–75.
22. Eberhart R, Kennedy J, editors. A new optimizer using particle swarm theory. 1995: IEEE.
23. Moayedi H, Mehrabi M, Mosallanezhad M, Rashid ASA, Pradhan B. Modification of landslide susceptibility mapping using optimized PSO-ANN technique. Engineering with Computers. 2019;35(3):967–84.
24. Xi W, Li G, Moayedi H, Nguyen H. A particle-based optimization of artificial neural network for earthquake-induced landslide assessment in Ludian county, China. Geomatics, Natural Hazards and Risk. 2019;10(1):1750–71.
25. Yu X, Wang Y, Niu R, Hu Y. A combination of geographically weighted regression, particle swarm optimization and support vector machine for landslide susceptibility mapping: a case study at Wanzhou in the Three Gorges Area, China. International Journal of Environmental Research and Public Health. 2016;13(5):487.
26. Chen W, Panahi M, Tsangaratos P, Shahabi H, Ilia I, Panahi S, et al. Applying population-based evolutionary algorithms and a neuro-fuzzy system for modeling landslide susceptibility. Catena. 2019;172:212–31.
27. Cantarino I, Carrion MA, Goerlich F, Ibañez VM. A ROC analysis-based classification method for landslide susceptibility maps. Landslides. 2019;16(2):265–82.
28. Guzzetti F, Carrara A, Cardinali M, Reichenbach P. Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology. 1999;31(1):181–216.
29. Bossi G, Marcato G. Planning Landslide Countermeasure Works through Long Term Monitoring and Grey Box Modelling. Geosciences. 2019;9(4):185.
30. Aït-Sahalia Y, Xiu D. Principal component analysis of high-frequency data. Journal of the American Statistical Association. 2019;114(525):287–303.
31. Peng L, Niu R, Huang B, Wu X, Zhao Y, Ye R. Landslide susceptibility mapping based on rough set theory and support vector machines: A case of the Three Gorges area, China. Geomorphology. 2014;204:287–301.
32. Song Y, Niu R, Xu S, Ye R, Peng L, Guo T, et al. Landslide susceptibility mapping based on weighted gradient boosting decision tree in Wanzhou section of the Three Gorges Reservoir Area (China). ISPRS International Journal of Geo-Information. 2019;8(1):4.
