Abstract
Landslides constitute a significant geological hazard in China, particularly in high-altitude regions such as the Himalayas, where harsh environmental conditions impede field surveys. This research uses the index of entropy (IOE) model to refine non-landslide samples and couples it with multiple machine learning models to assess landslide susceptibility in Luolong County, Tibet. The IOE model objectively assigns weights to conditioning factors based on the degree of data dispersion, which improves predictive accuracy when it is combined with machine learning models. Google Earth satellite imagery was used to construct a database of 2517 landslides in Luolong County. Twelve conditioning factors were identified, covering geological environment, topography, meteorology, hydrology, vegetation, soil, and human activities. The IOE model was coupled with SVC, MLP, LDA, and LR models to evaluate landslide susceptibility systematically. The results show that, after the non-landslide samples were optimized, the coupled models significantly outperformed the unoptimized models in terms of AUC, accuracy, precision, and F1 score. The classification performance of the four coupled models ranks as IOE-MLP > IOE-SVC > IOE-LR > IOE-LDA. Notably, the AUC value of the IOE-MLP coupled model increased from 0.8172 to 0.9747. Moreover, in the extremely high susceptibility zones, the IOE-MLP model had the highest landslide frequency ratio among the four coupled models, indicating the best classification performance. The study identifies land use, elevation, and slope as the dominant factors controlling landslides in Luolong County.
The regions with the highest susceptibility to landslides in Luolong County are predominantly situated in the central areas near rivers and roads, whereas the areas with the lowest susceptibility are largely located in the southwestern, northern, and certain central regions at elevations above 4500 m, which are consistently shrouded in snow and ice. This comprehensive method effectively resolves the challenge of selecting non-landslide samples, thereby improving the predictive accuracy of the landslide susceptibility model. The results of this study offer significant insights for disaster prevention, mitigation, and land use planning in analogous geological settings.
Citation: Yuzhong K, Hua W, Chong X, Jingjing S, Kangcheng Z, Chenguang Z, et al. (2025) Landslide susceptibility mapping using an entropy index-based negative sample selection strategy: A case study of Luolong county. PLoS One 20(5): e0322566. https://doi.org/10.1371/journal.pone.0322566
Editor: Trung Van Nguyen, Hanoi University of Mining and Geology, VIET NAM
Received: October 24, 2024; Accepted: March 24, 2025; Published: May 9, 2025
Copyright: © 2025 Yuzhong et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data sources used in this paper are publicly available, with no copyright issues and no need for authorization from the data owners. The dataset is shared on the Figshare platform at “https://figshare.com/articles/figure/Landslides_of_the_LuoLong_county_/28200587”. All relevant data are available without restrictions, ensuring reproducibility of the study.
Funding: This study was supported by the National Natural Science Foundation of China in the form of a grant awarded to WH (U23A2047), and by the Tibet Autonomous Region Science and Technology Department in the form of salaries for WH (XZ202401YD0028, XZ202401ZY0057, XZ202402ZD0001). The specific roles of this author are articulated in the “Author Contributions” section. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
According to the “14th Five-Year Plan for the Prevention and Control of Geological Disasters” issued by the Ministry of Land and Resources of China, the country is highly prone to geological disasters, which are frequent and widespread. By the end of 2020, a total of 328,654 registered geological disaster hazard points had been documented nationwide, posing a potential threat to the safety of 13.99 million people and 605.3 billion yuan in property [1]. Landslides are the most widely distributed and numerous type of geological disaster in China, and the Himalayan region of Tibet is a focal area for the prevention and control of high-position remote chain landslides [2]. The Himalayan region is marked by steep mountains, high altitudes, thin air, and perennial snow and ice cover, leading to extremely adverse conditions. Many areas are inaccessible to people, making it crucial to employ integrated remote sensing techniques to pre-evaluate landslide susceptibility and delineate target areas of different levels [3].
At present, the primary models employed in landslide susceptibility analysis are experience-driven models, physics-driven models, and data-driven models. Experience-driven models, which depend heavily on expert experience, are significantly influenced by subjective factors, thereby restricting their applicability. Physics-driven models necessitate comprehensive hydrological and geotechnical data, rendering them unsuitable for large-scale landslide susceptibility assessments. Owing to their objectivity and broad applicability, data-driven models have garnered favor among numerous scholars and experts.
Data-driven models can be further categorized into traditional statistical analyses and a variety of machine learning methods [4]. Statistical models investigate the relationships between landslides and their conditioning factors, emphasizing the derivation and interpretability of the models, and thereby capture the spatial relationships of landslide samples more accurately. Commonly utilized statistical models include the Analytic Hierarchy Process (AHP) [5,6], the Frequency Ratio Method [7], and the Index of Entropy (IOE) [8]. The AHP is highly susceptible to subjective biases, whereas the Frequency Ratio Method, while capable of objectively characterizing the relationships between conditioning factors and landslides, does not yield a weight for each conditioning factor. The IOE method objectively determines attribute weights based on the inherent dispersion of the data. It normalizes the data during computation, obviating the need for preliminary standardization or normalization, which streamlines the calculation and makes it well suited to large datasets. Consequently, integrating the IOE model with other models can reduce the impact of subjective biases and mitigate overfitting in machine learning. Machine learning models emphasize the exploration of relationships and structures inherent in the data, prioritizing predictive accuracy and optimization performance, and thereby achieve superior predictive outcomes. Models that have demonstrated notable success include Random Forests (RF) [9], Multilayer Perceptron (MLP) [10,11], Support Vector Classification (SVC) [12,13], Linear Discriminant Analysis (LDA) [14], and Logistic Regression (LR) [15]. Each of these models has its own strengths and weaknesses, yet all demonstrate strong generalization capabilities.
However, individual models exhibit limited capacity to handle non-linear relationships, are unable to fully capture the complexity of landslide susceptibility and are prone to overfitting. Coupling statistical models with machine learning models can balance the spatial representation of landslides, more comprehensively capture data characteristics, reduce the risk of overfitting, and enhance prediction accuracy [16,17]. The Random Forest (RF) model is particularly sensitive to noisy data and may overfit the noise. Therefore, this study selects the Multi-Layer Perceptron (MLP) and Support Vector Classifier (SVC) models, which perform well with non-linear data, and the Linear Discriminant Analysis (LDA) and Logistic Regression (LR) models, which perform well with linear data, to be coupled with the IOE model to improve model accuracy and performance.
The expression of landslide susceptibility is influenced by the type of mapping unit, the algorithm type and its parameter optimization, and the choice of samples [18]. The more rational and accurate the mapping unit, the more effectively it captures terrain and geological conditions, thereby improving the precision of landslide susceptibility evaluations. The two predominant mapping units are grid cells and slope units. While slope units can partially represent terrain and geological environments, they may take jagged or linear shapes that must be adjusted to the actual topography and geomorphology, a process that is labor-intensive and susceptible to errors. Although grid cells do not capture terrain and environmental nuances, they make calculations straightforward and efficient and allow extensive regions to be delineated, which makes them currently the most prevalent mapping unit [19]. When the grid cell size is reduced to less than 2000 m, machine learning techniques applied to landslide susceptibility assessments exhibit enhanced predictive accuracy [20]. Different algorithm types process data differently and have distinct applications. Logistic regression, linear regression, and linear discriminant analysis are well suited to linear relationships, while support vector machines, decision trees, and neural networks are adept at handling nonlinear relationships. Algorithmic parameter optimization enhances model performance, reduces computational resource consumption, and improves model interpretability. For example, optimizing the AdaBoost model through recursive feature elimination and particle swarm optimization can significantly enhance the predictive effectiveness of landslide susceptibility [21], and Bayesian optimization of the hyperparameters of logistic regression and random forest models has resulted in substantial improvements in their predictive performance [22].
While algorithms and mapping units are significant, in machine learning, the selection and quality of samples are paramount. This is because, regardless of the algorithm’s sophistication, issues with the sample data (such as noise, bias, imbalance, etc.) can adversely impact the model’s performance and predictive accuracy. Furthermore, the quality of samples directly influences the model’s learning effectiveness and generalization capabilities, whereas algorithms and mapping units primarily dictate how information is extracted from the samples and how the model is constructed. Consequently, landslide samples should fully encapsulate the diverse conditions and characteristics associated with landslide events. These samples can be derived from remote sensing image analysis or field investigations, thereby ensuring a high degree of accuracy. Non-landslide samples should be carefully selected to minimize the inclusion of potential landslide sites and are generally obtained indirectly through various methods, as direct acquisition is not feasible. Researchers have yet to establish a standardized approach for selecting non-landslide samples. Presently, three primary methods are employed [23]. The first is random sampling, which is extensively utilized. Some researchers directly choose non-landslide points outside the landslide areas, which may share similar geological characteristics with the landslide points [24]. Furthermore, other scholars create buffer zones around landslide points and select non-landslide points beyond these zones. While random sampling is straightforward, it can lead to an uneven distribution of non-landslide samples, thereby compromising the model’s accuracy. The determination of buffer distances is highly subjective, and the non-landslide points selected may lack precision[25,26]. 
The second category selects non-landslide samples based on factor constraints, that is, from outside the range of environmental factors associated with landslide occurrences. For example, non-landslide points have been selected in low-slope areas [27,28], by the target space outward sampling method [29], and through the optimization of environmental factors [30]. With this approach, selecting only one environmental factor may overemphasize its role, resulting in unreliable outcomes. Conversely, selecting multiple or all environmental factors can introduce noise, which also compromises the accuracy of the results. The third category selects non-landslide samples through coupled models, which represents one of the more sophisticated approaches to negative sample selection. A comparison of random selection with the information value method demonstrated that the information value method can markedly enhance sample quality and model accuracy [31]. Among the random selection, buffer zone, frequency ratio, and analytic hierarchy methods, the analytic hierarchy method has been noted to yield the best results, followed by the frequency ratio method, with random selection being the least effective [32]. Integrating the certainty coefficient method with multi-kernel support vector machines for landslide susceptibility evaluation can produce outstanding results [33]. In conclusion, adopting a coupled model to identify non-landslide samples yields higher-quality and more diverse samples, leading to a notable increase in the model’s accuracy [34].
This study initially utilizes satellite imagery from the Google Earth platform, employing a human-computer interactive visual interpretation technique to analyze landslide debris in Luolong County and create a comprehensive landslide relic database. It then integrates the advantages of the IOE model with those of machine learning models, applying IOE-LDA, IOE-MLP, IOE-SVC, and IOE-LR coupled models to perform a detailed comparative analysis of the landslide debris in Luolong County. Subsequently, it employs the area under the Receiver Operating Characteristic Curve (AUC), accuracy (Acc), precision, F1 score, and landslide frequency ratio to assess the overall effectiveness of each model, thereby developing a more appropriate method for selecting non-landslide samples and evaluating landslide susceptibility in high-altitude cold regions.
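The evaluation measures named above can be illustrated with a minimal sketch. The function name `classification_metrics`, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration; they are not taken from the study.

```python
import numpy as np

def classification_metrics(y_true, y_score, threshold=0.5):
    """Return AUC, accuracy, precision and F1 for binary labels (1 = landslide)."""
    y_true = np.asarray(y_true, dtype=int)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)

    # Confusion-matrix counts
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # AUC as the probability that a random landslide sample scores higher than
    # a random non-landslide sample (ties count one half); this equals the
    # area under the ROC curve.
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))
    return {"auc": auc, "accuracy": accuracy, "precision": precision, "f1": f1}

# Toy example: three landslide and three non-landslide samples (made-up scores)
metrics = classification_metrics([1, 1, 1, 0, 0, 0],
                                 [0.9, 0.8, 0.4, 0.6, 0.2, 0.1])
```

Unlike accuracy, precision, and F1, the AUC does not depend on the chosen threshold, which is why it is commonly reported alongside them.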
Materials and methods
Study area
Luolong County is situated in the northeastern part of the Tibet Autonomous Region and the southwestern part of Changdu City, at the southeastern extremity of the Nyenchen Tanglha Mountains and the upper reaches of the Nu River (Fig 1). It is positioned between 30°10’ N and 30°50’ N latitude, and 95°10’ E and 95°50’ E longitude, encompassing a total area of 8098.4 km² and an average elevation of approximately 3859 m [35].
Luolong County predominantly features high mountain and canyon topography, with lower elevations in the central area and significant variations in terrain. The region is interlaced with numerous rivers and abundant lakes, including the Nu River, the second-largest river in Tibet, which traverses the central and northern parts of the county. The southern part of the county is dominated by the Nyenchen Tanglha Mountains, which extend in a northwest-southeast direction for over 100 km and are perpetually covered with snow and ice. Luolong County lies within a plateau temperate semi-arid monsoon climate zone, characterized by extended daylight hours, low temperatures, substantial diurnal temperature variations, minimal annual temperature differences, distinct dry and rainy seasons, and a lengthy, cold winter. Annual rainfall ranges from 372 to 559 mm, predominantly occurring from June to September, with the maximum daily rainfall varying between 14.0 and 39.2 mm. Luolong County is situated at the convergence of the Gondwana and Cathaysian ancient continents, within the Cathaysian Tethyan tectonic domain, specifically in the Gangdise-Nyainqentanglha mountain sheet. The county hosts several fault zones, such as the Shuoban Duo Fault Zone, the Chada-Baqu Fault Zone, the Xinben Fault Zone, and the Baqu Dongcun Fault Zone. The exposed geological formations include the Carboniferous, Paleogene, and Quaternary systems. The exposed bedrock comprises the Paleogene Zongbai Group, the Upper Carboniferous and Lower Permian Laigu Group, and the Lower Cretaceous granite, with predominant lithologies of siltstone, black slate, gray metamorphic sandstone, conglomerate-bearing slate, and a minor amount of mafic volcanic rock [36].
Luolong County is typified by a classic “V” shaped deeply incised canyon landscape, with elevations descending from 5476 m to 3153 m, resulting in a vertical drop exceeding 2300 m. This steep gradient creates conditions that are highly conducive to landslides. The area is situated in the globally recognized seismic active zone (The Mediterranean-Himalayan seismic belt) and is traversed by multiple fault zones, causing rock fragmentation and facilitating the formation of landslides. Furthermore, Luolong County serves as a vital corridor for China’s second railway to Tibet (the Sichuan-Tibet Railway) and a key transportation route in southeastern Tibet. It is also a significant region for hydropower development, hosting projects such as the Xinrong, Maquyong, Zongzha Cuo, and Zhongyi hydropower stations. The development of these human-engineered projects may elevate the risk of landslides. In recent years, influenced by global climate warming, the freeze-thaw processes in the region have become more pronounced, thereby augmenting the potential for landslides. Consequently, undertaking a comprehensive investigation of landslides in Luolong County is crucial for safeguarding the life and property of local inhabitants and fostering regional economic growth.
Data sources
Source of landslide relic database.
Landslide relic data is of paramount importance for the study and mitigation of landslide disasters. These data not only reveal the spatial distribution characteristics of landslides but also provide essential support for predicting the likelihood of landslides and assessing their risks. In landslide disaster research, the precision and comprehensiveness of data are critical. Therefore, effectively collecting and analyzing landslide relic data has become a key issue in research. Presently, there are two predominant methods for obtaining landslide relic data: automated landslide extraction and human-computer interactive visual interpretation [37]. Automated landslide extraction employs machine learning algorithms to discern the features of sample data, enabling automatic extraction [38]. However, its precision is contingent upon the type of remote sensing imagery, regional topographic characteristics, and the nature of the landslide, rendering it more appropriate for post-earthquake emergency activities. Visual interpretation harnesses human and temporal resources to precisely identify targets in complex topographic regions, yielding a comprehensive and accurate database [39,40]. Through visual interpretation, researchers can meticulously analyze each topographic feature, ensuring that every detail within the landslide area is accurately documented and categorized. While visual interpretation achieves relatively high accuracy in landslide identification, it is constrained by the interpreter’s professional knowledge and experience, introducing considerable subjectivity. Additionally, it is characterized by low automation, inefficiency, and high costs. The landslide identification method leveraging human-computer interactive interpretation technology can markedly improve the efficiency of automatic landslide identification compared to traditional visual interpretation. 
It minimizes human intervention, rendering landslide identification more intelligent and efficient [41,42].
This study employs remote sensing imagery from the Google Earth platform, segmenting the research area into smaller grid cells and utilizing a human-computer interactive visual interpretation technique for frame-by-frame analysis. This approach resulted in a landslide relic dataset containing 2517 landslide points, as depicted in Fig 2. The interpretation of landslides is primarily based on the differences in spectral characteristics, shape, and texture relative to surrounding features, following this general procedure: Initially, objects with higher brightness are identified based on hue. Subsequently, regular artificial objects among the high-brightness features are filtered out based on morphological characteristics. Next, interference from features such as agricultural land is eliminated using texture information. Finally, the identified landslides are checked manually to confirm that they are correct. During manual interpretation, the approximate location of the landslide is ascertained based on its morphological features and topographical context. Subsequently, the landslide boundary is delineated by contrasting the landslide with its surrounding environment. Ultimately, the precise location of the landslide center is established using the scarp and the landslide deposit [23,43]. Overall, the landslides in Luolong County are predominantly distributed along a northwest-southeast orientation, with a higher incidence in areas where rivers and roads are more concentrated, and relatively fewer landslides in regions with significantly higher surrounding elevations.
Data sources for landslide conditioning factors.
Building upon the foundational work of previous researchers, this study identifies 12 conditioning factors that are closely associated with landslides across five key dimensions: geological environment, topography, meteorology, hydrology, vegetation and soil, and human engineering activities [44] (Fig 3). It establishes a comprehensive landslide susceptibility evaluation index system for Luolong County, with the data sources for each conditioning factor detailed as follows (Table 1).
Conditioning factors.
In this study, all conditioning factors are uniformly projected using the Mercator projection with a 3-degree zone method for the 47N zone. Given that the resolution of the DEM, rainfall, NDVI, and land use data employed in this study is 30 meters, it is most appropriate to process the mapping units into 30m × 30m grids.
Elevation: Variations in elevation mark the terrain’s ridges and valleys, and changes in height result in alterations in slope. Rainfall and land use are also closely linked to elevation. The elevation data for Luolong County were derived from the national digital elevation model (DEM) with a spatial resolution of 30 m. The elevation spans from 3153 to 5476 m and is categorized into five levels: 3153–3500 m, 3500–4000 m, 4000–4500 m, 4500–5000 m, and > 5000 m, with areas above 4000 m comprising 85%.
Slope: Characterizes the inclination of the surface, indicating the steepness of the slope. A higher slope indicates a steeper surface, greater shear force in the rock mass, reduced stability of the slope body, and increased likelihood of landslides. The slope data are derived from the DEM with a spatial resolution of 30 m and are categorized into five levels: 0°-15°, 15°-30°, 30°-45°, 45°-60°, and > 60°, with the 15°-30° range comprising 49%.
Aspect: Broadly categorized into sunny and shady slopes, where slopes facing the sun are designated as sunny slopes, and those facing away from the sun are designated as shady slopes. Aspect indicates the accumulation and distribution of moisture on sunny and shady slopes, with sunny slopes typically being more susceptible to landslides. Aspect data are derived from the DEM with a spatial resolution of 30 m and are divided into nine categories: flat, north, northeast, east, southeast, south, southwest, west, and northwest.
Profile curvature: Profile curvature denotes the degree of curvature of the slope surface along the direction of maximum slope, influencing the stress distribution and water accumulation on the slope, and thus indirectly affecting the risk of landslides. Profile curvature data are derived from the DEM with a spatial resolution of 30 m and are categorized into five levels: 0–0.5, 0.5–1, 1–1.5, 1.5–2, and 2–4.3, with the range of 0–1.5 comprising 82%.
Geological lithology: Geological lithology is a key internal factor in landslide development. The mechanical properties of different rock groups differ markedly, resulting in varying degrees of slope stability. Geological lithology is derived from the 1:500,000 geological map of the Tibetan Plateau, categorizing rock types into five groups: hard rock groups, moderately hard rock groups, moderately weak rock groups, weak rock groups, and loose rock groups. The moderately weak rock group covers the largest area, comprising 39%.
Distance to Fault (DTF): The fault structures within Luolong County predominantly develop in a north-south orientation, exerting a significant impact on the development of joints and fractures in geological bodies. The presence of faults results in rock fragmentation, destabilizing the rock structure and facilitating landslide formation. The degree of rock fragmentation increases with proximity to faults and fractures: the closer a slope is to faults and fractures, the less stable the rocks become, and the greater the likelihood of landslides. Fault information is derived from geological maps, and buffer zones are established at 1000-meter intervals, categorized into five levels.
Rainfall: While the primary controlling factors for the formation of landslide geological hazards are the geological environment and topography, the intensity of rainfall is also a significant triggering factor for landslides. Rainfall increases the self-weight of the soil and the saturation of soil moisture, reducing the shear strength of the rock and soil mass, thereby elevating the risk of landslides. Moreover, rainfall can lead to increased surface runoff and rising groundwater levels, impacting the stability of the rock and soil mass. This study utilizes national annual average rainfall data from 1991 to 2020, categorized into five levels using the natural breaks method: 575–599mm, 599–611mm, 611–625mm, 625–644mm, and 644–688mm, with the 575–625mm range comprising 79%.
Distance to River (DTS): The proximity to rivers is a significant factor influencing landslides. Firstly, slopes adjacent to rivers are more prone to erosion and water level fluctuations, which can compromise slope stability. River erosion may diminish slope stability, and fluctuations in water levels can cause soil saturation, thereby elevating the risk of landslides. Secondly, regions near rivers are generally moist, facilitating the penetration of water into the slope, resulting in soil saturation and decreased stability, thus increasing the potential for landslides. This study derives river vector data from fundamental geographic data, creates buffer zones at 1000-meter intervals, and categorizes them into five levels.
Normalized Difference Vegetation Index (NDVI): Indicates the growth status of vegetation in a region. Lower NDVI values suggest sparser vegetation, characterized by shallow and sparse plant roots, which makes it difficult for vegetation to anchor the local soil, thus increasing the potential for landslides. This study employs national NDVI data from 2020 with a 30-meter spatial resolution, with raw values spanning from -9999 to 9999. Dividing these values by 10000 yields NDVI values between -1 and 1. The data are subsequently categorized into five levels using the natural breaks method: 0–0.18, 0.18–0.33, 0.33–0.45, 0.45–0.57, and 0.57–0.92, with the 0.33–0.57 range comprising 50%.
Topographic Wetness Index (TWI): TWI serves as a physical indicator of the influence of regional terrain on runoff direction and accumulation. A higher TWI value signifies a greater water content or potential soil moisture in the area. Increased soil moisture not only augments the self-weight of the slope but also diminishes the soil’s cohesion and angle of internal friction, thereby reducing the soil’s shear strength, enhancing the slope’s instability, and consequently elevating the risk of landslides. TWI is derived from the DEM with a spatial resolution of 30 m and is categorized into five levels using the natural breaks method: 5.10–8.70, 8.70–10.59, 10.59–13.75, 13.75–18.16, and 18.16–28.17, with the range of 5.1–10.59 comprising 81%.
Land use: Forests with high vegetation cover exhibit superior soil shear strength, thereby stabilizing slopes. In contrast, slopes resulting from excavated roads and mines, due to their low vegetation cover, have reduced soil shear strength, increasing the likelihood of landslides. This study employs land cover product data from the Qinghai-Tibet Plateau with a 30-meter spatial resolution from 2020, categorizing land use into nine classes based on original land categories: forests, shrublands, grasslands, wetlands, farmlands, built-up areas, glaciers, bare land, and water bodies, with grasslands comprising 57%.
Distance to Road (DTR): Slopes adjacent to roads are more susceptible to influences from road construction, drainage systems, increased loads, and traffic activities, potentially elevating the risk of landslides. This study derives road vector data from fundamental geographic data, creates buffer zones at 1000-meter intervals, and categorizes them into five levels.
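Several of the factors above are derived from the 30 m DEM and then binned into classes. As a minimal sketch, assuming a simple finite-difference slope rather than the GIS workflow actually used, slope can be computed from an elevation grid and assigned to the five slope classes listed above. The helper names `slope_degrees` and `classify` and the synthetic planar DEM are hypothetical.

```python
import numpy as np

def slope_degrees(dem, cell_size=30.0):
    """Slope in degrees from a 2-D elevation grid using finite differences."""
    dz_dy, dz_dx = np.gradient(np.asarray(dem, dtype=float), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

def classify(values, upper_bounds):
    """0-based class index; a value equal to a bound falls in the lower class."""
    return np.digitize(values, upper_bounds, right=True)

# Synthetic DEM: a plane rising 15 m per 30 m cell in the x direction
dem = np.tile(np.arange(5) * 15.0, (5, 1))
slope = slope_degrees(dem)                  # arctan(0.5), about 26.57 degrees

# Class limits from the text: 0-15, 15-30, 30-45, 45-60, >60 degrees
slope_class = classify(slope, [15, 30, 45, 60])   # 1 means the 15-30 class
```

The same `classify` helper applies to any factor with fixed class limits, such as the 1000-meter distance buffers.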
Data correlation analysis method
Landsliding is a gradual process influenced by a multitude of factors. Conditioning factors may display collinearity and strong correlation, resulting in data redundancy that degrades the accuracy of the evaluation model. It is therefore crucial to choose an appropriate method to detect correlations among factors and minimize data redundancy. Collinearity diagnostics identify severe collinearity among multiple independent variables in regression analysis. Typically, a Variance Inflation Factor (VIF) exceeding 5 or, equivalently, a tolerance (the reciprocal of the VIF) below 0.2 signifies the presence of multicollinearity [45]. Pearson correlation analysis evaluates the linear correlation between two variables: the greater the absolute value of the correlation coefficient (r), the stronger the linear relationship. Generally, an absolute value of r greater than 0.8 indicates a strong correlation between factors [46].
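The two diagnostics can be sketched as follows. This is an illustration under stated assumptions: the VIF is obtained by regressing each factor on the others with ordinary least squares, the helper name `vif_and_tolerance` is hypothetical, and the three synthetic columns merely stand in for conditioning factors (the third is built to be collinear with the first).

```python
import numpy as np

def vif_and_tolerance(X):
    """VIF and tolerance for each column of X (rows = samples, columns = factors)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vif = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + remaining factors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()            # R^2 of factor j on the others
        vif[j] = 1.0 / (1.0 - r2)
    return vif, 1.0 / vif                           # tolerance = 1 / VIF

# Synthetic stand-ins for three factors (not the study's data)
rng = np.random.default_rng(0)
f1 = rng.normal(size=500)
f2 = rng.normal(size=500)
f3 = 0.95 * f1 + 0.05 * rng.normal(size=500)        # deliberately collinear with f1
X = np.column_stack([f1, f2, f3])

vif, tol = vif_and_tolerance(X)
flagged = (vif > 5) | (tol < 0.2)                   # thresholds from the text

r = np.corrcoef(X, rowvar=False)                    # Pearson correlation matrix
strong_pairs = np.argwhere(np.triu(np.abs(r) > 0.8, k=1))   # pairs with |r| > 0.8
```

In this synthetic case the first and third columns are flagged by both diagnostics, while the independent second column is not.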
Landslide susceptibility assessment method
(1) IOE model.
The IOE model [8] is a statistical approach that objectively quantifies the significance of each conditioning factor in relation to the indicator, determining the differences among the conditioning factors and their contributions to the evaluation index. The calculation steps and formulas are as follows:
In the formula, FRij denotes the frequency ratio of landslides for each conditioning factor across various interval ranges; a and b respectively indicate the ratio of landslide points and the area ratio for the conditioning factor within these intervals; Pij signifies the probability density; S represents the count of intervals for each conditioning factor; Hi and Hi,max respectively signify the entropy value and the maximum entropy value; Ii denotes the information rate of each conditioning factor; Wi denotes the comprehensive weight of each conditioning factor.
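For reference, a commonly used formulation of the index of entropy that is consistent with the symbol definitions above is sketched here. It is an assumption rather than a reproduction of the authors' equations; in particular, the form of the weight W_i (the information rate multiplied by the mean frequency ratio of the factor) is one common choice among several.

```latex
\begin{align}
FR_{ij} &= \frac{a}{b}, \\
P_{ij} &= \frac{FR_{ij}}{\sum_{j=1}^{S} FR_{ij}}, \\
H_i &= -\sum_{j=1}^{S} P_{ij}\,\log_2 P_{ij}, \\
H_{i,\max} &= \log_2 S, \\
I_i &= \frac{H_{i,\max} - H_i}{H_{i,\max}}, \\
W_i &= I_i \times \frac{1}{S}\sum_{j=1}^{S} FR_{ij}.
\end{align}
```

Under this formulation, a factor whose landslides are spread across its classes in proportion to class area has H_i = H_{i,max}, so I_i = 0 and the factor receives zero weight.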
In the equation, LSI denotes the landslide susceptibility index.
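As an illustration of the quantities defined above, the following sketch computes the entropy-based weight of a single conditioning factor from its per-interval landslide and area shares. It follows a commonly used form of the index of entropy and rests on several assumptions that should be checked against formulas (1) to (7): the logarithm is taken to base 2, Hi,max is log2 S, Ii is (Hi,max − Hi)/Hi,max, and Pi is taken as the mean frequency ratio of the factor. The function name `ioe_weight` and the input shares are hypothetical.

```python
import math

def ioe_weight(landslide_share, area_share):
    """Entropy-based weight of one conditioning factor (illustrative form)."""
    fr = [a / b for a, b in zip(landslide_share, area_share)]  # FR_ij = a / b
    total = sum(fr)
    p = [f / total for f in fr]                                # P_ij
    h = -sum(q * math.log2(q) for q in p if q > 0)             # H_i
    h_max = math.log2(len(fr))                                 # H_i,max = log2 S
    info = (h_max - h) / h_max                                 # I_i
    mean_fr = total / len(fr)                                  # P_i (assumed: mean FR)
    return {"FR": fr, "H": h, "H_max": h_max, "I": info, "W": info * mean_fr}

# Hypothetical factor with four intervals of equal area.
uniform = ioe_weight([0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25])
skewed = ioe_weight([0.70, 0.10, 0.10, 0.10], [0.25, 0.25, 0.25, 0.25])
```

A factor whose landslides are spread in proportion to area has maximum entropy and therefore zero weight, whereas a factor that concentrates landslides in a few intervals has lower entropy and a larger weight.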
(2) MLP model.
The MLP model [47] is a type of multilayer feed forward network within artificial neural networks, capable of highly parallel processing, exhibiting excellent fault tolerance and robust adaptive and self-learning capabilities. It primarily comprises an input layer made up of a set of perceptron units, one or more hidden layers composed of computational nodes, and an output layer consisting of computational nodes. Each layer of nodes is fully interconnected with the subsequent layer, such that any neuron in the preceding layer is connected to all neurons in the following layer. Each hidden layer node incorporates an activation function, and through the application of multiple layers of activation functions, it can transform linear rules, thereby enabling the recognition of nonlinear data. Consequently, it can address many nonlinear classification problems.
(3) SVC model.
The SVC model [48] is a robust machine learning algorithm designed for classification tasks. Its fundamental concept involves identifying an optimal hyperplane that segregates samples from different classes. This hyperplane not only aims to maximize classification accuracy but also seeks to maximize the margin between classes, which is the distance from the support vectors to the hyperplane. By locating this optimal hyperplane within the feature space, SVC enhances its generalization capability for new data. It can address nonlinear classification problems by employing kernel functions to project data into a higher-dimensional space.
(4) LDA model.
The LDA model [49] is a supervised learning technique utilized in statistics and machine learning, primarily for addressing classification and dimensionality reduction tasks. It utilizes the Fisher linear discriminant criterion to maximize the between-class scatter and minimize the within-class scatter, thereby identifying the optimal projection direction. This method ensures that data from different classes are maximally separated, while data from the same class are closely clustered, thereby facilitating classification.
(5) LR model.
The LR model [50] is a predictive technique employed in statistics and machine learning to tackle classification problems. Its principle is to use a logistic function to convert the output of a linear regression into a value between 0 and 1, thereby enabling binary classification. It comprises three steps:
- (a) Employ a linear regression equation to estimate the value Z.
In the formula, β is the coefficient, and X represents the feature.
- (b) The Z value derived from the linear equation is converted into a value P ranging from 0 to 1 using the logistic function (Sigmoid function).
- (c) Classify according to the value of P. If P exceeds 0.5, predict it as the positive class; otherwise, predict it as the negative class.
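Steps (a) to (c) can be sketched in a few lines. The example assumes the usual linear form Z = β0 + β1X1 + … + βnXn and the sigmoid P = 1/(1 + e^(−Z)); the function name `logistic_predict` and the coefficient values are hypothetical.

```python
import math

def logistic_predict(features, coefficients, intercept=0.0, threshold=0.5):
    """Apply steps (a) to (c) to one sample and return (P, predicted class)."""
    # (a) linear combination of the features
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    # (b) logistic (sigmoid) function maps Z into the interval (0, 1)
    p = 1.0 / (1.0 + math.exp(-z))
    # (c) positive class only if P exceeds the threshold
    return p, int(p > threshold)

p_boundary, label_boundary = logistic_predict([0.0, 0.0], [1.2, -0.7])
p_positive, label_positive = logistic_predict([2.0, 0.5], [1.2, -0.7])
```

A sample with Z = 0 yields P = 0.5 and is assigned to the negative class, because P must exceed 0.5 to be predicted as positive.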
(6) Coupled model.
The objective IOE model is distinguished by its high stability and consistency, allowing it to holistically account for the interrelations and impacts among various indicators. The IOE models integrated with SVC, MLP, LDA, and LR optimize the non-landslide samples and thereby decrease the likelihood of landslide samples being misclassified as non-landslide samples, improving the precision of the predictive models. The process is divided into two stages. The first stage employs the landslide relic dataset as landslide samples, creates a non-landslide area by buffering the dataset by 1 km, and selects from this area a number of non-landslide samples equal to the number of landslide samples. The landslide and non-landslide samples are merged into sample set 1, which is analyzed using the SVC, MLP, LDA, and LR models individually; these are the single models. In the second stage, the landslide scar dataset is used as the landslide samples. The IOE model is employed to generate a landslide susceptibility map with five susceptibility levels, and the extremely low and low susceptibility zones of this map are taken as non-landslide areas. Non-landslide samples, equal in number to the landslide samples, are randomly selected from these areas with a minimum spacing of 500 m. The landslide and non-landslide samples are then combined to form sample set 2, which is analyzed using the SVC, MLP, LDA, and LR models; these constitute the coupled models.
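The second-stage sampling rule (random selection with a minimum spacing of 500 m) can be sketched as follows. This is only an illustration under stated assumptions: candidate points are given as projected (x, y) coordinates in metres taken from the low-susceptibility zones, and a greedy accept-or-reject pass is used to enforce the spacing. The function name `sample_non_landslides` and the candidate grid are hypothetical, and the procedure is not necessarily the GIS workflow used in this study.

```python
import math
import random

def sample_non_landslides(candidates, n_samples, min_spacing=500.0, seed=42):
    """Randomly pick up to n_samples points that are at least min_spacing apart."""
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)                      # random visiting order
    chosen = []
    for x, y in pool:
        # accept a point only if it is far enough from every accepted point
        if all(math.hypot(x - cx, y - cy) >= min_spacing for cx, cy in chosen):
            chosen.append((x, y))
            if len(chosen) == n_samples:
                break
    return chosen

# Hypothetical candidate cell centres on a 100 m grid inside a low-susceptibility zone.
grid = [(i * 100.0, j * 100.0) for i in range(50) for j in range(50)]
picked = sample_non_landslides(grid, 10)
```

If the zone is too small to hold the requested number of points at the given spacing, the function returns fewer points, which would need to be handled before building a balanced sample set.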
Validation metrics
The ROC curve is frequently employed to assess the performance of classification models. The horizontal axis of the ROC curve is 1 − specificity (the false positive rate) and the vertical axis is sensitivity (the true positive rate), where specificity and sensitivity are the proportions of correctly predicted non-landslide and landslide samples, respectively. The closer the ROC curve is to the upper left corner, the greater the area under the curve (AUC) and the better the classification performance of the model. Furthermore, the confusion matrix is commonly used in machine learning to compare classification outcomes with actual measured values. Each column corresponds to the predicted class, while each row corresponds to the true class. Acc, Precision, and F1 can be derived from the confusion matrix, where Acc denotes the proportion of correctly identified samples relative to the total samples, Precision denotes the proportion of true landslide samples among those classified as positive by the model, and F1 is the harmonic mean of precision and recall. The formulas are as follows [51]:
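For reference, the three confusion-matrix metrics can be computed as in the sketch below, which uses the standard definitions with TP, FP, FN, and TN denoting true positives, false positives, false negatives, and true negatives. The function name `classification_metrics` and the counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from the four confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + fn + tn)                 # correctly identified / total
    precision = tp / (tp + fp)                            # true landslides among predicted positives
    recall = tp / (tp + fn)                               # detected landslides among all landslides
    f1 = 2 * precision * recall / (precision + recall)    # harmonic mean of precision and recall
    return {"Acc": acc, "Precision": precision, "Recall": recall, "F1": f1}

# Hypothetical test set of 200 samples.
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=90)
```

Because F1 combines precision and recall, a model cannot obtain a high F1 by being accurate on only one of the two classes.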
Landslide susceptibility assessment workflow
(1) Methodological Approach.
The methodological flowchart for this study is presented in Fig 4.
(2) Assessment Procedure.
- (a) Construction of a landslide inventory dataset: Landslide remnants were identified through human–computer interactive visual interpretation of Google Earth satellite imagery and then integrated with historical landslide records and field survey data to establish a comprehensive landslide inventory dataset.
- (b) Identification of landslide conditioning factors: Twelve factors related to geological environment, geomorphology, meteorology and hydrology, vegetation and soil, and human engineering activities were selected to construct a landslide influencing factor system.
- (c) Assessing the correlation of landslide conditioning factors: The potential multicollinearity among landslide conditioning factors was evaluated using the variance inflation factor (VIF) and tolerance, followed by Pearson correlation analysis to assess the strength of the correlation between each pair of variables.
- (d) Constructing the modeling dataset: Datasets 1 and 2 were constructed according to the coupling method. Subsequently, each dataset was randomly split into training (70%) and testing (30%) subsets using the Python 3.11 programming language.
- (e) Model development and evaluation: The attribute values of landslide conditioning factors were extracted into datasets 1 and 2. Using the Python 3.11 programming language, the SVC, MLP, LDA, and LR models were employed to train and test the data in datasets 1 and 2 at a 7:3 ratio. Model performance was assessed using AUC, accuracy (Acc), precision, and F1 score.
- (f) Landslide susceptibility mapping and evaluation: The vector data generated in the previous step were converted to raster data using ArcGIS 10.8 software and classified into five categories using the natural breaks classification method. The classification performance and effect of each model and landslide susceptibility map were assessed using ROC curves and landslide frequency ratios to determine the most suitable model and optimal landslide susceptibility map for the study area.
Results
Multicollinearity diagnostics and Pearson correlation analysis
ArcGIS 10.8 was used to extract the attribute values of the 12 factors into the point attributes of the landslide dataset, and the attribute table was exported. Collinearity diagnostics among the factors were performed with the linear regression analysis module in SPSS 27.0 (Table 2), and Pearson correlation analysis between each pair of factors was conducted with the correlation analysis module in SPSS 27.0 (Fig 5).
In the multicollinearity analysis of the data, the tolerance values for all 12 conditioning factors exceed 0.2, and the VIF values are below 5, suggesting that there is no multicollinearity among the factors.
In the Pearson correlation analysis, the Pearson correlation coefficients between all factors are below 0.8, suggesting that there is no strong correlation between any pair of conditioning factors.
The relationship between conditioning factors and landslide relics
The “Extract Multi Values to Points” tool in ArcGIS 10.8 was used to extract the attribute values of each conditioning factor to the landslide relic data points. The number of disaster points and grid cells was then tallied for each level of each conditioning factor. Finally, the values of a, b, FRij, Pij, Hi, Hi,max, Ii, Pi, and Wi were computed in an Excel spreadsheet based on formulas (1) to (7) (Table 3).
The weight value (Wi) of the entropy index reflects the relationship between each conditioning factor and landslide relic data. A larger weight value signifies a more substantial contribution of the factor to landslides, and conversely, a smaller weight value indicates a lesser contribution. The calculation results demonstrate that in the entropy index model, the conditioning factors contributing to landslides in descending order are land use > slope > DEM > TWI > lithology > rainfall > distance to road > NDVI > aspect > distance to river > distance to fault > profile curvature. Land use, slope, and elevation significantly influence landslides, resulting in lower entropy values and higher weight values; in contrast, profile curvature and distance to fault have a minimal impact, leading to higher entropy values and lower weight values.
In terms of land use, the density of landslide debris is greatest in grasslands and farmlands, followed by shrublands and bare land. In high-altitude regions characterized by very high elevations, the proportion of farmland area is minimal, thus necessitating a focus on landslides in grasslands, shrublands, and bare land. Regarding slopes, areas with slopes of 15–30° exhibit the highest density of landslides and are most susceptible to landslides, whereas areas with slopes exceeding 60° experience virtually no landslides. In terms of elevation, within the study area, lower elevations are associated with higher landslide densities, while higher elevations are linked to lower densities.
The highest density of landslide debris in the study area is found between 3153 and 4000 m, largely due to the concentration of human activities in areas below 4000 m in elevation, which also align with regions where rivers and roads are densely located. The three distance parameters (distance to road, distance to fault, and distance to river) are inversely related to landslide debris density: the closer the distance, the greater the propensity for landslides, with the highest incidence occurring within 1 km. The density of landslide debris is higher on slopes facing southeast, south, and southwest, as the study area is situated in the Northern Hemisphere, where south-facing slopes are sunny and more susceptible to landslides. Regarding rock strength, areas with weak rock, moderately weak rock, and moderately hard rock are more prone to landslides than areas with hard rock.
Model evaluation
Accuracy analysis of single and coupled models.
We employed the scikit-learn library in the Python 3.11 programming language to construct the machine learning models. The datasets were divided into training and testing sets in a 7:3 ratio, and the SVC, MLP, LDA, and LR models were trained and tested on each dataset. The precision of the models was validated by plotting ROC curves, as shown in Fig 6. With the default parameters, some models may fail to converge. Following extensive parameter tuning, the primary settings for each model were established as follows: for the SVC model, probability = True and random_state = 42; for the MLP model, random_state = 42, max_iter = 1000, early_stopping = True, and validation_fraction = 0.1; for the LR model, max_iter = 1000 and random_state = 42; the LDA model used the default values. To avoid bias caused by imbalanced data distribution and to enhance the precision of model training, we employed the StratifiedKFold function for 5-fold cross-validation with the following parameters: n_splits = 5, shuffle = True, and random_state = 42. During model training, we calculated the 95% confidence intervals of the AUC, Accuracy, Precision, and F1 values to evaluate the stability and reliability of the models, as presented in Table 4.
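The settings listed above correspond to scikit-learn constructor arguments, and a minimal sketch of the cross-validated comparison might look as follows. The data here are a synthetic stand-in generated with `make_classification` (12 features to mirror the 12 conditioning factors); the real sample sets, any preprocessing, and the confidence-interval calculation are not reproduced, and the variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for a landslide / non-landslide table with 12 factors (assumption).
X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                           n_clusters_per_class=1, class_sep=1.5, random_state=42)

# Model settings as reported in the text; everything else is left at its default.
models = {
    "SVC": SVC(probability=True, random_state=42),
    "MLP": MLPClassifier(random_state=42, max_iter=1000,
                         early_stopping=True, validation_fraction=0.1),
    "LDA": LinearDiscriminantAnalysis(),
    "LR": LogisticRegression(max_iter=1000, random_state=42),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

auc_scores = {}
for name, model in models.items():
    fold_auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    auc_scores[name] = (fold_auc.mean(), fold_auc.std())  # mean and spread over 5 folds
```

Stratified folds keep the landslide to non-landslide ratio the same in every fold, so the fold-to-fold spread reflects model stability rather than changes in class balance.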
As depicted in Fig 6, the AUC, Accuracy, Precision, and F1 values of the IOE and machine learning coupled models are significantly higher than those of the individual models. The AUC values increased from a range of 0.7822 to 0.8508 to a range of 0.9378 to 0.9747, the precision increased from a range of 0.7176–0.7682 to a range of 0.8942–0.9836, accuracy increased from a range of 0.722–0.777 to a range of 0.8848–0.9385, and F1 scores increased from a range of 0.7248–0.7807 to a range of 0.88–0.9343. Among the individual models, SVC and MLP exhibited superior performance, with the SVC model performing the best. Among the coupled models, all four demonstrated excellent performance. The performance ranking from best to worst is IOE-MLP, IOE-SVC, IOE-LR, and IOE-LDA. Notably, the IOE-MLP and IOE-SVC models significantly outperformed the IOE-LR and IOE-LDA models.
As shown in Table 4, the relatively narrow confidence intervals of all models indicate good stability and reliability. Compared to individual models, the coupled models exhibit narrower confidence intervals, indicating superior stability and reliability.
Comparative analysis of model performance.
Given that individual models perform worse than coupled models, only the performance of coupled models is compared and analyzed in this section. Using the Python 3.11 programming language and the SVC, MLP, LDA, and LR models with dataset 2, we conducted a landslide susceptibility analysis for Luolong County and generated a landslide susceptibility map. Subsequently, the susceptibility map was classified into five levels—extremely high, high, moderate, low, and very low susceptibility—using the natural breaks classification method in ArcGIS 10.8, as shown in Fig 7. The original landslide points were overlaid on the susceptibility map to statistically analyze the number of landslide points and grid cells at each level, and to calculate the landslide density and landslide frequency ratio (the proportion of landslide points at each level divided by the proportion of the area at each level), as presented in Table 5.
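The landslide frequency ratio defined above (share of landslide points in a level divided by the share of area in that level) can be computed as in this sketch. The function name `frequency_ratios` and the counts are hypothetical and do not reproduce Table 5.

```python
def frequency_ratios(landslide_counts, cell_counts):
    """Frequency ratio per susceptibility level: point share divided by area share."""
    total_points = sum(landslide_counts.values())
    total_cells = sum(cell_counts.values())
    return {
        level: (landslide_counts[level] / total_points) / (cell_counts[level] / total_cells)
        for level in landslide_counts
    }

# Hypothetical counts of landslide points and grid cells per level.
points = {"very low": 10, "low": 20, "moderate": 40, "high": 80, "extremely high": 250}
cells = {"very low": 400, "low": 300, "moderate": 150, "high": 100, "extremely high": 50}
fr = frequency_ratios(points, cells)
```

A ratio above 1 means that a level holds more landslides than its area share would suggest, so a well-behaved susceptibility map shows ratios that rise steadily from the very low to the extremely high level.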
As indicated by the results in Table 5, the landslide frequency ratios of the four coupled models decrease progressively from the extremely high susceptibility zones to the very low susceptibility zones. This suggests that the classification of landslide susceptibility by each coupled model is rational. In the extremely high susceptibility zones, the ranking of landslide frequency ratios is IOE-MLP > IOE-SVC > IOE-LR > IOE-LDA, which indicates that the classification effect of the four models, from best to worst, is IOE-MLP, IOE-SVC, IOE-LR, and IOE-LDA. Overall, the coupled models outperform the individual models, the IOE-MLP and IOE-SVC models perform better than the IOE-LR and IOE-LDA models, and the IOE-MLP model outperforms the IOE-SVC model.
Therefore, considering both model performance and classification effectiveness, the IOE-MLP model is the best.
Spatial distribution of landslide susceptibility
As depicted in Fig 7, the spatial distributions of the evaluation results for the four coupled models (IOE-MLP, IOE-LR, IOE-SVC, and IOE-LDA) are highly similar. The very high susceptibility zones for landslides in Luolong County are predominantly aligned with river and road networks, particularly extending from the northwest to the southeast of the county, with additional concentrations observed in the western and southern regions. Several factors contribute to this spatial pattern. Firstly, the Nu River, which is the second largest river in Tibet, flows through the county from northwest to southeast. Additionally, numerous rivers such as Zhuomalongcuoqu and Daqu course through the western part, while Tongcuoqu, Dongcuoqu, Kangyucuo, and Baqu traverse the southern part. These riverine areas possess a high topographic wetness index, which facilitates landslide development. Secondly, Luolong County features prominently incised canyon topography, where rivers have deeply incised the landscape. In these areas, the rocks are soft, the elevation is low, and there is a prevalence of faults and well-developed fractures, all of which facilitate the formation of landslides. Thirdly, the average elevation of Luolong County is approximately 3859 m, which is quite high. Human settlements are typically located in the relatively lower-elevation river valleys, and most roads within the county are constructed along these valley areas. The frequent human engineering activities in these regions contribute to the occurrence of landslides. Fourthly, the southwestern part of Luolong County is dominated by the high-altitude Nyenchen Tanglha Mountains, while the northern part is characterized by the Tengri nor Mountain. These areas are primarily composed of high-altitude glaciers, where landslides are relatively rare. In contrast, the central region, which is predominantly grassland, is a zone where landslides are highly developed.
Conclusion and discussion
Discussion
Characteristics of landslide conditioning factors in Luolong county.
This study utilized imagery from the Google Earth platform, employing a human-computer interactive visual interpretation method to identify historical landslides in Luolong County, revealing a total of 2517 landslide scars of varying sizes. Landslides in this region are primarily influenced by geological settings, geomorphological background, and human engineering activities, representing the combined effect of multiple factors. The relative importance of the 12 factors investigated in this study is ranked as follows: land use type > slope > DEM > TWI > lithology > rainfall > distance to roads > NDVI > aspect > distance to rivers > distance to faults > profile curvature. Among these, land use type, slope, and elevation are identified as the dominant factors. Firstly, land use is one of the main causes of the high incidence of landslides in Luolong County. Due to the high altitude and cold climate of Luolong County, large trees with extensive root systems cannot thrive, and only a limited variety of grasses with shallow roots can adapt to the local environment. This results in sparse vegetation and a relatively fragile ecosystem [52]. Consequently, the land cover types in Luolong County are distributed as follows: grasslands account for 79.25%, forests for 10.10%, glaciers for 2.70%, bare land for 6.12%, and other types for 1.83% of the total area. Secondly, slope is also one of the significant factors contributing to the susceptibility of landslides in Luolong County. The elevation in Luolong County ranges from 3153 m to 5476 m, a difference of over 2000 m, and areas above 4000 m account for 85% of the total area. The greater the slope gradient and elevation difference, the more efficiently the potential energy of a landslide mass is converted into kinetic energy during movement, thereby increasing the velocity of the landslide [53].
Within the study area, regions with slopes between 15° and 30° comprise 58.66% of the area, while those between 30° and 45° account for 23.85%. This range of slopes is considered the most sensitive for landslides. Consequently, the significant elevation differences and the large proportion of areas prone to landslides are key reasons for the frequent occurrence of landslides, including high-velocity and long-runout landslides, in Luolong County. Thirdly, elevation is one of the primary factors promoting the development of landslides in Luolong County. The impact of elevation on landslides is not only evident in its influence on the stability of slopes through topography and geomorphology [54], but it also reflects the distribution characteristics of factors such as land use, distance to rivers and roads, and topographic wetness index.
Characteristics of landslide spatial distribution in Luolong county.
Spatial analysis reveals that landslides in Luolong County are extensively and densely distributed. The areas of very high and high susceptibility are predominantly located in the central part of the county, characterized by relatively low elevations, proximity to roads and rivers, and high soil moisture content. The susceptibility and density of landslides decrease with increasing distance from faults, rivers, and roads. Slopes facing southwest, south, and southeast are more prone to landslides, which is consistent with the findings of previous researchers [55,56]. High susceptibility areas are characterized by elevations below 4000 m, steep slopes, dense faulting, and weak to moderately weak lithology. They experience moderate rainfall, have a high density of human engineering activities, and are predominantly covered by grasslands. Luolong County, characterized by high elevation and significant topographic relief, is traversed by the Nu River and its tributaries, resulting in the formation of typical incised valley landforms [57]. The weak rock layers in Luolong County cover 73.61% of the total area and exhibit low strength and high deformability. They primarily consist of sandstone, siltstone, shale, conglomerate, and slate, which are susceptible to longitudinal and lateral erosion by rivers. This results in deep surface incision, significant elevation changes, and a high topographic wetness index. These regions often experience more intense tectonic activity, with frequent faults and earthquakes, leading to rock fragmentation and facilitating the development of landslides. Additionally, it is commonly accepted that elevations above 4000 m are unsuitable for long-term human habitation. With an average elevation of 3859 m in Luolong County, people typically reside along rivers, constructing their homes, roads, and living facilities on relatively lower-elevation terraces and gentle valley areas.
Due to the inherent advantages of rivers with significant elevation drops for hydroelectric power generation, hydropower stations are also constructed in valley areas. Human engineering activities increase the susceptibility of these areas to landslides, resulting in high landslide susceptibility zones being located close to and along rivers. Additionally, these areas are also near roads and have relatively lower elevations. The very low susceptibility areas are primarily located in the Tengnate Mountain Range in the southwestern part of Luolong County, the Nyenchen Tanglha Mountain Range in the northern part, and some areas with relatively low slopes in the central region. These areas are characterized by elevations above 4500 m, perennial snow and ice cover, minimal human presence, and predominantly hard and moderately hard rock types, which are not conducive to the development of landslides.
Comparison with other studies.
Situated in a high-altitude and cold region, the southeastern Tibetan Plateau is characterized by high elevation, thin air, and sparse population, rendering geological surveys extremely challenging and the acquisition of samples highly difficult. Many researchers have focused on using algorithms and mapping units to enhance the accuracy of landslide susceptibility assessments. For instance, Mao Yimin et al. integrated SVM, RF, DT (Decision Tree), LR, FL (Fuzzy Logic), and TOPSIS to improve landslide susceptibility assessments in the Urmia Lake basin [13]. Yu Lanbing et al. applied LR, RF, SVC, and DL models to evaluate landslide-prone areas in the mid-to-upper reaches of the Three Gorges Reservoir region in Chongqing, China, achieving satisfactory results [45]. However, these studies often overlook the completeness and quality of samples. Incomplete samples can lead to insufficient information for training machine learning models, while poor-quality samples may cause misclassification, thereby affecting the final prediction outcomes. This study established a relatively comprehensive landslide database for Luolong County using remote sensing interpretation techniques and coupled the IOE model with machine learning models, optimizing the non-landslide samples during the coupling process. On the one hand, this approach enhanced the completeness of landslide samples and the quality of non-landslide samples, thereby improving prediction accuracy. On the other hand, the statistical learning model retained the spatial distribution characteristics of landslides, while the coupled model enhanced both prediction accuracy and precision.
Located in the Mediterranean-Himalayan seismic belt, Luolong County has landslide influencing factors that differ from those in inland China. Inland China is predominantly characterized by rainfall-induced landslides, with rainfall, elevation, and slope being the dominant factors. Areas with high susceptibility are typically found in mountainous and hilly regions with high rainfall, elevation, and slope [17,58,59]. In Luolong County, rainfall is not the main trigger of landslides. Among the 12 factors investigated in this study, the contribution of rainfall ranks sixth. The northern and northwestern regions, which receive the highest rainfall, are categorized as low and very low susceptibility zones. Conversely, some eastern areas with low rainfall are classified as high susceptibility zones. Rainfall increases soil moisture content, which can destabilize rock masses and promote landslide occurrence. However, it also supports the growth of water-demanding arboreal vegetation. The extensive root systems of these trees can stabilize slopes and reduce landslide likelihood. As a result, areas with dense forests in the northern and northwestern regions exhibit lower landslide susceptibility. Landslides in Luolong County are more likely to be earthquake-induced. The southeastern Tibetan Plateau, located at the collision zone between the Eurasian and Indian plates, is one of the most tectonically active and geologically complex regions globally [60,61]. Seismic factors significantly influence landslides in the southeastern Tibetan Plateau [62]. Over 50% of large- to mega-scale landslides along the Tibetan Plateau margin are triggered by paleo-earthquakes or historical earthquakes [63]. The combined effect of heavy rainfall and strong earthquakes further increases the likelihood of landslide occurrence [64]. Faults are the primary sources of seismic activity, and fault density often indirectly reflects the frequency of earthquakes.
The eastern region, with low rainfall, dense faults, low elevation, and intensive human engineering activities, exhibits high landslide susceptibility.
Future research directions
Machine learning is data-driven and relies on large-scale training samples to support model training. In machine learning, small-sample issues are typically addressed through oversampling, which can, however, strengthen the information of a single class and lead to model overfitting. Sample imbalance primarily manifests as discrepancies between sample quantities and training data [65]. This issue can be addressed through oversampling or undersampling, which respectively amplify or discard information from certain classes and, if not handled properly, can lead to loss of information and worsen the imbalance. This study employs the IOE model to optimize the sampling of non-landslide samples; however, it does not address the issue of sample imbalance. In recent years, researchers have explored various strategies to address sample imbalance. For instance, Yang Yajie et al. employed a sampling method based on Frequency Ratio (FR) and SBAS-InSAR interpretation results to improve sample quality [66]. Wang Lixia et al. utilized a Faster R-CNN landslide target detection method with multi-source imbalanced samples to address sample imbalance [65]. Liu Mengmeng et al. optimized samples using a fuzzy C-means method, thereby enhancing the training accuracy and reliability of landslide susceptibility models [67]. Wu Hongyang et al. effectively resolved sample imbalance in their study area and improved landslide prediction accuracy by optimizing samples using a hybrid sampling method coupled with semi-supervised classification and deep neural network models [68]. Given the diverse types and characteristics of samples, optimization methods vary accordingly. Future research can explore additional approaches to optimize samples, address small-sample and imbalance issues, and thereby enhance the prediction accuracy of machine learning models.
Landslide occurrence is closely related to the mechanical properties of rock masses, such as strength, deformation characteristics, and fracture development. Traditional landslide susceptibility assessments predominantly rely on empirical or physically-based mechanical models, which often face challenges such as difficulties in parameter acquisition and computational complexity in complex geological conditions. Yang Liu utilized rainfall data, topographic parameters, and hydrological parameters to determine the slope safety factor under rainfall conditions, thereby selecting safer non-landslide samples. This approach, which couples the TRIGRS physical model with the Random Forest algorithm, offers a novel perspective for predicting the susceptibility of shallow rainfall-induced landslides [69]. Integrating physically-based mechanical models with machine learning models to develop landslide susceptibility assessment methods that balance mechanical mechanisms and interpretability represents a promising direction for future research.
High-altitude cold regions are characterized by extensive deep permafrost layers, with some areas covered by snow and ice. Repeated freeze-thaw cycles lead to the destruction of soil structure and the degradation of mechanical properties, thereby triggering landslides. In recent years, some researchers have focused on this process. For instance, Guo Yanchen employed a Random Forest model to investigate the impact of freeze-thaw cycles on landslides in the Nanqian area of Qinghai, China, achieving satisfactory results [70]. With the rise in global temperatures, the accelerated melting of snow and ice, the degradation of permafrost, and the expansion of seasonal frozen ground have made the study of the relationship between freeze-thaw cycles and landslides in high-altitude cold regions an urgent research direction.
Conclusion
The southeastern Tibet region is not only a crucial passage for the Sichuan-Tibet and Yunnan-Tibet railways but also a significant area for major water conservancy projects in China. It is part of the “Asian Water Tower” and the Three Rivers Source region and serves as a natural museum of geological disasters in China. This study takes Luolong County in southeastern Tibet as the research area, employing an optimized sampling strategy for non-landslide samples. It integrates the IOE model from statistics with various machine learning models to conduct landslide susceptibility mapping. The precision and effectiveness of each model are comprehensively assessed using AUC, Acc, Precision, and F1 scores. The conclusions drawn are as follows:
- (1) Optimizing non-landslide samples using the IOE model reduces the likelihood of misclassification during the selection of non-landslide samples for machine learning models. Consequently, the IOE-coupled machine learning models (IOE-SVC, IOE-MLP, IOE-LDA, and IOE-LR) not only preserve the spatial characteristics of landslide evaluation inherent in statistical methods but also enhance prediction accuracy. The AUC, Acc, Precision, and F1 values of the four coupled models trained on optimized non-landslide samples are all higher than those of the single models before optimization. Specifically, the AUC values increased from 0.7822–0.8508 to 0.9378–0.9747, precision from 0.7176–0.7682 to 0.8942–0.9836, accuracy from 0.722–0.777 to 0.8848–0.9385, and F1 scores from 0.7248–0.7807 to 0.88–0.9343. Among the four coupled models, the IOE-MLP model performed best overall: it provided a rational classification of landslide susceptibility zones, achieved the highest AUC value, and had the highest landslide frequency ratio in the extremely high susceptibility zone.
- (2) In terms of conditioning factors, landslides in Luolong County are influenced by a combination of geological environment, topography and geomorphology, and human engineering activities. They result from the interaction of multiple factors, with land use, elevation, and slope being the primary controlling factors.
- (3) At the spatial scale, the areas of very high susceptibility to landslides in Luolong County are distributed around the Nu River and its tributaries, extending from the northwest to the southeast. This distribution aligns with the orientation of the Nu River, faults, and the trends of hard and weak rocks. The areas of very low susceptibility are predominantly found in the high-altitude, snow-covered regions surrounding the county.
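The four scores compared in conclusion (1) can be computed from a set of labelled test samples as sketched below. This is a minimal, dependency-free Python illustration rather than the evaluation code used in the study; the function names, the toy labels and scores, and the 0.5 decision threshold are assumptions.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = landslide)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def auc_score(y_true, scores):
    """AUC as the probability that a random landslide sample scores higher
    than a random non-landslide sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: three landslide and three non-landslide test samples.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]        # predicted susceptibility
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = auc_score(y_true, scores)
```

AUC is threshold-independent, whereas accuracy, precision, and F1 depend on the cutoff applied to the predicted susceptibility, which is why the two kinds of score are reported side by side.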
In summary, optimizing non-landslide samples using the IOE model and subsequently employing coupled machine learning models for landslide susceptibility assessment yields better results than the unoptimized single models. Attention should focus on the impact of key conditioning factors such as land use, elevation, and slope, particularly in high mountain gorge areas that are close to rivers and roads and have dense human engineering activities. In these areas of very high landslide susceptibility, it is crucial to enhance community-based prevention and control efforts, invest in advanced prediction and early warning hardware, and improve the capacity to respond to landslide disasters.
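For readers who want to see how an entropy-based weight reflects data dispersion, the sketch below follows one common formulation of the index of entropy for a single conditioning factor. The function name, the example class shares, and the choice of the mean class density as the multiplier in the final weight are assumptions; published variants differ in that last step.

```python
import math


def ioe_weight(area_share, landslide_share):
    """Return (information coefficient I_j, weight W_j) for one factor.

    area_share[i]      : fraction of the study area in class i
    landslide_share[i] : fraction of landslides falling in class i
    """
    # Landslide density of each class (ratio of landslide share to area share).
    density = [b / a for a, b in zip(area_share, landslide_share)]
    total = sum(density)
    prob = [d / total for d in density]  # normalised densities
    # Shannon entropy of the normalised densities (classes with p = 0 add 0).
    entropy = -sum(p * math.log2(p) for p in prob if p > 0)
    max_entropy = math.log2(len(prob))
    info = (max_entropy - entropy) / max_entropy
    # Assumed variant: weight = information coefficient x mean class density.
    weight = info * (total / len(density))
    return info, weight


# Landslides spread evenly across classes: the factor carries no information.
uniform = ioe_weight([0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25])

# Landslides concentrated in one class: the factor is highly informative.
concentrated = ioe_weight([0.5, 0.5], [1.0, 0.0])
```

A factor whose classes all have the same landslide density receives a weight of zero, while a factor that concentrates landslides in a few classes receives a large weight, which matches the idea of weighting factors by the dispersion of the data.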
Acknowledgments
This research was carried out under the professional guidance of Professor Gao Mingxing from Xinjiang University and Professor Xu Chong from the National Institute of Natural Hazards, Ministry of Emergency Management of China.
References
- 1.. Wang JJ. Research on Deep Learning Methods for Identifying Potential Landslides and Assessing Susceptibility in Luding County. Ph.D, China University of Geosciences. 2023. https://doi.org/10.27492/d.cnki.gzdzu.2023.000233
- 2.. Yin YP, Gao SH. Research on high-altitude and long-runout rockslides: review and prospects. The Chinese J Geological Hazard Control. 2024;35: 1–18.
- 3.. Zhang B. Research on Safety Risk Management of Remote Maneuver on the Plateau. M.D, National University of Defense Technology. 2021. https://doi.org/10.27052/d.cnki.gzjgu.2018.001102
- 4.. Lu Y, Xu H, Wang C, Yan G, Huo Z, Peng Z, et al. A Novel strategy coupling optimised sampling with heterogeneous ensemble machine-learning to predict landslide susceptibility. Remote Sensing. 2024;16(19):3663.
- 5.. Xu C, Dai FC, Yao X, Chen J, Tu XB, Sun Y, et al. GIS-based landslide susceptibility assessment using analytical hierarchy process in Wenchuan earthquake region. Chinese J Rock Mechanics Eng. 2009;28: 3978–3985.
- 6.. Nanehkaran Y, Mao Y, Azarafza M, Kockar M, Zhu H-H. Fuzzy-based multiple decision method for landslide susceptibility and hazard assessment: a case study of Tabriz, Iran. Geomechanics Eng. 2021;24: 407–418.
- 7.. Sonker I, Tripathi JN, . Remote sensing and GIS-based landslide susceptibility mapping using frequency ratio method in Sikkim Himalaya. Quaternary Sci Adv. 2022;8:100067.
- 8.. Barman J, Ali SS, Biswas B, Das J. Application of index of entropy and Geospatial techniques for landslide prediction in Lunglei district, Mizoram, India. Natural Hazards Res. 2023;3(3):508–21.
- 9.. Li M, Wang H, Chen J, Zheng K. Assessing landslide susceptibility based on the random forest model and multi-source heterogeneous data. Ecol Indic. 2024;158:111600.
- 10.. Nanehkaran YA, Licai Z, Chen J, Azarafza M, Yimin M. Application of artificial neural networks and geographic information system to provide hazard susceptibility maps for rockfall failures. Environ Earth Sci. 2022;81(19).
- 11.. Nanehkaran YA, Licai Z, Chengyong J, Chen J, Anwar S, Azarafza M, et al. Comparative analysis for slope stability by using machine learning methods. Appl Sci. 2023;13(3):1555.
- 12.. Kumar D, Thakur M, Dubey CS, Shukla DP. Landslide susceptibility mapping & prediction using support vector machine for Mandakini River Basin, Garhwal Himalaya, India. Geomorphology. 2017;295:115–25.
- 13.. Mao Y, Li Y, Teng F, Sabonchi AKS, Azarafza M, Zhang M. Utilizing hybrid machine learning and soft computing techniques for landslide susceptibility mapping in a drainage basin. Water. 2024;16(3):380.
- 14.. Ramos-Cañón AM, Prada-Sarmiento LF, Trujillo-Vela MG, Macías JP, Santos-R AC. Linear discriminant analysis to describe the relationship between rainfall and landslides in Bogotá, Colombia. Landslides. 2015;13(4):671–81.
- 15.. Cemiloglu A, Zhu L, Mohammednour AB, Azarafza M, Nanehkaran YA. Landslide susceptibility assessment for Maragheh County, Iran, using the logistic regression algorithm. Land. 2023;12(7):1397.
- 16.. Wang QS, Xiong JN, Cheng WM, Cui XJ, Pang Q, Liu J, et al. Landslide susceptibility mapping methods coupling with statistical methods, machine learning models and clustering algorithms. J Geo-Info Sci. 2024;26: 620–637.
- 17.. Cao WG, Pan D, Xu ZJ, Zhang WP, Ren Y, Nan T. Landslide disaster vulnerability mapping study in Henan Province: comparison of different machine learning models. Bulletin of Geological Sci Technol.: 1–11.
- 18.. Huang F, Xiong H, Jiang S-H, Yao C, Fan X, Catani F, et al. Modelling landslide susceptibility prediction: A review and construction of semi-supervised imbalanced theory. Earth-Sci Rev. 2024;250:104700.
- 19.. Sun D, Gu Q, Wen H, Xu J, Zhang Y, Shi S, et al. Assessment of landslide susceptibility along mountain highways based on different machine learning algorithms and mapping units by hybrid factors screening and sample optimization. Gondwana Res. 2023;123:89–106.
- 20.. Liao M, Wen H, Yang L. Identifying the essential conditioning factors of landslide susceptibility models under different grid resolutions using hybrid machine learning: A case of Wushan and Wuxi counties, China. CATENA. 2022;217:106428.
- 21.. Wen H, Liu L, Zhang J, Hu J, Huang X. A hybrid machine learning model for landslide-oriented risk assessment of long-distance pipelines. J Environ Manage. 2023;342:118177. pmid:37210819
- 22.. Sun D, Xu J, Wen H, Wang D. Assessment of landslide susceptibility mapping based on Bayesian hyperparameter optimization: A comparison between logistic regression and random forest. Eng Geol. 2021;281:105972.
- 23.. Cui Y, Chen C, Zhu L, Xu C, Xu M, Zheng J. Spatial distribution law of landslides and landslide susceptibility assessment in the eastern Himalayan syntaxis region. QJEGH. 2024;57(3).
- 24.. Zhou C, Yin K, Cao Y, Ahmed B, Li Y, Catani F, et al. Landslide susceptibility modeling applying machine learning methods: a case study from longju in the three gorges reservoir area, China. Comp GeoScis. 2018;112: 23–37.
- 25.. Li M, Jiang W, Dong J, Jin S-F, Zhang C, Niu R. Evaluation of landslide hazards susceptibility based on machine learning: taking the Three Gorges Reservoir area as an example. South China Geol. 2023;39: 413–427.
- 26.. Greco R, Sorriso-Valvo M. Influence of management of variables, sampling zones and land units on LR analysis for landslide spatial prevision. Nat Hazards Earth Syst Sci. 2013;13(9):2209–21.
- 27.. Kavzoglu T, Sahin EK, Colkesen I. Landslide susceptibility mapping using GIS-based multi-criteria decision analysis, support vector machines, and logistic regression. Landslides. 2013;11(3):425–39.
- 28.. Wang C, Lin Q, Wang L, Jiang T, Su B, Wang Y, et al. The influences of the spatial extent selection for non-landslide samples on statistical-based landslide susceptibility modelling: a case study of Anhui Province in China. Nat Hazards. 2022;112(3):1967–88.
- 29.. Miao Y, Zhu A, Yang L, Bai S, Zeng C. A new method of pseudo absence data generation in landslide susceptibility mapping. Geography Geo-Inform Sci. 2016;32: 61–67+127.
- 30.. Yulong C, Lulu Z, Min X, Haibo M. Optimizing tses method based on the environmental factors to select negative samples and its application in landslide susceptibility evaluation. Bulletin of Geological Sci Technol. 2024;43: 192–199.
- 31.. Sun D, Wu X, Wen H, Gu Q. A LightGBM-based landslide susceptibility model considering the uncertainty of non-landslide samples. Geomatics, Natural Hazards Risk. 2023;14(1).
- 32.. Zhu Y, Sun D, Wen H, Zhang Q, Ji Q, Li C, et al. Considering the effect of non-landslide sample selection on landslide susceptibility assessment. Geomatics, Natural Hazards Risk. 2024;15(1).
- 33.. Liu J, Ling E, Xu S, Liu M, Wang Y, Zhang F, et al. Multi-kernel support vector machine considering sample optimization selection for analysis and evaluation of landslide disaster susceptibility. Acta Geodaetica et Cartographica Sinica. 2022;51: 2034–2045.
- 34.. Wang Y, Cao Y, Xu F, Zhou C, Yu L, Wu L, et al. Reservoir landslide susceptibility prediction considering non-landslide sampling and ensemble machine learning methods. Earth Sci. 2024;49: 1619–1635.
- 35.. Gesang DZ. Soil nutrient status analysis and soil fertility comprehensive evaluation in luolong county of Changdu city in Tibet. Chinese Agricultural Sci Bulletin. 2022;38: 30–34.
- 36.. He XX. Study on the Disaster Mechanism of Debris Flow Caused by Dongcuoqu Lake Break in Luolong County, Tibet. M.D, Southwest Jiaotong University. 2020. https://doi.org/10.27414/d.cnki.gxnju.2020.000862
- 37.. Zhang S, Xu C, Meng Z, Li T, Li C, Huang Y, et al. Establishing a landslide traces inventory for the Baota District, Yan’an City, China, using high-resolution satellite images. Land. 2024;13(10):1580.
- 38.. Huang F, Tao S, Chang Z, Huang J, Fan X, Jiang S-H, et al. Efficient and automatic extraction of slope units based on multi-scale segmentation method for landslide assessments. Landslides. 2021;18(11):3715–31.
- 39.. Sun J, Shao X, Feng L, Xu C, Huang Y, Yang W. An essential update on the inventory of landslides triggered by the Jiuzhaigou Mw6.5 earthquake in China on 8 August 2017, with their spatial distribution analyses. Heliyon. 2024;10(2):e24787. pmid:38312686
- 40.. Xu C, Xu XW, Wu XY, Dai FC, Yao X, Yao Q. Detailed catalog of landslides triggered by the 2008 wenchuan earthquake and statistical analyses of their spatial distribution. J Eng Geol. 2013;21: 25–44.
- 41.. Fiorucci F, Ardizzone F, Mondini AC, Viero A, Guzzetti F. Visual interpretation of stereoscopic NDVI satellite images to map rainfall-induced landslides. Landslides. 2018;16(1):165–74.
- 42.. Dai FC, Xu C, Yao X, Xu L, Tu XB, Gong QM. Spatial distribution of landslides triggered by the 2008 Ms 8.0 Wenchuan earthquake, China. J Asian Earth Scis. 2011;40(4):883–95.
- 43.. Cui Y, Yang W, Xu C, Wu S. Distribution of ancient landslides and landslide hazard assessment in the Western Himalayan Syntaxis area. Front Earth Sci. 2023;11.
- 44.. Wang H. Selection and Evaluation of Integration Model for Landslide Geological Hazard Susceptibility in Northern Guangxi. M.D, Guilin University of Technology. 2023. https://doi.org/10.27050/d.cnki.gglgc.2023.000056
- 45.. Yu L, Wang Y, Pradhan B. Enhancing landslide susceptibility mapping incorporating landslide typology via stacking ensemble machine learning in Three Gorges Reservoir, China. GeoSci Frontiers. 2024;15(4):101802.
- 46.. Liu Y, Meng Z, Zhu L, Hu D, He H. Optimizing the sample selection of machine learning models for landslide susceptibility prediction using information value models in the Dabie mountain Area of Anhui, China. Sustainability. 2023;15(3):1971.
- 47.. Alkhasawneh MS, Ngah UK, Tay LT, Mat Isa NA, Al-batah MS. Determination of important topographic factors for landslide mapping analysis using MLP network. ScientificWorld J. 2013;2013:415023. pmid:24453846
- 48.. Chang K-T, Merghadi A, Yunus AP, Pham BT, Dou J. Evaluating scale effects of topographic variables in landslide susceptibility models using GIS-based machine learning techniques. Sci Rep. 2019;9(1):12296. pmid:31444375
- 49.. Zhang S, Yu P. Seismic landslide susceptibility assessment based on ADASYN-LDA model. IOP Conf Ser: Earth Environ Sci. 2020;525(1):012087.
- 50.. Hemasinghe H, Rangali RSS, Deshapriya NL, Samarakoon L. Landslide susceptibility mapping using logistic regression model (a case study in Badulla District, Sri Lanka). Procedia Eng. 2018;212:1046–53.
- 51.. Huang W, Ding M, Li Z, Yu J, Ge D, Liu Q, et al. Landslide susceptibility mapping and dynamic response along the Sichuan-Tibet transportation corridor using deep learning algorithms. CATENA. 2023;222:106866.
- 52.. Jiang W, Jin G, Kong W, Wang Z, Yang A. Research on vegetation characteristics and ecological restoration of engineering disturbance areas in Alpine and High-Altitude Areas. J Gansu Scis. 2024;36: 31–37.
- 53.. Zou Y, Qi S, Guo S, Zheng B, Zhan Z, He N, et al. Factors controlling the spatial distribution of coseismic landslides triggered by the Mw 6.1 Ludian Earthquake in China. Eng Geol. 2022;296.
- 54.. Cheng Y. Study on Regional Distribution Discipline of Landslides and Prevention Programme. PhD Thesis, Southwest Jiaotong University. 2013.
- 55.. Hu L, Hu YQ, Sun P, Zhang Q, Zhou Q, Chen J. A quantitative analysis of disaster threshold and landslide risk of rainfall-type landslide in Southeast Tibet. J Catastrophology. 2021;36: 194–199.
- 56.. Wang Y, Jin JL, Yuan RM. Analysis on spatial distribution and influencing factors of geological disasters in Southeast Tibet. J Seismological Res. 2019;42.
- 57.. Guo JN, Shan KQ, Cheng J. Study on landslide development regularity in Southeast Tibet. Resources & Industries. 2009;11: 132–139.
- 58.. Huang WJ, Li JH, Liu ZY, Hu XM, Huang HB, Li WK. Spatial susceptibility analysis of landslide based on pblc algorithm. Acta Scientiarum Naturalium Universitatis Sunyatseni. 2023;62: 54–64.
- 59.. Zhao P, Zhao SY, Sun Y, Ruan XD, Wang N, Zhang SH. Landslide susceptibility assessment in southern anhui province based on slope units and semantic segmentation. Chinese J Geol. 2024;59: 562–574.
- 60.. Zhang LP, Shao ZG, Yan R. Study on the characteristics of seismic activity in Southeastern Tibet and surrounding areas. Earthquake. 2011;31: 9–18.
- 61.. Wang FW, Chen Y, Liu WC, Gao J. Characteristics and challenges to dynamics of long-runout landslides with high-altitude in Southeast Tibet. J Eng Geol. 2022;30: 1831–1841.
- 62.. Du P, Chen NS, Wu KN, Li Z, Zhang YYL. Evaluation of landslide susceptibility in Southeast Tibet based on a random forest model. J Chengdu University of Technol (Sci & Technol Edition). 2024;51: 328–344.
- 63.. Zhang YS. Characteristics of seismic geological hazards and progress in monitoring and prevention in China. City Disaster Reduction. 2018; 9–18.
- 64.. Li H, Liu S, Hu KH. Climate Extremes in a typical glacier-related debris flow watershed of southeast Tibet during 1979-2018. Plateau Meteorology. 2023;42: 1518–1528.
- 65.. Wang L, Xi W, Shi Z, Zhao Z, Qian T, Zhao L, et al. Study on sample unbalance in landslide recognition algorithm based on depth learning. Bulletin of Surveying and Mapping. 2024; 12–18.
- 66.. Yang Y, Ma X, Ding W, Wen H, Sun D. A novel dataset replenishment strategy integrating time-series InSAR for refined landslide susceptibility mapping in Karst Regions. Water. 2024;16(17):2414.
- 67.. Liu MM. Landslide Susceptibility Analysis Method Considering Sample Optimization and Spatial Characteristics. Ph.D, Liaoning Technical University. 2024. https://doi.org/10.27210/d.cnki.glnju.2023.000031
- 68.. Wu HY, Zhou C, Liang X, Wang Y, Yuan PC, Wu LX. Evaluation of Landslide Susceptibility Based on Sample Optimization Strategy Research. Geomatics Info Sci of Wuhan University. 2023;49: 1–15. whugis20220527
- 69.. Yang L. Typhoon Rainstorm Landslide Susceptibility Assessment Based on the Coupling of TRIGRS Physical Model and Random Forest. Master’s Thesis, Anhui University of Science and Technology. 2024.
- 70.. Yanchen G, Zhihong Z, Fuchu D. Freeze-thaw landslide susceptibility assessment and its future development on the seasonally frozen ground of the Qinghai-Tibet Plateau under warming-humidifying climate. Cold Regions Sci Technol. 2024;227:104293.