Correlation analysis and prediction models for loess compressibility in Ili region, Xinjiang

Zhiqi Liu; Lifeng Chen; Kai Chen; Zizhao Zhang; Jinyu Chang

doi:10.1371/journal.pone.0345028

Abstract

Loess compressibility is a crucial engineering parameter governing the deformation of loess foundations and the evolution of slope geohazards. Based on a comprehensive collection of physical, hydraulic, and mechanical parameters of loess in the Ili region, this study selected Huocheng, Nilka, and Xinyuan counties as typical study areas. Statistical methods were employed to perform normality tests and necessary transformations on the data, followed by correlation analysis to identify key factors influencing the compression coefficient. Using Multiple Linear Regression (MLR) as a baseline, six machine learning models were constructed: Random Forest (RF), Multilayer Perceptron (MLP), Radial Basis Function (RBF), Support Vector Machine (SVM), Classification and Regression Tree (CART), and XGBoost. The results indicate that the compression coefficient is significantly positively correlated with the void ratio and negatively correlated with dry density and modulus of compressibility. Consequently, modulus of compressibility, dry density, and void ratio were selected as core input indicators. All constructed models successfully predicted the compression coefficient and its engineering classification. Under the evaluation principle of “error metrics priority, classification accuracy auxiliary,” the MLP model achieved the best overall performance across the three counties, followed by the Random Forest model. This study provides a methodological basis for the rapid estimation of loess compressibility parameters and engineering judgment in the Ili region.

Citation: Liu Z, Chen L, Chen K, Zhang Z, Chang J (2026) Correlation analysis and prediction models for loess compressibility in Ili region, Xinjiang. PLoS One 21(3): e0345028. https://doi.org/10.1371/journal.pone.0345028

Editor: Xianggang Cheng, China University of Mining and Technology, CHINA

Received: November 27, 2025; Accepted: February 26, 2026; Published: March 23, 2026

Copyright: © 2026 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The minimal dataset underlying the findings of this study is not publicly available because it forms part of non-public project outcomes. Data requests should be directed to the corresponding author (chenk412@126.com). To ensure long-term availability in accordance with PLOS ONE’s Data Policy, a non-author institutional point of contact is also provided to ensure continuity of access: Administration Office, School of Geology and Mining Engineering, Xinjiang University (Email: XJUdk@xju.edu.cn).

Funding: This work was supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region of China (Grant Nos. 2022B03017-4 and 2022D01C360); the Young Top Talent Project of Xinjiang Uygur Autonomous Region of China (Grant No. 2023TSYCCX0010); the 2025 Xinjiang Uygur Autonomous Region Graduate Student Research Innovation Project (Grant No. XJ2025G101); the 2025 National Student Innovation Training Program Project of Xinjiang University (Grant No. 202510755011); the 2024 Professional Degree Graduate Student Teaching Case Bank Construction Project of Xinjiang University; and the 2025 Xinjiang Uygur Autonomous Region Education and Teaching Reform Research Project (Grant No. XJGXJGPTB-2025001). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

Loess compressibility refers to its ability to undergo volumetric compression deformation under external loads, influenced by factors such as mineral composition, particle size distribution, pore structure, and water content. Loess is widely distributed in the Ili region. Under inducing factors such as self-weight, rainfall, and human engineering activities, it is prone to compression deformation, directly impacting the bearing capacity and settlement characteristics of foundations. This holds significant importance for the stability and safety of infrastructure like buildings, roads, and bridges. Therefore, scientifically selecting parameters to construct loess compressibility prediction models is crucial, as it provides an important scientific basis for practical engineering construction and geological hazard prevention and control.

The correlation analysis between loess compressibility and soil properties forms the foundation for establishing its prediction models. Researchers domestically and internationally have explored the compression characteristics of loess from different regions, types, and states, along with their influencing factors, using various methods and experiments. They have developed mathematical models or empirical formulas based on statistical and mechanistic analyses to quantify the relationship between loess compressibility indicators and its physical and mechanical properties. Jiang et al. [1] utilized the Discrete Element Method (DEM) to simulate one-dimensional compression and wetting tests of loess, investigating the influence of water content and void ratio on loess compression and collapse. Mu et al. [2] employed SEM, XRD, and MIP tests, alongside triaxial compression tests, to study the correlation between the compression properties of undisturbed and compacted loess and their structure and pore size. Chen et al. [3] constructed multiple linear regression and RBF neural network models for loess collapsibility prediction in Gongliu County, Ili River Valley, based on correlation analysis. Yuan et al. [4] quantitatively explored the correlation between the modulus of compressibility and collapsibility of loess in southern Shanxi Province through indoor compression tests. Jian [5] fitted e-lgp curves for Malan loess with different dry densities, water contents, and regions using Gregory’s logarithmic function model and analyzed the correlation.

Prediction models for loess compressibility are vital tools for assessing and managing loess settlement issues. Prediction models based on machine learning methods offer significant advantages in handling vague, stochastic, and nonlinear data [6,7]. Chen et al. [8] developed prediction models for loess collapsibility in Xinyuan County, Ili region, using multiple linear regression and neural network theories. Xu [9] constructed a 1D loess compression model based on the Disturbed State Concept (DSC) theory, defining the disturbance function using the void ratio as a parameter. C. A. A. et al. [10] proposed a nonlinear regression model to predict parameters such as the secondary compression index, liquid limit, plastic limit, and water content for evaluating soil compression characteristics. Shi et al. [11] conducted high-pressure consolidation tests on remolded loess in different states with varying water contents and dry densities, establishing settlement prediction models for filled sites of different thicknesses (loads). Huang et al. [12] developed multifactor regression models for loess collapsibility coefficient using ordinary multiple linear regression and partial least squares regression methods. Ma et al. [13] analyzed correlations using loess from central Shanxi Province as an example, then quantitatively ranked them by partial correlation analysis, and constructed an RBF neural network model. Zhang [14] compared the accuracy of RBF neural network models and Newrbe functions through modeling factor analysis, verified by bivariate two-tailed correlation analysis. Zhan [15] and Gao et al. [16] established BP neural network prediction models for loess collapsibility coefficient using samples from the Dingxi-Lintao Expressway project and Xi’an loess, respectively, based on geotechnical test parameters.

Formed by the complex interaction of glacial transport and aeolian deposition, Ili loess exhibits distinct genesis, collapsibility, and soluble salt content compared to the typical loess of other regions [17–19]. Despite the maturity of research on the Loess Plateau, research specifically targeting the compressibility mechanisms of Ili loess remains sparse. Furthermore, the comparative applicability of advanced machine learning algorithms in this specific geological context has not been fully explored. Given the significant spatial variability and non-linear physical-mechanical characteristics of Ili loess, traditional prediction models based on simple physical relationships or linear regression often fail to accurately capture the multi-factor coupling effects. Therefore, focusing on the loess in the Ili region, this study utilizes a comprehensive dataset of geotechnical parameters. We first employ multivariate statistical theory to analyze the correlation between soil properties and the compression coefficient, optimizing the input parameters for prediction models. Subsequently, multiple regression and various machine learning models are constructed and compared to identify the optimal prediction tool, thereby providing a robust scientific basis for engineering construction and geohazard prevention in the Ili region.

2. Study area overview and data source

2.1. Study area and sampling strategy

The Ili region is located in western Xinjiang, China. The terrain generally slopes from east to west, with high elevations in the north and south and low elevations in the center. East-west trending mountain belts, with altitudes of 1000–5000 m, are distributed on the northern and southern sides and in the central part. The central area comprises the Ili River, Kashi River, Gongnaisi River, and Tekes River valleys, with altitudes of 500–1000 m. The Ili region has a diverse climate, with an average annual temperature of 2.6–10.4 °C. Precipitation is relatively abundant but unevenly distributed due to topography; the average annual precipitation in plain areas is 200–500 mm, while that in mountainous areas can exceed 800 mm, with a spatial trend of higher rainfall in the east than in the west [19]. Loess is widely distributed in the Ili region, which lies in the westernmost loess belt of Xinjiang and exhibits a distinctive depositional pattern governed by the topographic interception of the Westerlies. The spatial distribution is characterized by lenticular accumulation along the river valleys, presenting a typical ‘thick-center, thin-margin’ profile. In terms of granulometry, a clear aeolian sorting gradient is observed, with grain size fining from west to east, reflecting the diminishing transport energy of the prevailing winds [20,21].

A preliminary statistical analysis of geotechnical parameters collected from various geological hazard projects across the Ili region revealed significant data variability and indistinct correlations. To mitigate this noise and capture representative compressibility characteristics, this study adopted a sampling strategy guided by the spatial distribution of loess in the Ili River Valley. Specifically, representative samples were selected from Huocheng County (west), Nilka County (central), and Xinyuan County (east) to highlight typical geotechnical relationships. The general distribution of loess and specific sampling locations are illustrated in Fig 1(a). Additionally, Fig 1(b) presents a typical geological cross-section of the strata, while Fig 1(c) and 1(d) display representative field photographs of the sampling sites.

Fig 1. Overview of study area and schematic diagram of sampling locations.

(a) General distribution map of loess in Ili region (schematic redrawn based on geological information from Ye et al. [20]); (b) Typical geological cross-section of loess strata; (c) Field photograph of Upper Pleistocene aeolian loess on slope surface; (d) Field photograph of Holocene colluvial loess at slope toe.

https://doi.org/10.1371/journal.pone.0345028.g001

Probabilistic statistical analysis of the physical and mechanical parameters of loess in the study area was conducted to obtain characteristic statistical quantities such as mean, standard deviation, and skewness, which characterize the spatial randomness of geotechnical parameters. The results of the probabilistic statistical analysis for loess strata parameters in the study area are presented in Table 1.

Table 1. Probabilistic statistical analysis results of loess geotechnical parameters.

https://doi.org/10.1371/journal.pone.0345028.t001

2.2. Research methods

2.2.1. Statistical analysis and parameter optimization.

Based on the compiled data, IBM SPSS Statistics software was used to perform descriptive statistics and normality tests (Kolmogorov-Smirnov test combined with Q-Q plots) on the physical-mechanical parameters of loess within the Ili region. For data that did not satisfy normal distribution, logarithmic transformation was prioritized for positive variables, while Johnson system transformation [22] was applied to other skewed data for preprocessing, thereby reducing the influence of data skewness on parameter statistics and correlation analysis [23–25]. On this basis, Pearson correlation analysis [25] was employed to quantitatively evaluate the linear correlation degree between various soil properties and the compression coefficient. Based on the correlation coefficients and significance levels, the most representative key influencing factors were screened out to serve as input variables (feature vectors) for the prediction models. In this study, a Pearson correlation coefficient (R) with an absolute value between 0.8-1.0 is considered extremely strong correlation, 0.6-0.8 as strong correlation, 0.4-0.6 as moderate correlation, 0.2-0.4 as weak correlation, and 0.0-0.2 as very weak or no correlation, with 0.05 set as the significance level for probability standards.
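The screening step described above can be illustrated with a minimal sketch. It is not the SPSS workflow used in this study: the sample values are hypothetical, the helper names `pearson_r` and `strength` are introduced here for illustration, and the lower bound of each band is assumed to be inclusive because the text does not say how boundary values are assigned. The significance test at the 0.05 level is omitted.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Map |R| to the verbal bands used in this study (lower bound inclusive, assumed)."""
    a = abs(r)
    if a >= 0.8:
        return "extremely strong"
    if a >= 0.6:
        return "strong"
    if a >= 0.4:
        return "moderate"
    if a >= 0.2:
        return "weak"
    return "very weak or none"

# Hypothetical values for illustration only (not data from this study).
void_ratio = [0.85, 0.92, 1.01, 1.10, 1.18, 1.25]
compression_coeff = [0.12, 0.18, 0.22, 0.35, 0.41, 0.55]  # MPa^-1

# Logarithmic transformation of a positive, skewed variable before correlation.
log_a = [math.log(v) for v in compression_coeff]

r = pearson_r(void_ratio, log_a)
print(round(r, 3), strength(r))
```

The sign of `r` gives the direction of the relationship and `strength` gives the band, which together mirror how the candidate input variables are ranked in Section 3.1.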

2.2.2. Prediction model construction and evaluation.

Given that loess compression deformation is controlled by the non-linear coupling of multiple physical state parameters, this study constructed a model evaluation system using Multiple Linear Regression (MLR) as the baseline, Random Forest (RF) and Multilayer Perceptron (MLP) as the core, and Radial Basis Function (RBF) neural network, Support Vector Machine (SVM), Classification and Regression Tree (CART), and eXtreme Gradient Boosting (XGBoost) for comparison.

First, the MLR model was constructed as the evaluation benchmark. While MLR has the advantage of clear physical meaning, its linear assumptions make it difficult to capture the complex non-linear features among high-dimensional geotechnical parameters; therefore, this study mainly used it as a reference baseline to measure the performance improvement of the machine learning models. To address the non-linear regression needs of loess parameters, this study focused on introducing and tuning two core algorithms: RF and MLP. As a representative of ensemble learning, RF constructs multiple regression trees via bootstrap resampling [26–29]. It also introduces random feature subsets in node splitting to reduce inter-tree correlation, thereby reducing the variance and overfitting risk of single decision trees and enhancing robustness to noise and outliers. During modeling, hyperparameters such as the number of trees and maximum depth were optimized via grid search to obtain better generalization performance [30,31]. As a typical feedforward artificial neural network model, MLP consists of an input layer, several hidden layers, and an output layer. It achieves non-linear mapping from input soil indices to the predicted compression coefficient through non-linear activation. Under the backpropagation framework, optimization strategies such as quasi-Newton or conjugate gradient methods are adopted to iteratively update network weights and bias parameters [32–35] to characterize the complex non-linear relationship between the compression coefficient and multiple indices. In addition, to ensure the robustness of the model selection conclusion, this study also introduced four mainstream algorithms with different mechanisms, namely RBF, SVM, CART, and XGBoost, for cross-validation comparison.
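The non-linear mapping performed by an MLP can be sketched as a single forward pass. This is only an illustration of the mechanism: the network size, the tanh activation, the weights, and the standardized inputs are all hypothetical, and the training step (backpropagation with a quasi-Newton or conjugate gradient optimizer) is not shown. The models in this study were built in IBM SPSS Modeler, not with this code.

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer MLP with tanh units and a linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical standardized inputs in the order [E_s, rho_d, e].
x = [-0.5, 0.3, 0.8]

# Hypothetical weights for two hidden units (not fitted to any data).
w_hidden = [[-1.2, -0.4, 0.6],
            [0.3, -0.7, 0.9]]
b_hidden = [0.1, -0.2]
w_out = [0.8, 0.5]
b_out = 0.05

y = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
print(y)  # predicted (standardized) compression coefficient
```

Because each hidden unit applies tanh to a weighted sum of all three inputs, the output can bend with the inputs in ways a single linear equation cannot, which is the property the comparison with MLR is meant to test.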

In the model training and evaluation stage, based on the IBM SPSS Modeler platform, the three regions were modeled and evaluated independently. By comparing the result errors of each model on the test set and combining them with the accuracy of engineering classification, the model with the best comprehensive performance was finally selected under the sample and indicator system conditions of this study.

3. Loess prediction model establishment

3.1. Selection of soil properties

Based on the collected geotechnical parameters, normality tests and necessary data transformations (logarithmic or Johnson system transformation) were performed following the methods described in Section 2.2 to satisfy the prerequisites for linear correlation analysis. The results of the correlation analysis are presented in Table 2.

Table 2. Statistical results of correlation analysis between loess compression coefficient and soil properties.

https://doi.org/10.1371/journal.pone.0345028.t002

As indicated in Table 2, significant linear correlations exist between the compression coefficient and dry density, porosity, void ratio, and modulus of compressibility across all three regions, verifying the reliability of the analysis results. Among these, the modulus of compressibility exhibits an extremely strong correlation, while void ratio, dry density, and porosity show strong correlations. Since void ratio and porosity are interconvertible indicators both reflecting pore properties, the modulus of compressibility (E_s), dry density (ρ_d), and void ratio (e) were ultimately selected as the three optimal input parameters for the prediction models. The scatter plots of their correlation analysis are presented in Figs 2–4, and the selected correlation coefficients are summarized in Table 3.
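The interconvertibility of void ratio and porosity follows from the standard phase relations of soil mechanics, written here with porosity n expressed as a fraction rather than a percentage:

```latex
n = \frac{e}{1+e}, \qquad e = \frac{n}{1-n}
```

For example, e = 1.0 corresponds to n = 0.5. Because each quantity is a monotonic function of the other, including both as inputs would add little independent information, which is consistent with retaining only the void ratio.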

Table 3. Statistical table of selected soil property correlation coefficients.

https://doi.org/10.1371/journal.pone.0345028.t003

Fig 2. Scatter plots of loess compression coefficient vs. compression modulus correlation analysis.

(a) Huocheng County; (b) Nilka County; (c) Xinyuan County.

https://doi.org/10.1371/journal.pone.0345028.g002

Fig 3. Scatter plots of loess compression coefficient vs. void ratio correlation analysis.

(a) Huocheng County; (b) Nilka County; (c) Xinyuan County.

https://doi.org/10.1371/journal.pone.0345028.g003

Fig 4. Scatter plots of loess compression coefficient vs. dry density correlation analysis.

(a) Huocheng County; (b) Nilka County; (c) Xinyuan County.

https://doi.org/10.1371/journal.pone.0345028.g004

3.2. Multiple regression prediction model

Multiple regression prediction models were constructed for each region, with a total of 320 sets of selected geotechnical parameter data as input. The independent variables were dry density (ρ_d), void ratio (e), and the logarithm of modulus of compressibility (E_s), while the dependent variable was the compression coefficient (a). Summaries of the established multiple regression prediction models are presented in Table 4 and Table 5.

Table 4. Summary of loess compressibility and soil property regression models.

https://doi.org/10.1371/journal.pone.0345028.t004

Table 5. Regression coefficients and significance analysis for loess compressibility and soil properties.

https://doi.org/10.1371/journal.pone.0345028.t005

Tables 4 and 5 indicate that the independent variables in the three established models explain 87.9%, 90.1%, and 86.7% of the variation in the dependent variable, respectively, demonstrating strong explanatory power and a high goodness-of-fit. The independence of the observations supports the assumption of residual independence. Furthermore, the significance (p-value) for all parameters (dry density ρ_d, void ratio e, modulus of compressibility E_s, and constant) in the established loess compressibility regression prediction models is less than 0.05, confirming that the selected parameters are statistically significant and reasonable.

Based on the pre-processing of loess parameters (dry density ρ_d, void ratio e, and modulus of compressibility E_s) in the study area, and combined with the statistical test results of the regression models, regression prediction models for loess compressibility were established for Huocheng County equation (1), Nilka County equation (2), and Xinyuan County equation (3), respectively.

(1)

(2)

(3)

In equation (3), ρ_d^*, e^*, and E_s^* represent data processed by Johnson transformation. The transformation process is detailed in equations (4)–(6).

(4)

(5)

(6)
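For readers unfamiliar with the Johnson system, its general form is sketched below as background. It maps a variable x to an approximately standard normal variable z through one of three families (lognormal S_L, bounded S_B, unbounded S_U). The symbols γ, δ, ξ, and λ denote fitted shape, location, and scale parameters; the family and parameter values used for each variable in equations (4)–(6) are those of the study and are not restated here.

```latex
z = \gamma + \delta \, f\!\left(\frac{x-\xi}{\lambda}\right), \qquad
f(y) =
\begin{cases}
\ln y, & S_L \\[4pt]
\ln\dfrac{y}{1-y}, & S_B \\[8pt]
\sinh^{-1} y, & S_U
\end{cases}
```

Each family is a monotonic function of x, so the transformation preserves the rank order of the samples while reshaping the distribution.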

It is noteworthy that the models for Huocheng and Nilka adopt the same functional form with similar coefficient magnitudes, indicating a consistent response of compressibility to pore state and stiffness indices in these regions. In contrast, the Xinyuan model differs formally because the data required normalization via Johnson system transformation prior to linear regression, essentially representing a composite mapping of monotonic transformation and linear regression. Furthermore, since ρ_d and e both characterize pore geometric states and are statistically correlated, their regression coefficients in the multivariate model may differ in direction from univariate correlations due to multicollinearity, exhibiting higher sensitivity to the sample range.
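The collinearity between ρ_d and e can be made concrete with a small sketch. It assumes the standard phase relation e = G_s ρ_w / ρ_d − 1 and uses hypothetical dry densities and specific gravities (around 2.70, a typical assumed value, not data from this study). For two predictors, the variance inflation factor reduces to 1 / (1 − r²).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

RHO_W = 1.0  # density of water, g/cm^3

# Hypothetical samples: dry density (g/cm^3) and specific gravity with slight scatter.
dry_density = [1.30, 1.38, 1.45, 1.52, 1.60]
specific_gravity = [2.69, 2.71, 2.70, 2.72, 2.68]

# Void ratio from the phase relation e = G_s * rho_w / rho_d - 1.
void_ratio = [gs * RHO_W / rd - 1.0 for gs, rd in zip(specific_gravity, dry_density)]

r = pearson_r(dry_density, void_ratio)
vif = vif_two_predictors(dry_density, void_ratio)
print(round(r, 3), round(vif, 1))
```

When the specific gravity varies little, e is almost a deterministic function of ρ_d, so the correlation approaches −1 and the inflation factor becomes large. Under such conditions individual regression coefficients can change sign or magnitude with small changes in the sample, while the fitted values remain stable.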

The loess compressibility for each region was graded according to the “Code for Design of Building Foundation” (GB 50007−2011) [36] and compared with the predicted values from the established multiple regression models, as shown in Table 6 and Fig 5.

Table 6. Comparison of loess compressibility multiple regression model prediction results with measured data.

https://doi.org/10.1371/journal.pone.0345028.t006

Fig 5. Error plots of loess compressibility multiple regression model predictions vs. measured values.

(a) Huocheng County; (b) Nilka County; (c) Xinyuan County.

https://doi.org/10.1371/journal.pone.0345028.g005

As observed from Table 6, for the three regions, the number of samples where the multiple regression model’s predicted compressibility grade matched the actual compressibility grade was 44 for Huocheng County, achieving a prediction effectiveness of 95.65%; 63 for Nilka County, with an effectiveness of 84%; and 170 for Xinyuan County, with an effectiveness of 85.43%. Thus, the established multiple regression models are capable of predicting loess compressibility in the Ili region.
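The grading and the prediction effectiveness figures can be reproduced with a short sketch. The thresholds below are an assumption: they are the grade boundaries commonly cited for GB 50007-2011 (low for a < 0.1 MPa⁻¹, medium for 0.1 ≤ a < 0.5 MPa⁻¹, high for a ≥ 0.5 MPa⁻¹), which the text does not list explicitly. The per-county sample totals (46, 75, 199) are inferred from the reported counts and percentages and sum to the 320 samples stated above.

```python
def grade(a):
    """Compressibility grade from the compression coefficient a (MPa^-1).

    Thresholds are assumed from the commonly cited GB 50007-2011 criteria.
    """
    if a < 0.1:
        return "low"
    if a < 0.5:
        return "medium"
    return "high"

def effectiveness(measured, predicted):
    """Number and percentage of samples whose predicted grade matches the measured grade."""
    matches = sum(grade(m) == grade(p) for m, p in zip(measured, predicted))
    return matches, 100.0 * matches / len(measured)

# Reported matched counts for the multiple regression models, with inferred totals.
reported = {"Huocheng": (44, 46), "Nilka": (63, 75), "Xinyuan": (170, 199)}
for county, (matched, total) in reported.items():
    print(county, round(100.0 * matched / total, 2))

# Hypothetical measured/predicted pairs showing how effectiveness is computed.
matches, pct = effectiveness([0.05, 0.30, 0.60], [0.08, 0.20, 0.40])
print(matches, round(pct, 2))
```

The same calculation applies to the Random Forest and MLP results reported in Sections 3.3 and 3.4.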

3.3. Random forest prediction model

Random Forest prediction models were constructed for each region, taking the selected geotechnical parameter dataset of 320 groups as input. The distribution and proportion of datasets for each sub-region are shown in Table 7.

Table 7. Dataset partitioning for random forest prediction model.

https://doi.org/10.1371/journal.pone.0345028.t007

The loess compressibility for each region was graded and compared with the predicted values from the established RF models, as shown in Table 8 and Fig 6.

Table 8. Comparison of loess compressibility random forest model prediction results with measured data.

https://doi.org/10.1371/journal.pone.0345028.t008

Fig 6. Error plots of loess compressibility random forest model predictions vs. measured values.

(a) Huocheng County; (b) Nilka County; (c) Xinyuan County.

https://doi.org/10.1371/journal.pone.0345028.g006

As observed from Table 8, for the three regions, the Random Forest model predicted compressibility grades identical to the actual compressibility grades for 46 samples in Huocheng County, achieving a prediction effectiveness of 100%; for 74 samples in Nilka County, with an effectiveness of 98.67%; and for 195 samples in Xinyuan County, with an effectiveness of 97.99%. Therefore, the established Random Forest models are capable of effectively predicting loess compressibility in the Ili region.

3.4. Neural network prediction model

MLP neural network prediction models were constructed for each region. The distribution and proportion of datasets for each sub-region are shown in Table 9.

Table 9. Dataset partitioning for multilayer perceptron neural network prediction model.

https://doi.org/10.1371/journal.pone.0345028.t009

The loess compressibility for each region was graded and compared with the predicted values from the established MLP neural network models, as shown in Table 10 and Fig 7.

Table 10. Comparison of loess compressibility multilayer perceptron neural network model prediction results with measured data.

https://doi.org/10.1371/journal.pone.0345028.t010

Fig 7. Error plots of loess compressibility multilayer perceptron neural network model predictions vs. measured values.

(a) Huocheng County; (b) Nilka County; (c) Xinyuan County.

https://doi.org/10.1371/journal.pone.0345028.g007

As observed from Table 10, for the three regions, the Multilayer Perceptron neural network model predicted compressibility grades identical to the actual compressibility grades for 46 samples in Huocheng County, achieving a prediction effectiveness of 100%; for 75 samples in Nilka County, with an effectiveness of 100%; and for 194 samples in Xinyuan County, with an effectiveness of 97.49%. Therefore, the established Multilayer Perceptron neural network models are capable of effectively predicting loess compressibility in the Ili region.

4. Discussion

4.1. Evaluation of indicator parameters

Loess compressibility is essentially a change in the soil pore structure, which is closely related to collapsibility and has significant implications for engineering construction [37]. Through the correlation analysis of soil properties, it was found that in all three selected regions, significant correlations exist between the compression coefficient (a) and dry density, porosity, void ratio, and modulus of compressibility. These parameters serve as effective indicators for establishing prediction models to more accurately assess loess compression properties under various conditions.

The strong correlation among void ratio, dry density, and compressibility aligns with the developed structural pore system and cementation-skeletal characteristics of Ili loess, as shown in the Scanning Electron Microscopy (SEM) images in Fig 8.

Fig 8. Scanning Electron Microscopy (SEM) images of Ili loess microstructure.

(a) ×1500; (b) ×6000; (c) ×30000.

https://doi.org/10.1371/journal.pone.0345028.g008

Combining the microstructural features in Fig 8, the loess in the study area typically develops a macroporous skeletal framework, where silt particles are loosely packed and strength is maintained primarily by soluble salts and clay mineral cementation. A higher void ratio implies a skeletal structure with abundant unstable point contacts, which are highly susceptible to yield and structural collapse under external loads, leading to significant macroscopic volume shrinkage. Conversely, a higher dry density indicates denser particle packing with enhanced interlocking effects, effectively restricting particle slippage and rearrangement under stress, thereby manifesting as lower compressibility.

The compression coefficient is a direct measure of loess compressibility and is closely related to water content and pore structure. The modulus of compressibility characterizes the stiffness response and strain change under confined conditions and is extremely strongly correlated with a (|R| > 0.9). As a stiffness parameter describing the stress-strain relationship, E_s integrates the macroscopic response of mineral composition, cementation strength, and stress history. In Ili loess, the combined action of carbonate and clay mineral cementation forms a structural skeleton, significantly enhancing the initial modulus. When loading exceeds the structural yield level, cementation bonds progressively break, potentially causing stiffness degradation.
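The extremely strong negative correlation between E_s and a is consistent with the standard one-dimensional (confined) compression relation between the two quantities, given here as background, where e_1 denotes the void ratio at the start of the stress interval over which a is defined:

```latex
E_s = \frac{1 + e_1}{a}
\quad\Longrightarrow\quad
\ln a = \ln(1 + e_1) - \ln E_s
```

Under this relation a varies inversely with E_s, and the relationship becomes linear after taking logarithms, which may partly explain why a logarithmic treatment of E_s suits linear correlation and regression analysis.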

In summary, the three selected indicators are not isolated statistical variables but characterize the key dimensions of compression response from “macroscopic stiffness,” “skeletal density,” and “pore structure potential,” providing a basis for the physical interpretability of the model inputs, which may also be one of the reasons for the high prediction accuracy of the MLP model.

4.2. Model comparison and analysis

The established loess compressibility prediction models were compared with measured results, as shown in Fig 9.

Fig 9. Comparison of measured compression coefficient values with predicted values from the established model.

(a) Huocheng County; (b) Nilka County; (c) Xinyuan County.

https://doi.org/10.1371/journal.pone.0345028.g009

To select the optimal prediction model for loess compressibility in the Ili region, RBF, SVM, CART, and XGBoost models were established in addition to the multiple regression, Random Forest, and MLP models described above, giving a total of seven prediction models. The comparative results of the established models are presented in Table 11.

Table 11. Comparative statistical table of established prediction models for each region.

https://doi.org/10.1371/journal.pone.0345028.t011

The classification accuracy is a discretized result based on engineering standard thresholds and is highly sensitive to samples near the boundaries. Even with low numerical errors, samples near thresholds may lead to reduced classification accuracy. Therefore, based on the practical needs of engineering calculations and the sensitivity of parameters to continuous variables, this study prioritizes error metrics as the primary criterion, with classification accuracy serving as an auxiliary constraint, ensuring the model possesses both numerical precision and engineering usability.
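The boundary sensitivity described above can be shown with a small sketch. The samples are hypothetical, the grade thresholds (0.1 and 0.5 MPa⁻¹) are assumed from the commonly cited GB 50007-2011 criteria, and the two error metrics are written in their usual forms (mean absolute error and mean relative error); the exact definitions behind Table 11 are not given in the text.

```python
def grade(a):
    """Compressibility grade from a (MPa^-1); thresholds assumed, see lead-in."""
    if a < 0.1:
        return "low"
    if a < 0.5:
        return "medium"
    return "high"

def mean_absolute_error(measured, predicted):
    """Average of |predicted - measured|, in MPa^-1."""
    return sum(abs(p - m) for m, p in zip(measured, predicted)) / len(measured)

def mean_relative_error(measured, predicted):
    """Average of |predicted - measured| / measured, in percent."""
    return 100.0 * sum(abs(p - m) / m for m, p in zip(measured, predicted)) / len(measured)

def classification_accuracy(measured, predicted):
    """Fraction of samples whose predicted grade equals the measured grade."""
    hits = sum(grade(m) == grade(p) for m, p in zip(measured, predicted))
    return hits / len(measured)

# Hypothetical samples, two of which sit close to a grade threshold.
measured = [0.098, 0.250, 0.480]
predicted = [0.102, 0.245, 0.505]

mae = mean_absolute_error(measured, predicted)
mre = mean_relative_error(measured, predicted)
acc = classification_accuracy(measured, predicted)
print(round(mae, 4), round(mre, 2), round(acc, 2))
```

Here every prediction is numerically close to its measured value, yet two of the three samples cross a threshold and are counted as misclassified. This is the situation in which error metrics give a fairer picture of model quality than classification accuracy alone.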

As shown in Table 11, the performance ranking of the models is consistent across the three counties. Non-linear models (MLP, RF, XGBoost) generally outperform the linear regression model, indicating a significant non-linear coupling relationship between loess compressibility and multi-indices. Under the principle of “error metrics priority, classification accuracy auxiliary,” the MLP model achieved the lowest mean error and relative error in Huocheng, Nilka, and Xinyuan, demonstrating the best comprehensive deviation control capability, thus being identified as the optimal model; the RF model followed, showing good robustness.

From a mechanistic perspective, MLP relies on multi-layer non-linear mapping to represent complex input-output relationships in high-dimensional feature space. RF reduces variance and overfitting risk via bootstrap aggregation and random feature subsets, making it more robust to noise and outliers, which is advantageous for engineering classification stability. CART, as a single-tree model, is interpretable but limited in capturing complex non-linearity and in generalization. XGBoost offers high precision via gradient boosting but is sensitive to hyperparameters. The linear regression model is constrained by linearity assumptions and potential multicollinearity. RBF and SVM are sensitive to kernel or structural parameters, leading to performance fluctuations with varying sample sizes and feature scales.

In summary, under the sample scope and indicator system of this study, MLP performs as the optimal model in all three counties, followed by RF. The two can complement each other in different application scenarios: MLP is preferred for continuous value prediction and error control, while RF is preferred for robustness in engineering classification.

5. Conclusion

Based on the physical, hydraulic, and mechanical parameters of collapsible loess in the Ili region, this study selected Huocheng, Nilka, and Xinyuan counties as typical study areas. By employing statistical methods, Multiple Linear Regression (MLR), and machine learning methods, correlations were analyzed, and prediction models were established and verified. The main conclusions are as follows:

(1) The correlation between loess compressibility and soil properties in the Ili region was analyzed. The analysis results indicate that the key factors influencing loess compressibility are physical-mechanical indicators. The compression coefficient a is significantly positively correlated with pore structure indices (void ratio e) and negatively correlated with dry density ρ_d and modulus of compressibility E_s. Considering correlation strength, significance tests, and engineering accessibility, E_s, ρ_d, and e were selected as core input indicators.
(2) Prediction models for loess compressibility in the Ili region were established. Based on multiple regression and machine learning methods, prediction models (MLR, MLP, RF, RBF, SVM, CART, and XGBoost) were constructed for Huocheng, Nilka, and Xinyuan counties with E_s, ρ_d, and e as input indicators. These models can successfully predict the compression coefficient a and its engineering classification within the scope of this study, providing methodological support for the rapid estimation of compressibility parameters.
(3) The prediction evaluation indicators and model performance for loess compressibility in the study area were analyzed. Under the evaluation principle of “error metrics priority, classification accuracy auxiliary,” the MLP model achieved the best overall error metrics across all three counties, demonstrating the best comprehensive deviation control capability. The RF model showed superior stability and robustness, ranking second. Therefore, under the sample scope and county calibration conditions of this study, the MLP model is recommended as the optimal model, with RF as an engineering alternative. Future applications should consider local calibration to reduce extrapolation uncertainty.

References

1. Jiang MJ, Li T, Hu HJ, Thornton C. DEM analyses of one-dimensional compression and collapse behaviour of unsaturated structural loess. Comput Geotech. 2014;60:47–60.
- View Article
- Google Scholar
2. Mu QY, Zhou C, Ng CWW. Compression and wetting induced volumetric behavior of loess: Macro- and micro-investigations. Transportation Geotechnics. 2020;23:100345.
- View Article
- Google Scholar
3. Chen L, Chen K a i, He G, Liu Z. Prediction model of loess collapsibility in Gongliu County of Ili River Valley. Journal of Engineering Geology. 2023;31:1282–92.
- View Article
- Google Scholar
4. Yuan K, Li Z, Jin L. Correlation of loess compress modulus and collapsible. Journal of Liaoning Technical University (Natural Science). 2013;32:1480–3.
- View Article
- Google Scholar
5. Tao J. Experimental research on structure and compression deformation characteristics of Malan loess. Chang’an University; 2020.
- View Article
- Google Scholar
6. Chang C, Bo J, Li X, Qiao G, Yan D. A BP neural network model for forecasting sliding distance of seismic loess landslides. China Earthquake Engineering Journal. 2020;42:1609–14.
- View Article
- Google Scholar
7. Mu Q, Song T, Lu Z, Xiao T, Zhang L. Evaluation of the collapse susceptibility of loess using machine learning. Transportation Geotechnics. 2024;48:101327.
- View Article
- Google Scholar
8. Chen L, Chen K, He G, Liu Z. Research on the prediction model of loess collapsibility in Xinyuan County, Ili River Valley Area. Water. 2023;15(21):3786.
- View Article
- Google Scholar
9. Xu Y. A 1D compression model for loess based on disturbed state concept. RCMA. 2019;29(2):125–9.
- View Article
- Google Scholar
10. Anagnostopoulos CA, Grammatikopoulos IN. A new model for the prediction of secondary compression index of soft compressible soils. Bull Eng Geol Env. 2011;70:423–7.
- View Article
- Google Scholar
11. Shi B, Ni W, Wang Y, Li Z, Yuan Z. A model for calculating the compressive deformation of remolded loess. Rock and Soil Mechanics. 2016;37:1963–8.
- View Article
- Google Scholar
12. Huang J, Li X, Teng H. Evaluation model of loess collapsibility based on the partial least squares method. Journal of Catastrophology. 2021;36:60–4.
- View Article
- Google Scholar
13. Ma Y, Wang J, Peng S, Li B. Relationships between physical-mechanical parameters and collapsibility of loess soil and its prediction model. Bulletin of Soil and Water Conservation. 2016;36:120–8.
- View Article
- Google Scholar
14. Zhang M. Study on compression prediction model of unsaturated loess. Highway Traffic Science and Technology (Applied Technology Edition). 2019;15:68–72.
- View Article
- Google Scholar
15. Zhan H, Lin J. Application of BP Neural network in prediction collapsibility coefficient of loess. Soil Eng and Foundation. 2020;34:493–6.
- View Article
- Google Scholar
16. Gao L, Luo G, Yang X. Protoplast fusion of Streptomyces chromogenes Sy20-2 and Streptomyces rimosus SY20-4. Journal of Dalian Minzu University. 2006:24–6.
- View Article
- Google Scholar
17. Wang Y, Zhang A, Zhao Q, Yu C. Effect of soluble salt content on water-holding characteristics of Ili Loess. Chinese Journal of Geotechnical Engineering. 2018;40:212–7.
- View Article
- Google Scholar
18. Jia J, Xia D, Wang B, Li G, Zhao H, Liu X. The comparison between loess plateau and ili loess magnetic properties and their implications. Quaternary Sciences. 2012;32:749–60.
- View Article
- Google Scholar
19. Song Y, Shi Z. Distribution and compositions of loess sediments in Yili basin, Central Asia. Scientia Geographica Sinica. 2010;30:267–72.
- View Article
- Google Scholar
20. Ye W, Sang C, Zhao Z. Spatial-temporal distribution of loess and source of dust in Xinjiang. Journal of Desert Research. 2003;:38–44.
- View Article
- Google Scholar
21. Song Y, Chen X, Qian L, Li C, Li Y, Li X, et al. Distribution and composition of loess sediments in the Ili Basin, Central Asia. Quaternary International. 2014;334–335:61–73.
- View Article
- Google Scholar
22. Chou Y-M, Polansky AM, Mason RL. Transforming non-normal data to normality in statistical process control. Journal of Quality Technology. 1998;30(2):133–41.
- View Article
- Google Scholar
23. Pearson K. Notes on the history of correlation. Biometrika. 1920;13(1):25–45.
- View Article
- Google Scholar
24. Curran-Everett D. Explorations in statistics: hypothesis tests and P values. Adv Physiol Educ. 2009;33(2):81–6. pmid:19509391
- View Article
- PubMed/NCBI
- Google Scholar
25. Curran-Everett D. Explorations in statistics: correlation. Advances in Physiology Education. 2010;34:186–91.
- View Article
- Google Scholar
26. Breiman L. Random forests. Machine learning. 2001;45:5–32.
- View Article
- Google Scholar
27. Josso P, Hall A, Williams C, Le Bas T, Lusty P, Murton B. Application of random-forest machine learning algorithm for mineral predictive mapping of Fe-Mn crusts in the World Ocean. Ore Geology Reviews. 2023;162:105671.
- View Article
- Google Scholar
28. Li X. Using “random forest” for classification and regression. Chinese Journal of Applied Entomology. 2013;50:1190–7.
- View Article
- Google Scholar
29. Yao D, Yang J, Zhan X. Feature selection algorithm based on random forest. Journal of Jilin University (Engineering and Technology Edition). 2014;44: 137–41.
- View Article
- Google Scholar
30. Marques F. PC. Confidence intervals for the random forest generalization error. Pattern Recognition Letters. 2022;158:171–5.
- View Article
- Google Scholar
31. Borup D, Christensen BJ, Mühlbach NS, Nielsen MS. Targeting predictors in random forest regression. International Journal of Forecasting. 2023;39(2):841–68.
- View Article
- Google Scholar
32. He Y, Deng W. The compare between three training algorighms of multilayer perceptorns. Journal of Suzhou University (Engineering Science Edition). 2008:1–3.
- View Article
- Google Scholar
33. Wang Z, Deng W. Journal of Qinghai Normal University (Natural Science Edition). 2007:37–9.
- View Article
- Google Scholar
34. Pal SK, Mitra S. Multilayer perceptron, fuzzy sets, and classification. IEEE Trans Neural Netw. 1992;3(5):683–97. pmid:18276468
- View Article
- PubMed/NCBI
- Google Scholar
35. Zhu Y, Xiong W, Fan W, Wu C. Predicting macro-mechanical properties of loess from basic physical properties using various machine learning methods. Environ Earth Sci. 2025;84(10).
- View Article
- Google Scholar
36. Ministry of Housing and Urban-Rural Development of the People’s Republic of China. Code for Design of Building Foundation. Peking: China Construction Industry Press; 2011.
37. Zhang L, Qi S, Ma L, Guo S, Li Z, Li G, et al. Three-dimensional pore characterization of intact loess and compacted loess with micron scale computed tomography and mercury intrusion porosimetry. Sci Rep. 2020;10(1):8511. pmid:32444623
- View Article
- PubMed/NCBI
- Google Scholar

[ref1] 1. Jiang MJ, Li T, Hu HJ, Thornton C. DEM analyses of one-dimensional compression and collapse behaviour of unsaturated structural loess. Comput Geotech. 2014;60:47–60.
View Article
Google Scholar

[2] View Article

[3] Google Scholar

[ref2] 2. Mu QY, Zhou C, Ng CWW. Compression and wetting induced volumetric behavior of loess: Macro- and micro-investigations. Transportation Geotechnics. 2020;23:100345.
View Article
Google Scholar

[5] View Article

[6] Google Scholar

[ref3] 3. Chen L, Chen K a i, He G, Liu Z. Prediction model of loess collapsibility in Gongliu County of Ili River Valley. Journal of Engineering Geology. 2023;31:1282–92.
View Article
Google Scholar

[8] View Article

[9] Google Scholar

[ref4] 4. Yuan K, Li Z, Jin L. Correlation of loess compress modulus and collapsible. Journal of Liaoning Technical University (Natural Science). 2013;32:1480–3.
View Article
Google Scholar

[11] View Article

[12] Google Scholar

[ref5] 5. Tao J. Experimental research on structure and compression deformation characteristics of Malan loess. Chang’an University; 2020.
View Article
Google Scholar

[14] View Article

[15] Google Scholar

[ref6] 6. Chang C, Bo J, Li X, Qiao G, Yan D. A BP neural network model for forecasting sliding distance of seismic loess landslides. China Earthquake Engineering Journal. 2020;42:1609–14.
View Article
Google Scholar

[17] View Article

[18] Google Scholar

[ref7] 7. Mu Q, Song T, Lu Z, Xiao T, Zhang L. Evaluation of the collapse susceptibility of loess using machine learning. Transportation Geotechnics. 2024;48:101327.
View Article
Google Scholar

[20] View Article

[21] Google Scholar

[ref8] 8. Chen L, Chen K, He G, Liu Z. Research on the prediction model of loess collapsibility in Xinyuan County, Ili River Valley Area. Water. 2023;15(21):3786.
View Article
Google Scholar

[23] View Article

[24] Google Scholar

[ref9] 9. Xu Y. A 1D compression model for loess based on disturbed state concept. RCMA. 2019;29(2):125–9.
View Article
Google Scholar

[26] View Article

[27] Google Scholar

[ref10] 10. Anagnostopoulos CA, Grammatikopoulos IN. A new model for the prediction of secondary compression index of soft compressible soils. Bull Eng Geol Env. 2011;70:423–7.
View Article
Google Scholar

[29] View Article

[30] Google Scholar

[ref11] 11. Shi B, Ni W, Wang Y, Li Z, Yuan Z. A model for calculating the compressive deformation of remolded loess. Rock and Soil Mechanics. 2016;37:1963–8.
View Article
Google Scholar

[32] View Article

[33] Google Scholar

[ref12] 12. Huang J, Li X, Teng H. Evaluation model of loess collapsibility based on the partial least squares method. Journal of Catastrophology. 2021;36:60–4.
View Article
Google Scholar

[35] View Article

[36] Google Scholar

[ref13] 13. Ma Y, Wang J, Peng S, Li B. Relationships between physical-mechanical parameters and collapsibility of loess soil and its prediction model. Bulletin of Soil and Water Conservation. 2016;36:120–8.
View Article
Google Scholar

[38] View Article

[39] Google Scholar

[ref14] 14. Zhang M. Study on compression prediction model of unsaturated loess. Highway Traffic Science and Technology (Applied Technology Edition). 2019;15:68–72.
View Article
Google Scholar

[41] View Article

[42] Google Scholar

[ref15] 15. Zhan H, Lin J. Application of BP Neural network in prediction collapsibility coefficient of loess. Soil Eng and Foundation. 2020;34:493–6.
View Article
Google Scholar

[44] View Article

[45] Google Scholar

[ref16] 16. Gao L, Luo G, Yang X. Protoplast fusion of Streptomyces chromogenes Sy20-2 and Streptomyces rimosus SY20-4. Journal of Dalian Minzu University. 2006:24–6.
View Article
Google Scholar

[47] View Article

[48] Google Scholar

[ref17] 17. Wang Y, Zhang A, Zhao Q, Yu C. Effect of soluble salt content on water-holding characteristics of Ili Loess. Chinese Journal of Geotechnical Engineering. 2018;40:212–7.
View Article
Google Scholar

[50] View Article

[51] Google Scholar

[ref18] 18. Jia J, Xia D, Wang B, Li G, Zhao H, Liu X. The comparison between loess plateau and ili loess magnetic properties and their implications. Quaternary Sciences. 2012;32:749–60.
View Article
Google Scholar

[53] View Article

[54] Google Scholar

[ref19] 19. Song Y, Shi Z. Distribution and compositions of loess sediments in Yili basin, Central Asia. Scientia Geographica Sinica. 2010;30:267–72.
View Article
Google Scholar

[56] View Article

[57] Google Scholar

[ref20] 20. Ye W, Sang C, Zhao Z. Spatial-temporal distribution of loess and source of dust in Xinjiang. Journal of Desert Research. 2003;:38–44.
View Article
Google Scholar

[59] View Article

[60] Google Scholar

[ref21] 21. Song Y, Chen X, Qian L, Li C, Li Y, Li X, et al. Distribution and composition of loess sediments in the Ili Basin, Central Asia. Quaternary International. 2014;334–335:61–73.
View Article
Google Scholar

[62] View Article

[63] Google Scholar

[ref22] 22. Chou Y-M, Polansky AM, Mason RL. Transforming non-normal data to normality in statistical process control. Journal of Quality Technology. 1998;30(2):133–41.
View Article
Google Scholar

[65] View Article

[66] Google Scholar

[ref23] 23. Pearson K. Notes on the history of correlation. Biometrika. 1920;13(1):25–45.
View Article
Google Scholar

[68] View Article

[69] Google Scholar

[ref24] 24. Curran-Everett D. Explorations in statistics: hypothesis tests and P values. Adv Physiol Educ. 2009;33(2):81–6. pmid:19509391
View Article
PubMed/NCBI
Google Scholar

[71] View Article

[72] PubMed/NCBI

[73] Google Scholar

[ref25] 25. Curran-Everett D. Explorations in statistics: correlation. Advances in Physiology Education. 2010;34:186–91.
View Article
Google Scholar

[75] View Article

[76] Google Scholar

[ref26] 26. Breiman L. Random forests. Machine learning. 2001;45:5–32.
View Article
Google Scholar

[78] View Article

[79] Google Scholar

[ref27] 27. Josso P, Hall A, Williams C, Le Bas T, Lusty P, Murton B. Application of random-forest machine learning algorithm for mineral predictive mapping of Fe-Mn crusts in the World Ocean. Ore Geology Reviews. 2023;162:105671.
View Article
Google Scholar

[81] View Article

[82] Google Scholar

[ref28] 28. Li X. Using “random forest” for classification and regression. Chinese Journal of Applied Entomology. 2013;50:1190–7.
View Article
Google Scholar

[84] View Article

[85] Google Scholar

[ref29] 29. Yao D, Yang J, Zhan X. Feature selection algorithm based on random forest. Journal of Jilin University (Engineering and Technology Edition). 2014;44: 137–41.
View Article
Google Scholar

[87] View Article

[88] Google Scholar

[ref30] 30. Marques F. PC. Confidence intervals for the random forest generalization error. Pattern Recognition Letters. 2022;158:171–5.
View Article
Google Scholar

[90] View Article

[91] Google Scholar

[ref31] 31. Borup D, Christensen BJ, Mühlbach NS, Nielsen MS. Targeting predictors in random forest regression. International Journal of Forecasting. 2023;39(2):841–68.
View Article
Google Scholar

[93] View Article

[94] Google Scholar

[ref32] 32. He Y, Deng W. The compare between three training algorighms of multilayer perceptorns. Journal of Suzhou University (Engineering Science Edition). 2008:1–3.
View Article
Google Scholar

[96] View Article

[97] Google Scholar

[ref33] 33. Wang Z, Deng W. Journal of Qinghai Normal University (Natural Science Edition). 2007:37–9.
View Article
Google Scholar

[99] View Article

[100] Google Scholar

[ref34] 34. Pal SK, Mitra S. Multilayer perceptron, fuzzy sets, and classification. IEEE Trans Neural Netw. 1992;3(5):683–97. pmid:18276468
View Article
PubMed/NCBI
Google Scholar

[102] View Article

[103] PubMed/NCBI

[104] Google Scholar

[ref35] 35. Zhu Y, Xiong W, Fan W, Wu C. Predicting macro-mechanical properties of loess from basic physical properties using various machine learning methods. Environ Earth Sci. 2025;84(10).
View Article
Google Scholar

[106] View Article

[107] Google Scholar

[ref36] 36. Ministry of Housing and Urban-Rural Development of the People’s Republic of China. Code for Design of Building Foundation. Peking: China Construction Industry Press; 2011.

[ref37] 37. Zhang L, Qi S, Ma L, Guo S, Li Z, Li G, et al. Three-dimensional pore characterization of intact loess and compacted loess with micron scale computed tomography and mercury intrusion porosimetry. Sci Rep. 2020;10(1):8511. pmid:32444623
View Article
PubMed/NCBI
Google Scholar

[110] View Article

[111] PubMed/NCBI

[112] Google Scholar
