An RVFLNs ensemble modeling method integrating PCA and PSO: Application to yield prediction of Nongxiang Baijiu

Qiang Han; Yibo Xu; Suyi Zhang; Qinwen Deng; Lan Deng; Liang Zhang; Hui Qin; Jie Zhao; Bo Liu

doi:10.1371/journal.pone.0348784

Abstract

To investigate the mapping relationship between key process parameters and Baijiu yield during the steaming and distillation process of Baijiu fermented material (SDP-BFM) and to optimize these parameters for enhanced production efficiency, a Random Vector Functional Link Networks (RVFLNs) ensemble modeling method integrating Principal Component Analysis (PCA) and Particle Swarm Optimization (PSO) is proposed for yield prediction of Nongxiang Baijiu. First, to improve computational efficiency and avoid multicollinearity, PCA is applied to reduce the dimensionality of the high-dimensional hidden layer output matrix of RVFLNs, following data cleaning and feature selection. Second, a PSO algorithm is introduced to optimize both the number of hidden layer nodes in sub-learners and the weight combination strategy of the ensemble linear regression method, ultimately achieving Baijiu yield prediction modeling based on the PSO-P-ERVFLNs algorithm. Comparative experiments demonstrate that the optimization strategy introduced by PSO can enhance the prediction accuracy of the RVFLNs algorithm and alleviate overfitting. Moreover, the proposed algorithm exhibits better computational efficiency and higher estimation accuracy, enabling accurate prediction of Baijiu yield during the SDP-BFM.

Citation: Han Q, Xu Y, Zhang S, Deng Q, Deng L, Zhang L, et al. (2026) An RVFLNs ensemble modeling method integrating PCA and PSO: Application to yield prediction of Nongxiang Baijiu. PLoS One 21(5): e0348784. https://doi.org/10.1371/journal.pone.0348784

Editor: Sheng Du, China University of Geosciences, CHINA

Received: May 19, 2025; Accepted: April 21, 2026; Published: May 6, 2026

Copyright: © 2026 Han et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and Supporting Information files.

Funding: This research was funded by the Luzhou Laojiao Co., Ltd. Horizontal Technology Projects [Grant No. HX2024046], Solid-state Brewing Technology Innovation Center of Sichuan [Grant No. GNLMLX202504] and Postgraduate Innovation Fund Project of Sichuan University of Science and Engineering [Grant No. Y2024267].

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

Chinese Baijiu, recognized as one of the world’s seven major distilled spirits, is a traditional liquor with a millennia-old heritage and unique fermentation craftsmanship. The production process of traditional solid-state fermented Baijiu involves three key stages: qu-making (including grinding materials, brick pressing, chamber placement, and regular turning), liquor making (mixing ingredients, steaming, spreading, adding qu, anaerobic pit fermentation, and distillation), followed by extended aging and expert blending [1,2]. In the Baijiu brewing process, the steaming and distillation process of Baijiu fermented material (SDP-BFM) is a critical step for extracting flavor compounds from the fermented grains [3–5]. This step largely determines the quality of the base liquor and the final distillation yield.

Therefore, developing a data-driven model to predict Baijiu yield during the SDP-BFM is of great significance for exploring the relationship between key process parameters and production yield and for optimizing these parameters. However, most existing studies have focused on the impact of individual production indicators on Baijiu quality [6–8], with limited research comprehensively analyzing all key factors that influence Baijiu yield during the SDP-BFM [9].

Moreover, the SDP-BFM involves complex mechanisms, exhibiting strong nonlinearity and variable coupling [10]. Data-driven modeling methods do not require prior knowledge of these internal complexities and can construct predictive models solely through machine learning algorithms and industrial data processing [11], making such approaches a recent research hotspot in industrial process modeling [12–14]. For example, reference [15] used an artificial neural network to predict daily oil production based on operational parameters. Reference [16] developed a shale gas production forecast model using geological and engineering parameters. Reference [17] combined neuro-fuzzy inference with gene expression programming to construct a hybrid prediction model for electricity consumption, improving convergence speed and robustness while reducing computation time and model error. Given the effectiveness of these data-driven methods, building a yield prediction model for Baijiu represents a critical step toward full automation in Baijiu production.

In recent years, Random Vector Functional Link Networks (RVFLNs) have gained popularity in industrial prediction due to their ability to handle complex nonlinear data, fast training speed, and strong generalization performance, effectively addressing issues of traditional neural networks such as long training time, sensitive hyperparameter tuning, and local minima [18–20]. For instance, reference [19] applied an improved RVFLNs with fixed hidden nodes to model molten iron quality, while reference [20] used an ensemble of RVFLNs with randomly generated hidden nodes within a certain range. Although these methods improved model performance to some extent, they did not optimize the number of hidden nodes in RVFLNs, leaving model complexity uncontrolled and potentially reducing prediction accuracy and generalization ability.

To address these issues, this study uses real-time operational data from an automated distillation system in a distillery along with Baijiu yield, aiming to solve two key problems: (1) extracting key variables relevant to the SDP-BFM through feature selection methods, and (2) optimally selecting the number of hidden nodes in RVFLNs using intelligent optimization algorithms. First, to ensure model performance and reduce modeling complexity, feature selection is conducted by comparing the modeling performance of different feature selection methods, based on initial data cleaning and statistical analysis. Second, the number of hidden layer nodes is determined through comparative analysis of how modeling performance indicators vary with the number of nodes, thereby defining the sub-learners of the ensemble model. Finally, an intelligent optimization algorithm is employed to optimize both the hidden layer node number of sub-learners and the weight combination strategy of the ensemble linear regression method, ultimately establishing a Baijiu yield forecasting model based on the PSO-P-ERVFLNs algorithm. The aim is to accurately predict Baijiu yield during the SDP-BFM by exploring the complex mapping relationship between process parameters of the SDP-BFM and Baijiu yield.

2. Research methodology

2.1. RVFLNs base learners

RVFLNs were originally proposed by Pao and Takefuji [21] as a novel single-hidden-layer feedforward neural network architecture. The defining characteristic of RVFLNs lies in their randomized parameter initialization: both input weights and hidden layer biases are randomly assigned within specified bounds, while output weights are analytically determined through pseudo-inverse based least squares estimation. This unique design endows RVFLNs with several notable advantages, including rapid training speed, excellent generalization capability, and inherent suitability for robust modeling applications.
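The training scheme described above (random hidden parameters, analytically determined output weights) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names, the synthetic data, the sigmoid activation, and the choice of 20 hidden nodes are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rvfln(X, y, n_hidden, rng, scale=1.0):
    """Fit a single-hidden-layer network with fixed random hidden parameters."""
    n_features = X.shape[1]
    # Input weights and biases are drawn once from [-scale, scale] and never trained.
    W = rng.uniform(-scale, scale, size=(n_features, n_hidden))
    b = rng.uniform(-scale, scale, size=n_hidden)
    H = sigmoid(X @ W + b)          # hidden layer output matrix, shape (N, n_hidden)
    beta = np.linalg.pinv(H) @ y    # least-squares output weights via pseudo-inverse
    return W, b, beta

def predict_rvfln(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Synthetic regression data (illustrative only, not the distillation dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = np.sin(2.0 * X[:, 0]) + X[:, 1] * X[:, 2]

W, b, beta = train_rvfln(X, y, n_hidden=20, rng=rng)
rmse = float(np.sqrt(np.mean((predict_rvfln(X, W, b, beta) - y) ** 2)))
```

Because only `beta` is estimated, training reduces to one linear least-squares solve, which is where the speed advantage mentioned above comes from.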

For N distinct arbitrary sample sets:

(1)

A single-hidden-layer feedforward neural network with hidden nodes and activation function can be mathematically expressed as:

(2)

where represents the weight matrix between the i-th hidden layer node and the input layer neurons, denotes the output weight matrix between the i-th hidden layer node and the output nodes, and is the bias term of the i-th hidden layer node.

Training traditional RVFLNs is equivalent to minimizing the following cost function:

(3)
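Equations (1) to (3) can be written in the form commonly used for this class of networks. The symbols below ($\mathbf{x}_j$, $\mathbf{t}_j$, $n$, $m$, $L$, $g$, $\mathbf{w}_i$, $\boldsymbol{\beta}_i$, $b_i$) are assumed notation chosen to match the definitions in the text and may differ from the authors' own:

```latex
\begin{align}
&\{(\mathbf{x}_j, \mathbf{t}_j)\}_{j=1}^{N}, \qquad
  \mathbf{x}_j \in \mathbb{R}^{n},\ \mathbf{t}_j \in \mathbb{R}^{m} \tag{1} \\
&f_L(\mathbf{x}_j) = \sum_{i=1}^{L} \boldsymbol{\beta}_i \,
  g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i), \qquad j = 1, \dots, N \tag{2} \\
&\min_{\boldsymbol{\beta}} \sum_{j=1}^{N}
  \left\| f_L(\mathbf{x}_j) - \mathbf{t}_j \right\|^{2} \tag{3}
\end{align}
```

Here $L$ is the number of hidden nodes and $g$ the activation function. If $\mathbf{H}$ denotes the $N \times L$ hidden layer output matrix with entries $g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i)$ and $\mathbf{T}$ the stacked targets, the least-squares solution is $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T}$, where $\mathbf{H}^{\dagger}$ is the Moore–Penrose pseudo-inverse, consistent with the pseudo-inverse estimation described in Section 2.1.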

2.2. PCA-based dimensionality reduction of RVFLNs hidden layer outputs

In RVFLNs, the randomness of input weights and hidden layer biases leads to multicollinearity in the hidden layer output matrix. This results in numerous redundant or low-contribution neurons, which not only complicate the network architecture but also reduce computational efficiency [22]. To address this issue, reference [23] applied PCA to reduce the dimensionality of the hidden layer output matrix of RVFLNs, achieving significantly improved results. The structure of this P-RVFLNs algorithm is illustrated in Fig 1.

Fig 1. Architecture diagram of the principal component analysis (PCA)-enhanced RVFLNs algorithm (P-RVFLNs).

https://doi.org/10.1371/journal.pone.0348784.g001

The PCA-based dimensionality reduction process mainly involves the following steps: computing the covariance matrix, calculating its eigenvalues and eigenvectors, and determining the variance contribution rate and cumulative variance contribution rate of the principal components in order to select them. A primary objective of PCA is to reduce the number of variables while retaining as much information from the original dataset as possible. Typically, the top D principal components whose cumulative contribution rate lies between 85% and 95% are selected, as shown in Equation (4):

(4)

Rewriting the above in matrix form:

(5)

where , .
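The PCA steps described in this section can be sketched as follows for a hidden layer output matrix. This is an illustrative sketch under stated assumptions: the function name `pca_reduce`, the 0.90 threshold (one value inside the 85% to 95% range mentioned above), and the synthetic collinear matrix are not taken from the source.

```python
import numpy as np

def pca_reduce(H, cum_threshold=0.90):
    """Project H onto the leading principal components that reach cum_threshold."""
    mean = H.mean(axis=0)
    Hc = H - mean                                   # center each hidden-node output
    cov = np.cov(Hc, rowvar=False)                  # covariance matrix of the columns
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # reorder to descending
    eigvals = np.clip(eigvals[order], 0.0, None)    # guard against tiny negative values
    eigvecs = eigvecs[:, order]
    cum_ratio = np.cumsum(eigvals) / eigvals.sum()  # cumulative variance contribution rate
    D = int(np.searchsorted(cum_ratio, cum_threshold) + 1)
    P = eigvecs[:, :D]                              # projection matrix, shape (L, D)
    return Hc @ P, P, mean, cum_ratio

# Synthetic stand-in for a hidden layer output matrix (illustrative only):
# 12 strongly collinear columns built from 3 latent directions plus small noise.
rng = np.random.default_rng(1)
base = rng.normal(size=(300, 3))
H = base @ rng.normal(size=(3, 12)) + 0.01 * rng.normal(size=(300, 12))

Z, P, mean, cum_ratio = pca_reduce(H, cum_threshold=0.90)
```

In a P-RVFLNs-style learner, the output weights would then be estimated from the reduced matrix `Z` rather than from `H`, so the least-squares problem is smaller and its columns are uncorrelated.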

2.3. Particle swarm optimization (PSO) algorithm

The Particle Swarm Optimization (PSO) algorithm is a heuristic global optimization technique. It operates by simulating a population of particles, each characterized by its own velocity and position in the search space. Through a combination of individual experience and group learning, the particles continuously adjust their movement, thereby progressively converging toward optimal solutions [24].

The implementation of the PSO algorithm begins with the initialization of a swarm of particles with random positions and velocities. For each particle, the fitness value of its current position is computed and used to update its personal best known position. Subsequently, both the individual particle’s best position and the global best position found by the entire population are utilized to update the velocity and position of each particle, guiding their trajectories toward promising regions of the search space.

The velocity update equation and the position update equation of the PSO algorithm are given in Equation (6) and Equation (7), respectively:

(6)

(7)

where denotes the position of the i-th particle at the t-th iteration, represents its velocity at the t-th iteration, refers to the personal best position of the i-th particle, and indicates the global best position. is the inertia weight, while and are the cognitive and social learning factors, respectively. is a random number between 0 and 1. By continuously adjusting the positions and velocities of particles, the PSO algorithm can effectively search for the optimal solution to a problem [25].
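The update rules described above correspond to the standard PSO formulation, sketched below. The parameter values (`w=0.7`, `c1=c2=1.5`), the swarm size, the bound clipping, and the test objective are assumptions made for illustration, not settings reported in the source.

```python
import random

def pso_minimize(f, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box using the standard velocity and position updates."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds and small random initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.1 * rng.uniform(-(hi - lo), hi - lo) for lo, hi in bounds]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive term + social term.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Position update, clipped to the search bounds.
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Toy objective with its minimum at (1, 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)

best, best_val = pso_minimize(sphere, bounds=[(-5.0, 5.0)] * 2)
```

For an integer decision variable such as a hidden node count, one common option is to round each coordinate before evaluating the objective; whether the authors do this is not stated.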

2.4. Algorithm structure and implementation steps

An ensemble model is a machine learning approach that combines the predictions of multiple base learners to improve model performance and robustness. Compared with a single learner, ensemble models offer higher prediction accuracy and stronger generalization capability [26,27]. The most common ensemble algorithms include Bagging, Stacking, and Boosting. The core idea of Bagging is to generate multiple training subsets from the original dataset through bootstrap sampling, train a base learner on each subset, and finally integrate the regression results of these learners to obtain the final prediction. This approach can effectively enhance prediction accuracy and reduce overfitting. Therefore, to ensure the reliability and stability of Baijiu yield prediction during the SDP-BFM, an RVFLNs ensemble modeling method integrating PCA and PSO is proposed. The overall structure is shown in Fig 2.

Fig 2. Structure of the PSO-P-ERVFLNs (an RVFLNs ensemble modeling method integrating PCA and PSO) Algorithm.

https://doi.org/10.1371/journal.pone.0348784.g002

Finally, the implementation steps of the PSO-P-ERVFLNs algorithm-based modeling are as follows:

1. Given the initial dataset, perform data cleaning and statistical analysis, and conduct feature selection by comparing the modeling performance of different feature selection methods.
2. Determine the sub-learners (P-RVFLNs) of the ensemble model by analyzing how modeling performance metrics vary with the number of hidden layer nodes.
3. Divide the cleaned and feature-selected dataset into a training set and a testing set. Apply bootstrap sampling to the training set to generate N training subsets (here, N = 15).
4. Train 15 P-RVFLNs-based sub-learners on the training subsets. The sub-learners are generated by perturbing the base learner: five P-RVFLNs sub-learners are built for each of the Sigmoid, ReLU, and Hardlim activation functions. The weights and biases of each sub-learner are randomly selected within the range [−1, 1], and the number of hidden layer nodes is optimized using the PSO algorithm.
5. Use all sub-learners to predict the testing set. Optimize the weight combination strategy of the ensemble linear regression method with PSO, take the weighted sum of the sub-learners' predictions as the final result, and evaluate the model performance.
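The bootstrap sampling and weighted combination in the steps above can be sketched as follows. The helper names, the weight normalization, and the stand-in predictions are assumptions for illustration; the actual sub-learners and the PSO-optimized weights are not reproduced here.

```python
import random

def bootstrap_indices(n_samples, n_subsets, seed=0):
    """Draw n_subsets index lists of length n_samples, sampling with replacement."""
    rng = random.Random(seed)
    return [[rng.randrange(n_samples) for _ in range(n_samples)]
            for _ in range(n_subsets)]

def weighted_ensemble(predictions, weights):
    """Combine per-learner predictions (one list per learner) into a weighted sum."""
    total = sum(weights)
    norm = [w / total for w in weights]   # normalization is an assumption of this sketch
    n_points = len(predictions[0])
    return [sum(norm[k] * predictions[k][j] for k in range(len(predictions)))
            for j in range(n_points)]

# 15 sub-learner slots: five per activation function, each paired with one bootstrap subset.
activations = [act for act in ("sigmoid", "relu", "hardlim") for _ in range(5)]
subsets = bootstrap_indices(n_samples=100, n_subsets=len(activations))
plan = list(zip(activations, subsets))

# Stand-in predictions from three hypothetical learners on two test points.
combined = weighted_ensemble([[10.0, 20.0], [12.0, 18.0], [14.0, 22.0]],
                             [1.0, 1.0, 2.0])
```

In the full method, the weight vector passed to `weighted_ensemble` would be one of the quantities searched by PSO, with a prediction error such as RMSE as the fitness value.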

3. Experimental results and discussion

This study collected data on Baijiu production process parameters from an automated brewing workshop of a distillery in Luzhou City, Sichuan Province, during November and December 2024. Based on the brewing technology and the configuration of relevant instruments, 12 key process variables affecting Baijiu yield during the SDP-BFM were identified. The characteristics of the corresponding dataset are summarized in Table 1.

Table 1. Dataset Feature Descriptions.

https://doi.org/10.1371/journal.pone.0348784.t001

3.1. Data cleaning

First, the collected data were segmented by individual distillation batch according to the Baijiu brewing process, yielding 483 samples, each comprising production process parameters and the corresponding Baijiu yield value. Subsequently, the Interquartile Range (IQR) method was employed to detect outliers [28], identifying a total of 44 anomalous data points, which were visualized using histogram plots. Missing values were then imputed using the K-Nearest Neighbors algorithm. Finally, the statistical analysis results after outlier detection and Min–Max normalization are shown in Fig 3.

Fig 3. Data Processing Results.

(A) presents histograms of the four features with the most detected outliers (V2, V3, V7, and V8), which illustrates the effectiveness of the IQR method in detecting outliers. (B) provides a statistical summary of the dataset. These data processing methods improve the accuracy of data analysis and provide a reliable foundation for subsequent modeling.

https://doi.org/10.1371/journal.pone.0348784.g003
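The IQR detection and Min–Max normalization steps of Section 3.1 can be sketched as follows. The 1.5 × IQR fence is the conventional choice and is assumed here, as are the function names and the sample values. The K-Nearest Neighbors imputation step is not reproduced; the sketch simply normalizes the values that were not flagged.

```python
import statistics

def iqr_outlier_mask(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [not (lo <= v <= hi) for v in values]

def min_max_normalize(values):
    """Scale values linearly to the interval [0, 1]."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    return [(v - vmin) / span if span else 0.0 for v in values]

# Hypothetical readings for one process variable, with one obvious anomaly.
readings = [98.1, 98.4, 98.2, 98.6, 98.3, 98.5, 120.0, 98.2]
mask = iqr_outlier_mask(readings)
kept = [v for v, flagged in zip(readings, mask) if not flagged]
scaled = min_max_normalize(kept)
```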

3.2. Optimal feature selection

The selection of feature variables is crucial for the predictive performance of the constructed model [29]. In this study, Pearson correlation analysis was first conducted to evaluate feature relevance. Features with correlation coefficients below 0.2 relative to the target variable were eliminated to exclude low-relevance variables, while those exhibiting inter-correlation above 0.8 were removed to mitigate multicollinearity. Subsequently, a Random Forest regression model optimized via grid search was employed to rank feature importance. The top features, equal in number to those retained after Pearson filtering, were selected as the final feature subset.
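The two Pearson thresholds described above (0.2 for relevance to the target, 0.8 for inter-feature correlation) can be sketched as a simple filter. Two details are assumptions of this sketch, since the source does not state them: absolute correlation values are used, and when two features are highly inter-correlated, the one less correlated with the target is dropped. The feature names and values are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def pearson_filter(features, target, min_target_corr=0.2, max_inter_corr=0.8):
    """features: dict name -> list of values. Returns the retained feature names."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    # Keep features sufficiently correlated with the target, most relevant first.
    candidates = sorted((n for n in relevance if relevance[n] >= min_target_corr),
                        key=lambda n: relevance[n], reverse=True)
    kept = []
    for name in candidates:
        # Drop a feature if it is highly correlated with one already kept.
        if all(abs(pearson(features[name], features[k])) <= max_inter_corr for k in kept):
            kept.append(name)
    return kept

target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = {
    "F1": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],   # strongly related to the target
    "F2": [1.1, 2.0, 2.9, 4.2, 5.1, 5.8],     # nearly duplicates F1
    "F3": [2.0, 1.0, 3.0, 3.0, 1.0, 2.0],     # unrelated to the target
    "F4": [1.0, 3.0, 2.0, 3.0, 2.0, 4.0],     # moderately related to the target
}
selected = pearson_filter(features, target)
```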

The feature sets obtained from the Pearson method, the Random Forest algorithm, as well as the full unselected feature set, were each used as input to individual RVFLNs base learners for modeling. The weights and biases of the hidden layer in each base learner were randomly initialized within the interval [−1, 1] following reference [30]. The number of hidden nodes was uniformly set to 20, and the sigmoid function was adopted as the activation function. The dataset was split into training and testing sets with a ratio of 8:2. The results of the feature optimization process are summarized in Table 2, and a statistical overview of the dataset after feature optimization using Random Forest is provided in Fig 4.

Table 2. Results of the Feature Optimization.

https://doi.org/10.1371/journal.pone.0348784.t002

Fig 4. Histogram of Statistical Analysis of the Dataset after Feature Selection by the Random Forest Algorithm.

https://doi.org/10.1371/journal.pone.0348784.g004

As shown in Table 2, the feature set selected by the Random Forest algorithm yielded the best performance, achieving the lowest Root Mean Square Error (RMSE) and the highest coefficient of determination (R²). Therefore, this feature set was chosen as the input for subsequent modeling. Compared with the unselected feature set, the results demonstrate that appropriate variable selection and screening are essential for constructing accurate and reliable prediction models.

3.3. Baijiu yield prediction experiments during the SDP-BFM

After data cleaning and feature selection, the proposed PSO-P-ERVFLNs algorithm is implemented based on bootstrap sampling for sample perturbation, with additional parameter perturbations applied to the base learners. The specific strategies are as follows: (1) the feature set selected by the random forest is used as the input of the subsequent model, with the training and testing sets divided at a ratio of 8:2; (2) the weights and biases of each sub-model’s hidden layer are randomly selected within the range of [−1, 1], following the approach in [30]; (3) this study investigates the changes in RMSE of the training and testing sets for both the RVFLNs and P-RVFLNs algorithms as the number of hidden layer nodes increases, in order to examine the algorithm’s ability to address overfitting, as shown in Fig 5.

Fig 5. Sensitivity Analysis of Hidden Layer Node Count.

Curve of RMSE versus Number of Hidden Layer Nodes for RVFLNs and P-RVFLNs Sub-learners with Different Activation Functions.

https://doi.org/10.1371/journal.pone.0348784.g005

As shown in Fig 5(A), with the increase in the number of hidden layer nodes, the RVFLNs algorithm exhibits overfitting: the RMSE of the training set shows a downward trend, while that of the testing set shows an upward trend. In contrast, Fig 5(B) illustrates that, except for the Gaussian activation function, the RMSEs of both the training and testing sets for the P-RVFLNs algorithm decrease and eventually stabilize as the number of hidden layer nodes increases. This improvement can be attributed to the PCA technique introduced in P-RVFLNs, which reduces the dimensionality of the high-dimensional hidden layer output matrix.

Meanwhile, to reduce the influence of activation function selection on model performance, only P-RVFLNs sub-learners with Sigmoid, ReLU, and Hardlim activation functions were retained, with five sub-learners of each type, giving a total of 15 P-RVFLNs sub-learners. The PSO algorithm was then applied to optimize both the number of hidden layer nodes of the P-RVFLNs sub-learners and the weight combination strategy of the ensemble linear regression method, ultimately achieving Baijiu yield prediction modeling based on the PSO-P-ERVFLNs algorithm. Fig 6 presents the prediction performance of each P-RVFLNs sub-model and the ensemble model before PSO optimization.

Fig 6. Modeling Effect Comparison Chart.

(A) Performance of each sub-model. (B) Performance of the ensemble model. (C) Comparative analysis of model error curves.

https://doi.org/10.1371/journal.pone.0348784.g006

Fig 6(A) shows the prediction performance of each P-ERVFLNs sub-model and the ensemble model after optimization with the particle swarm optimization algorithm. Fig 6(B) presents the prediction results of the ensemble model based on PSO-P-ERVFLNs. Fig 6(C) compares the probability density functions (PDFs) of prediction errors before and after optimization to verify the effectiveness of the proposed PSO-based optimization. From the error PDFs, it can be observed that the liquor yield prediction model optimized with the PSO algorithm achieves higher accuracy.

Although the PSO-P-ERVFLNs model can establish the mapping relationship between process parameters and Baijiu yield, it lacks interpretability. Therefore, the SHapley Additive exPlanations (SHAP) method was applied for further analysis, and the results are shown in Fig 7. Fig 7(A) presents the feature importance bar chart, indicating the influence level of each parameter, while Fig 7(B) shows the SHAP scatter plot, illustrating the influence range and pattern of each parameter. As shown in Fig 7, during the SDP-BFM, variable V5 has the greatest impact on Baijiu yield, followed by V10, V7, V9, and V2.

Fig 7. Results of the SHapley Additive exPlanations (SHAP) Model Analysis.

(A) Reflects the overall impact intensity of each parameter on the model output; the higher the value, the greater the impact. (B) Shows the distribution of SHAP values for each parameter, indicating the direction and range of each parameter’s influence on the prediction results; the magnitude reflects the strength of the positive contribution.

https://doi.org/10.1371/journal.pone.0348784.g007

Finally, to further assess the robustness of the model and verify its generalization capability, ten-fold cross-validation was conducted to evaluate the performance of the PSO-P-ERVFLNs model [31,32], and a sensitivity analysis was performed by adding Gaussian noise of different intensities to the test set. The results of the ten-fold cross-validation and sensitivity analysis are shown in Table 3.

Table 3. Results of Ten-Fold Cross-Validation and Sensitivity Analysis.

https://doi.org/10.1371/journal.pone.0348784.t003

As shown in Table 3, after applying ten-fold cross-validation, the RMSE has an average value of 1.2611 with a standard deviation of 0.1526, indicating small error fluctuations across the folds. The MAPE is 4.59%, with a standard deviation of less than 0.5%, suggesting that the model’s average relative error is very low and its stability is high. The average coefficient of determination is approximately 0.83, with a standard deviation of only 0.0466, demonstrating good generalization consistency of the model.

In addition, under different levels of Gaussian noise intensity, the variations in model performance metrics remain minimal, with no significant performance fluctuations observed. This indicates that the model’s prediction results are stable and not sensitive to input perturbations. Therefore, the PSO-P-ERVFLNs model exhibits strong robustness and noise resistance, making it suitable for liquor yield prediction tasks that involve certain measurement errors.
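The two evaluation procedures discussed above, ten-fold cross-validation and perturbing the test inputs with Gaussian noise, can be sketched as follows. The helper names, the shuffling scheme, and the noise level `sigma=0.05` are assumptions for illustration; the source does not give the noise intensities in the text.

```python
import random

def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle indices once and split them into k nearly equal test folds."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def add_gaussian_noise(rows, sigma, seed=0):
    """Return a copy of rows with zero-mean Gaussian noise of std sigma added."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in rows]

# 483 is the number of samples reported in Section 3.1.
folds = kfold_indices(483, k=10)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(483) if i not in held_out]
    # A model would be fit on train_idx and scored on test_idx here.

# Hypothetical normalized test rows perturbed for the sensitivity check.
noisy = add_gaussian_noise([[0.2, 0.5], [0.7, 0.1]], sigma=0.05)
```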

3.4. Comparative experiments

To further verify the effectiveness and superiority of the proposed algorithm in Baijiu yield prediction during the SDP-BFM, the PSO-P-ERVFLNs algorithm is compared with the P-RVFLNs algorithm, P-ERVFLNs algorithm, GA-P-ERVFLNs algorithm, HO-P-ERVFLNs algorithm, as well as the Back Propagation Neural Network (BPNN), LightGBM algorithm, CatBoost algorithm and XGBoost algorithm, using the same dataset for predictive experiments. Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and the coefficient of determination (R²) are employed to evaluate the prediction performance of models constructed by different algorithms. The results of the comparative experiments are reported as the average values of five independent runs. The parameter settings and evaluation results of the algorithms are summarized in Table 4. The Baijiu yield prediction results modeled by different algorithms are illustrated in Fig 8.
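For reference, the three evaluation metrics named above can be computed as follows. The definitions are the standard ones; the sample values are hypothetical and chosen only to make the arithmetic easy to check.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must not contain zeros."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [20.0, 25.0, 30.0, 25.0]
y_pred = [21.0, 24.0, 29.0, 26.0]
scores = (rmse(y_true, y_pred), mape(y_true, y_pred), r2(y_true, y_pred))
```

Every prediction in this example is off by exactly 1, so the RMSE is 1.0, the residual sum of squares is 4 against a total sum of squares of 50, and R² is 0.92.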

Table 4. Details of Model Parameters and Performance Evaluation Results for Baijiu Yield Prediction with Different Algorithms.

https://doi.org/10.1371/journal.pone.0348784.t004

Fig 8. Prediction Results of Baijiu Yield Modeling with Different Algorithms.

https://doi.org/10.1371/journal.pone.0348784.g008

The number of hidden layer nodes in the P-ERVFLNs algorithm is determined in Section 3.3, based on the RMSE variation curves of RVFLNs and P-RVFLNs sub-learners with the same activation function. Specifically, the top five hidden layer node settings with the lowest RMSE values of the P-RVFLNs sub-learners are selected. For the GA-P-ERVFLNs, HO-P-ERVFLNs, and PSO-P-ERVFLNs algorithms, the number of hidden layer nodes is optimized within the range [1, 200] using the GA, HO, and PSO algorithms, respectively. The hyperparameters of XGBoost, LightGBM and CatBoost are optimized using grid search.

As shown in Table 4 and Fig 8, by comparing the PSO-P-ERVFLNs algorithm with BPNN, XGBoost, LightGBM, and CatBoost, it can be observed that the model constructed by PSO-P-ERVFLNs demonstrates better computational efficiency and higher estimation accuracy. Furthermore, by comparing the RMSE performance metrics of the P-RVFLNs, P-ERVFLNs, GA-P-ERVFLNs, HO-P-ERVFLNs, and PSO-P-ERVFLNs algorithms, it can be observed that incorporating PCA for output matrix dimensionality reduction, together with applying the PSO algorithm to optimize both the hidden layer node number of P-RVFLNs sub-learners and the weight combination strategy of ensemble linear regression, can effectively improve the prediction accuracy of the RVFLNs algorithm and mitigate overfitting issues.

4. Conclusion

In this study, an RVFLNs ensemble modeling method integrating PCA and PSO is proposed to establish a Baijiu yield prediction model for the SDP-BFM. First, the feature variables selected by the Pearson correlation coefficient and random forest algorithm, as well as those without selection, are separately used as inputs to RVFLNs sub-learners for modeling. By comparing the RMSE performance metrics of the models, it is found that the feature set selected by the random forest yields the best results. Second, through comparative analysis of modeling performance metrics with varying numbers of hidden layer nodes, the sub-learners of the ensemble are determined to be P-RVFLNs. Then, on this basis, the PSO algorithm is employed to optimize both the hidden layer node number of sub-learners and the weight combination strategy of ensemble linear regression, ultimately realizing Baijiu yield prediction modeling based on the PSO-P-ERVFLNs algorithm.

Finally, by comparing the RMSE performance metrics of the P-RVFLNs, P-ERVFLNs, GA-P-ERVFLNs, HO-P-ERVFLNs, and PSO-P-ERVFLNs algorithms, it is shown that incorporating PCA for dimensionality reduction of the output matrix, along with PSO for optimizing the hidden layer node number of P-RVFLNs sub-learners and the weight combination strategy of ensemble linear regression, can indeed improve the prediction accuracy of the RVFLNs algorithm and mitigate overfitting to a certain extent. Moreover, the model constructed by the PSO-P-ERVFLNs algorithm demonstrates better computational efficiency and higher estimation accuracy, enabling accurate prediction of Baijiu yield during the SDP-BFM. Nevertheless, although the proposed PSO-P-ERVFLNs algorithm performs well on the experimental dataset, due to limitations in current Baijiu production processes and data collection techniques, it has not been validated on severely nonlinear data. Therefore, its capability to handle highly dynamic or strongly nonlinear data may be limited.

Supporting information

S1 File. Raw Data.

https://doi.org/10.1371/journal.pone.0348784.s001

(CSV)

References

1. Tao W, Chen X, Chen J, Liu L, Zhang T, Wang Q. Chinese Baijiu: The perfect works of microorganisms. Frontiers in Microbiology. 2022;13:919044.
2. Jin GY, Yuan SK, Tang QY, Sun Y, Jin SW, Wu JF. Progress in the intelligentization of Nongxiangxing Baijiu production. Food Ferment Ind. 2024;50:341–8.
3. Shen FY. Comprehensive Manual of Baijiu Production Technology. Beijing: China Light Industry Press; 1998.
4. Li G, Zhang J. Research progress of distilled spirit with solid and liquid distillation technologies—taking baijiu and brandy for example. Chin Brew. 2020;39(7):13–6.
5. Ma Z, Lan H, Zhao D, Wei J, Zheng J, Su J, et al. Widely targeted metabolomics analysis reveals the dynamically changes in volatile metabolites during xiaoqu baijiu fermentation. Food Chem X. 2025;25:102185. pmid:39897971
6. Gao Y, Qin R, Jin G, Zhang R, Chen S, Xu Y. Comprehensive evaluation of Chinese baijiu solid-state distillation operating conditions effect on aroma compounds distillation based on entropy weight-TOPSIS analysis. Food Bioscience. 2024;58:103705.
7. Xu HF, Wang HY, Tian SR, Pan D, Zhu T. Effect of Steam Volume and Tail Liquor Volume on the Yield of Sauce-Flavor Baijiu. Chin Brew. 2019;38(5):121–4.
8. Li H, Huang W, Shen C, Yi B. Optimization of the distillation process of Chinese liquor by comprehensive experimental investigation. Food and Bioproducts Processing. 2012;90(3):392–8.
9. Li H. A research of liquor yield prediction and production process parameters optimization algorithm. Chengdu: University of Electronic Science and Technology of China. 2020.
10. Xu D, Sun YQ, Liu Y, Yang JP, Ye F, Liu RH. Characteristics of the temperature field inside the steaming bucket during baijiu distillation. Chin Brew. 2023;42(7):72–7.
11. Zhou P, Wang H, Chai TY. Data-driven modeling, control and monitoring: A blast furnace ironmaking process case study. Beijing: Science Press. 2022.
12. Salman MA, Mahdi MA, Al-Janabi S. A GMEE-WFED system: optimizing wind turbine distribution for enhanced renewable energy generation in the future. Int J Comput Intell Syst. 2024;17(5).
13. Zhou P, Jiang Y, Wen C, Dai X. Improved Incremental RVFL with compact structure and its application in quality prediction of blast furnace. IEEE Trans Ind Inf. 2021;17(12):8324–34.
14. Jin G, Boeschoten S, Hageman J, Zhu Y, Wijffels R, Rinzema A, et al. Identifying variables influencing traditional food solid-state fermentation by statistical modeling. Foods. 2024;13(9):1317. pmid:38731688
15. Al-Janabi S, Mohammed G. An intelligent returned energy model of cell and grid using a gain sharing knowledge enhanced long short-term memory neural network. J Supercomput. 2023;80(5):5756–814.
16. Al-Janabi S, Al-Barmani Z. Intelligent multi-level analytics of soft computing approach to predict water quality index (IM12CP-WQI). Soft Comput. 2023;27(12):7831–61.
17. Bakare MS, Abdulkarim A, Shuaibu AN, Muhamad MM. A hybrid long-term industrial electrical load forecasting model using optimized ANFIS with gene expression programming. Energy Reports. 2024;11:5831–44.
18. Zhou P, Li W, Wang H, Li M, Chai T. Robust Online Sequential RVFLNs for Data modeling of dynamic time-varying systems with application of an ironmaking blast furnace. IEEE Trans Cybern. 2020;50(11):4783–95. pmid:31226096
19. Zhou P, Zhang L, Li WP, Dai P, Chai TY. Autoencoder and PCA Based RVFLNs modeling for multivariate molten iron quality in blast furnace ironmaking. Acta Autom Sin. 2018;44(10):1799–811.
20. Zhou P, Wu ZW, Zhang RY, Wu YJ. Intelligent prediction of burning through point based on sparse-representation pruning ensemble modeling. Control Theory Appl. 2024;41(3):436–46.
21. Pao YH, Takefuji Y. Functional-link net computing: Theory, system architecture, and functionalities. Computer. 1992;25(5):76–9.
22. Zhou P, Xie J, Li W, Wang H, Chai T. Robust neural networks with random weights based on generalized M-estimation and PLS for imperfect industrial data modeling. Control Engineering Practice. 2020;105:104633.
23. Zhang H, Yin Y, Zhang S. An improved ELM algorithm for the measurement of hot metal temperature in blast furnace. Neurocomputing. 2016;174:232–7.
24. Chai Y, Liu Z, Cao W, Yang Q, Chen G. Research on power prediction using particle swarm optimized wavelet neural network. Journal of Nanjing Normal University (Natural Science Edition). 2025;48(3):129–38.
25. Mo YM, Yu Z, Lin Y. Research on optimization of connector structure parameters based on particle swarm. Journal of Wuhan University of Technology. 2022;44(12):105–13.
26. Zhou ZH, Wu JX, Zhang W, Tang W. Ensembling Neural Networks: Many Could Be Better Than All. Artificial Intelligence. 2002;137(1–2):239–63.
27. Zhang B, Liu Q, Wu X, Chen W. Remaining useful lifetime prediction of turbofan engine based on ensemble learning. China Measurement & Test. 2022;48(7):47–52.
28. Yang M, Wei X, Jia Q, Zhao J, Xiao L. Instability model and classification prediction of double-side box steel-concrete composite girder. Railway Engineering. 2024;64(12):50–5.
29. Zhao Z, Liang L, Liu S. Dynamic soft measurement model of NOx concentration at the outlet of SNCR denitrification system based on variable selection and POA-NARX. Journal of Chinese Society of Power Engineering. 2025;45(4):592–601.
30. Schmidt WF, Kraaijveld MA, Duin RPW. Feedforward neural networks with random weights. Proceedings of the 11th IAPR International Conference on Pattern Recognition. Vol. II. Conference B: Pattern Recognition Methodology and Systems. 1992. p. 1–4. https://doi.org/10.1109/icpr.1992.201708
31. Al-Janabi S, Al-Jaberi ZA. Development of deep learning method for predicting DC power based on renewable solar energy and multi-parameters function. Neural Comput Appl. 2023;35(21):15273–94.
32. Al-Janabi S, Alkaim A, Al-Janabi E, Aljeboree A, Mustafa M. Intelligent forecaster of concentrations (PM2.5, PM10, NO2, CO, O3, SO2) caused air pollution (IFCsAP). Neural Comput Appl. 2021;33(21):14199–229.

