
A new design of wind power prediction method based on multi-interaction optimization informer model

  • Wenjuan Zhou ,

    Roles Data curation, Formal analysis, Writing – original draft, Writing – review & editing

    2208085810@qq.com

    Affiliations Hunan Institute of Engineering, School of Electrical and Information Engineering, 88 Fuxing East Road, Yuetang, Xiangtan, Hunan, China, Chongqing Jiangdong Machinery Co, Ltd., Chongqing, China

  • Feng Huang,

    Roles Conceptualization, Investigation

    Affiliations Hunan Institute of Engineering, School of Electrical and Information Engineering, 88 Fuxing East Road, Yuetang, Xiangtan, Hunan, China, Chongqing Jiangdong Machinery Co, Ltd., Chongqing, China

  • Bing Wei,

    Roles Conceptualization, Funding acquisition

    Affiliations Hunan Institute of Engineering, School of Electrical and Information Engineering, 88 Fuxing East Road, Yuetang, Xiangtan, Hunan, China, Chongqing Jiangdong Machinery Co, Ltd., Chongqing, China

  • Liang Li,

    Roles Investigation, Methodology

    Affiliations Hunan Institute of Engineering, School of Electrical and Information Engineering, 88 Fuxing East Road, Yuetang, Xiangtan, Hunan, China, Chongqing Jiangdong Machinery Co, Ltd., Chongqing, China

  • Shixi Dai,

    Roles Project administration, Resources

    Affiliations Hunan Institute of Engineering, School of Electrical and Information Engineering, 88 Fuxing East Road, Yuetang, Xiangtan, Hunan, China, Chongqing Jiangdong Machinery Co, Ltd., Chongqing, China

  • Xin Xie,

    Roles Project administration, Software, Supervision

    Affiliations Hunan Institute of Engineering, School of Electrical and Information Engineering, 88 Fuxing East Road, Yuetang, Xiangtan, Hunan, China, Chongqing Jiangdong Machinery Co, Ltd., Chongqing, China

  • Youyuan Peng,

    Roles Software

    Affiliations Hunan Institute of Engineering, School of Electrical and Information Engineering, 88 Fuxing East Road, Yuetang, Xiangtan, Hunan, China, Chongqing Jiangdong Machinery Co, Ltd., Chongqing, China

  • Hong You

    Roles Visualization

    Affiliations Hunan Institute of Engineering, School of Electrical and Information Engineering, 88 Fuxing East Road, Yuetang, Xiangtan, Hunan, China, Chongqing Jiangdong Machinery Co, Ltd., Chongqing, China

Abstract

The accurate prediction of wind power is imperative for maintaining grid stability. To address the limitations of traditional neural network algorithms, the Informer model has been employed for wind power prediction, delivering higher accuracy. However, due to insufficient exploration of the dynamic coupling among multi-source features and inadequate perception of data health status, both prediction accuracy and computational efficiency deteriorate under complex working conditions. This study proposes a prediction framework based on the Informer model with multi-source feature interaction optimization (MFIO-Informer). Integrating physical feature collaborative analysis with data health status perception enhances prediction accuracy and reduces computation time. First, the Lasso algorithm and the Pearson correlation coefficient method are applied to screen key multi-source features from wind turbine operation and maintenance data, quantifying their dynamic correlations with power output. Second, a fully-connected neural network (FNN) is employed to establish a hidden coupling model of wind speed, blade deflection angle, and power, from which the dynamic synergy coefficient (DSC) characterizing equipment performance is extracted. Subsequently, a health assessment of wind turbine data is conducted using historical power data and the DSC. This assessment yields a health matrix, which is used to optimize the encoding, decoding, and embedding-vector prediction processes of the Informer model. Finally, power prediction experiments are conducted on two public wind power datasets using the proposed MFIO-Informer model. The experimental results demonstrate that, compared with the traditional Informer model, the MFIO-Informer model attains approximately 20% higher prediction accuracy and 54.85% faster prediction speed.

Introduction

Fluctuations in wind cause instability in wind power generation [1], affecting grid operation after connection. To ensure grid safety, power prediction and timely regulation are essential [2]. However, the strong randomness of wind power (which is affected by environmental factors such as wind speed and turbulence) and the progressive degradation of equipment performance (e.g., blade wear and gearbox aging) present dual challenges for traditional prediction models operating under complex conditions: decreased accuracy and inadequate timeliness.

In recent years, deep learning models (e.g., long short-term memory (LSTM) and transformer) have significantly improved the accuracy of wind power prediction by capturing temporal dependencies [3]. Nevertheless, traditional models are limited in modeling multi-source feature coupling and data health perception, which restricts their practical application [4]. To enhance prediction speed and accuracy, researchers have explored two major approaches: First, statistical learning methods (e.g., ARIMA and support vector machines) simplify prediction through linear assumptions, yet they struggle to characterize nonstationary correlations between wind speed and power [5]. Second, deep neural networks (e.g., CNN-LSTM hybrid models) excel at mining local spatiotemporal features, yet they are limited in their ability to perceive data health [6].

Deep neural networks use fixed encoding structures that lack the ability to adapt dynamically to wind power equipment degradation. This compromises prediction accuracy. Recently, the Transformer variant Informer model reduced computational complexity with ProbSparse self-attention and demonstrated advantages in long-sequence prediction. This model shows great potential in wind power forecasting [7]. However, the Informer model has several limitations: First, input features are mostly concatenated trivially, failing to model the physical synergies among multi-source data (e.g., wind speed and blade angle). Second, the attention mechanism ignores the evolution of wind turbine health status, leading to error accumulation in long-term predictions [8]. Third, the static parameter allocation of the encoder-decoder makes balancing computational efficiency and prediction accuracy difficult. These issues render existing methods inadequate for meeting the high demands of real-time scheduling and intelligent operation and maintenance in wind farms.

To improve prediction performance, researchers have optimized Informer in several ways. Liu, Wang, Gong, et al. [9–11] improved the encoding method to reduce prediction error rates in long-term, multidimensional, strongly correlated sequences; they also strengthened the encoder module's encoding capability and enhanced Informer's prediction accuracy. Long, Cao, Yang, et al. [12–14] introduced an adversarial mechanism to strengthen the decoder module's resistance to interference, mitigate the impact of placeholders and autoregressive effects, prevent error accumulation, and improve prediction accuracy. Wu, Bradateanu, and Li [15–17] improved the sampling strategy to focus on strongly correlated features, reduce the computational complexity of self-attention, accelerate sparse-matrix sampling, and increase Informer's prediction speed. However, most existing optimization methods fail to enhance Informer's prediction accuracy and speed simultaneously; they produce unsatisfactory optimization results, increase prediction costs, and hinder Informer's practical application.

This article is structured into five sections: The first section presents the overall methodological framework. The second section describes the principles of model construction. The third section details the comprehensive optimization scheme and its validation. The fourth section provides case verifications and result analyses. The fifth section offers the conclusion.

1. Methodological framework

The principle of the algorithm is illustrated in Fig 1, comprising the following steps:

  (1) Integrate the original dataset to obtain multi-source data, including humidity, yaw angle, blade deflection angle, wind speed, and temperature, from the target wind turbine.
  (2) Preprocess the data by removing null values and outliers and normalizing it to meet the requirements of the neural network input.
  (3) Apply the Lasso algorithm and the Pearson correlation method to select features from wind turbine O&M data and eliminate irrelevant features.
  (4) Train an FNN using the filtered data to derive the functional relationship between power output and operational features.
  (5) Calculate the dynamic synergy coefficient k of wind power using the trained FNN and the chain rule.
  (6) Conduct a health assessment by combining k with historical power values.
  (7) Construct a health matrix, adjust weight allocation, optimize hyperparameters, and train the Informer network.
  (8) Optimize the predicted power output, generate test results for a comparative analysis, and draw conclusions.

The principle of the power prediction method is shown in Fig 1.

2. Principles of model construction

2.1. Feature selection via Lasso and Pearson correlation methods

The Lasso algorithm performs feature selection and model sparsification through L1 regularization. The core principle is to impose an L1 norm constraint on the regression model, which forces some feature coefficients to zero and eliminates redundant features. The penalty function refines the model by compressing or setting to zero the coefficients of less relevant features when multiple features exist. The remaining non-zero coefficients correspond to highly correlated features, thereby achieving feature selection. To mitigate the impact of feature scale disparities on regression performance, the algorithm standardizes the predictors so that they have a mean of 0 and a variance of 1.

The Pearson correlation method measures the linear correlation between two vectors, with outputs ranging from −1 to +1: a value of 0 indicates no correlation, negative values denote negative correlation, and positive values signify positive correlation [18]. The formula is as follows:

(1) \( r = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \)

In the formula, the closer |r| is to 1, the stronger the correlation.

Using a combination of the Pearson correlation and Lasso algorithms to analyze wind power data filters out irrelevant features and reduces dimensionality. The results are shown in Figs 2–3. Table 1 shows the specific values of the stronger correlations found among wind speed (S), blade deflection angle (θ), and power (P).
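The two-stage screening described above can be sketched with scikit-learn. The synthetic data, column ordering, regularization strength, and the 0.3 correlation threshold below are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch: Lasso zeroes out weakly relevant features, then a
# Pearson filter keeps only features strongly correlated with power.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))  # e.g. humidity, yaw, blade angle, wind speed, temperature
P = 2.0 * X[:, 3] + 0.8 * X[:, 2] + rng.normal(scale=0.1, size=500)  # synthetic power

# Standardize predictors to mean 0, variance 1 before L1 regularization.
Xs = StandardScaler().fit_transform(X)

# Lasso drives coefficients of redundant features to exactly zero.
lasso = Lasso(alpha=0.05).fit(Xs, P)
kept = np.flatnonzero(lasso.coef_ != 0.0)

# Pearson filter: keep features whose |r| with power exceeds a threshold.
r = np.array([np.corrcoef(Xs[:, j], P)[0, 1] for j in kept])
selected = kept[np.abs(r) > 0.3]
print(selected)  # indices of the strongly correlated features
```

In practice the threshold and the Lasso penalty would be tuned against the wind-turbine O&M data rather than fixed a priori.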

Fig 2. Three-dimensional scatter plot of wind speed, blade deflection angle, and power.

https://doi.org/10.1371/journal.pone.0330464.g002

Fig 3. Two-dimensional scatter plot of wind speed, blade deflection angle, and power.

https://doi.org/10.1371/journal.pone.0330464.g003

2.2. Construction and training of the FNN-based wind power prediction model

With [St, θt] as the input space and [Pt] as the output space, the training, validation, and test sets are divided in the ratio 8:1:1. A 2-500-50-1 feed-forward neural network (FNN) architecture is constructed, adopting the adaptive moment estimation (Adam) optimizer and mean squared error (MSE) loss function for training. An early-stopping strategy is implemented, whereby iteration halts when the loss difference between the training and test sets is less than 5%. The FNN adjusts the model weights using the backpropagation (BP) algorithm.

The FNN model was evaluated as follows: the adjusted determination coefficient on the test set was 0.9921, and the root mean square error (RMSE) was 0.0288.
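The construction above can be sketched with scikit-learn's MLPRegressor as a stand-in for the 2-500-50-1 FNN (Adam optimizer, squared-error loss, early stopping). The toy power relation and all numerical values are illustrative assumptions, not the paper's data.

```python
# Sketch of the 2-500-50-1 feed-forward network: inputs [S, theta], output P.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
S = rng.uniform(3, 25, size=2000)                   # wind speed (m/s)
theta = rng.uniform(0, 30, size=2000)               # blade deflection angle (deg)
P = 0.5 * S**2 * np.cos(np.radians(theta)) / 300.0  # toy normalized power relation

X = np.column_stack([S, theta])
# 8:1:1 split: hold out 20%, then halve the holdout into validation and test sets.
X_tr, X_hold, y_tr, y_hold = train_test_split(X, P, test_size=0.2, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_hold, y_hold, test_size=0.5, random_state=0)

# Two hidden layers (500, 50), Adam optimizer, squared-error loss, early stopping.
fnn = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(500, 50), solver="adam",
                 early_stopping=True, max_iter=500, random_state=0),
)
fnn.fit(X_tr, y_tr)
print(round(fnn.score(X_te, y_te), 3))              # R^2 on the held-out test set
```

A framework with explicit backpropagation access (e.g. PyTorch) would be the natural choice when the derivative of the trained network is needed later, as in Section 2.3.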

2.3. Derivation of the dynamic synergy coefficient

The data objects—wind speed (S), blade deflection angle (θ), and power (P)—are continuous physical quantities that cannot undergo abrupt changes. Treating power as a smooth, differentiable function of its inputs, the partial derivative of power with respect to wind speed can be derived using a feed-forward neural network (FNN):

(2)

In the formula, S denotes wind speed.

Direct differentiation of the implicit function is infeasible, but when the BP algorithm is applied, the loss function (L) provides global partial derivatives, enabling calculation of ∂L/∂S and ∂L/∂P. Assuming L is the mean squared error (MSE) loss function, and since the true power value is a constant, the chain rule further yields ∂P/∂S as:

(3)

In the formula, n denotes the data volume, but in this formula, n is set to 1.

Calculate the rate of change of power generation with respect to wind speed at each moment, and concatenate the power and the rate of change at the corresponding moment into vectors.

Taking the data of the No. 1 wind turbine as an example, the calculation results are shown in Table 2.

Table 2. Sample data of wind speed, blade angle, active power, and derivative rate.

https://doi.org/10.1371/journal.pone.0330464.t002

The derivative rate represents the partial derivative of active power with respect to wind speed (∂P/∂S), calculated with the proposed FNN model. From this, the dynamic synergy coefficient and the changing trend of power P at a given S can be obtained.
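Once a differentiable power model is available, the derivative rate ∂P/∂S can be approximated numerically. In this sketch a toy analytic power curve stands in for the trained FNN, and the central-difference step size is an illustrative choice.

```python
# Approximating the dynamic synergy coefficient k = dP/dS at fixed blade angle.
import numpy as np

def power_model(S, theta):
    """Stand-in for the trained FNN: P as a smooth function of S and theta."""
    return 0.5 * S**2 * np.cos(np.radians(theta))

def synergy_coefficient(S, theta, h=1e-4):
    """k = dP/dS via central differences around the operating point."""
    return (power_model(S + h, theta) - power_model(S - h, theta)) / (2 * h)

# At S = 10 m/s and theta = 0 deg, the analytic derivative d(0.5 S^2)/dS = S.
k = synergy_coefficient(10.0, 0.0)
print(round(k, 4))  # ≈ 10.0
```

With an autodiff framework, the same quantity would come from backpropagating the network output with respect to its wind-speed input, as the chain-rule argument above suggests.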

2.4. Informer power prediction model

Informer is a self-attention-based prediction model proposed by Zhou Haoyi et al. in 2021 [18]. Unlike conventional prediction models, it features a separated encoder-decoder structure. During prediction, the encoder and decoder only interact via coding output transmission, with no feedback mechanisms involved in any prediction steps.

In the Informer, there are three key structures:

  1. Self-attention Distilling Operation: The encoder module reduces the length of the output sequence layer by layer to extract strongly correlated features. After feature connection, the encoded outputs are concatenated as inputs to the decoder module [19].
  2. Generative-Style Decoder: Using masked multi-head attention for decoding, this structure enables simultaneous prediction of multiple tokens. Unlike traditional sequential token prediction (where each step's output is fed back, causing loss accumulation in long sequences), it mitigates error propagation and preserves predictive accuracy.
  3. ProbSparse Self-Attention Mechanism: By focusing on high-score dot products, Informer prioritizes strongly correlated features during self-attention. Sampling only these features reduces computational complexity and prediction cost.

The principle of the Informer model is illustrated in Fig 4.
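The ProbSparse idea in structure 3 can be illustrated in a few lines of numpy: score each query by a sparsity measure and run full attention only for the top-u queries. The shapes, the sampling-free scoring, and the value of u are simplifying assumptions, not the exact Informer implementation.

```python
# Compact sketch of ProbSparse query selection followed by softmax attention.
import numpy as np

rng = np.random.default_rng(2)
L, d, u = 64, 16, 8                      # sequence length, model dim, queries kept
Q = rng.normal(size=(L, d))
K = rng.normal(size=(L, d))

scores = Q @ K.T / np.sqrt(d)            # (L, L) scaled dot-product scores
# Sparsity measure M(q_i, K): queries with peaked score rows are informative.
# (The full Informer estimates this from a sampled subset of keys.)
M = scores.max(axis=1) - scores.mean(axis=1)
top_u = np.argsort(M)[-u:]               # indices of the u most informative queries

# Full softmax attention only for the selected queries.
sel = scores[top_u]
attn = np.exp(sel - sel.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)
print(attn.shape)  # (8, 64)
```

The remaining queries receive a cheap default (a mean of the values) in the original mechanism, which is what brings the complexity below quadratic.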

The existing Informer neural network model exhibits three fundamental limitations:

  (1) Frequent Attention Overhead: The Encoder module requires secondary dot-product calculations to accentuate strongly correlated features, substantially reducing prediction speed [20,21].
  (2) Decoding Imbalance: The Decoder uses sequence placeholders for auxiliary prediction; however, equal weighting of placeholders and feature data degrades accuracy due to unreasonable weight distribution [22,23].
  (3) Computational Complexity: Implementing the self-attention mechanism requires additional modules or feature-dimension expansion, increasing computational complexity and compromising the model's lightweight design [24,25].

To address these constraints and enhance prediction accuracy and speed, this study optimizes the Informer model through multi-source data reconstruction and wind turbine health assessment. By leveraging a health matrix to refine the encoder, decoder, and embedding-vector processes, the optimized MFIO-Informer model balances computational efficiency and predictive performance.

2.5. Principles of health matrix evaluation

Health assessment, a weight-optimization algorithm proposed by Lars Landberg in 2011, transforms a two-dimensional probability density function into a weight-based health matrix. This transformation is carried out according to the statistical probability distribution characteristics of the model output.

  (1) Data Collection: Collect operating data of wind turbines, such as wind speed, temperature, pressure, rotational speed, and power, through sensors and SCADA systems;
  (2) Data Cleaning: Remove missing values, abnormal values, and transmission noise from the operating data to enhance data reliability;
  (3) Feature Selection: Select feature data from the cleaned operating data, then determine the dimension of the health matrix based on the dimension of the selected features;
  (4) State Classification: Use the feature data to calculate feature weights, and classify wind turbine states (e.g., normal, minor fault, severe fault) according to the magnitude of these weights [26];

The formula for calculating feature weights is as follows:

(4)

In the formula, i represents the feature index, where i ranges from 1 to n, μ is the mean vector, and Σ is the covariance matrix.

  (5) Matrix Generation: The feature weights calculated via Equation (4) are arranged sequentially to form a weight matrix Q. Each element in Q represents the weight coefficient of the corresponding feature in the same state. Subsequently, factor analysis is applied to the operational data to generate a correlation matrix R. Finally, the health matrix A is obtained by multiplying Q and R, expressed as:

(5) \( A = QR \)

In the formula, n denotes the dimension of the health matrix, typically set to 6–12.

The health assessment workflow is illustrated in Fig 5.

The optimization analysis of health assessment is as follows:

Let the first-layer input of the prediction model be , with a coefficient matrix where the principal diagonal elements are 1 and the remaining elements are within (0, 1). The row and column labels of are sorted to maintain consistent ordering.

During prediction, the coefficient matrix for the i-th layer input can be viewed as the i-th power of :

(6)

The health assessment optimization is realized by multiplying the input with the health matrix A to obtain . The coefficient matrix of , denoted as , is calculated by:

(7)

Here, A reduces the weights of weakly correlated features in . While maintaining diagonal dominance and symmetry, A adjusts the corresponding principal elements of to be less than 1. By using instead of , each forward computation decreases the weights of weakly correlated features once, thereby mitigating their influence and enhancing the model’s prediction performance.
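A toy numerical sketch of the weighting argument above: building A as the product of a feature-weight matrix Q and a diagonally dominant correlation matrix R, then multiplying the input by A on each forward step, attenuates weakly correlated features progressively. All matrix values here are illustrative.

```python
# Health-matrix weighting: weak features shrink on every forward pass.
import numpy as np

n = 4                                       # health-matrix dimension (paper suggests 6-12)
weights = np.array([1.0, 0.9, 0.5, 0.2])    # per-feature health weights, strongest first
Q = np.diag(weights)                        # weight matrix, as in Equation (4)

R = np.full((n, n), 0.1) + 0.9 * np.eye(n)  # correlation matrix: diagonally dominant
A = Q @ R                                   # health matrix, Equation (5)

x = np.ones(n)                              # input with initially equal feature weights
x1 = x @ A                                  # one health-weighted forward step
x2 = x1 @ A                                 # the next layer attenuates weak features again
print(np.round(x1, 3), np.round(x2, 3))
```

After one step the weakest feature already carries less weight than the strongest; after two, its relative share shrinks further, which is the "once per forward computation" effect described in the text.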

3. Optimization plan

3.1. Encoding optimization

The Encoder module consists of multiple encoding layers and convolutional layers. The input of the i-th layer is and the output is . In each encoding layer, undergoes forward propagation through Multi-Head Self-Attention, followed by dimensionality reduction via a convolutional layer to obtain , expressed as:

(8)

Here, Conv denotes convolutional computation in the convolutional layer, and Attention represents the ProbSparse self-attention function:

(9)

In Equation (9), softmax is the normalization function, is the probability sparse matrix sampled from , is the scaling factor for key-value sampling, and K, , V are key-value adjustment matrices.

All features in initially have equal weights. To emphasize strongly correlated features, the input is weighted as:

(10)

In Equation (8), replacing with = ×A enhances the weights of strongly correlated features, accelerating the sampling and generation of , and improving the Encoder’s encoding speed.

Encoding optimization analysis:

In Equation (9), the i-th query element of is derived via:

(11)

Here, and denote the i-th rows of K and V, respectively. The asymmetric exponential kernel = Exp( / ) and the sparsity metric M( , K) for the Top-u queries are defined as:

(12)

In the formula, is the sequence length of matrix K.

In sparse self-attention, the lengths of the queries and keys are both equal to L, leading to a computational complexity of O(L ln L). However, when all feature weights in are equal, the M( , K) query requires a secondary dot-product calculation to enhance strongly correlated features, which decreases the Encoder's encoding speed.

From Equations (8) and (9), weighting with A (as in Equation (10)) emphasizes strongly correlated features. Using = × A as the Encoder's input allows the queries in Equations (8), (9), (11), and (12) to require only single dot-product calculations. This reduces the computational complexity of sparse self-attention to O( ln L), thereby enhancing the Encoder's encoding speed and Informer's prediction efficiency.

3.2. Decoding optimization

Let the Decoder module comprise L decoding layers. In each layer, the input and feature data y are processed through multiple decoding sub-layers, expressed as:

(13)

In the formula, denotes the decoding calculation of the i-th layer, is the input of the first layer, is the input of the i-th layer, and y represents the feature data at the prediction time.

All features in initially have equal weights. To enhance the weights of strongly correlated features in , we define:

(14)

In the formula, h denotes the health factor, derived from the compressive transformation of the health matrix. The output x of each layer, when multiplied by the health factor h, serves as the input for the subsequent layer.

The input to the first layer of the Decoder module can also be expressed as:

(15)

In the formula, denotes the encoded outputs of the Encoder module, and is a sequence placeholder with a scalar value of 0 (corresponding to the all-zero part of the auxiliary input). Here, represents the sequence length of the encoded output, denotes the sequence length of the placeholder, is the feature dimension, and Concat indicates vector concatenation. By setting the placeholder and masking the dot product to −∞, Informer applies masked multi-head attention in the Decoder’s decoding process.
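The decoder input of Equation (15) and the masking step can be sketched in numpy as follows. The sequence lengths and the single-head, unprojected attention are simplifying assumptions for illustration.

```python
# Generative-style decoder input: known start tokens + zero placeholder,
# with a causal mask setting future dot products to -inf before softmax.
import numpy as np

L_token, L_y, d = 6, 4, 8                         # start tokens, prediction length, feature dim
rng = np.random.default_rng(3)
X_token = rng.normal(size=(L_token, d))           # known history fed to the decoder
X_zero = np.zeros((L_y, d))                       # placeholder for the prediction slots
X_de = np.concatenate([X_token, X_zero], axis=0)  # Concat(token sequence, placeholder)

L = L_token + L_y
scores = X_de @ X_de.T / np.sqrt(d)
mask = np.triu(np.ones((L, L), dtype=bool), k=1)  # True strictly above the diagonal
scores[mask] = -np.inf                            # masked (causal) attention

attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)
print(X_de.shape, attn.shape)  # (10, 8) (10, 10)
```

Because every masked entry becomes exactly zero after the softmax, no position can attend to a later timestamp, which is what prevents autoregressive feedback during the one-shot generative decoding.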

Although the masked dot product prevents autoregression (where each feature focuses on its next timestamp in ), the equal weighting of and causes cumulative errors from to propagate during each decoded sublayer’s forward computation. This error diffusion degrades the Decoder’s decoding accuracy.

The optimization in Equation (13) effectively increases the weight of while reducing that of . As shown in Equations (8) and (9), each forward decoding iteration in Equation (14) mitigates cumulative errors, thereby enhancing the Decoder’s decoding accuracy. The improved decoding accuracy directly boosts Informer’s prediction accuracy after optimization.

3.3. Embedding vector optimization

Embedded vectors serve as the input to the first layer of the Encoder module. As the Encoder module progressively reduces the length of the encoded sequence layer-by-layer to extract strongly correlated features, we enhance the weights of these features through optimization, as shown in Equation (16):

(16)

In the formula, is the unoptimized embedding vector, and represents the optimized embedding vector. After optimization, replaces as the input to the first layer of the Encoder module.

Let be the t-th input sequence, where the global timestamp type of is p and the feature dimension is . The embedding vector that preserves local context is obtained through fixed-position embedding using the following two formulas:

(17)(18)

In the formulas, POS represents the learnable embedding using global timestamps, with a limited vector length (up to 60, in minute units). To align dimensions, the Informer model projects the scalar context into a -dimensional vector via a 1D convolution filter (kernel width = 3, stride = 1).
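A numpy sketch of this embedding construction: fixed sinusoidal position encoding plus a width-3, stride-1 convolution projecting the scalar sequence into the model dimension. The model dimension and the random filter weights are illustrative assumptions.

```python
# Fixed sinusoidal positional encoding + 1D conv token projection.
import numpy as np

L, d_model = 96, 16
pos = np.arange(L)[:, None]                       # positions 0..L-1
j = np.arange(d_model // 2)[None, :]
angle = pos / np.power(10000.0, 2 * j / d_model)

PE = np.zeros((L, d_model))
PE[:, 0::2] = np.sin(angle)                       # even dimensions: sine
PE[:, 1::2] = np.cos(angle)                       # odd dimensions: cosine

# Scalar sequence -> d_model channels via a 1D conv (kernel width 3, stride 1).
x = np.sin(np.linspace(0, 6, L))                  # toy scalar input sequence
w = np.random.default_rng(4).normal(size=(d_model, 3))
x_pad = np.pad(x, 1)                              # 'same' padding keeps length L
U = np.stack([np.convolve(x_pad, w[c][::-1], mode="valid") for c in range(d_model)],
             axis=1)

X_embed = U + PE                                  # scalar projection plus position encoding
print(X_embed.shape)  # (96, 16)
```

In the full model a learnable global-timestamp embedding is added as a third term, with an amplitude factor balancing the components.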

Another representation of the embedding vector is as follows:

(19)

In the formula, the amplitude factor balances the scalar projection and the local/global embeddings; if the input sequence has been normalized, it is set to 1.

The structure of the embedding vector is illustrated in Fig 6.

The fixed sine and cosine embeddings in Equations (17) and (18) have inherent sampling weights whose distribution diverges from that of the original data. As illustrated in Fig 6, the integration of local and global timestamps cannot achieve a one-to-one correspondence, thereby introducing systematic errors. As shown in Equations (8)–(12), these systematic errors gradually expand during the subsequent encoding-decoding process, degrading Informer's prediction accuracy.

Equation (16) optimizes the embedding vector by accounting for feature distribution changes. According to Equations (8) and (9), systematic errors arising from feature distribution changes gradually diminish in the subsequent encoding-decoding process after optimization, thereby improving Informer's prediction accuracy.

4. Instance verification

4.1. Error metrics

Three error indicators are employed: R² (coefficient of determination), MSE (mean squared error), and MAE (mean absolute error), as defined in Equations (20)–(22):

(20) \( R^2 = 1 - \dfrac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \)
(21) \( \mathrm{MSE} = \dfrac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2 \)
(22) \( \mathrm{MAE} = \dfrac{1}{n}\sum_{i=1}^{n}\lvert y_i-\hat{y}_i\rvert \)

The formulas define \(y_i\) as the true value, \(\hat{y}_i\) as the predicted value, and \(\bar{y}\) as the true mean. R² measures goodness-of-fit, where values closer to 1 indicate better model consistency with observed data. MSE and MAE quantify prediction dispersion—smaller values signify lower prediction fluctuations.
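For concreteness, the three metrics can be computed with scikit-learn on a toy prediction vector:

```python
# R^2, MSE, and MAE on a small illustrative true/predicted pair.
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_true = np.array([0.2, 0.5, 0.9, 1.3, 1.1])
y_pred = np.array([0.25, 0.45, 1.0, 1.2, 1.15])

r2 = r2_score(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(round(r2, 3), round(mse, 4), round(mae, 3))  # 0.966 0.0055 0.07
```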

For effective optimization, ideal scenarios show increasing R2 alongside decreasing MSE and MAE. However, due to varying prediction dispersions across models [24], enhancing overall fit might compromise local fitting at specific points, potentially increasing outlier handling costs [27]. Optimization is still considered effective if:

  • Two metrics improve by ≥10% relative to the baseline, and
  • The third metric decreases by ≤2%.

4.2. Simulation testing

Tests were conducted on two publicly accessible datasets: the Baidu KDD CUP 2022 dataset and the "China Software Cup"—Longyuan Wind Power Track dataset. Each dataset contained between 17,000 and 21,000 records.

To comprehensively compare the optimization effects, eight power prediction models were selected for simulation testing: without DSC optimization, with DSC optimization, Encoder optimization without DSC, Encoder optimization with DSC, Encoder-Decoder optimization without DSC, Encoder-Decoder optimization with DSC, embedding optimization without DSC, and embedding optimization with DSC. Each model was trained for both three and six rounds. The initial 80% of each dataset was used for training, while the remaining 20% was designated for evaluation. The test platform comprised a 12th Gen Intel(R) Core(TM) i5-12500H 2.50 GHz CPU, 16 GB RAM, and a 200 GB HDD, running Windows 11.

In order to mitigate the impact of units, dimensions, and other variables, the dataset, prediction data, and related materials were standardized according to the following protocol:

(23)

In the formula, represents the unstandardized prediction result, represents the standardized prediction result, represents the variance of the prediction result, and represents the average value of all prediction results.
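Assuming the protocol is the usual z-score transform (subtract the mean, divide by the standard deviation; the text mentions the variance and the mean), a minimal sketch with an inverse mapping back to physical units:

```python
# Z-score standardization and its inverse, so predictions can be mapped
# back to physical units after the model runs on the standardized scale.
import numpy as np

def standardize(x):
    """Return (x - mean) / std together with the stats needed to invert it."""
    mu, sigma = x.mean(), x.std()
    return (x - mu) / sigma, mu, sigma

p = np.array([120.0, 150.0, 90.0, 180.0, 160.0])   # toy power readings (kW)
z, mu, sigma = standardize(p)
print(round(z.mean(), 10), round(z.std(), 10))      # ≈ 0.0 and 1.0

# Map the standardized values back to physical units:
p_back = z * sigma + mu
assert np.allclose(p_back, p)
```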

The component details of the Informer are shown in Table 3.

The prediction results based on the KDD CUP dataset are shown in Figs 7–10:

Fig 7. Prediction Results of Three Rounds of Training for Unit K-C 1.

https://doi.org/10.1371/journal.pone.0330464.g007

Fig 8. Prediction Results of 6 Rounds of Training for Unit K-C 1.

https://doi.org/10.1371/journal.pone.0330464.g008

Fig 9. Prediction Results of Three Rounds of Training for Unit 5 of K-C Power Plant.

https://doi.org/10.1371/journal.pone.0330464.g009

Fig 10. Prediction Results of 6 Rounds of Training for Unit 5 of K-C Power Plant.

https://doi.org/10.1371/journal.pone.0330464.g010

The prediction results based on the Longyuan Wind Power dataset are shown in Figs 11–14.

Fig 11. Prediction Results of Three Rounds of Training for Unit 2 of Longyuan Power Station.

https://doi.org/10.1371/journal.pone.0330464.g011

Fig 12. Prediction Results of 6 Rounds of Training for Unit 2 of Longyuan Power Plant.

https://doi.org/10.1371/journal.pone.0330464.g012

Fig 13. Prediction Results of Three Rounds of Training for Unit 13 of Longyuan Power Station.

https://doi.org/10.1371/journal.pone.0330464.g013

Fig 14. Prediction Results of 6 Rounds of Training for Unit 13 of Longyuan Power Plant.

https://doi.org/10.1371/journal.pone.0330464.g014

The prediction indicators of each dataset are presented in Tables 4–7.

Table 6. Prediction indicators for unit 2 of Longyuan wind farm.

https://doi.org/10.1371/journal.pone.0330464.t006

Table 7. Prediction indicators for unit 13 of Longyuan Wind Farm.

https://doi.org/10.1371/journal.pone.0330464.t007

In Figs 7–14 and Tables 4–7, V1 denotes prediction performance without DSC and without any optimization, V2 the performance with DSC but no further optimization, V3 the performance with Encoder optimization but without DSC, V4 the performance with both DSC and Encoder optimization, V5 the performance with Encoder-Decoder combination optimization but without DSC, V6 the performance with both DSC and Encoder-Decoder combination optimization, V7 the performance with embedding optimization but without DSC, and V8 the performance with both DSC and embedding optimization.

As observed from Figs 7–14, the prediction accuracy of all optimized models except the Encoder-only optimization has improved compared with the unoptimized models. Tables 4–7 reveal that the seven proposed optimized models each achieved at least one of the following effects: a 20% improvement in prediction accuracy or a 54.85% improvement in prediction speed relative to the unoptimized models. Furthermore, the three Encoder-Decoder combination optimized models simultaneously achieved a 15% improvement in prediction accuracy and a 50% improvement in prediction speed. Additionally, the selected combined optimized models require only half the number of training rounds while their prediction accuracy already approaches or exceeds that of the unoptimized models. This demonstrates that the Informer model based on multi-source feature interaction optimization is reasonable and effective, and that the MFIO-Informer model is a high-quality wind power prediction model.

4.3. Comparison of model convergence and optimization effect

Fig 15 is derived from the prediction indicator data of Unit 5 in the KDD CUP dataset, presenting the variation trends of validation loss and training loss for three models (V1, V6, V8) across six training epochs. In terms of validation loss (left chart), the V1 model shows significant fluctuations, with values ranging from approximately 0.07 to 0.09, while the V8 model exhibits a relatively stable, continuously decreasing trend, with a minimum of 0.05679, far lower than V1's minimum of 0.07131. For training loss (right chart), the V8 model also demonstrates a more rapid and steady decline, with the final loss approaching 0.08, notably outperforming V1 and V6. Evidently, the optimized V8 model exhibits superior convergence speed and loss control compared with the baseline V1 and the partially optimized V6, verifying the effectiveness of the MFIO-Informer optimization strategy for Unit 5 of the KDD CUP dataset.

Fig 15. Prediction Results of 6 Rounds of Training for Unit 13 of Longyuan Power Plant.

https://doi.org/10.1371/journal.pone.0330464.g015

5. Conclusion

By leveraging multi-dimensional data for wind turbine health assessment, a multi-dimensional interactive health matrix is derived. Optimizing the encoder, decoder, embedding vectors, and prediction process of the Informer accelerates model convergence while enhancing prediction accuracy and speed. Simulation tests demonstrate that the Informer models with combined encoder and decoder optimization deliver superior prediction performance. Compared to the traditional Informer, the proposed MFIO-Informer achieves better optimization effects in wind power prediction at lower data cost.

References

  1. 1. Farah S, David A W, Humaira N, Aneela Z, Steffen E. Short-term multi-hour ahead country-wide wind power prediction for Germany using gated recurrent unit deep learning. Renewable and Sustainable Energy Reviews. 2022;167:112700.
  2. 2. Tarek Z, Shams MY, Elshewey AM. Wind power prediction based on machine learning and deep learning models. Computers, Materials & Continua. 2023;75(1).
  3. 3. Li P, Zhou K, Lu X, Yang S. A hybrid deep learning model for short-term PV power forecasting. Applied Energy. 2020;259:114216.
  4. 4. Nadeem MW, Goh HG, Ali A, Hussain M, Khan MA, Ponnusamy VA. Bone Age Assessment Empowered with Deep Learning: A Survey, Open Research Challenges and Future Directions. Diagnostics (Basel). 2020;10(10):781. pmid:33022947
  5. 5. Rajagopalan S, Santoso S. Wind power forecasting and error analysis using the autoregressive moving average modeling. In: 2009 IEEE Power & Energy Society General Meeting, 2009.
  6. 6. Rangapuram SS, Seeger MW, Gasthaus J. Deep state space models for time series forecasting. Advances in Neural Information Processing Systems. 2018;31.
  7. 7. Xu Y, Zhao J, Wan B, Cai J, Wan J. Flood Forecasting Method and Application Based on Informer Model. Water. 2024;16(5):765.
  8. 8. Tepetidis N, Koutsoyiannis D, Iliopoulou T, Dimitriadis P. Investigating the Performance of the Informer Model for Streamflow Forecasting. Water. 2024;16(20):2882.
  9. 9. Liu F, Dong T, Liu Y. An improved informer model for short-term load forecasting by considering periodic property of load profiles. Frontiers in Energy Research. 2022;10:950912.
  10. 10. Wang X, Xia M, Deng W. MSRN-Informer: Time Series Prediction Model Based on Multi-Scale Residual Network. IEEE Access. 2023.
  11. 11. Gong M, Zhao Y, Sun J. Load forecasting of district heating system based on informer. Energy. 2022;253:124179.
  12. 12. Long H, Luo J, Zhang Y, Li S, Xie S, Ma H, et al. Revealing Long-Term Indoor Air Quality Prediction: An Intelligent Informer-Based Approach. Sensors (Basel). 2023;23(18):8003. pmid:37766057
  13. 13. Cao Y, Liu G, Luo D. Multi-timescale photovoltaic power forecasting using an improved stacking ensemble algorithm based LSTM-Informer model. Energy. 2023;283:128669.
  14. 14. Yang Z, Liu L, Li N. Time series forecasting of motor bearing vibration based on informer. Sensors. 2022;22(15):5858.
  15. 15. Wu Z, Pan F, Li D. Prediction of photovoltaic power by the informer model based on convolutional neural network. Sustainability. 2022;14(20):13022.
  16. 16. Bradateanu V, Milan T, McIntyre L, et al. Customizable, Dynamic and On-Demand Database-Informer for Relational Databases: U.S. Patent 8,135,758[P]. 2012.
  17. 17. Li F, Wan Z, Koch T. Improving the accuracy of multi-step prediction of building energy consumption based on EEMD-PSO-Informer and long-time series. Computers and Electrical Engineering. 2023;110:108845.
  18. 18. Nasir IM, Khan MA, Yasmin M, Shah JH, Gabryel M, Scherer R, et al. Pearson Correlation-Based Feature Selection for Document Classification Using Balanced Training. Sensors (Basel). 2020;20(23):6793. pmid:33261136
  19. 19. Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, et al. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. AAAI. 2021;35(12):11106–15.
  20. 20. Li W, Fu H, Han Z. Intelligent tool wear prediction based on informer encoder and stacked bidirectional gated recurrent unit. Robotics and Computer-Integrated Manufacturing. 2022;77:102368.
  21. 21. Zhang Z, Chen P, Xing C. A data augmentation boosted dual informer framework for the performance degradation prediction of aero-engines. IEEE Sensors Journal. 2023.
  22. 22. Wang H, Guo M, Tian L. A Deep Learning Model with Signal Decomposition and Informer Network for Equipment Vibration Trend Prediction. Sensors (Basel). 2023;23(13):5819. pmid:37447674
  23. 23. Qiu M, Ming Z, Li J, Liu J, Quan G, Zhu Y. Informer homed routing fault tolerance mechanism for wireless sensor networks. Journal of Systems Architecture. 2013;59(4–5):260–70.
  24. 24. Zou R, Duan Y, Wang Y, Pang J, Liu F, Sheikh SR. A novel convolutional informer network for deterministic and probabilistic state-of-charge estimation of lithium-ion batteries. Journal of Energy Storage. 2023;57:106298.
  25. 25. Zhao Y, Gong M, Sun J, Han C, Jing L, Li B, et al. A new hybrid optimization prediction strategy based on SH-Informer for district heating system. Energy. 2023;282:129010.
  26. 26. Mallioris P, Teunis G, Lagerweij G, Joosten P, Dewulf J, Wagenaar JA, et al. Biosecurity and antimicrobial use in broiler farms across nine European countries: toward identifying farm-specific options for reducing antimicrobial usage. Epidemiol Infect. 2022;151:e13. pmid:36573356
  27. 27. Al-qaness MAA, Ewees AA, Fan H. Wind power prediction using random vector functional link network with capuchin search algorithm. Ain Shams Engineering Journal. 2023;14(9):102095.