A robust multi-location evaluation of a machine learning framework for wind power forecasting

doi:10.1371/journal.pone.0344971

Fig 1.

Hourly wind speed changes.

Fig 2.

Electricity generation and distribution from wind turbines to residential homes.

(created using Canva elements, © Canva 2025, used under Canva Content License).

Table 1.

Challenges and limitations in current wind power forecasting approaches.

Table 2.

Previous approaches.

Fig 3.

Research methodology.

(created using Canva elements, © Canva 2025, used under Canva Content License).

Fig 4.

Research flow chart.

Table 3.

Meteorological and power data across locations.

Table 4.

Representative environmental characteristics and forecasting implications across four regional locations.

Fig 5.

Statistical and feature distributions of wind power datasets across four locations.

Histograms show distributions of wind power output, while feature plots capture temporal variability in key meteorological and power variables, highlighting site-specific heterogeneity.

Fig 6.

Flowchart of Data Preprocessing.

(created using Canva elements, © Canva 2025, used under Canva Content License).

Fig 7.

Location-wise boxplot analysis of datasets.

Table 5.

Summary of all dataset entities.

Fig 8.

Boxplots of outliers across varying wind speeds in the datasets.

Fig 9.

Architectural Diagram of Random Forest Regressor Implementation.

(created using Canva elements, © Canva 2025, used under Canva Content License).

Fig 10.

Architectural Diagram of XGBoost Regressor Implementation.

(created using Canva elements, © Canva 2025, used under Canva Content License).

Fig 11.

Architectural Diagram of SVR Implementation.

(created using Canva elements, © Canva 2025, used under Canva Content License).

Table 6.

Performance metrics for all locations using the XGBoost regressor.

Table 7.

Performance metrics for all locations using the Random Forest Regressor.

Fig 12.

Accuracy curves of the XGBoost regressor across multiple wind power prediction datasets.

Training and validation accuracies increase steadily with larger training samples, indicating effective learning and generalization.

Fig 13.

Loss convergence of the XGBoost regressor for different datasets.

Training and validation losses decrease steadily and converge, indicating stable optimization and limited overfitting.

Table 12.

ML model performance assessed by 5-fold cross-validation. Mean ± S.D. of R² and MAE are presented for each location.
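
The mean ± S.D. summary in this table can be illustrated with a minimal sketch of k-fold cross-validation. Everything below is an assumption made for illustration: the helper names (`kfold_indices`, `cross_validate`, `fit_mean`, `predict_mean`), the contiguous unshuffled folds, and the trivial mean-predicting baseline are not taken from the article, which does not describe its splitting code here. Only MAE is shown; an R² function could be passed as `metric` in the same way.

```python
from statistics import mean, stdev


def kfold_indices(n, k=5):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def mae(actual, pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def cross_validate(x, y, fit, predict, metric, k=5):
    """Return (mean, sample S.D.) of `metric` over k held-out folds."""
    scores = []
    for test_idx in kfold_indices(len(x), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        pred = [predict(model, x[i]) for i in test_idx]
        scores.append(metric([y[i] for i in test_idx], pred))
    return mean(scores), stdev(scores)


# Hypothetical baseline model: always predict the training-set mean.
def fit_mean(train_x, train_y):
    return mean(train_y)


def predict_mean(model, x_value):
    return model


# Made-up wind speeds (m/s) and power values, for illustration only.
speeds = [3.0, 4.5, 5.0, 6.2, 7.1, 8.0, 8.4, 9.3, 10.1, 11.0]
power = [0.05, 0.12, 0.15, 0.24, 0.33, 0.42, 0.47, 0.58, 0.69, 0.80]
mae_mean, mae_sd = cross_validate(speeds, power, fit_mean, predict_mean, mae, k=5)
```

Contiguous folds are used here because the data are hourly series; whether the article shuffled samples before splitting is not stated in this caption.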

Fig 14.

Accuracy graph of the RFR across all location datasets.

Multiple sub-figures presenting the accuracy results of the RFR for each location.

Fig 15.

Loss graph of the RFR across all location datasets.

Sub-figures showing the loss trends of the RFR for each study location.

Table 8.

Performance metrics for all locations using the SVR model (polynomial kernel).

Fig 16.

Accuracy graphs of the SVR model with polynomial kernel across all location datasets.

Sub-figures displaying the prediction accuracy of the polynomial kernel–based SVR model for each study location.

Fig 17.

Loss graphs of the SVR model with Polynomial Kernel across all locations.

Collection of sub-plots showing the loss values obtained by the Polynomial Kernel–based SVR model for each study location.

Table 9.

Performance metrics for all locations using the SVR model (Linear kernel).

Fig 18.

Accuracy graphs of the Linear Kernel–based SVR model for all study locations.

The accuracy trends across the different sites illustrate the model’s capability to capture linear relationships in wind-power patterns.

Fig 19.

Loss graphs of the Linear Kernel–based SVR model across all locations.

Sub-plots reporting the error progression for the SVR model with a Linear Kernel at each study location.

Table 10.

Performance metrics for all locations using the SVR model (RBF kernel).

Fig 20.

Accuracy graphs of the SVR model using the RBF Kernel across all study locations.

This figure comprises multiple sub-plots comparing actual wind-power values with those predicted by the RBF Kernel–based SVR model.

Fig 21.

Loss graphs of the RBF Kernel–based SVR model across all locations.

Sub-plots presenting the error distribution and convergence behavior of the SVR model utilizing the RBF Kernel for each study location.

Fig 22.

Comparative accuracy visualization of all forecasting models.

This figure displays the accuracy outcomes for each machine-learning model across all study locations.

Fig 23.

Comparative loss analysis of all forecasting models.

Multiple subplots illustrating the error profiles for each machine-learning model across all study locations.

Table 11.

Model performance comparison across four locations (R²/MAE) using different split ratios. Values are presented as mean ± standard deviation. The mean denotes the overall model performance metric (R² or MAE) calculated on the held-out test set, whereas the standard deviation reflects the dispersion of individual prediction errors across test samples, indicating the internal consistency of predictions within each split ratio.

Table 13.

Cross-location generalization results of ML models. Values are reported as mean ± S.D. of R² and MAE when models are trained on one location and evaluated on the other three locations.
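
The train-on-one, test-on-the-rest protocol described in this caption can be sketched as follows. This is an illustrative sketch only: a one-feature least-squares line stands in for the article's RFR, XGBoost, and SVR models, and the function names (`fit_line`, `cross_location`), the location labels, and the tiny datasets are hypothetical.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r2(actual, pred):
    """Coefficient of determination."""
    m = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def cross_location(datasets, source):
    """Train on `source`, score on every other location.

    Returns ((mean R2, S.D. R2), (mean MAE, S.D. MAE)) over the target sites.
    """
    slope, intercept = fit_line(*datasets[source])
    r2_scores, mae_scores = [], []
    for name, (x, y) in datasets.items():
        if name == source:
            continue  # the training site is never used for testing
        pred = [slope * xi + intercept for xi in x]
        r2_scores.append(r2(y, pred))
        mae_scores.append(mae(y, pred))
    return (mean(r2_scores), stdev(r2_scores)), (mean(mae_scores), stdev(mae_scores))


# Made-up (wind speed, power) pairs for four hypothetical sites.
sites = {
    "Location 1": ([3.0, 5.0, 7.0, 9.0], [0.10, 0.22, 0.35, 0.50]),
    "Location 2": ([4.0, 6.0, 8.0, 10.0], [0.15, 0.30, 0.41, 0.58]),
    "Location 3": ([2.0, 5.0, 8.0, 11.0], [0.04, 0.20, 0.44, 0.62]),
    "Location 4": ([3.5, 6.5, 9.5, 12.0], [0.12, 0.33, 0.52, 0.70]),
}
(r2_mean, r2_sd), (mae_mean, mae_sd) = cross_location(sites, "Location 1")
```

Repeating the call with each location as `source` would give one row per training site, which is the layout the caption suggests.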

Fig 24.

Cross-location generalization capacity of ML models obtained from the tabulated results.

Bars represent mean R² (± SD) when models are trained on one location and tested on the other sites, highlighting the relative transferability of RFR, XGBoost, and the SVR variants across environments.

Table 14.

Comparison with the existing state of the art.

Fig 25.

Comparison of R-squared values across different locations for all models.

XGBoost and SVR with linear kernel demonstrate superior and consistent performance across all geographical locations.

Table 15.

Climate-specific comparison of ML model performance based on 5-fold cross-validation. Values are reported as mean ± S.D. of R², and statistical significance was assessed using an independent samples t-test.

Fig 26.

Comparison of Mean Absolute Error (MAE) values across different models (average of all locations).

SVR with a linear kernel demonstrates exceptionally low error rates, significantly outperforming other models.

Fig 27.

Prediction uncertainty of the proposed model using 95% prediction intervals (PI).

Shaded bands show residual-based uncertainty bounds, while lines show actual vs. predicted power. High coverage (PICP = 0.93) shows reliable and moderately tight uncertainty measures.
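
A residual-based interval and its coverage (PICP, the share of actual values that fall inside the band) can be sketched as below. The caption does not say how the article builds its bounds, so the empirical-quantile construction, the function names (`quantile`, `residual_offsets`, `picp`), and all numbers are assumptions made for illustration.

```python
def quantile(values, q):
    """Linearly interpolated empirical quantile, with q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def residual_offsets(cal_actual, cal_pred, level=0.95):
    """Lower and upper residual quantiles from a calibration set.

    Adding these offsets to a point prediction gives the interval bounds.
    """
    residuals = [a - p for a, p in zip(cal_actual, cal_pred)]
    alpha = 1 - level
    return quantile(residuals, alpha / 2), quantile(residuals, 1 - alpha / 2)


def picp(actual, pred, offsets):
    """Prediction interval coverage probability: fraction of actuals inside the band."""
    lo, hi = offsets
    inside = sum(1 for a, p in zip(actual, pred) if p + lo <= a <= p + hi)
    return inside / len(actual)


def mean_width(offsets):
    """Width of the band; constant here because the offsets do not depend on the input."""
    lo, hi = offsets
    return hi - lo


# Made-up calibration and test values (normalized power), for illustration only.
cal_actual = [0.20, 0.35, 0.50, 0.42, 0.61, 0.73, 0.30, 0.55]
cal_pred = [0.22, 0.33, 0.47, 0.45, 0.60, 0.70, 0.34, 0.52]
offsets = residual_offsets(cal_actual, cal_pred, level=0.95)

test_actual = [0.25, 0.48, 0.66, 0.90]
test_pred = [0.26, 0.50, 0.64, 0.70]
coverage = picp(test_actual, test_pred, offsets)
```

A coverage close to the nominal level together with a small `mean_width` is what the caption describes as reliable and moderately tight.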

Fig 28.

Wind speed prediction over 24 hours at Location 1.

XGBoost and SVR with linear kernel closely track the actual values, while RFR shows more variability and less accuracy in following the actual wind patterns.

Table 16.

Average feature importance and feature descriptions across all location datasets.

Fig 29.

Feature importance analysis showing that wind speed at 100 m height is the most significant predictor of power output (49.1% importance), followed by dewpoint at 2 m (9.2%) and temperature at 2 m (8.8%).

Fig 30.

Distribution of prediction errors across different models.

SVR with a linear kernel shows the narrowest error distribution centered near zero, indicating the highest precision, followed by XGBoost. RFR and SVR with RBF kernel show wider error distributions.

Table 17.

Reference ratings for creating algorithm performance radar chart.

Fig 31.

Radar chart comparing algorithm performance across multiple dimensions.

XGBoost shows balanced performance across all metrics, while SVR Linear excels in accuracy and RFR demonstrates strong robustness.

Fig 32.

Training convergence showing how different algorithms minimize loss over epochs.

SVR with a linear kernel converges fastest with the lowest final loss, indicating superior learning efficiency.

Fig 33.

Computational efficiency analysis showing training time as a function of dataset size.

SVR with a linear kernel shows the best scalability for large datasets, making it suitable for real-time forecasting applications.
