Abstract
The integration of renewable energy sources (RESs) introduces significant challenges related to uncertainty and intermittency in power grids. While Artificial Intelligence (AI) offers promising solutions for Virtual Power Plants (VPP) optimization, existing approaches often treat load forecasting, system dispatch, and demand response as loosely coupled components, limiting their ability to holistically manage these deep uncertainties. To address this, we propose a novel AI-enhanced multi-timescale optimization strategy that creates a synergistic, integrated framework. Methodologically, the approach begins with an attention-augmented Bidirectional Long Short-Term Memory (BiLSTM) model that generates high-fidelity spatiotemporal load forecasts, providing crucial spatial-aware inputs often overlooked by traditional models. These enhanced forecasts are then leveraged by a Model Predictive Control (MPC) strategy for more robust and proactive day-ahead and intraday dispatch. Crucially, the framework integrates a dynamic demand response (DDR) mechanism that is directly coupled with real-time MPC outputs, ensuring that load flexibility is mobilized based on immediate system needs rather than static signals alone. Simulations, driven by real-world operational data, confirm that this integrated strategy not only reduces operational costs and improves forecasting accuracy but also establishes a more resilient and adaptive VPP operational paradigm compared to prior AI-based methods.
Citation: Xu G, Yang G, Bao J, Feng H, Zhang F, Zheng H (2026) AI-enhanced multi-timescale optimization strategy for virtual power plants: Advancing load forecasting and dynamic demand response integration. PLoS One 21(1): e0339606. https://doi.org/10.1371/journal.pone.0339606
Editor: Ijaz Ahmed, King Fahd University of Petroleum & Minerals, PAKISTAN
Received: October 8, 2025; Accepted: December 9, 2025; Published: January 23, 2026
Copyright: © 2026 Xu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files. S1_Figs8-15.xlsx contains the input parameters and profile data underlying the VPP dispatch simulations (Figs 8-15 in S1 File). S2_Figs6-7.xlsx contains the numerical data underlying the forecasting model training loss and error distribution analysis (Figs 6-7 in S1 File). The original raw operational data are subject to third-party restrictions and are not publicly available, but the provided minimal datasets allow for the replication of the study’s findings.
Funding: This work was supported by the project “Research on Precise Load Control Technology considering the influence of Active Load Characteristics in a Data-Driven Context” (Grant Number: kj2024-007).
Competing interests: No authors have competing interests.
1 Introduction
Energy security and the transition to a low-carbon future are fundamental to the evolution of modern power systems. As countries strive to achieve global carbon neutrality, the power sector is pivotal in supporting these ambitions. The large-scale integration of renewable energy sources (RESs), such as wind and photovoltaic (PV) systems, is essential to these objectives. However, the intermittent and variable nature of RESs has introduced significant uncertainties in the power supply-demand balance, complicating the stability and economic efficiency of power system operations [1,2].
To address these challenges, Virtual Power Plants (VPP)—which aggregate distributed energy resources (DERs), flexible loads, and energy storage systems—have emerged as a strategic solution. By optimizing the integration of RESs and managing system dispatch, VPP play a critical role in enhancing grid stability and operational efficiency, contributing to the realization of smarter, more resilient energy systems [3,4].
Artificial Intelligence (AI) techniques, including deep learning, reinforcement learning, and other advanced machine learning methods, offer promising solutions for managing the complexities posed by high RES penetration [5]. AI enables smarter operations in renewable energy-integrated power systems through improved real-time monitoring, load forecasting, and adaptive control. Despite these advancements, a critical gap persists in the literature: the majority of AI-based VPP studies apply sophisticated algorithms to individual sub-problems, such as load forecasting, VPP scheduling, or dynamic demand response (DDR), without creating a methodologically coherent, integrated framework. This siloed approach results in suboptimal performance, as forecasting errors are not effectively mitigated by scheduling, and demand response is not dynamically aligned with real-time operational constraints. The primary contribution of this work is to bridge this gap by introducing a holistic, AI-enhanced optimization architecture.
Historically, load forecasting has evolved from statistical models like Autoregressive Integrated Moving Average (ARIMA) [6] to deep learning architectures. Early applications of deep learning saw studies proposing Long Short-Term Memory (LSTM) networks, which demonstrated exceptional performance in capturing long-term temporal dependencies in load data [7]. However, these models typically process aggregated load sequences and inherently overlook the critical spatial heterogeneity within VPP operational areas [8]. For instance, load responses to meteorological changes may differ significantly between urban and rural areas—a spatial correlation that purely temporal models fail to capture [9]. Recent advances have attempted to address this issue by incorporating spatial information. For example, some studies have used Graph Neural Networks (GNNs) to model physical grid topologies [10], while others have employed Convolutional Neural Networks (CNNs) to extract features from geographic data [11]. Although these approaches represent significant progress, they often treat spatial and temporal feature extraction as separate stages, potentially failing to capture their dynamic coupled interactions [12]. Additionally, methods focusing solely on uncertainty quantification, such as interval forecasting [13] and quantile regression [14], while improving robustness, do not address the fundamental issue of spatiotemporal feature fusion [15]. This leaves a critical gap for a model capable of comprehensively learning these deeply intertwined spatiotemporal dynamics, which serves as the primary motivation for our proposed attention-enhanced Bidirectional Long Short-Term Memory (BiLSTM) architecture.
In VPP dispatch, the paradigm has shifted from static day-ahead planning to dynamic multi-timescale optimization to better accommodate the volatility of RESs [16]. Model Predictive Control (MPC) has become a cornerstone of this approach, enabling rolling-horizon optimization that utilizes real-time data to correct day-ahead schedules [17]. For instance, multi-timescale frameworks have been proposed that effectively coordinate different temporal resolutions [18]. However, a key limitation of many such frameworks is their reliance on simple point forecasts or basic uncertainty models [19], which forces the MPC controller to be reactive, primarily correcting errors after they are observed, rather than proactively anticipating them. While some studies have integrated stochastic optimization or robust optimization to handle forecast errors [20], these methods can lead to overly conservative or computationally intensive solutions. The fundamental shortcoming lies in the inability to leverage the rich predictive information latent within high-dimensional spatiotemporal data [21]. Existing MPC applications for VPP [22] are not designed to ingest and act upon the nuanced, geographically aware forecasts that modern AI can provide. Our work bridges this critical gap by designing an MPC framework explicitly tailored to the spatiotemporal predictions of the BiLSTM model, thereby enabling more proactive and cost-effective dispatch.
DDR is a crucial resource for VPP, but its practical effectiveness is often hindered by disconnection from system operations [23]. Advanced techniques such as Multi-Agent Reinforcement Learning (MARL) [24] and Evolutionary Game Theory [25] have produced sophisticated models of consumer behavior in response to price signals. However, a fundamental limitation of these approaches lies in the nature of the signals themselves. Typically, demand response (DR) programs are triggered by day-ahead price forecasts or predetermined peak periods [26], which are static and cannot adapt to sudden, intraday events such as unexpected drops in PV generation or transmission line constraints. For instance, MARL frameworks have been proposed that optimize bidding strategies based on market prices but lack direct feedback loops from the physical state of VPP assets [27]. This temporal and information gap can lead to suboptimal or even counterproductive load shifts, as system operators are unable to precisely mobilize flexibility where and when it is most needed [28]. This highlights the urgent need for DR mechanisms that are not only price-sensitive but also integrated with dispatch. Our work addresses this issue by proposing a closed-loop DDR strategy, where DR signals are generated directly from the real-time outputs of the daily MPC scheduler, ensuring synergistic alignment between demand-side flexibility and the VPP’s immediate operational needs.
1.1 Contributions and innovations
To address these gaps, this paper proposes a novel multi-timescale VPP optimization strategy that integrates spatiotemporal feature-based load forecasting with dynamic DR. The key innovations of the proposed approach include:
- The introduction of an attention-augmented BiLSTM model establishes a spatiotemporally-aware forecasting approach. Unlike conventional deep learning forecasters that only capture temporal dependencies, this model explicitly fuses spatial features. This provides the VPP scheduler with critical information on regional load heterogeneity, leading to more accurate and geographically nuanced dispatch plans.
- Building upon the enhanced forecasts, a multi-timescale scheduling framework based on MPC is developed. This framework is specifically designed to leverage the rich spatiotemporal information from the forecasting model. This allows for proactive management of RES uncertainties, moving beyond the reactive adjustments common in existing MPC applications for VPP.
- The core methodological advancement is a dispatch-integrated DDR mechanism that closes the loop between VPP dispatch and demand-side management. The proposed dynamic DR strategy is directly informed by the real-time outputs of the MPC scheduler. This overcomes the critical limitation of prior DR models, which often rely on static price signals and fail to align user responses with the immediate operational needs of the VPP, thereby achieving true supply-demand synergy.
The remainder of this paper is structured as follows: Section 2 describes the attention-augmented BiLSTM model for spatiotemporal load forecasting, detailing the network architecture and information flow. Section 3 presents the VPP system model, including mathematical formulations and constraints for DERs such as wind, PV systems, energy storage, and gas turbines. Section 4 introduces the multi-timescale optimization strategy, encompassing day-ahead economic dispatch, intraday rolling optimization, and dynamic DR. Section 5 validates the proposed strategies through simulation, analyzing forecasting accuracy, multi-energy flow balance, and intraday correction effectiveness. Finally, the conclusions and future research directions are presented.
2 Enhanced bidirectional LSTM load spatiotemporal prediction model integrated with attention mechanisms
The dynamic coupling of spatial and temporal features in power load data reveals that: within the same season, sensitivity to temperature variations differs significantly across regions; within the same region, dominant load types and their variation patterns shift seasonally. Accurate prediction requires models capable of simultaneously capturing and integrating such spatiotemporal interactions.
Therefore, this paper proposes an enhanced bidirectional LSTM load forecasting model integrated with attention mechanisms, designed to achieve precise predictions by comprehensively considering spatiotemporal characteristics of the data.
2.1 Bidirectional long short-term memory network
Conventional LSTM neural networks represent a specialized form of Recurrent Neural Network (RNN). By incorporating gating mechanisms to regulate information flow—comprising forget gate, input gate, and output gate—they effectively capture long-term dependencies [29].
The three gating structures modify the cell state by selectively removing or adding information, thereby mitigating gradient vanishing/explosion issues. The initial step, the forget gate, operates as follows:

$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right)$$

where $W_f$ and $b_f$ denote the weight matrix and bias term of the forget gate, respectively; $h_{t-1}$ is the hidden state of the previous time step, $x_t$ is the current input, and $\sigma(\cdot)$ is the sigmoid function.
After the forget gate operation, the information stored in LSTM’s internal memory state is jointly determined by the input gate and output gate. The process begins with the input gate defining which new information to update, expressed as follows:

$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right)$$

where $i_t$ represents a scaling factor between 0 and 1. When its value is 1, it signifies complete updating of current information; conversely, 0 indicates no updating. Subsequently, the input gate generates a candidate vector through the tanh activation function:

$$\tilde{C}_t = \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right)$$

Based on the aforementioned processes, the cell state is updated using the obtained $f_t$, $i_t$, and $\tilde{C}_t$, transitioning from the previous state $C_{t-1}$ to the current state $C_t$ as follows:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

The final step in LSTM computes the current output using the updated cell state and the output gate’s activation function, expressed as follows:

$$o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t)$$
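The gate sequence described above (forget, input, candidate, cell update, output) can be sketched as a single time step in plain Python. This is a minimal illustration rather than the authors' implementation: the function names, the `params` dictionary layout, and the list-based vectors are all assumptions made for readability.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step. `params` (hypothetical layout) maps each of
    "f", "i", "c", "o" to a (weight matrix, bias vector) pair."""
    z = list(h_prev) + list(x_t)  # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["f"][0], z, params["f"][1])]    # forget gate
    i_t = [sigmoid(a) for a in affine(params["i"][0], z, params["i"][1])]    # input gate
    g_t = [math.tanh(a) for a in affine(params["c"][0], z, params["c"][1])]  # candidate vector
    o_t = [sigmoid(a) for a in affine(params["o"][0], z, params["o"][1])]    # output gate
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # Hidden state: gated, squashed view of the cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Toy example with one hidden unit, one input feature, and all-zero parameters.
zero = ([[0.0, 0.0]], [0.0])
params = {"f": zero, "i": zero, "c": zero, "o": zero}
h, c = lstm_step([0.3], [0.0], [2.0], params)
print(h, c)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved; this makes the role of the forget gate easy to check by hand.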
However, traditional unidirectional neural networks predict power load parameters by processing time-series data in a purely forward direction. This training approach makes inefficient use of the data and fails to fully capture its underlying characteristics. To address this, our paper proposes an attention-based BiLSTM model. Its bidirectional architecture allows information from both past and future hidden states to be recursively processed, enabling a more thorough discovery of the intrinsic relationships between the current load data and data from preceding and subsequent time steps. This ultimately enhances both the model’s prediction accuracy and its data utilization efficiency.
As illustrated in Fig 1, the BiLSTM network, in contrast to a standard LSTM, combines both a forward and a backward recurrent structure. While a conventional LSTM processes data unidirectionally from the past to the future, a BiLSTM introduces an additional, independent data flow from the future to the past. As a result, the BiLSTM is better able to capture the temporal dependencies within the data.
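For the BiLSTM network proposed in this paper, the hidden state at each level is synthesized from three components: the forward-propagated hidden state from the previous timestep along the temporal axis, $\overrightarrow{h}_{t-1}$; the reverse-propagated hidden state from the subsequent timestep along the temporal axis, $\overleftarrow{h}_{t+1}$; and the input vector at the current timestep, $x_t$.
This combinatorial process of hierarchical hidden states is mathematically represented by Equation (7):

$$\begin{aligned}
\overrightarrow{h}_t &= \mathrm{LSTM}\left(x_t, \overrightarrow{h}_{t-1}\right) \\
\overleftarrow{h}_t &= \mathrm{LSTM}\left(x_t, \overleftarrow{h}_{t+1}\right) \\
h_t &= W_{\overrightarrow{h}}\,\overrightarrow{h}_t + W_{\overleftarrow{h}}\,\overleftarrow{h}_t + b_t
\end{aligned} \tag{7}$$

where $\mathrm{LSTM}(\cdot)$ represents the computational procedure of a standard LSTM; $\overrightarrow{h}_t$ represents the forward hidden state; $\overleftarrow{h}_t$ represents the backward hidden state; $W_{\overrightarrow{h}}$ and $W_{\overleftarrow{h}}$ represent the weight matrices for forward and backward propagation; and $b_t$ is the bias term for the current hidden layer.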
2.2 Transformer attention mechanism
Based on the aforementioned BiLSTM architecture, this paper integrates a self-attention mechanism to capture dependencies among different timesteps in the sequence, comprising three core components: Query (Q), Key (K), and Value (V).

$$Q = X W^{Q}, \qquad K = X W^{K}, \qquad V = X W^{V}$$

where $Q$, $K$, and $V$ denote the query vector, key vector, and value vector, respectively; $W^{Q}$, $W^{K}$, and $W^{V}$ represent their corresponding weight matrices; and $X$ is the input sequence representation.
Building upon this foundation, an attention score computation mechanism is further introduced. The calculation formula is defined as follows:

$$s_{ij} = \frac{q_i k_j^{\top}}{\sqrt{d_k}}$$

where $d_k$ is the dimension of the key vectors. A normalization function then computes the attention weights, which are determined by the following equation:

$$\alpha_{ij} = \mathrm{softmax}_j(s_{ij}) = \frac{\exp(s_{ij})}{\sum_{l} \exp(s_{il})}$$

This weight primarily indicates the degree of influence from other inputs when computing outputs at different positions. Finally, a weighted summation is performed on the value vectors using the attention weights:

$$z_i = \sum_{j} \alpha_{ij} v_j$$

The result of the entire self-attention mechanism can then be described as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$
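The score, weight, and weighted-sum steps can be traced in a few lines of plain Python. The sketch below assumes the standard scaled dot-product form, with scores divided by the square root of the key dimension; the function names are illustrative and not taken from the paper.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V are lists of row vectors; returns (outputs, attention weights)."""
    d_k = len(K[0])
    weights = []
    for q in Q:
        # Score of this query against every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights.append(softmax(scores))
    # Each output row is the attention-weighted sum of the value vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in weights
    ]
    return outputs, weights

# A zero query scores all keys equally, so the output is the mean of the values.
out, w = scaled_dot_product_attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 4.0]])
print(out, w)
```

Each row of the returned weights sums to 1, which is what allows them to be read as the relative influence of each timestep on the output at a given position.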
To enhance the expressive power of the model, this paper also employs a multi-head attention mechanism. By performing parallel computation of multiple attention heads, each head independently learns distinct representation spaces, thereby capturing different aspects of information within the sequence. Finally, the outputs of all heads are concatenated and linearly transformed to obtain the final attention result.
The overall structure of the Transformer attention mechanism designed in this paper is shown in Fig 2. It mainly consists of multiple stacked encoders and decoders. Each encoder and decoder layer contains multi-head attention mechanisms and corresponding feedforward neural networks.
Each encoder layer mainly consists of two sub-layers: a multi-head self-attention mechanism for capturing dependencies between different positions in the sequence, and a feedforward neural network for performing non-linear transformations on representations at each position. Each sub-layer is followed by residual connections and layer normalization, which can be described as:

$$\mathrm{Output} = \mathrm{LayerNorm}\left(x + \mathrm{Sublayer}(x)\right)$$

where $\mathrm{Sublayer}(\cdot)$ denotes the multi-head attention mechanism or feedforward neural network, and $x$ is the sub-layer input.
The decoder layer resembles the encoder layer but incorporates a third sub-layer—a masked multi-head attention mechanism. This layer is crucial as it prevents the model from “looking ahead” at future information in the target sequence during the generation process, thereby ensuring prediction integrity. As with the encoder, each sub-layer is followed by a residual connection and layer normalization.
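The feed-forward network (FFN) is a simple, two-layer fully-connected network. It typically employs the Rectified Linear Unit (ReLU) as its activation function, which is defined by the following expression:

$$\mathrm{FFN}(x) = \max\left(0,\; x W_1 + b_1\right) W_2 + b_2$$

where $W_1$ and $W_2$ denote the weight terms of the two layers, and $b_1$ and $b_2$ are the corresponding bias terms.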
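The Transformer architecture, lacking recurrence or convolution, has no inherent understanding of sequence order. Therefore, positional encodings must be injected to provide the model with information about the relative or absolute position of each token. These encodings are generated using the following sine and cosine functions:

$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

where $pos$ denotes the token’s position in the sequence, $i$ indexes the encoding dimension pairs, and $d_{model}$ is the embedding dimension.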
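A short sketch of the sinusoidal encoding may help make the indexing concrete. It assumes the standard Transformer convention with base 10000, in which even dimensions take the sine and odd dimensions the cosine of the same angle; the function name is illustrative.

```python
import math

def positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal positional encodings."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for even in range(0, d_model, 2):
            # `even` plays the role of 2i, so the exponent is 2i / d_model.
            angle = pos / (10000 ** (even / d_model))
            pe[pos][even] = math.sin(angle)
            if even + 1 < d_model:
                pe[pos][even + 1] = math.cos(angle)
    return pe

table = positional_encoding(seq_len=96, d_model=4)  # e.g. 96 quarter-hour steps in a day
print(table[0], table[1])
```

Position 0 always encodes to alternating 0 and 1, and higher dimensions vary more slowly with position, which is what lets the model distinguish both nearby and distant timesteps.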
2.3 Fused attention mechanism BiLSTM load forecasting model
This paper introduces an Attention-based BiLSTM model for load forecasting, with its overall architecture illustrated in Fig 3. The model is designed to capture the deep, coupled relationships between spatio-temporal features by integrating a BiLSTM network with an attention mechanism.
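The model takes two primary inputs: the historical load sequence and a spatio-temporal feature matrix. This feature matrix is constructed to encompass both seasonal and regional dimensions. These inputs are concatenated to form an enhanced feature vector $x_t = [\ell_t;\, s_t]$, where $\ell_t$ is the historical load and $s_t$ is the spatio-temporal feature vector at time $t$, which is then fed into the BiLSTM layer for the extraction of high-level spatio-temporal features:

$$H = \mathrm{BiLSTM}(x_1, x_2, \ldots, x_T)$$

The output of this layer simultaneously incorporates both the dynamic temporal characteristics of load and implicit spatial correlation information. In the spatiotemporal attention fusion layer, the weight distribution is generated through scaled dot-product attention, with $Q$, $K$, and $V$ obtained from $H$:

$$\alpha = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right)$$

The weighted fusion vector is then:

$$c = \alpha V$$

This process enables focusing on key time points along the temporal dimension, dynamically assigning weights across feature dimensions, and modeling spatiotemporal coupling. The final prediction output layer generates prediction results through a fully connected network:

$$\hat{y}_{t+1:t+\tau} = W_y\, c + b_y$$

where $\tau$ is the prediction step size, and $W_y$ and $b_y$ are the weights and bias of the output layer. Model training employs the mean squared error loss function:

$$\mathcal{L}_{MSE} = \frac{1}{N} \sum_{n=1}^{N} \left(y_n - \hat{y}_n\right)^2$$

where $y_n$ and $\hat{y}_n$ are the actual and predicted loads and $N$ is the number of training samples.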
The model features a dynamic optimization mechanism, where attention weights adaptively adjust to seasonal patterns and differentially focus on unique regional characteristics. The model is trained end-to-end via backpropagation, with all parameters optimized jointly. To prevent overfitting and enhance generalization in complex spatio-temporal scenarios, an early stopping strategy is employed: training is halted if the validation loss does not improve for 10 consecutive epochs. The complete architecture, depicted in Fig 3, illustrates the entire information flow—from the input layer, through the BiLSTM for feature extraction and the attention mechanism for weighting, to the final prediction output.
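The training criterion and stopping rule described above can be sketched as follows. The patience of 10 epochs comes from the text; the function names and the idea of replaying a recorded list of validation losses (instead of running a real training loop) are assumptions made to keep the example self-contained.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; otherwise it runs to the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical loss curve: improves for three epochs, then plateaus.
losses = [1.0, 0.8, 0.7] + [0.75] * 15
print(early_stopping_epoch(losses))
```

In practice the parameters saved at `best_epoch` would be restored after stopping, so the plateau epochs do not degrade the final model.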
3 Multi-energy complementary VPP model
The multi-energy complementary VPP proposed in this study comprises a range of DERs, including wind turbines (WT), PV panels, a micro-gas turbine (GT), a Water Heater Boiler (WHB), a heat pump(HP), a heat storage tank, a battery energy storage system (BESS), and both thermal and electrical loads. The overall system architecture is depicted in Fig 4.
The micro GT consumes natural gas to generate electricity while releasing recoverable heat. Its electrical and thermal power outputs can be expressed as:

$$P_{GT}(t) = \eta_{e}\, V_{GT}(t)\, L_{NG}, \qquad H_{GT}(t) = \eta_{h}\, V_{GT}(t)\, L_{NG}$$

where $P_{GT}(t)$ is the power generation output of the micro GT during time period t; $H_{GT}(t)$ is the thermal power output of the GT during time period t; $\eta_{e}$ denotes the power generation efficiency; $\eta_{h}$ is the thermal efficiency coefficient; $V_{GT}(t)$ represents the volume of gas consumed by the micro GT for power generation; and $L_{NG}$ is the lower heating value (LHV) of natural gas, typically fixed at 9.7 kWh/m³.
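The GT operates as a combined heat and power (CHP) unit. The high-temperature exhaust gas it produces is captured by the WHB to meet the thermal load. The mathematical model for this heat generation process is given by:

$$H_{GT}(t) = \frac{P_{GT}(t)}{\kappa}, \qquad H_{WHB}(t) = \eta_{WHB}\, H_{GT}(t)$$

where $H_{GT}(t)$ is the thermal power output from the GT, $H_{WHB}(t)$ is the thermal power output from the WHB, $\kappa$ is the power-to-heat (P2H) ratio, and $\eta_{WHB}$ is the heat recovery efficiency.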
The HP functions as a P2H device, converting electricity into thermal energy to satisfy the microgrid’s heat demand. Its governing mathematical model and constraints are as follows:

$$H_{HP}(t) = \mathrm{COP}_{HP}\, P_{HP}(t), \qquad 0 \le H_{HP}(t) \le H_{HP}^{\max}$$

where $P_{HP}(t)$ denotes the input electrical power of the HP, $\mathrm{COP}_{HP}$ is its coefficient of performance (COP), $H_{HP}(t)$ is its thermal power output, and $H_{HP}^{\max}$ is its maximum thermal power output.
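This study employs a BESS. The dynamics of the battery are described by the following mathematical model, which tracks its State of Charge (SOC) over time [30]:

$$\mathrm{SOC}(t) = \begin{cases} (1-\sigma)\,\mathrm{SOC}(t-1) - \dfrac{P_{bat}(t)\,\Delta t}{\eta_{dis}}, & P_{bat}(t) \ge 0 \\[2ex] (1-\sigma)\,\mathrm{SOC}(t-1) - \eta_{ch}\, P_{bat}(t)\,\Delta t, & P_{bat}(t) < 0 \end{cases}$$

where $\mathrm{SOC}(t)$ is the SOC of the battery at the end of time step t (kWh); $\mathrm{SOC}(t-1)$ is the SOC at the end of the previous time step t-1 (kWh); $P_{bat}(t)$ is the power of the battery during time step t (kW), with a positive value indicating discharging and a negative value indicating charging; $\eta_{ch}$ and $\eta_{dis}$ are the charging and discharging efficiencies, respectively; $\sigma$ is the self-discharge rate; and $\Delta t$ is the duration of the time step.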
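The SOC bookkeeping can be illustrated with a small helper. The sign convention (positive power discharges, negative power charges) follows the text; the efficiency values and the function name are placeholders chosen for the example, not parameters reported in the paper.

```python
def update_soc(soc_prev, p_bat, dt_h, eta_ch=0.95, eta_dis=0.95, self_discharge=0.0):
    """Return the stored energy (kWh) at the end of a time step.

    soc_prev: stored energy at the end of the previous step (kWh)
    p_bat:    battery power (kW); > 0 discharging, < 0 charging
    dt_h:     step length in hours
    """
    retained = (1.0 - self_discharge) * soc_prev
    if p_bat >= 0:
        # Discharging: more energy leaves the cells than reaches the bus.
        return retained - p_bat * dt_h / eta_dis
    # Charging: p_bat is negative, so this adds eta_ch * |p_bat| * dt_h.
    return retained - eta_ch * p_bat * dt_h

# 15-minute steps at the 60 kW rating used in the case study (efficiencies assumed).
print(update_soc(200.0, -60.0, 0.25, eta_ch=0.9, eta_dis=0.9))  # charging
print(update_soc(200.0, 60.0, 0.25, eta_ch=0.9, eta_dis=0.9))   # discharging
```

Because losses apply in both directions, charging and then discharging the same bus-side energy leaves the battery below its starting level, which is one reason cycling carries a cost in the dispatch objective.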
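The operational principle of the thermal storage tank (HS) is analogous to that of the BESS. It stores excess thermal energy (charges) when available and releases it (discharges) when direct heat generation is insufficient to meet the thermal load [30]. The operation is subject to the following capacity and power constraints:

$$\begin{aligned}
S_{HS}(t) &= S_{HS}(t-1) + \left(\eta_{HS}^{ch}\, H_{HS}^{ch}(t) - \frac{H_{HS}^{dis}(t)}{\eta_{HS}^{dis}}\right)\Delta t \\
S_{HS}^{\min} &\le S_{HS}(t) \le S_{HS}^{\max} \\
H_{HS}^{dis,\min} &\le H_{HS}^{dis}(t) \le H_{HS}^{dis,\max} \\
H_{HS}^{ch,\min} &\le H_{HS}^{ch}(t) \le H_{HS}^{ch,\max}
\end{aligned}$$

where $S_{HS}(t)$ is the amount of thermal energy stored in the HS at time t; $H_{HS}^{ch}(t)$ and $H_{HS}^{dis}(t)$ are the thermal charging (heat absorption) and discharging (heat release) powers at time t, respectively; $\eta_{HS}^{ch}$ and $\eta_{HS}^{dis}$ are the thermal charging and discharging efficiencies, respectively; $S_{HS}^{\min}$ and $S_{HS}^{\max}$ are the minimum and maximum storage capacity of the HS, respectively; $H_{HS}^{dis,\min}$ and $H_{HS}^{dis,\max}$ are the minimum and maximum thermal discharging power, respectively; $H_{HS}^{ch,\min}$ and $H_{HS}^{ch,\max}$ are the minimum and maximum thermal charging power, respectively; and $\Delta t$ is the duration of the time step.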
4 Multi-time-scale optimal dispatch model for a VPP with DDR
This study proposes a two-stage, multi-time-scale optimal dispatch framework based on MPC, which integrates day-ahead scheduling and intra-day rolling optimization. The MPC methodology operates by predicting the system’s future behavior at each control interval, solving a finite-horizon optimization problem over a prediction horizon p, and then implementing only the first element of the resulting control sequence. The architecture of this rolling optimization approach is illustrated in Fig 5.
The day-ahead stage establishes an initial 24-hour dispatch plan with a 1-hour time resolution, based on day-ahead forecasts. Recognizing that the forecast accuracy for renewable generation (wind and solar) and load demand increases with finer time resolutions, a multi-time-scale approach is employed to refine this initial plan. The intra-day stage performs rolling optimization with a shorter 4-hour dispatch horizon and a 15-minute time resolution, continuously correcting the day-ahead schedule to adapt to real-time conditions.
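The timing of the two stages can be made concrete with a small sketch of the receding-horizon loop. The 24-hour day, 15-minute resolution, and 4-hour horizon come from the text; truncating the window at the end of the day and the placeholder `solve` callback are assumptions, since the actual optimization would be handled by the solver.

```python
def intraday_windows(day_hours=24, step_min=15, horizon_hours=4):
    """Return (start, end) step indices of every rolling window in one day."""
    steps_per_day = day_hours * 60 // step_min      # 96 quarter-hour steps
    horizon_steps = horizon_hours * 60 // step_min  # 16 steps per window
    windows = []
    for k in range(steps_per_day):
        # Assumption: the window is shortened once it reaches the end of the day.
        end = min(k + horizon_steps, steps_per_day)
        windows.append((k, end))
    return windows

def receding_horizon(windows, solve):
    """Solve each window and keep only the first control action (the MPC principle)."""
    applied = []
    for start, end in windows:
        plan = solve(start, end)  # full plan over the prediction horizon
        applied.append(plan[0])   # only the first element is implemented
    return applied

windows = intraday_windows()
# Placeholder "solver" that simply returns the step indices it was asked to plan for.
actions = receding_horizon(windows, lambda start, end: list(range(start, end)))
print(len(windows), windows[0], windows[-1])
```

Each 15-minute step is therefore re-planned up to 16 times before it is executed, which is how updated forecasts progressively correct the hourly day-ahead schedule.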
4.1 Day-ahead dispatch stage
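The objective function of the day-ahead optimal dispatch model is to minimize the total daily operating cost of the VPP, formulated as:

$$\min F_{DA} = \sum_{t=1}^{T} \left[ C_{op}(t) + C_{env}(t) + C_{grid}(t) + C_{DR}(t) \right]$$

The total cost comprises four components: the VPP’s operating and degradation cost $C_{op}$, the environmental cost $C_{env}$, the cost of power exchange with the main grid $C_{grid}$, and the DR compensation cost $C_{DR}$; $T$ is the number of dispatch periods in the day.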
- (1). Operating and Degradation Cost
This term includes the operational costs of the GT (fuel, start-up, and shutdown costs) and the degradation cost of the BESS due to cycling. It is calculated as:

$$C_{op}(t) = c_{bat}\, \left|P_{bat}(t)\right| + f_{GT}\left(P_{GT}(t)\right) + c_{su}\, u_{su}(t) + c_{sd}\, u_{sd}(t)$$

where $\left|P_{bat}(t)\right|$ is the magnitude of the battery’s power flow (charging or discharging) at time t; $c_{bat}$ is the battery degradation cost coefficient; $f_{GT}(\cdot)$ is the GT’s fuel cost function; $c_{su}$ and $c_{sd}$ are the start-up and shutdown costs; and $u_{su}(t)$ and $u_{sd}(t)$ are binary variables representing the GT’s start-up and shutdown status at time t.
The GT’s fuel cost is typically a quadratic function of its power output:
where a, b, c and d are the fuel cost coefficients.
- (2). Environmental Cost
This cost quantifies the environmental impact of emissions from the GT’s operation:

$$C_{env}(t) = c_{env}\, P_{GT}(t)$$

where $P_{GT}(t)$ is the GT’s power output, and $c_{env}$ is the unit emission cost coefficient.
- (3). Grid Interaction Cost
This represents the cost of electricity transactions with the main utility grid:

$$C_{grid}(t) = c_{grid}(t)\, \left|P_{grid}(t)\right|$$

where $\left|P_{grid}(t)\right|$ is the absolute power exchanged with the main grid, and $c_{grid}(t)$ is the unit cost of grid power.
- (4). DR Compensation Cost
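This is the cost of compensating consumers for participating in the DR program by curtailing their loads:

$$C_{DR}(t) = c_{DR}^{e}\, \Delta P_{cut}(t) + c_{DR}^{h}\, \Delta H_{cut}(t)$$

where $\Delta P_{cut}(t)$ and $\Delta H_{cut}(t)$ are the amounts of curtailed electrical and thermal load, respectively, and $c_{DR}^{e}$ and $c_{DR}^{h}$ are their corresponding unit compensation prices.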
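To show how the four cost components combine for a single dispatch period, the following sketch evaluates them for one candidate operating point. All coefficient values, the dictionary keys, and the linear fuel-cost placeholder are hypothetical; the fuel cost is passed in as a function so that any fitted cost curve can be substituted.

```python
def period_cost(p_bat, p_gt, p_grid, cut_e, cut_h, started, stopped, coef, fuel_cost):
    """Total cost of one dispatch period as the sum of the four components.

    p_bat, p_gt, p_grid: battery, gas turbine, and grid exchange power (kW)
    cut_e, cut_h:        curtailed electrical and thermal load (kW)
    started, stopped:    1 if the turbine starts up / shuts down in this period, else 0
    coef:                dict of unit cost coefficients (hypothetical keys)
    fuel_cost:           function mapping turbine power to fuel cost
    """
    operating = (coef["bat"] * abs(p_bat) + fuel_cost(p_gt)
                 + coef["su"] * started + coef["sd"] * stopped)
    environmental = coef["env"] * p_gt
    grid = coef["grid"] * abs(p_grid)
    dr_compensation = coef["dr_e"] * cut_e + coef["dr_h"] * cut_h
    return operating + environmental + grid + dr_compensation

# Illustrative numbers only; none of these coefficients are taken from the paper.
coef = {"bat": 0.02, "su": 5.0, "sd": 3.0, "env": 0.01, "grid": 0.5, "dr_e": 0.3, "dr_h": 0.2}
cost = period_cost(p_bat=-50.0, p_gt=100.0, p_grid=20.0, cut_e=10.0, cut_h=5.0,
                   started=1, stopped=0, coef=coef, fuel_cost=lambda p: 0.4 * p)
print(round(cost, 6))
```

Summing this quantity over all periods of the day gives the value that the day-ahead optimization minimizes, subject to the operational constraints listed next.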
The day-ahead optimal dispatch is subject to the following operational constraints:
- (1). Power Balance Constraints
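At each time interval t, the VPP must maintain a balance between power generation and consumption for both electricity and heat.

$$\begin{aligned}
P_{WT}(t) + P_{PV}(t) + P_{GT}(t) + P_{buy}(t) - P_{sell}(t) + P_{dis}(t) - P_{ch}(t) &= P_{load}(t) + P_{HP}(t) \\
H_{WHB}(t) + H_{HP}(t) + H_{HS}^{dis}(t) - H_{HS}^{ch}(t) &= H_{load}(t)
\end{aligned}$$

where $P_{load}(t)$ is the electrical load; $P_{WT}(t)$, $P_{PV}(t)$, and $P_{GT}(t)$ are the power outputs from wind, PV, and the GT; $P_{buy}(t)$ and $P_{sell}(t)$ represent power purchased from and sold to the main grid; $P_{dis}(t)$ and $P_{ch}(t)$ are the battery’s discharging and charging powers; $P_{HP}(t)$ is the electrical power consumed by the HP; $H_{load}(t)$ is the thermal load; and $H_{WHB}(t)$, $H_{HP}(t)$, $H_{HS}^{dis}(t)$, and $H_{HS}^{ch}(t)$ are the thermal outputs of the WHB and HP and the discharging and charging powers of the heat storage tank.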
- (2). Tie-Line Power Exchange Limits
The power exchanged with the main grid via the tie-line is constrained by its physical capacity:

$$P_{grid}^{\min} \le P_{grid}(t) \le P_{grid}^{\max}$$

where $P_{grid}^{\min}$ and $P_{grid}^{\max}$ define the allowable range for power import/export.
- (3). BESS Constraints
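Energy storage system operation is governed by power and energy SOC limits:

$$\begin{aligned}
0 &\le P_{ch}(t) \le u_{ch}(t)\, P_{ch}^{\max} \\
0 &\le P_{dis}(t) \le u_{dis}(t)\, P_{dis}^{\max} \\
u_{ch}(t) + u_{dis}(t) &\le 1 \\
\mathrm{SOC}^{\min} &\le \mathrm{SOC}(t) \le \mathrm{SOC}^{\max}
\end{aligned}$$

where $P_{ch}^{\max}$ and $P_{dis}^{\max}$ are the maximum charging/discharging power ratings; $u_{ch}(t)$ and $u_{dis}(t)$ are binary variables to prevent simultaneous charging and discharging; and $\mathrm{SOC}^{\min}$ and $\mathrm{SOC}^{\max}$ are the lower and upper SOC limits.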
- (4). Renewable Energy Output Constraints
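The power output from wind and solar resources is constrained by the forecasted availability.

$$\begin{aligned}
0 &\le P_{WT}(t) \le P_{WT}^{\max}(t) = N_{WT}\, p_{WT}(t) \\
0 &\le P_{PV}(t) \le P_{PV}^{\max}(t) = N_{PV}\, p_{PV}(t)
\end{aligned}$$

where $P_{WT}^{\max}(t)$ and $P_{PV}^{\max}(t)$ represent the maximum available power from the wind farm and PV array at time t, respectively; $N_{PV}$ and $N_{WT}$ are the number of installed PV panels and WT. The terms $p_{PV}(t)$ and $p_{WT}(t)$ denote the forecasted power generation from a single unit of each technology at time t.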
4.2 Intra-day dispatch stage
To address the deviations between the day-ahead dispatch schedule and actual real-time operation—which stem from forecast inaccuracies and unpredictable weather changes—this paper employs an intra-day rolling dispatch strategy. This strategy periodically adjusts the schedule using updated short-term forecasts for load, wind, and solar power.
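To maintain the integrity of the day-ahead plan and prevent excessive adjustments, the objective of the intra-day optimization is to minimize the deviations from the pre-established day-ahead schedule. This is achieved by incorporating a penalty term for these adjustments into the objective function. Therefore, the intra-day objective is to minimize the sum of these penalty costs over the rolling dispatch horizon, as formulated in Equation (44):

$$\min F_{ID} = C_{e}^{adj} + C_{h}^{adj} \tag{44}$$

where $C_{e}^{adj}$ and $C_{h}^{adj}$ are the penalty costs associated with the adjustments made to the power outputs of the electrical and thermal units, respectively. The electrical penalty is

$$C_{e}^{adj} = \lambda_{bat}\, \Delta P_{bat} + \lambda_{GT}\, \Delta P_{GT} + \lambda_{grid}\, \Delta P_{grid}, \qquad \Delta P_{x} = \sum_{t \in \mathcal{T}_{ID}} \left| P_{x}^{ID}(t) - P_{x}^{DA}(t) \right|, \quad x \in \{bat, GT, grid\}$$

In this equation, $\Delta P_{bat}$, $\Delta P_{GT}$, and $\Delta P_{grid}$ represent the total adjustments made to the power schedules of the battery, microturbine, and grid exchange during the intra-day stage, compared to the original day-ahead plan $P_{x}^{DA}(t)$. These adjustments are weighted by their respective penalty coefficients, $\lambda_{bat}$, $\lambda_{GT}$, and $\lambda_{grid}$. Furthermore, $P_{bat}^{ID}(t)$, $P_{GT}^{ID}(t)$, and $P_{grid}^{ID}(t)$ denote the actual intra-day power outputs for the battery, microturbine, and grid interaction at time t, and $\mathcal{T}_{ID}$ is the set of time steps in the rolling dispatch horizon. The thermal penalty is defined analogously:

$$C_{h}^{adj} = \lambda_{HS}\, \Delta H_{HS} + \lambda_{WHB}\, \Delta H_{WHB} + \lambda_{HP}\, \Delta H_{HP}, \qquad \Delta H_{y} = \sum_{t \in \mathcal{T}_{ID}} \left| H_{y}^{ID}(t) - H_{y}^{DA}(t) \right|, \quad y \in \{HS, WHB, HP\}$$

where $\Delta H_{HS}$, $\Delta H_{WHB}$, and $\Delta H_{HP}$ represent the total adjustments to the scheduled thermal output of the heat storage tank, WHB, and HP, comparing the intra-day plan to the day-ahead schedule. These adjustments are penalized using their respective coefficients, $\lambda_{HS}$, $\lambda_{WHB}$, and $\lambda_{HP}$. Furthermore, $H_{HS}^{ID}(t)$, $H_{WHB}^{ID}(t)$, and $H_{HP}^{ID}(t)$ denote the intra-day dispatched thermal power for these units at time t.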
The constraints for the WT, PV units, battery storage, and grid interaction are the same as in the day-ahead model and are thus omitted. For the shorter intra-day scheduling horizon, however, the microturbine’s ramping constraints are introduced as follows:

$$R_{GT}^{\min} \le P_{GT}(t) - P_{GT}(t-1) \le R_{GT}^{\max}$$

where $R_{GT}^{\max}$ and $R_{GT}^{\min}$ represent the upper and lower limits of the microturbine’s ramping rate.
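The intra-day logic of penalizing departures from the day-ahead plan while respecting turbine ramping can be sketched as below. The use of absolute deviations, the dictionary layout, and the numeric values are assumptions for illustration; the text only states that adjustments are weighted by penalty coefficients.

```python
def adjustment_penalty(day_ahead, intraday, weights):
    """Weighted sum of absolute deviations between intra-day and day-ahead schedules.

    day_ahead, intraday: dicts mapping a unit name to its power series over the horizon
    weights:             dict mapping the same unit names to penalty coefficients
    """
    total = 0.0
    for unit, weight in weights.items():
        deviation = sum(abs(a - b) for a, b in zip(intraday[unit], day_ahead[unit]))
        total += weight * deviation
    return total

def ramp_feasible(p_series, ramp_min, ramp_max):
    """Check that every step-to-step change in turbine power stays within the ramp limits."""
    return all(ramp_min <= b - a <= ramp_max for a, b in zip(p_series, p_series[1:]))

# Hypothetical two-step horizon with a battery and a gas turbine.
day_ahead = {"bat": [10.0, 10.0], "gt": [100.0, 100.0]}
intraday = {"bat": [12.0, 9.0], "gt": [100.0, 110.0]}
print(adjustment_penalty(day_ahead, intraday, {"bat": 0.5, "gt": 0.1}))
print(ramp_feasible(intraday["gt"], ramp_min=-20.0, ramp_max=20.0))
```

A larger penalty coefficient makes the corresponding unit "stickier" to its day-ahead schedule, so the relative weights determine which resources absorb forecast errors first.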
4.3 Dynamic demand response model
DDR refers to the modification of electricity consumption patterns by end-users in response to signals from the supply side, which are typically issued when electricity market prices are high or system reliability is compromised. Upon receiving incentive-based signals—such as notifications of price increases or offers of direct compensation for load reduction—users alter their conventional electricity usage strategies. This leads to a reduction or shift in their electricity load during specific periods. In essence, DR constitutes a beneficial interaction between consumers and the grid.
Different loads respond differently to the same price signals. Price-based DR loads are generally categorized into curtailable loads (CL) and shiftable loads (SL). A curtailable load determines whether to interrupt its consumption by evaluating the change in electricity price before and after the DR event.
The characteristics of DR are commonly described using a price elasticity matrix. The element in the m-th row and n-th column of the elasticity matrix $E$, denoted $e_{mn}$, represents the price elasticity coefficient of the load at time m with respect to the electricity price at time n. The mathematical model is as follows:
$$e_{mn} = \frac{\Delta L_m / L_m^{0}}{\Delta P_n / P_n^{0}}$$
In the equation, $\Delta L_m$ represents the change in load at time m, $L_m^{0}$ is the original load at time m, $\Delta P_n$ denotes the change in electricity price at time n, and $P_n^{0}$ is the original price at time n.
Consequently, the change in the curtailable load after DR, $\Delta L_m^{CL}$, is calculated as follows:
$$\Delta L_m^{CL} = L_m^{CL,0} \sum_{n} e_{mn}^{CL}\,\frac{P_n - P_n^{0}}{P_n^{0}}$$
where $L_m^{CL,0}$ is the original amount of curtailable load at time m, $E^{CL}$ is the price elasticity matrix for the curtailable load, with elements $e_{mn}^{CL}$, and $P_n$ is the electricity price at time n.
The principle of shiftable load is that users flexibly adjust their energy consumption timing based on their own needs and real-time electricity/heat price information. This allows loads from high-price periods to be shifted to low- or flat-price periods. The mathematical model for the change in shiftable load after DR is given by:
$$\Delta L_m^{SL} = L_m^{SL,0} \sum_{n} e_{mn}^{SL}\,\frac{P_n - P_n^{0}}{P_n^{0}}$$
where $L_m^{SL,0}$ is the original amount of shiftable load at time m, and $E^{SL}$ is the price elasticity matrix for the shiftable load, with elements $e_{mn}^{SL}$.
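The elasticity-based load response can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes the common form in which the load change at time m equals the original load multiplied by the elasticity-weighted sum of relative price changes. The function name `load_change`, the three-period horizon, and all numerical values are hypothetical. The curtailable-load matrix is taken as diagonal (self-elasticity only), while the shiftable-load matrix includes positive cross-elasticities so that load removed from the peak reappears in the valley.

```python
import numpy as np

def load_change(base_load, elasticity, base_price, new_price):
    """Load change per period: dL_m = L0_m * sum_n E[m, n] * (P_n - P0_n) / P0_n."""
    base_load = np.asarray(base_load, dtype=float)
    base_price = np.asarray(base_price, dtype=float)
    new_price = np.asarray(new_price, dtype=float)
    rel_price_change = (new_price - base_price) / base_price
    return base_load * (np.asarray(elasticity, dtype=float) @ rel_price_change)

# Hypothetical three-period example (valley, flat, peak); values are illustrative only.
base_price = np.array([0.5, 0.5, 0.5])   # flat reference price
tou_price = np.array([0.4, 0.5, 0.6])    # -20 %, 0 %, +20 % relative change

cl_base = np.array([20.0, 30.0, 40.0])   # curtailable load, kW
sl_base = np.array([30.0, 30.0, 30.0])   # shiftable load, kW

# Curtailable load: self-elasticity only (diagonal matrix).
E_cl = np.diag([-0.10, -0.10, -0.10])
# Shiftable load: negative self-elasticity, positive cross-elasticity.
E_sl = np.array([[-0.10, 0.05, 0.05],
                 [0.05, -0.10, 0.05],
                 [0.05, 0.05, -0.10]])

delta_cl = load_change(cl_base, E_cl, base_price, tou_price)
delta_sl = load_change(sl_base, E_sl, base_price, tou_price)
print("curtailable load change:", delta_cl)
print("shiftable load change:  ", delta_sl)
```

In this toy setting the shiftable-load changes sum to zero, which reflects the idea that shifting moves consumption in time rather than removing it. Whether that balance holds in general depends on how the elasticity matrix is calibrated.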
5 Simulation results and analysis
To validate the effectiveness of the proposed multi-timescale optimization strategy for VPP, which integrates spatiotemporal feature-based load forecasting and DDR, simulations of the constructed model were conducted using MATLAB R2021a with the CPLEX solver.
5.1 Case study data
To validate the effectiveness and practical applicability of the proposed multi-timescale optimization strategy, this study utilizes a comprehensive dataset of actual operational data from a regional multi-energy VPP. This dataset provides a realistic foundation for our simulation-based validation, ensuring that the model is tested against real-world dynamics and uncertainties. The data, spanning a three-year period with a 15-minute temporal resolution, includes historical time-series for electrical and thermal loads, wind and solar power generation, and market electricity prices. While the raw operational data is subject to a confidentiality agreement and cannot be publicly disclosed, the key technical parameters of the VPP components, derived from the real system’s specifications, are detailed in Table 1. The system is configured with a 520 kW wind turbine, a 300 kW photovoltaic array, a 400 kWh energy storage system with a 60 kW charge/discharge rating, a 300 kW micro GT, a 400 kW HP, and a 2000 kWh thermal storage tank. The peak electrical and thermal loads observed in the dataset are 440 kW and 570 kW, respectively. The time-of-use electricity/heat prices and grid tariffs used in the simulation, shown in Tables 2 and 3, are also based on the actual pricing schemes from the corresponding period. The day-ahead and intra-day forecast curves for renewable generation and loads, presented in Figs 6–9, are generated by our proposed model using this historical dataset.
To ensure the reproducibility of our forecasting results, this section provides detailed information regarding the experimental setup for the proposed model. The historical operational data, including load and renewable energy generation profiles, was obtained from a regional VPP operator under a confidentiality agreement, as mentioned in the Data Availability Statement. The dataset spans a three-year period from January 1, 2021, to December 31, 2023, with a 15-minute temporal resolution.
The dataset was chronologically partitioned to prevent data leakage and ensure a robust evaluation of the model’s predictive capabilities. Data from January 1, 2021, to December 31, 2022 (approximately 70% of the total data) was allocated for the training set. Data from January 1, 2023, to June 30, 2023 (15%) served as the validation set, used for hyperparameter tuning and triggering the early stopping mechanism. The remaining data, from July 1, 2023, to December 31, 2023 (15%), was reserved as the final, unseen test set to evaluate the model’s generalization performance. Prior to model training, all input features were normalized to a range of [0, 1] using min-max scaling to improve convergence speed and stability. The key hyperparameters, selected through a combination of grid search and empirical validation on the validation set, are detailed in Table 4.
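The partitioning and scaling steps can be sketched as follows. This is an illustrative outline only: the load series is synthetic because the operational data are confidential, and fitting the min-max bounds on the training period alone is an assumption made here to stay consistent with the stated goal of avoiding data leakage. With the calendar boundaries given above, the three subsets contain roughly 66.7%, 16.5%, and 16.8% of the samples, close to the nominal 70/15/15 split.

```python
import numpy as np

# Timestamps at 15-minute resolution over the stated span (2021-01-01 to 2023-12-31).
t = np.arange(np.datetime64("2021-01-01T00:00"),
              np.datetime64("2024-01-01T00:00"),
              np.timedelta64(15, "m"))

# Synthetic stand-in for one input feature (hypothetical values, in kW).
rng = np.random.default_rng(0)
load = 300.0 + 100.0 * rng.random(t.size)

# Chronological masks: no shuffling, so later data never leaks into training.
train = t < np.datetime64("2023-01-01")
val = (t >= np.datetime64("2023-01-01")) & (t < np.datetime64("2023-07-01"))
test = t >= np.datetime64("2023-07-01")

shares = {name: mask.mean()
          for name, mask in [("train", train), ("val", val), ("test", test)]}
print({name: round(100 * share, 1) for name, share in shares.items()})

# Min-max scaling with bounds taken from the training period only (assumption).
lo, hi = load[train].min(), load[train].max()
scaled = (load - lo) / (hi - lo)
```

Because the bounds come from the training period, validation and test values can fall slightly outside [0, 1]; this is expected and does not indicate an error.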
Fig 6 compares the training convergence of four models: (a) the proposed attention-based BiLSTM, (b) Transformer, (c) LSTM, and (d) RNN. The proposed model (a) rapidly stabilizes within 20 epochs (MSE ≈ 0.015), with a validation loss significantly lower than that of the other models. Although the Transformer (b) converges quickly, its final validation loss (MSE ≈ 0.025) exceeds that of the proposed method. The traditional LSTM (c) and RNN (d) exhibit pronounced overfitting and a higher stabilized loss (MSE > 0.03), demonstrating that integrating attention mechanisms with a bidirectional structure enhances learning efficiency and generalization.
Fig 7 analyzes prediction error distributions. The proposed model (a) exhibits the most compact error concentration near zero (±0.01 range, peak density 0.04), indicating high accuracy and stability. Transformer (b) shows a sharp primary peak but a secondary peak at ±0.02, revealing sensitivity to outliers. LSTM (c) and RNN (d) display broad error spreads (±0.03) with lower density peaks (≤0.03), confirming that spatiotemporal feature fusion effectively controls prediction deviations.
Fig 8 displays electrical load forecasting curves, capturing spatiotemporal characteristics across seasons and regions. Fig 9 presents thermal load forecasting curves, reflecting periodic fluctuations in heating systems. Together, they provide the data foundation for multi-timescale scheduling.
5.2 Day-ahead optimal dispatch results analysis
Building upon the high-accuracy load forecasting data and incorporating the DDR model, this section presents an analysis of the day-ahead optimal dispatch results for the VPP.
- 1. Operational strategy analysis of energy storage systems
To further explore the role of energy storage systems in optimal scheduling, Figs 10 and 11 are jointly presented to illustrate the dynamic charging/discharging power and SOC variations of the electrical and heat storage systems.
The operation behavior of the energy storage system shows a strong negative correlation with the energy prices in Fig 10, reflecting the classic economic arbitrage mode of “store low and discharge high”. As shown in Figs 10–12, the battery charges during nighttime electricity price valley periods (00:00–07:00) and discharges during afternoon and evening peak price periods. The SOC steadily increases during valley price periods and decreases during peak price periods, achieving the value of electricity time-shifting. The heat storage tank stores heat during early morning hours with low electricity prices and releases heat during high heat price periods in the afternoon and evening. Through heat storage configuration, the system effectively decouples the timing of “heat production” and “heat consumption”, improves the utilization of HP during low electricity price periods, and reduces overall heating costs.
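The SOC behavior described above can be illustrated with a simple energy-balance update. The sketch below uses the 400 kWh capacity and 60 kW power rating of the battery from the case study; the initial SOC, the hourly time step, the ideal efficiencies (defaults of 1.0), and the function name `soc_trajectory` are assumptions for illustration and do not reproduce the paper's storage model.

```python
def soc_trajectory(power_kw, soc0, capacity_kwh=400.0, dt_h=1.0,
                   eta_ch=1.0, eta_dis=1.0, p_max_kw=60.0):
    """Return the SOC after each step; power_kw > 0 charges, power_kw < 0 discharges."""
    soc, out = soc0, []
    for p in power_kw:
        p = max(-p_max_kw, min(p_max_kw, p))       # respect the power rating
        if p >= 0:
            soc += eta_ch * p * dt_h / capacity_kwh
        else:
            soc += p * dt_h / (eta_dis * capacity_kwh)
        soc = max(0.0, min(1.0, soc))              # crude guard keeping SOC in [0, 1]
        out.append(soc)
    return out

# Two valley-price hours of charging followed by one peak-price hour of discharging.
traj = soc_trajectory([60.0, 60.0, -60.0], soc0=0.2)
print([round(s, 3) for s in traj])
```

Each hour at the rated 60 kW moves the SOC by 15 percentage points of the 400 kWh capacity, which is why sustained valley-period charging produces the steady SOC rise described in the text.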
Fig 13 shows the electrical and thermal load curves before and after integrated demand response. The figure indicates that before demand response the load curves have large peak-valley differences, while after demand response they become flatter. This is because, once the demand response mechanism is introduced, time-of-use electricity prices guide users to change their consumption habits and actively adjust their usage periods, while subsidies for users who shift or interrupt load further encourage participation in load adjustment.
(a) Electric load curve before and after DR, (b) Heat load curve before and after DR.
Taking electrical load as an example, combined with time-of-use price information: during 10:00–12:00 and 18:00–21:00, the load reaches its peak while electricity prices are also at peak levels. Under the dual effects of price guidance and compensation incentives, some loads are interrupted during this period, while transferable loads are shifted to 01:00–09:00, 13:00–17:00, and 22:00–24:00. After implementing demand response, the electrical load achieves “peak shaving and valley filling”, reducing the peak-valley difference by 18.5% and effectively smoothing load fluctuations. The demand response process for thermal load is similar to that for electrical load.
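The peak-valley metric used here is straightforward to compute. The sketch below shows the calculation on hypothetical load values; the numbers are illustrative and are not taken from the case study, and the function names are chosen for this example only.

```python
def peak_valley_difference(load):
    """Difference between the maximum and minimum load over the horizon."""
    return max(load) - min(load)

def reduction_pct(before, after):
    """Relative reduction of the peak-valley difference, in percent."""
    d_before = peak_valley_difference(before)
    d_after = peak_valley_difference(after)
    return 100.0 * (d_before - d_after) / d_before

# Hypothetical load samples in kW (valley, peak, shoulder) before and after DR.
before_dr = [240.0, 440.0, 300.0]
after_dr = [260.0, 410.0, 310.0]
print(reduction_pct(before_dr, after_dr))
```

In this toy example the difference falls from 200 kW to 150 kW, a 25% reduction. The 18.5% reported in the text is obtained in the same way from the simulated load curves.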
Fig 14(a) illustrates the power balancing results for electrical load after demand response. During 01:00–07:00, wind power primarily supplies the electrical load. Due to low electricity prices, the VPP purchases power from the main grid to charge batteries and converts electrical power to thermal power via HP to meet thermal load demand. From 08:00 to 21:00, PV power output is substantial, and electricity selling prices are high. After meeting load demand, the VPP sells surplus power to the main grid for additional revenue. Energy storage systems also discharge for arbitrage during this period. Moreover, due to high electricity prices, HP reduce electric-to-heat conversion power. During 22:00–24:00, electricity prices are low. To maintain consistent SOC at the start and end of the scheduling cycle, the VPP purchases power from the main grid to charge batteries and supplies heat to thermal loads via HP. Throughout the scheduling period, renewable generation dominates, with other dispatchable sources providing auxiliary regulation. GT contribute minimal generation because their operating costs exceed main grid purchase prices during valley price periods. During flat or peak price periods, abundant wind and solar power is supplied to the loads first, resulting in low gas turbine output.
(a) Electric load balance, (b) Heat load balance.
Post demand response, the flattened load curve leads to smoother output profiles for DERs. Increased valley-period loads and reduced peak-period loads result in higher electricity purchases from the main grid during off-peak hours. Reduced peak loads during 08:00–21:00 enable greater electricity sales to the main grid, achieving arbitrage, increasing revenue, and enhancing the economic operation of the multi-energy VPP.
Fig 14(b) displays the power balancing results for thermal load after demand response. HP dominate the heat supply, while waste heat boilers and heat storage tanks contribute minimally, influenced by time-varying electricity/heat prices and equipment efficiency. During 01:00–10:00, HP operate at high capacity to meet most thermal load demand. Low electricity prices and adequate wind power supply allow bulk electricity purchases from the main grid for heating, simultaneously charging the heat storage tank. From 11:00–22:00, substantial electricity is sold to the grid for arbitrage, shifting thermal supply primarily to the heat storage tank and waste heat boilers. During 23:00–24:00, relatively low electricity prices prompt power purchases for HP operation and tank charging to maintain consistent stored heat levels.
5.3 Analysis of intraday rolling optimization results
Fig 15 compares the 15-minute intraday rolling scheduling results with the day-ahead hourly plan, showing the power adjustments of key equipment in the VPP during the intraday stage.
(a) Day-ahead vs intra-day VPP power exchange, (b) Day-ahead vs intra-day battery power, (c) Day-ahead vs intra-day HP power, (d) Day-ahead vs intra-day heat storage power.
The comparison clearly indicates that the actual intraday scheduling strictly follows the overall trend of the day-ahead plan, which is attributed to the penalty terms for deviations included in the intraday optimization objective function. However, within each rolling time domain, intraday scheduling dynamically fine-tunes operations based on more accurate 15-minute forecast data. If actual photovoltaic output exceeds the day-ahead forecast at any moment, the intraday scheduling may increase battery charging power or reduce grid purchases to utilize unexpected renewable generation in real time. Conversely, if actual load exceeds forecasts, the system responds by increasing energy storage discharge or grid power purchases.
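The corrective behavior described above can be illustrated with a deliberately simplified rule. The sketch below is not the MPC optimization used in this work: it is a rule-based stand-in for a single 15-minute step, in which a forecast surprise is first absorbed by the battery up to its 60 kW rating and the remainder is passed to the grid exchange. SOC limits and costs are ignored, and the function name `intraday_adjust` and its sign conventions are assumptions.

```python
def intraday_adjust(surplus_kw, batt_plan_kw, batt_limit_kw=60.0):
    """Rule-based correction for one intraday step.

    surplus_kw:   actual minus forecast net generation
                  (positive = extra PV or lower load, negative = shortfall).
    batt_plan_kw: day-ahead battery power (positive = charging).
    Returns (new battery power, change in grid purchase relative to the plan).
    """
    target = batt_plan_kw + surplus_kw
    batt = max(-batt_limit_kw, min(batt_limit_kw, target))   # power rating
    grid_delta = (batt - batt_plan_kw) - surplus_kw          # power balance
    return batt, grid_delta

# Extra PV fully absorbed by additional charging: grid exchange unchanged.
print(intraday_adjust(30.0, 20.0))
# Extra PV exceeds the charging headroom: the rest reduces grid purchases.
print(intraday_adjust(60.0, 20.0))
# Load higher than forecast: discharge to the limit, then buy more from the grid.
print(intraday_adjust(-50.0, -30.0))
```

In an optimization-based scheme such as the one described here, the split between battery and grid would instead result from minimizing cost plus the deviation penalty over the rolling horizon.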
5.4 Comprehensive performance evaluation
To quantitatively substantiate the claims made in this paper and provide the comparative benchmarks suggested, a comprehensive performance evaluation was conducted. This analysis deconstructs the proposed framework into its constituent components to rigorously assess the distinct and synergistic contributions of the advanced forecasting model and the dynamic demand response mechanism. For this purpose, three distinct operational scenarios were designed and simulated. To account for the stochasticity in model training, each scenario’s forecasting model was trained and evaluated 10 times with different random initializations. The key performance indicators (KPIs) derived from the dispatch simulations, averaged over the 10 runs, are systematically presented in Table 5. The first, designated as the baseline scenario (Case 1), utilizes a conventional LSTM model for load forecasting without the implementation of DR, representing a standard VPP operational approach. The second scenario (Case 2) integrates the proposed attention-augmented BiLSTM model for enhanced spatiotemporal load forecasting but continues to operate without the DR mechanism, thereby isolating the impact of forecasting accuracy. Finally, Case 3 represents the full implementation of our proposed integrated strategy, combining the high-precision forecasting model with the DDR framework.
These metrics include forecasting accuracy, quantified by the Mean Absolute Percentage Error (MAPE), the effectiveness of load regulation, measured by the peak-valley load difference, and the overall economic efficiency, represented by the total daily operating cost and the associated cost reduction relative to the baseline. Furthermore, the technical benefit in terms of renewable energy integration is assessed through the renewable energy curtailment rate.
The results in Table 5 provide clear, quantitative evidence of the proposed strategy’s superiority. A direct comparison between Case 1 and Case 2 elucidates the significant impact of the advanced forecasting model alone. By reducing the load forecasting MAPE from 5.8% to 2.5%, the VPP can achieve a more precise and proactive dispatch, resulting in a 7.3% reduction in total operating costs and a notable decrease in renewable energy curtailment. Subsequently, the introduction of the DDR mechanism, as demonstrated by the transition from Case 2 to Case 3, yields further benefits. The DR strategy effectively reshapes the load profile, achieving an 18.5% reduction in the peak-valley difference, which facilitates more economical energy arbitrage and better utilization of intermittent renewables. This culminates in the integrated strategy (Case 3) achieving a total operational cost reduction of 13.9% and halving the renewable curtailment rate compared to the baseline. This analysis confirms that a synergistic effect emerges from the integration of high-fidelity spatiotemporal forecasting and adaptive demand-side management, validating the proposed framework as a more scientifically rigorous and economically efficient solution for modern VPP operation.
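Two of the quantities discussed above can be made concrete with a short calculation. The sketch below gives a standard definition of MAPE and derives, from the reported 7.3% and 13.9% reductions relative to the baseline, the additional cost reduction attributable to the step from Case 2 to Case 3. The sample values passed to `mape` are hypothetical, and both function names are chosen for this example only.

```python
def mape(actual, forecast):
    """Mean absolute percentage error in percent; assumes no zero actual values."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def incremental_reduction(total_pct, first_stage_pct):
    """Extra reduction of the second stage, relative to the cost after the first stage."""
    remaining_after_both = 1.0 - total_pct / 100.0
    remaining_after_first = 1.0 - first_stage_pct / 100.0
    return 100.0 * (1.0 - remaining_after_both / remaining_after_first)

# Hypothetical load samples in kW, used only to show the MAPE calculation.
example_mape = mape([100.0, 200.0], [95.0, 210.0])

# Reported reductions: 7.3 % (Case 2 vs. Case 1) and 13.9 % (Case 3 vs. Case 1).
ddr_step = incremental_reduction(13.9, 7.3)
print(round(example_mape, 2), round(ddr_step, 2))
```

Measured against the Case 2 cost rather than the baseline, the DDR step therefore corresponds to a further reduction of about 7.1%, which is of similar size to the gain from the improved forecast.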
6 Conclusion
This study presented a comprehensive multi-timescale optimization strategy for VPP that leveraged AI-driven spatiotemporal load forecasting and DDR to enhance system operation. A novel attention-augmented BiLSTM model was developed for load prediction, which improved forecasting accuracy by fusing spatial and temporal features and provided a robust data foundation for VPP scheduling. The proposed MPC-based multi-timescale scheduling framework successfully adapted to RES uncertainties by optimizing both day-ahead and intra-day operations. Furthermore, the dispatch-integrated DDR strategy effectively mitigated peak-valley load imbalances through real-time, closed-loop adjustments. The proposed framework’s effectiveness was validated through extensive simulations using a three-year set of actual operational data, which demonstrated its superiority over conventional approaches in reducing operational costs by 13.9%, improving load regulation, and enhancing overall grid stability. Our findings confirm that this integrated strategy offers a scientifically rigorous and practically viable solution for managing DERs in grids with high RES penetration.
While the proposed framework demonstrates significant advantages, future work can extend its capabilities in several key directions. One promising avenue is the integration of advanced AI control methods, such as deep Reinforcement Learning (RL), with the MPC framework. An RL agent could learn optimal control policies over time, dynamically adjusting MPC parameters to better handle non-stationary uncertainties that are difficult to model explicitly. In parallel, the demand response modeling could be significantly enhanced by moving beyond the current price-elasticity model to develop data-driven, agent-based models that capture the heterogeneous and uncertain behaviors of individual consumers, thereby allowing for more precise mobilization of demand-side flexibility. To further bolster the framework’s resilience, the deterministic optimization could be extended to a stochastic or robust formulation. By incorporating probabilistic forecasts directly from the AI model, the VPP scheduling could explicitly manage the risk associated with forecast errors. Finally, addressing the practical challenges of scalability and real-world validation is crucial for deployment. This includes investigating decentralized optimization algorithms to manage large-scale VPP and validating the entire strategy on a hardware-in-the-loop platform or in a pilot project to assess its performance against constraints like communication delays and imperfect data.
References
- 1. Musiqi D, Kastrati V, Bosisio A, Berizzi A. Deep Neural Network-Based Autonomous Voltage Control for Power Distribution Networks with DGs and EVs. Applied Sciences. 2023;13(23):12690.
- 2. Li K, Fan H, Yao P. Estimating carbon emissions from thermal power plants based on thermal characteristics. International Journal of Applied Earth Observation and Geoinformation. 2024;128:103768.
- 3. Yang J. Transaction decision optimization of new electricity market based on virtual power plant participation and Stackelberg game. PLoS One. 2023;18(4):e0284030. pmid:37079540
- 4. Wang W, Kong Z, He Y, Li C, Jia K. Research on the collaborative operation strategy of shared energy storage and virtual power plant based on double layer optimization. Journal of Energy Storage. 2024;101:113997.
- 5. Ren C, An N, Wang J, Li L, Hu B, Shang D. Optimal parameters selection for BP neural network based on particle swarm optimization: A case study of wind speed forecasting. Knowledge-Based Systems. 2014;56:226–39.
- 6. Aasim, Singh SN, Mohapatra A. Repeated wavelet transform based ARIMA model for very short-term wind speed forecasting. Renewable Energy. 2019;136:758–68.
- 7. Kong W, Dong ZY, Jia Y, Hill DJ, Xu Y, Zhang Y. Short-Term Residential Load Forecasting Based on LSTM Recurrent Neural Network. IEEE Trans Smart Grid. 2019;10(1):841–51.
- 8. Wu H, Xu Z. Multi-Energy Load Forecasting in Integrated Energy Systems: A Spatial-Temporal Adaptive Personalized Federated Learning Approach. IEEE Trans Ind Inf. 2024;20(10):12262–74.
- 9. Zhao P, Hu W, Cao D, Huang R, Bai X, Huang Q, et al. A Novel Spatiotemporal Pyramidal Graph Modeling Approach for Short-Term Residential Load Forecasting. IEEE Trans Ind Inf. 2025;21(9):7153–64.
- 10. Guo Y, Li Y, Qiao X, Zhang Z, Zhou W, Mei Y, et al. BiLSTM Multitask Learning-Based Combined Load Forecasting Considering the Loads Coupling Relationship for Multienergy System. IEEE Trans Smart Grid. 2022;13(5):3481–92.
- 11. Shi H, Xu M, Li R. Deep Learning for Household Load Forecasting—A Novel Pooling Deep RNN. IEEE Trans Smart Grid. 2018;9(5):5271–80.
- 12. Zhao P, Hu W, Cao D, Zhang Z, Huang Y, Dai L, et al. Probabilistic Multienergy Load Forecasting Based on Hybrid Attention-Enabled Transformer Network and Gaussian Process-Aided Residual Learning. IEEE Trans Ind Inf. 2024;20(6):8379–93.
- 13. Wang K, Wang J, Zeng B, Lu H. An integrated power load point-interval forecasting system based on information entropy and multi-objective optimization. Applied Energy. 2022;314:118938.
- 14. Yang Y, Li S, Li W, Qu M. Power load probability density forecasting using Gaussian process quantile regression. Applied Energy. 2018;213:499–509.
- 15. Gao D, Yang W, Li P, Liu S, Liu T, Wang M, et al. A multiscale feature fusion network based on attention mechanism for motor imagery EEG decoding. Applied Soft Computing. 2024;151:111129.
- 16. Tang J, Liu J, Sun T, Kang H, Hao X. Multi-Time-Scale Optimal Scheduling of Integrated Energy System Considering Demand Response. IEEE Access. 2023;11:135891–904.
- 17. Shorinwa O, Schwager M. Distributed Model Predictive Control via Separable Optimization in Multiagent Networks. IEEE Trans Automat Contr. 2024;69(1):230–45.
- 18. He Y, Li Z, Zhang J, Shi G, Cao W. Day-ahead and intraday multi-time scale microgrid scheduling based on light robustness and MPC. International Journal of Electrical Power & Energy Systems. 2023;144:108546.
- 19. Yang S, Wan MP, Chen W, Ng BF, Dubey S. Experiment study of machine-learning-based approximate model predictive control for energy-efficient building control. Applied Energy. 2021;288:116648.
- 20. Luna AC, Diaz NL, Graells M, Vasquez JC, Guerrero JM. Mixed-Integer-Linear-Programming-Based Energy Management System for Hybrid PV-Wind-Battery Microgrids: Modeling, Design, and Experimental Verification. IEEE Trans Power Electron. 2017;32(4):2769–83.
- 21. Chen H, Luo H, Huang B, Jiang B, Kaynak O. Data-driven designs of observers and controllers via solving model matching problems. Automatica. 2023;156:111196.
- 22. Shen W, Zeng B, Zeng M. Multi-timescale rolling optimization dispatch method for integrated energy system with hybrid energy storage system. Energy. 2023;283:129006.
- 23. Duan C, Bharati G, Chakraborty P, Chen B, Nishikawa T, Motter AE. Practical Challenges in Real-Time Demand Response. IEEE Trans Smart Grid. 2021;12(5):4573–6.
- 24. Ye M, Tianqing C, Wenhui F. A single-task and multi-decision evolutionary game model based on multi-agent reinforcement learning. J of Syst Eng Electron. 2021;32(3):642–57.
- 25. Zhu Z, Chan KW, Bu S, Or SW, Gao X, Xia S. Analysis of Evolutionary Dynamics for Bidding Strategy Driven by Multi-Agent Reinforcement Learning. IEEE Trans Power Syst. 2021;36(6):5975–8.
- 26. Gholamnia M, Eslamirad N, Sajadi P, Masoumi S, Shahabi H, Pilla F. Dynamic electricity pricing model with hourly and monthly adjustments: A time series-based approach. Energy Reports. 2025;13:5238–51.
- 27. Zhang K, Xie Y, Liu N, Chen S. Customized Mean Field Game Method of Virtual Power Plant for Real-Time Peak Regulation. IEEE Trans Sustain Energy. 2025;16(2):1453–66.
- 28. Zhang H, Liao K, Yang J, Yin Z, He Z. Long-Term and Short-Term Coordinated Scheduling for Wind-PV-Hydro-Storage Hybrid Energy System Based on Deep Reinforcement Learning. IEEE Trans Sustain Energy. 2025;16(3):1697–710.
- 29. Zhang Y, Song Y, Wei G. A feature-enhanced long short-term memory network combined with residual-driven ν support vector regression for financial market prediction. Engineering Applications of Artificial Intelligence. 2023;118:105663.
- 30. Rosewater D, Ferreira S, Schoenwald D, Hawkins J, Santoso S. Battery Energy Storage State-of-Charge Forecasting: Models, Optimization, and Accuracy. IEEE Trans Smart Grid. 2019;10(3):2453–62.