Abstract
Accurate prediction of chiller energy consumption is crucial for reducing building energy consumption. In this study, an innovative dual-branch network architecture, DNTB (a dual-branch network model based on Transformer and Bi-LSTM for energy consumption prediction in building chiller systems), is proposed to address the insufficient long-term dependency modeling and noise sensitivity of current prediction models. The research goal is to develop a prediction model that can simultaneously process temporal features and global dependencies. The basic principle is to exploit the complementary characteristics of the Transformer and Bi-LSTM: the Transformer is sensitive to data noise, while Bi-LSTM is weak at capturing long-term sequence information. Combining them allows the model to better capture the temporal information of chiller energy consumption data and to model the relationships between chiller energy consumption and variables such as chilled water rate, building load, cooling water temperature, humidity, and dew point. To demonstrate the effectiveness and generalization ability of the model, experiments were carried out on long-term and short-term chiller energy consumption prediction tasks. The long-term prediction results had an MSE (mean squared error) of 0.0051, an MAE (mean absolute error) of 0.0605, and an R2 (coefficient of determination) of 0.8031. The short-term prediction results had an MSE of 0.0080, an MAE of 0.0738, and an R2 of 0.6717. The experimental results indicate that DNTB performs excellently in both long-term and short-term chiller energy consumption prediction, making it a robust framework for this task. The introduction of DNTB also enriches the diversity of empirical model algorithms.
Citation: Chen A, Wu M, Chen C, Chen C, Huang Y, Lv X (2025) DNTB: Dual-branch network model based on transformer and Bi-LSTM for energy consumption prediction in building chiller systems. PLoS One 20(10): e0330187. https://doi.org/10.1371/journal.pone.0330187
Editor: Zhengmao Li, Aalto University, FINLAND
Received: February 10, 2025; Accepted: July 18, 2025; Published: October 3, 2025
Copyright: © 2025 Chen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All dataset files are available from https://www.kaggle.com/chillerenergy.
Funding: This research was partially funded by the Xinjiang Uyghur Autonomous Region Graduate Research Innovation Project (Grant No. XJ2025G085) to Wu Mingtao (W MT), and partially funded by the Xinjiang Uyghur Autonomous Region Youth Science and Technology Talent Project (Grant No. 2022TSYCCX0060) to Lv Xiaoyi (L XY). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the paper.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
In recent years, global warming and the greenhouse effect have become significant challenges worldwide [1]. To address the urgent need to combat climate change, governments and international organizations have proposed carbon neutrality goals [2,3].
In the building sector, reducing energy consumption and carbon emissions is critical to achieving carbon neutrality. With the growing demand for air conditioning and cooling systems, the energy consumption of Heating, Ventilation, and Air Conditioning (HVAC) systems, particularly chillers, has become a significant component of total building energy consumption. Chillers consume substantial amounts of energy; therefore, optimizing chiller energy usage is a vital measure to reduce building electricity consumption. Accurate chiller energy consumption predictions can assist building managers in making efficient and informed decisions to reduce energy waste [4]. However, chillers exhibit characteristics such as lag effects, nonlinearity, and strong coupling, which make energy optimization costly. These factors pose significant challenges to effectively predicting and optimizing chiller energy consumption in practical applications. Consequently, establishing an accurate prediction model for chiller energy consumption under various conditions is key to achieving energy savings in buildings.
In the early stages of energy prediction research, traditional statistical and physical models were widely used [5,6]. These studies primarily relied on empirical formulas and energy consumption models. After 2000, with advancements in computational power, machine learning was introduced into various energy prediction fields. For instance, Fang et al. (2016) employed a multivariate regression model to forecast energy consumption in district heating systems [7]. Support Vector Regression (SVR) effectively addressed the nonlinearity in chiller energy prediction by constructing hyperplanes in high-dimensional space to model the relationship between energy consumption and environmental factors. Similarly, Becker et al. (2017) applied K-Nearest Neighbors (KNN) and Random Forest algorithms to predict wind energy consumption in power systems [8]. KNN, a simple and intuitive non-parametric method, predicted future chiller energy consumption based on the similarity of historical data, while Random Forest used tree-structured models to capture complex energy consumption patterns and model nonlinear relationships between features.
Post-2010, with the advent of big data and further improvements in computational power, deep neural network methods became widely applied to energy consumption prediction. Artificial Neural Networks (ANNs), inspired by biological neural networks, are nonlinear statistical methods capable of predicting various forms of building energy consumption, including total energy use, heating and cooling loads, and electricity consumption. Deep learning has made significant progress in the field of time series prediction. Many studies [9–15] have shown that hybrid neural network architectures perform well in complex system modeling, such as stock price prediction (Zhou & Wu, 2024), oil and gas engineering parameter prediction (Alakbari et al., 2021, 2023), and rock mechanics analysis (Alakbari et al., 2024), which provides important theoretical support for the dual-branch prediction model constructed in this study. Two primary categories of ANN methods, backpropagation neural networks and generalized regression neural networks, have also been applied to energy consumption prediction [16,17]. However, although these algorithms laid a foundation for chiller energy consumption prediction, traditional machine learning and deep learning models have limitations, particularly when dealing with complex dynamic systems and large datasets: they often depend heavily on feature engineering, lack sufficient temporal modeling capability, and generalize poorly.
In recent years, artificial intelligence combined with big data analytics has increasingly gained attention in building energy prediction [18–22]. Research in this field spans areas such as chiller energy consumption prediction [23–27], fault detection in building chillers [26], and predictive control optimization [4,28–31]. Traditional machine learning and deep learning approaches no longer suffice for building energy consumption prediction. Long Short-Term Memory (LSTM) networks, designed to address the long-distance dependency problem in traditional Recurrent Neural Networks (RNNs), utilize memory cells and three gating mechanisms (input, forget, and output gates) to selectively retain or discard information [32]. Bi-Directional LSTM (Bi-LSTM), an extension of LSTM, processes sequential data in both the forward and backward directions, capturing contextual information. It is widely applied in sequence prediction and time series analysis tasks [33]. Bi-LSTM can effectively model the complex nonlinear relationships in energy consumption prediction, making it suitable for real-time and non-real-time forecasting. Numerous studies have demonstrated the advantages of LSTM in HVAC energy consumption prediction. For instance, Mtibaa et al. proposed LSTM-MISO and LSTM-MIMO models, which were evaluated using real-world case studies of buildings employing variable air volume (VAV) and constant air volume (CAV) systems; the LSTM models outperformed multilayer perceptron models, reducing prediction errors by 50% [34]. Similarly, Ellis et al. developed an encoder-decoder LSTM-based EMPC framework, which pre-cooled building thermal zones and reduced operational costs compared to maximum temperature maintenance approaches [35]. Transformer models, based on attention mechanisms, represent another milestone in deep learning. Initially introduced by Vaswani et al. in 2017 for natural language processing (NLP) tasks [36], Transformers leverage self-attention mechanisms and encoder-decoder architectures to process sequential data efficiently while addressing the long-distance dependency problems inherent in traditional RNNs. With strong temporal modeling capabilities, parallel computation efficiency, and the ability to capture complex dependencies, Transformers have become powerful tools in energy prediction, especially for tasks requiring multi-feature and long-sequence dependencies in chiller energy consumption prediction. For example, Lim et al. achieved remarkable performance using Transformers for forecasting in the power load, traffic, retail, and stock domains [37]. Similarly, Li et al. proposed a Transformer-based model for building cooling load prediction, improving load forecasting accuracy [38]. Despite their potential, Transformer and Bi-LSTM models have limitations when applied to chiller energy consumption prediction. Bi-LSTM struggles to capture long-term dependencies, such as seasonal factors affecting energy consumption, and may face training difficulties due to gradient vanishing or exploding. Transformers, on the other hand, are sensitive to data quality and noise, which can degrade performance on noisy or anomalous data. The theoretical basis of this study therefore rests on two points. First, the self-attention mechanism of the Transformer can effectively model the global dependencies between variables, but it requires high-quality data.
Second, the bidirectional time series processing capability of Bi-LSTM is good at capturing local temporal patterns, but its ability to model long-term trends is limited.
Given these research advancements and the limitations of existing models, this study develops a dual-branch network based on Transformer and Bi-LSTM (DNTB) for efficient prediction of chiller energy consumption in building energy systems. The main innovations of this study include:
- Proposing a dual-branch structure that leverages the complementary advantages of Transformer and Bi-LSTM in capturing long-term and short-term dependencies in sequential data, enabling accurate chiller energy consumption prediction.
- Introducing a decision-level fusion module that integrates the complementary information from the two branches by averaging their fully connected outputs to derive the final prediction.
- This study proposes a deep learning-based model named DNTB (Dual-branch Network based on Transformer and Bi-LSTM) that achieves superior performance in long-term and short-term chiller energy consumption prediction tasks.
2. Method
The Dual-branch Network based on Transformer and Bi-LSTM (DNTB) is an innovative and efficient framework for predicting chiller energy consumption. It boasts high accuracy and robust generalization capability. The framework is illustrated in Fig 1. First, sensor data collected from the chiller system is input into the framework. The raw chiller data is processed through two branches: a Transformer branch and a Bi-LSTM branch. The Transformer branch extracts important features and sequential information from the raw data and outputs corresponding representations. Simultaneously, the Bi-LSTM branch captures the dynamic temporal features of the chiller’s time series data from both forward and backward directions, producing sequence-aware representations. Finally, these representations, which contain extracted sequential information and high-level features, are input into a decision-level fusion module. After being passed through individual fully connected layers, the outputs from the two branches are averaged to yield the final prediction. The detailed structure of DNTB is explained below.
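As a concrete illustration of this architecture, the following PyTorch sketch wires the two branches and the averaging fusion together. It is a minimal sketch under stated assumptions, not the authors' released code: the layer count, dropout rate, and the 8-variable input are illustrative, while the hidden size and head count follow the long-term settings reported later in Table 2.

```python
import torch
import torch.nn as nn

class DNTB(nn.Module):
    """Sketch of the dual-branch DNTB architecture described above."""

    def __init__(self, n_features=8, d_model=64, n_heads=8, lstm_hidden=64):
        super().__init__()
        # Transformer branch: project inputs to d_model, then encode.
        self.input_proj = nn.Linear(n_features, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=1)
        # Bi-LSTM branch: bidirectional, so its output dim is 2 * lstm_hidden.
        self.bilstm = nn.LSTM(n_features, lstm_hidden,
                              batch_first=True, bidirectional=True)
        # Decision-level fusion: one fully connected head per branch
        # (dropout, linear, GELU, as described in Section 2.3).
        self.head_t = nn.Sequential(nn.Dropout(0.1),
                                    nn.Linear(d_model, 1), nn.GELU())
        self.head_b = nn.Sequential(nn.Dropout(0.1),
                                    nn.Linear(2 * lstm_hidden, 1), nn.GELU())

    def forward(self, x):                   # x: (batch, seq_len, n_features)
        r_t = self.encoder(self.input_proj(x))[:, -1, :]  # last-step state
        r_b, _ = self.bilstm(x)
        r_b = r_b[:, -1, :]
        return 0.5 * (self.head_t(r_t) + self.head_b(r_b))  # averaged decision
```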
2.1. Transformer branch
In the standard Transformer architecture, there are two main components: the encoder and the decoder [36]. The encoder processes input sequences and transforms them into hidden state vectors, extracting features and contextual information to generate high-dimensional representations of the input data. The decoder typically generates output sequences (e.g., translating from one language to another). However, in chiller energy consumption prediction tasks, the goal is not to generate an output sequence but to predict a single value. Thus, the proposed model employs only the encoder component of the Transformer to produce sequence-aware representations.
Initially, sequential input data is mapped to vectors of a specified dimension and fed into the Transformer’s encoder. The encoder comprises four components: a multi-head attention mechanism, a normalization layer, a feedforward neural network layer, and another normalization layer. These components are detailed below:
Multi-head attention mechanism.
This component is composed of multiple attention layers. The attention mechanism (AM), a core concept in the Transformer, mimics how the human brain focuses on specific information. It scans the target region quickly, identifying relevant features. The attention mechanism is based on scaled dot-product attention, which maps a query (Q) and a set of key (K) and value (V) pairs to an output. The data received are either the input or output of the previous encoder, and weight matrices are applied to compute Q, K, and V. The computation is as follows:
$$Q = XW^Q, \qquad K = XW^K, \qquad V = XW^V$$

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $W^Q$, $W^K$, and $W^V$ represent the weight matrices used to calculate the Query, Key, and Value, respectively; $X$ is the input variable; Q, K, and V represent the query, key, and value matrices, respectively; and $\sqrt{d_k}$ is a scaling factor to adjust the attention scores. The multi-head attention mechanism stacks several attention layers, with each layer capturing features from different subspaces. This enables the model to extract more comprehensive features compared to single-head attention.
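To make the formula concrete, here is a minimal sketch of single-head scaled dot-product attention; the toy tensor sizes are assumptions for illustration.

```python
import math
import torch

def scaled_dot_product_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention, as in the formula above."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v               # project input to Q, K, V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # scaled similarity
    return torch.softmax(scores, dim=-1) @ v           # weighted sum of values

x = torch.randn(4, 64)                     # toy sequence: 4 steps, 64 features
w = [torch.randn(64, 64) for _ in range(3)]
out = scaled_dot_product_attention(x, *w)  # shape: (4, 64)
```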
Add & normalize layer.
After extracting temporal features via multi-head attention, the output is added to the sub-layer input through a residual connection and normalized to enhance the model's robustness. The normalization formula is given by:

$$\mathrm{LayerNorm}(x + \mathrm{Sublayer}(x)), \qquad \mathrm{LayerNorm}(x) = \gamma \odot \frac{x - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta$$

where $\mu$ and $\sigma^{2}$ are the mean and variance of the features, $\epsilon$ is a small constant for numerical stability, and $\gamma$ and $\beta$ are learnable scale and shift parameters.
Feedforward Neural Network (FFN).
The FFN consists of two linear transformations with a ReLU activation function, which enhances the model's non-linear fitting capability. The FFN transforms the input vector before passing it to the next module:

$$\mathrm{FFN}(x) = \max\!\left(0,\; xW_1 + b_1\right)W_2 + b_2$$

Here, $W_1$, $W_2$, $b_1$, and $b_2$ represent the weights and biases of the FFN layers.
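For illustration, the FFN can be written directly in PyTorch; the inner dimension of 256 is an assumption, as the paper does not report it.

```python
import torch.nn as nn

# Position-wise FFN: two linear maps around a ReLU, as in the formula above.
ffn = nn.Sequential(
    nn.Linear(64, 256),   # xW1 + b1
    nn.ReLU(),            # max(0, .)
    nn.Linear(256, 64),   # (.)W2 + b2
)
```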
2.2. Bi-LSTM branch
Recurrent Neural Networks (RNNs) are designed specifically for sequence and time-dependent predictions. LSTM (Long Short-Term Memory) networks address the issue of long-term dependency in traditional RNNs [39]. LSTM has shown superior performance in chiller energy consumption prediction compared to traditional models like linear regression and support vector machines. LSTM can capture temporal dependencies, handle nonlinearity, and reduce the interference of noise, making it particularly suitable for short-term energy prediction tasks.
The core components of LSTM include a memory cell and three gating mechanisms: the forget gate, input gate, and output gate. These mechanisms control the selective retention and forgetting of information. The following describes their operations:
Forget Gate: Determines which information to discard from the cell state:

$$f_t = \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

Input Gate: Decides which information to update in the cell state:

$$i_t = \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\!\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$

Output Gate: Determines the output based on the updated cell state:

$$o_t = \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$

State Updates: The cell state and hidden state are updated as follows:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \qquad h_t = o_t \odot \tanh(C_t)$$

Among them, $\sigma(\cdot)$ is the sigmoid function; $W_f$, $W_i$, and $W_o$ are the weight matrices used to update the states of the forget gate, input gate, and output gate; $b_f$, $b_i$, and $b_o$ are the corresponding bias vectors; $h_{t-1}$ is the activation value at time step $t-1$; $x_t$ is the input at time step $t$; $\tilde{C}_t$ is the memory candidate at time step $t$; $\tanh(\cdot)$ is the activation function; $W_C$ is the weight matrix used to calculate the memory candidate; and $b_C$ is the bias vector used to update the memory candidate. $h_t$ is the activation value at time step $t$. For Bi-LSTM, the forward and backward hidden states are concatenated at each time step to form the final output. In our model, the raw chiller data is fed into the Bi-LSTM branch, which analyzes the time series of the chiller data and outputs a sequence-aware representation.
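A minimal sketch of the Bi-LSTM branch follows; the hidden size of 64 and sequence length of 4 match the long-term settings in Table 2, while the 8-variable input and batch size are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Bi-LSTM branch sketch: forward and backward hidden states are
# concatenated at each step, so the output feature size is 2 * hidden_size.
bilstm = nn.LSTM(input_size=8, hidden_size=64,
                 batch_first=True, bidirectional=True)

x = torch.randn(32, 4, 8)            # (batch, sequence length 4, 8 features)
outputs, (h_n, c_n) = bilstm(x)
representation = outputs[:, -1, :]   # sequence-aware representation, dim 128
```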
2.3. Decision level fusion
After processing by the Transformer branch and the Bi-LSTM branch, two feature representations with sequence information are obtained; denote them $R_T$ and $R_B$. Although both contain sequence feature information, they are obtained using different sequence feature extraction methods, so there is heterogeneity between them. How to fuse the output representations of the two branches is therefore particularly critical. In this study, we use decision-level fusion. Common decision fusion methods include averaging [40], majority voting [41], and weighted or learnable models [42]. This study uses average decision-level fusion, which effectively retains the useful information in each branch and reduces the risk of overfitting. In this module, the two representations $R_T$ and $R_B$ are passed through their respective fully connected layers to obtain two regression decisions, and the two regression decisions are then averaged at the decision level to obtain the final prediction. The internal structure of decision-level fusion is introduced in detail below.

First, the two input representations $R_T$ and $R_B$ are subjected to dropout regularization, which does not change the shape of the tensor. To achieve a nonlinear transformation, a fully connected layer with GELU as the activation function is used, where $\mathrm{GELU}(x) = x\,\Phi(x)$ and $\Phi(\cdot)$ is the standard normal cumulative distribution function. This outputs two regression decisions $d_T$ and $d_B$, where $d_T$ is the regression decision of the Transformer branch and $d_B$ is the regression decision of the Bi-LSTM branch. Finally, the two regression decisions are averaged and fused to obtain the prediction $\hat{y}$. The relevant operations are as follows:

$$d_T = \mathrm{GELU}\!\left(\mathrm{Dropout}(R_T)\,W_T + b_T\right)$$

$$d_B = \mathrm{GELU}\!\left(\mathrm{Dropout}(R_B)\,W_B + b_B\right)$$

$$\hat{y} = \frac{1}{2}\left(d_T + d_B\right)$$

Among them, $W_T$ and $W_B$ are the weight matrices of the linear layers, $b_T$ and $b_B$ are the bias vectors of the linear layers, and $\hat{y}$ is the final prediction result.
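A sketch of the fusion module under these definitions follows; the input dimensions and dropout rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DecisionLevelFusion(nn.Module):
    """Sketch of the averaging fusion: dropout, then a linear layer with
    GELU per branch, then the mean of the two regression decisions."""

    def __init__(self, dim_t=64, dim_b=128, p=0.1):
        super().__init__()
        self.drop = nn.Dropout(p)         # shape-preserving regularization
        self.fc_t = nn.Linear(dim_t, 1)   # Transformer-branch decision
        self.fc_b = nn.Linear(dim_b, 1)   # Bi-LSTM-branch decision
        self.act = nn.GELU()

    def forward(self, r_t, r_b):
        d_t = self.act(self.fc_t(self.drop(r_t)))
        d_b = self.act(self.fc_b(self.drop(r_b)))
        return 0.5 * (d_t + d_b)          # averaged final prediction
```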
2.4. Training setup
Hyperparameter tuning of the model is crucial to achieving the best prediction performance. The specific Python library versions used are shown in Table 1. Through extensive experimental tests and comparisons, we obtained the parameters and hyperparameters of the DNTB model shown in Table 2. In all experiments, the Adam optimizer was used, and the learning rate was uniformly set to 0.001. Some parameters differ slightly between long-term and short-term prediction. In the long-term prediction of chiller energy consumption, the number of epochs is set to 100, the sequence length to 4, and the hidden size to 64; in short-term prediction, the number of epochs is set to 150, the sequence length to 1, and the hidden size to 32. Appropriate parameters allow the model to fit the energy consumption curve trend better and faster. The number of attention heads is set to 8 for the long-term task and 4 for the short-term task: the long-term task requires more heads to model global dependencies, while the short-term task reduces redundant computation with fewer heads. The Bi-LSTM hidden size is set to 64 for the long-term task and 32 for the short-term task: the long-term task requires a larger capacity to store bidirectional time series information, while the short-term task avoids overfitting by compressing the dimension. During the training process, we use the mean squared error as the loss function to train the model:

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
The model was implemented using PyTorch, a mature Python neural network framework, and trained on an NVIDIA A40 GPU under Windows to realize the dual-branch network based on Transformer and Bi-LSTM for efficient prediction of chiller energy consumption. The training and testing loss curves for chiller energy consumption prediction converge as shown in Figs 2 and 3, indicating that with these settings the DNTB model fits the chiller energy consumption data well.
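A minimal training-loop sketch using the reported settings (Adam, learning rate 0.001, MSE loss, 100 epochs for the long-term task) is given below; `model` and `train_loader` are assumptions, e.g., the DNTB sketch above and a DataLoader of sliding windows.

```python
import torch
import torch.nn as nn

# Assumed to exist: `model` (e.g., DNTB()) and `train_loader`
# (a DataLoader yielding (window, target) pairs).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

for epoch in range(100):                 # 100 epochs for the long-term task
    for x_batch, y_batch in train_loader:
        x_batch, y_batch = x_batch.to(device), y_batch.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x_batch), y_batch)
        loss.backward()                  # backpropagate the MSE loss
        optimizer.step()
```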
2.5. Baseline
To validate the effectiveness and generalization capability of the proposed DNTB model, this study selected several state-of-the-art baseline models from the field of deep learning, particularly those widely used in time-series data analysis. The selected baseline models include Bi-LSTM, Transformer, AlexNet, ResNet, VGG-19, TCN, and Informer. A brief description of each baseline model is provided below:
Bi-LSTM [39]: Bi-LSTM (Bidirectional Long Short-Term Memory) extends the LSTM architecture introduced by Hochreiter and Schmidhuber in 1997 by processing sequences in both the forward and backward directions, and has undergone significant advancements in subsequent research. It is a powerful sequential modeling tool, performing exceptionally well on time-series data and natural language processing tasks.
AlexNet [43]: AlexNet, introduced by Alex Krizhevsky et al. in 2012, marked a major breakthrough in deep learning for computer vision. It was the first convolutional neural network (CNN) to achieve significant performance improvements on the ImageNet dataset. AlexNet consists of multiple convolutional layers, pooling layers, and fully connected layers. It employs Dropout and ReLU activation functions, laying the foundation for modern CNN-based image classification models.
VGG-19 [44]: VGG-19, proposed by Karen Simonyan and Andrew Zisserman in 2014, is a deep convolutional neural network known for its simplicity and effectiveness. It demonstrated remarkable performance on the ImageNet dataset by utilizing small convolutional kernels and a consistent network architecture.
ResNet [45]: ResNet (Residual Network), introduced by Kaiming He et al. in 2015, achieved groundbreaking performance on the ImageNet dataset. The core innovation of ResNet is its residual connections, which allow the network to learn residual functions instead of direct mappings. This design addresses the gradient vanishing and exploding problems often encountered in deep networks, enabling the training of very deep models.
Transformer [36]: Transformer, proposed by Vaswani et al. in 2017, leverages self-attention mechanisms to capture long-range dependencies in sequential data. The multi-head attention mechanism in Transformer allows the model to learn multiple representations of the sequence simultaneously, making it highly effective for tasks involving complex sequential relationships. Transformers have shown significant success in NLP tasks such as machine translation, text generation, and question answering.
TCN [46]: TCN (Temporal Convolutional Network) is a time series modeling method based on convolutional neural networks. It captures long-term dependencies through dilated causal convolution and residual connections. Unlike traditional recurrent neural networks (RNNs), TCN uses a convolutional structure to achieve parallel computing, avoiding the time overhead of recursive computing. At the same time, it expands the receptive field in the time dimension through dilated convolution, which can effectively model long-distance temporal dependencies. Its causality ensures that the prediction depends only on historical data. It is suitable for single-step and multi-step time series prediction tasks and performs well in energy consumption prediction, speech recognition and other fields.
Informer [47]: Informer is an improved Transformer-based time series prediction model. To address the high computational complexity and memory consumption of the standard Transformer on long sequences, it introduces ProbSparse self-attention and a distilling operation. ProbSparse attention reduces computation by screening key time points, while the distilling operation shortens the sequence layer by layer, further improving the efficiency of long-sequence modeling. While retaining the Transformer's global dependency modeling capability, Informer significantly improves the practicality of long time series prediction and is widely used in scenarios requiring ultra-long input sequences, such as power load forecasting and meteorological data modeling.
The performance of these baseline models is compared to the DNTB model in both long-term and short-term chiller energy consumption prediction tasks, as discussed in subsequent sections.
3. Data analysis
The dataset used in this study was obtained from Kaggle and consists of chiller energy consumption data from commercial buildings in Singapore; it is provided as an attachment to this article. The time span of the dataset ranges from August 18, 2019, 00:00, to June 1, 2020, 13:00 [48]. Data were collected at 30-minute intervals, yielding a total of 13,615 records. We then preprocessed the data, removing outliers and interpolating missing values with the mean. Outlier screening used a sliding-window interquartile range (IQR) method to eliminate abnormal data, leaving 13,610 records as experimental samples. This robust dataset provided a solid foundation for evaluating the proposed DNTB model, ensuring accuracy and reliability in chiller energy consumption prediction.
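A sketch of this preprocessing is shown below; the file name, column names, and the one-day (48-record) window for the sliding IQR fence are assumptions, as the source does not specify them.

```python
import pandas as pd

# Mean imputation of missing values, then a sliding-window IQR filter.
df = pd.read_csv("chiller_energy.csv", parse_dates=["Timestamp"])
df["Chiller Consumption"] = df["Chiller Consumption"].fillna(
    df["Chiller Consumption"].mean())

roll = df["Chiller Consumption"].rolling(window=48, center=True, min_periods=1)
q1, q3 = roll.quantile(0.25), roll.quantile(0.75)
iqr = q3 - q1
mask = df["Chiller Consumption"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[mask]                 # drop records outside the local IQR fence
```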
Subsequently, a Spearman correlation analysis was conducted to examine the relationships between the dataset's variables and chiller energy consumption (Chiller Consumption, measured in kWh). The variables analyzed included Chilled Water Rate (L/sec), Cooling Water Temperature (°C), Building Load (RT), Outside Temperature (°F), Dew Point (°F), Humidity (%), Wind Speed (mph), and Pressure (in). The Spearman correlation coefficient is a statistical measure that evaluates the monotonic (not necessarily linear) relationship between two variables. Its values range from −1 to 1, where −1 indicates a perfect negative correlation, 1 indicates a perfect positive correlation, and 0 suggests no correlation [49]. In this study, we divide correlation strength into three levels: an absolute Spearman coefficient below 0.45 indicates a weak correlation; a value of at least 0.45 but below 0.5 indicates a strong correlation; and a value of 0.5 or above indicates a very strong correlation. The results are shown in Table 3. Five variables have a very strong correlation with Chiller Consumption: Chilled Water Rate, Cooling Water Temperature, Building Load, Outside Temperature, and Humidity. Among them, Building Load has the strongest correlation with Chiller Consumption, with a Spearman value of 0.903364343, indicating that chiller energy consumption rises as the building load increases. It is worth noting that the correlation coefficient between Humidity and Chiller Consumption is −0.589924807, indicating that chiller energy consumption falls as humidity increases. These correlation results provide an empirical basis for using these variables to predict chiller energy consumption.
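These correlations can be reproduced with pandas; the column names below follow the variable list above but are assumptions about the raw file's headers, and `df` is the cleaned frame from the preprocessing sketch.

```python
# Spearman correlation between each variable and chiller energy use.
features = ["Chilled Water Rate (L/sec)", "Cooling Water Temperature (°C)",
            "Building Load (RT)", "Outside Temperature (°F)", "Dew Point (°F)",
            "Humidity (%)", "Wind Speed (mph)", "Pressure (in)"]
corr = df[features + ["Chiller Consumption"]].corr(method="spearman")
print(corr["Chiller Consumption"].drop("Chiller Consumption").sort_values())
```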
In this section, we introduce the partitioning of the chiller dataset. The prediction tasks are long-term prediction and short-term prediction, and the dataset is partitioned differently depending on the task. For long-term prediction, to respect the time-series nature of the data, the dataset is partitioned in chronological order so that the training subset contains the earlier period and the test set contains the later period. This partitioning strategy enables the model to predict future results from historical patterns. The dataset is divided into a 70% training subset and a 30% test subset, containing 9530 training samples and 4080 test samples, respectively. For short-term prediction, the dataset is partitioned in chronological order using a five-fold cross-validation scheme, with an 80% training subset and a 20% test subset containing 10890 training instances and 2720 test instances, respectively. Five-fold cross-validation helps reduce overfitting and bias and improves the generalization ability of the model [50]. Fig 4 shows the chiller energy consumption forecast curve for the test set from March 7 to March 18, 2020 in the long-term task. Repeated troughs and distinct peaks appear in the curve, reflecting the fluctuation of chiller energy consumption. Troughs in the energy consumption data have many causes, including low building load (RT), changes in outside temperature, or periods of reduced building occupancy, all of which reduce chiller energy use. Conversely, high peaks correspond to conditions such as high building loads during the morning working peak and increased building occupancy. These fluctuations are also affected by other factors such as humidity levels and chilled water flow, producing the overall variation in the data.
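A sketch of the two partitioning schemes follows; scikit-learn's TimeSeriesSplit is one chronological five-fold scheme and is an assumption, since the paper does not name its implementation.

```python
from sklearn.model_selection import TimeSeriesSplit

n = len(df)   # 13,610 records after cleaning

# Long-term task: chronological 70/30 hold-out split
# (the paper reports 9530 training and 4080 test samples).
train_long, test_long = df.iloc[:int(0.7 * n)], df.iloc[int(0.7 * n):]

# Short-term task: chronological five-fold cross-validation. Each
# validation fold is later in time than its training data; the reported
# 80/20 split is one way to size the folds.
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(df):
    train_fold, test_fold = df.iloc[train_idx], df.iloc[test_idx]
```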
4. Experiments and discussion of results
4.1. Model evaluation indexes
To quantitatively evaluate the proposed model for chiller energy consumption prediction, selecting appropriate evaluation metrics is crucial. This study selected several important performance evaluation metrics, including Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination R2. MAE is the average of the absolute errors between predicted and actual values, providing an intuitive measure of the model’s performance relative to the observed values. RMSE is the square root of the average of the squared errors between predicted and actual values. R2 represents the proportion of the variance in the dependent variable that is predictable from the independent variables, measuring the model’s goodness of fit. In building energy consumption prediction, these metrics together provide a comprehensive evaluation framework, helping us understand the model’s performance [51]. MAE and RMSE provide direct measures of prediction error, while R2 provides a measure of the model’s goodness of fit. By considering these metrics together, we can more comprehensively evaluate the model’s performance and select the most suitable model for building energy consumption prediction. The formulas for these metrics are as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$R^2 = \frac{\mathrm{COV}\!\left(y, \hat{y}\right)^2}{\mathrm{VAR}(y)\,\mathrm{VAR}(\hat{y})}$$

where $y_i$ is the actual value of the $i$-th sample, $\hat{y}_i$ is the corresponding predicted value, $n$ is the total number of samples, $\mathrm{COV}(\cdot)$ is the covariance function, and $\mathrm{VAR}(\cdot)$ is the variance function.
These three evaluation metrics (MAE, RMSE, and R2) provide complementary insights into the model's accuracy, error sensitivity, and fit. By considering all of these metrics together, we can more effectively evaluate the performance of the DNTB model in predicting chiller energy consumption.
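As a sketch, these metrics can be computed with NumPy; the R2 here follows the covariance/variance form given above.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """MAE, RMSE, and R^2 as defined above (sketch)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    # R^2 via covariance and variance, matching the COV()/VAR() form.
    r2 = np.cov(y_true, y_pred)[0, 1] ** 2 / (np.var(y_true, ddof=1)
                                              * np.var(y_pred, ddof=1))
    return mae, rmse, r2
```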
4.2. Model comparative analysis
To demonstrate the effectiveness and generalization capability of the proposed DNTB model, this study conducted extensive comparative experiments using seven different baseline models for both long-term and short-term chiller energy consumption prediction tasks. The experimental results are presented in Tables 4 and 5, as well as Figs 5 and 6.
In the long-term prediction task, among the seven baseline models, Bi-LSTM achieved the best performance, with a Mean Squared Error (MSE) of 0.0105, Mean Absolute Error (MAE) of 0.097, and an R2 of 0.5438. Bi-LSTM effectively handled sequential information and fit some of the chiller energy consumption data well. However, Bi-LSTM was prone to overfitting, causing the model’s performance to degrade after reaching a certain number of epochs. The worst-performing model was VGG-19, which, despite having a parameter size of 45.54MB, failed to deliver proportional improvements in performance. This suggests that overly complex models may not necessarily yield better results. Instead, they often require larger datasets to train effectively. The poor performance of VGG-19 can be attributed to its inability to capture temporal information and its inadequate fit to the chiller energy consumption data.
In the short-term prediction task, AlexNet achieved the best performance among the baseline models, with an MSE of 0.0124, MAE of 0.0897, and an R2 of 0.5066. The convolutional layers in AlexNet may have contributed to its ability to extract temporal features from the short-term data, positively influencing the prediction results. Similarly, VGG-19 performed the worst in this task as well, for the same reasons: its excessive complexity and inability to extract temporal features effectively. The proposed DNTB model outperformed all baseline models in both long-term and short-term prediction tasks. For long-term predictions, the DNTB model achieved an MSE of 0.0051, MAE of 0.0605, and an R2 of 0.8031. For short-term predictions, it achieved an MSE of 0.0084, MAE of 0.0747, and an R2 of 0.6538. The DNTB model demonstrated the ability to capture the temporal information of chiller energy consumption data while avoiding overfitting or underfitting, as evident from Fig 2. The Bi-LSTM branch in DNTB effectively processed sequential information and extracted relevant features, while the Transformer branch leveraged attention mechanisms to focus on features critical to the prediction task. The decision-level fusion mechanism successfully integrated the outputs of both branches, resulting in highly accurate predictions.
4.3. Ablation study
To verify the effectiveness of each module in the model, we ablated each component on the short-term prediction task, including the Transformer branch, the Bi-LSTM branch, and the decision-level fusion module. In the experiment removing the Transformer branch, the input dimension of the decision-level fusion module was reduced to half of the original, with the other components and parameters unchanged. Similarly, in the experiment removing the Bi-LSTM branch, the input dimension of the decision-level fusion module was also halved, with the other components and parameters unchanged. In the experiment removing the decision-level fusion module, we replaced it with a fully connected layer (FC) and kept the other components and parameters unchanged. The results are shown in Table 6 and support the following conclusions:
Effectiveness of the Transformer Branch: The Transformer branch uses an attention mechanism to extract spatial and structural information from the input energy consumption data, isolating key features while reducing noise. This positively impacts prediction results.
Importance of the Bi-LSTM Branch: The Bi-LSTM branch is indispensable in DNTB. It captures the temporal dependencies in chiller energy consumption data, which is critical for improving model performance.
Superiority of the Decision-Level Fusion Module: The Decision-Level Fusion module effectively integrates features extracted by the two branches. It combines their complementary strengths to output accurate energy consumption predictions.
5. Conclusion
As global warming intensifies and environmental issues become increasingly critical, energy prediction and consumption reduction are gaining attention from nations and organizations worldwide. The proposed chiller energy consumption prediction model, DNTB, has potential applications in building energy management systems and energy-saving engineering projects, bridging the gap between predictive models and real-world engineering applications. This study addresses the twin problems of long-term dependency modeling and noise sensitivity through the Transformer-BiLSTM collaborative architecture, and its decision-level fusion mechanism provides a new paradigm for time series prediction. DNTB effectively models the relationship between the input variables and chiller energy consumption and shows advantages in both long-term and short-term prediction tasks. The MSE, MAE, and R2 for the long-term prediction task are 0.0051, 0.0605, and 0.8031, and for the short-term task 0.0084, 0.0747, and 0.6538, which are better than the other prediction models. The proposed decision-level fusion module better integrates the feature data of the two branches and yields accurate energy consumption predictions.
The proposed DNTB model enhances the empirical model database for chiller energy consumption prediction and demonstrates strong predictive capabilities. Subsequent work will develop a quantized version for edge devices and integrate it with a model predictive control (MPC) system to achieve closed-loop optimization. However, this study has certain limitations. The current model is trained on data from a tropical climate and may suffer performance degradation and insufficient coverage of extreme operating conditions in temperate or cold regions. In future work, datasets from more regions will be tested, and Generative Adversarial Networks (GANs) will be introduced to synthesize extreme-condition data to improve this study.
References
- 1. Mikhaylov A, Moiseev N, Aleshin K, Burkhardt T. Global climate change and greenhouse effect. Entrep Sustain Issues. 2020;7(4):2897.
- 2. Huang R, Zhang S, Wang P. Key areas and pathways for carbon emissions reduction in Beijing for the “dual carbon” targets. Energy Policy. 2022;164:112873.
- 3. Hou R, Liu B, Sun Y, Liu L, Meng J, Levi MD, et al. Recent advances in dual-carbon based electrochemical energy storage devices. Nano Energy. 2020;72:104728.
- 4. Jia L, Wei S, Liu J. A review of optimization approaches for controlling water-cooled central cooling systems. Build Environ. 2021;203:108100.
- 5. Letherman K, Al-Azawi M. Predictions of the heating and cooling energy requirements in buildings using the degree hours method. Build Environ. 1986;21(3–4):171–6.
- 6. Mathews E, Van Heerden E, Arndt D. A tool for integrated HVAC, building, energy and control analysis part 1: overview of QUICKcontrol. Build Environ. 1999;34(4):429–49.
- 7. Fang T, Lahdelma R. Evaluation of a multiple linear regression model and SARIMA model in forecasting heat demand for district heating system. Appl Energy. 2016;179:544–52.
- 8. Becker R, Thrän D. Completion of wind turbine data sets for wind integration studies applying random forests and k-nearest neighbors. Appl Energy. 2017;208:252–62.
- 9. Zhou Z, Wu R. Stock price prediction model based on convolutional neural networks. J Ind Eng Appl Sci. 2024;2(4):1–7.
- 10. Alakbari FS, Mohyaldinn M, Ayoub M, Muhsan AS, Salih AA, Abbas AH, et al. Prediction of critical total drawdown in sand production from gas wells: Machine learning approach. Canadian J Chem Eng. 2023;101(5):2493–509.
- 11. Alakbari FS, Mohyaldinn ME, Ayoub MA, Hussein IA, Muhsan AS, Ridha S, et al. A gated recurrent unit model to predict Poisson’s ratio using deep learning. J Rock Mech Geotech Eng. 2024;16(1):123–35.
- 12. Wu R, Zhang T, Xu F. Cross-market arbitrage strategies based on deep learning. Acad J Sociol Manag. 2024;2(4):20–6.
- 13. Hassan AM, Ayoub MA, Mohyadinn ME, Al-Shalabi EW, Alakbari FS. A new insight into smart water assisted foam SWAF technology in carbonate rocks using artificial neural networks ANNs. In: Offshore Technology Conference Asia. OTC; 2022.
- 14. Li X, Hu C, Luo S, Lu H, Piao Z, Jing L. Distributed hybrid-triggered observer-based secondary control of multi-bus DC microgrids over directed networks. IEEE Trans Circuits Syst I: Regular Papers. 2025.
- 15. Hu Z, Su R, Veerasamy V, Huang L, Ma R. Resilient frequency regulation for microgrids under phasor measurement unit faults and communication intermittency. IEEE Trans Ind Info. 2024.
- 16. Ben-Nakhi AE, Mahmoud MA. Cooling load prediction for buildings using general regression neural networks. Energy Convers Manag. 2004;45(13–14):2127–41.
- 17. Ekici BB, Aksoy UT. Prediction of building energy consumption by using artificial neural networks. Adv Eng Softw. 2009;40(5):356–62.
- 18. Wang Z, Srinivasan RS. A review of artificial intelligence based building energy prediction with a focus on ensemble prediction models. In: 2015 Winter simulation conference (WSC). IEEE; 2015.
- 19. Ngarambe J, Yun GY, Santamouris M. The use of artificial intelligence (AI) methods in the prediction of thermal comfort in buildings: Energy implications of AI-based thermal comfort controls. Energ Build. 2020;211:109807.
- 20. Lu C, Li S, Lu Z. Building energy prediction using artificial neural networks: A literature survey. Energ Build. 2022;262:111718.
- 21. Mohandes SR, Zhang X, Mahdiyar A. A comprehensive review on the application of artificial neural networks in building energy analysis. Neurocomputing. 2019;340:55–75.
- 22. Bi J, Wang H, Yan E, Wang C, Yan K, Jiang L, et al. AI in HVAC fault detection and diagnosis: A systematic review. Energy Rev. 2024:100071.
- 23. Zubair S, Hamayat F, Nazir A. AI-powered Thermal Comfort: Predicting Indoor Air Temperature for Efficient HVAC Systems. In: 2024 29th International Conference on Automation and Computing (ICAC). IEEE; 2024.
- 24. Sharma V, Mistry V. HVAC load prediction and energy saving strategies in building automation. European J Adv Eng Technol. 2022;9(3).
- 25. Liu H, Liu Y, Guo X, Wu H, Wang H, Liu Y. An energy consumption prediction method for HVAC systems using energy storage based on time series shifting and deep learning. Energ Build. 2023;298:113508.
- 26. Zhuang D, Gan VJL, Tekler ZD, Chang A, Tian S, Shi X. Data-driven predictive control for smart HVAC system in IoT-integrated buildings with time-series forecasting and reinforcement learning. Appl Energy. 2023;338:120936.
- 27. Taheri S, Ahmadi A, Mohammadi-Ivatloo B, Asadi S. Fault detection diagnostic for HVAC systems via deep learning algorithms. Energ Build. 2021;250:111275.
- 28. Wang Z, Calautit J, Wei S, Tien PW, Xia L. Real-time building heat gains prediction and optimization of HVAC setpoint: An integrated framework. J Build Eng. 2022;49:104103.
- 29. He K, Fu Q, Lu Y, Wang Y, Luo J, Wu H, et al. Predictive control optimization of chiller plants based on deep reinforcement learning. J Build Eng. 2023;76:107158.
- 30. Rosdahl C, Bernhardsson B, Eisenhower B. Model-free MIMO control tuning of a chiller process using reinforcement learning. Sci Technol Built Environ. 2023;29(8):782–94.
- 31. Fang X, Gong G, Li G, Chun L, Peng P, Li W, et al. Cross temporal-spatial transferability investigation of deep reinforcement learning control strategy in the building HVAC system level. Energy. 2023;263:125679.
- 32. Graves A. Long short-term memory. In: Supervised sequence labelling with recurrent neural networks. Springer; 2012. p. 37–45.
- 33. Suebsombut P, Sekhari A, Sureephong P, Belhi A, Bouras A. Field data forecasting using LSTM and Bi-LSTM approaches. Appl Sci. 2021;11(24):11820.
- 34. Mtibaa F, Nguyen K-K, Azam M, Papachristou A, Venne J-S, Cheriet M. LSTM-based indoor air temperature prediction framework for HVAC systems in smart buildings. Neural Comput Appl. 2020;32:17569–85.
- 35. Ellis MJ, Chinde V. An encoder–decoder LSTM-based EMPC framework applied to a building HVAC system. Chem Eng Res Design. 2020;160:508–20.
- 36. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: Advances in Neural Information Processing Systems, 2017.
- 37. Lim B, Arik SO, Loeff N, Pfister T. Temporal fusion transformers for interpretable multi-horizon time series forecasting. Int J Forecast. 2021;37(4):1748–64.
- 38. Li L, Su X, Bi X, Lu Y, Sun X. A novel transformer-based network forecasting method for building cooling loads. Energy Build. 2023;296:113409.
- 39. Graves A, Schmidhuber J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005;18(5–6):602–10. pmid:16112549
- 40. Shutova E, Kiela D, Maillard J. Black holes and white rabbits: Metaphor identification with visual features. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: Human language technologies, 2016.
- 41. Morvant E, Habrard A, Ayache S. Majority vote of diverse classifiers for late fusion. In: Structural, Syntactic, and Statistical Pattern Recognition: Joint IAPR International Workshop, S+ SSPR 2014, Joensuu, Finland, August 20-22, 2014. Proceedings. Springer; 2014.
- 42. Evangelopoulos G, Zlatintsi A, Potamianos A, Maragos P, Rapantzikos K, Skoumas G. Multimodal saliency and fusion for movie summarization based on aural, visual, and textual attention. IEEE Trans Multimedia. 2013;15(7):1553–68.
- 43. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Adv Neural Information Process Syst. 2012;25.
- 44. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 2014.
- 45. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
- 46. Bai S, Kolter JZ, Koltun V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271. 2018.
- 47. Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, et al. Informer: beyond efficient transformer for long sequence time-series forecasting. In: Proceedings of the AAAI conference on artificial intelligence, 2021.
- 48. Chiller energy data: HVAC system energy data with weather data [Online]. 2023. Available from: https://www.kaggle.com/chillerenergy
- 49. Ashraf A. Correlation in machine learning—All you need to know. 2023. Available from: https://medium.com/@abdallahashraf90x/all-you-need-to-know-about-correlation-for-machine-learning-e249fec292e9
- 50. Cross-validation: evaluating estimator performance [Online]. 2023. Available from: https://scikit-learn.org/stable/modules/cross_validation.html
- 51. Dong F, Yu J, Quan W, Xiang Y, Li X, Sun F. Short-term building cooling load prediction model based on DwdAdam-ILSTM algorithm: A case study of a commercial building. Energ Build. 2022;272:112337.