Abstract
Accurate prediction of chiller energy consumption is crucial for reducing building energy consumption. In this study, an innovative dual-branch network architecture, DNTB (a dual-branch network model based on Transformer and Bi-LSTM for energy consumption prediction in building chiller systems), is proposed to address the insufficient long-term dependency modeling and noise sensitivity of current prediction models. The research goal is to develop a prediction model that can simultaneously process temporal features and global dependencies. The basic principle is to exploit the complementary characteristics of the Transformer and Bi-LSTM: the Transformer is sensitive to data noise, while Bi-LSTM is weak at capturing long-term sequence information. Combining them allows the model to better capture the temporal information of chiller energy consumption data and to model the relationships between chiller energy consumption and variables such as chilled water rate, building load, cooling water temperature, humidity, and dew point. To demonstrate the effectiveness and generalization ability of the model, experiments were carried out on long-term and short-term chiller energy consumption prediction tasks. The long-term prediction results had an MSE (mean squared error) of 0.0051, an MAE (mean absolute error) of 0.0605, and an R2 (coefficient of determination) of 0.8031. The short-term prediction results had an MSE of 0.0080, an MAE of 0.0738, and an R2 of 0.6717. The experimental results indicate that DNTB performs excellently in both long-term and short-term chiller energy consumption prediction, making it a robust framework for this task. The introduction of DNTB also enriches the diversity of empirical model algorithms.
Citation: Chen A, Wu M, Chen C, Chen C, Huang Y, Lv X (2025) DNTB: Dual-branch network model based on transformer and Bi-LSTM for energy consumption prediction in building chiller systems. PLoS One 20(10): e0330187. https://doi.org/10.1371/journal.pone.0330187
Editor: Zhengmao Li, Aalto University, FINLAND
Received: February 10, 2025; Accepted: July 18, 2025; Published: October 3, 2025
Copyright: © 2025 Chen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All dataset files are available from https://www.kaggle.com/chillerenergy.
Funding: This research was partially funded by the Xinjiang Uyghur Autonomous Region Graduate Research Innovation Project (Grant No. XJ2025G085) to Wu Mingtao (W MT), and partially funded by the Xinjiang Uyghur Autonomous Region Youth Science and Technology Talent Project (Grant No. 2022TSYCCX0060) to Lv Xiaoyi (L XY). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the paper.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
In recent years, global warming and the greenhouse effect have become significant challenges worldwide [1]. To address the urgent need to combat climate change, governments and international organizations have proposed carbon neutrality goals [2,3].
In the building sector, reducing energy consumption and carbon emissions is critical to achieving carbon neutrality. With the growing demand for air conditioning and cooling systems, the energy consumption of Heating, Ventilation, and Air Conditioning (HVAC) systems, particularly chillers, has become a significant component of total building energy consumption. Chillers consume substantial amounts of energy; therefore, optimizing chiller energy usage is a vital measure to reduce building electricity consumption. Accurate chiller energy consumption predictions can assist building managers in making efficient and informed decisions to reduce energy waste [4]. However, chillers exhibit characteristics such as lag effects, nonlinearity, and strong coupling, which make energy optimization costly. These factors pose significant challenges to effectively predicting and optimizing chiller energy consumption in practical applications. Consequently, establishing an accurate prediction model for chiller energy consumption under various conditions is key to achieving energy savings in buildings.
In the early stages of energy prediction research, traditional statistical and physical models were widely used [5,6]. These studies primarily relied on empirical formulas and energy consumption models. After 2000, with advancements in computational power, machine learning was introduced into various energy prediction fields. For instance, Fang et al. (2016) employed a multivariate regression model to forecast energy consumption in district heating systems [7]. Support Vector Regression (SVR) effectively addressed the nonlinearity in chiller energy prediction by constructing hyperplanes in high-dimensional space to model the relationship between energy consumption and environmental factors. Similarly, Becker et al. (2017) applied K-Nearest Neighbors (KNN) and Random Forest algorithms to predict wind energy consumption in power systems [8]. KNN, a simple and intuitive non-parametric method, predicted future chiller energy consumption based on the similarity of historical data, while Random Forest used tree-structured models to capture complex energy consumption patterns and model nonlinear relationships between features.
Post-2010, with the advent of big data and further improvements in computational power, deep neural network methods became widely applied to energy consumption prediction. Artificial Neural Networks (ANNs), inspired by biological neural networks, are nonlinear statistical methods capable of predicting various forms of building energy consumption, including total energy use, heating and cooling loads, and electricity consumption. Deep learning has made significant progress in the field of time series prediction. Many studies [9–15] have shown that hybrid neural network architectures perform well in complex system modeling, such as stock price prediction (Zhou & Wu, 2024), oil and gas engineering parameter prediction (Alakbari et al., 2021, 2023), and rock mechanics analysis (Alakbari et al., 2024), which provides important theoretical support for the dual-branch prediction model constructed in this study. Two primary categories of ANN methods, backpropagation neural networks and generalized regression neural networks, have also been applied to energy consumption prediction [16,17]. However, although these algorithms laid a foundation for chiller energy consumption prediction, traditional machine learning and deep learning models have limitations, particularly when dealing with complex dynamic systems and large datasets: they often depend heavily on feature engineering, lack sufficient temporal modeling capability, and generalize poorly.
In recent years, artificial intelligence combined with big data analytics has increasingly gained attention in building energy prediction [18–22]. Research in this field spans areas such as chiller energy consumption prediction [23–27], fault detection in building chillers [26], and predictive control optimization [4,28–31]. Traditional machine learning and deep learning approaches no longer suffice for building energy consumption prediction. Long Short-Term Memory (LSTM) networks, designed to address the long-distance dependency problem in traditional Recurrent Neural Networks (RNNs), utilize memory cells and three gating mechanisms (input, forget, and output gates) to selectively retain or discard information [32]. Bi-Directional LSTM (Bi-LSTM), an extension of LSTM, processes sequential data in both the forward and backward directions, capturing contextual information. It is widely applied in sequence prediction and time series analysis tasks [33]. Bi-LSTM can effectively model the complex nonlinear relationships in energy consumption prediction, making it suitable for real-time and non-real-time forecasting. Numerous studies have demonstrated the advantages of LSTM in HVAC energy consumption prediction. For instance, Mtibaa et al. proposed LSTM-MISO and LSTM-MIMO models, which were evaluated using real-world case studies of buildings employing variable air volume (VAV) and constant air volume (CAV) systems; the LSTM models outperformed multilayer perceptron models, reducing prediction errors by 50% [34]. Similarly, Ellis et al. developed an encoder-decoder LSTM-based EMPC framework, which pre-cooled building thermal zones and reduced operational costs compared to maximum temperature maintenance approaches [35]. Transformer models, based on attention mechanisms, represent another milestone in deep learning. Initially introduced by Vaswani et al. in 2017 for natural language processing (NLP) tasks [36], Transformers leverage self-attention mechanisms and encoder-decoder architectures to process sequential data efficiently while addressing the long-distance dependency problems inherent in traditional RNNs. With strong temporal modeling capabilities, parallel computation efficiency, and the ability to capture complex dependencies, Transformers have become powerful tools in energy prediction, especially for tasks requiring multi-feature and long-sequence dependencies in chiller energy consumption prediction. For example, Lim et al. achieved remarkable performance using Transformers for forecasting in the power load, traffic, retail, and stock domains [37]. Similarly, Li et al. proposed a Transformer-based model for building cooling load prediction, improving load forecasting accuracy [38]. Despite their potential, Transformer and Bi-LSTM models have limitations when applied to chiller energy consumption prediction. Bi-LSTM struggles to capture long-term dependencies, such as seasonal factors affecting energy consumption, and may face training difficulties due to gradient vanishing or exploding. Transformers, on the other hand, are sensitive to data quality and noise, which can degrade performance on noisy or anomalous data. The theoretical basis of this study therefore rests on two points. First, the self-attention mechanism of the Transformer can effectively model the global dependencies between variables, but it requires high-quality data.
Second, the bidirectional time series processing capability of Bi-LSTM is good at capturing local temporal patterns, but its ability to model long-term trends is limited.
Given these research advancements and the limitations of existing models, this study develops a dual-branch network based on Transformer and Bi-LSTM (DNTB) for efficient prediction of chiller energy consumption in building energy systems. The main innovations of this study include:
- Proposing a dual-branch structure that leverages the complementary advantages of Transformer and Bi-LSTM in capturing long-term and short-term dependencies in sequential data, enabling accurate chiller energy consumption prediction.
- Introducing a decision-level fusion module that integrates the complementary information from the two branches by averaging their fully connected outputs to derive the final prediction.
- This study proposes a deep learning-based model named DNTB (Dual-branch Network based on Transformer and Bi-LSTM) that achieves superior performance in long-term and short-term chiller energy consumption prediction tasks.
2. Method
The Dual-branch Network based on Transformer and Bi-LSTM (DNTB) is an innovative and efficient framework for predicting chiller energy consumption. It boasts high accuracy and robust generalization capability. The framework is illustrated in Fig 1. First, sensor data collected from the chiller system is input into the framework. The raw chiller data is processed through two branches: a Transformer branch and a Bi-LSTM branch. The Transformer branch extracts important features and sequential information from the raw data and outputs corresponding representations. Simultaneously, the Bi-LSTM branch captures the dynamic temporal features of the chiller’s time series data from both forward and backward directions, producing sequence-aware representations. Finally, these representations, which contain extracted sequential information and high-level features, are input into a decision-level fusion module. After being passed through individual fully connected layers, the outputs from the two branches are averaged to yield the final prediction. The detailed structure of DNTB is explained below.
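As a concrete illustration of this architecture, the following PyTorch sketch wires the two branches and the averaging fusion together. It is a minimal sketch under stated assumptions, not the authors' released code: the layer count, dropout rate, and the 8-variable input are illustrative, while the hidden size and head count follow the long-term settings reported later in Table 2.

```python
import torch
import torch.nn as nn

class DNTB(nn.Module):
    """Sketch of the dual-branch DNTB architecture described above."""

    def __init__(self, n_features=8, d_model=64, n_heads=8, lstm_hidden=64):
        super().__init__()
        # Transformer branch: project inputs to d_model, then encode.
        self.input_proj = nn.Linear(n_features, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=1)
        # Bi-LSTM branch: bidirectional, so its output dim is 2 * lstm_hidden.
        self.bilstm = nn.LSTM(n_features, lstm_hidden,
                              batch_first=True, bidirectional=True)
        # Decision-level fusion: one fully connected head per branch
        # (dropout, linear, GELU, as described in Section 2.3).
        self.head_t = nn.Sequential(nn.Dropout(0.1),
                                    nn.Linear(d_model, 1), nn.GELU())
        self.head_b = nn.Sequential(nn.Dropout(0.1),
                                    nn.Linear(2 * lstm_hidden, 1), nn.GELU())

    def forward(self, x):                   # x: (batch, seq_len, n_features)
        r_t = self.encoder(self.input_proj(x))[:, -1, :]  # last-step state
        r_b, _ = self.bilstm(x)
        r_b = r_b[:, -1, :]
        return 0.5 * (self.head_t(r_t) + self.head_b(r_b))  # averaged decision
```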
2.1. Transformer branch
In the standard Transformer architecture, there are two main components: the encoder and the decoder [36]. The encoder processes input sequences and transforms them into hidden state vectors, extracting features and contextual information to generate high-dimensional representations of the input data. The decoder typically generates output sequences (e.g., translating from one language to another). However, in chiller energy consumption prediction tasks, the goal is not to generate an output sequence but to predict a single value. Thus, the proposed model employs only the encoder component of the Transformer to produce sequence-aware representations.
Initially, sequential input data is mapped to vectors of a specified dimension and fed into the Transformer’s encoder. The encoder comprises four components: a multi-head attention mechanism, a normalization layer, a feedforward neural network layer, and another normalization layer. These components are detailed below:
Multi-head attention mechanism.
This component is composed of multiple attention layers. The attention mechanism (AM), a core concept in the Transformer, mimics how the human brain focuses on specific information. It scans the target region quickly, identifying relevant features. The attention mechanism is based on scaled dot-product attention, which maps a query (Q) and a set of key (K) and value (V) pairs to an output. The data received are either the input or output of the previous encoder, and weight matrices are applied to compute Q, K, and V. The computation is as follows:
$$Q = XW^Q, \qquad K = XW^K, \qquad V = XW^V$$

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $W^Q$, $W^K$, and $W^V$ represent the weight matrices used to calculate the Query, Key, and Value, respectively; $X$ is the input variable; Q, K, and V represent the query, key, and value matrices, respectively; and $\sqrt{d_k}$ is a scaling factor to adjust the attention scores. The multi-head attention mechanism stacks several attention layers, with each layer capturing features from different subspaces. This enables the model to extract more comprehensive features compared to single-head attention.
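To make the formula concrete, here is a minimal sketch of single-head scaled dot-product attention; the toy tensor sizes are assumptions for illustration.

```python
import math
import torch

def scaled_dot_product_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention, as in the formula above."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v               # project input to Q, K, V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # scaled similarity
    return torch.softmax(scores, dim=-1) @ v           # weighted sum of values

x = torch.randn(4, 64)                     # toy sequence: 4 steps, 64 features
w = [torch.randn(64, 64) for _ in range(3)]
out = scaled_dot_product_attention(x, *w)  # shape: (4, 64)
```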
Add & normalize layer.
After extracting temporal features via multi-head attention, the output is added to the sub-layer input through a residual connection and normalized to enhance the model's robustness. The normalization formula is given by:

$$\mathrm{LayerNorm}(x + \mathrm{Sublayer}(x)), \qquad \mathrm{LayerNorm}(x) = \gamma \odot \frac{x - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta$$

where $\mu$ and $\sigma^{2}$ are the mean and variance of the features, $\epsilon$ is a small constant for numerical stability, and $\gamma$ and $\beta$ are learnable scale and shift parameters.
Feedforward Neural Network (FFN).
The FFN consists of two linear transformations with a ReLU activation function, which enhances the model's non-linear fitting capability. The FFN transforms the input vector before passing it to the next module:

$$\mathrm{FFN}(x) = \max\!\left(0,\; xW_1 + b_1\right)W_2 + b_2$$

Here, $W_1$, $W_2$, $b_1$, and $b_2$ represent the weights and biases of the FFN layers.
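For illustration, the FFN can be written directly in PyTorch; the inner dimension of 256 is an assumption, as the paper does not report it.

```python
import torch.nn as nn

# Position-wise FFN: two linear maps around a ReLU, as in the formula above.
ffn = nn.Sequential(
    nn.Linear(64, 256),   # xW1 + b1
    nn.ReLU(),            # max(0, .)
    nn.Linear(256, 64),   # (.)W2 + b2
)
```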
2.2. Bi-LSTM branch
Recurrent Neural Networks (RNNs) are designed specifically for sequence and time-dependent predictions. LSTM (Long Short-Term Memory) networks address the issue of long-term dependency in traditional RNNs [39]. LSTM has shown superior performance in chiller energy consumption prediction compared to traditional models like linear regression and support vector machines. LSTM can capture temporal dependencies, handle nonlinearity, and reduce the interference of noise, making it particularly suitable for short-term energy prediction tasks.
The core components of LSTM include a memory cell and three gating mechanisms: the forget gate, input gate, and output gate. These mechanisms control the selective retention and forgetting of information. The following describes their operations:
Forget Gate: Determines which information to discard from the cell state:

$$f_t = \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

Input Gate: Decides which information to update in the cell state:

$$i_t = \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\!\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$

Output Gate: Determines the output based on the updated cell state:

$$o_t = \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$

State Updates: The cell state and hidden state are updated as follows:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \qquad h_t = o_t \odot \tanh(C_t)$$

Among them, $\sigma(\cdot)$ is the sigmoid function; $W_f$, $W_i$, and $W_o$ are the weight matrices used to update the states of the forget gate, input gate, and output gate; $b_f$, $b_i$, and $b_o$ are the corresponding bias vectors; $h_{t-1}$ is the activation value at time step $t-1$; $x_t$ is the input at time step $t$; $\tilde{C}_t$ is the memory candidate at time step $t$; $\tanh(\cdot)$ is the activation function; $W_C$ is the weight matrix used to calculate the memory candidate; and $b_C$ is the bias vector used to update the memory candidate. $h_t$ is the activation value at time step $t$. For Bi-LSTM, the forward and backward hidden states are concatenated at each time step to form the final output. In our model, the raw chiller data is fed into the Bi-LSTM branch, which analyzes the time series of the chiller data and outputs a sequence-aware representation.
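A minimal sketch of the Bi-LSTM branch follows; the hidden size of 64 and sequence length of 4 match the long-term settings in Table 2, while the 8-variable input and batch size are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Bi-LSTM branch sketch: forward and backward hidden states are
# concatenated at each step, so the output feature size is 2 * hidden_size.
bilstm = nn.LSTM(input_size=8, hidden_size=64,
                 batch_first=True, bidirectional=True)

x = torch.randn(32, 4, 8)            # (batch, sequence length 4, 8 features)
outputs, (h_n, c_n) = bilstm(x)
representation = outputs[:, -1, :]   # sequence-aware representation, dim 128
```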
2.3. Decision level fusion
After processing by the Transformer branch and the Bi-LSTM branch, two feature representations with sequence information are obtained; denote them $R_T$ and $R_B$. Although both contain sequence feature information, they are obtained using different sequence feature extraction methods, so there is heterogeneity between them. How to fuse the output representations of the two branches is therefore particularly critical. In this study, we use decision-level fusion. Common decision fusion methods include averaging [40], majority voting [41], and weighted or learnable models [42]. This study uses average decision-level fusion, which effectively retains the useful information in each branch and reduces the risk of overfitting. In this module, the two representations $R_T$ and $R_B$ are passed through their respective fully connected layers to obtain two regression decisions, and the two regression decisions are then averaged at the decision level to obtain the final prediction. The internal structure of decision-level fusion is introduced in detail below.

First, the two input representations $R_T$ and $R_B$ are subjected to dropout regularization, which does not change the shape of the tensor. To achieve a nonlinear transformation, a fully connected layer with GELU as the activation function is used, where $\mathrm{GELU}(x) = x\,\Phi(x)$ and $\Phi(\cdot)$ is the standard normal cumulative distribution function. This outputs two regression decisions $d_T$ and $d_B$, where $d_T$ is the regression decision of the Transformer branch and $d_B$ is the regression decision of the Bi-LSTM branch. Finally, the two regression decisions are averaged and fused to obtain the prediction $\hat{y}$. The relevant operations are as follows:

$$d_T = \mathrm{GELU}\!\left(\mathrm{Dropout}(R_T)\,W_T + b_T\right)$$

$$d_B = \mathrm{GELU}\!\left(\mathrm{Dropout}(R_B)\,W_B + b_B\right)$$

$$\hat{y} = \frac{1}{2}\left(d_T + d_B\right)$$

Among them, $W_T$ and $W_B$ are the weight matrices of the linear layers, $b_T$ and $b_B$ are the bias vectors of the linear layers, and $\hat{y}$ is the final prediction result.
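A sketch of the fusion module under these definitions follows; the input dimensions and dropout rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DecisionLevelFusion(nn.Module):
    """Sketch of the averaging fusion: dropout, then a linear layer with
    GELU per branch, then the mean of the two regression decisions."""

    def __init__(self, dim_t=64, dim_b=128, p=0.1):
        super().__init__()
        self.drop = nn.Dropout(p)         # shape-preserving regularization
        self.fc_t = nn.Linear(dim_t, 1)   # Transformer-branch decision
        self.fc_b = nn.Linear(dim_b, 1)   # Bi-LSTM-branch decision
        self.act = nn.GELU()

    def forward(self, r_t, r_b):
        d_t = self.act(self.fc_t(self.drop(r_t)))
        d_b = self.act(self.fc_b(self.drop(r_b)))
        return 0.5 * (d_t + d_b)          # averaged final prediction
```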
2.4. Training setup
Hyperparameter tuning of the model is crucial to achieving the best prediction performance. The specific Python library versions used are shown in Table 1. Through extensive experimental tests and comparisons, we obtained the parameters and hyperparameters of the DNTB model shown in Table 2. In all experiments, the Adam optimizer was used, and the learning rate was uniformly set to 0.001. Some parameters differ slightly between long-term and short-term prediction. In the long-term prediction of chiller energy consumption, the number of epochs is set to 100, the sequence length to 4, and the hidden size to 64; in short-term prediction, the number of epochs is set to 150, the sequence length to 1, and the hidden size to 32. Appropriate parameters allow the model to fit the energy consumption curve trend better and faster. The number of attention heads is set to 8 for the long-term task and 4 for the short-term task: the long-term task requires more heads to model global dependencies, while the short-term task reduces redundant computation with fewer heads. The Bi-LSTM hidden size is set to 64 for the long-term task and 32 for the short-term task: the long-term task requires a larger capacity to store bidirectional time series information, while the short-term task avoids overfitting by compressing the dimension. During the training process, we use the mean squared error as the loss function to train the model:

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
The model was implemented using PyTorch, a mature Python neural network framework, and trained on an NVIDIA A40 GPU under Windows to realize the dual-branch network based on Transformer and Bi-LSTM for efficient prediction of chiller energy consumption. The training and testing loss curves for chiller energy consumption prediction converge as shown in Figs 2 and 3, indicating that with these settings the DNTB model fits the chiller energy consumption data well.
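A minimal training-loop sketch using the reported settings (Adam, learning rate 0.001, MSE loss, 100 epochs for the long-term task) is given below; `model` and `train_loader` are assumptions, e.g., the DNTB sketch above and a DataLoader of sliding windows.

```python
import torch
import torch.nn as nn

# Assumed to exist: `model` (e.g., DNTB()) and `train_loader`
# (a DataLoader yielding (window, target) pairs).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

for epoch in range(100):                 # 100 epochs for the long-term task
    for x_batch, y_batch in train_loader:
        x_batch, y_batch = x_batch.to(device), y_batch.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x_batch), y_batch)
        loss.backward()                  # backpropagate the MSE loss
        optimizer.step()
```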
2.5. Baseline
To validate the effectiveness and generalization capability of the proposed DNTB model, this study selected several state-of-the-art baseline models from the field of deep learning, particularly those widely used in time-series data analysis. The selected baseline models include Bi-LSTM, Transformer, AlexNet, ResNet, VGG-19, TCN, and Informer. A brief description of each baseline model is provided below:
Bi-LSTM [39]: Bi-LSTM (Bidirectional Long Short-Term Memory) extends the LSTM architecture introduced by Hochreiter and Schmidhuber in 1997 by processing sequences in both the forward and backward directions, and has undergone significant advancements in subsequent research. It is a powerful sequential modeling tool, performing exceptionally well on time-series data and natural language processing tasks.
AlexNet [43]: AlexNet, introduced by Alex Krizhevsky et al. in 2012, marked a major breakthrough in deep learning for computer vision. It was the first convolutional neural network (CNN) to achieve significant performance improvements on the ImageNet dataset. AlexNet consists of multiple convolutional layers, pooling layers, and fully connected layers. It employs Dropout and ReLU activation functions, laying the foundation for modern CNN-based image classification models.
VGG-19 [44]: VGG-19, proposed by Karen Simonyan and Andrew Zisserman in 2014, is a deep convolutional neural network known for its simplicity and effectiveness. It demonstrated remarkable performance on the ImageNet dataset by utilizing small convolutional kernels and a consistent network architecture.
ResNet [45]: ResNet (Residual Network), introduced by Kaiming He et al. in 2015, achieved groundbreaking performance on the ImageNet dataset. The core innovation of ResNet is its residual connections, which allow the network to learn residual functions instead of direct mappings. This design addresses the gradient vanishing and exploding problems often encountered in deep networks, enabling the training of very deep models.
Transformer [36]: Transformer, proposed by Vaswani et al. in 2017, leverages self-attention mechanisms to capture long-range dependencies in sequential data. The multi-head attention mechanism in Transformer allows the model to learn multiple representations of the sequence simultaneously, making it highly effective for tasks involving complex sequential relationships. Transformers have shown significant success in NLP tasks such as machine translation, text generation, and question answering.
TCN [46]: TCN (Temporal Convolutional Network) is a time series modeling method based on convolutional neural networks. It captures long-term dependencies through dilated causal convolution and residual connections. Unlike traditional recurrent neural networks (RNNs), TCN uses a convolutional structure to achieve parallel computing, avoiding the time overhead of recursive computing. At the same time, it expands the receptive field in the time dimension through dilated convolution, which can effectively model long-distance temporal dependencies. Its causality ensures that the prediction depends only on historical data. It is suitable for single-step and multi-step time series prediction tasks and performs well in energy consumption prediction, speech recognition and other fields.
Informer [47]: Informer is an improved Transformer-based time series prediction model. To address the high computational complexity and memory consumption of the standard Transformer on long sequences, it introduces ProbSparse self-attention and a distilling operation. ProbSparse attention reduces computation by screening key time points, while the distilling operation shortens the sequence layer by layer, further improving the efficiency of long-sequence modeling. While retaining the Transformer's global dependency modeling capability, Informer significantly improves the practicality of long time series prediction and is widely used in scenarios requiring ultra-long input sequences, such as power load forecasting and meteorological data modeling.
The performance of these baseline models is compared to the DNTB model in both long-term and short-term chiller energy consumption prediction tasks, as discussed in subsequent sections.
3. Data analysis
The dataset used in this study was obtained from Kaggle and consists of chiller energy consumption data from commercial buildings in Singapore; it is provided as an attachment to this article. The time span of the dataset ranges from August 18, 2019, 00:00, to June 1, 2020, 13:00 [48]. Data were collected at 30-minute intervals, yielding a total of 13,615 records. We then preprocessed the data, removing outliers and interpolating missing values with the mean. Outlier screening used a sliding-window interquartile range (IQR) method to eliminate abnormal data, leaving 13,610 records as experimental samples. This robust dataset provided a solid foundation for evaluating the proposed DNTB model, ensuring accuracy and reliability in chiller energy consumption prediction.
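A sketch of this preprocessing is shown below; the file name, column names, and the one-day (48-record) window for the sliding IQR fence are assumptions, as the source does not specify them.

```python
import pandas as pd

# Mean imputation of missing values, then a sliding-window IQR filter.
df = pd.read_csv("chiller_energy.csv", parse_dates=["Timestamp"])
df["Chiller Consumption"] = df["Chiller Consumption"].fillna(
    df["Chiller Consumption"].mean())

roll = df["Chiller Consumption"].rolling(window=48, center=True, min_periods=1)
q1, q3 = roll.quantile(0.25), roll.quantile(0.75)
iqr = q3 - q1
mask = df["Chiller Consumption"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[mask]                 # drop records outside the local IQR fence
```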
Subsequently, a Spearman correlation analysis was conducted to examine the relationships between the dataset's variables and chiller energy consumption (Chiller Consumption, measured in kWh). The variables analyzed included Chilled Water Rate (L/sec), Cooling Water Temperature (°C), Building Load (RT), Outside Temperature (°F), Dew Point (°F), Humidity (%), Wind Speed (mph), and Pressure (in). The Spearman correlation coefficient is a statistical measure that evaluates the monotonic (not necessarily linear) relationship between two variables. Its values range from −1 to 1, where −1 indicates a perfect negative correlation, 1 indicates a perfect positive correlation, and 0 suggests no correlation [49]. In this study, we divide correlation strength into three levels: an absolute Spearman coefficient below 0.45 indicates a weak correlation; a value of at least 0.45 but below 0.5 indicates a strong correlation; and a value of 0.5 or above indicates a very strong correlation. The results are shown in Table 3. Five variables have a very strong correlation with Chiller Consumption: Chilled Water Rate, Cooling Water Temperature, Building Load, Outside Temperature, and Humidity. Among them, Building Load has the strongest correlation with Chiller Consumption, with a Spearman value of 0.903364343, indicating that chiller energy consumption rises as the building load increases. It is worth noting that the correlation coefficient between Humidity and Chiller Consumption is −0.589924807, indicating that chiller energy consumption falls as humidity increases. These correlation results provide an empirical basis for using these variables to predict chiller energy consumption.
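These correlations can be reproduced with pandas; the column names below follow the variable list above but are assumptions about the raw file's headers, and `df` is the cleaned frame from the preprocessing sketch.

```python
# Spearman correlation between each variable and chiller energy use.
features = ["Chilled Water Rate (L/sec)", "Cooling Water Temperature (°C)",
            "Building Load (RT)", "Outside Temperature (°F)", "Dew Point (°F)",
            "Humidity (%)", "Wind Speed (mph)", "Pressure (in)"]
corr = df[features + ["Chiller Consumption"]].corr(method="spearman")
print(corr["Chiller Consumption"].drop("Chiller Consumption").sort_values())
```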
In this section, we introduce the partitioning of the chiller dataset. The prediction tasks are long-term prediction and short-term prediction, and the dataset is partitioned differently depending on the task. For long-term prediction, to respect the time-series nature of the data, the dataset is partitioned in chronological order so that the training subset contains the earlier period and the test set contains the later period. This partitioning strategy enables the model to predict future results from historical patterns. The dataset is divided into a 70% training subset and a 30% test subset, containing 9530 training samples and 4080 test samples, respectively. For short-term prediction, the dataset is partitioned in chronological order using a five-fold cross-validation scheme, with an 80% training subset and a 20% test subset containing 10890 training instances and 2720 test instances, respectively. Five-fold cross-validation helps reduce overfitting and bias and improves the generalization ability of the model [50]. Fig 4 shows the chiller energy consumption forecast curve for the test set from March 7 to March 18, 2020 in the long-term task. Repeated troughs and distinct peaks appear in the curve, reflecting the fluctuation of chiller energy consumption. Troughs in the energy consumption data have many causes, including low building load (RT), changes in outside temperature, or periods of reduced building occupancy, all of which reduce chiller energy use. Conversely, high peaks correspond to conditions such as high building loads during the morning working peak and increased building occupancy. These fluctuations are also affected by other factors such as humidity levels and chilled water flow, producing the overall variation in the data.
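A sketch of the two partitioning schemes follows; scikit-learn's TimeSeriesSplit is one chronological five-fold scheme and is an assumption, since the paper does not name its implementation.

```python
from sklearn.model_selection import TimeSeriesSplit

n = len(df)   # 13,610 records after cleaning

# Long-term task: chronological 70/30 hold-out split
# (the paper reports 9530 training and 4080 test samples).
train_long, test_long = df.iloc[:int(0.7 * n)], df.iloc[int(0.7 * n):]

# Short-term task: chronological five-fold cross-validation. Each
# validation fold is later in time than its training data; the reported
# 80/20 split is one way to size the folds.
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(df):
    train_fold, test_fold = df.iloc[train_idx], df.iloc[test_idx]
```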
4. Experiments and discussion of results
4.1. Model evaluation indexes
To quantitatively evaluate the proposed model for chiller energy consumption prediction, selecting appropriate evaluation metrics is crucial. This study selected several important performance evaluation metrics, including Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination R2. MAE is the average of the absolute errors between predicted and actual values, providing an intuitive measure of the model’s performance relative to the observed values. RMSE is the square root of the average of the squared errors between predicted and actual values. R2 represents the proportion of the variance in the dependent variable that is predictable from the independent variables, measuring the model’s goodness of fit. In building energy consumption prediction, these metrics together provide a comprehensive evaluation framework, helping us understand the model’s performance [51]. MAE and RMSE provide direct measures of prediction error, while R2 provides a measure of the model’s goodness of fit. By considering these metrics together, we can more comprehensively evaluate the model’s performance and select the most suitable model for building energy consumption prediction. The formulas for these metrics are as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$R^2 = \frac{\mathrm{COV}\!\left(y, \hat{y}\right)^2}{\mathrm{VAR}(y)\,\mathrm{VAR}(\hat{y})}$$

where $y_i$ is the actual value of the $i$-th sample, $\hat{y}_i$ is the corresponding predicted value, $n$ is the total number of samples, $\mathrm{COV}(\cdot)$ is the covariance function, and $\mathrm{VAR}(\cdot)$ is the variance function.
These three evaluation metrics (MAE, RMSE, and R2) provide complementary insights into the model's accuracy, error sensitivity, and fit. By considering all of these metrics together, we can more effectively evaluate the performance of the DNTB model in predicting chiller energy consumption.
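As a sketch, these metrics can be computed with NumPy; the R2 here follows the covariance/variance form given above.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """MAE, RMSE, and R^2 as defined above (sketch)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    # R^2 via covariance and variance, matching the COV()/VAR() form.
    r2 = np.cov(y_true, y_pred)[0, 1] ** 2 / (np.var(y_true, ddof=1)
                                              * np.var(y_pred, ddof=1))
    return mae, rmse, r2
```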
4.2. Model comparative analysis
To demonstrate the effectiveness and generalization capability of the proposed DNTB model, this study conducted extensive comparative experiments using seven different baseline models for both long-term and short-term chiller energy consumption prediction tasks. The experimental results are presented in Tables 4 and 5, as well as Figs 5 and 6.
In the long-term prediction task, among the seven baseline models, Bi-LSTM achieved the best performance, with a Mean Squared Error (MSE) of 0.0105, Mean Absolute Error (MAE) of 0.097, and an R2 of 0.5438. Bi-LSTM effectively handled sequential information and fit some of the chiller energy consumption data well. However, Bi-LSTM was prone to overfitting, causing the model’s performance to degrade after reaching a certain number of epochs. The worst-performing model was VGG-19, which, despite having a parameter size of 45.54MB, failed to deliver proportional improvements in performance. This suggests that overly complex models may not necessarily yield better results. Instead, they often require larger datasets to train effectively. The poor performance of VGG-19 can be attributed to its inability to capture temporal information and its inadequate fit to the chiller energy consumption data.
In the short-term prediction task, AlexNet achieved the best performance among the baseline models, with an MSE of 0.0124, MAE of 0.0897, and an R2 of 0.5066. The convolutional layers in AlexNet may have contributed to its ability to extract temporal features from the short-term data, positively influencing the prediction results. Similarly, VGG-19 performed the worst in this task as well, for the same reasons: its excessive complexity and inability to extract temporal features effectively. The proposed DNTB model outperformed all baseline models in both long-term and short-term prediction tasks. For long-term predictions, the DNTB model achieved an MSE of 0.0051, MAE of 0.0605, and an R2 of 0.8031. For short-term predictions, it achieved an MSE of 0.0084, MAE of 0.0747, and an R2 of 0.6538. The DNTB model demonstrated the ability to capture the temporal information of chiller energy consumption data while avoiding overfitting or underfitting, as evident from Fig 2. The Bi-LSTM branch in DNTB effectively processed sequential information and extracted relevant features, while the Transformer branch leveraged attention mechanisms to focus on features critical to the prediction task. The decision-level fusion mechanism successfully integrated the outputs of both branches, resulting in highly accurate predictions.
4.3. Ablation study
To verify the effectiveness of each module in the model, we ablated each component on the short-term prediction task, including the Transformer branch, the Bi-LSTM branch, and the decision-level fusion module. In the experiment removing the Transformer branch, the input dimension of the decision-level fusion module was reduced to half of the original, with the other components and parameters unchanged. Similarly, in the experiment removing the Bi-LSTM branch, the input dimension of the decision-level fusion module was also halved, with the other components and parameters unchanged. In the experiment removing the decision-level fusion module, we replaced it with a fully connected layer (FC) and kept the other components and parameters unchanged. The results are shown in Table 6 and support the following conclusions:
Effectiveness of the Transformer Branch: The Transformer branch uses an attention mechanism to extract spatial and structural information from the input energy consumption data, isolating key features while reducing noise. This positively impacts prediction results.
Importance of the Bi-LSTM Branch: The Bi-LSTM branch is indispensable in DNTB. It captures the temporal dependencies in chiller energy consumption data, which is critical for improving model performance.
Superiority of the Decision-Level Fusion Module: The Decision-Level Fusion module effectively integrates features extracted by the two branches. It combines their complementary strengths to output accurate energy consumption predictions.
5. Conclusion
As global warming intensifies and environmental issues become increasingly critical, energy prediction and consumption reduction are gaining attention from nations and organizations worldwide. The proposed chiller energy consumption prediction model, DNTB, has potential applications in building energy management systems and energy-saving engineering projects, bridging the gap between predictive models and real-world engineering applications. This study addresses the twin problems of long-term dependency modeling and noise sensitivity through the Transformer-BiLSTM collaborative architecture, and its decision-level fusion mechanism provides a new paradigm for time series prediction. DNTB effectively models the relationship between the input variables and chiller energy consumption and shows advantages in both long-term and short-term prediction tasks. The MSE, MAE, and R2 for the long-term prediction task are 0.0051, 0.0605, and 0.8031, and for the short-term task 0.0084, 0.0747, and 0.6538, which are better than the other prediction models. The proposed decision-level fusion module better integrates the feature data of the two branches and yields accurate energy consumption predictions.
The proposed DNTB model enhances the empirical model database for chiller energy consumption prediction and demonstrates strong predictive capabilities. Subsequent work will develop a quantized version for edge devices and integrate it with a model predictive control (MPC) system to achieve closed-loop optimization. However, this study has certain limitations. The current model is trained on data from a tropical climate and may suffer performance degradation and insufficient coverage of extreme operating conditions in temperate or cold regions. In future work, datasets from more regions will be tested, and Generative Adversarial Networks (GANs) will be introduced to synthesize extreme-condition data to improve this study.
References
- 1. Mikhaylov A, Moiseev N, Aleshin K, Burkhardt T. Global climate change and greenhouse effect. Entrep Sustain Issues. 2020;7(4):2897.
- 2. Huang R, Zhang S, Wang P. Key areas and pathways for carbon emissions reduction in Beijing for the “dual carbon” targets. Energy Policy. 2022;164:112873.
- 3. Hou R, Liu B, Sun Y, Liu L, Meng J, Levi MD, et al. Recent advances in dual-carbon based electrochemical energy storage devices. Nano Energy. 2020;72:104728.
- 4. Jia L, Wei S, Liu J. A review of optimization approaches for controlling water-cooled central cooling systems. Build Environ. 2021;203:108100.
- 5. Letherman K, Al-Azawi M. Predictions of the heating and cooling energy requirements in buildings using the degree hours method. Build Environ. 1986;21(3–4):171–6.
- 6. Mathews E, Van Heerden E, Arndt D. A tool for integrated HVAC, building, energy and control analysis part 1: overview of QUICKcontrol. Build Environ. 1999;34(4):429–49.
- 7. Fang T, Lahdelma R. Evaluation of a multiple linear regression model and SARIMA model in forecasting heat demand for district heating system. Appl Energy. 2016;179:544–52.
- 8. Becker R, Thrän D. Completion of wind turbine data sets for wind integration studies applying random forests and k-nearest neighbors. Appl Energy. 2017;208:252–62.
- 9. Zhou Z, Wu R. Stock price prediction model based on convolutional neural networks. J Ind Eng Appl Sci. 2024;2(4):1–7.
- 10. Alakbari FS, Mohyaldinn M, Ayoub M, Muhsan AS, Salih AA, Abbas AH, et al. Prediction of critical total drawdown in sand production from gas wells: Machine learning approach. Canadian J Chem Eng. 2023;101(5):2493–509.
- 11. Alakbari FS, Mohyaldinn ME, Ayoub MA, Hussein IA, Muhsan AS, Ridha S, et al. A gated recurrent unit model to predict Poisson’s ratio using deep learning. J Rock Mech Geotech Eng. 2024;16(1):123–35.
- 12. Wu R, Zhang T, Xu F. Cross-market arbitrage strategies based on deep learning. Acad J Sociol Manag. 2024;2(4):20–6.
- 13. Hassan AM, Ayoub MA, Mohyadinn ME, Al-Shalabi EW, Alakbari FS. A new insight into smart water assisted foam SWAF technology in carbonate rocks using artificial neural networks ANNs. In: Offshore Technology Conference Asia. OTC; 2022.
- 14. Li X, Hu C, Luo S, Lu H, Piao Z, Jing L. Distributed hybrid-triggered observer-based secondary control of multi-bus DC microgrids over directed networks. IEEE Trans Circuits Syst I: Regular Papers. 2025.
- 15. Hu Z, Su R, Veerasamy V, Huang L, Ma R. Resilient frequency regulation for microgrids under phasor measurement unit faults and communication intermittency. IEEE Trans Ind Info. 2024.
- 16. Ben-Nakhi AE, Mahmoud MA. Cooling load prediction for buildings using general regression neural networks. Energy Convers Manag. 2004;45(13–14):2127–41.
- 17. Ekici BB, Aksoy UT. Prediction of building energy consumption by using artificial neural networks. Adv Eng Softw. 2009;40(5):356–62.
- 18. Wang Z, Srinivasan RS. A review of artificial intelligence based building energy prediction with a focus on ensemble prediction models. In: 2015 Winter simulation conference (WSC). IEEE; 2015.
- 19. Ngarambe J, Yun GY, Santamouris M. The use of artificial intelligence (AI) methods in the prediction of thermal comfort in buildings: Energy implications of AI-based thermal comfort controls. Energ Build. 2020;211:109807.
- 20. Lu C, Li S, Lu Z. Building energy prediction using artificial neural networks: A literature survey. Energ Build. 2022;262:111718.
- 21. Mohandes SR, Zhang X, Mahdiyar A. A comprehensive review on the application of artificial neural networks in building energy analysis. Neurocomputing. 2019;340:55–75.
- 22. Bi J, Wang H, Yan E, Wang C, Yan K, Jiang L, et al. AI in HVAC fault detection and diagnosis: A systematic review. Energy Rev. 2024:100071.
- 23. Zubair S, Hamayat F, Nazir A. AI-powered Thermal Comfort: Predicting Indoor Air Temperature for Efficient HVAC Systems. In: 2024 29th International Conference on Automation and Computing (ICAC). IEEE; 2024.
- 24. Sharma V, Mistry V. HVAC load prediction and energy saving strategies in building automation. European J Adv Eng Technol. 2022;9(3).
- 25. Liu H, Liu Y, Guo X, Wu H, Wang H, Liu Y. An energy consumption prediction method for HVAC systems using energy storage based on time series shifting and deep learning. Energ Build. 2023;298:113508.
- 26. Zhuang D, Gan VJL, Tekler ZD, Chang A, Tian S, Shi X. Data-driven predictive control for smart HVAC system in IoT-integrated buildings with time-series forecasting and reinforcement learning. Appl Energy. 2023;338:120936.
- 27. Taheri S, Ahmadi A, Mohammadi-Ivatloo B, Asadi S. Fault detection diagnostic for HVAC systems via deep learning algorithms. Energ Build. 2021;250:111275.
- 28. Wang Z, Calautit J, Wei S, Tien PW, Xia L. Real-time building heat gains prediction and optimization of HVAC setpoint: An integrated framework. J Build Eng. 2022;49:104103.
- 29. He K, Fu Q, Lu Y, Wang Y, Luo J, Wu H, et al. Predictive control optimization of chiller plants based on deep reinforcement learning. J Build Eng. 2023;76:107158.
- 30. Rosdahl C, Bernhardsson B, Eisenhower B. Model-free MIMO control tuning of a chiller process using reinforcement learning. Sci Technol Built Environ. 2023;29(8):782–94.
- 31. Fang X, Gong G, Li G, Chun L, Peng P, Li W, et al. Cross temporal-spatial transferability investigation of deep reinforcement learning control strategy in the building HVAC system level. Energy. 2023;263:125679.
- 32. Graves A. Long short-term memory. In: Supervised sequence labelling with recurrent neural networks. Springer; 2012. p. 37–45.
- 33. Suebsombut P, Sekhari A, Sureephong P, Belhi A, Bouras A. Field data forecasting using LSTM and Bi-LSTM approaches. Appl Sci. 2021;11(24):11820.
- 34. Mtibaa F, Nguyen K-K, Azam M, Papachristou A, Venne J-S, Cheriet M. LSTM-based indoor air temperature prediction framework for HVAC systems in smart buildings. Neural Comput Appl. 2020;32:17569–85.
- 35. Ellis MJ, Chinde V. An encoder–decoder LSTM-based EMPC framework applied to a building HVAC system. Chem Eng Res Design. 2020;160:508–20.
- 36. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: Advances in Neural Information Processing Systems, 2017.
- 37. Lim B, Arik SO, Loeff N, Pfister T. Temporal fusion transformers for interpretable multi-horizon time series forecasting. Int J Forecast. 2021;37(4):1748–64.
- 38. Li L, Su X, Bi X, Lu Y, Sun X. A novel transformer-based network forecasting method for building cooling loads. Energy Build. 2023;296:113409.
- 39. Graves A, Schmidhuber J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005;18(5–6):602–10. pmid:16112549
- 40. Shutova E, Kiela D, Maillard J. Black holes and white rabbits: Metaphor identification with visual features. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: Human language technologies, 2016.
- 41. Morvant E, Habrard A, Ayache S. Majority vote of diverse classifiers for late fusion. In: Structural, Syntactic, and Statistical Pattern Recognition: Joint IAPR International Workshop, S+ SSPR 2014, Joensuu, Finland, August 20-22, 2014. Proceedings. Springer; 2014.
- 42. Evangelopoulos G, Zlatintsi A, Potamianos A, Maragos P, Rapantzikos K, Skoumas G. Multimodal saliency and fusion for movie summarization based on aural, visual, and textual attention. IEEE Trans Multimedia. 2013;15(7):1553–68.
- 43. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Adv Neural Information Process Syst. 2012;25.
- 44. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 2014.
- 45. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
- 46. Bai S, Kolter JZ, Koltun V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271. 2018.
- 47. Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, et al. Informer: beyond efficient transformer for long sequence time-series forecasting. In: Proceedings of the AAAI conference on artificial intelligence, 2021.
- 48. Chiller energy data: HVAC system energy data with weather data [Online]. 2023. Available from: https://www.kaggle.com/chillerenergy
- 49. Ashraf A. Correlation in machine learning—All you need to know. 2023. Available from: https://medium.com/@abdallahashraf90x/all-you-need-to-know-about-correlation-for-machine-learning-e249fec292e9
- 50. Cross-validation: evaluating estimator performance [Online]. 2023. Available from: https://scikit-learn.org/stable/modules/cross_validation.html
- 51. Dong F, Yu J, Quan W, Xiang Y, Li X, Sun F. Short-term building cooling load prediction model based on DwdAdam-ILSTM algorithm: A case study of a commercial building. Energ Build. 2022;272:112337.