Fig 1.
Time series representation of the characteristic weekday (Monday through Friday) energy consumption of a building over the past full year and during the summer (June, July, and August) and winter (December, January, and February) months. The blue boxes show the interquartile range (middle 50%) of energy consumption for each hour in each season. The whiskers extend from the minimum to the maximum consumption, excluding outliers.
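Fig 1 summarizes each hour of the day with box-plot statistics. The sketch below shows one way to compute them, assuming the readings arrive as (datetime, kWh) pairs. The function name `hourly_box_stats` and the Tukey whisker rule (points more than 1.5 × IQR beyond the box treated as outliers) are assumptions, because the caption does not state how outliers were defined.

```python
from collections import defaultdict
from datetime import datetime
from statistics import quantiles

SUMMER = {6, 7, 8}
WINTER = {12, 1, 2}

def hourly_box_stats(readings, months=None):
    """Return {hour: (whisker_low, q1, median, q3, whisker_high)} for weekdays.

    readings: iterable of (datetime, kWh) pairs (assumed input format).
    months: optional set of month numbers to keep, e.g. SUMMER or WINTER.
    """
    by_hour = defaultdict(list)
    for ts, kwh in readings:
        if ts.weekday() >= 5:  # skip Saturday (5) and Sunday (6)
            continue
        if months is not None and ts.month not in months:
            continue
        by_hour[ts.hour].append(kwh)

    stats = {}
    for hour, values in sorted(by_hour.items()):
        if len(values) < 2:
            continue  # too few readings for meaningful quartiles
        q1, med, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        # Assumed Tukey rule: whiskers reach the most extreme non-outlier points.
        low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        inliers = [v for v in values if low_fence <= v <= high_fence]
        stats[hour] = (min(inliers), q1, med, q3, max(inliers))
    return stats
```

Passing `SUMMER`, `WINTER`, or `None` (the full year) yields the three seasonal panels from a single pass over the data.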
Fig 2.
(a) breakdown of the buildings by building use type, (b) location of the buildings, (c) distribution of dataset time length for each building use type, (d) distribution of buildings by Köppen-Geiger (KG) climate zone, (e) distribution of annual consumption by building use type, and (f) breakdown of the dataset by time interval.
Fig 3.
Energy data processing pipeline.
The pipeline includes data acquisition, typically from sources with differing data structures and file formats; preprocessing, which converts the data to a single common structure and de-identifies it; cleaning, which checks for and removes anomalous data to improve data quality and prepares HBase triples; CRADLE ingestion of those triples into the data warehouse; and, finally, analysis in HBase.
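Fig 3 names the preprocessing and cleaning stages without detailing them. The sketch below illustrates the general idea under stated assumptions: the column mapping `COLUMN_MAP`, the salted SHA-256 hash used for de-identification, and the "missing, non-numeric, or negative" cleaning rule are all hypothetical choices, not the paper's documented procedure.

```python
import hashlib

# Hypothetical mapping from source-specific column names to one common schema.
COLUMN_MAP = {
    "Timestamp": "timestamp", "time": "timestamp",
    "kWh": "energy_kwh", "Energy (kWh)": "energy_kwh",
    "BuildingName": "building", "site": "building",
}

def preprocess(record, salt):
    """Rename columns to the common schema and replace the building name with a salted hash."""
    row = {COLUMN_MAP.get(key, key): value for key, value in record.items()}
    digest = hashlib.sha256((salt + str(row["building"])).encode("utf-8")).hexdigest()
    row["building"] = digest[:12]  # de-identified building ID
    return row

def clean(rows):
    """Keep only rows whose energy value is numeric and non-negative (hypothetical rule)."""
    kept = []
    for row in rows:
        try:
            value = float(row.get("energy_kwh"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric reading
        if value >= 0:  # also drops NaN, because NaN >= 0 is False
            kept.append({**row, "energy_kwh": value})
    return kept
```

Using one salt for the whole dataset keeps the hashed ID stable across files from the same building, so records can still be joined after de-identification.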
Table 1.
Energy time series data structure.
Table 2.
Column names and descriptions of the building energy dataset.
Table 3.
Data quality grading criteria.
Table 4.
Weather time series data structure.
Table 5.
Column names and descriptions of the weather dataset.
Fig 4.
HBase triples and their registration into a data table.
Each value is assigned a rowkey and a columnkey. In the HBase data table, triples in the same row share the same rowkey, and triples in the same column share the same columnkey.
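The triple layout in Fig 4 can be illustrated with plain Python data structures standing in for HBase; no HBase client API is used. The helper names and the `building|timestamp` rowkey format in the usage are hypothetical. In HBase itself a column is identified by a `family:qualifier` pair, which this sketch ignores.

```python
from collections import defaultdict

def to_triples(table, row_ids):
    """Flatten a column-oriented table into (rowkey, columnkey, value) triples.

    table: {column_name: [values, ...]}; row_ids: one rowkey per row.
    """
    triples = []
    for column, values in table.items():
        for rowkey, value in zip(row_ids, values):
            triples.append((rowkey, column, value))
    return triples

def from_triples(triples):
    """Rebuild {rowkey: {columnkey: value}} by grouping triples that share a rowkey."""
    rows = defaultdict(dict)
    for rowkey, columnkey, value in triples:
        rows[rowkey][columnkey] = value
    return dict(rows)
```

Because every triple carries its own keys, triples can be written in any order and still reassemble into the same table.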
Fig 5.
Slurm jobs workflow and life cycle.
(a) job workflow and interactions with the scheduler and storage, and (b) job life cycle from submission to completion.
Fig 6.
Slurm job controller flowchart.
The flowchart shows how jobs are distributed and managed, and which action is taken based on the execution result of each job.
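The caption does not spell out the actions in the flowchart. As a hedged illustration, the sketch below implements one plausible controller policy: run every job, resubmit failed jobs up to a retry limit, and report the rest as failed. `control_jobs`, `run_job`, and `max_retries` are hypothetical; `run_job` stands in for submitting a job to Slurm and waiting for its exit status.

```python
from collections import deque

def control_jobs(job_ids, run_job, max_retries=2):
    """Run every job, resubmitting each failure up to max_retries times.

    run_job(job_id) -> bool is a hypothetical stand-in for a Slurm submission
    that returns True when the job completes successfully.
    Returns (completed, failed) lists of job IDs.
    """
    attempts = {job: 0 for job in job_ids}
    queue = deque(job_ids)
    completed, failed = [], []
    while queue:
        job = queue.popleft()
        attempts[job] += 1
        if run_job(job):
            completed.append(job)
        elif attempts[job] <= max_retries:
            queue.append(job)   # resubmit at the back of the queue
        else:
            failed.append(job)  # retry budget exhausted
    return completed, failed
```

Re-queuing failures at the back lets the remaining jobs proceed instead of blocking on a job that may fail repeatedly.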
Fig 7.
Jobs are distributed by the Slurm scheduler. In each job, data are fetched from HBase and converted to a dataframe; after analysis, the results are converted to HBase triples and registered in the results data table.
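The per-job steps in Fig 7 can be sketched as follows. `SLURM_ARRAY_TASK_ID` is the environment variable Slurm sets for each task of a job array; everything else is hypothetical: `fetch` and `store` stand in for HBase reads and writes, a list of pairs replaces the dataframe, and the mean and peak are placeholder analyses.

```python
import os
from statistics import mean

def run_building_job(buildings, fetch, store, task_id=None):
    """Analyze one building per Slurm array task and store the results as triples.

    fetch(building) -> [(timestamp, kWh), ...] and store(triples) are hypothetical
    stand-ins for reading from and writing to HBase.
    """
    if task_id is None:
        # Slurm sets SLURM_ARRAY_TASK_ID for each task of a job array.
        task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    building = buildings[task_id]
    records = fetch(building)
    values = [kwh for _, kwh in records]
    # Placeholder analysis standing in for the actual per-building analyses.
    results = {"mean_kwh": mean(values), "peak_kwh": max(values)}
    triples = [(building, column, value) for column, value in results.items()]
    store(triples)
    return triples
```

In one possible setup, a hypothetical batch script `job.sh` that calls this function could be submitted with `sbatch --array=0-813 job.sh`, launching one array task per building for the 814 buildings.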
Fig 8.
(a) comparison of single-core parallel and sequential job processing times, (b) distribution of individual building analysis times for the 814 buildings in our population.
Fig 9.
Comparison of job processing times by number of cores: (a) processing times of single-core and multi-core parallel jobs, and (b) performance of individual jobs as a function of the number of cores allocated within each job.
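Processing-time comparisons like this one are often summarized as speedup and parallel efficiency. The standard definitions below are given for reference only; $T_1$ (processing time on one core) and $T_p$ (processing time on $p$ cores) are generic symbols, not quantities reported in the paper.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
```

For example, a job that takes 60 s on one core and 20 s on four cores has $S_4 = 3$ and $E_4 = 0.75$.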
Fig 10.
Anomalies in energy consumption data.
Points represent the anomalies and the line represents the energy consumption in kWh.
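The caption does not say how the anomalies were identified. As one common, swapped-in technique (not necessarily the paper's), the sketch flags points whose modified z-score exceeds 3.5, using the median and the median absolute deviation (MAD) so that the anomalies themselves barely shift the baseline. `mad_anomalies` and its handling of a zero MAD are illustrative choices.

```python
from statistics import median

def mad_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Illustrative choice: in an almost flat series, flag anything off the median.
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]
```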
Fig 11.
Data quality of the building population.
(a) Breakdown of data quality before and after cleaning, (b) change in data quality status after cleaning. Blue represents data whose quality did not change, red represents data that failed the quality criteria after cleaning, and green represents data that passed after cleaning.
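Table 3 contains the actual grading criteria, which are not reproduced here. The sketch below only illustrates the before/after comparison behind panel (b), using a hypothetical pass/fail grade based on completeness and anomaly rate; the thresholds `min_completeness=0.9` and `max_anomaly=0.05` are invented for illustration.

```python
def grade(n_valid, n_expected, n_anomalous, min_completeness=0.9, max_anomaly=0.05):
    """Hypothetical pass/fail grade; the paper's real criteria are listed in Table 3."""
    completeness = n_valid / n_expected
    anomaly_rate = n_anomalous / max(n_valid, 1)
    return completeness >= min_completeness and anomaly_rate <= max_anomaly

def quality_change(before_ok, after_ok):
    """Map a before/after grade pair to the three color categories of panel (b)."""
    if before_ok == after_ok:
        return "unchanged"  # blue
    return "passed after cleaning" if after_ok else "failed after cleaning"  # green / red
```

Removing anomalies can move a dataset either way under such a rule: the anomaly rate drops, but so does completeness.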
Fig 12.
Clustering of one month of data.
(a) Hierarchical clustering, with the red cluster representing the abnormal days; (b) Energy consumption over one month, showing the detected abnormal days.
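The caption does not state the distance metric, linkage, or how the abnormal cluster was chosen. The sketch below assumes Euclidean distance between daily load profiles, average linkage, and a two-cluster cut in which the smaller cluster is flagged as abnormal; all three choices and the function names are hypothetical.

```python
from itertools import combinations
from math import dist

def average_linkage(a, b, profiles):
    """Mean pairwise Euclidean distance between two clusters of day indices."""
    return sum(dist(profiles[i], profiles[j]) for i in a for j in b) / (len(a) * len(b))

def cluster_days(profiles, k=2):
    """Agglomerative (average-linkage) clustering of daily load profiles into k clusters."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], profiles))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

def abnormal_days(profiles):
    """Hypothetical rule: the smaller of the two clusters holds the abnormal days."""
    return sorted(min(cluster_days(profiles, k=2), key=len))
```

This naive search grows roughly cubically with the number of days, which is acceptable for a single month; for larger inputs, SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` is the practical choice.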
Fig 13.
Density of HVAC scheduling times for 10 building use types.
Red and green represent turn-on and turn-off times, respectively.
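The caption does not describe how turn-on and turn-off times were extracted. A simple hypothetical approach, sketched below, places a threshold halfway between the base and the peak of a day's hourly load profile and reports the first hour above it (turn-on) and the next hour back at or below it (turn-off). `hvac_on_off` and the `fraction=0.5` threshold are assumptions.

```python
def hvac_on_off(profile, fraction=0.5):
    """Estimate (turn_on_hour, turn_off_hour) from an hourly load profile.

    The threshold sits `fraction` of the way from base load to peak load.
    Returns (None, None) for a flat profile; turn_off is None if the load
    never drops back below the threshold before the end of the day.
    """
    base, peak = min(profile), max(profile)
    threshold = base + fraction * (peak - base)
    on = next((h for h, kw in enumerate(profile) if kw > threshold), None)
    if on is None:
        return None, None  # flat profile: no switching detected
    off = next((h for h in range(on, len(profile)) if profile[h] <= threshold), None)
    return on, off
```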
Table 6.
HVAC system turn-on and turn-off times by building use type for cooling degree days.
Fig 14.
Operating time by (a) building use type (10 types) and (b) KG climate zone (four zones). The three red bars represent the first, second (median), and third quartiles. Dots represent the operating hours of individual buildings, and black lines represent the density of the distribution in each category.
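Given per-building turn-on and turn-off hours, the quartiles drawn as red bars can be computed per category as sketched below. Treating operating time as turn-off hour minus turn-on hour is an assumption, and schedules that cross midnight would need extra handling. `operating_hour_quartiles` is a hypothetical name.

```python
from collections import defaultdict
from statistics import quantiles

def operating_hour_quartiles(buildings):
    """Return {category: (Q1, median, Q3)} of operating hours per category.

    buildings: iterable of (category, turn_on_hour, turn_off_hour) tuples,
    where category is a building use type or a KG climate zone.
    """
    hours = defaultdict(list)
    for category, on, off in buildings:
        hours[category].append(off - on)  # assumed: same-day schedules only
    return {category: tuple(quantiles(values, n=4, method="inclusive"))
            for category, values in hours.items() if len(values) >= 2}
```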