Abstract
Real estate markets are inherently dynamic, influenced by economic fluctuations, policy changes and socio-demographic shifts, often leading to the emergence of anomalous regions, where market behavior significantly deviates from expected trends. Traditional forecasting models struggle to handle such anomalies, resulting in higher errors and reduced prediction stability. To address this challenge, we propose EGCN, a novel cluster-specific forecasting framework that first detects and clusters anomalous regions separately from normal regions, and then applies forecasting models. This structured approach enables predictive models to treat normal and anomalous regions independently, leading to enhanced market insights and improved forecasting accuracy. Our evaluations on the UK, USA, and Australian real estate market datasets demonstrate that EGCN achieves the lowest error among both anomaly-free (baseline) methods and alternative anomaly detection methods, across all forecasting horizons (12, 24, and 48 months). In terms of anomalous region detection, our EGCN identifies 182 anomalous regions in Australia, 117 in the UK and 34 in the US, significantly more than the other competing methods, indicating superior sensitivity to market deviations. By clustering anomalies separately, forecasting errors are reduced across all tested forecasting models. For instance, when applying Neural Hierarchical Interpolation for Time Series Forecasting, EGCN improves accuracy across forecasting horizons. In short-term forecasts (12 months), it reduces MSE from 1.3 to 1.0 in the US, 9.7 to 6.4 in the UK and 2.0 to 1.7 in Australia. For mid-term forecasts (24 months), EGCN achieves the lowest errors, lowering MSE from 3.1 to 2.3 (US), 14.2 to 9.0 (UK), and 4.5 to 4.0 (Australia).
Even in long-term forecasts (48 months), where error accumulation is common, EGCN remains stable, decreasing MASE from 6.9 to 5.3 (US), 12.2 to 8.5 (UK), and 16.0 to 15.2 (Australia), highlighting its robustness over extended periods. These results highlight how separately clustering anomalies allows forecasting models to better capture distinct market behaviors, ensuring more precise and risk-adjusted predictions.
Citation: Le D, Rajasegarar S, Luo W, Nguyen TT, Vo N, Nguyen Q, et al. (2025) EGCN: Entropy-based graph convolutional network for anomalous pattern detection and forecasting in real estate markets. PLoS One 20(10): e0334141. https://doi.org/10.1371/journal.pone.0334141
Editor: Nikolaos Askitas, IZA - Institute of Labor Economics, GERMANY
Received: April 29, 2025; Accepted: September 23, 2025; Published: October 16, 2025
Copyright: © 2025 Le et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data underlying the results presented in the study were obtained from publicly available sources. Australian real estate data were obtained from Australian Property Monitors via AURIN (https://data.aurin.org.au/). Real estate data for the United States were sourced from Zillow (https://www.zillow.com/), and UK property transaction data were sourced from HM Land Registry (https://www.gov.uk/government/organisations/land-registry). Sentiment analysis data were derived from publicly available news headlines using a pre-trained RoBERTa model. All relevant processed data supporting the findings are within the manuscript. https://figshare.com/articles/dataset/EGCN_Entropy-based_Graph_Convolutional_Network_for_Anomalous_Pattern_Detection_and_Forecasting/29931260.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The real estate market remains a cornerstone of the global economy, driving financial stability, urban development and individual wealth [1–4]. As one of the most influential sectors, it plays a pivotal role in shaping macroeconomic conditions, urban planning strategies, and the financial well-being of households and businesses alike [4]. Real estate investments are central to wealth accumulation for individuals and institutional portfolios, while property development contributes to urbanization and the creation of infrastructure essential for economic growth. Accurate forecasting in this domain is critical for a wide range of stakeholders, including investors seeking to maximize returns, developers planning large-scale projects, and policymakers aiming to balance economic growth with risk mitigation. Reliable predictions of real estate market trends enable these groups to navigate uncertainties, identify emerging opportunities, and implement proactive strategies to address market risks.
Despite its importance, the inherent complexity of real estate markets poses significant challenges to reliable forecasting. One of the primary barriers is the presence of anomalies—unexpected and irregular changes in market behavior that disrupt typical patterns [5–9]. Anomalies can arise from a multitude of factors [10], including sudden economic shocks, policy interventions, or unforeseen socio-political developments such as international conflicts, pandemics, or natural disasters. These disruptions introduce volatility and unpredictability, rendering traditional forecasting models ineffective. For instance, a sudden shift in monetary policy, such as changes in interest rates or taxation, can lead to abrupt fluctuations in property demand in specific regions, causing erratic pricing trends and transaction volumes. Failure to account for these anomalies often results in inaccurate forecasts, undermining the ability of stakeholders to make informed decisions.
Given the multifaceted nature of real estate market disruptions, robust anomaly detection methods are useful to improve forecasting reliability. Anomalies, whether temporal or geospatial, are not merely statistical outliers but critical indicators of emerging opportunities and risks. Temporal anomalies, such as sudden spikes or drops in property prices or transaction volumes, may highlight underlying shifts in market dynamics, signaling early signs of investment hotspots or the formation of speculative bubbles. Similarly, geospatial anomalies, such as unexpected price increases in certain suburbs, can indicate localized attractiveness due to new infrastructure developments, policy changes, or other region-specific factors [11–13]. These anomalies provide actionable insights that can guide major investors and developers, such as Mitsubishi Estate Co., Ltd. [14,15] in strategically navigating market complexities, identifying lucrative opportunities, and mitigating potential risks in new real estate investments [16]. However, traditional forecasting approaches often misinterpret or discard anomalies as noise, prioritizing the smoothing of disruptions over the extraction of valuable insights. This approach undermines the predictive accuracy of these models and limits their ability to inform effective strategies [17].
Another significant challenge lies in the lack of integration between geospatial data and trading volume in most existing forecasting methodologies for real estate [18,19]. Real estate markets are inherently spatial, with location-specific factors such as infrastructure, demographics, and proximity to amenities influencing property values and transaction patterns. At the same time, trading volume provides critical insights into how public decisions shape market trends, reflecting the collective psychology of buyers, sellers, and investors. The failure to combine these dimensions prevents a holistic understanding of market dynamics, leaving significant gaps in the ability to predict and respond to anomalies.
Addressing these challenges requires a paradigm shift in anomaly detection and forecasting methodologies. This paper proposes a novel graph-based neural network framework, called EGCN, that integrates entropy-based multivariate feature assimilation, a deep graph convolutional network (GCN) based autoencoder to identify and cluster normal and anomalous regions/suburbs and a cluster-specific deep neural time-series forecasting model to enhance predictive accuracy in the real estate market (Fig 1). The framework uses a GCN-based autoencoder in which the encoder component propagates and aggregates spatiotemporal features, capturing both local and global patterns in the data. The decoder component reconstructs the original features from the learned representations, enabling the identification of anomalies as deviations between reconstructed and actual features [20]. Temporal entropy, calculated using Kernel Density Estimation [21], highlights variability in property prices and transaction volumes, forming the basis for anomaly detection. A graph is constructed where nodes represent suburbs, and edges encode geospatial proximity [22,23] and temporal correlations between the features. The GCN learns meaningful embeddings of the graph that support grouping of suburbs (nodes of the graph) that show similar patterns, i.e., grouping into two distinct clusters: normal and anomalous clusters. These clusters enable the use of distinct tailored deep neural time series forecasting models fitted for each cluster to improve predictive accuracy, compared to a method where all the time series measurements are combined and used as a whole (without clustering) for forecasting. Our evaluation reveals that the cluster-specific forecasting model fitting improves forecasting accuracy. Moreover, by integrating real estate market and geospatial features, this framework addresses key limitations in existing methods, offering actionable insights for investors, developers, and policymakers.
It begins with multi-dimensional time-series data for m entities, each with features such as price (p), volume (v) and geospatial data (s), sampled at n time points $(t_1, \ldots, t_n)$. Entropy values are computed for each entity i and time window j using Kernel Density Estimation, capturing temporal variability across features. These entropy values form a matrix representing temporal dynamics. A graph-based autoencoder $G = (N, E)$ is constructed to detect anomalous nodes, where nodes $N_i$ represent entities with features $F_i$, and edges $E_{ij}$ connect nodes based on spatial relationships with a threshold τ. Edges also encode temporal dependencies, such as shared feature correlations or entropy dynamics. Clustering is applied to group the nodes into two clusters, grouping entities with similar spatiotemporal behaviors. Each cluster is then independently trained using neural network models, such as Multi-Layer Perceptrons, to predict future trends. This framework effectively combines entropy-based anomaly detection, graph modeling, and cluster-specific forecasting for robust spatiotemporal analysis.
The key contributions of this study are threefold:
- Entropy-Based Graph Convolutional Network for Anomaly Detection: Introducing a novel entropy-based graph convolutional network method that leverages an autoencoder architecture to detect anomalies in graph-structured real estate multi-dimensional time-series data. The method integrates geospatial proximity and temporal trends into its graph representation to identify spatial (suburb-level) and temporal (time-point) resolution anomalies. This approach also provides actionable insights for industrial stakeholders by analyzing features such as property prices, transaction volumes, and geographical locations, enabling the detection of emerging investment opportunities and potential market risks.
- Clustering suburbs into normal and anomalous groups that exhibit shared behaviour: Proposing an approach that groups suburbs or entities into two clusters, normal and anomalous, based on their entropy patterns and graph-based relationships. This process reveals interconnected market behaviors and facilitates a deeper understanding of how localized anomalies propagate through the broader spatial network, offering a valuable tool for regional market analysis.
- Entropy-based Graph Convolutional Network for Clustering and Forecasting: Developing advanced cluster-specific forecasting models tailored to predict future trends for each cluster. By focusing on the unique characteristics of each group, these models improve predictive accuracy and relevance compared to traditional global approaches. The EGCN framework demonstrates its ability to outperform baseline methods by accounting for the shared dynamics of clustered nodes, providing more precise and actionable forecasting outcomes for stakeholders.
By addressing the challenges faced by real estate industries, such as forecasting inaccuracies caused by anomalies and underutilized geospatial data, this research offers actionable insights for improving investment strategies. Additionally, the framework’s scalability makes it adaptable to other regions and global markets, further underscoring its potential impact. In the following sections, the methodology, experimental results, and practical implications of this approach are explored in detail, demonstrating its efficacy in advancing anomaly detection and forecasting for the real estate industry.
Related work
The real estate market has long been a focus of research due to its significant impact on global economies and urban development. Accurate forecasting in this domain requires integrating methods from time-series analysis, anomaly detection and machine learning. This section reviews existing approaches and highlights the gaps addressed by the proposed framework.
Anomaly detection
Anomaly detection has emerged as a crucial task in time-series analysis, especially for identifying disruptions that may signal risks or opportunities in the real estate market. Traditional anomaly detection techniques, such as Z-score analysis and Isolation Forest [24], focus on statistical deviations [25]. Recent advancements have introduced machine learning-based approaches, including Autoencoders [26] and Variational Autoencoders [27], to model normal patterns and identify deviations. However, these methods are limited in their ability to combine temporal and spatial factors [28,29]. Existing methods often misinterpret such anomalies as noise, missing critical insights into localized market dynamics.
Graph-based models for spatiotemporal data
Graph-based models, such as Graph Neural Networks (GNNs) [30] and GCNs [31], have shown significant promise in capturing spatiotemporal dependencies in various domains, including transportation, climate analysis, and social networks [11,12]. By representing entities as nodes and relationships as edges, graph-based models can effectively model complex interactions between spatially distributed features. However, the application of graph-based models in real estate markets remains limited, with most studies focusing on either spatial or temporal factors in isolation. Existing models rarely integrate entropy-based anomaly detection into graph representations, leaving significant gaps in addressing the multidimensional nature of real estate markets [32].
Clustering and cluster-specific forecasting
Clustering techniques, such as K-Means [33], Dynamic Time Warping (DTW) [34], and Weighted Dynamic Time Warping (WDTW) [35], have been widely used to group entities with similar behaviors, enabling localized insights and targeted analysis. In real estate, clustering can reveal regional trends, such as neighborhoods experiencing rapid growth or decline [7,36–38]. However, most clustering approaches fail to incorporate anomaly-driven patterns, which are critical for understanding market disruptions. Furthermore, traditional forecasting methods often treat clusters as static entities, ignoring the dynamic interactions within and across clusters. Advanced forecasting methods, such as recurrent neural networks (RNNs) [39] and transformers-based models [40], offer potential for improving cluster-specific predictions but are rarely tailored to account for shared anomaly-driven features within clusters.
Multivariate time series forecasting
Recent advancements in multivariate time-series forecasting have demonstrated the effectiveness of MLP-based models in enhancing prediction accuracy and reliability within financial markets. Long short-term memory (LSTM) models, known for their capability to capture long-term dependencies in sequential data, have been applied successfully in forecasting stock market trends [41] and gold prices [42]. The neural hierarchical interpolation for time series (N-HiTS) model has further shown its utility in efficiently identifying trends and patterns in multivariate financial datasets, particularly in forecasting financial indices [43–46]. Similarly, time series mixer (TSMixer), a model tailored to address the inherent volatility of financial markets, has achieved strong performance in predicting stock prices and exchange rates, highlighting its role in financial time-series analysis [47,48]. Meanwhile, Transformers [49], initially designed for natural language processing, have emerged as powerful tools for time-series forecasting, excelling at capturing complex temporal dependencies and long-range patterns in financial data.
Methodology
This section presents the proposed framework, EGCN, for anomaly detection and forecasting in the real estate market. The framework integrates temporal entropy analysis, graph-based encoder modeling, anomaly detection, and forecasting to provide actionable insights. It consists of four main stages: temporal entropy computation, graph construction, anomaly clustering, and cluster-specific forecasting. The framework leverages multi-dimensional data, including property prices and transaction volumes, to identify anomalies and predict market trends.
Temporal entropy-based analysis
Let X be the multivariate time-series data for n entities, where $x_t \in \mathbb{R}^d$ is a d-dimensional feature vector at time t. The first step involves applying a sliding window of size W to X to extract overlapping segments of the time series:

$X_w = (x_t, x_{t+1}, \ldots, x_{t+W-1}),$

where $X_w$ represents a windowed segment of the data. Each segment $X_w$ is then used to compute the entropy of that segment to quantify variability.
Kernel density estimation computation.
For each window $X_w$, we estimate its probability distribution using Kernel Density Estimation (KDE) [50]:

$\hat{f}(x) = \frac{1}{Nh} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right),$

where K is the kernel function (e.g., Gaussian), h is the bandwidth, and N is the number of points in the window. The KDE-estimated distribution then serves as the input for the next step, where temporal divergence between distributions is measured.
Kernel Density Estimation (KDE) is highly appropriate for the proposed framework due to its ability to estimate the underlying probability density functions of complex and noisy data without assuming any specific parametric form [51,52]. This non-parametric approach is crucial for real estate forecasting and anomaly detection, where the data distributions of features such as property prices and transaction volumes are often irregular and multimodal. By using KDE, the framework captures the nuanced variations in these distributions across temporal windows, enabling the identification of subtle shifts or anomalies that traditional methods may overlook. Additionally, KDE’s flexibility allows it to handle multivariate data effectively, making it ideal for integrating spatial and temporal dimensions inherent in real estate markets. This capability ensures that KDE not only enhances the accuracy of anomaly detection but also provides a robust foundation for predicting future trends based on probabilistic patterns, aligning with the framework’s goals for dynamic market analysis.
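As a concrete illustration of this step, the sketch below fits a Gaussian KDE to one sliding window of a synthetic price series using `scipy.stats.gaussian_kde`. The series, window size, and evaluation grid are illustrative assumptions, not values from this study.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical monthly price series for one suburb (values illustrative).
rng = np.random.default_rng(0)
prices = rng.normal(loc=500.0, scale=20.0, size=120)  # 120 months

W = 12  # sliding-window size (one year), an assumed setting

def window_density(series, start, width):
    """Fit a Gaussian KDE to one sliding window of the series."""
    window = series[start:start + width]
    return gaussian_kde(window)  # bandwidth chosen by Scott's rule by default

kde = window_density(prices, 0, W)
grid = np.linspace(prices.min(), prices.max(), 200)
density = kde(grid)

# The density is non-negative and integrates to roughly 1 over the support.
mass = density.sum() * (grid[1] - grid[0])
```

In the full pipeline, one such density would be estimated per window and per feature, then compared across consecutive windows.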
Jensen-Shannon Divergence (JSD).
To capture changes between successive sliding windows, the Jensen-Shannon Divergence (JSD) [53] is computed:

$\mathrm{JSD}(P \,\|\, Q) = \frac{1}{2} D_{\mathrm{KL}}(P \,\|\, M) + \frac{1}{2} D_{\mathrm{KL}}(Q \,\|\, M), \quad M = \frac{1}{2}(P + Q),$

where $D_{\mathrm{KL}}$ is the Kullback-Leibler divergence and P and Q are the KDE-estimated distributions of consecutive windows. The sequence of JSD values identifies significant temporal anomalies, forming the basis for graph construction.
The Jensen-Shannon Divergence (JSD), a symmetric and smoothed extension of Kullback-Leibler Divergence (KLD), is highly suitable for the proposed framework due to its robust ability to quantify the similarity between probability distributions by integrating KLD and Shannon entropy [53,54]. Unlike KLD, JSD is symmetric, ensuring that the order of distributions does not affect the divergence value, which is essential for clustering and anomaly detection tasks requiring consistent similarity measurements. Moreover, the incorporation of Shannon entropy introduces a smoothing effect, reducing sensitivity to noise and small variations, making it particularly effective for analyzing real-world multivariate time-series data. Within our framework, JSD facilitates the comparison of probability distributions over temporal windows, enabling the detection of significant deviations in features such as prices and volumes while remaining robust to outliers. Its capability to capture both temporal and distributional differences makes JSD a powerful metric for anomaly detection and clustering, aligning seamlessly with the objectives of the proposed method.
To quantify temporal variability, we adopt KDE combined with JSD rather than alternative measures such as Shannon or Rényi entropy. KDE is non-parametric and well-suited for real estate data, which often exhibit multimodal and heavy-tailed distributions due to abrupt price shifts and irregular trading volumes, while JSD provides a symmetric and bounded divergence measure that incorporates Shannon entropy to smooth fluctuations and reduce sensitivity to noise. In contrast, Shannon entropy is highly dependent on discretization and may underperform when distributions are multimodal, and Rényi entropy requires parametric tuning that can lead to instability across heterogeneous datasets [55]. By applying JSD to KDE-estimated distributions across consecutive time windows, our framework captures both distributional and temporal changes, enabling robust detection of subtle market disruptions. The entropy and divergence values form the foundation for constructing a spatiotemporal graph representation.
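A minimal sketch of the windowed KDE-plus-JSD computation described above, assuming a synthetic price series with one level shift (e.g., a policy shock); the series, window size, and grid are illustrative. Note that `scipy.spatial.distance.jensenshannon` returns the square root of the JS divergence, so it is squared here.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(1)
prices = np.concatenate([
    rng.normal(500, 10, 60),   # stable regime
    rng.normal(560, 10, 60),   # hypothetical level shift
])

W = 24
grid = np.linspace(prices.min() - 30, prices.max() + 30, 256)

def window_pmf(series, start, width):
    """Discretize a window's KDE onto a shared grid and normalize to a pmf."""
    density = gaussian_kde(series[start:start + width])(grid)
    return density / density.sum()

# JSD between consecutive (non-overlapping here, for brevity) windows.
starts = range(0, len(prices) - W, W)
pmfs = [window_pmf(prices, s, W) for s in starts]
jsd = [jensenshannon(p, q) ** 2 for p, q in zip(pmfs, pmfs[1:])]
```

The divergence spikes at the window pair straddling the regime change, which is exactly the signal the framework uses as a node feature for anomaly detection.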
Graph construction
The entropy and JSD values are used to construct a graph $G = (N, E)$, where nodes represent entities and edges capture relationships based on spatial and temporal features [30].
Node features.
Each node $N_i$ is associated with a feature vector derived from JSD and spatial coordinates: $F_i = [\mathrm{JSD}_i, \mathrm{lat}_i, \mathrm{lon}_i]$.
These node features encode both temporal variability and spatial characteristics.
Edge construction.
Edges $E_{ij}$ between nodes $N_i$ and $N_j$ are defined based on spatial proximity and temporal similarity:
- Spatial Proximity: An edge is created if the geospatial distance between nodes, computed with the Haversine formula, falls below a threshold:

$d_{ij} = 2R \arcsin\!\left(\sqrt{\sin^2\!\left(\frac{\phi_j - \phi_i}{2}\right) + \cos\phi_i \cos\phi_j \sin^2\!\left(\frac{\lambda_j - \lambda_i}{2}\right)}\right), \quad (5)$

where ϕ and λ are latitudes and longitudes, and R is Earth’s radius.
- Temporal Similarity: Edge weights are proportional to the correlation $\rho_{ij}$ of entropy values between nodes over time. The final edge weight combines both terms:

$w_{ij} = \alpha\, s_{ij} + (1 - \alpha)\, \rho_{ij}, \quad (6)$

where α is a balancing coefficient, $s_{ij}$ is the spatial proximity term, and $\rho_{ij}$ is the temporal correlation. This design ensures that both geographic closeness and similarity in temporal variability jointly determine the strength of connections in the graph. In our experiments, α was set to 0.5 to give equal importance to spatial and temporal features, but this parameter can be tuned to emphasize one dimension if desired. By combining spatial and temporal similarity into a single edge weight, the constructed graph better reflects the multidimensional relationships that drive anomalies in real estate markets.
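The edge-construction rule can be sketched as follows. The Haversine formula is standard; the threshold `tau_km`, the linear proximity term, and the use of Pearson correlation for temporal similarity are illustrative assumptions consistent with the convex combination described above.

```python
import numpy as np

R_EARTH_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlmb = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlmb / 2) ** 2
    return 2 * R_EARTH_KM * np.arcsin(np.sqrt(a))

def edge_weight(dist_km, entropy_i, entropy_j, tau_km=30.0, alpha=0.5):
    """Convex combination of spatial proximity and temporal similarity.
    tau_km and the linear proximity form are illustrative assumptions."""
    if dist_km > tau_km:
        return 0.0  # no edge beyond the spatial threshold
    spatial = 1.0 - dist_km / tau_km
    # Pearson correlation of the two entropy series as temporal similarity.
    temporal = float(np.corrcoef(entropy_i, entropy_j)[0, 1])
    return alpha * spatial + (1 - alpha) * temporal

# Two nearby suburbs (coordinates roughly in Melbourne; illustrative only).
d = haversine_km(-37.81, 144.96, -37.85, 145.00)
e1 = np.array([0.2, 0.3, 0.5, 0.4])
e2 = np.array([0.25, 0.35, 0.45, 0.5])
w = edge_weight(d, e1, e2)
```

With α = 0.5, nearby suburbs whose entropy series co-move receive strong edges, matching the intent of Eq (6).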
The resulting graph structure G serves as input for the next step. Once the graph is defined, we employ a GCN to learn embeddings that capture these spatiotemporal relationships.
Graph convolutional network
Entropy values obtained from previous steps are utilized in a GCN-based autoencoder to construct the graph structure, facilitating the detection of spatiotemporal anomalies. The GCN in this framework employs an encoder-decoder architecture tailored to the real estate market case study. The encoder propagates and aggregates entropy values from neighboring suburbs, learning low-dimensional node embeddings that capture both local market behaviors and global patterns within the graph. The decoder reconstructs the original entropy values, ensuring that the learned embeddings preserve essential spatiotemporal dynamics. Anomalies, such as irregular trends in property prices and transaction volumes, are identified by comparing the reconstructed entropy values with the original data, with significant deviations highlighting potential spatiotemporal anomalies. The process follows these key steps:
- Feature Propagation and Aggregation: Node features are propagated and aggregated using edge connections, allowing nodes to integrate information from their neighbors. Each node Ni updates its representation by combining its own features with those of its neighbors, weighted by the edges Eij.
- Layer-wise Node Embedding Updates: At each layer l, the embedding of node i is updated as:

$h_i^{(l+1)} = \sigma\!\left( \sum_{j \in \mathcal{N}(i)} \frac{1}{\sqrt{d_i d_j}}\, W^{(l)} h_j^{(l)} \right), \quad (7)$

where $h_i^{(l)}$ is the embedding of node i at layer l; $\mathcal{N}(i)$ is the set of neighbors of node i; $d_i$ is the degree of node i; $W^{(l)}$ is the trainable weight matrix for layer l; and σ is a non-linear activation function (e.g., ReLU).
- Final Embeddings: After propagating through multiple layers, the final embeddings capture both local patterns (relationships between neighboring nodes) and global patterns (graph-wide trends).
These embeddings allow us to reconstruct node features and evaluate deviations, which provides the basis for anomaly detection.
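The layer-wise update can be sketched in NumPy as symmetrically normalized propagation. Adding self-loops is a common implementation choice and an assumption here, as are the toy graph and layer sizes.

```python
import numpy as np

def gcn_layer(A, H, W, activation=np.tanh):
    """One GCN layer: H' = act(D^{-1/2} (A + I) D^{-1/2} H W).
    A: adjacency (n x n), H: node features (n x f), W: weights (f x f')."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric degree normalization
    return activation(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy graph of 4 suburbs with 3-dim entropy-derived node features (illustrative).
rng = np.random.default_rng(2)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, 3))
W1 = rng.normal(size=(3, 8))
W2 = rng.normal(size=(8, 2))

Z = gcn_layer(A, gcn_layer(A, H, W1), W2)  # two layers -> 2-dim embeddings
```

Stacking two layers lets each node's embedding absorb information from its two-hop neighborhood, which is what allows both local and graph-wide patterns to be captured.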
Identification of anomalous nodes
Anomaly detection begins by calculating the reconstruction error for each suburb, comparing its original entropy values, derived from w sliding windows of the dataset, with the reconstructed values produced by the trained GCN. The reconstruction errors across all features of a node are averaged to compute an aggregate error, excluding spatial attributes like latitude and longitude. A threshold is then determined based on the distribution of aggregate errors across all nodes. Nodes with aggregate errors exceeding this threshold are identified as anomalous, highlighting significant deviations from normal patterns. The detailed process is shown below:
- Thresholding: Nodes whose aggregate reconstruction error $e_i$ exceeds the threshold θ are flagged as anomalous due to spatial isolation or irregularity:

$e_i > \theta, \quad (8)$

$\theta = \mu + k\sigma, \quad (9)$

where μ represents the mean of the aggregate reconstruction errors across all nodes, σ is the standard deviation of the aggregate reconstruction errors, and k is a threshold multiplier that controls the sensitivity of anomaly detection. Under a Gaussian assumption, $\mu + 1.5\sigma$ corresponds to a one-sided tail probability of approximately 6.7%, matching our operational aim to highlight only the most atypical regions [56–58]. We favor this fixed, a priori choice to prevent test-set–driven threshold optimization. Future work may consider data-driven calibration (e.g., targeting a desired false-positive rate or using validation folds) where appropriate.
- Anomalous Grouping: Nodes are assigned to two clusters: flagged nodes (the anomalous cluster) and normal nodes (the rest):

$C_{\mathrm{anomalous}} = \{N_i : e_i > \theta\}, \quad C_{\mathrm{normal}} = \{N_i : e_i \le \theta\}. \quad (10)$
The graph is then segmented into clusters based on spatial relationships and feature similarities. Nodes flagged as anomalous are grouped into a distinct cluster, separating them from regular patterns, while the remaining nodes are organized into regular clusters that represent typical behaviors. This methodology is highly effective, practical, and directly relevant for decision-making due to the following key reasons:
- Integration of Spatiotemporal Features: Combines entropy, latitude, and longitude to analyze spatial and temporal patterns effectively.
- Scalable Framework: GCN enables efficient processing of large datasets.
- Business Relevance: Domain-specific knowledge ensures the approach is aligned with real-world applications.
After identifying anomalous nodes, we separate the data into clusters, which then serve as distinct inputs for cluster-specific forecasting models. In addition, it is important to note that anomalies in our framework are identified at the location level (e.g., suburb or city) rather than at specific location–time pairs. The anomaly score for each node is computed from the aggregate reconstruction error of its entropy series across the full observation window. While this means the anomaly flag is assigned to the node as a whole, temporal dynamics are not ignored: windowed entropy combined with Jensen–Shannon Divergence ensures that short-term fluctuations and recurrent deviations contribute to the reconstruction error. Thus, transient shocks influence anomaly scores indirectly, but the final anomaly designation reflects persistent or structural irregularities at the regional level.
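The μ + kσ thresholding rule can be sketched compactly on synthetic reconstruction errors; the error distribution and the five injected anomalies are illustrative assumptions.

```python
import numpy as np

def flag_anomalies(errors, k=1.5):
    """Flag nodes whose aggregate reconstruction error exceeds mu + k*sigma."""
    mu, sigma = errors.mean(), errors.std()
    threshold = mu + k * sigma
    return errors > threshold, threshold

# Illustrative aggregate reconstruction errors for 200 suburbs:
# most behave normally, a few deviate strongly.
rng = np.random.default_rng(3)
errors = rng.normal(0.1, 0.02, 200)
errors[:5] += 0.2  # inject five hypothetical anomalous regions

flags, thr = flag_anomalies(errors)
anomalous = np.where(flags)[0]   # indices of the anomalous cluster
normal = np.where(~flags)[0]     # indices of the normal cluster
```

With k = 1.5, only the strongly deviating nodes cross the threshold, mirroring the paper's aim of highlighting the most atypical regions.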
Cluster-specific forecasting models
Following training, each node’s anomaly score is defined as the norm of the reconstruction residual produced by the GCN autoencoder. We then perform a threshold-based binary partition using a robust z-score rule: nodes with scores exceeding the threshold are labeled abnormal, and all others normal. This yields the two groups reported in our results without invoking an additional clustering algorithm such as K-means, Spectral Clustering, or GMM. To check robustness, we evaluate the partition under a grid of hyperparameters (window size, hidden channels, learning rate, distance cutoff, and training epochs) and find that the identity of abnormal nodes is largely consistent across settings, indicating stability of the decision rule. Using the above GCN-based anomaly detection, the suburbs form two clusters. Each cluster $C_k$ is treated as an independent unit for forecasting:

$\hat{Y} = f(X, W),$

where f is the forecasting model, X is the historical data, and W is the time window.
Forecasting methods used in this study include N-HiTS, Transformer, and TSMixer. Input features include historical prices and transaction volumes.
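To illustrate the cluster-specific setup, the sketch below fits a separate, deliberately simple autoregressive model to each cluster's series. The AR form and the synthetic "normal" and "anomalous" series are stand-ins for the neural forecasters named above, not the paper's actual models.

```python
import numpy as np

def fit_ar(series, lags=3):
    """Least-squares AR(lags) fit; returns coefficients (with intercept)."""
    X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
    X = np.column_stack([np.ones(len(X)), X])
    y = series[lags:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def forecast_next(series, coef, lags=3):
    """One-step-ahead forecast from the last `lags` observations."""
    return float(coef[0] + coef[1:] @ series[-lags:])

# Hypothetical price indices: a smooth "normal" cluster trend and a more
# volatile "anomalous" cluster trend (values illustrative).
t = np.arange(60, dtype=float)
normal_series = 100 + 0.5 * t
anomalous_series = 100 + 0.5 * t + 10 * np.sin(t / 4)

# One model per cluster rather than a single global model.
series_by_cluster = {"normal": normal_series, "anomalous": anomalous_series}
models = {name: fit_ar(s) for name, s in series_by_cluster.items()}
preds = {name: forecast_next(s, models[name])
         for name, s in series_by_cluster.items()}
```

The point of the design is visible even in this toy: each model only has to capture its own cluster's dynamics, rather than a mixture of smooth and volatile behaviors.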
N-HiTS [43], an extension of N-BEATS [59], improves forecasting accuracy while reducing computational complexity [43,60,61]. However, N-HiTS encounters difficulties in maintaining high accuracy with intricate dynamic patterns in multivariate data. It employs a hierarchical approach for time series forecasting, using blocks composed of multi-layer perceptrons (MLPs) to predict coefficients [43]. To enhance long-term forecasting, MaxPool layers with kernel size $k_l$ are applied to focus on specific scale components. The input to block l is given by:

$x^{(l)} = \mathrm{MaxPool}\!\left( y_{t-L:t}^{(l)},\; k_l \right),$

where $y_{t-L:t}^{(l)}$ represents the input data from time steps t−L to t for block l, and $x^{(l)}$ is the output of MaxPool, extracting optimized scale components for improved forecasting. These coefficients are used to generate the backcast and the forecast, which are the respective outputs of the block.
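The pooling step can be sketched as non-overlapping 1-D max pooling; the input series and kernel size are illustrative.

```python
import numpy as np

def max_pool_1d(series, kernel):
    """Non-overlapping 1-D max pooling, extracting a coarse-scale view of
    the input window as in N-HiTS-style blocks."""
    n = len(series) // kernel * kernel   # drop any trailing remainder
    return series[:n].reshape(-1, kernel).max(axis=1)

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 4.5, 7.0, 6.0])
pooled = max_pool_1d(x, kernel=2)
```

Larger kernels yield coarser components, so different blocks can specialize on different time scales of the same window.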
Transformers [49] use a self-attention mechanism to model dependencies in sequences:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V,$

where Q, K, V are the query, key, and value matrices, and $d_k$ is the dimensionality of the key vectors.
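The attention formula can be sketched directly in NumPy; the matrix sizes are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, as in the Transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable row-wise softmax.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(4)
T, d_k, d_v = 6, 8, 4  # sequence length and dimensions (illustrative)
Q = rng.normal(size=(T, d_k))
K = rng.normal(size=(T, d_k))
V = rng.normal(size=(T, d_v))

out, attn = scaled_dot_product_attention(Q, K, V)
```

Each output row is a weighted mixture of all value rows, which is what lets the model relate distant time steps in a price series.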
TSMixer [47] is a novel neural network architecture designed to capture intricate patterns in time series data by stacking multiple multi-layer perceptrons (MLPs). The architecture consists of several components: an input layer, a mixer layer comprising multiple MLPs, an aggregation layer, and a final output layer. The forecasted value is computed as follows:

$\hat{y}_t = \sigma\!\left( W \cdot [\, m_1(x_t);\; m_2(x_t);\; \ldots;\; m_k(x_t) \,] \right),$

where $x_t$ represents the input data at time step t, $m_i(x_t)$ is the output of the i-th MLP, $[\, m_1; \ldots; m_k \,]$ denotes the concatenation of the outputs from all MLPs, W is a learnable weight matrix, and σ is the activation function.
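A toy sketch of this mix-and-aggregate structure with randomly initialized (untrained) weights; all layer sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def mlp(x, W_h, W_o):
    """A minimal two-layer perceptron with ReLU hidden units."""
    return np.maximum(x @ W_h, 0) @ W_o

d_in, d_hid, d_out, n_mlps = 4, 16, 3, 3  # illustrative sizes

# One weight pair per MLP in the mixer layer (random here; learned in practice).
mlps = [(rng.normal(size=(d_in, d_hid)), rng.normal(size=(d_hid, d_out)))
        for _ in range(n_mlps)]
W_agg = rng.normal(size=(n_mlps * d_out, 1))  # aggregation weights

x_t = rng.normal(size=(1, d_in))  # features at time step t
concat = np.concatenate([mlp(x_t, Wh, Wo) for Wh, Wo in mlps], axis=1)
y_hat = np.tanh(concat @ W_agg)   # forecasted value (scalar)
```

The aggregation layer reduces the concatenated MLP outputs to a single forecast, matching the formula above with tanh standing in for σ.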
Original contributions compared to existing methods
Most existing graph-based anomaly detection models in spatiotemporal domains primarily focus on identifying anomalous nodes or regions using raw features (e.g., prices or volumes) aggregated through graph structures. While effective for detection, these models do not explicitly incorporate temporal entropy dynamics or use anomalies to enhance downstream forecasting. In contrast, our proposed framework introduces several key innovations:
- Entropy-driven representation: Instead of relying only on raw time-series values, EGCN computes temporal entropy using Kernel Density Estimation (KDE) and Jensen–Shannon Divergence (JSD). This entropy-based formulation provides a distributional view of variability in prices and transaction volumes, making anomalies more robustly detectable than with simple statistical or embedding methods.
- Integration of geospatial and temporal similarity: EGCN explicitly constructs graph edges using a combination of Haversine distance (spatial proximity) and temporal correlation (similarity of entropy dynamics). This dual-edge design ensures that both location-based and temporal behavioral relationships are preserved in the graph, improving anomaly detection compared to baselines that consider only one dimension.
- Anomaly-aware clustering and forecasting: Whereas baseline GNN-based approaches terminate at anomaly detection, EGCN extends further by clustering anomalous and normal regions separately and applying cluster-specific forecasting models. This enables forecasting models to specialize for distinct market behaviors, reducing error across short, mid, and long-term horizons.
- Cross-country validation: Unlike prior GNN-based methods, which are often evaluated within a single market, EGCN is validated on datasets from three countries (Australia, UK, and USA). Additional cross-country experiments demonstrate its ability to generalize cluster-specific forecasting across distinct housing systems.
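The dual-edge construction — spatial proximity via Haversine distance combined with temporal correlation of entropy dynamics — can be sketched as follows (the 50 km and 0.7 thresholds and the toy data are illustrative assumptions, not the paper's parameters):

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def build_edges(coords, entropy_series, max_km=50.0, min_corr=0.7):
    """Connect regions i and j when they are spatially close AND their
    entropy dynamics are correlated."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = haversine_km(*coords[i], *coords[j])
            r = np.corrcoef(entropy_series[i], entropy_series[j])[0, 1]
            if d <= max_km and r >= min_corr:
                edges.append((i, j))
    return edges

# Toy example: two nearby Sydney suburbs plus Melbourne.
coords = [(-33.87, 151.21), (-33.92, 151.18), (-37.81, 144.96)]
series = [np.array([0.10, 0.20, 0.40, 0.30]),
          np.array([0.10, 0.25, 0.45, 0.30]),
          np.array([0.40, 0.10, 0.20, 0.60])]
edges = build_edges(coords, series)  # only the two Sydney suburbs are linked
```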
Together, these contributions establish EGCN as the first framework to jointly integrate entropy-based temporal variability, geospatial proximity, and cluster-specific forecasting within a unified graph convolutional architecture. This positions EGCN as a significant advancement beyond existing methods that focus solely on anomaly detection.
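The entropy-driven representation can be sketched with SciPy: Gaussian KDE estimates the price distribution within consecutive windows, and the Jensen–Shannon divergence between those distributions quantifies how much market behavior shifts over time (the window length, grid size, and synthetic data are illustrative assumptions, not the paper's settings):

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.spatial.distance import jensenshannon

def window_divergence(prices, window=12, grid_size=200):
    """JSD between KDE-estimated densities of consecutive windows,
    evaluated on a shared grid."""
    grid = np.linspace(prices.min(), prices.max(), grid_size)
    divergences = []
    for start in range(0, len(prices) - 2 * window + 1, window):
        w1 = prices[start : start + window]
        w2 = prices[start + window : start + 2 * window]
        p = gaussian_kde(w1)(grid)
        q = gaussian_kde(w2)(grid)
        p, q = p / p.sum(), q / q.sum()       # normalize to discrete distributions
        divergences.append(jensenshannon(p, q) ** 2)  # squared distance = JSD
    return np.array(divergences)

rng = np.random.default_rng(1)
stable = rng.normal(100.0, 2.0, size=48)              # stable market
shock = np.concatenate([rng.normal(100.0, 2.0, 24),
                        rng.normal(130.0, 2.0, 24)])  # abrupt regime shift

d_stable = window_divergence(stable)
d_shock = window_divergence(shock)  # divergence spikes at the regime shift
```

Regions whose divergence trajectories spike in this way are exactly the candidates the entropy-based detector flags as anomalous.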
Experimental setup
The experiments are designed to assess the anomaly detection, clustering and forecasting performance of the EGCN. Comparative analyses are performed against state-of-the-art graph-based methods, demonstrating the superiority of our approach in capturing complex spatio-temporal dependencies and delivering actionable insights for urban analytics.
Data preparation
We utilize three datasets that include time-series real estate data, namely price and volume, for various regions from Australia, the United States of America, and the United Kingdom from 01/2003 to 06/2024, along with their geospatial coordinates. It is important to note that anomaly detection is conducted as an unsupervised diagnostic analysis on the entire dataset from 2003 to 2024 to identify irregular spatiotemporal patterns. This step does not involve prediction and therefore can leverage the full observation window. By contrast, forecasting strictly follows a temporal train/test split: models are trained on the pre-COVID period from 2003 to 2019 and evaluated on the post-COVID period from 2020 to 2024, widely regarded as one of the most unstable periods in the past 30 years [62]. This separation ensures that predictive performance is assessed without any exposure to future data. This temporal division allows us to evaluate the robustness and adaptability of the proposed method in forecasting real estate dynamics under highly volatile and unpredictable conditions. All code and data used in this study are openly available at the following DOI link: 10.6084/m9.figshare.29931260.
- Australia: Historical real estate data covering 1,695 suburbs were obtained from Australian Property Monitors via the AURIN portal (data.aurin.org.au).
- United States: Real estate data from 301 major cities were sourced from Zillow (zillow.com), the nation’s largest online real estate platform. The dataset and maps focus on the contiguous 48 states; Alaska and Hawaii were excluded due to limited Zillow coverage during 2003–2024 and for visualization clarity. Zillow’s geographic coverage was more limited before 2007, introducing sparsity in some localities. To ensure data quality, suburbs with persistent missing values were excluded (removals less than 5% of the total dataset), and all series were winsorized at the 1st and 99th percentiles to mitigate the influence of extreme early observations. These steps ensured consistent training quality across the 2003–2019 sample period.
- United Kingdom: Property transaction data were obtained from HM Land Registry (https://www.gov.uk/government/organisations/land-registry), covering 1053 suburbs in England, Wales, and Northern Ireland. Scotland was excluded because comparable suburb-level transaction data were not consistently available across the study period, reflecting its distinct legal and housing data systems.
Baseline comparisons
To evaluate the effectiveness of our proposed approach, we compare its performance against several well-established methods for detecting anomalies in real estate data. Each method is based on Graph Neural Networks (GNNs) and is designed to identify regions where property prices or transaction volumes exhibit unusual trends. Below, we describe the baseline methods used in this study.
- Graph Autoencoder (GAE): The GAE [63] is an unsupervised learning model that identifies patterns in real estate data by compressing and reconstructing information about different regions. If a region deviates significantly from historical trends, the model struggles to reconstruct its features accurately, leading to a high reconstruction error. This makes GAE useful for detecting suburbs or cities where property market trends behave abnormally.
- Graph Attention Network (GAT): GAT [64,65] improves anomaly detection by learning the relative importance of neighboring regions. Unlike conventional methods, which treat all regional connections equally, GAT assigns different importance weights to areas based on their influence on each other. This is particularly useful in real estate markets, where some locations have stronger economic connections than others.
- Graph Sample and Aggregate (GraphSAGE): GraphSAGE [66] is designed for large datasets, sampling, and aggregating information from neighboring regions. This enables the model to detect anomalies in real estate markets efficiently, especially when a small number of suburbs experience sudden price shifts that may indicate economic disruptions or emerging trends.
- Graph Convolutional Network (GCN): GCN [67] learns property price and transaction patterns by aggregating data from connected regions. It helps identify suburbs or cities that do not follow expected market trends. This method is particularly useful for detecting spatial anomalies, such as a neighborhood experiencing rapid property price increases while surrounding areas remain stable.
- Graph Temporal Attention (GTA): GTA [68] extends traditional methods by incorporating time-series data. It reshapes real estate data into sequences and applies attention mechanisms to understand how past trends influence present behavior. By analyzing patterns over time, GTA can detect anomalies that emerge gradually, making it valuable for predicting long-term market sustainability.
- Multi-Temporal Graph Neural Network (MTGNN): MTGNN [69] combines both spatial and temporal information to analyze real estate trends. It models how different regions are connected while also tracking how property market patterns evolve. This enables the detection of anomalies such as a city experiencing rapid price fluctuations while nearby suburbs remain unchanged.
Evaluation and performance metrics
Each baseline method is trained using historical real estate data, and anomalies are identified based on reconstruction errors. To evaluate the effectiveness of our approach, we assess the models using the following metrics:
- Number of detected anomalies: The total count of regions (suburbs, cities, or other areas) identified as exhibiting unusual property market behavior.
- Impact on forecasting accuracy: After detecting anomalies, we divide regions into two groups: normal and anomalous regions. We then apply forecasting methods to each group separately, demonstrating that accounting for anomalies improves forecasting accuracy. This validates the effectiveness of our anomaly detection in enhancing real estate market predictions. Evaluation metrics used in this study are mean absolute scaled error (MASE) [70], mean squared error (MSE) [71], and mean absolute error (MAE) [71].
- Validation using domain knowledge: To ensure the reliability of detected anomalies, we cross-check the results with real-world real estate trends, historical market events, and expert insights. This additional validation step helps confirm whether the identified anomalies align with known economic shifts, policy changes, or urban development patterns.
By benchmarking against these models, we ensure a comprehensive evaluation of our approach in identifying unusual patterns in property prices and transaction volumes. This comparison helps policymakers, urban planners, and real estate professionals make data-driven decisions based on detected market anomalies.
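The three error metrics used throughout the evaluation can be computed directly; the sketch below follows the standard definitions, with MASE scaled by the in-sample MAE of a naive one-step forecast on the training series (the toy numbers are illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))

def mase(y_true, y_pred, y_train, m=1):
    """Mean absolute scaled error: forecast MAE divided by the in-sample MAE
    of the seasonal-naive forecast with period m (m=1 -> one-step naive)."""
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return mae(y_true, y_pred) / scale

y_train = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])  # training history
y_true = np.array([15.0, 14.0, 16.0])                     # held-out actuals
y_pred = np.array([14.0, 15.0, 15.0])                     # model forecasts
# mse = 1.0, mae = 1.0, mase = 1.0 / 1.6 = 0.625
```

A MASE below 1 means the model beats the naive benchmark on average, which makes it convenient for comparing markets with very different price scales.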
Results
This section presents the findings of our study, evaluating the effectiveness of our anomaly detection method and its impact on forecasting accuracy. We compare our results with baseline models and validate them using real estate market trends and domain knowledge.
Real estate anomaly detection
In Fig 2, the anomaly detection results reveal significant differences across models in identifying unusual real estate market behavior in Australia. The Graph Autoencoder (GAE) and GCN detect a similar number of anomalies (107–108), while GraphSAGE and GTA show slightly higher counts (119–129), suggesting that spatial and temporal information enhances detection. The Multi-Temporal GNN (MTGNN), which captures time-dependent trends, detects 113 anomalies. The proposed model outperforms all others with 182 detected anomalies, indicating superior sensitivity to market fluctuations. Spatially, anomalies are concentrated in major metropolitan areas such as Sydney, Melbourne, and Brisbane, while regional hubs like Geelong and the Gold Coast also exhibit irregular trends, likely due to market growth or instability. Some detected anomalies in rural areas may indicate unexpected investment shifts. The results derived from the Australian dataset closely reflect the recommendations presented in the State of the Housing System 2024 Report [72], highlighting their practical relevance. Validation against real-world trends confirms that the proposed model aligns with known economic changes and housing market patterns. These findings suggest that anomaly detection significantly enhances forecasting accuracy, enabling policymakers, urban planners, and investors to make informed decisions regarding real estate market dynamics.
Reprinted from gadm.org under a CC BY license, with permission from Global Administrative Areas, original copyright 2018–2022.
In Fig 3, the anomaly detection results for the UK real estate market reveal distinct variations across models, with GAE, GCN, and MTGNN detecting the lowest number of anomalies (46–47), suggesting limited sensitivity to subtle market deviations. GraphSAGE and GTA identify slightly more anomalies (52–55), indicating that neighborhood aggregation and temporal awareness improve detection. The proposed model significantly outperforms all others, detecting 117 anomalies, highlighting its superior ability to capture market fluctuations. Spatially, anomalies are concentrated in London and Southeast England, where property prices are highly volatile, while emerging anomalies in the North and Midlands suggest shifting market trends. The findings align with economic shifts, Brexit-related housing uncertainties, and post-pandemic real estate dynamics, as shown in UK House Price Index summary: December 2024 [73], confirming the model’s validity. By distinguishing anomalous vs. normal regions, the results enhance forecasting accuracy, supporting policymakers in identifying housing risks, investors in making data-driven decisions, and urban planners in anticipating future infrastructure needs. The proposed model’s increased sensitivity makes it a valuable tool for real estate market analysis and decision-making.
Reprinted from gadm.org under a CC BY license, with permission from Global Administrative Areas, original copyright 2018–2022.
In Fig 4, the anomaly detection results for the US real estate market highlight significant variations across different graph-based models. GAE, GCN, and GAT detect the lowest number of anomalies (12–15), suggesting limited sensitivity to market fluctuations. GraphSAGE and GTA perform slightly better, detecting 16 anomalies, while MTGNN captures 21 anomalies, benefiting from its ability to integrate spatial and temporal dependencies. The proposed model significantly outperforms all others, detecting 34 anomalies, demonstrating a stronger ability to identify unusual market trends. Spatially, anomalies are concentrated in California, Washington, Texas, and the East Coast, which are known for volatile real estate conditions due to high demand, economic shifts, and migration patterns. The detection of anomalies in midwestern and southern states, typically considered stable markets, indicates potential emerging trends or market corrections. These results align with post-pandemic housing shifts and inflation-driven property value fluctuations, reinforcing the effectiveness of anomaly detection in improving real estate forecasting. The findings from the U.S. dataset align closely with the patterns and guidance presented in the America’s Rental Housing 2024 Report [74], emphasizing their practical importance and applicability to real-world real estate market trends. By distinguishing anomalous from normal regions, these insights can guide investors, policymakers, and urban planners in making data-driven decisions for risk assessment and market intervention.
Reprinted from gadm.org under a CC BY license, with permission from Global Administrative Areas, original copyright 2018–2022.
The anomalies detected by EGCN across Australia, the United Kingdom, and the United States correspond closely with major, well-documented disruptions in real estate markets, supporting the external validity of our approach. In Australia, dense anomaly clusters around Sydney and Melbourne during 2020–2021 align with the COVID-19 housing surge, which was fueled by record-low interest rates (0.10% cash rate), mortgage repayment holidays, and government incentives such as the HomeBuilder grant, all of which triggered sharp increases in dwelling prices and transaction volumes [75,76]. In the United Kingdom, anomaly signals are concentrated in London and the South East during 2008–2009, consistent with the global financial crisis, when house prices fell by more than 15% in a single year due to the collapse of Northern Rock and widespread credit tightening [77]. In the United States, anomalies detected in California, Nevada, Arizona, and Florida closely match the epicenters of the 2007–2009 subprime mortgage crisis, where excessive leverage, speculative building, and foreclosure waves produced localized collapses in housing markets [78,79]. Additionally, a second set of anomalies appears in 2020 around major metropolitan areas such as New York, San Francisco, and Seattle, which aligns with COVID-induced migration patterns (urban-to-suburban shifts) and supply-side constraints documented by Zillow and the U.S. Census Bureau [80,81]. The fact that these detected anomalies coincide with distinct macroeconomic shocks across different geographies and time periods suggests that EGCN is not simply flagging statistical outliers but is instead capturing genuine, structurally meaningful disruptions in housing markets.
To ensure that the anomalies identified by EGCN are not artifacts of reconstruction error sensitivity, we conducted placebo tests in which node labels were permuted and temporal sequences shuffled. For each dataset, the procedure was repeated over 50 independent randomizations, and the average number of detected anomalies was recorded. EGCN was then retrained and evaluated on these randomized datasets. Table 1 reports the mean anomaly counts from placebo runs compared with the anomalies detected in the original structured datasets.
In all three markets, anomaly counts under placebo randomization were substantially lower than those obtained from the original data. The effect was most pronounced in Australia, where randomization virtually eliminated anomalies (mean of 1 compared to 182). For the UK and USA, placebo anomaly counts were less than half of those observed in the structured data (49 vs. 117 and 15 vs. 34, respectively).
To further quantify these contrasts, we compared reconstruction error distributions between original and placebo runs using Welch’s t-test. In every case, differences were statistically significant (p < 0.00001). These results demonstrate that EGCN anomalies are not artifacts of reconstruction error sensitivity, but instead arise from meaningful spatiotemporal dependencies in the housing markets.
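The placebo comparison can be reproduced in miniature with SciPy's implementation of Welch's t-test (synthetic reconstruction errors stand in for the model's; the sizes and the 5% anomalous fraction are illustrative assumptions):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)

# Structured data: most regions reconstruct well, a small fraction does not,
# producing a heavy upper tail of reconstruction errors.
errors_original = np.concatenate([rng.normal(0.5, 0.1, 950),
                                  rng.normal(2.0, 0.3, 50)])
# Placebo data: labels permuted / sequences shuffled, so the tail vanishes.
errors_placebo = rng.normal(0.5, 0.1, 1000)

# equal_var=False selects Welch's t-test (no equal-variance assumption).
stat, p_value = ttest_ind(errors_original, errors_placebo, equal_var=False)
# A tiny p-value indicates the error distributions differ significantly.
```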
Real estate forecasting
The following tables present the forecasting performance of different models (N-HiTS, TSMixer, and Transformers) across short-term (12 months), mid-term (24 months), and long-term (48 months) horizons for the UK, US, and Australian real estate markets. Each table compares the impact of different anomaly detection methods, where anomalies are clustered separately from normal regions before forecasting. The evaluation metrics include MASE, MSE, and MAE. The best-performing results are presented in bold and underlined, while the second-best results are shown in bold only. These results demonstrate that EGCN achieves the lowest forecasting errors, proving its superiority in real estate market predictions.
In Table 2, the results for the USA dataset show that EGCN consistently outperforms all other models, including Anomaly-free and alternative anomaly detection approaches (GAE, GAT, GCN, GTA, GraphSAGE, and MTGNN) across all forecasting horizons (12, 24, and 48 months). In short-term forecasting, EGCN significantly enhances predictive accuracy, reducing MASE from 3.0 to 2.5 (N-HiTS) and from 14.0 to 11.9 (Transformers), proving its ability to refine input data by filtering out anomalous regions, leading to more stable and reliable predictions. In mid-term forecasting, EGCN maintains superior performance, achieving the lowest MASE (3.9) in N-HiTS, marking an improvement over the baseline (Anomaly-free) and further refining forecasting accuracy beyond even the best-performing alternative (GAE). The impact is even more pronounced in long-term forecasting, where errors typically accumulate over time; however, EGCN remains the most stable model, reducing MASE from 6.8 (Anomaly-free) to 5.9 in N-HiTS, demonstrating its robustness in preserving forecasting reliability over extended periods. Even for Transformers, which struggle with long-term predictions, EGCN reduces MASE from 15.0 to 14.8, proving its adaptability across different model architectures.
In Table 3, the forecasting results for the UK dataset highlight EGCN’s superiority over all other models, including Anomaly-free (baseline) and alternative anomaly detection methods (GAE, GAT, GCN, GTA, GraphSAGE, and MTGNN) across short-term (12 months), mid-term (24 months), and long-term (48 months) forecasting horizons. In short-term forecasting, EGCN achieves the lowest MASE, reducing errors from 1.7 to 1.6 (N-HiTS), 2.7 to 2.4 (TSMixer), and 2.9 to 2.4 (Transformers), demonstrating its effectiveness in refining input data and improving prediction stability. Moving to mid-term forecasting, EGCN further minimizes errors, outperforming all competing models by reducing MASE from 2.0 to 1.7 (N-HiTS) and from 3.2 to 2.7 (TSMixer), marking a significant performance gain compared to Anomaly-free. In long-term forecasting, where prediction errors typically increase, EGCN continues to maintain superior accuracy, reducing MASE from 2.0 to 1.7 (N-HiTS) and from 3.3 to 2.7 (Transformers), ensuring more reliable long-term market predictions.
In Table 4, for the Australian dataset, the forecasting results reveal that EGCN delivers the most accurate predictions compared to Anomaly-free (baseline) and alternative anomaly detection methods (GAE, GAT, GCN, GTA, GraphSAGE, and MTGNN) across all forecasting periods (12, 24, and 48 months). In short-term forecasting (12 months), EGCN reduces errors significantly, achieving a MASE of 6.0 in N-HiTS (compared to 6.7 for Anomaly-free), 21.7 in TSMixer (vs. 22.8), and 6.3 in Transformers (vs. 9.1), demonstrating its effectiveness in filtering out anomalies for improved short-term predictions. Mid-term forecasting (24 months) further highlights EGCN’s advantage, where it achieves the lowest MASE of 9.1 in N-HiTS and 26.0 in TSMixer, marking a notable improvement over the baseline (Anomaly-free: 9.6 and 27.0, respectively). Even in long-term forecasting (48 months), where errors tend to compound over time, EGCN remains the most stable, reducing MASE from 18.5 to 18.4 (N-HiTS), from 37.7 to 37.3 (TSMixer), and from 23.4 to 20.5 (Transformers), outperforming all other anomaly-aware approaches. These results confirm that EGCN enhances forecasting accuracy across different time horizons, demonstrating its robustness in mitigating the disruptive effects of anomalies and ensuring more stable real estate market predictions. The substantial improvements across all forecasting methods and horizons position EGCN as a superior cluster-specific forecasting tool, providing valuable insights for policymakers, investors, and analysts in long-term market planning.
The forecasting results across the UK, US, and Australian datasets show that EGCN demonstrates superior robustness over the Anomaly-free (baseline) and alternative anomaly detection methods (GAE, GAT, GCN, GTA, GraphSAGE, and MTGNN) across all time horizons (12, 24, and 48 months). Unlike traditional methods, EGCN clusters anomalies separately from normal regions and forecasts each group independently, allowing models to capture distinct market behaviors more effectively. This approach leads to significant error reductions across all datasets, particularly in Transformers, where forecasting traditionally struggles with anomalies. In short-term forecasting, EGCN achieves the lowest MASE, MSE, and MAE, improving accuracy by refining anomaly-informed predictions. In mid-term and long-term forecasting, it remains the most stable model, reducing error accumulation and outperforming all other anomaly-aware approaches. These findings confirm that clustering anomalies before forecasting significantly enhances predictive accuracy, making EGCN the most effective cluster-specific forecasting model for investors, policymakers, and analysts seeking reliable real estate market insights.
Study limitations
Despite its strong performance, EGCN has several limitations. First, the additional steps of anomaly detection, clustering, and separate forecasting increase computational complexity, making the framework more resource-intensive than conventional forecasting models. Second, housing markets evolve dynamically, yet our approach currently relies on static anomaly clustering, which may require adaptive mechanisms to remain effective under changing conditions. Third, the reported results emphasize point estimates of error metrics (MSE, MAE, MASE) for comparability across models and countries; while hypothesis tests (e.g., paired t-tests or Wilcoxon signed-rank tests) and confidence intervals would provide stronger evidence of statistical significance, these were not included due to computational demands. Finally, our evaluation focuses on the COVID-19 period as the principal stress test, though the 2007–2009 global financial crisis was also a major systemic shock. We prioritized COVID-19 to reflect contemporary structural disruptions, but acknowledge that rolling or expanding-window validation across multiple regimes would provide a broader assessment of robustness, which we identify as an important direction for future work.
Conclusion
This research introduces EGCN, a novel entropy-based graph convolutional network for anomaly detection and cluster-specific forecasting in real estate markets. By separating anomalous regions from normal ones, EGCN enables forecasting models to treat them independently, yielding more accurate and stable predictions. Evaluations across the U.K., U.S., and Australian housing markets show that EGCN achieves consistently lower MASE, MSE, and MAE than both anomaly-free baselines and alternative anomaly detection methods (GAE, GAT, GCN, GTA, GraphSAGE, MTGNN), across short-, mid-, and long-term forecasting horizons. These improvements confirm that anomaly-aware analysis provides deeper insights into speculative growth, downturns, and market shifts, supporting policymakers, investors, and urban planners in monitoring risks and planning effectively.
For evaluation, we employed a pre–post COVID-19 split to highlight performance under a major structural disruption. While temporally blocked k-fold validation would provide additional insights into model robustness during more gradual market phases, this remains an avenue for future work.
Building on the current framework, several research directions emerge. First, incorporating macroeconomic indicators such as interest rates, inflation, and employment could refine anomaly classification and improve forecasting stability. Second, extending the framework with explainable AI methods, coupled with statistical significance testing, would enhance transparency and facilitate trust among decision-makers. Third, developing real-time adaptive anomaly detection mechanisms would allow the system to continuously adjust to evolving market dynamics, increasing its operational relevance in fast-changing environments. Fourth, exploring transfer learning or domain adaptation techniques may enable EGCN to generalize across different regions without retraining from scratch, addressing the challenge of heterogeneous housing markets. Finally, broadening EGCN’s application beyond real estate—into domains such as financial asset forecasting, supply chain risk management, and climate-related impact analysis—offers an opportunity to validate its versatility and strengthen its role as a general-purpose tool for analyzing complex, dynamic systems.
References
- 1. Li S-G, Xu X-Y, Liu Q-H, Dong Z, Dong J-C. Financial development, real estate investment and economic growth. Applied Economics. 2023;55(54):6360–77.
- 2. Garriga C, Gete P, Tsouderou A. The economic effects of real estate investors. Real Estate Economics. 2023;51(3):655–85.
- 3. Baek I, Liu J, Noh S. Real estate uncertainty and financial conditions over the business cycle. International Review of Economics & Finance. 2024;89:656–75.
- 4. Hitzig SL, Yuzwa KE, Weichel L, Cohen E, Anderson L, Athanasopoulos P, et al. Identifying priorities and developing collaborative action plans to improve accessible housing practice, policy, and research in Canada. PLoS One. 2025;20(2):e0318458. pmid:39928677
- 5. Nayar N, Price SM, Shen K. Macroeconomic uncertainty and predictability of real estate returns: the impact of asset liquidity. Journal of Real Estate Research. 2024;46(1):82–113.
- 6. Gomez-Gonzalez JE, Hirs-Garzón J, Sanin-Restrepo S, Uribe JM. Financial and macroeconomic uncertainties and real estate markets. Eastern Econ J. 2023;50(1):29–53.
- 7. Stundziene A, Pilinkiene V, Grybauskas A. Maintaining the stability of the housing market in the event of an economic shock. IJHMA. 2022;16(2):255–72.
- 8. Anastasiou D, Kapopoulos P, Zekente K-M. Sentimental shocks and house prices. J Real Estate Finan Econ. 2021;67(4):627–55.
- 9. Zhao C, Liu F. Impact of housing policies on the real estate market - Systematic literature review. Heliyon. 2023;9(10):e20704. pmid:37842595
- 10. Iqbal A, Amin R, Alsubaei FS, Alzahrani A. Anomaly detection in multivariate time series data using deep ensemble models. PLoS One. 2024;19(6):e0303890. pmid:38843255
- 11. Aveline-Dubach N. China’s housing booms: a challenge to bubble theory. Lecture Notes in Morphogenesis. Springer; 2020. p. 183–208. https://doi.org/10.1007/978-3-030-36656-8_11
- 12. Cevik S, Naik S. Bubble detective: city-level analysis of house price cycles. International Finance. 2023;27(1):2–16.
- 13. Glindro E, Subhanij T, Zhu H, Szeto J. Determinants of House Prices in Nine Asia Pacific Economies. International Journal of Central Banking. 2011;7:163–204.
- 14. Bai W. The impact of a weaker yen against the US Dollar on Japanese Real Estate. HBEM. 2024;39:84–92.
- 15. Mitsubishi Estate Co., Ltd. Integrated Report 2024. 2024. https://www.mec.co.jp/assets/img/annual/integratedreport2024e_v.pdf
- 16. ESR Group. Australia and Mitsubishi Estate Asia Form Strategic Partnership to Accelerate Growth in Melbourne’s High-Demand South East. 2024. https://assets.ctfassets.net/nq9kuc33ycuo/4LUqmLPoyiKdmiLbKntJTf/25ec95597d62fbee1675a26febdbd0f2/ESR_Pakenham_MEA_1_August.pdf
- 17. Mitrea CA, Lee CKM, Wu Z. A comparison between neural networks and traditional forecasting methods: a case study. International Journal of Engineering Business Management. 2009;1.
- 18. Karakani HM. Supporting the measurement of sustainable development goals in Africa: geospatial sentiment data analysis. IEEE Technol Soc Mag. 2024;43(1):70–85.
- 19. de Oliveira THM, Painho M. Open geospatial data contribution towards sentiment analysis within the human dimension of smart cities. Open Source Geospatial Science for Urban Studies: The Value of Open Geospatial Data. Springer; 2020. p. 75–95.
- 20. Truong T-N, Nguyen CT, Zanibbi R, Mouchère H, Nakagawa M. A survey on handwritten mathematical expression recognition: The rise of encoder-decoder and GNN models. Pattern Recognition. 2024;153:110531.
- 21. Chen Y-C. A tutorial on kernel density estimation and recent advances. Biostatistics & Epidemiology. 2017;1(1):161–87.
- 22. Jin Y, Wakayama T, Jiang R, Sugasawa S. Clustered Factor Analysis for Multivariate Spatial Data. arXiv preprint 2024.
- 23. Prasetya DA, Nguyen PT, Faizullin R, Iswanto I, Armay EF. Resolving the shortest path problem using the haversine algorithm. Journal of Critical Reviews. 2020;7(1):62–4.
- 24. Liu FT, Ting KM, Zhou ZH. Isolation forest. In: 2008 Eighth IEEE International Conference on Data Mining. 2008. p. 413–22.
- 25. Zamanzadeh Darban Z, Webb GI, Pan S, Aggarwal C, Salehi M. Deep learning for time series anomaly detection: a survey. ACM Comput Surv. 2024;57(1):1–42.
- 26. Zhou C, Paffenroth RC. Anomaly detection with robust deep autoencoders. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2017. p. 665–74. https://doi.org/10.1145/3097983.3098052
- 27. Pol AA, Berger V, Germain C, Cerminara G, Pierini M. Anomaly detection with conditional variational autoencoders. In: 2019 18th IEEE International Conference on Machine Learning And Applications (ICMLA). 2019. p. 1651–7. https://doi.org/10.1109/icmla.2019.00270
- 28. He J, Xu Q, Jiang Y, Wang Z, Huang Q. ADA-GAD: anomaly-denoised autoencoders for graph anomaly detection. AAAI. 2024;38(8):8481–9.
- 29. Singh R, Srivastava N, Kumar A. Network anomaly detection using autoencoder on various datasets: a comprehensive review. ENG. 2024;18(9).
- 30. Corso G, Stark H, Jegelka S, Jaakkola T, Barzilay R. Graph neural networks. Nat Rev Methods Primers. 2024;4(1).
- 31. Zhang S, Tong H, Xu J, Maciejewski R. Graph convolutional networks: a comprehensive review. Comput Soc Netw. 2019;6(1):11. pmid:37915858
- 32. Khemani B, Patil S, Kotecha K, Tanwar S. A review of graph neural networks: concepts, architectures, techniques, challenges, datasets, applications, and future directions. J Big Data. 2024;11(1).
- 33. Kodinariya TM, Makwana PR. Review on determining number of cluster in K-means clustering. International Journal. 2013;1(6):90–5.
- 34. Giorgino T. Computing and visualizing dynamic time warping alignments in R: the dtw package. J Stat Soft. 2009;31(7):1–24.
- 35. Cuturi M, Blondel M. Soft-DTW: a differentiable loss function for time-series. In: Proceedings of ICML. PMLR; 2017. p. 894–903.
- 36. Goetzmann WN, Wachter SM. Clustering methods for real estate portfolios. Real Estate Economics. 1995;23(3):271–310.
- 37. Yucebas SC, Yalpir S, Genc L, Dogan M. Price prediction and determination of the affecting variables of the real estate by using X-means clustering and CART decision trees. JUCS. 2024;30(4):531–60.
- 38. Maggon M. A bibliometric analysis of the first 20 years of the Journal of Corporate Real Estate. JCRE. 2022;25(1):7–28.
- 39. Mienye ID, Swart TG, Obaido G. Recurrent neural networks: a comprehensive review of architectures, variants, and applications. Information. 2024;15(9):517.
- 40. Islam S, Elmekki H, Elsebai A, Bentahar J, Drawel N, Rjoub G, et al. A comprehensive survey on applications of transformers for deep learning tasks. Expert Systems with Applications. 2024;241:122666.
- 41. Yan L, Jia L, Lu S, Peng L, He Y. LSTM-based deep learning framework for adaptive identifying eco-driving on intelligent vehicle multivariate time-series data. IET Intelligent Trans Sys. 2023;18(1):186–202.
- 42. Salim M, Djunaidy A. Development of a CNN-LSTM approach with images as time-series data representation for predicting gold prices. Procedia Computer Science. 2024;234:333–40.
- 43. Challu C, Olivares KG, Oreshkin BN, Garza Ramirez F, Mergenthaler Canseco M, Dubrawski A. NHITS: Neural Hierarchical Interpolation for Time Series Forecasting. AAAI. 2023;37(6):6989–97.
- 44. Li Y, Du Q. Oil price volatility and gold prices volatility asymmetric links with natural resources via financial market fluctuations: Implications for green recovery. Resources Policy. 2024;88:104279.
- 45. Kangalli Uyar SG, Uyar U, Balkan E. Fundamental predictors of price bubbles in precious metals: a machine learning analysis. Miner Econ. 2023;37(1):65–87.
- 46. Yadav H, Thakkar A. NOA-LSTM: An efficient LSTM cell architecture for time series forecasting. Expert Systems with Applications. 2024;238:122333.
- 47. Gobato Souto H, Heuvel SK. TSMixer and realized volatility prediction. Available at SSRN. 2024.
- 48. Iqbal J, Ahmed A, Ramzan M. Forecasting the nexus and impact of news sentiment on NYSE, gold prices, and WTI oil using the neural network approach. Bahria University Journal of Management & Technology. 2024;7(1).
- 49. Wen Q, Zhou T, Zhang C, Chen W, Ma Z, Yan J. Transformers in time series: a survey. arXiv preprint 2022. https://arxiv.org/abs/2202.07125
- 50. Kullback S. Information theory and statistics. Dover Books on Mathematics. Dover Publications; 2012.
- 51. Wang C. Statistical method for clustering high-dimensional data based on fuzzy mathematical modeling. Applied Mathematics and Nonlinear Sciences. 2023;9(1).
- 52. Huang L, Zhou X, Shi L, Gong L. Time series feature selection method based on mutual information. Applied Sciences. 2024;14(5):1960.
- 53. Nielsen F. On the Jensen-Shannon symmetrization of distances relying on abstract means. Entropy (Basel). 2019;21(5):485. pmid:33267199
- 54. Jabeur SB, Mefteh-Wali S, Viviani J-L. Forecasting gold price with the XGBoost algorithm and SHAP interaction values. Ann Oper Res. 2021;334(1–3):679–99.
- 55. Bein B. Entropy. Best Practice & Research Clinical Anaesthesiology. 2006;20(1):101–9.
- 56. Barnett V, Lewis T. Outliers in statistical data. New York: Wiley; 1994.
- 57. Chandola V, Banerjee A, Kumar V. Anomaly detection. ACM Comput Surv. 2009;41(3):1–58.
- 58. Iglewicz B, Hoaglin DC. How to detect and handle outliers. Vol. 16. Quality Press; 1993.
- 59. Oreshkin BN, Carpov D, Chapados N, Bengio Y. N-BEATS: neural basis expansion analysis for interpretable time series forecasting. arXiv preprint 2019. https://arxiv.org/abs/1905.10437
- 60. Zhu T, Chen T, Kuang L, Zeng J, Li K, Georgiou P. Edge-based temporal fusion transformer for multi-horizon blood glucose prediction. In: IEEE International Symposium on Circuits and Systems. 2023. p. 1–5.
- 61. Zhou H, Arik SÖ, Wang J. Business metric-aware forecasting for inventory management. arXiv preprint 2023. https://arxiv.org/abs/2308.13118
- 62. Ansenberg U, Avni N, Rosen G. Exploring real estate valuation practices in an informal market. Habitat International. 2024;147:103069.
- 63. Du X, Yu J, Chu Z, Jin L, Chen J. Graph autoencoder-based unsupervised outlier detection. Information Sciences. 2022;608:532–50.
- 64. Zhao H, Wang Y, Duan J, Huang C, Cao D, Tong Y. Multivariate time-series anomaly detection via graph attention network. In: 2020 IEEE International Conference on Data Mining (ICDM). 2020. p. 841–50.
- 65. Zhang Z, Chen Y, Wang H, Fu Q, Chen J, Lu Y. Anomaly detection method for building energy consumption in multivariate time series based on graph attention mechanism. PLoS One. 2023;18(6):e0286770. pmid:37289704
- 66. Chen C, Li Q, Chen L, Liang Y, Huang H. An improved GraphSAGE to detect power system anomaly based on time-neighbor feature. Energy Reports. 2023;9:930–7.
- 67. Hu Z, Wu T, Zhang Y, Li J, Jiang L. Time series anomaly detection based on graph convolutional networks. In: 2020 2nd International Conference on Applied Machine Learning (ICAML). 2020. p. 138–45. https://doi.org/10.1109/icaml51583.2020.00036
- 68. Zhang S, Guo Y, Zhao P, Zheng C, Chen X. A graph-based temporal attention framework for multi-sensor traffic flow forecasting. IEEE Trans Intell Transport Syst. 2022;23(7):7743–58.
- 69. Gao J, Zhang X, Tian L, Liu Y, Wang J, Li Z, et al. MTGNN: multi-task graph neural network based few-shot learning for disease similarity measurement. Methods. 2022;198:88–95.
- 70. Gneiting T, Raftery A. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association. 2007;102:359–78.
- 71. Willmott C, Matsuura K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim Res. 2005;30:79–82.
- 72. National Housing Supply and Affordability Council. State of the Housing System 2024. 2024. https://nhsac.gov.au/sites/nhsac.gov.au/files/2024-05/state-of-the-housing-system-2024.pdf
- 73. UK Government. UK House Price Index for December 2024. 2024. https://www.gov.uk/government/statistics/uk-house-price-index-for-december-2024/uk-house-price-index-summary-december-2024
- 74. Joint Center for Housing Studies of Harvard University. America’s Rental Housing 2024. 2024. https://www.jchs.harvard.edu/americas-rental-housing-2024
- 75. Reserve Bank of Australia. Statement on Monetary Policy – February 2022. Reserve Bank of Australia; 2022. https://www.rba.gov.au/publications/smp/2022/feb/
- 76. CoreLogic. Housing Market and Economic Update – December 2021. 2021. https://www.corelogic.com.au/reports
- 77. Office for Budget Responsibility. Economic and fiscal outlook December 2012. The Stationery Office; 2012.
- 78. Gorton GB. Slapped by the invisible hand: The panic of 2007. Oxford University Press; 2010.
- 79. Mian A, Sufi A. House of debt: how they (and you) caused the Great Recession, and how we can prevent it from happening again. University of Chicago Press; 2015.
- 80. Zillow Research. Zillow Consumer Housing Trends Report 2021. Zillow; 2021. https://www.zillow.com/research/homeowners-consumer-housing-trends-report-2021-29736/
- 81. U.S. Census Bureau. Housing Vacancies and Homeownership (CPS/HVS) – Third Quarter 2021. 2021. https://www.census.gov/housing/hvs/data/q321ind.html