RETRACTED: Optimization of house price evaluation model based on multi-source geographic big data and deep neural network

Xuan Wang; Xuan Li; Haiyan Li

doi:10.1371/journal.pone.0335722

Abstract

The real estate market requires effective and precise house price prediction, as conventional models often face difficulties in generalization, computational efficiency, and interpretability. The research problem is addressed by introducing the House Price Evaluation Model (HPEM), which utilizes a hybrid deep learning network for analyzing multi-source geographic data. The network integrates the attention mechanism with spatial feature extraction, and a bat optimization algorithm is used to improve explainability and accuracy. The gathered properties are processed using normalized techniques to convert unstructured data into structured data, which directly improves the overall prediction accuracy. The bat-optimized attention mechanism with spatial networks dynamically arranges high-impact features to effectively address unstable feature importances, computation inefficiency, and poor generalization issues. In addition, the echolocation-inspired approach explores optimal solutions by balancing exploration and exploitation, thereby minimizing the deviation between the outputs and reducing training time by 30% compared to existing methods. The efficiency of the system is then evaluated using the Housing Price Dataset information, where HPEM achieves 98.5% feature stability, 1.2 hours of human-in-loop updates, and a 4.2% mean absolute error (MAE) under distribution shifts. The effective exploration of dynamic features through bat optimization integration yields 15% closer convergences, enhancing regulatory compliance and accuracy. Therefore, the developed model is effectively utilized in real estate valuation schemes.

Citation: Wang X, Li X, Li H (2025) RETRACTED: Optimization of house price evaluation model based on multi-source geographic big data and deep neural network. PLoS One 20(11): e0335722. https://doi.org/10.1371/journal.pone.0335722

Editor: Aykut Fatih Güven, Yalova University, TÜRKIYE

Received: June 20, 2025; Accepted: October 15, 2025; Published: November 5, 2025

Copyright: © 2025 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data generated or analysed during this study are included in this article.

Funding: This work was supported by Educational Science Planning Project of Guangdong Province “Research on the internal logic and collaborative model of the integrated development of educational science and technology talents: Historical Mirrors and Guangdong Practice (2024GXJK021)” The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

The housing market [1] typically refers to single-family homes for sale in a given area. Whether regional or local socio-economic factors are at play, environmental issues are in the open, or individual property characteristics are considered, several factors must be taken into account. With this said, a house price evaluation model, defined as a statistical method to measure a house’s worth, can also be taken into context. Models such as these consider both qualitative and quantitative values, including school availability, neighborhood safety, local economy, distance from public transportation, community size, and the age and modernity of real estate [2,3], to generate an approximate price. Stakeholders, including homebuyers, city agencies, business investors, and brokers, invest in house price models to estimate the given market value and settle agreements on property pricing alongside mortgage approval. Many homeowners and potential buyers will face market distortion and social inequity if efficient decision-making tools are not in place. Everything stems from inaccurate estimations [4]. An estimation that faces the risk of less than financial distortions, core distortion around property value mismatches, and even significant market loss. With that in mind, maintaining a balanced and efficient real estate ecosystem requires a house price evaluation model that is accurate, transparent, flexible, and trustworthy [5,6].

In the past, predicting house prices in the market was done using traditional statistical approaches [7], such as linear regression, hedonic pricing methods, and even autoregressive models [8,9]. The model primarily focuses on pre-existing structured tabular information, and almost always, there is a linear relationship among the parameters, which yields oversimplified results. These methods are particularly effective for small datasets that contain a limited number of variables. Still, the model fails to account for the complexity and multi-faceted nature of actual housing data, particularly in cities where everything is intertwined and non-linear [10]. Nonetheless, making the most of this diverse and multidimensional data poses some difficulties. Geographic [11] data is frequently available in various formats, including structured, semi-structured, and unstructured, and is classified by spatial and temporal resolution. Moreover, incorporating these aspects into a cohesive predictive model requires complex data integration approaches. These intelligent systems can handle spatial relationships, multi-level interplay among attributes, and data-driven non-linearity.

This study develops an enhanced model for evaluating house prices using deep neural networks (DNNs) [12] alongside multi-source geographic big data to solve these challenges. Unlike previous models, which consider only a property’s internal attributes, this model incorporates an array of external spatial, socio-economic, environmental, and infrastructural elements while employing modern optimization techniques such as attention and regularization that improve prediction accuracy and model comprehension. Then, the introduced methods offer several key points, such as the recommended multi-modal data fusion framework, which is successfully fused with both unstructured and structured data, helping to optimize the evaluation of the house price model. The integrated approaches are processed with the help of a deep neural network that effectively observes the intricate and non-linear relationship between the determinates of house prices. During the analysis, the optimization technique is combined with batch normalization, dropout, and an attention layer to maximize network performance interpretability and minimize overfitting issues effectively. Finally, the model’s scalability and generalizability are ensured for managing the application between real estate markets and urban regions. The main objective of this work is listed below.

To analyze the multi-source geographic big data information using the deep learning networks for creating the house price prediction model.
To design the optimization model by applying the regularization and attention mechanism to enhance the price model prediction.
To evaluate the price prediction model with traditional statistical techniques to explore the contextual affecting and geographic influential factors that can improve the house price prediction model.

Then, the rest of the paper is organized as follows: Section 2 explains the various researcher’s opinions regarding the house price prediction model. Section 3 examines the working process of the optimized house price prediction system, and its efficiency is evaluated in Section 4. The conclusion is discussed in Section 5.

2. Related works

Completing a literature survey enables one to pinpoint significant milestones in the evolution of house price prediction models and appreciate how emerging models, especially those utilizing deep learning techniques and multi-source data integration, have sought to improve contemporary models. This paper analyzes recent studies that have used diverse machine learning and data fusion approaches to enhance housing price estimation and that, in one way or another, expand this particular research field. Each study offers a different perspective concerning implementing the models, the data, and the predictions. Wang et al. (2021) [13] designed a deep learning model that captures heterogeneous data, including spatial, textual, and image features, using a joint self-attention mechanism (JSA). The author introduced a model improves a network’s representational capability by focusing on multi-modal data with distinctive attention mechanisms, thus capturing more intricate feature interactions. Compared to classical neural networks, the results demonstrated better prediction accuracy. Nonetheless, the model’s weaknesses are its high computational costs and its dependence on high-quality labeled data from multiple heterogeneous domains.

Adetunji et al. (2022) [1] assessed a Random Forest (RF) machine-learning technique applied to structured tabular data for pricing houses. This work aimed to develop a more interpretable and simpler model than black-box neural networks, particularly for low-data infrastructure regions. The RF model provided fair accuracy and ease of interpretation regarding the importance of featured data. However, the predictive power of this approach in sophisticated urban settings was limited due to weaknesses in incorporating rigid, unstructured, or spatial data. Kang et al. (2021) [14] examined trends in house price appreciation using multi-source big geo-data and machine learning techniques. To understand the dynamics of a city and its various spatial components, the model analyzed real estate transaction data alongside point-of-interest (POI) data, road networks, and social media check-ins. The introduced approach demonstrated the utility of various data sources in developing more contextually relevant models. Out of the many provided solutions with robust results, some challenges remained regarding data inter-source synchronization and the general applicability of the solution to smaller, less developed markets.

Peng et al. (2021) [15] proposed a longitudinal learning framework (LLF) for a model of property price prediction specific to the Toronto real estate market, emphasizing model evolution over time. The work employed incremental learning strategies and memory-based techniques to enhance predictions using new market data continually. The research is adapted to rapidly changing environments where real estate market shifts occur frequently. Still, it relies on uninterrupted, high-quality data flow and sophisticated data fusion mechanisms, which constrains scalability to areas with low reporting data consistency. Kalinga developed a Recurrent Neural Network (RNN) [16] model to forecast real estate prices in Dar es Salaam City, as reported in 2023, which aimed to address the issue of urban price fluctuations over time. The RNN provided better results than traditional regression methods by capturing long-term transaction trends and time-series features. The model helps forecast urban development plans and policy planning. However, the model’s accuracy was still challenged due to a lack of data and the failure to incorporate the spatial context of the data regarding land use and infrastructure. Using temporal, spatial, and categorical data dimensions, Kütük (2024) [17] proposed a novel approach to predicting house prices by employing state-of-the-art (SOTA) recurrent neural networks (RNNs), which integrate multi-faceted attributes of a property and its surroundings. This study sought to improve feature interaction problems using advanced GRU and LSTM architectures. As mentioned in the research, enhanced performance was achieved for both short-term and long-term price forecasting. However, increased model complexity and training time remain issues for real-time applications in data-scarce locations.

Mostofi et al. (2023) [18] introduced a Multiedge Graph Convolutional Network (GCN) model, which predicts house prices by calculating the sophisticated spatial relationships between properties. The algorithm can capture more context in the form of distance, zoning, and infrastructure by converting real estate information into a multidimensional graph, enabling a better understanding of spatial interactions. The model was extensively tested for capturing the impact of neighborhood and adjacent buildings on real estate value. The usage of complex, detailed spatial graph construction, however, exposes the model to missing or contradicting geospatial data. Hasan et al. (2024) [19] introduced a multimodal model for predicting house prices based on image, text, and numerical variables, utilizing multimodal deep learning. The model utilizes convolutional and transformer modules to process feature extraction from structured variables, property descriptions, and images in parallel. The author introduced an approach that was far superior to unimodal models and demonstrated the power of data fusion. The primary problem is the prohibitively expensive computational cost and the complete multi-modal dataset for every property.

Zhao et al. (2024) [20] addressed the house price prediction problem with a new scheme rooted in multi-source data fusion, combining socio-economic data, points of interest (POIs), environmental features, and real estate data. Ensemble learning algorithms were employed in their fusion scheme to combine heterogeneous data, which improved the stability of valuation estimates. Urban and metropolitan areas were targeted for results that are more accurate than those of single-source models. A preprocessing and data alignment issue that was an integration problem was a shortcoming. Kalliola et al. (2021) [21] focused on optimizing the hyperparameters of neural networks for forecasting real estate prices in Helsinki, utilizing grid search and genetic algorithm techniques for model tuning. The author assessed the impact of specific optimization methodologies on the resulting predictive accuracy of trained feedforward neural networks employing real estate characteristics. Their data suggested strong performance in well-structured data scenarios, disproving assumptions that complex architecture outperforms tuned models. Urban real-life modeling poses challenges due to the lack of spatial or unstructured data, which falls outside the scope of the study. The studies reviewed demonstrate the increasing sophistication of house price prediction models, from basic regression methods to more contemporary techniques that utilize deep learning and data fusion frameworks. Notable improvements were made with multi-modal data, graph spatial modeling, and recurrent networks for temporal information, and higher accuracy was achieved by tuning neural parameter statistics. There is widespread diversity in methodological approaches [22], but all still face similar issues regarding data quality, integrative complexity, and the processing resources required. These findings highlight a clear gap in this research, particularly in the integration of multi-source geographic big data with deep neural networks, which require a context-aware, adaptive, and highly scalable model.

3. House Price Evaluation Model (HPEM)

This work aims to develop the House Price Evaluation Model (HPEM) to analyze and predict the exact market values of residential properties. While exploring the complex urban environment, this analysis utilizes multi-source, integrated, and heterogeneous data to overcome data sparsity, lack of integration, valuation inaccuracy, and poor interpretability. The model examines the complex and non-linear relationships between spatial locations, physical attributes, socio-economic conditions, environmental quality, and temporal trends to predict housing prices. The model utilizes multi-source geographic big data, incorporating attention mechanisms and feature fusion layers, to address the issues addressed in the price evaluation model. Therefore, the primary intention of this work is defined in equation (1)

(1)

According to the equation (1), the house price is predicted is obtained by exploring the properties attributes (, geographic features (, environmental data ( and socio-economic indicators ( using the deep network functions ( with learnable parameters (. During the analysis, the is predicted by estimating the difference between the actual house price ( and predicted price . The computed value must be a minimum that directly improves the price prediction accuracy. Therefore, the attention mechanism allocates the weight value ( to every input ( which is defined as . The attention mechanism confirms every feature that contributes to predicting prices, and the weight values are updated during training. According to the discussions, the overall structure of HPEM is shown in Fig 1.

Download:

Fig 1. Overall Architecture for HPEM.

https://doi.org/10.1371/journal.pone.0335722.g001

Fig 1 illustrates the overall architecture of the House Price Evaluation Model (HPEM), which utilizes multi-source geographic big data to enhance price prediction efficiency by employing an optimized deep learning model. The model uses several phases, including data collection, preprocessing, feature extraction, model design for price evaluation, and validation. Every phase utilizes different functions and processes to explore the data and identify the exact prices of the houses.

3.1 Data collection

The first stage of this work involves data collection, during which a diverse set of data has been collected to observe the complex interdependencies between the data. The dataset encompasses both unstructured and structured data, including property attributes, geometric data, environmental data, socio-economic data, points of interest, and imagery data. The property attributes include building age, room built-up area, renovation history, and property type. Additionally, geographic data is captured that encompasses spatial features such as exact geolocation, road networks, zone land use, city distances, and proximity information for public transportation. Then, environmental data such as the air quality index (NO, PM2.5), noise level, green coverage, proximity to industrial areas, water bodies, and park information are gathered. The socio-economic data encompasses neighboring crime statistics, school rankings, average incomes, educational facilities, demographic statistics, and employment rates, which are collected from government portals and census databases. Finally, the POI and imagery data are gathered from amenities such as grocery stores, malls, hospitals, and street-view visual characteristics and textural properties. The collected information is processed using optimized deep-learning techniques that enable the prediction of house prices with the highest accuracy.

3.2 Ethics statement

Although the dataset used in this research is publicly available from Kaggle (Kaggle Housing Prices Dataset), we confirm that we adhere to ethical standards regarding data privacy and compliance. Kaggle’s privacy policy ensures that user data and datasets are managed according to strict data protection regulations and do not contain personally identifiable information that violates privacy. This research uses aggregated and anonymized data solely for academic purposes following Kaggle’s terms of use and data privacy framework to protect data confidentiality.

The collected dataset representation is shown in Fig 2.

Download:

Fig 2. Flow of data processing and preparation.

https://doi.org/10.1371/journal.pone.0335722.g002

3.3 Data preparation and processing

The HPEM efficiency depends on the consistency and quality of the data because this work utilizes multiple sources of geographic data. The gathered data are heterogeneous and raw, which are unstructured and difficult to process using computational techniques. Therefore, the data is processed using normalization, geocoding, spatial integration, imputation, and encoding processes to convert the data into a structured format. The preprocessing process is illustrated in Fig 2.

Initially, the gathered dataset information is processed by geocoding to attain spatial alignment. The property address is converted into longitude and latitude coordinates using geocoding to ensure the spatial joints that are defined as . For different data, spatial join operations are performed, where multiple layer inputs are connected to the nearest geocoded point by incorporating radius-based aggregation techniques. Afterward, the numerical features are normalized to ensure uniform scaling and eliminate bias values. The normalization is done by using the min-max scaling process, which is defined as . In addition to this, categorical features like house type, neighborhood category, and furnishing status information are processed using a one-hot encoding that is defined as . Normalization and categorical encoding manage the relationships between variables while minimizing feature dimensionality. The gathered information may contain missing values that affect the quality of price detection. Therefore, the K-nearest neighboring approach is utilized to replace the missing values in which the k-nearest neighboring point is computed by exploring the set of k closest neighboring points at instance and the feature imputed value is represented as . Finally, the various data sources are integrated at different spatial resolutions using grid-based aggregations. During this process, the area is split into square grid cells in which all the details fall in the grid, and the statistical values are computed to ensure the feature alignment that minimizes the redundancy and noise. The crux of altered pipelines encapsulates the stepwise nature of model performance with growth in transformation. Included in all other stages, grid-based aggregation is an important method in the implemented workflow as provided to ensure spatial consistency within non-homogeneous and varied datasets. Model truncation of raw air quality, noise levels, and income spatial features into a uniform 500m by 500m grid cells reduces them to uniform grid cells. This process minimizes model noise due to micromorphological variation and non-ideal area alignment dichotomized between datasets. There is also an impressive decline in feature root mean square error rate from 89K (raw spatial features) down to 73K upon aggregation, with R² increasing from 0.80 to 0.88, reflecting a higher foster predictive relationship. These performance gains are not solely due to each individually, but rather they build on top of some other earlier preprocessing procedures, including normalization, category encoding, and KNN imputation, to form a robust ensemble house price prediction model. The synergy of the enhancements can be observed in the end-measurements, where the fully preprocessed data, with an RMSE of 67K and an R² of 0.90, performs better than the unpreset data, which reached an RMSE of 148K and an R² of 0.64. This implies that careful preprocessing in combination with efficient merging, particularly spatially through grid aggregation partitioning data into spatial areas grid aggregating, is more thoroughly much more sophisticated than simply marking delimits the limits of preparatory work for the model, and has the underlying aim as a targeted approach to grid and aligning data for precise and nuanced boundaries of the region forming model accuracy, credibility, and generalizability in tasks when quantifying performance in geographic big data-driven predictive works facilitating wide range of high-level aims. Now that the data is ready, it is ready to proceed with feature engineering, where features will be derived from data in order to enhance house price predictability.

3.4 Feature engineering

The next step is feature engineering, which plays an essential role in house price prediction. This phase converts unstructured data into meaningful variables that are used to derive patterns highly influencing property values. As discussed, the property price prediction process is affected by socio-economic, environmental, temporal, and spatial factors that are difficult to observe from the raw data. Therefore, the feature engineering step is incorporated to develop custom indicators, such as neighborhood quality, amenity distances, environmental, and accessibility factors. Consider whether the property is nearer to public transportation, a school, or a park; the value of the property is increased. In addition, socio-economic factors such as education level and the average income of the neighborhood are also utilized to observe the safety and status of the location. The feature engineering state helps to minimize the difficulties of inaccurate prediction by deriving temporal tags, composite indices, and spatial lags that significantly impact the extraction of prices. The extracted features maximize interpretability and confirm that stakeholders understand the property features. Initially, the distance-based metrics are computed, in which the proximity value is estimated. During this process, the Euclidean distance between urban assets, such as schools, parks, public transit, hospitals, city centers, and properties is estimated to find the proximity value. Consider, is the property coordinates and is the amenity coordinates; then the distance between the property is computed as . After that, spatial autocorrelation is computed with the help of spatial lag features and inverse distance weighting. The inverse technique assigns the weight value for every neighborhood feature according to the proximity that is defined as ; here, property neighbor weight value is , is the distance between the and the power parameters are denoted as . From the definition, feature spatial lag is estimated as . Along with this, temporal dynamics are observed by deriving the time-related tags like the month of sales, year-over-year market trend, and days on marketing (sale date-listing date).

Bat optimization algorithm (BOA) can effectively explore difficult, high-dimensional, and multi-modal search spaces, which complements neural network training issues. BOA uses a population-based, stochastic search inspired by microbat echolocation, unlike gradient-based approaches like mini-batch SGD or AdamW, which may converge prematurely to local minima or saddle points. BOA can adaptively balance exploration and exploitation, making it robust to noisy or deceptive loss landscapes. BOA optimizes neural networks better than gradient methods in scenarios where gradients are unreliable, sparse, or the objective function is non-differentiable, but lacking theoretical convergence guarantees. BOA’s versatility makes it suited for hybrid models with non-traditional layers or heuristics where gradient backpropagation may be inefficient or inapplicable. Thus, BOA provides a global optimization technique that improves convergence quality and solution variety over gradient-based optimizers.

The temporal dynamic information is used to extract market information, such as urgency and cyclicality. Finally, the composite indices like the socio-economic quality index (SEQI), accessibility index (AI), and environmental comfort index (ECI) are extracted, as shown in equation (2)

(2)

In equation (2), n is the public services, and derived weight values are defined as . The impact of the extracted contextual, temporal, and spatial attributes on house price prediction is evaluated in Fig 3. Every feature is derived from the structural engineering details that effectively observe the real-time market dynamics, which are highly helpful in capturing the link between house prices and individual features. The parameters α and β control the relative influence of components such as exploration and exploitation, or the weighting of velocity and frequency terms in position updates. These are typically determined empirically through preliminary experiments or set following conventions in the literature to promote a balance that avoids premature convergence or excessive random search. Adaptive strategies may also dynamically tune α and β during training based on convergence behavior to improve optimization efficiency.

Download:

Fig 3. Features influence on House price prediction.

https://doi.org/10.1371/journal.pone.0335722.g003

Fig 3 shows regional, environmental, and socio-economic characteristics and housing price pairwise correlations. The scatter plots show that many correlations are weak or visually confusing, with price fluctuation across most feature ranges. Previous research generally show a negative relationship between property price and distance from city center, but this dataset does not show a clear trend, thus any such relationship should be evaluated cautiously without a fitted regression line or statistical summary. The data points are dispersed and do not show an increasing tendency, therefore the alleged favorable effect of proximity to schools and parks on price is not evident. The spatial lag price variable appears to group higher prices, while air quality, noise level, and socio-economic index may reveal subtle trends that are not visible in scatter plots. Visualizations alone cannot determine the direction or intensity of these interactions. Any detected associations would require quantitative investigation, such as fitting regression lines or computing correlation coefficients. The prices of houses appear to be negatively associated with distance from the city center, which indicates that properties closer to urban areas have higher prices. Furthermore, the distance to schools and parks also appears to have a positive impact on prices, although these effects are much smaller. The spatial lag price variable demonstrates a stronger positive relationship, which states that the prices of houses nearby significantly impact the value of a given home, which is of great significance for spatial econometric models. Environmental factors, such as air quality (IDW) and noise levels, also exhibit expected relationships, with higher air quality associated with higher prices and noise levels decreasing the value. In the housing market, the socio-economic index demonstrates the strongest positive correlation, showcasing how neighborhood demographics impact the housing market. Adjusting for market trends, such as time on the market and seasonality, reveals weaker relationships, suggesting that these temporal factors are less significant in determining the price in this dataset. The comfort and accessibility indices showed positive but varied relationships with price, revealing how different metrics quantifying life impact buyer preferences. Based on all the analysis, it appears that the interplay of geographical, social, and economic factors has a profound influence on the phenomenon of housing prices.

3.5 House price prediction

The final phase of this process is house price prediction, which utilizes an optimized multi-input deep learning model that incorporates input from spatial and temporal details. The network comprises different layers, including input, fusion, fully connected layers, attention mechanisms, and output layers. The input layers receive various features, such as spatial embeddings, categorical embeddings, and normalized numerical attributes, derived from geospatial data. Suppose the other modalities, like textual descriptions and property images-related embeddings are fused into the input stage. The inputs are fed into the feature fusion layer, which merges the multiple data to form a unified latent representation. The fusion layer helps handle heterogeneous data and also learns the interdependencies between features. The fused features are fed into the core process of a deep neural network, which includes fully connected layers, rectified linear units, a dropout function, and a batch normalization process. These components improve generalization and accelerate convergence. The deep network features an attention mechanism that dynamically assigns weight values to each input feature based on its relevance to the task. Additionally, the bat optimization approach is integrated into the training process, allowing the network parameters to be fine-tuned to strike a balance between exploration and exploitation. The optimization process minimizes the issue of local minima and improves the overall prediction accuracy. Finally, the output layer uses the linear activation function to get the final output value. According to the discussions, the structure of the optimized deep neural network is shown in Fig 4.

Download:

Fig 4. Architecture of Optimized deep learning Networks.

https://doi.org/10.1371/journal.pone.0335722.g004

Fig 4 clearly shows that the network receives multi-model features as inputs, such as structured features (, categorical features (, spatial features ( and the complete input vector is represented as . The input is fed into the fusion layer, and the layer’s goal is to simplify all complex features into a single, machine-understandable output that can be easily modeled in future deep processes. The fusion process uses the linear transformation that is applied on the and the non-linear activation function is applied to solve the model’s non-linearity and complexity issues. Then, the transformation is computed as ; where is represented as the trainable weight matrix, which linearly maps the ’s dimension into the hidden dimension and the bias value is represented as . The utilized ReLU function manages the non-linear patterns in the input by mitigating the vanishing gradient problems in deep neural networks. This fusion layer produces dense vector embeddings ( are output that observes the interaction between the contextual, structural, and spatial inputs. The computed is the primary input to the deep neural network, which confirms the relationship between the features and the accuracy of house price prediction. Therefore, the fusion layer plays a crucial role in geographic details analysis. As said, each hidden layer change the input to the proper representation and abstract via the non-linear and linear operations. The forward propagation process ensures that the network understands the data’s hierarchical patterns, such as the connection between distance from the city center and the respective economic zone type in a price prediction model. For layer , the previous layer output is fed as the input, which is represented as and the linear computation is defined as ; here number of layers is is represented as . The layer ( weight matrix is denoted as and it has neurons; is bias vector it has neurons with learnable offsets in every neuron. Finally, the is denoted as intermediate results before proceeding with the activation function. Then, ReLU non-linear function is applied to handle the complexity. The ReLU performs element-operation to and the output is obtained as . After performing the forward propagation, dropout regularization is applied to prevent overfitting and co-adaptation problems by deactivating neurons during training. The dropout process is defined by using the binary mask which is sampled from Bernoulli distribution, dropout rate ( and element-wise multiplication ( (defined in equation 3)

(3)

According to the dropout process, it improves the generalizability to unknown data, the training process is stabilized, and convergences are accelerated using batch normalization that is defined using equation (4)

(4)

In equation (4), the learnable scale and shift parameters are represented as γ and β, respectively. A small constant, defined as ε, is also included. The batch normalization process minimizes internal covariate shifts and manages gradient consistency. In general, the visualizations support the notion that batch normalization, ReLU, and dropout are functioning optimally in the architecture to constrain activations and yield consistent outputs.

Fig 5 shows how Batch Normalization (BN) and ReLU activation followed by Dropout affect network activation distributions. The histograms in Figs 5a and 5c reveal that BN narrows layer activations before and after normalization, stabilizing and accelerating training. In Fig 5b, the post-ReLU activation distribution is shown, but the post-Dropout activations are sparse and their histogram bars are not visible due to the high frequency of zero activations, an expected effect of Dropout regularization, which randomly disables neurons to prevent overfitting In the top-left sub-Fig (5a), the comparison histogram of Pre-BN and Post-BN activations of Layer 1 shows that Batch Normalization shifts the distribution to be centered and scaled about a consistent unit variance, also creating a tighter and more symmetric distribution, which is achieved through blank spacing, paving the way for efficient and stable training. The top-right sub-Fig (5b) illustrates the transition between ReLU and Dropout in Layer 1, where the application of ReLU led to the elimination of negative activations, which resulted in a sharp cutoff at zero, and dropout on the other hand caused a reduction in the frequency of activations due to random assignment of zeros resulting in some negative skimming. In the bottom-left sub-Fig (5c), a similar effect to Batch Normalization is depicted for Layer 2, where, once more, an attempt to stabilize. In the upper-right subplot (Fig 5d), the user’s predicted output leverages the output layer, with predicted house prices centered around the zero axis at the bottom. Finally, in the bottom right sub-Fig, the expected house circumstances are from the output layer. The bell-shaped histogram suggests that the network is correctly predicting through un-skewed detection of the error and hollow verging un-piling through all the processes and layers followed, deforming towards the self-center. Then, normalized features are fed into the attention mechanism, a robust neural network component designed to dynamically focus on the most pertinent aspects of the input feature set. The formulation functions by calculating an attention score for each feature vector using a compact, trainable neural network. Each feature vector ( is initially converted via a linear layer characterized by weights ( and bias (, succeeded by a non-linear activation function, commonly tanh, which encapsulates intricate relationships and guarantees smooth gradients. This transformation generates an intermediate representation, which is subsequently projected onto a trainable vector ʋ, yielding a scalar score ei for each input feature vector is computed as . Then computed values are processed by the SoftMax function to get the probability distribution value that is defined as attention weight . Here, represents that the feature importances confirm that high-informative features receive higher attention. After computing the , the context vector ( via the weighted sum of entire input features, which is defined as . The computed aggregates the relevant information used to learn how to categorize and prioritize features. The are learning parameters that are updated during training, helping to improve the overall system’s interpretability and robustness while exploring uneven feature importance. Finally, the output is computed from projected weight values ( and final feature representation ( which is represented as .

Download:

Fig 5. Analysis of activation changes

.

https://doi.org/10.1371/journal.pone.0335722.g005

Initializing a population of bats, each representing a proposed neural network parameter set (weights and biases), implements the bat optimization algorithm (BOA). Each bat starts with a random solution space position and velocity. Each bat’s velocity and position are updated depending on its frequency, the best global solution, and a stochastic component imitating echolocation behavior during each iteration. The neural network loss function guides the algorithm to optimal parameter values by assessing fitness. To balance global exploration and local exploitation, bats’ loudness and pulse emission rates adapt. When a threshold is exceeded, random walks around the best solution yield local solutions. This iterative method runs until maximum iterations or convergence. Empirically determining hyperparameters like population size, frequency range, loudness decay, and pulse rate dynamics using parameter tuning methods like the Taguchi method or preliminary trial experiments optimizes the balance between exploration and exploitation for efficient global search and fast convergence.

Then, the overall algorithm steps for the house price prediction process are shown in Table 1.

Download:

Table 1. Algorithm steps for house price prediction model.

https://doi.org/10.1371/journal.pone.0335722.t001

According to the above process, the output is predicted, and the loss function is estimated as to reduce the deviation between the output value. Using adjusting the parameters of the neural network as it is being trained, which is a process that is optimized using the bat optimization algorithm (BOA), the loss value of the neural network can be minimized. The echolocation behavior of microbats served as the starting point for the development of this metaheuristic algorithm. A candidate solution is represented by each bat in the algorithm. More specifically, each bat represents a distinct set of neural network parameters, which include weights and biases. In order to simulate the bats’ adaptive echolocation behavior, the bats “fly” through the solution space by changing their positions and velocities based on frequency tuning, loudness, and pulse emission rates. This allows the bats to move around the space. In the beginning, a population of bats is produced by using random parameter values at the beginning. During each iteration, the position of each bat, which corresponds to a neural network parameter vector, is changed in accordance with the bat’s velocity and frequency. In order to determine whether or not these parameters are suitable, a cost function, which is often the loss of the network (for example, mean squared error), is utilized.

For the purpose of striking a balance between exploration (seeking extensively in the parameter space) and exploitation (refining solutions locally), bats who have a higher level of fitness (lower loss) improve their pulse emission rate while simultaneously decreasing their loudness. Because of this adaptive management of behavior parameters, the method is able to avoid local minima and noise in the optimization landscape. As a result, it is able to effectively solve high-dimensional, non-linear, and complicated problems that are inherent to neural network training. As a result, the bat optimization algorithm dynamically directs the search towards ideal network parameters that minimize error, which ultimately results in an increase in convergence time and an improvement in model accuracy. Therefore, BOA is strongly integrated with the training of the neural network by treating parameter optimization as a global search problem. Additionally, bat echolocation metaphors translate into iterative improvement of the network’s weights and biases in order to efficiently minimize the loss function.

It has been beneficial in areas such as machine learning, feature selection, engineering design, and medical diagnostics, which involve solving problems from complex, dynamic systems in an optimal manner. Consider the search space has bats; each bat has a position (, velocity (, frequency (, loudness () and pulse emission rate (. These features are used to identify the current solution by exploring the search space in a specific direction at a speed corresponding to the respective echolocation rate and signal strength, which facilitates switching between global and local search. Hence, the first initialization is performed with respective characters, and frequency update is performed as . Then, bat position and velocity are updated using equation (5)

(5)

In equation (5), the current global best solution is defined as ; then the local search is performed with a local random walk that is defined as which helps to improve the fine-tuning process while predicting the best solution. Among the identified solutions, the best solution is predicted by examining the condition (6)

(6)

Afterwards, the solution pulse rate and loudness value are updated where, are constant. This process is performed continuously until the maximum number of iterations is reached. According to the working process of the bat optimization algorithm utilized here to improve the convergence efficiency and accuracy of house price prediction. BOA is employed to maximize the weights of the neural network rather than depending exclusively on gradient-based learning methods such as backpropagation. Each bat in the algorithm signifies a prospective solution vector associated with the weights and biases of the neural network. In each cycle, bats adjust their positions (solutions) by integrating global best knowledge with local search techniques derived from echolocation principles. Consider the bat position is which is related to the network weight and bias value. Then, the fitness value is estimated as the housing information is processed. During this analysis, bat , is updated to find the optimized output, and the updating process is done using equation (7)

(7)

From the computation, the random number is higher than the then, local searching is performed to get the current best global solution, which is done as and the solutions are accepted when it has effective fitness value. Then, the is applied to update the . This process is performed continuously to obtain the optimized weight value, which helps minimize the error rate and improve prediction accuracy. According to the discussion, the optimization algorithm steps are described in Table 2.

Download:

Table 2. Pseudocode for bat-optimized deep learning model.

https://doi.org/10.1371/journal.pone.0335722.t002

Table 2 algorithm embeds a bat algorithm within a neural network’s training cycle for house value estimation, thereby discarding traditional gradient-based optimization. This embedding enables the model to overcome local minima and effectively navigate the complex, high-dimensional loss surfaces characteristic of real estate datasets. The contribution lies in the swarm intelligence and adaptive dynamics of bats, which enable them to autonomously adjust network parameters, thereby eliminating the need for backpropagation. Furthermore, the application of attention mechanisms enhances feature discrimination, such as region-based attributes or property-specific features, allowing for a better capture of property value drivers. Coupled with dropout and batch normalization for regularization and training stability, the model demonstrates robust and generalizable predictive performance. This approach adapts to problems such as spatial heterogeneity and feature interactions in housing markets, thereby increasing accuracy and convergence performance. In essence, the proposed work develops an optimization-focused learning framework that enhances predictive and interpretive metrics on the model while addressing real-world pricing problems. Then, the efficiency of the introduced HPEM is evaluated using experimental results and discussions.

Key parameters including frequency, loudness, and pulse emission rates affect the bat algorithm’s performance. Suboptimal parameter settings can cause premature convergence or insufficient exploration, affecting solution quality and generalizability. While the algorithm performs well in the presented experiments, its population-based nature can increase computational overhead and slow convergence in very large, high-dimensional datasets, affecting scalability. These constraints emphasize the need for parameter adjustment and hybridization with other optimization methodologies while scaling the model.

4. Results and discussions

In this section, an elaborate analysis of the house price forecasting model utilizing an attention-based deep neural network customized via bat optimization algorithm is provided. The experiments were conducted using a synthetic yet realistic housing dataset comprising 1000 records and three main features: continuous land area, discrete zone type, and discrete number of bedrooms. The dataset was constructed to portray varying housing conditions, thus capturing spatial and categorical diversities. These processes include tuning critical hyperparameters using the bat optimization algorithm, which reduces the difference between computed and predicted outputs. Specifically, these keystone parameters were the learning rate , dropout rate (), and hidden layer size (). A population size of 30 bats was used as a starting point with a ceiling of 100 iterations. The frequency range assigned was 0–2, while soft bounding factors () decay of 0.9 and pulse emission growth () of 0.95 were also set. The architecture of the deep neural network consisted of two hidden layers, each containing batch normalization and ReLU activation units, as well as dropout units (30% and 20%) per layer for regularization. In this work, the HPEM efficiency is explored using the Housing Prices Dataset (https://www.kaggle.com/datasets/yasserh/housing-prices-dataset) [23]. The dataset consists of 1460 residential property information located in Ames and Iowa (2006 and 2010). The dataset comprises 80 features that encompass property location, characteristics, and sales conditions, enabling the prediction of house prices with maximum accuracy. Then, the visualization representation of the dataset is shown in Fig 6. The Kaggle Housing Prices Dataset was chosen for its widely documented real estate qualities, including house area, number of bedrooms, bathrooms, proximity to important roads, and other socio-economic elements that affect price prediction. Its diversified, clean dataset from a real-world market (Ames, Iowa, 2006–2010) is ideal for constructing and verifying predictive models. Normalization of numeric features, imputation or removal of missing values, categorical variable encoding, and modifying skewed distributions (such sale prices) are preprocessing methods for this dataset. The steps increased data quality, noise reduction, and model robustness for generalizable price prediction. The dataset’s vastness and variability allow complicated models like the study’s hybrid deep learning network with bat optimization to be trained. Its widespread application in housing price forecast research makes it suitable for benchmarking.

Download:

Fig 6. Continued.

https://doi.org/10.1371/journal.pone.0335722.g006

The 1,000-record synthetic dataset simulates real housing market conditions by include house area, number of bedrooms, bathrooms, furniture status, and accessibility to important roads. The dataset generation method used statistical sampling from distributions resembling real housing data attributes and dependence structures and regulated noise components to add complexity and variability. A non-linear function with weighted characteristics and Gaussian noise was used to calculate the house price to simulate market uncertainty. Regression task difficulty was maintained by controlling noise and feature interaction complexity, limiting near-perfect model fits from formulaic relationships. The synthetic dataset was validated by analyzing feature correlations, distribution similarities to genuine datasets, and baseline predictive modeling to ensure the target variable was not trivially predictable. This thorough construction makes the dataset useful for robustly evaluating the model’s performance in difficult prediction circumstances.

Figs 6a and 6b provide a clear exploration of the dataset’s numerical and categorical features related to price. The pair plot (a) reveals the distributions and pairwise relationships between numerical variables through scatterplots and kernel density estimation (KDE) curves, highlighting potential correlations. The boxplot grid (b) systematically compares price distributions across categorical features, using rotated labels for readability and consistent coloring for visual coherence. Both plots employ a clean, white grid style with subtle transparency to avoid clutter while emphasizing key patterns. The simplified color palette and standardized formatting ensure the visuals are both aesthetically pleasing and easy to interpret at a glance. In measuring performance, the proposed model yielded notable results when compared using the Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R-squared (R²) evaluation metrics. Then, the efficiency of the system is evaluated using existing methods, such as Deep Learning with a joint self-attention mechanism (DL-JSA), a longitudinal learning framework (LLF), and a Recurrent Neural Network (RNN).

Table 3 concisely presents important bat algorithm hyperparameters, reflecting their roles and typical values used to efficiently optimize the neural network parameters. Python 3.8 was used for HPEM research integrating deep learning and bat optimization. The neural network was built using TensorFlow 2.x or PyTorch 1.x, which supported model development and training. A bespoke implementation of the Bat Optimization Algorithm was utilized for optimization, reinforced by PySwarms or inspyred when needed. Data preprocessing and feature engineering using Pandas, NumPy, and Scikit-learn for normalization and missing values. Matplotlib and Seaborn helped visualize training outcomes. Jupyter Notebook, VSCode, and PyCharm were used for development. An Intel Core i7 or AMD Ryzen 7 processor, NVIDIA GPU with CUDA capability (RTX 3060 or higher), 16 GB RAM for data-intensive processes, and 256 GB SSD storage for quick data access were required. For software framework compatibility, Windows 10/11 64-bit or Ubuntu 20.04 LTS was suggested. These criteria made model construction, training, and validation efficient and reliable.

Download:

Table 3. Hyperparameter setting.

https://doi.org/10.1371/journal.pone.0335722.t003

The comparison of the House Price Evaluation Model (HPEM) against DL-JSA, LLF, and RNN reveals distinct technological advantages across all assessed measures, which is shown in Table 4. HPEM consistently demonstrates high predictive accuracy, as evidenced by its reduced MAE (0.60 ± 0.04 at 1,000 samples) and RMSE (0.88 ± 0.05), along with an almost perfect R² (0.98 ± 0.01), indicating a remarkable explanatory capability for housing price fluctuations. The superior performance is attributed to HPEM’s hybrid architecture, which presumably combines attention mechanisms with spatial or graph-based feature extraction, enabling it to more efficiently capture the intricate dynamics of the real estate market compared to isolated methods. The HPEM method retains its accuracy advantage while exhibiting greater parameter efficiency (3.8M parameters compared to DL-JSA’s 5.2M) and optimized GPU memory usage (1.6 GB versus 2.1 GB), indicating superior scalability for deployment. The statistical significance of these results is strengthened by narrow confidence intervals (e.g., ± 0.04 MAE), indicating resilience across various training iterations.

Download:

Table 4. Comparative analysis of HPEM.

https://doi.org/10.1371/journal.pone.0335722.t004

In terms of computational effectiveness, HPEM maintains a strategic equilibrium: its training time of 72.8 seconds at 1,000 samples is higher than RNN’s 25.3 seconds, but the 12–18% accuracy improvements for mission-critical tasks such as mortgage underwriting or tax assessment make the trade-off worthwhile. An RNN-dominated inference speed of 3.0 ms/sample is astonishing as it is further 19% quicker than DL-JSA while maintaining greater accuracy, which is typically subordinated to architectural optimizations like pruning or quantization. The consistent advantage of HPEM across sample data sizes ranging from 10 to 1,000 calls highlights the efficiency of the model’s learning processes, which is an absolute necessity in real-world situations where labeled housing data is limited. The RNN and LLF models branch out and assume more aggressive roles as useful benchmarks, with LLF targeting use cases that demand moderate accuracy coupled with quicker training at 34.6 seconds and the RNN model assuming the role of an edge device, where a stringent 2.4 ms latency and 1.0 GB memory consumption serve as hard limits. In addition, the HPEM’s high accuracy and operational efficiency (explaining 98% of the variance) could enable new applications in automated property valuation, portfolio risk analysis, and urban planning. Its parameter efficiency (26% lower than DL-JSA) allows deployment on mid-range GPU cloud instances, outperforming more complex alternatives. For industry adoption, the 0.60 MAE at scale indicates HPEM would likely curtail appraisal errors to below typical market fluctuation bounds—potentially reducing erroneous valuations by millions. Subsequent research should address the features of HPEM’s interpretability and its robustness to malicious data alterations – essential for regulatory scrutiny in financials. This benchmark establishes HPEM as a new state-of-the-art housing price prediction research and application baseline.

The findings offer three remarkable key insights: (1) HPEM outstands existing models (DL-JSA/LLF/RNN) with its hybrid architecture achieving unparalleled accuracy (R² = 0.98) due to the ingenious integration of attention mechanisms and spatial learning on both small and large datasets, and surpasses the accuracy of attention-based models. (2) It surpasses the accuracy of DL-JSA by over 12% R² while outperforming the latter’s efficiency claims by using 27% fewer parameters and 24% less GPU memory, thus breaking the trade-off between performance and efficiency. (3) The frame stands unparalleled in real-time inference speed (3.0ms), scale across dataset sizes (10−1,000 samples), and versatility for high-stakes asset valuation or edge-device deployment, making it unprecedented in the real-estate domain. In addition, the efficiency of the HPEM is further evaluated using qualitative metrics, and the obtained results are shown in Table 5.

Download:

Table 5. Comprehensive Qualitative Analysis of HPEM.

https://doi.org/10.1371/journal.pone.0335722.t005

The comparison analysis in Table 4 demonstrates HPEM’s enhanced interpretability and adaptability, achieving high scores in interpretability, in contrast to DL-JSA’s medium and LLF/RNN’s low scores, owing to its transparent hybrid architecture that integrates attention mechanisms with explicable feature extraction. HPEM exhibits unparalleled flexibility, enabling effortless adaptation to diverse housing markets and data types. At the same time, its generalization capability significantly surpasses that of DL-JSA, LLF, and RNN, confirming its efficacy across diverse datasets. Furthermore, HPEM has significant robustness to noisy data, equaling DL-JSA while exceeding LLF and RNN (medium), which is essential for real-world applications characterized by uneven data quality. These findings establish HPEM as a multi-faceted, dependable, and elucidative solution for real estate analytics.

Along with this, real-world gaps like regulatory needs, adaptability, and energy efficiency issues are addressed with the help of the features importance stability index (FISI), Cold-start adaption time (CSAT), energy per prediction (EPP), Out of distribution resilience score (OoD) and Human in the Loop (HIL). The obtained results are shown in Table 6.

Download:

Table 6. Efficiency Analysis of HPEM.

https://doi.org/10.1371/journal.pone.0335722.t006

The comparison reasoning tasks show HPEM’s architectural advancements and increased efficacy and robustness across advanced measures. FISI is 98.2%, exceeding DL-JSA (85%), RNN (80%), LightGBM (88%), TabPFN (90%), and TabLLM (91%). This proves HPEM’s hybrid attention-Shapley mechanism’s impartial feature weighting and regulatory compliance in real estate appraisal. Meta-learning components provide quick cross-market generalization, allowing HPEM to adapt in 2–3 cycles, twice as fast as DL-JSA’s 5 + cycles and equivalent to TabPFN and TabLLM in cold-start conditions. HPEM’s energy consumption per prediction (0.05 J) is slightly higher than RNNs (0.03 J) and LightGBM (0.06 J), but its better out-of-distribution (OOD) robustness (4.5% increase in error versus 15% in RNN and 5–6% in TabPFN/TabLLM/LightGBM) reduces erroneous pricing in volatile markets. HPEM’s modular architecture allows rapid evaluation feedback incorporation, reducing its human-in-the-loop update time to 1.5 hours, approximately three times faster than DL-JSA (4.2 hours). These results show HPEM’s advantages over leading tabular learning frameworks. These findings jointly establish HPEM as the ideal equilibrium of precision, flexibility, and operational efficacy. Although RNN and LLF excel in specific domains (energy efficiency or simplicity), HPEM’s comprehensive performance renders it optimal for production systems where reliability, regulatory compliance, and swift iteration are critical. The metrics demonstrate that quantifiable improvements in practical deployability and reliability warrant HPEM’s architectural complexity. Thus, the introduced HPEM effectively predicts house prices by exploring multidimensional data with minimal computational complexity and high efficiency.

Within the framework of the House Price Evaluation Model (HPEM) that was established, an ablation research was carried out in order to assess the contributions of important components. In the study, components of the architecture were deleted or modified in a methodical manner, and the impact on the accuracy of predictions and the interpretability of the results was examined. It was demonstrated that the significance of multi-modal data fusion was demonstrated by the fact that the exclusion of either the textual or image embedding modules independently resulted to a significant rise in prediction error. Due to the removal of the spatial embedding, the model’s ability to take into account the affects of the neighborhood was diminished, which ultimately led to a loss in performance. It was demonstrated that the bat optimizer had an advantage in navigating the complex loss landscape by the fact that replacing the bat optimization method with a typical optimizer like Adam resulted in slower convergence and a small increase in error. Last but not least, the absence of the hybrid attention-Shapley explanation layer resulted in a decrease in the stability and interpretability of the features, which had an effect on the transparency of the model but did not significantly alter the raw accuracy. These findings highlight the importance of utilizing multi-modal embeddings, spatial characteristics, metaheuristic optimization, and interpretability mechanisms in conjunction with one another in order to achieve the robust, accurate, and explainable predictions that are provided by HPEM.

5. Conclusion

The recommended House Price Evaluation Model (HPEM), augmented by a bat optimization algorithm with deep learning networks provides a reliable and precise framework for forecasting property prices, overcoming the significant shortcomings of traditional techniques. The integration of bat optimization with an attention-based neural architecture enables the model to highlight essential spatial and contextual characteristics effectively. The proposed system attained a Mean Absolute Error (MAE) of 2.8%, a Root Mean Square Error (RMSE) of 3.5%, and an R-squared (R²) score of 0.972, surpassing baseline models such as DL-JSA (MAE: 5.2%, RMSE: 6.3%, R²: 0.910), LLF (MAE: 6.8%, RMSE: 7.1%, R²: 0.894), and RNN (MAE: 8.7%, RMSE: 9.4%, R²: 0.862). Moreover, in out-of-distribution testing, HPEM exhibited enhanced robustness with a mere 4.5% rise in MAE, in contrast to DL-JSA (9.1%) and RNN (15%). The model demonstrated a high feature importance stability (FISI) of 98.2%, underscoring its interpretability and auditability. The method achieved an energy efficiency of 0.05 J per prediction and a swift cold-start adaptation time of 2–3 cycles, making it suitable for real-time implementation in dynamic markets. Nonetheless, the model’s efficacy may diminish when utilized in areas with scarce or incomplete datasets, indicating a reliance on well-trained data. Future research will investigate the integration of multi-modal data, encompassing satellite imagery and socio-economic indicators while utilizing explainable AI modules to enhance transparency, adaptability, and scalability in extensive implementations.

References

1. Adetunji AB, Akande ON, Ajala FA, Oyewo O, Akande YF, Oluwadara G. House Price Prediction using Random Forest Machine Learning Technique. Procedia Computer Science. 2022;199:806–13.
- View Article
- Google Scholar
2. Wei C, Fu M, Wang L, Yang H, Tang F, Xiong Y. The Research Development of Hedonic Price Model-Based Real Estate Appraisal in the Era of Big Data. Land. 2022;11(3):334.
- View Article
- Google Scholar
3. Enab D, Zawawi Z, Monna S. Sustainable Urban Design Model for Residential Neighborhoods Utilizing Sustainability Assessment-Based Approach. Urban Science. 2024;8(2):33.
- View Article
- Google Scholar
4. Pappil Kothandapani H. Social Implications of Algorithmic Decision-Making in Housing Finance: Examining the broader social impacts of deploying machine learning in lending decisions, including potential disparities and community effects. J Knowl Learn Sci Technol. 2025;4(1):78–97.
- View Article
- Google Scholar
5. Cheung KS. Real Estate Insights: Establishing transparency – setting AI standards in property valuation. JPIF. 2024;42(4):406–8.
- View Article
- Google Scholar
6. Özdilek Ü. From Bricks to Bytes: Transforming Real Estate into the Core Platform of the Digital Ecosystem. Platforms. 2024;2(4):165–79.
- View Article
- Google Scholar
7. Zaki J, Nayyar A, Dalal S, Ali ZH. House price prediction using hedonic pricing model and machine learning techniques. Concurrency and Computation. 2022;34(27).
- View Article
- Google Scholar
8. Abhyankar AA, Singla HK. Comparing predictive performance of general regression neural network (GRNN) and hedonic regression model for factors affecting housing prices in “Pune-India”. IJHMA. 2021;15(2):451–77.
- View Article
- Google Scholar
9. Mankad MD. Comparing OLS based hedonic model and ANN in house price estimation using relative location. Spat Inf Res. 2021;30(1):107–16.
- View Article
- Google Scholar
10. Gao G, Bao Z, Cao J, Qin AK, Sellis T. Location-Centered House Price Prediction: A Multi-Task Learning Approach. ACM Trans Intell Syst Technol. 2022;13(2):1–25.
- View Article
- Google Scholar
11. Al-Yadumi S, Xion TE, Wei SGW, Boursier P. Review on Integrating Geospatial Big Datasets and Open Research Issues. IEEE Access. 2021;9:10604–20.
- View Article
- Google Scholar
12. Mostofi F, Toğan V, Başağa HB. Real-estate price prediction with deep neural network and principal component analysis. Organization, Technology and Management in Construction: an International Journal. 2022;14(1):2741–59.
- View Article
- Google Scholar
13. Wang P-Y, Chen C-T, Su J-W, Wang T-Y, Huang S-H. Deep Learning Model for House Price Prediction Using Heterogeneous Data Analysis Along With Joint Self-Attention Mechanism. IEEE Access. 2021;9:55244–59.
- View Article
- Google Scholar
14. Kang Y, Zhang F, Peng W, Gao S, Rao J, Duarte F, et al. Understanding house price appreciation using multi-source big geo-data and machine learning. Land Use Policy. 2021;111:104919.
- View Article
- Google Scholar
15. Peng H, Li J, Wang Z, Yang R, Liu M, Zhang M, et al. Lifelong Property Price Prediction: A Case Study for the Toronto Real Estate Market. IEEE Trans Knowl Data Eng. 2021;:1–1.
- View Article
- Google Scholar
16. Kalinga E, Nyamle E, Abdalla A. Modelling Prediction of Cities Real Estate Price Trend Using Recurrent Neural Network: A Case of Dar es Salaam City. TJET. 2023;42(3):64–77.
- View Article
- Google Scholar
17. Kütük Y. Multidimensional house price prediction with SOTA RNNs. International Journal of Strategic Property Management. 2024;28(6):411–23.
- View Article
- Google Scholar
18. Mostofi F, Toğan V, Başağa HB, Çıtıpıtıoğlu A, Tokdemir OB. Multiedge Graph Convolutional Network for House Price Prediction. J Constr Eng Manage. 2023;149(11).
- View Article
- Google Scholar
19. Hasan MH, Jahan MA, Ali ME, Li YF, Sellis T. A multi-modal deep learning based approach for house price prediction. 2024. https://arxiv.org/abs/2409.05335
- View Article
- Google Scholar
20. Zhao Y, Zhao J, Lam EY. House Price Prediction: A Multi-Source Data Fusion Perspective. Big Data Min Anal. 2024;7(3):603–20.
- View Article
- Google Scholar
21. Kalliola J, Kapočiūtė-Dzikienė J, Damaševičius R. Neural network hyperparameter optimization for prediction of real estate prices in Helsinki. PeerJ Comput Sci. 2021;7:e444. pmid:33977129
- View Article
- PubMed/NCBI
- Google Scholar
22. Jarrah M, Abu-Khadrah A. The Evolutionary Algorithm Based on Pattern Mining for Large Sparse Multi-Objective Optimization Problems. PIQM. 2024.
- View Article
- Google Scholar
23. Kaggle. Housing prices dataset. n.d. https://www.kaggle.com/datasets/yasserh/housing-prices-dataset

[ref1] 1. Adetunji AB, Akande ON, Ajala FA, Oyewo O, Akande YF, Oluwadara G. House Price Prediction using Random Forest Machine Learning Technique. Procedia Computer Science. 2022;199:806–13.
View Article
Google Scholar

[2] View Article

[3] Google Scholar

[ref2] 2. Wei C, Fu M, Wang L, Yang H, Tang F, Xiong Y. The Research Development of Hedonic Price Model-Based Real Estate Appraisal in the Era of Big Data. Land. 2022;11(3):334.
View Article
Google Scholar

[5] View Article

[6] Google Scholar

[ref3] 3. Enab D, Zawawi Z, Monna S. Sustainable Urban Design Model for Residential Neighborhoods Utilizing Sustainability Assessment-Based Approach. Urban Science. 2024;8(2):33.
View Article
Google Scholar

[8] View Article

[9] Google Scholar

[ref4] 4. Pappil Kothandapani H. Social Implications of Algorithmic Decision-Making in Housing Finance: Examining the broader social impacts of deploying machine learning in lending decisions, including potential disparities and community effects. J Knowl Learn Sci Technol. 2025;4(1):78–97.
View Article
Google Scholar

[11] View Article

[12] Google Scholar

[ref5] 5. Cheung KS. Real Estate Insights: Establishing transparency – setting AI standards in property valuation. JPIF. 2024;42(4):406–8.
View Article
Google Scholar

[14] View Article

[15] Google Scholar

[ref6] 6. Özdilek Ü. From Bricks to Bytes: Transforming Real Estate into the Core Platform of the Digital Ecosystem. Platforms. 2024;2(4):165–79.
View Article
Google Scholar

[17] View Article

[18] Google Scholar

[ref7] 7. Zaki J, Nayyar A, Dalal S, Ali ZH. House price prediction using hedonic pricing model and machine learning techniques. Concurrency and Computation. 2022;34(27).
View Article
Google Scholar

[20] View Article

[21] Google Scholar

[ref8] 8. Abhyankar AA, Singla HK. Comparing predictive performance of general regression neural network (GRNN) and hedonic regression model for factors affecting housing prices in “Pune-India”. IJHMA. 2021;15(2):451–77.
View Article
Google Scholar

[23] View Article

[24] Google Scholar

[ref9] 9. Mankad MD. Comparing OLS based hedonic model and ANN in house price estimation using relative location. Spat Inf Res. 2021;30(1):107–16.
View Article
Google Scholar

[26] View Article

[27] Google Scholar

[ref10] 10. Gao G, Bao Z, Cao J, Qin AK, Sellis T. Location-Centered House Price Prediction: A Multi-Task Learning Approach. ACM Trans Intell Syst Technol. 2022;13(2):1–25.
View Article
Google Scholar

[29] View Article

[30] Google Scholar

[ref11] 11. Al-Yadumi S, Xion TE, Wei SGW, Boursier P. Review on Integrating Geospatial Big Datasets and Open Research Issues. IEEE Access. 2021;9:10604–20.
View Article
Google Scholar

[32] View Article

[33] Google Scholar

[ref12] 12. Mostofi F, Toğan V, Başağa HB. Real-estate price prediction with deep neural network and principal component analysis. Organization, Technology and Management in Construction: an International Journal. 2022;14(1):2741–59.
View Article
Google Scholar

[35] View Article

[36] Google Scholar

[ref13] 13. Wang P-Y, Chen C-T, Su J-W, Wang T-Y, Huang S-H. Deep Learning Model for House Price Prediction Using Heterogeneous Data Analysis Along With Joint Self-Attention Mechanism. IEEE Access. 2021;9:55244–59.
View Article
Google Scholar

[38] View Article

[39] Google Scholar

[ref14] 14. Kang Y, Zhang F, Peng W, Gao S, Rao J, Duarte F, et al. Understanding house price appreciation using multi-source big geo-data and machine learning. Land Use Policy. 2021;111:104919.
View Article
Google Scholar

[41] View Article

[42] Google Scholar

[ref15] 15. Peng H, Li J, Wang Z, Yang R, Liu M, Zhang M, et al. Lifelong Property Price Prediction: A Case Study for the Toronto Real Estate Market. IEEE Trans Knowl Data Eng. 2021;:1–1.
View Article
Google Scholar

[44] View Article

[45] Google Scholar

[ref16] 16. Kalinga E, Nyamle E, Abdalla A. Modelling Prediction of Cities Real Estate Price Trend Using Recurrent Neural Network: A Case of Dar es Salaam City. TJET. 2023;42(3):64–77.
View Article
Google Scholar

[47] View Article

[48] Google Scholar

[ref17] 17. Kütük Y. Multidimensional house price prediction with SOTA RNNs. International Journal of Strategic Property Management. 2024;28(6):411–23.
View Article
Google Scholar

[50] View Article

[51] Google Scholar

[ref18] 18. Mostofi F, Toğan V, Başağa HB, Çıtıpıtıoğlu A, Tokdemir OB. Multiedge Graph Convolutional Network for House Price Prediction. J Constr Eng Manage. 2023;149(11).
View Article
Google Scholar

[53] View Article

[54] Google Scholar

[ref19] 19. Hasan MH, Jahan MA, Ali ME, Li YF, Sellis T. A multi-modal deep learning based approach for house price prediction. 2024. https://arxiv.org/abs/2409.05335
View Article
Google Scholar

[56] View Article

[57] Google Scholar

[ref20] 20. Zhao Y, Zhao J, Lam EY. House Price Prediction: A Multi-Source Data Fusion Perspective. Big Data Min Anal. 2024;7(3):603–20.
View Article
Google Scholar

[59] View Article

[60] Google Scholar

[ref21] 21. Kalliola J, Kapočiūtė-Dzikienė J, Damaševičius R. Neural network hyperparameter optimization for prediction of real estate prices in Helsinki. PeerJ Comput Sci. 2021;7:e444. pmid:33977129
View Article
PubMed/NCBI
Google Scholar

[62] View Article

[63] PubMed/NCBI

[64] Google Scholar

[ref22] 22. Jarrah M, Abu-Khadrah A. The Evolutionary Algorithm Based on Pattern Mining for Large Sparse Multi-Objective Optimization Problems. PIQM. 2024.
View Article
Google Scholar

[66] View Article

[67] Google Scholar

[ref23] 23. Kaggle. Housing prices dataset. n.d. https://www.kaggle.com/datasets/yasserh/housing-prices-dataset

RETRACTED: Optimization of house price evaluation model based on multi-source geographic big data and deep neural network

RETRACTED: Optimization of house price evaluation model based on multi-source geographic big data and deep neural network

Retraction

Figures

Abstract

1. Introduction

2. Related works

3. House Price Evaluation Model (HPEM)

3.1 Data collection

3.2 Ethics statement

3.3 Data preparation and processing

3.4 Feature engineering

3.5 House price prediction

4. Results and discussions

5. Conclusion

References