Kernel-based geographically and temporally weighted autoregressive model for house price estimation

Spatiotemporal nonstationarity and autocorrelation are two crucial points in modeling geographical data. Previous studies have demonstrated that geographically and temporally weighted autoregressive (GTWAR) model accounts for both spatiotemporal nonstationarity and autocorrelation simultaneously to estimate house prices. Therefore, this paper proposes a kernel-based GTWAR (KBGTWAR) model by incorporating the basic principle of support vector machine regression into spatially and temporally varying coefficients model. The efficacy of KBGTWAR model is demonstrated through a case study on housing prices in the city of Shenzhen, China, from year 2004 to 2008. Comparing the existing models, KBGTWAR model obtains the lowest value for the residual sum of squares (RSS) and the highest value for the coefficient of determination R2. Moreover, KBGTWAR model improves the goodness of fit of the existing GTWAR model from 12.0 to 4.5 in terms of RSS, from 0.914 to 0.968 in terms of R2 and from 3.84 to 4.45 in terms of F-statistic. The results show that KBGTWAR model provides a comparatively high goodness of fit and sufficient explanatory power for both spatiotemporal nonstationarity and autocorrelation. The results of this study demonstrate that the proposed KBGTWAR model can be used to effectively formulate polices for real estate management.


Introduction
Analysis of relationships between an output variable and input variables in spatiotemporal fields has recently attracted attention in a data analysis community. Spatiotemporal models occur when data are obtained across time as well as space. Spatial or temporal autocorrelation exists between the observations. In addition, spatial or temporal nonstationarity arises in the relationships. Ordinary regression models have neglected these issues that violate the standard Gauss-Markov assumptions. Spatial econometrics models have been proposed to deal with these issues. Spatiotemporal regression models can contribute to better understanding complex phenomena studied in geographical information science and other fields.
Spatial models not only can account for spatial autocorrelation but also can deal with spatial lag dependence or/and spatial error dependence. There are two approaches for handling spatial autocorrelation. The first is the weight matrix approach, which uses a spatial weight matrix to deal with the spatial relationships between observations. This approach is based on the work of [1]. See for details [2] and [3]. The second is the geostatistical approach, which directly models the covariance matrix of the error terms. This approach is based on the work of [4]. See for details [5] and [6]. Moreover, [3] proposed a spatiotemporal autoregressive (STAR) model to incorporate temporal correlations between observations and found it powerful in the context of residential real estate. [7] developed a two-order STAR model with a Bayesian procedure to effectively detect and correct heteroscedasticity among the residuals.
The spatial models aforementioned have made remarkable contributions to effectively applying spatial and temporal process to regression modeling. However, these models assumed that a global level relationship exists across the study area. By the way, the assumption of stationary or structural stability over space and time is often impractical, because parameters tend to change depending on the study area. To deal with spatial heterogeneity in housing markets, a variety of localized modeling techniques have been proposed. As a result, [8] and [9] demonstrated the usefulness of locally weighted regression in modeling nonmonocentric cities. [10], [11] and [12] proposed geographically weighted regression (GWR), which has grown in popularity amongst real estate modeling methods. GWR model is a local spatial model for exploring spatial nonstationarity. To capture both spatial nonstationarity and spatial autocorrelation in a complex process, [13] proposed a GWR with spatially lagged variables to utilize the spatial variations. But this model did not consider temporal information. To model the effects of temporal heterogeneity, [14] and [15] proposed a geographically and temporally weighted regression (GTWR) to simultaneously capture both spatial and temporal nonstationarity by integrating temporal effects into the traditional GWR model. [16] developed a GTWR model based on travel time distance metrics. But these GTWR models do not consider spatial autocorrelation. [17] asserted that it is better to deal with both spatial and temporal heterogeneity along with spatial autocorrelation effects in a mixed model. It is because spatial heterogeneity and autocorrelation are generally related in the context of modeling although the two problems are theoretically distinguished. However, it is not easy to build an integrated model with spatiotemporal autocorrelated and heterogeneous effects. [18] proposed a geographically and temporally weighted autoregressive (GTWAR) model to deal with spatiotemporal variations. In this paper we propose an efficient kernel-based GTWAR (KBGTWAR) model by incorporating the basic principle of support vector machine (SVM) regression into spatially and temporally varying coefficients model. SVM, first developed by [19] and his group at AT&T Bell Laboratories, has been successfully applied to a number of real world problems related to classification and regression problems.
The rest of this paper is organized as follows. Section 2 briefly describes the basic principle of GTWR and GTWAR models. Section 3 proposes an efficient KBGTWAR model. Section 4 and section 5 present case study and conclusion, respectively.

GTWAR model
In this section we briefly review the basic framework for GTWR and GTWAR models. We also review how to determine the adjustment parameters related with these two models.

GTWR model
The GWR model is a spatially varying coefficient regression approach for exploring spatial nonstationarity of a regression relationship for spatial data [10,11,12]. The GWR model extends the traditional linear regression model by allowing local rather than global parameters to be estimated. [14] developed a GTWR model to deal with spatial and temporal nonstationarity simultaneously by integrating temporal effects into the GWR model. [15] developed a GTWR model focusing on spatiotemporal kernel function definition and spatiotemporal bandwidth optimization. [16] developed a GTWR model based on non-Euclidean travel distance. The GTWR model can be expressed as follows: indicates the slope for each variable k and each space-time point i, and i represents the random error with no correlation between different points. Here, We notice that the GTWR model (1) is a spatially and temporally varying coefficient model. For a given data set, these local parameters are estimated using the weighted least square procedure. The relevant weights W ij , for j = 1, Á Á Á, n, indicate the proximity of each data point to the point (u i , v i , t i ). Let be the vector of the local parameters for the space-time point i. Here, the superscript t represents the transpose of vector or matrix. Then, where X is the n × (d + 1) matrix of input variables, y is the n-dimensional vector of output variable, and W(u i , v i , t i ) is an n × n weighting matrix of the form In addition, the fitted valueŷ is obtained as follows: . .
x a n ðX t Wðu n ; v n ; t n ÞXÞ À 1 X t Wðu n ; v n ; t n Þ where x a i is the ith row of the matrix X such that x a i ¼ ð1; x 1i ; Á Á Á ; x di Þ. The weights W ij are usually obtained through an adaptive kernel function. The adaptive kernel function attempts to adjust for the density of data points. This adaptive kernel function could use the same number of observed points in each local neighborhood set. The most commonly used adaptive kernel function is the Gaussian function where d ij is a spatiotemporal distance between points i and j and h is a nonnegative parameter known as bandwidth, which produces a decay of influence with distance. When the spatial and temporal distances between points i and j are given by we can construct the spatiotemporal distance as as a linear combination between ðd S ij Þ 2 and ðd T ij Þ 2 as follows: where μ S represents the scale factor of spatial distance and μ T represents the scale factor of temporal distance. Thus, the weights W ij can be expressed as are the parameters of spatiotemporal, spatial, and temporal bandwidths, respectively. Thus, if the spatial and temporal bandwidths are determined, the weight matrix [18] developed the improved GTWR (IGTWR) with the following spatiotemporal distance where ν 2 [0, π]. The adjustment parameters μ S , μ T and ν should be determined.

GTWAR model
To account for both spatiotemporal heterogeneity and spatial autocorrelation effects simultaneously, [18] developed a GTWAR, which combines the IGTWR model with the autocorrelation regression model. Spatial autocorrelation is considered by introducing the spatial lag P n j¼1 " W ij y j in a linear regression relationship [20].
where ρ i represents a spatial autoregressive parameter varying across geographical locations, i denotes the random error in the relationship, and " W ij is the element in the ith row and jth column of the n × n spatial weight matrix " W with " W ii ¼ 0. The elements " W ij are typically rownormalized, such that for each i, P n j¼1 " W ij ¼ 1. Consequently, the spatial lag may be interpreted as a weighted average of the neighbors. We notice that the model (11) is a locally based autoregressive model. Incorporating (11) into the IGTWR model, [18] proposed the following where ρ(u i , v i , t i ) is a scalar spatiotemporal autoregressive parameter at point i.

Model selection
Parameter estimation in GTWR and GTWAR is highly dependent on the adjustment parameters μ S , μ T and/or ν associated with the weighting function used. The selection of the adjustment parameters for GTWR and GTWAR can be determined using a cross validation (CV) approach or the corrected Akaike information criterion (AIC) from [12]. Since the adaptive kernel function generally uses the same number q of the nearest observed points, there is one more adjustment parameter. To obtain an optimal value of q, the CV or corrected AIC approach could be used. [14] argue that only the parameter ratio τ = μ T /μ S plays an important role in constructing weights. Hence, they set μ S = 1 to reduce the number of adjustment parameters, and so only three parameters, q, μ T and ν.
The CV function is defined as where λ is the vector of adjustment parameters associated with the GTWR or GTWAR model andŷ ðÀ iÞ ðλÞ is the fitted value of y i with the observation i omitted from the calibration process. The corrected AIC function is defined according to [21] as follows: where n is the number of data points in data set, RSS is the residual sum of squares defined as RSSðλÞ ¼ P n i¼1 ðy i Àŷ i Þ 2 , and tr(H(λ)) is the trace of the hat matrix H(λ) associated with the GTWR or GTWAR model with the adjustment parameter vector λ, which satisfies the relationshipŷ ¼ HðλÞy. See (5) for the hat matrix related with the GTWR model and [18] for the hat matrix related with the GTWAR model. Here log denotes the natural logarithm. The adjustment parameters are achieved automatically with an optimization technique by minimizing the CV function (13) or the corrected AIC function (14).

KBGTWAR model
In this section we review SVM regression and illustrate how to develop KBGTWAR model. The underlying idea of KBGTWAR model is that the true mean specification is approximated by combining linear SVM regression with nonlinear feature mapping function of the coordinate vector (u i , v i , t i ) of the observed point i.

SVM regression
The foundations of SVM have been originally proposed by [19] and are gaining popularity due to many attractive features, and promising empirical performance. We now briefly review SVM regression. See for details [22]. Suppose we are given the data set D ¼ fðx i ; y i Þg n i¼1 with each covariate vector x i 2 R d and the output y i 2 R. We basically illustrate the case of the linear SVM regression, taking the form where b is the bias term. The goal of the linear SVM regression is to find a linear regression function f ðxÞ that approximates all pairs ðx i ; y i Þ with precision and is simultaneously as flat as possible. Flatness means that we seek a small w. One way to guarantee this is to minimize the norm k w k 2 . To make it feasible, we introduce slack variables x i ; x Ã i representing upper and lower constraints on the outputs. This leads to the convex optimization problem The regularization parameter C > 0 determines the trade-off between the flatness of f and the fitting errors.
The key idea is to construct the primal Lagrange function where a i ; a Ã i ; Z i ; Z Ã i ! 0 are Lagrange multipliers. The conditions for optimality are given by Here, we refer to α i and a Ã i by a ðÃÞ i . Substituting (18), (19) and (20) into (17) yields the dual optimization problem.
Solving the above optimization problem determines the Lagrange multipliersâ i ;â Ã i . Thus the optimal regression function can be rewritten as follows: Here, the optimal value of b is obtained by employing Karush-Kuhn-Tucker [23] conditions as where n s is the size of I sv ¼ fi ¼ 1; 2; Á Á Á ; n : 0 < ja i À a Ã i j < Cg. We now consider how to make the linear SVM regression algorithm nonlinear. This could be achieved by simply preprocessing the the input vectors x i by a nonlinear feature mapping function 0 : R d ! F into some feature space F , and then applying the linear SVM regression algorithm. Thus, we only need to use the kernel trick K(x i , x j ) = ϕ(x i ) t ϕ(x j ) in the Eqs (21), (22) and (23) associated with the linear SVM regression algorithm [24]. We never need to know explicitly what ϕ is. The most popular kernel is Gaussian kernel defined by where κ > 0 is the kernel parameter.

KBGTWAR model
, covariate vector x i 2 R d and the output y i 2 R, we consider the following GTWAR model reexpressed from (12): where x 0i = 1 and x ki is the kth component of x i = (x 1i , Á Á Á, x di ) t for k = 1, Á Á Á, d. For simplicity, we use the spatiotemporal weights matrix " W constructed from q-nearest neighbors based on the following modified spatiotemporal distance When the observation j is one of q nearest neighbors of the observation i, " To develop KBGTWAR model, we apply the basic principle of the linear SVM regression to the GTWAR model (12) after preprocessing the coordinate vectors u i by a nonlinear feature mapping function ϕ into some feature space F . Thus, we first assume that ρ(u i ) and β k (u i ) for k = 0, 1, Á Á Á, d are nonlinearly related to the coordinate vector u i such that ρ( where w and w k are the weight vectors of dimension d h corresponding to ϕ(u i ). Here, ϕ is defined in an implicit way. An inner product in feature space has an equivalent kernel such that K(u i , u j ) = ϕ (u i ) t ϕ (u j ), provided certain conditions hold [24]. Several choices of the kernel function are possible. As mentioned before, Gaussian kernel is most widely used. Thus, we focus on the choice of an Gaussian kernel (24) in the sequel.
Using the basic idea of the linear SVM regression, we define the convex optimization problem: where the constant C > 0 is the regularization parameter and we use the notation " W i to refer to the ith row of matrix " W . For simplicity we set up the size of the insensitive zone to zero. Now we are going to construct the primal Lagrange function for the case where ρ 2 [0, 1] or ρ 2 [−1, 1]. For the case of ρ 2 [0, 1], the Lagrange function is constructed as follows: where a i ; a Ã i ; Z i ; Z Ã i ; n i ; n Ã i are the Lagrange multipliers. For the case of ρ 2 [−1, 1], the Lagrange function is constructed as follows: where a i ; a Ã i ; Z i ; Z Ã i ; n i ; n Ã i are the Lagrange multipliers. It is understood that the dual variables in (28) and (29) have to satisfy positivity constraints, i.e., a i ; a Ã i ; Z i ; Z Ã i ; n i ; n Ã i ! 0. It follows from the saddle point condition that the partial derivatives of L with respect to the primal variables ðw; w k ; b; b k ; x i ; x Ã i Þ have to vanish for optimality.
Classical Lagrangian duality enables the primal problem to be transformed to their dual problem. Substituting For the case of ρ 2 [−1, 1], the dual optimization problem is obtained as follows: Solving the above quadratic programming (QP) problem (36) or (37) with the constraints determines the optimal Lagrange multipliersâ i ;â Ã i andn i ;n Ã i . Thus, the estimated weight vectorsŵ andŵ k are obtained, respectively as follows: Thus, for a point u i associated with the training data set D, ρ(u i ) and β k (u i ) are obtained, respectively as follows: where K i is the ith row of the kernel matrix K whose elements are ϕ(u i ) t ϕ(u j ) = K(u i , u j ), X k is the kth column of the n × (d + 1) design matrix X, and is the Hadamard product.
Here,b andb k can be determined by the linear regression with the input vector ð " W i y; X i Þ t and the output variable which is, where I sv ¼ fi ¼ 1; 2; Á Á Á ; n : 0 < ja i À a Ã i j < C or jn i À n Ã i j > 0g is obtained by exploiting Karush-Kuhn-Tucker conditions [23]. That is, the estimated values ofŷ is obtained as follows: whereρ ¼ ðrðu 1 Þ; Á Á Á ;rðu n ÞÞ t andβ k ¼ ðb k ðu 1 Þ; Á Á Á ;b k ðu n ÞÞ t .

Model selection
The functional structure of the KBGTWAR model is characterized by the regularization parameter C, the kernel parameter κ and the number q of the nearest neighbors. These hyperparameters will affect the final model complexity. We now illustrate the model selection method which determines the optimal values of these hyperparameters of the KBGTWAR model. To choose these hyperparameters we utilize the AIC function which is defined according to [25] as follows: where λ is the vector of hyperparameters, RSSðλÞ ¼ P n i¼1 ðy i Àŷ i Þ 2 ,ŝ 2 ðλÞ denotes the estimate of noise variance defined asŝ 2 ðλÞ ¼ 1 nÀ d RSSðλÞ, and d is the number of free parameters, i.e., the size of I sv ¼ fi ¼ 1; 2; Á Á Á ; n : 0 < ja i À a Ã i j < C or jn i À n Ã i j > 0g.

Case study
This section illustrates the performance of the proposed KBGTWAR model using house price data collected from 2001 to 2008 in Shenzhen, China. This data set was firstly used in [18]. Shenzhen is a special economic zone, which is situated in Guangdong Province immediately north of Hong Kong Special Administrative Region. This city forms part of the Pearl River Delta megalopolis. There are six administrative districts in Shenzhen, which are Luohu, Nanshan, Futian, Yantian, Bao'an and Longgang. By the way, the last two districts are not situated in the Special Zone. See for further details [18]. The house prices of Shenzhen continue to increase at an alarming rate due to rapid industrialization and urbanization. The study data on house prices were provided by the Shenzhen municipal bureau of land resources and housing management. From the study area 406 observations are available. [18] reported that for empirical study thirteen input variables were used, but only six of them were statistically significant at the 90% confidence level according to their t-probabilities. The six input variables used in this study are land area (LANDA), distance from the nearest major road (DROAD), quality (QUAL), land price (LPRICE), property management spend (MAGT), and proximity to bus (TRAFF). The input variables and (u, v, t) coordinate variables are standardized such as (z − min(z))/(max(z) − min(z)). As in [26], a recent selling price is used as output variable, which stands as a proxy for the market value of the house. In fact, the logarithm of recent selling price is considered as output variable.
According to [18], there exist spatiotemporal autocorrelations in the output variable. Therefore, it is appropriate to use GTWAR-based models for this house price data set. For comparison, the global OLS model, the spatial autoregressive (SAR) model and three different GWRbased models including GWR, GTWAR and KBGTWAR are implemented using the same data set. The OLS and SAR models are employed to analyze the house price data by considering space coordinates and time as exogenous variables. GTWAR and KBGTWAR models are used to analyze the house price data with spatial and temporal considerations.
Using the cross validation (CV) technique, [18] determined that the optimal number of the nearest neighbors for GWR and GTWAR models is q = 51 and q = 44, respectively. In this paper we obtain q = 53 using the AIC technique for KBGTWAR model. As mentioned before, the spatiotemporal weights matrix " W for KBGTWAR model is constructed using the spatiotemporal distance (26) instead of the spatiotemporal distance in [18]. Using this " W , we obtain the value of Moran's I for KBGTWAR model, which is 0.1105. This value indicates that spatiotemporal autocorrelation is positive and thus we need to use KBGTWAR model under the condition ρ 2 [0, 1]. We examine parameter estimates for models under consideration and goodness of fit in terms of RSS and R 2 . By comparing RSS and R 2 values, KBGTWAR gives significantly better fit of data than OLS, SAR, GWR and GTWAR models. The proposed KBGTWAR model provides the bigger F-statistic value than GWR and GTWAR models. Therefore, this indicates that it is more appropriate to model this particular data set with nonlinear local KBGTWAR model.
We now investigate the spatial and temporal variations of three selected parameters, i.e., autoregressive parameter ρ, DROAD and QUAL coefficients.

Conclusions
In this paper, we proposed the KBGTWAR model to simultaneously account for spatiotemporal nonstationarity and autocorrelation that exist in the house prices. To devise KBGTWAR model we applied kernel technique to spatially and temporally varying coefficients and then utilized the basic principle of linear SVM to estimate the relevant parameters. Unlike GTWAR model, KBGTWAR model is basically nonlinear. Therefore, KBGTWAR model can deal with complex nonlinear trends, which are very common in spatiotemporal phenomena. KBGTWAR model can also appropriately explore the dynamic relationship between the output variable and the input variables, since this model is based on varying coefficients model which efficiently describes dynamic patterns of a regression relationship. KBGTWAR model takes over all advantages of SVM and varying coefficients model that capture nonlinearities in the data, that have good prediction ability, and that are useful tools when the functional form of the relationship between the output variable and the input variables is left unspecified. In particular, KBGTWAR model works well under settings without strong assumptions on the distribution of the data. KBGTWAR model is also interpretable, since varying coefficients model is meaningfully interpretable.
However, as with all SVM-related models, KBGTWAR model requires lots of computing time in determining the optimal hyperparameters, and has serious computational problem for large data because it has to solve a large-scale quadratic programming problem to get the values of relevant parameters. These are disadvantages associated with KBGTWAR model. This paper analyzed data reflecting the spatiotemporal nonstationarity and autocorrelation of house prices. OLS, SAR, GWR, GTWAR and KBGTWAR models used in the study were built based on house price data collected between 2001 and 2008 in the city of Shenzhen, China. The performances of these models were then compared based on RSS, R 2 and F-statistic. This paper demonstrates that the proposed KBGTWAR model provides good results in goodness of fit for the given example. To conclude, we proposed a more efficient KBGTWAR model to account for spatiotemporal nonstationarity and autocorrelation simultaneously.