A Method of DTM Construction Based on Quadrangular Irregular Networks and Related Error Analysis

A new method of DTM construction based on quadrangular irregular networks (QINs), which considers all the original data points and has a topological matrix, is presented. A numerical test and a real-world example are used to comparatively analyse the accuracy of QINs against classical interpolation methods and other DTM representation methods, including SPLINE, KRIGING and triangulated irregular networks (TINs). In the numerical test, the QIN method is the second-most accurate of the four methods. In the real-world example, DTMs are constructed using QINs and the three classical interpolation methods, and the QIN method proves the most accurate method tested. The difference in accuracy rank appears to be caused by the locations of the sampled data points. Although the QIN method has drawbacks, it is a viable alternative for DTM construction.


Introduction
A digital terrain model (DTM) is an ordered set of sampled data points that represent the spatial distribution of various types of terrain information [1][2][3]; among these types of information, elevation is one of the most important. When the terrain information is one-dimensional and contains only elevation, the model is equivalent to a digital elevation model (DEM). A DEM is thus a subset of a DTM and its most fundamental component. In this paper, elevation is the only terrain information we are interested in. DTMs can be generated from many sources of data: field surveys, existing topographic maps, stereo aerial photographs (at various scales) or satellite images [4]. The sampling pattern determines the method used for the construction of the DTM. There are two competing methods used to represent terrain: regular grids (gridded DEMs) and triangulated irregular networks (TINs). Scattered elevation points, which are one of the main data sources for DTMs, can be converted to TINs directly using linear interpolation [5] or converted into regularly gridded data points to construct gridded DEMs [6]. However, defects in the computational efficiency and accuracy of both methods have long been concerns.
The TIN method is considered optimal for constructing a DTM from irregularly located point elevations. A TIN consists of a set of triangular facets defined by three edges, with each edge bounded by two vertices [7]. For most existing TIN-building methods, the task of the user is to select a set of points that best approximates the terrain surface according to a series of building criteria. For scattered point elevations, each point is used as a vertex of at least three triangles. The TIN method is known for its efficiency of data storage and the simplicity of its data structure for accommodating irregularly spaced elevation points [8]. However, the TIN method sometimes requires a large amount of storage space to maintain the topology among the elements, which lowers the speed of the algorithm. This is especially true for extensive landscapes [9], which are inevitable in some applications of DTM construction, such as hydrological analysis [10][11][12].
A gridded DEM represents the terrain surface by a regular square grid of sample elevation, gives a uniform intensity of sampling and is also known as the "altitude matrix" [13]. Gridded DEMs are used more frequently because of the wide availability and relatively low cost of data storage they provide. Prominent examples of gridded DEMs are the USGS 7.5-minute grid DEM and the SRTM data. Grid DEMs require much less computer storage allocation because only the elevation data at each point need to be stored; the x and y coordinates can be calculated from the starting point and grid spacing interval. In other DTM representation methods, these coordinates must be explicitly indicated [14]. Another advantage is that the topology is simple [15] because the neighbours of a selected point can be readily obtained from the "altitude matrix" to calculate the geomorphological parameters. Given this advantage, a wide variety of computer programs have been created based on gridded data. However, this method does have several disadvantages: (1) With a uniform grid spacing interval, grid DEMs tend to be redundant in smooth areas and are unable to portray small landscape features in areas where elevation varies rapidly over short distances; (2) With scattered elevation points (irregular spacing), the data must be converted into regularly spaced points by some method, such as TIN. According to error-propagation theory, DTM errors will propagate through this manipulation and manifest themselves in the final products [6,16].
At present, neither method has been conclusively shown to be superior over a wide variety of terrain types [17]. For scattered elevation points in particular, the TIN method is rarely adopted for DTM construction over large areas, because its computational cost is too high and it lacks the efficiency necessary for some disciplines, such as hydrology [4]. Gridded DEMs are constructed from spatial derivatives of the point elevations without greatly considering the original point elevations, which enlarges the errors in the DTM. Even with recent improvements in computer hardware, there are still limitations in storage space and the computational cost of running the method. Considering these problems with DTM accuracy and computational efficiency, a new method that avoids the flaws of these two methods should be developed.
In this paper, a new method of DTM construction based on the quadrangular irregular network (QIN) is developed. The related research was first mentioned by Ferguson [18]. Hessing investigated automatic contouring with this method for small amounts of data [19]. Wu applied this method to relief generalization [20]. These studies attempted to establish a relationship between a QIN method and a DTM; however, the specific advantages of this method are not elaborated clearly, and the studies lack a comparative analysis of the accuracy of the DTM product.
A QIN method consists of a set of irregular quadrilateral meshes, where each mesh contains four vertices. All original scattered elevation points are regarded as quadrilateral vertices, and a series of new points is calculated by fitting a polynomial to the surrounding elevation points. All these vertices are organized in a topological matrix similar to the data structure of the gridded DEM. Therefore, the computer programs designed for gridded DEMs apply to QINs, ignoring the differences in the spatial interval.
To obtain the elevation (Z value) at a location on each mesh, two approaches are commonly employed [21]:
1. Each vertex is treated as the centre of a mesh. All locations on the mesh are assumed to have the same elevation as the central vertex.
2. A continuous surface is fitted to each mesh using adjacent vertices. The elevation of each point on the mesh can be calculated from the surface equation.
It is clear that the first method cannot approximate a real-world continuous surface well. To improve the accuracy of DTMs, a large number of studies have focused on interpolation for DTM construction, such as IDW, SPLINE [22], KRIGING [23], optimization methods based on theorems of surfaces [24,25], fractal methods [26], and the Coons patch method [16]. These interpolators can be constructed on a quadrangular irregular network just as on the regular grid model because of their common topological structure. In this paper, we fit a polynomial surface to each irregular quadrilateral using bicubic interpolation, which preserves fine details better than common algorithms do. An inter-comparison is then drawn among this integral QIN method, two other interpolation methods (SPLINE and KRIGING) and the TIN method to analyse simulation accuracy.
The rest of this paper is organized as follows. First, the construction rules of QINs are introduced, and the procedure for defining a smooth surface through a point matrix that is topologically equivalent to a planar rectangular grid is investigated. Second, a numerical test and a real-world example are employed to comparatively analyse the accuracy of QINs against that of the two classical interpolation methods (SPLINE and KRIGING) and the TIN method. Finally, discussion and conclusions are presented.

Principles of quadrangular irregular networks
The approximation of a surface by a set of points is important for DTMs. Several classical methods of solution have been proposed. A restriction of these methods is that the set of points must be rectangularly spaced (in the XY-plane). The QIN method loosens this limitation, requiring only that the point set be topologically equivalent to a planar rectangular grid [18].
Let D be the set of scattered elevation points d_k = (x_k, y_k, z_k), k = 1, 2, ..., K. To obtain a point array {P_n,m}, n = 1, 2, ..., N and m = 1, 2, ..., M, which is topologically equivalent to an N×M planar rectangular grid, adjacent points must be connected by straight-line segments in the point set D. The resulting point array {P_n,m} divides the set D into "vertical subsets" V_1, V_2, ..., V_M, which satisfy the following principles [19]:
1. If d_a is a member of V_m and d_b is a member of V_m+1, then x_a < x_b.
2. If d_a and d_b are members of the same vertical subset, the line connecting (x_a, y_a) and (x_b, y_b) must form an angle with the y-axis that is less than π/4.
3. Each V_m must contain at least one member of D.
4. If d_a is a member of V_1 or V_M, it must also be on the boundary of the convex hull of D.
The set D is also divided into "horizontal" subsets H_1, H_2, ..., H_N, which follow the same principles:
1. If d_a is a member of H_n and d_b is a member of H_n+1, then y_a < y_b.
2. If d_a and d_b are members of the same horizontal subset H_n, the line connecting (x_a, y_a) and (x_b, y_b) must form an angle with the x-axis that is less than π/4.
3. Each H_n must contain at least one member of D.
4. If d_a is a member of H_1 or H_N, it must also be on the boundary of the convex hull of D.
For maximum computational efficiency, π/4 is the best threshold: it keeps the sizes of the vertical and horizontal point subsets to a minimum while guaranteeing the properties of the topological matrix. If the threshold angle were less than π/4, multiple new points would be produced. If it were more than π/4, a straight-line segment connecting a known point to the points lying in the overlapping region of the two angle ranges would be assigned to a horizontal vector and a vertical vector simultaneously.
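As a minimal sketch, the π/4 membership criterion can be written as a pair of simple tests (the function names are illustrative, not from [19]):

```python
import math

def same_vertical_subset(pa, pb):
    """Hessing's angle criterion: two points may belong to the same
    vertical subset only if the segment joining them makes an angle of
    less than pi/4 with the y-axis (equivalently, |dx| < |dy|)."""
    dx = pb[0] - pa[0]
    dy = pb[1] - pa[1]
    return math.atan2(abs(dx), abs(dy)) < math.pi / 4

def same_horizontal_subset(pa, pb):
    """Symmetric test against the x-axis: the angle with the x-axis
    must be less than pi/4 (equivalently, |dy| < |dx|)."""
    dx = pb[0] - pa[0]
    dy = pb[1] - pa[1]
    return math.atan2(abs(dy), abs(dx)) < math.pi / 4
```

A segment that is steeper than 45° thus qualifies for a vertical subset only, and a flatter one for a horizontal subset only, which is why the two decompositions never claim the same edge twice.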
Let I_n,m = H_n ∩ V_m, the intersection of one horizontal vector and one vertical vector; at most one element belongs to both vectors. There are two situations for I_n,m: (1) If I_n,m is nonempty, call its single element P_n,m; (2) If I_n,m is empty, a derivative point must be constructed that satisfies conditions (1) and (2)

Calculation of new points
After the above steps, the original data points are assigned to horizontal and vertical vectors, and the values of M and N, the numbers of vertical and horizontal vectors, respectively, are known. To determine the x and y coordinates of a new point I_n,m, two linear equations for line segments through the data points in V_m and H_n are solved for their intersection. For large numbers of scattered elevation points, the computational cost is high and optimization should be considered; for example, an optimization that allows the calculation of a series of new points from two original data points to be performed only once.
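The intersection step can be sketched as follows, solving the two line equations with Cramer's rule (the helper name is ours):

```python
def line_intersection(p1, p2, p3, p4):
    """Intersection of the infinite lines through segment p1-p2 (an edge
    of a vertical vector) and segment p3-p4 (an edge of a horizontal
    vector), solved with Cramer's rule.
    Returns (x, y), or None if the lines are parallel."""
    x1, y1 = p1; x2, y2 = p2; x3, y3 = p3; x4, y4 = p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        return None  # parallel (or degenerate) lines: no unique intersection
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    x = (a * (x3 - x4) - (x1 - x2) * b) / denom
    y = (a * (y3 - y4) - (y1 - y2) * b) / denom
    return x, y
```

Because each pair of V_m and H_n edges crosses at most once, this gives the x and y coordinates of the missing grid point directly.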
To compute the entire quadrangular irregular network, the z values of new points must be obtained. A number of methods can be utilized, such as the inverse distance weighted (IDW) method, the rectangle-based blending method [27,28], triangle-based blending methods [29], finite-element-based methods, Foley's methods, and global-basis-function-type methods. For large numbers of elevation data points, to improve computational efficiency, storage requirements and ease of implementation, an inverse-distance-weighted method is employed in this paper. The basic function is given by [30,31]

f(x, y) = Σ_k w_k(x, y) f_k,

where w_k(x, y) is the weight function, which takes a variety of forms depending on data availability, and f_k is the function value at the k-th known data point.
To improve accuracy, two factors of the weighting function must be considered: distance and direction. As Tobler's First Law [32] states, nearby points make greater contributions than distant points to any interpolated value; distant data points tend to influence the surface at a higher order. For unevenly distributed data points, direction can be used to adjust the number of data points involved in the interpolation calculations. Considering these two factors, the interpolation method described below was proposed by Hessing [19]. Let (x_n,m, y_n,m) be a new point to be interpolated, and let (a_i, b_i, c_i), i = 1, 2, ..., I, be the data points. For each data point, define

d_i² = (a_i − x_n,m)² + (b_i − y_n,m)²,
θ_i = tan⁻¹[(b_i − y_n,m)/(a_i − x_n,m)], (3)

where 1 ≤ i ≤ I. The circle centred at point (x_n,m, y_n,m) is subdivided into 4L sectors; for each l, 0 ≤ l ≤ L−1, four sectors are defined as follows:

πl/(2L) ≤ θ < π/2 + πl/(2L),
π/2 + πl/(2L) ≤ θ < π + πl/(2L),
π + πl/(2L) ≤ θ < 3π/2 + πl/(2L),
3π/2 + πl/(2L) ≤ θ < 2π or 0 ≤ θ < πl/(2L).

Let J_i,l be the number of data points (a_j, b_j, c_j) such that θ_i and θ_j are in the same sector and d_j² < d_i². The weight function of point (a_i, b_i, c_i) is then defined in terms of the counts J_i,l and the constants W_0, W_1, ..., W_K given below [19].
For accuracy and efficiency of the interpolation algorithm, W_0 = 1, W_1 = 1/2, ..., W_K = W_K−1/2, with L = 10 and K = 4, are the parameter values suggested by Hessing after a series of experiments. This method follows Tobler's First Law and related derivative rules, such as giving relatively isolated data points more weight and giving clustered data points correspondingly less. One important factor for interpolation is the sampling strategy. In this method, K = 4 determines the number of data points, which means that up to 5 data points can be involved in a computation. However, if the value of d_i is large, the contribution of each point is small, and errors may be introduced. Therefore, a threshold R for d_i should be used. Suppose N is the total number of data points and A is the area of the convex hull of all data points; the average spacing of the data points is then approximately √(A/N), and R is defined as 10 times this average spacing, i.e., R = 10√(A/N). After this configuration, the interpolation can be easily implemented. New points interpolated from data points affect the accuracy of the entire QIN. A detailed error analysis is presented in the "Analysis" section.
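As an illustration of the distance threshold, the following sketch implements a plain IDW estimate with the cutoff R = 10√(A/N); Hessing's directional sector weights W_J are omitted for brevity, so this is a simplified stand-in, not his full scheme, and the function name is ours:

```python
import math

def idw_z(x, y, points, area):
    """Plain inverse-distance-weighted estimate of z at (x, y).
    `points` is a list of (a, b, c) data points and `area` is the area of
    their convex hull. Points farther than R = 10 * sqrt(area / N), i.e.
    ten times the average point spacing, are excluded, as suggested in
    the text. Weights are 1/d^2 (no directional sector correction)."""
    R = 10.0 * math.sqrt(area / len(points))
    num = den = 0.0
    for a, b, c in points:
        d2 = (a - x) ** 2 + (b - y) ** 2
        if d2 == 0.0:
            return c          # query coincides with a data point
        if d2 > R * R:
            continue          # beyond the distance threshold R
        w = 1.0 / d2
        num += w * c
        den += w
    return num / den if den > 0.0 else None
```

The cutoff keeps far-away points, whose individual contributions are small but whose accumulated influence can distort the estimate, out of the sum entirely.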

Construction of a bicubic surface
The bicubic surface method was originally developed by Ferguson in the 1960s for use in the car industry [18]. The basic idea of a Ferguson surface is to define a smooth surface through a point array whose arrangement is topologically equivalent to a planar rectangular grid, relaxing the restriction on point-array distributions. The resulting solution is a smooth composite of parametric surface segments that is second-order continuous. A Ferguson surface can be expressed by the following function:

S_n,m(u, v) = [X_n,m(u, v), Y_n,m(u, v), Z_n,m(u, v)], 0 ≤ u ≤ 1, 0 ≤ v ≤ 1, (6)

where

X_n,m(u, v) = Σ_p=0..3 Σ_q=0..3 a_p,q u^p v^q, (7)
Y_n,m(u, v) = Σ_p=0..3 Σ_q=0..3 b_p,q u^p v^q, (8)
Z_n,m(u, v) = Σ_p=0..3 Σ_q=0..3 c_p,q u^p v^q. (9)

There are thus 16 coefficients per component to determine. The Ferguson surface passes through the four corner points of each quadrilateral and is bounded by the curves [X_n,m(0, v), Y_n,m(0, v)], [X_n,m(u, 0), Y_n,m(u, 0)], [X_n,m(1, v), Y_n,m(1, v)], and [X_n,m(u, 1), Y_n,m(u, 1)]. Therefore, the interpolation conditions at the corners can be obtained (Fig 2). Let R_n,m be the partial derivative with respect to u and T_n,m the partial derivative with respect to v at P_n,m. To meet the second-order continuity condition, additional conditions are also needed.

In matrix form, the patch involves U = [u³ u² u 1], V = [v³ v² v 1] and the twist vectors P^u,v_n,m = ∂²S/∂u∂v evaluated at the corners. Eq 13 is called a Coons bicubic surface, which is difficult to compute because the mixed partial derivatives at the four corner points are difficult to determine. A Ferguson surface is therefore obtained by setting these mixed partial derivatives to zero. (Fig 2: the quadrilateral with four corner points P_n,m, P_n,m+1, P_n+1,m, P_n+1,m+1 and the partial derivatives R_n,m, T_n,m at data point P_n,m.)
Now, to obtain all the coefficients, we focus on the calculation of R_n,m and T_n,m near the data point P_n,m. A method that considers distance-based weights was proposed by Hessing. Let

E = (1/D_1²)(P_n,m − P_n,m−1) + (1/D_2²)(P_n,m+1 − P_n,m),

with D_1² = (x_n,m − x_n,m−1)² + (y_n,m − y_n,m−1)² and D_2² = (x_n,m+1 − x_n,m)² + (y_n,m+1 − y_n,m)². Thus E = (E_x, E_y, E_z) is the weighted sum of P_n,m − P_n,m−1 and P_n,m+1 − P_n,m. Finally, the equations are completed by

R_n,m = Min(x_n,m − x_n,m−1, x_n,m+1 − x_n,m) · E_x and T_n,m = Min(y_n,m − y_n,m−1, y_n,m+1 − y_n,m) · E_y.

After the above configuration, a second-order continuous surface has been constructed that passes through all the points, with each mesh of the surface spanned by one quadrilateral. The surface represents the true terrain elevations and can serve as the basis for digital terrain analysis.
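Evaluating one patch can be sketched with the standard cubic Hermite basis matrix, assuming the zero-twist (Ferguson) form described above; this shows the z-component only, and the function and variable names are ours:

```python
import numpy as np

# Standard cubic Hermite blending matrix
H = np.array([[ 2., -2.,  1.,  1.],
              [-3.,  3., -2., -1.],
              [ 0.,  0.,  1.,  0.],
              [ 1.,  0.,  0.,  0.]])

def ferguson_patch(P, Ru, Tv, u, v):
    """Evaluate one component (here z) of a Ferguson patch at (u, v).
    P  -- 2x2 corner values [[P00, P01], [P10, P11]]
    Ru -- 2x2 u-derivatives at the corners
    Tv -- 2x2 v-derivatives at the corners
    The mixed (twist) derivatives are set to zero, which is what
    distinguishes a Ferguson patch from a general Coons bicubic patch."""
    Q = np.block([[np.asarray(P, dtype=float),  np.asarray(Tv, dtype=float)],
                  [np.asarray(Ru, dtype=float), np.zeros((2, 2))]])
    U = np.array([u**3, u**2, u, 1.0])
    V = np.array([v**3, v**2, v, 1.0])
    return U @ H @ Q @ H.T @ V
```

With all derivatives set to zero the patch reduces to smooth blending of the four corner heights, and one can verify that it reproduces the corner values exactly at (u, v) ∈ {0, 1}².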

Numerical test
The Peaks surface was used as a numerical test surface. The true value at each point can be determined from its equation, avoiding the uncertainty introduced by uncontrollable data errors present in real-world data (Fig 3). The computational domain of x, y and z is [−3, 3] × [−3, 3] × [−6, 6]. 3600 points were randomly chosen as the original data points. The sampling strategy was as follows: the XY-plane was divided into 60×60 grid cells, and in each cell one point was sampled by choosing its x and y coordinates randomly; its z value was then calculated from the Peaks equation (Fig 4). The x and y values of 7308 new points were calculated from the intersections of line segments, and the z values of these points were calculated by the Hessing IDW algorithm (Fig 5). After these processing steps, the 101×108 quadrangular irregular network was constructed successfully. Fig 6 shows the distribution of all data points, and Fig 7 shows the quadrangular networks constructed by the Hessing rules.
In addition to the QIN method, classical interpolation methods including KRIGING and SPLINE are widely used in GIS applications. These two interpolation methods and the TIN method were employed to comparatively analyse the errors of the QIN method. The Spatial Analyst and 3D Analyst toolboxes in ArcGIS 10.0 were used to run these three classical methods; all default parameters of the software were kept. For KRIGING interpolation, the ordinary option was selected, the semivariogram model was spherical, the search radius was variable and the maximum number of search points was 12. For SPLINE interpolation, the REGULARIZED option was used, the weight was 0.1 and the number of searched points was 12. For the TIN method, the original data points were employed as the input feature class. The domain of the Peaks surface was orthogonally divided into 10,000 (100×100) lattices, giving 10,000 check points that were interpolated by the three classical methods to comparatively analyse the accuracy of the QIN method. The root-mean-square error (RMSE) is expressed as [25]

RMSE = √( Σ_i,j (sf_i,j − f_i,j)² / n ),

where sf_i,j is the true value at the point (x_i, y_j) calculated from the Peaks equation, f_i,j is the interpolated value of the simulated surface at that point, and n is the number of check points. The QIN construction process consists of two steps. First, the 7308 new data points were calculated by the Hessing IDW method. Second, the 10,000 check points were calculated by the bicubic method. The accuracy of the new data points is given in Table 1: KRIGING has the lowest RMSE value, and Hessing IDW has the highest. The absolute maximum error and the standard deviation of Hessing IDW are also the highest of any method. In this step, KRIGING has the best accuracy, followed by SPLINE and TIN, and Hessing IDW performs worst.
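The RMSE above is straightforward to compute; a minimal sketch (helper name is ours):

```python
import math

def rmse(true_vals, sim_vals):
    """Root-mean-square error between the true surface values and the
    simulated (interpolated) values over the n check points, matching
    the RMSE formula in the text."""
    n = len(true_vals)
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(true_vals, sim_vals)) / n)
```

In practice the two input sequences are the flattened 100×100 grids of Peaks values and interpolated values.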
An error matrix [E_i,j] for each method is constructed to express the spatial distribution of errors; the E_i,j are formulated as E_i,j = Sf_i,j − f_i,j, where f_i,j is the true value of the surface and Sf_i,j is the simulated value at the point (i, j) [25]. Fig 7 indicates that all four methods have a relatively concentrated negative-error region, coloured blue in the figures; this area is concave in the Peaks surface. The error distributions of Hessing IDW and SPLINE are more complex, while that of KRIGING is more uniform.
Hessing IDW performs the worst in the first step, but this does not mean that the QIN method has a higher error, because that accuracy analysis concerns only the new data points of the first step. With these new data points, the QIN meshes were constructed completely, and the bicubic surface could be built on the meshes. As Table 2 shows, the QIN method actually delivers the second-best accuracy: SPLINE has the highest accuracy, and TIN performs the worst. The overall accuracy ranking therefore differs from that of the first step. With the increased number of points involved in the interpolation calculation, the accuracy of both the SPLINE and QIN methods improves. Fig 8 shows that the accuracy of QINs is greatly affected in areas with steep slopes; the Peaks surface has four such areas of high error. SPLINE and KRIGING show a concentrated area of high error, and overall, their simulated z values are lower than the true values. TIN also has a region of high error, but in other areas its result is closer to the true value. Fig 9 shows the relationship between errors at new points and errors at check points; this analysis indicates that there is no strong trend between the two quantities. Table 3 shows the Pearson product-moment correlation coefficients (PPCC) among the four parameters, formulated as

ρ_X,Y = cov(X, Y) / (σ_X σ_Y),

where cov(X, Y) is the covariance of X and Y, and σ_X and σ_Y are the standard deviations of X and Y, respectively.

A real-world example
1209 elevation data points (Fig 10A), taken as original points, and 6209 check points (Fig 10B) were randomly extracted from a 1:50000 contour map to construct a DTM using the QIN method, the TIN method, SPLINE and KRIGING, and the errors of these methods were comparatively analysed. The study area is located in Midu County, in the centre of Yunnan Province, China (Fig 11). The elevation ranges from 0 to 2620 meters, and the area is suitable for a comparative study of DTM accuracy.
Fig 12 shows the relationship between the heights of the interpolated points and the corresponding observed heights. This analysis indicates that the QIN method has the highest interpolation accuracy. The results of the SPLINE simulation appear more scattered and lie below the line y = x overall. The KRIGING method produces fewer scattered points than SPLINE but tends to overestimate the elevations of lower check points. The TIN method has very few scattered simulation points but overestimates the elevations of lower check points even more than KRIGING does. The QIN method gives the best result, although some scattered simulation points remain.

Discussion
The numerical test demonstrates that the accuracy of the QIN method is lower than that of the SPLINE method but higher than that of the KRIGING and TIN methods. In the real-world example, the QIN method is the most accurate. The strategy used to sample the discrete points, including original data points and check points, is the main reason why the QIN method does not achieve the highest accuracy in the numerical test.
According to sampling theory, if there is a profile that can be expressed as a series of sine and cosine waves and is long enough to represent the local terrain, a maximum frequency F can be resolved in the dataset. If the sampling interval is more than 1/(2F) along the terrain profile, the DTM cannot be reconstructed completely. Such rigorous sampling rules were not used in either test conducted here. Even under these sampling conditions, however, the QIN method remains a good choice for DTM construction. Once the QIN model is built, the classical methods of digital terrain analysis can be applied to it, because it has the same topology as a gridded DTM. However, the QIN model has large computational and storage costs because it must generate a series of new data points and store them to construct the matrix structure. In the real-world test, 12,413 new data points were generated, roughly 10 times the number of original data points. To resolve this problem, two alternatives can be considered: 1) storing no new data points in data files, or 2) dividing the original data points into several clusters, constructing a QIN model for each cluster, and then integrating these QIN models. This is a direction for future research.