Empirical approach to threshold determination for the delineation of built-up areas with road network data

Various approaches have been proposed to address the delineation of built-up areas for a wide range of applications. Recently developed approaches are based on the increasing availability of road network data. However, most approaches have employed one or more parameters to divide built-up from non-built-up areas. Very few studies have discussed how to determine appropriate thresholds for such parameters. This study employed an empirical approach for threshold determination, and validated that the approach is applicable for the delineation of built-up areas using road network data. A series of experiments were designed to investigate the most-appropriate thresholds (determined using a similarity measure) for multiple parameters of three existing approaches (street blocks, grid-based, and kernel density) with regard to different administrative regions and cities/towns. The results show that in most cases, the most-appropriate thresholds or ranges for different subdivisions are either identical or overlap—thus validating the use of the most-appropriate thresholds to delineate built-up areas for one or multiple small subdivisions and, by inference, for a much larger region.


Introduction
The delineation of built-up areas, to create polygons that represent built-up areas in a region [1], can be used for applications such as predicting population growth [2], representing urban spatial development [3][4], identifying urban sprawl [5][6][7][8] and mapping land-use patterns [9]. Extensive studies have focused on delineating built-up areas with different source data, including census data [10][11][12], postcode data [13], remote sensing data [14][15][16][17][18][19], settlement and buildings data [20][21], "check-in" data (e.g. Flickr or Sina Weibo [22][23]), and even multiple source data [8]. Nowadays, road network data have become increasingly available. Many authoritative mapping agencies have begun to open their data to the public (e.g. TIGER/Line and Ordnance Survey OpenData), and along with the development of Web 2.0 technology, a number of platforms (e.g. OpenStreetMap and Wikimapia) support volunteers in the creation and distribution of free geographic data for the world. Therefore, the delineation of built-up areas with road network data has received much attention [1,6,7,24]. For instance, PLOS have also been used for evaluating the automatic delineation of built-up areas in previous studies [1,20].

Approaches and parameters to be tested
Several approaches have been proposed for the delineation of built-up areas using road network data. This section briefly introduces the three typical approaches that will be tested: the approach based on street blocks, the grid-based approach, and the kernel density approach. More details are available in previous research work [1,24,25].
The approach based on street blocks. A street block can be viewed as a closed region formed by one or several connected roads [1]. The approach based on street blocks has two general steps [25]:
1. Calculate the land areas of all the street blocks (e.g., "4" and "16" in "Fig 2B") in the road network data.
2. Delineate the street blocks whose land areas are smaller than a threshold (called the street block size) as built-up areas (e.g., "4" in "Fig 2B").
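The two steps above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes the block areas have already been computed, that a threshold written as side × side (e.g. 0.6×0.6 km²) means an area limit of side squared, and the function name and sample areas are hypothetical.

```python
def delineate_by_block_size(block_areas_km2, side_km):
    """Return the indices of street blocks treated as built-up.

    Assumption: a threshold written as side x side (e.g. 0.6 x 0.6 km2)
    corresponds to an area limit of side_km squared.
    """
    limit = side_km * side_km
    return [i for i, area in enumerate(block_areas_km2) if area < limit]


# Hypothetical block areas in km2 (not data from the study).
areas = [0.04, 0.16, 0.30, 1.20, 4.00]
built_up = delineate_by_block_size(areas, side_km=0.6)
print(built_up)  # indices of blocks smaller than 0.36 km2
```

Blocks whose area equals the limit are left out here because the text asks for areas "smaller than" the threshold.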
The grid-based approach. The grid-based approach may have four steps [24]:
1. Create a regular grid (e.g. 0.5×0.5 km²).
2. Intersect this grid with the road network data.
3. Calculate the density ($D_i$) of each grid cell according to the formula below [1]:
$$D_i = \frac{L_i}{A_i}$$
where $L_i$ denotes the total length of roads located in the $i$th grid cell, and $A_i$ denotes the area of the $i$th grid cell.
4. Delineate the grid cells whose densities are larger than a threshold (called the cell density) as built-up areas (e.g. "3" in "Fig 2C").
The grid-based approach involves two parameters. In addition to the cell density, the cell size is also an essential parameter because the cell density may vary with different cell sizes.
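As a rough illustration of how the cell density depends on the cell size, the sketch below computes road length per unit area for each cell and applies a density threshold. It assumes the intersection of grid and roads has already produced the road length per cell; the function names and sample lengths are hypothetical.

```python
def cell_densities(road_length_km_by_cell, cell_side_km):
    """Density of each cell: total road length divided by the cell area."""
    area = cell_side_km ** 2
    return {cell: length / area
            for cell, length in road_length_km_by_cell.items()}


def delineate_by_cell_density(densities, threshold):
    """Cells whose density is strictly larger than the threshold."""
    return sorted(cell for cell, d in densities.items() if d > threshold)


# Hypothetical road lengths (km) in a 2 x 2 grid of 0.5 x 0.5 km2 cells.
lengths = {(0, 0): 0.5, (0, 1): 2.0, (1, 0): 1.5, (1, 1): 0.0}
dens = cell_densities(lengths, cell_side_km=0.5)
built = delineate_by_cell_density(dens, threshold=6)
print(dens, built)
```

With the same road lengths, a different cell side changes every density value, which is why the two parameters have to be tested together.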
The kernel density approach. The kernel density approach may have two general steps [24]:
1. Calculate the density of each cell (e.g. "1", "2", "3", "4", and "5" in "Fig 2D") using the kernel density estimation:
$$\hat{\lambda}(s) = \sum_{i=1}^{n} \frac{1}{\tau^{2}}\, k\!\left(\frac{s - s_i}{\tau}\right)$$
where $\hat{\lambda}(s)$ is the estimated density of the cell measured at location $s$ (i.e. the centroid of this cell); $\tau$ is the bandwidth; $n$ is the number of neighboring road intersections of location $s$; $s_i$ is the $i$th road intersection within distance $\tau$ of location $s$; and $k(\cdot)$ is the kernel function.
2. Delineate the grid cells whose kernel densities are larger than a threshold (called the kernel density) as built-up areas (e.g. "3" and larger values in "Fig 2D").
The kernel density approach also involves two parameters: the bandwidth (the radius of the dotted circles in "Fig 2D") and the kernel density. With the bandwidth, the cell size may be set as small as possible. However, if the cell size is set too small, the number of cells increases dramatically and the approach may fail due to an "out of memory" problem [1]. Thus, for this approach, the cell size was fixed at 0.1×0.1 km².
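A kernel density value at one cell centroid can be sketched as below. The text does not state which kernel function was used, so the quartic kernel here is an assumption, as is the estimator form (a sum of kernel values scaled by the squared bandwidth); the function names and sample intersections are hypothetical.

```python
import math


def quartic_kernel(u2):
    """Quartic kernel, given the squared distance scaled by the bandwidth.

    Assumption: the text does not name the kernel; quartic is one common choice.
    """
    return (3.0 / math.pi) * (1.0 - u2) ** 2 if u2 <= 1.0 else 0.0


def kernel_density(s, intersections, tau):
    """Sum of kernel contributions of road intersections around location s."""
    sx, sy = s
    total = 0.0
    for x, y in intersections:
        u2 = ((sx - x) ** 2 + (sy - y) ** 2) / tau ** 2
        total += quartic_kernel(u2) / tau ** 2
    return total


# Hypothetical intersection coordinates in km (not data from the study).
points = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0)]
d_near = kernel_density((0.0, 0.0), points, tau=0.5)
d_far = kernel_density((5.0, 5.0), points, tau=0.5)
print(d_near, d_far)
```

Intersections farther than the bandwidth from the centroid contribute nothing, so only the first two points affect `d_near`.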

Experimental steps
An experiment was designed to investigate whether the most-appropriate thresholds for the delineation of built-up areas are the same or similar for different subdivisions of a road network. Specifically:
1. Divide a large road network into subdivisions.
2. Determine the most-appropriate thresholds for the delineation of built-up areas for each subdivision.
3. Compare the most-appropriate thresholds for different subdivisions.
Divide a large road network into subdivisions. Two division modes were considered:
1. The road network of the North Island was subdivided into nine non-overlapping road networks using the administrative boundary data in order to investigate the most-appropriate thresholds for the different administrative districts ("Fig 3A").
2. The road network of the North Island was also subdivided into smaller road networks in order to investigate the most-appropriate thresholds for cities or towns of different sizes. As an example, a total of 33 subdivisions in three (Auckland, Wellington and Hawke's Bay) out of the nine administrative districts were manually chosen ("Fig 3B", "Fig 3C", "Fig 3D") by referring to the actual built-up areas in the corresponding benchmark. The rules for selection were as follows: first, each subdivision covers the built-up areas in the benchmark of only one city or town, and the subdivision should be larger than the corresponding built-up areas in the benchmark; and second, the selected built-up areas in different subdivisions should vary dramatically in size. Specifically, the size of the largest built-up area (Auckland City) among the 33 subdivisions is 302.627 km², while that of the smallest one (Tikokino) is only 0.169 km².
Determine the most-appropriate threshold for each subdivision. The basic idea is to first automatically delineate built-up areas with different thresholds. The delineated built-up areas are then compared with those in the corresponding benchmark by calculating a similarity measure. Finally, the threshold corresponding to the highest similarity value is determined as the most-appropriate threshold. The similarity value, ranging from 0 to 1, can be calculated as [26,27]:
$$\text{Similarity} = \frac{A \cap B}{A \cup B}$$
where A is the land area of the automatically delineated built-up areas; B is the land area of the built-up areas in the corresponding benchmark; A ∩ B is the land area of the built-up areas delineated in common; and A ∪ B is the land area covered by either of the two.
"Fig 4" plots the similarity distributions for the North Island of New Zealand, using three approaches for the delineation of built-up areas. The x-axis denotes various thresholds for one parameter (e.g., street block size, cell density or kernel density), and the y-axis denotes the corresponding similarity value. In this study, these three approaches were implemented using commercial GIS software (ArcGIS, version 9.3) and freeware (OpenJUMP, version 1.6.0). The detailed steps to implement these approaches have been reported in a previous study [1]. It can be seen in "Fig 4" that all the similarity distributions have the same trend. That is, the similarity value first increases along with an increase of the threshold, and then it begins to decrease after going through a peak. The threshold corresponding to this peak can be viewed as the most-appropriate threshold (highlighted with a vertical dotted line). More precisely, the most-appropriate threshold for the street block size is 0.6×0.6 km², while those for the cell density and the kernel density are 6 km/km² and 24 num/km², respectively. Based on the similarity distributions in "Fig 4", the parameters and thresholds to be tested in this study are listed in "Table 1".
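The search for the most-appropriate threshold can be sketched as a loop over candidate thresholds. The sketch assumes that the similarity is the shared area divided by the combined area, and it counts areas as numbers of equal-sized cells; all names and sample values are hypothetical and are not data from the study.

```python
def similarity(delineated, benchmark):
    """Shared area divided by combined area, with areas counted in equal cells."""
    union = delineated | benchmark
    return len(delineated & benchmark) / len(union) if union else 0.0


def most_appropriate_threshold(candidates, delineate, benchmark):
    """Score every candidate threshold; return the best one and all scores."""
    scores = {t: similarity(delineate(t), benchmark) for t in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical cell densities (km/km2) and benchmark cells.
density = {"a": 9, "b": 7, "c": 5, "d": 2, "e": 1}
benchmark = {"a", "b", "c"}


def delineate(threshold):
    """Cells whose density is strictly larger than the threshold."""
    return {cell for cell, d in density.items() if d > threshold}


best, scores = most_appropriate_threshold([1, 2, 5, 7], delineate, benchmark)
print(best, scores)
```

In this toy case the score rises and then falls as the threshold grows, which mirrors the single-peak shape described for the similarity distributions.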
Compare the most-appropriate thresholds for different subdivisions. The most-appropriate thresholds for the nine administrative districts and 33 test regions covering cities or towns were determined. Initially, the most-appropriate thresholds were compared to find out if they were identical. If the most-appropriate thresholds turned out not to be identical, the appropriate threshold ranges were visually determined to investigate whether the threshold ranges overlapped.
The head/tail break, an existing classification method, was also employed for comparison purposes. The head/tail break generally applies a break rule for threshold determination. The break rule is defined as "Given a variable X, if its values x follow a heavy-tailed distribution, then the mean of the values can divide all the values into two parts: a high percentage in the tail, and a low percentage in the head." [25]. It has been found that the street block sizes of a road network follow a heavy-tailed distribution. The mean size was then used as a threshold to divide all the street blocks into built-up areas (in the tail) and non-built-up areas (in the head) [25]. When the first mean is not a satisfactory threshold, the break may continue if the street block sizes in the tail still follow a heavy-tailed distribution. In this way, the second mean, the third mean, the fourth mean and so on may be obtained. The break stops if the percentage in the head is larger than 40% [28]. This condition can be relaxed to 50%, or even more, if the percentage in the head remains less than 40% in a subsequent break. This study also employs the head/tail break method to determine thresholds for the street block size, the cell density and the kernel density; these thresholds are also evaluated using the similarity measure.
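The repeated splitting at the mean can be sketched as follows for street block sizes. This is a simplified reading of the description above, not the procedure of [25] or [28]: it keeps splitting the part at or below the mean, stops when the part above the mean exceeds a share of 40%, and ignores the relaxation rule. The function name, the cap on the number of breaks and the sample sizes are hypothetical.

```python
def head_tail_means(values, max_head_share=0.4, max_breaks=10):
    """Successive means obtained by repeatedly splitting at the mean.

    After each split, the values at or below the mean (the majority part)
    are split again, as long as the values above the mean make up no more
    than max_head_share of the current values.
    """
    means = []
    current = list(values)
    while len(current) > 1 and len(means) < max_breaks:
        mean = sum(current) / len(current)
        head = [v for v in current if v > mean]
        tail = [v for v in current if v <= mean]
        if not head or len(head) / len(current) > max_head_share:
            break
        means.append(mean)
        current = tail
    return means


# Hypothetical street block sizes in km2: many small blocks, few large ones.
sizes = [0.01] * 60 + [0.05] * 20 + [0.2] * 12 + [1.0] * 5 + [10.0] * 3
means = head_tail_means(sizes)
print(means)  # first mean, second mean, ...
```

Each mean in the returned list is a candidate threshold that can then be scored with the similarity measure.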

Results of the similarity distributions for the nine administrative districts of the North Island
Results for the approach based on street blocks. "Fig 5" plots the similarity distributions for the nine administrative districts of the North Island, using the approach based on street blocks. The threshold for the only parameter (street block size) varies from 0.1×0.1 to 2.0×2.0 km². For each administrative district, the most-appropriate threshold for the street block size is highlighted with a vertical, solid line.
The following can be observed from "Fig 5":
1. All the distributions are similar to those plotted in "Fig 4A". That is, the similarity value first increases with an increase of the street block size, and then it begins to decrease after going through a peak.
2. The most-appropriate thresholds for the street block size are all located within 0.4×0.4 to 0.7×0.7 km². To be specific, the most-appropriate thresholds for five out of the nine administrative districts are 0.7×0.7 km², while those for three out of the nine administrative districts are 0.5×0.5 km².
3. An appropriate threshold range (highlighted with two vertical dotted lines), in which all the corresponding similarity values are no more than 0.05 different from that of the most-appropriate threshold, was determined for each administrative district. For instance, the appropriate threshold range for Northland is between 0.4×0.4 and 0.9×0.9 km², while that of Auckland is 0.5×0.5 to 1.0×1.0 km². More importantly, all the appropriate threshold ranges overlap in the nine administrative districts.
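The appropriate threshold range described in item 3 can be sketched as the span of thresholds whose similarity lies within 0.05 of the maximum. The sketch assumes a single-peak similarity distribution, so that the qualifying thresholds are contiguous; the function name and the similarity values are hypothetical.

```python
def appropriate_range(scores, tolerance=0.05):
    """Lowest and highest thresholds whose similarity is within
    `tolerance` of the maximum similarity."""
    best = max(scores.values())
    ok = sorted(t for t, s in scores.items() if best - s <= tolerance)
    return ok[0], ok[-1]


# Hypothetical similarity values per street block side length (km).
scores = {0.3: 0.50, 0.4: 0.58, 0.5: 0.61, 0.6: 0.62,
          0.7: 0.60, 0.8: 0.56, 0.9: 0.50}
low, high = appropriate_range(scores)
print(low, high)
```

A flat peak gives a wide range and a sharp peak a narrow one, so the width of the range also indicates how sensitive the result is to the threshold.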
Results for the grid-based approach. "Fig 6" plots the similarity distributions for the nine administrative districts of the North Island using the grid-based approach. The grid-based approach involves two parameters: cell size (ranging from 0.3×0.3 to 0.9×0.9 km²), and cell density (ranging from 1 to 15 km/km²).
The following can be observed from "Fig 6": 2. For each cell size, the most-appropriate thresholds for the cell density are either the same or differ by only 1 km/km² (e.g., 6-7 km/km² for the cell size 0.5×0.5 km²).
The maximum similarity values for the different cell sizes were compared ("Table 2"). It can be seen in "Table 2" that the most-appropriate cell size appears to be 0.5×0.5 km², for which the maximum similarity value is the greatest for seven out of the nine administrative districts.
Results for the kernel density approach. "Fig 7" plots the similarity distributions for the nine administrative districts of the North Island using the kernel density approach. The two parameters of the kernel density approach, bandwidth (ranging from 0.3 to 0.9 km) and kernel density (ranging from 1 to 60 num/km²), were analyzed.
The following can be observed from "Fig 7": 1. The most-appropriate thresholds are located within a certain range (e.g., 22-28 num/km² for the bandwidth 0.5 km), in which the similarity values are either the same or close to each other.
"Table 3" further lists the maximum similarity values for different bandwidths, tested on the nine administrative districts of the North Island. It can be seen in "Table 3" that the most-appropriate bandwidth is 0.5 km for six out of the nine administrative districts and 0.7 km for the other three districts.

Results of the similarity distributions for 33 cities/towns on the North Island
"Table 4" lists the most-appropriate thresholds or ranges for the parameters (i.e., street block size, cell density and kernel density) tested on the 33 subdivisions of the North Island (see also S2 File). The threshold for the cell size was fixed at 0.5×0.5 km² while the bandwidth was fixed at 0.5 km, the values at which the corresponding similarity value reaches its maximum in most cases ("Tables 2 and 3").
The following can be observed from "Table 4":
1. There may be multiple most-appropriate thresholds, especially for the subdivisions (e.g., Orewa and Waipawa) with relatively small built-up areas. This is because, typically, the smaller the size of a city or town, the smaller the number of street blocks or grid cells to be delineated as built-up areas. It is therefore possible to delineate the same built-up areas with different thresholds.
2. For most of the subdivisions, the most-appropriate thresholds for a parameter (e.g., street block size, cell density or kernel density) are either the same or close to each other. For instance, in 29 out of the 33 subdivisions, the most-appropriate thresholds or ranges for the street block size are either located within or overlap with 0.4×0.4 to 0.9×0.9 km², which also overlaps with those found in "Fig 5". In 28 out of the 33 subdivisions, the most-appropriate thresholds for the cell density are located within 5 to 7 km/km², which is almost the same as the value (6 to 7 km/km²) found in "Fig 6B". In 26 out of the 33 subdivisions, the most-appropriate thresholds for the kernel density are located within 18 to 31 num/km². These thresholds are either within, or close to, the values (22 to 28 num/km²) found in "Fig 7B".
3. However, across all 33 subdivisions the ranges of the most-appropriate thresholds become wider. For instance, the range for the street block size varies from 0.2×0.2 to 1.7×1.7 km². The corresponding range for the cell density is 4 to 8 km/km², and that for the kernel density is 10 to 32 num/km². This illustrates that the most-appropriate thresholds for multiple parameters may vary dramatically between cities or towns.

Results of the head/tail break
"Tables 5-7" list the thresholds obtained using the head/tail break and corresponding similarity values for the approach based on street blocks, the grid-based approach and the kernel density approach, respectively. The thresholds for some parameters (e.g., cell size and bandwidth) and the 33 cities/towns of the North Island are not listed here because their values do not follow a heavy-tailed distribution.
The following can be observed from "Tables 5-7":
1. The values of both the street block size and kernel density follow a heavy-tailed distribution for the ten study cases. Values of cell density also follow a heavy-tailed distribution in most study cases, if the relaxation condition is considered (i.e., the percentage in the head is allowed to be larger than 40%).
2. Most of the maximum similarity values obtained using the head/tail break (highlighted with gray) are similar (i.e., with a difference no more than 0.05) to those in "Table 8". However, the number of breaks needed for a maximum similarity value may vary between study cases. As an example, in "Table 7", the second mean is the most-appropriate threshold for the study case of Auckland, but the fifth mean is the most-appropriate threshold for the study case of Northland.
3. The mean for the maximum similarity value is either located within or close to the appropriate range (e.g., 0.4×0.4 to 0.9×0.9 km² for the street block size, 5-7 km/km² for the cell density, and 18-31 num/km² for the kernel density) found in "Table 4". The similarity value may, however, be much lower if the threshold obtained using the head/tail break is outside of the appropriate range (as is seen in the study case of Northland in "Table 5"). Empirical threshold ranges may therefore also be used to determine an appropriate mean for the head/tail break.

Using a different benchmark and evaluation measure
In the previous tests, the benchmark for evaluation consisted of building and residential data only. However, the actual built-up areas may include not only these features, but also roads, school playing fields, and factories. Therefore, the actual built-up areas may be larger than those marked in the benchmarks. An investigation was made into whether the most-appropriate thresholds or ranges may vary with a different benchmark and evaluation measure. First, the artificial surfaces acquired from GlobeLand30-2010 (http://globallandcover.com/GLC30Download/index.aspx), a mapping product of global land cover at 30-meter spatial resolution derived from remote sensing images in 2010, were used as the built-up areas in the benchmark (see S3 File). Next, two evaluation measures, the similarity measure (M1) and an integrated measure (M2) averaging both correctness and completeness with equal weights, proposed in an existing study [1], were employed to compare the automatically delineated built-up areas and the corresponding benchmark. As an example, "Table 9" lists the most-appropriate thresholds or ranges for 21 out of the 33 subdivisions of the North Island, using GlobeLand30-2010 and the above evaluation measures (see also S4 File). The results of the other 12 subdivisions of the North Island are not listed here because the corresponding artificial surfaces for these subdivisions were not available in GlobeLand30-2010.
The following can be observed from "Table 9":
1. The appropriate threshold ranges are almost the same as those found in "Table 4". For example, the most-appropriate thresholds or ranges for the street block size, cell density, and kernel density are mostly located within 0.4×0.4 to 1.0×1.0 km², 5-7 km/km², and 17-28 num/km², respectively; these ranges are almost the same as those found in "Table 4" (0.4×0.4 to 0.9×0.9 km², 5-7 km/km², and 18-31 num/km², respectively). The majority of the most-appropriate thresholds or ranges are either the same or overlap, even when using a different benchmark.
2. The majority of the most-appropriate thresholds or ranges are the same, irrespective of the two different evaluation measures (M1 and M2) used, which indicates the effectiveness of the similarity measure. Moreover, the similarity values are more sensitive to different parameter thresholds, whereas the integrated values are much closer to each other ("Fig 8"). If all the grid cells were classified as built-up areas, the correctness would be very low, but the completeness would be 100%, and the integrated value would therefore be larger than 50%. Thus, the similarity measure was selected to determine an appropriate threshold or range.
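The difference in sensitivity between the two measures can be illustrated with the extreme case mentioned above, where every cell is classified as built-up. The sketch assumes the usual definitions (correctness as the shared area over the delineated area, completeness as the shared area over the benchmark area, and similarity as the shared area over the combined area) and counts areas in equal cells; the names and numbers are hypothetical.

```python
def measures(delineated, benchmark):
    """Return (M1, M2): similarity and the equal-weight integrated measure."""
    common = len(delineated & benchmark)
    correctness = common / len(delineated)
    completeness = common / len(benchmark)
    m1 = common / len(delineated | benchmark)
    m2 = (correctness + completeness) / 2
    return m1, m2


# Hypothetical region of 100 equal cells, 10 of them built-up in the benchmark.
all_cells = set(range(100))
benchmark = set(range(10))
m1, m2 = measures(all_cells, benchmark)
print(m1, m2)
```

Here the integrated value stays above one half because completeness alone contributes 0.5, while the similarity value drops to the built-up share of the region.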

Using a different study area
The road network data of the West Midlands region of England (at 1:25,000 scale) were also used for testing. This study area was chosen for three reasons. which makes a comparison of the most-appropriate thresholds for different counties possible. Second, the size of built-up areas varies with different counties. For instance, in both Stoke-on-Trent and West Midlands counties, most areas are built-up areas; but in other counties, most areas are non-built-up areas. Third and most importantly, the road network data of the West Midlands region were freely acquired from the open data provided by Ordnance Survey (https://www.ordnancesurvey.co.uk/). The corresponding artificial surfaces were also freely acquired from GlobeLand30-2010 and were also used as the benchmarks (see S3 File). "Fig 10" shows that the most-appropriate thresholds are located within 0.4×0.4 to 1.0×1.0 km² and 6-9 km/km² using the approach based on street blocks and the grid-based approach, respectively; these ranges are almost the same as those found before ("Fig 4" and "Table 9"). For the kernel density approach, seven out of the nine most-appropriate thresholds (located within 13-15 num/km²) are very close to each other. Although the most-appropriate thresholds for West Midlands county and Stoke-on-Trent county were much smaller (8 and 9 num/km², respectively), the appropriate threshold ranges for these two counties (3-14 and 3-17 num/km², respectively) still overlap with those of other counties. However, the similarity distributions for both West Midlands county and Stoke-on-Trent county are different from those for other counties because most areas in these two counties are built-up.
Even when all the street blocks or grid cells were delineated as built-up areas, the corresponding similarity values were still higher than 0.6 ("Fig 10"). Furthermore, the artificial surfaces in GlobeLand30-2010 suffer from classification errors. The overall accuracy of GlobeLand30-2010 (http://www.globeland30.org/home/Enbackground.aspx) was reported as 80.33%. Therefore, some small cities or towns were not used for comparison. Although the use of other source data (e.g., census data and "check-in" data) may offer alternatives, they also have limitations. For instance, check-in data often contain biases because not everyone checks in or uses social media. As it is difficult to precisely identify built-up areas, we suggest using different source data as benchmarks in order to minimize subjectivity.

Conclusion and discussions
This study employed an empirical approach to determine appropriate thresholds for multiple parameters in the delineation of built-up areas using road network data. Specifically, the five parameters (street block size, cell size, cell density, bandwidth and kernel density) of the three typical approaches (the approach based on street blocks, the grid-based approach and the kernel density approach) were tested. That is, extensive experiments were carried out to investigate the most-appropriate thresholds for various parameters in these approaches. The North Island of New Zealand was chosen as the study area, with road network data used as source data, and the corresponding building and residential data used as benchmarks. The road network was divided into nine administrative subdivisions and 33 different city/town subdivisions. The built-up areas of each subdivision were automatically delineated with different approaches and thresholds. For each subdivision, the most-appropriate threshold was determined by calculating the similarity (or consistency) between an automatically delineated built-up area and a corresponding benchmark. A different benchmark (GlobeLand30-2010), an integrated measure, and a different study area (the West Midlands region of England) were used for validation of earlier results. Results show that in most cases, the most-appropriate thresholds for the different subdivisions were either the same or close to each other. However, the most-appropriate thresholds for some cities/towns varied dramatically.
The reasons for these results might be as follows:
1. The road network of a city/town is commonly designed based on principles and criteria proposed by the department of urban planning. Consequently, the street block size or cell density of a road network in a city/town region is kept from being too large or too small. For instance, 95% of the street blocks in the built-up areas of either the West Midlands region or North Island are smaller than 0.9×0.9 km² ("Fig 11A"). Such principles and criteria may be consistent within a country ("Fig 11A") and even be similar for different countries ("Fig 11B"). Most of the appropriate thresholds are therefore the same or similar.
2. However, different cities/towns often have different street network patterns (e.g. street blocks of different sizes and shapes), so the most-appropriate thresholds are not always the same and may even be quite different ("Fig 12").
Nevertheless, this study validates that the empirical approach [26] is also applicable to the delineation of built-up areas. That is, a large road network is first subdivided according to administrative boundaries or smaller units (e.g., cities or towns), and the most-appropriate thresholds or ranges obtained from multiple subdivisions are then applied to infer the results for the larger one. The inference method may include calculating the average or median of multiple most-appropriate thresholds, or overlapping multiple appropriate threshold ranges [26].
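The two inference options named above, taking the median of several most-appropriate thresholds and overlapping several appropriate ranges, can be sketched as follows. The two ranges are the Northland and Auckland street block ranges quoted earlier in the text; the list of per-subdivision thresholds and the function name are hypothetical.

```python
from statistics import median


def overlap_of_ranges(ranges):
    """Common part of several (low, high) ranges, or None if they do not overlap."""
    low = max(lo for lo, _ in ranges)
    high = min(hi for _, hi in ranges)
    return (low, high) if low <= high else None


# Appropriate street block ranges (side length in km) for Northland and Auckland.
ranges = [(0.4, 0.9), (0.5, 1.0)]
shared = overlap_of_ranges(ranges)

# Hypothetical most-appropriate thresholds from five subdivisions (side length in km).
typical = median([0.7, 0.7, 0.5, 0.7, 0.6])
print(shared, typical)
```

The median is less affected than the average by a single subdivision with an unusual threshold, which may matter given the wide spread reported for some cities/towns.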
In future research, more road network data of different countries and/or regions will be used for testing the most-appropriate thresholds for various parameters. In addition, other source data (e.g. census data) will be used as benchmarks for automatic analysis of delineated built-up areas. Moreover, it will still be necessary to develop an approach to adaptively determine the most-appropriate threshold for different cities/towns. Finally, it may be worth investigating whether the empirical approach is also applicable for determining appropriate thresholds for the delineation of built-up areas with different source data.