Empirical approach to threshold determination for the delineation of built-up areas with road network data

Qi Zhou; Lei Guo

doi:10.1371/journal.pone.0194806

Abstract

Various approaches have been proposed to address the delineation of built-up areas for a wide range of applications. Recently developed approaches are based on the increasing availability of road network data. However, most approaches have employed one or more parameters to divide built-up from non-built-up areas. Very few studies have discussed how to determine appropriate thresholds for such parameters. This study employed an empirical approach for threshold determination, and validated that the approach is applicable for the delineation of built-up areas using road network data. A series of experiments were designed to investigate the most-appropriate thresholds (determined using a similarity measure) for multiple parameters of three existing approaches (street blocks, grid-based, and kernel density) with regard to different administrative regions and cities/towns. The results show that in most cases, the most-appropriate thresholds or ranges for different subdivisions are either identical or overlap—thus validating the use of the most-appropriate thresholds to delineate built-up areas for one or multiple small subdivisions and, by inference, for a much larger region.

Citation: Zhou Q, Guo L (2018) Empirical approach to threshold determination for the delineation of built-up areas with road network data. PLoS ONE 13(3): e0194806. https://doi.org/10.1371/journal.pone.0194806

Editor: Juan A. Añel, Universidade de Vigo, SPAIN

Received: June 6, 2017; Accepted: March 9, 2018; Published: March 27, 2018

Copyright: © 2018 Zhou, Guo. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data underlying this study are third party and may be freely obtained and used after ordering from government agencies. To be specific, the road network, building and residential data of North Island was provided by the Land Information of New Zealand (https://data.linz.govt.nz/); the GlobeLand30-2010 data were provided by National Geomatics Center of China (http://globallandcover.com/User/Login.aspx); and the road network data of the West Midlands of England was provided by Ordnance Survey (https://www.ordnancesurvey.co.uk/opendatadownload/products.html#VMDVEC).

Funding: The project was supported by National Natural Science Foundation of China (No. 41771428 to QZ), Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (G1323541711 to QZ), and Beijing Key Laboratory of Urban Spatial Information Engineering (2017213 to QZ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The delineation of built-up areas, to create polygons that represent built-up areas in a region[1], can be used for applications such as predicting population growth[2], representing urban spatial development[3–4], identifying urban sprawl[5–8] and mapping land-use patterns[9]. Extensive studies have focused on delineating built-up areas with different source data, including census data[10–12], postcode data[13], remote sensing data[14–19], settlement and buildings data[20–21], “check-in” data (e.g. Flickr or Sina Weibo[22–23]), and even multiple source data[8]. Nowadays, road network data have become increasingly available. Many authoritative mapping agencies have begun to open their data to the public (e.g. TIGER/Line® and Ordnance Survey OpenData), and along with the development of Web 2.0 technology, a number of platforms (e.g. OpenStreetMap and Wikimapia) support volunteers in the creation and distribution of free geographic data for the world. Therefore, the delineation of built-up areas with road network data has received much attention [1,6,7,24]. For instance, two approaches (the grid-based and kernel density approaches) have been proposed, both derived by analyzing the density of intersections in a road network[24]. The grid-based approach created a regular grid and calculated the number of road intersections in each grid cell; if the density of a grid cell was larger than a certain threshold, the grid cell was most likely to be a built-up area. This approach has also been improved by first clustering road intersections [6]. The kernel density approach used kernel density estimation (KDE) to calculate the density of road intersections in each grid cell and delineated built-up areas accordingly. A different approach based on street blocks was also proposed [25]; its steps were to first calculate the land areas (or sizes) of all the street blocks in a road network, and then to divide these street blocks into built-up areas and non-built-up areas.

Nevertheless, existing approaches involve one or more parameter(s) for the delineation of built-up areas with road network data. For instance, the approach based on street blocks involves at least one parameter called the street block size; the grid-based and kernel density approaches each involve two parameters: cell size and cell density for the former, and bandwidth and kernel density for the latter [1]. However, the thresholds for the above parameters were determined arbitrarily and subjectively in previous studies. Some suggested setting the bandwidth between 250 and 500 m and the cell size at 100 m [24], whereas others used a threshold of 500 m for the cell size [6]. In another study, the appropriate thresholds for these parameters were determined through visual analysis rather than quantitative assessment[1]. Threshold determination is a necessary step, especially for automatic delineation of a large number of built-up areas in a country/region. To our knowledge, very few studies have quantitatively analyzed the most-appropriate thresholds for various parameters or, more importantly, how to determine appropriate threshold(s) for the automatic delineation of a large number of built-up areas with road network data. While a threshold determination approach called the head/tail break (dividing many small things and a few large things according to the arithmetic mean) has been proposed [25], there is a need to determine the most-appropriate threshold from multiple breaks obtained using the head/tail break approach. An empirical approach was also proposed [26]; its tenet was to divide a large road network into several subdivisions, and to use the most-appropriate threshold(s), obtained from one or several subdivisions, to make inferences for large road networks. However, the empirical approach has only been used for selective omission in road network data (i.e. retaining more important roads for the purpose of map generalization) and not for the delineation of built-up areas with road network data.

This study was inspired by the empirical approach [26]. The objective of this study is to validate whether or not the empirical approach is also applicable to the delineation of built-up areas with road network data. More specifically, we assumed that the most-appropriate thresholds/ranges are either the same or overlap, for the delineation of built-up areas in the different subdivisions of a road network. If this assumption is true, it is possible to use the most-appropriate thresholds obtained from one or multiple small subdivisions to make inferences for a much larger region. This study used five parameters of three typical approaches (the approach based on street blocks, the grid-based approach, and the kernel density approach) for analyses; and a similarity measure was used to determine the most-appropriate thresholds for various parameters. However, a comparison of these approaches, having already been reported in a previous research work [1], is beyond the scope of this study.

The paper is structured as follows: Section 2 describes the experimental data, benchmarks, approaches, and parameters to be tested for the delineation of built-up areas, and the measure used to determine and compare multiple appropriate thresholds. Section 3 analyzes the most-appropriate thresholds for various parameters. Section 4 further validates the results using a different benchmark, evaluation measure and/or study area. Section 5 presents conclusions and discussion.

Experimental design

Data and benchmark

This study used the open data produced by the Land Information of New Zealand (http://www.linz.govt.nz/topography/topo-maps/map-chooser) for testing (see S1 File). More precisely, the road network data at 1:50,000 scale were used as source data, and the corresponding building and residential data at 1:250,000 scale were used as benchmarks (“Fig 1”). The building and residential data are described by the Land Information of New Zealand as "central business district areas of large towns and cities", and "a group of houses and buildings that covers an area greater than 90,000 m²", respectively. These benchmarks were chosen because they have also been used for evaluating the automatic delineation of built-up areas in previous studies[1,20].

Fig 1. Study area: North Island of New Zealand.

https://doi.org/10.1371/journal.pone.0194806.g001

Approaches and parameters to be tested

Several approaches have been proposed for the delineation of built-up areas using road network data. This section briefly introduces the three typical approaches—the approach based on street blocks, the grid-based approach, and the kernel density approach—that will be tested. More details are available in previous research work [1,24,25].

The approach based on street blocks.

A street block can be viewed as a closed region formed by one or several connected roads [1]. The approach based on street blocks has two general steps [25]:

Calculate the land areas of all the street blocks (e.g., “4” and “16” in “Fig 2B”) in the road network data.
Delineate the street blocks whose land areas are smaller than a threshold (called the street block size) as built-up areas (e.g., “4” in “Fig 2B”).
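
The two steps above can be sketched as follows. This is only a minimal illustration, assuming the block areas have already been computed in km²; the block identifiers and area values are hypothetical, and the threshold is passed as the side length of an equivalent square, mirroring the way thresholds such as 0.6×0.6 km² are written in this paper.

```python
def delineate_by_block_size(block_areas_km2, side_km):
    """Return ids of street blocks whose area is below side_km x side_km."""
    max_area = side_km * side_km  # threshold area in km^2
    return {block_id for block_id, area in block_areas_km2.items() if area < max_area}


# Hypothetical block areas (km^2); not taken from the study data.
blocks = {"a": 0.04, "b": 0.16, "c": 0.50, "d": 3.20}
built_up = delineate_by_block_size(blocks, side_km=0.6)
print(sorted(built_up))
```

Only the blocks smaller than 0.36 km² are kept, which is the sense in which a single parameter controls the whole delineation.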

Fig 2.

Three approaches for the delineation of built-up areas: (a) a schematic road network, (b) the approach based on street blocks, (c) the grid-based approach, and (d) the kernel density approach.

https://doi.org/10.1371/journal.pone.0194806.g002

The grid-based approach.

The grid-based approach may have four steps [24]:

Create a regular grid (e.g. 0.5×0.5 km²);
Intersect this grid with road network data;
Calculate the density (D_i) of each grid cell according to the formula below [1]: $D_i = L_i / A_i$ (1) where L_i denotes the total length of roads located in the i^th grid cell; and A_i denotes the area of the i^th grid cell.
Delineate the grid cells whose densities are larger than a threshold (called the cell density) as built-up areas (e.g. “3” in “Fig 2C”).

The grid-based approach involves two parameters. In addition to the cell density, the cell size is also an essential parameter because the cell density may vary with different cell sizes.
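
As a rough sketch of the four steps, assuming planar coordinates in km and straight road segments: the code below approximates the intersection of roads with the grid by cutting each segment into short pieces and crediting each piece to the cell that contains its midpoint. The function names, the `step_km` parameter and the sample roads are hypothetical; a GIS overlay would normally do this step exactly.

```python
import math
from collections import defaultdict


def cell_densities(segments, cell_km, step_km=0.01):
    """Approximate road-length density (km/km^2) for each grid cell.

    segments: iterable of ((x1, y1), (x2, y2)) with coordinates in km.
    """
    length_in_cell = defaultdict(float)
    for (x1, y1), (x2, y2) in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        pieces = max(1, math.ceil(length / step_km))
        for k in range(pieces):
            t = (k + 0.5) / pieces  # midpoint of piece k along the segment
            mx, my = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
            cell = (math.floor(mx / cell_km), math.floor(my / cell_km))
            length_in_cell[cell] += length / pieces
    area = cell_km * cell_km
    return {cell: total / area for cell, total in length_in_cell.items()}


def delineate_by_cell_density(densities, min_density):
    """Cells whose density is larger than the cell-density threshold."""
    return {cell for cell, d in densities.items() if d > min_density}


# Hypothetical roads: two 0.5 km roads in cell (0, 0), one in cell (1, 0).
roads = [((0.0, 0.1), (0.5, 0.1)), ((0.0, 0.3), (0.5, 0.3)), ((0.5, 0.2), (1.0, 0.2))]
densities = cell_densities(roads, cell_km=0.5)
built_up = delineate_by_cell_density(densities, min_density=3.0)
```

Rerunning the sketch with a different `cell_km` changes both the cell area and the road length that falls in each cell, which is why the cell size has to be treated as a parameter alongside the cell density.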

The kernel density approach.

The kernel density approach may have two general steps [24]:

Calculate the density of each cell (e.g. “1”, “2”, “3”, “4”, and “5” in “Fig 2D”) using the kernel density estimation: $\hat{\lambda}(s) = \sum_{i=1}^{n} \frac{1}{\tau^{2}} k\left(\frac{s - s_i}{\tau}\right)$ (2) where $\hat{\lambda}(s)$ is the estimated density of the cell measured at location s (i.e. the centroid of this cell); τ is the bandwidth; n is the number of neighboring road intersections of location s; s_i is the i^th road intersection within distance τ of location s; and k(·) is the kernel function.
Delineate the grid cells whose kernel densities are larger than a threshold (called the kernel density) as built-up areas (e.g. “3” and larger values in “Fig 2D”).

The kernel density approach also involves two parameters: the bandwidth (the radius of the dotted circles in “Fig 2D”) and the kernel density. Given a bandwidth, the cell size may be set as small as possible. However, if the cell size is set too small, the number of cells increases dramatically and the approach may fail due to an “out of memory” problem [1]. Thus, when using this approach, we fixed the cell size at 0.1×0.1 km².
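
A minimal sketch of the kernel density step is given below. It assumes the common form in which each intersection within the bandwidth contributes k(u)/τ² to the estimate, and it assumes a quartic kernel, k(u) = (3/π)(1 − |u|²)²; the kernel choice, the function names and the sample points are assumptions made for illustration and are not stated in this paper.

```python
import math


def kernel_density(s, intersections, bandwidth_km):
    """Quartic-kernel density estimate (intersections per km^2) at location s."""
    sx, sy = s
    total = 0.0
    for px, py in intersections:
        u2 = ((sx - px) ** 2 + (sy - py) ** 2) / bandwidth_km ** 2
        if u2 < 1.0:  # only intersections within the bandwidth contribute
            total += (3.0 / math.pi) * (1.0 - u2) ** 2
    return total / bandwidth_km ** 2


def delineate_by_kernel_density(centroids, intersections, bandwidth_km, min_density):
    """Cell centroids whose kernel density is larger than the threshold."""
    return [c for c in centroids
            if kernel_density(c, intersections, bandwidth_km) > min_density]


# Hypothetical intersections (km): a cluster near the origin and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
centroids = [(0.05, 0.05), (3.05, 3.05), (1.5, 1.5)]
built_up = delineate_by_kernel_density(centroids, points, bandwidth_km=0.5, min_density=10.0)
```

In this toy case only the centroid inside the cluster exceeds the threshold; the isolated intersection yields a much lower density and the empty location yields zero.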

Experimental steps

An experiment was designed to investigate whether the most-appropriate thresholds for the delineation of built-up areas are the same or similar for different subdivisions of a road network. Specifically:

Divide a large road network into subdivisions.
Determine the most-appropriate thresholds for the delineation of built-up areas for each subdivision.
Compare the most-appropriate thresholds for different subdivisions.

Divide a large road network into subdivisions.

Two division modes were considered:

The road network of the North Island was subdivided into nine non-overlapping road networks using the administrative boundary data in order to investigate the most-appropriate thresholds for the different administrative districts (“Fig 3A”).
The road network of the North Island was also subdivided into smaller road networks in order to investigate the most-appropriate thresholds for cities or towns of different sizes. As an example, a total of 33 subdivisions in three (Auckland, Wellington and Hawke’s Bay) out of the nine administrative districts were manually chosen (“Fig 3B”, “Fig 3C”, “Fig 3D”) by referring to the actual built-up areas in the corresponding benchmark. Two rules were used for selection: first, each subdivision covers the built-up areas in the benchmark of only one city or town, and the subdivision should be larger than the corresponding built-up areas in the benchmark; and second, the selected built-up areas in different subdivisions should vary dramatically in size. Specifically, the size of the largest built-up area (Auckland City) among the 33 subdivisions is 302.627 km², while that of the smallest one (Tikokino) is only 0.169 km².

Fig 3.

Subdividing the road network of the North Island according to (a) nine administrative districts and (b–d) different cities or towns.

https://doi.org/10.1371/journal.pone.0194806.g003

Determine the most-appropriate threshold for each subdivision.

The basic idea is to first automatically delineate built-up areas with different thresholds. The delineated built-up areas are then compared with those in the corresponding benchmark by calculating a similarity measure. Finally, the threshold corresponding to the highest similarity value is taken as the most-appropriate threshold. The similarity value, ranging from 0 to 1, can be calculated as [26,27]: $S = \frac{A \cap B}{A \cup B}$ (3) where A is the land area of the automatically delineated built-up areas; B is the land area of the built-up areas in the corresponding benchmark; A ∩ B is the land area of the built-up areas delineated in common; and A ∪ B is the land area covered by the delineated built-up areas, the benchmark built-up areas, or both.
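
The selection procedure can be sketched as below. The sketch assumes that the similarity is the area delineated in common divided by the area covered by either data set, and it represents areas as sets of equal-area cells so that areas reduce to counts; the cell identifiers, densities and function names are hypothetical.

```python
def similarity(delineated, benchmark):
    """Overlap divided by union, for sets of equal-area cells."""
    union = delineated | benchmark
    if not union:
        return 0.0
    return len(delineated & benchmark) / len(union)


def most_appropriate_threshold(thresholds, delineate, benchmark):
    """Return (threshold, similarity) for the highest similarity value."""
    scored = [(similarity(delineate(t), benchmark), t) for t in thresholds]
    best_sim, best_t = max(scored, key=lambda pair: pair[0])
    return best_t, best_sim


# Hypothetical per-cell densities and benchmark cells.
density = {"c1": 9, "c2": 7, "c3": 6, "c4": 3, "c5": 1}
benchmark = {"c1", "c2", "c3"}


def delineate(threshold):
    return {cell for cell, d in density.items() if d > threshold}


best_t, best_sim = most_appropriate_threshold(range(0, 10), delineate, benchmark)
```

Here thresholds 3, 4 and 5 all give the same delineation and the same similarity; `max` simply reports the first of them, so a real analysis would want to record such ties as a range.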

“Fig 4” plots the similarity distributions for the North Island of New Zealand, using three approaches for the delineation of built-up areas. The x-axis denotes various thresholds for one parameter (e.g., street block size, cell density or kernel density), and the y-axis denotes the corresponding similarity value. In this study, these three approaches were implemented using commercial GIS software (ArcGIS, version 9.3) and freeware (OpenJUMP, version 1.6.0). The detailed steps to implement these approaches have been reported in a previous study [1]. It can be seen in “Fig 4” that all the similarity distributions have the same trend. That is, the similarity value first increases as the threshold increases, and then begins to decrease after passing a peak. The threshold corresponding to this peak can be viewed as the most-appropriate threshold (highlighted with a vertical dotted line). More precisely, the most-appropriate threshold for the street block size is 0.6×0.6 km², while those for the cell density and the kernel density are 6 km/km² and 24 num/km², respectively. Based on the similarity distributions in “Fig 4”, the parameters and thresholds to be tested in this study are listed in “Table 1”.

Fig 4.

Most-appropriate thresholds for the North Island of New Zealand using (a) the approach based on street blocks; (b) the grid-based approach; (c) the kernel density approach.

https://doi.org/10.1371/journal.pone.0194806.g004

Table 1. Parameters and thresholds to be tested using three approaches to the delineation of built-up areas.

https://doi.org/10.1371/journal.pone.0194806.t001

Compare the most-appropriate thresholds for different subdivisions.

The most-appropriate thresholds for the nine administrative districts and 33 test regions covering cities or towns were determined. Initially, the most-appropriate thresholds were compared to find out if they were identical. If the most-appropriate thresholds turned out not to be identical, the appropriate threshold ranges were visually determined to investigate whether the threshold ranges overlapped.

The head/tail break, an existing classification method, was also employed for comparison purposes. The head/tail break applies a break rule for threshold determination. The break rule is defined as “Given a variable X, if its values x follow a heavy-tailed distribution, then the mean of the values can divide all the values into two parts: a high percentage in the tail, and a low percentage in the head.”[25]. It has been found that the street block sizes of a road network follow a heavy-tailed distribution. The mean size was therefore used as a threshold to divide all the street blocks into built-up areas (in the tail) and non-built-up areas (in the head) [25]. When the first mean does not give a satisfactory division, the break may continue if the street block sizes in the tail still follow a heavy-tailed distribution. In this way, a second mean, a third mean, a fourth mean and so on may be obtained. The break stops if the percentage in the head is larger than 40% [28]. This condition can be relaxed to 50%, or even more, if the percentage in the head remains less than 40% in a subsequent break. This study also employs the head/tail break method to determine thresholds for the street block size, and also for the cell density and kernel density; these thresholds are also evaluated using the similarity measure.
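
One possible reading of this procedure is sketched below. The head is taken as the values above the mean and the tail as the values at or below it; the `recurse_into` and `max_head_share` parameters and the sample values are hypothetical, introduced here because the text continues the break in the tail for street block sizes, while the side to continue in for the density parameters is not spelled out.

```python
def head_tail_means(values, max_head_share=0.4, recurse_into="head"):
    """Successive means produced by repeated head/tail breaks.

    Breaking stops once the head holds more than max_head_share of the
    current values, or when no value lies above the mean.
    """
    means = []
    current = list(values)
    while len(current) > 1:
        mean = sum(current) / len(current)
        head = [v for v in current if v > mean]
        tail = [v for v in current if v <= mean]
        if not head or len(head) / len(current) > max_head_share:
            break
        means.append(mean)
        current = head if recurse_into == "head" else tail
    return means


# Hypothetical heavy-tailed sample (e.g. block sizes or densities).
sample = [1, 1, 1, 1, 2, 2, 3, 5, 9, 25]
strict = head_tail_means(sample)                       # 40% rule
relaxed = head_tail_means(sample, max_head_share=0.5)  # relaxed rule
into_tail = head_tail_means(sample, recurse_into="tail")
```

Each returned mean is a candidate threshold; as described above, the candidates still have to be compared (here, with the similarity measure) to decide which break to use.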

Experimental results and analyses

Results of the similarity distributions for the nine administrative districts of the North Island

Results for the approach based on street blocks.

“Fig 5” plots the similarity distributions for the nine administrative districts of the North Island, using the approach based on street blocks. The threshold for the only parameter (street block size) varies from 0.1×0.1 to 2.0×2.0 km². For each administrative district, the most-appropriate threshold for the street block size is highlighted with a vertical, solid line.

Fig 5. Most-appropriate thresholds for the nine administrative districts of the North Island, using the approach based on street blocks.

The most-appropriate threshold for each administrative district is highlighted with a vertical solid line.

https://doi.org/10.1371/journal.pone.0194806.g005

The following can be observed from “Fig 5”:

All the distributions are similar to those plotted in “Fig 4A”. That is, the similarity value first increases with an increase of the street block size, and then it begins to decrease after going through a peak.
The most-appropriate thresholds for the street block size are all located within 0.4×0.4 to 0.7×0.7 km². To be specific, the most-appropriate thresholds for five out of the nine administrative districts are 0.7×0.7 km², while those for three out of the nine administrative districts are 0.5×0.5 km².
An appropriate threshold range (highlighted with two vertical dotted lines), in which all the corresponding similarity values are no more than 0.05 different from that of the most-appropriate threshold, was determined for each administrative district. For instance, the appropriate threshold range for Northland is between 0.4×0.4 and 0.9×0.9 km², while that of Auckland is 0.5×0.5 to 1.0×1.0 km². More importantly, all the appropriate threshold ranges overlap in the nine administrative districts.
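
The range definition and the overlap check can be sketched as follows, assuming each district's similarity curve is available as a mapping from threshold to similarity. The curves, district names and function names below are hypothetical and only illustrate the 0.05 tolerance rule.

```python
def appropriate_range(similarities, tolerance=0.05):
    """Thresholds around the peak whose similarity is within tolerance of the maximum.

    similarities: dict mapping threshold -> similarity value.
    Returns (lowest threshold, highest threshold) of the contiguous run.
    """
    thresholds = sorted(similarities)
    peak = max(range(len(thresholds)), key=lambda i: similarities[thresholds[i]])
    floor = similarities[thresholds[peak]] - tolerance
    lo = hi = peak
    while lo > 0 and similarities[thresholds[lo - 1]] >= floor:
        lo -= 1
    while hi < len(thresholds) - 1 and similarities[thresholds[hi + 1]] >= floor:
        hi += 1
    return thresholds[lo], thresholds[hi]


def common_range(ranges):
    """Intersection of (low, high) ranges, or None if they do not all overlap."""
    low = max(r[0] for r in ranges)
    high = min(r[1] for r in ranges)
    return (low, high) if low <= high else None


# Hypothetical similarity curves keyed by block side length (km).
district_a = {0.3: 0.30, 0.4: 0.47, 0.5: 0.50, 0.6: 0.48, 0.7: 0.40}
district_b = {0.3: 0.20, 0.4: 0.35, 0.5: 0.52, 0.6: 0.55, 0.7: 0.53}
ranges = [appropriate_range(district_a), appropriate_range(district_b)]
shared = common_range(ranges)
```

A non-empty `shared` range is the situation reported above: the districts peak at different thresholds, yet a threshold exists that is near-optimal for all of them.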

Results for the grid-based approach.

“Fig 6” plots the similarity distributions for the nine administrative districts of the North Island using the grid-based approach. The grid-based approach involves two parameters: cell size (ranging from 0.3×0.3 to 0.9×0.9 km²), and cell density (ranging from 1 to 15 km/km²).

Fig 6.

Most-appropriate thresholds for the nine administrative districts of the North Island, using the grid-based approach with different cell sizes: (a) 0.3×0.3 km², (b) 0.5×0.5 km², (c) 0.7×0.7 km², and (d) 0.9×0.9 km². For each cell size, the most-appropriate threshold range is highlighted with two vertical dotted lines.

https://doi.org/10.1371/journal.pone.0194806.g006

The following can be observed from “Fig 6”:

All the distributions are similar to those plotted in “Fig 4B”, despite the use of different cell sizes (i.e. 0.3×0.3 to 0.9×0.9 km²) and different cell densities (i.e., 1–15 km/km²).
For each cell size, the most-appropriate thresholds for the cell density are either the same or only one apart (e.g., 6–7 km/km² for the cell size 0.5×0.5 km²).

The maximum similarity values for the different cell sizes were compared (“Table 2”). It can be seen in “Table 2” that the most-appropriate cell size appears to be 0.5×0.5 km², for which the maximum similarity value becomes the greatest for seven out of the nine administrative districts.

Table 2. Maximum similarity values for different cell sizes obtained from the nine administrative districts of the North Island.

https://doi.org/10.1371/journal.pone.0194806.t002

Results for the kernel density approach.

“Fig 7” plots the similarity distributions for the nine administrative districts of North Island using the kernel density approach. The two parameters, bandwidth (ranging from 0.3 to 0.9 km), and kernel density (ranging from 1 to 60 num/km²) in the kernel density approach were analyzed.

Fig 7.

Most-appropriate thresholds for the nine administrative districts of the North Island, using the kernel density approach with different bandwidths: (a) 0.3 km, (b) 0.5 km, (c) 0.7 km, and (d) 0.9 km. For each bandwidth, the most-appropriate threshold range is highlighted with two vertical dotted lines.

https://doi.org/10.1371/journal.pone.0194806.g007

The following can be observed from “Fig 7”:

The most-appropriate thresholds are located within a certain range (e.g., 22–28 num/km² for the bandwidth 0.5 km), in which the similarity values are either the same or close to each other.
The ranges for the different bandwidths (i.e., 0.3, 0.5, 0.7 and 0.9 km) overlap.

“Table 3” further lists the maximum similarity values for different bandwidths, tested on the nine administrative districts of the North Island. It can be seen in “Table 3” that the most-appropriate bandwidth is 0.5 km for six out of the nine administrative districts and 0.7 km for another three districts.

Table 3. Maximum similarity values for different bandwidths obtained from the nine administrative districts of the North Island.

https://doi.org/10.1371/journal.pone.0194806.t003

Results of the similarity distributions for 33 cities/towns on the North Island

“Table 4” lists the most-appropriate thresholds or ranges for the parameters (i.e., street block size, cell density and kernel density) tested on the 33 subdivisions of the North Island, also see S2 File. The threshold for the cell size was fixed at 0.5×0.5 km² while the bandwidth was fixed at 0.5 km, the value at which the corresponding similarity value becomes a maximum in most cases (“Tables 2 and 3”).

Table 4. Most-appropriate thresholds or ranges for the 33 subdivisions of the North Island.

https://doi.org/10.1371/journal.pone.0194806.t004

The following can be observed from “Table 4”:

There may be multiple most-appropriate thresholds, especially for the subdivisions (e.g., Orewa and Waipawa) with relatively small built-up areas. This is because, typically, the smaller the size of a city or town, the smaller the number of street blocks or grid cells to be delineated as built-up areas. It is therefore possible to delineate the same built-up areas with different thresholds.
For most of the subdivisions, the most-appropriate thresholds for a parameter (e.g., street block size, cell density or kernel density) are either the same or close to each other. For instance, in 29 out of the 33 subdivisions, the most-appropriate thresholds or ranges for the street block size are either located within or overlap with 0.4×0.4 to 0.9×0.9 km², which also overlaps with those found in “Fig 5”. In 28 out of the 33 subdivisions, the most-appropriate thresholds for the cell density are located within 5 to 7 km/km², which is almost the same as the value (6 to 7 km/km²) found in “Fig 6B”. In 26 out of the 33 subdivisions, the most-appropriate thresholds for the kernel density are located within 18 to 31 num/km². These thresholds are either within, or close to, the values (22 to 28 num/km²) found in “Fig 7B”.
However, the ranges of the above most-appropriate thresholds become wider. For instance, the range for the street block size varies from 0.2×0.2 to 1.7×1.7 km². The corresponding result for the cell density varies from 4 to 8 km/km² and for the kernel density, it varies from 10 to 32 num/km². This illustrates that the most-appropriate thresholds for multiple parameters may vary dramatically for cities or towns.

Results of the head/tail break

“Tables 5–7” list the thresholds obtained using the head/tail break and the corresponding similarity values for the approach based on street blocks, the grid-based approach and the kernel density approach, respectively. The thresholds for some parameters (e.g., cell size and bandwidth) and for the 33 cities/towns of the North Island are not listed here because their values do not follow a heavy-tailed distribution.

Table 5. Thresholds for the street block size obtained using the head/tail break and the corresponding similarity values for the approach based on street blocks.

https://doi.org/10.1371/journal.pone.0194806.t005

Table 6. Thresholds for the cell density obtained using the head/tail break and the corresponding similarity values for the grid-based approach.

https://doi.org/10.1371/journal.pone.0194806.t006

Table 7. Thresholds for the kernel density obtained using the head/tail break and the corresponding similarity values for the kernel density approach.

https://doi.org/10.1371/journal.pone.0194806.t007

The following can be observed from “Tables 5–7”:

The values of both the street block size and kernel density follow the heavy-tailed distribution for the ten study cases. Values of cell density also follow the heavy-tailed distribution, in most study cases, if the relaxation condition is considered (i.e., the percentage in the head is allowed to be larger than 40%).
Most of the maximum similarity values obtained using the head/tail break (highlighted with gray) are similar (i.e., with a difference no more than 0.05) to those in “Table 8”. However, the number of breaks needed for a maximum similarity value may vary between study cases. As an example, in “Table 7”, the second mean is the most-appropriate threshold for the study case of Auckland, but the fifth mean is the most-appropriate threshold for the study case of Northland.
The mean for the maximum similarity value is either located within or close to the appropriate range (e.g., 0.4×0.4 to 0.9×0.9 km² for the street block size, 5–7 km/km² for the cell density, and 18–31 num/km² for the kernel density) found in “Table 4”. The similarity value may, however, be much lower if the threshold obtained using the head/tail break is outside of the appropriate range (as is seen in the study case of Northland in “Table 5”). Empirical threshold ranges may therefore also be used to determine an appropriate mean for the head/tail break.

Table 8. Maximum similarity values of using different approaches for the delineation of built-up areas with road network data.

https://doi.org/10.1371/journal.pone.0194806.t008

Validation on using a different benchmark, evaluation measure and study area

Using a different benchmark and evaluation measure

In the previous tests, the benchmark for evaluation consisted of building and residential data only. However, the actual built-up areas may include not only these features, but also roads, school playing fields, and factories. Therefore, the actual built-up areas may be larger than those marked in the benchmarks.

An investigation was made into whether the most-appropriate thresholds or ranges vary with a different benchmark and evaluation measure. First, the artificial surfaces acquired from GlobeLand30-2010 (http://globallandcover.com/GLC30Download/index.aspx), a mapping product of global land cover at 30-meter spatial resolution derived from remote sensing images in 2010, were used as the built-up areas in the benchmark (see S3 File). Next, two evaluation measures, the similarity measure (M₁) and an integrated measure (M₂) that averages correctness and completeness with equal weights, as proposed in an existing study [1], were employed to compare the automatically delineated built-up areas with the corresponding benchmark.

As an example, “Table 9” lists the most-appropriate thresholds or ranges for 21 out of the 33 subdivisions of the North Island, using the GlobeLand30-2010 and the above evaluation measures (see also S4 File). The results of the other 12 subdivisions of the North Island are not listed here because the corresponding artificial surfaces for these subdivisions were not available in the GlobeLand30-2010.

Table 9. Most-appropriate thresholds or ranges for 21 out of the 33 subdivisions of the North Island, using the GlobeLand30-2010 and two different evaluation measures.

https://doi.org/10.1371/journal.pone.0194806.t009

The following can be observed from “Table 9”:

The appropriate threshold ranges are almost the same as those found in “Table 4”. For example, the most-appropriate thresholds or ranges for the street block size, cell density, and kernel density are mostly located within 0.4×0.4 to 1.0×1.0 km², 5–7 km/km², and 17–28 num/km², respectively, compared with 0.4×0.4 to 0.9×0.9 km², 5–7 km/km², and 18–31 num/km² in “Table 4”. The majority of the most-appropriate thresholds or ranges are either the same or overlap, even when using a different benchmark.
The majority of the most-appropriate thresholds or ranges are the same, irrespective of which of the two evaluation measures (M₁ and M₂) is used, which indicates the effectiveness of the similarity measure. Moreover, the similarity values are more sensitive to different parameter thresholds, whereas the integrated values are much closer to each other (“Fig 8”). If all the grid cells were classified as built-up areas, the correctness would be very low, but the completeness would be 100%, and the integrated value would therefore be larger than 50%. Thus, the similarity measure was selected to determine an appropriate threshold or range.
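
The difference in sensitivity can be illustrated with a small worked example. The sketch assumes that correctness is the common area divided by the delineated area, that completeness is the common area divided by the benchmark area, and that the similarity is overlap divided by union; these definitions, the function names and the 100-cell region are assumptions made for illustration.

```python
def similarity(delineated, benchmark):
    """Overlap divided by union, for sets of equal-area cells (M1)."""
    union = delineated | benchmark
    return len(delineated & benchmark) / len(union) if union else 0.0


def integrated(delineated, benchmark):
    """Equal-weight mean of correctness and completeness (M2, assumed definitions)."""
    common = len(delineated & benchmark)
    correctness = common / len(delineated) if delineated else 0.0
    completeness = common / len(benchmark) if benchmark else 0.0
    return (correctness + completeness) / 2


# Hypothetical region of 100 cells, of which 10 are built-up in the benchmark.
all_cells = set(range(100))
benchmark = set(range(10))

# Degenerate delineation: every cell is classified as built-up.
m1 = similarity(all_cells, benchmark)   # 10 / 100
m2 = integrated(all_cells, benchmark)   # (0.10 + 1.00) / 2
```

Under these assumptions the degenerate delineation scores 0.10 on M₁ but 0.55 on M₂, which is consistent with the observation that the integrated values stay close together while the similarity values separate good and poor thresholds more clearly.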

Fig 8. Evaluation of the thresholds for two parameters (cell density and kernel density) by using both the similarity measure (M₁) and the integrated measure (M₂), for the case study of Lower & Upper Hutt City.

https://doi.org/10.1371/journal.pone.0194806.g008

Using a different study area

The road network data of the West Midlands region of England (at 1:25,000 scale) were also used for testing. This study area was chosen for three reasons. First, there are eight counties (Shropshire, Telford and Wrekin, Staffordshire, Stoke-on-Trent, Herefordshire, Worcestershire, West Midlands (county), and Warwickshire) in the West Midlands region (see “Fig 9”), which makes a comparison of the most-appropriate thresholds for different counties possible. Second, the size of built-up areas varies between counties. For instance, in both Stoke-on-Trent and West Midlands counties, most areas are built-up areas; but in other counties, most areas are non-built-up areas. Third and most importantly, the road network data of the West Midlands region were freely acquired from the open data provided by Ordnance Survey (https://www.ordnancesurvey.co.uk/). The corresponding artificial surfaces were also freely acquired from the GlobeLand30-2010 and were used as the benchmarks (see S3 File).

Fig 9. Study area of the West Midlands region of England.

https://doi.org/10.1371/journal.pone.0194806.g009

“Fig 10” shows that the most-appropriate thresholds are located within 0.4×0.4 to 1.0×1.0 km² and 6–9 km/km² using the approach based on street blocks and the grid-based approach, respectively; these ranges are almost the same as those found before (“Fig 4” and “Table 9”). For the kernel density approach, seven of the nine most-appropriate thresholds (located within 13–15 num/km²) are very close to each other. Although the most-appropriate thresholds for West Midlands county and Stoke-on-Trent county were much smaller (8 and 9 num/km², respectively), the most-appropriate threshold ranges for these two counties (3–14 and 3–17 num/km², respectively) still overlap with those of the other counties. However, the similarity distributions for both West Midlands county and Stoke-on-Trent county differ from those for the other counties because most areas in these two counties are built-up. Even when all the street blocks or grid cells were delineated as built-up areas, the corresponding similarity values were still higher than 0.6 (“Fig 10”).

Fig 10. Most-appropriate thresholds for the West Midlands region of England and its eight counties using (a) the approach based on street blocks, (b) the grid-based approach, and (c) the kernel density approach.

https://doi.org/10.1371/journal.pone.0194806.g010

Furthermore, the artificial surfaces in the GlobeLand30-2010 suffer from classification errors. The overall accuracy of GlobeLand30-2010 (http://www.globeland30.org/home/Enbackground.aspx) was reported as 80.33%. Therefore, some small cities or towns were not used for comparison. Although the use of other source data (e.g., census data and “check-in” data) may offer alternatives, they also have limitations. For instance, check-in data often contain biases because not everyone checks in or uses social media. As it is difficult to precisely identify built-up areas, we suggest using different source data as benchmarks in order to minimize subjectivity.

Conclusion and discussions

This study employed an empirical approach to determine appropriate thresholds for multiple parameters in the delineation of built-up areas using road network data. Specifically, the five parameters (street block size, cell size, cell density, bandwidth and kernel density) of the three typical approaches (the approach based on street blocks, the grid-based approach and the kernel density approach) were tested. That is, extensive experiments were carried out to investigate the most-appropriate thresholds for various parameters in these approaches. The North Island of New Zealand was chosen as the study area, with road network data used as source data, and the corresponding building and residential data used as benchmarks. The road network was divided into nine administrative subdivisions and 33 different city/town subdivisions. The built-up areas of each subdivision were automatically delineated with different approaches and thresholds. For each subdivision, the most-appropriate threshold was determined by calculating the similarity (or consistency) between an automatically delineated built-up area and a corresponding benchmark. A different benchmark (GlobeLand30-2010), an integrated measure, and a different study area (the West Midlands region of England) were used for validation of earlier results. Results show that in most cases, the most-appropriate thresholds for the different subdivisions were either the same or close to each other. However, the most-appropriate thresholds for some cities/towns varied dramatically.

The reasons for these results might be as follows:

The road network of a city/town is commonly designed based on principles and criteria proposed by the department of urban planning. Consequently, the street block size or cell density of a road network in a city/town region is organized so as to be neither too large nor too small. For instance, 95% of the street blocks in the built-up areas of either the West Midlands region or North Island are smaller than 0.9×0.9 km² (“Fig 11A”). Such principles and criteria may be consistent within a country (“Fig 11A”) and even be similar for different countries (“Fig 11B”). Most of the appropriate thresholds are therefore the same or similar.
However, different cities/towns often have different street network patterns (e.g. street blocks of different sizes and shapes), so the most-appropriate thresholds are not always the same and may even be quite different (“Fig 12”).

Fig 11. Plot of the area percentages of street blocks in the built-up areas of both the North Island and West Midlands region (a) and eight counties in the West Midlands region (b). The x-axis denotes the different area ranges of street blocks in the built-up areas, and the y-axis denotes the total area of street blocks within an area range as a proportion of the total area of street blocks across all ranges.

https://doi.org/10.1371/journal.pone.0194806.g011

Fig 12. The most-appropriate thresholds for two small towns (Warkworth and Featherston) in the North Island.

https://doi.org/10.1371/journal.pone.0194806.g012

Nevertheless, this study validates that the empirical approach [26] is also applicable to the delineation of built-up areas. That is, a large road network is first subdivided according to administrative boundaries or smaller units (e.g., cities or towns), and the most-appropriate thresholds or ranges obtained from multiple subdivisions are then used to infer the result for the larger network. The inference method may include calculating the average or median of multiple most-appropriate thresholds, or taking the overlap of multiple appropriate threshold ranges [26].
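The inference step can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function names `infer_threshold` and `overlap_ranges` are hypothetical, and the threshold values and ranges are made-up numbers rather than results from this study.

```python
from statistics import mean, median


def infer_threshold(thresholds):
    """Summarize the most-appropriate thresholds of several subdivisions."""
    return {"mean": mean(thresholds), "median": median(thresholds)}


def overlap_ranges(ranges):
    """Return the interval shared by all (low, high) ranges, or None if empty."""
    low = max(lo for lo, _ in ranges)    # largest lower bound
    high = min(hi for _, hi in ranges)   # smallest upper bound
    return (low, high) if low <= high else None


# Made-up most-appropriate thresholds for five hypothetical subdivisions.
subdivision_thresholds = [6, 7, 7, 8, 9]
summary = infer_threshold(subdivision_thresholds)

# Made-up appropriate threshold ranges for three hypothetical subdivisions.
subdivision_ranges = [(5, 9), (6, 10), (4, 8)]
common = overlap_ranges(subdivision_ranges)

print(summary, common)
```

The median is less affected than the mean by a single subdivision with an unusual threshold, which may matter given that some cities/towns showed very different most-appropriate thresholds. When the ranges share no common interval, `overlap_ranges` returns `None`, signalling that a single threshold cannot be inferred from those subdivisions by overlap alone.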

In future research, more road network data of different countries and/or regions will be used for testing the most-appropriate thresholds for various parameters. In addition, other source data (e.g. census data) will be used as benchmarks for automatic analysis of delineated built-up areas. Moreover, it will still be necessary to develop an approach to adaptively determine the most-appropriate threshold for different cities/towns. Finally, it may be worth investigating whether the empirical approach is also applicable for determining appropriate thresholds for the delineation of built-up areas with different source data.

Supporting information

S1 File. Road, buildings and residential data provided by the Land Information of New Zealand (https://data.linz.govt.nz/).

https://doi.org/10.1371/journal.pone.0194806.s001

(ZIP)

S2 File. The results for the 33 subdivisions of the North Island using the similarity measure.

https://doi.org/10.1371/journal.pone.0194806.s002

(ODS)

S3 File. Land cover data provided by the National Geomatics Center of China (http://globallandcover.com/User/Login.aspx).

https://doi.org/10.1371/journal.pone.0194806.s003

(ZIP)

S4 File. The results for 21 of the 33 subdivisions of the North Island using both the similarity and the integrated measures.

https://doi.org/10.1371/journal.pone.0194806.s004

(ODS)

Acknowledgments

The project was supported by the National Natural Science Foundation of China (No. 41771428) and the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) (No. G1323541711), and it was also funded by the Beijing Key Laboratory of Urban Spatial Information Engineering (No. 2017213). The authors would like to express special thanks to all the anonymous reviewers and the editor for their valuable comments. We are also thankful to the Land Information of New Zealand, the National Geomatics Center of China and Ordnance Survey for providing the experimental data.

References

1. Zhou Q. Comparative study of approaches to delineating urban areas using road network data. Transactions in GIS. 2015;19(6):848–876.
2. Qiu F, Woller KL, Briggs R. Modeling urban population growth from remotely sensed imagery and TIGER GIS road data. Photogrammetric Engineering & Remote Sensing. 2003;68:1031–42.
3. Cheng J, Masser I. Urban growth pattern modeling: a case study of Wuhan city, PR China. Landscape and Urban Planning. 2003;62:199–217.
4. Li X, Liu X. An extended cellular automaton using case-based reasoning for simulating urban development in a large complex region. International Journal of Geographical Information Science. 2006;20:1109–36.
5. Ji W, Ma J, Twibell RW, Underhill K. Characterizing urban sprawl using multi-stage remote sensing images and landscape metrics. Computers, Environment and Urban Systems. 2006;30:861–79.
6. Jia T, Jiang B. Measuring urban sprawl based on massive street nodes and the novel concept of natural cities. 2011; http://arxiv.org/ftp/arxiv/papers/1010/1010.0541.pdf (accessed 6 July 2014)
7. Jiang B, Jia T. Zipf's law for all the natural cities in the United States: a geospatial perspective. International Journal of Geographic Information Science. 2012;25:1269–81.
8. Terando AJ, Costanza J, Belyea C, Dunn RR, McKerrow A, Collazo JA. The Southern Megalopolis: Using the Past to Predict the Future of Urban Sprawl in the Southeast U.S. PLoS ONE. 2014;9(7):e102261. pmid:25054329
9. Arsanjani JJ, Helbich M, Bakillah M, Hagenauer J, Zipf A. Toward mapping land-use patterns from volunteered geographic information. International Journal of Geographical Information Science. 2013;27:2264–78.
10. Rozenfeld HD, Rybski D, Andrade JS Jr, Batty M, Stanley HE, Makse HA. Laws of population growth. Proceedings of the National Academy of Sciences. 2008;105:18702–7.
11. Rozenfeld HD, Rybski D, Gabaix X, Makse HA. The area and population of cities: new insights from a different perspective on cities. American Economic Review, American Economic Association. 2011;101:2205–25.
12. Holmes TJ, Lee S. Cities as six-by-six-mile squares: Zipf's law? In: Glaeser EL, editor. The Economics of Agglomerations. Chicago: University of Chicago Press; 2009. p. 105–132.
13. Thurstain-Goodwin MT, Unwin D. Defining and delineating the central areas of towns for statistical monitoring using continuous surface representations. Transactions in GIS. 2000;4:305–17.
14. Gong P, Howarth PJ. The use of structural information for improving land-cover classification accuracies at the rural-urban fringe. Photogrammetric Engineering and Remote Sensing. 1990;56:67–73.
15. Sutton PC. A scale-adjusted measure of “urban sprawl” using nighttime satellite imagery. Remote Sensing of Environment. 2003;86:353–69.
16. Sudhira HS, Ramachandra TV, Jagadish KS. Urban sprawl: metrics, dynamics and modeling using GIS. International Journal of Applied Earth Observation. 2004;5:29–39.
17. Ji W, Ma J, Twibell RW, Underhill K. Characterizing urban sprawl using multi-stage remote sensing images and landscape metrics. Computers, Environment and Urban Systems. 2006;30:861–79.
18. Saravanan P, Ilangovan P. Identification of urban sprawl pattern for Madurai region using GIS. International Journal of Geomatics and Geosciences. 2010;1:141–9.
19. Griffiths P, Hostert P, Gruebner O, Linden SVD. Mapping megacity growth with multi-sensor data. Remote Sensing of Environment. 2010;114:426–39.
20. Chaudhry O, Mackaness W. Automatic identification of urban settlement boundaries for multiple representation databases. Computers, Environment and Urban Systems. 2008;32:95–109.
21. Tannier C, Thomas I, Vuidel G, Frankhauser P. A fractal approach to identifying urban boundaries. Geographical Analysis. 2011;43:211–27.
22. Lüscher P, Weibel R. Exploiting empirical knowledge for automatic delineation of city centres from large-scale topographic database. Computers, Environment and Urban Systems. 2013;37:18–34.
23. Zhen F, Cao Y, Qin X, Wang B. Delineation of an urban agglomeration boundary based on Sina Weibo microblog 'check-in' data: A case study of the Yangtze River Delta. Cities. 2017;60:180–191.
24. Borruso G. Network density and the delimitation of urban areas. Transactions in GIS. 2003;7:177–91.
25. Jiang B, Liu X. Scaling of geographic space from the perspective of city and field blocks and using volunteered geographic information. International Journal of Geographic Information Science. 2012;26:215–29.
26. Zhou Q, Li ZL. Empirical determination of geometric parameters for selective omission in a road network. International Journal of Geographical Information Science. 2016;30(2):263–299.
27. Li ZL, Zhou Q. Integration of linear- and areal-hierarchies for continuous multi-scale representation of road networks. International Journal of Geographical Information Science. 2012;26:855–80.
28. Jiang B, Yin J. Hi-Index for quantifying the fractal or scaling structure of geographic features. Annals of the Association of American Geographers. 2014;104(3):530–540.

