The world’s user-generated road map is more than 80% complete

OpenStreetMap, a crowdsourced geographic database, provides the only global-level, openly licensed source of geospatial road data, and the only national-level source in many countries. However, researchers, policymakers, and citizens who want to make use of OpenStreetMap (OSM) have little information about whether it can be relied upon in a particular geographic setting. In this paper, we use two complementary, independent methods to assess the completeness of OSM road data in each country in the world. First, we undertake a visual assessment of OSM data against satellite imagery, which provides the input for estimates based on a multilevel regression and poststratification model. Second, we fit sigmoid curves to the cumulative length of contributions, and use them to estimate the saturation level for each country. Both techniques may have more general use for assessing the development and saturation of crowdsourced data. Our results show that in many places, researchers and policymakers can rely on the completeness of OSM, or will soon be able to do so. We find (i) that globally, OSM is ∼83% complete, and more than 40% of countries (including several in the developing world) have a fully mapped street network; (ii) that well-governed countries with good Internet access tend to be more complete, and that completeness has a U-shaped relationship with population density, with both sparsely populated areas and dense cities being the best mapped; and (iii) that existing global datasets used by the World Bank undercount roads by more than 30%.

Logistic with up to four jumps: We introduce up to four jumps superposed on the logistic curve, $y(t; t_0, k, y_{\min}, y_{\max}, \ldots)$. Here $i$ ranges from 1 to 4 and the $t_i$ are dates on which the road network length underwent a discontinuous increase. We also allow for 1, 2, or 3 jumps, rather than 4, as separate specifications.
Gompertz: The Gompertz function is another sigmoid curve which allows for asymmetry between the concave and convex regions.
where $t_0$ and $t_1$ are fixed to be the first and last times in the time series.
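To make the sigmoid specifications concrete, the following is a minimal sketch in Python. The functional forms are assumptions: the jumps are taken to enter as additive steps of size `a_i` at dates `t_i`, and the Gompertz curve is written in a common textbook parameterization; neither is confirmed as the exact form used here. The names `logistic`, `logistic_with_jumps`, `gompertz`, and `saturation_level` are hypothetical.

```python
import math

def logistic(t, t0, k, y_min, y_max):
    """Symmetric sigmoid rising from y_min to y_max, with midpoint at t0."""
    return y_min + (y_max - y_min) / (1.0 + math.exp(-k * (t - t0)))

def logistic_with_jumps(t, t0, k, y_min, y_max, jumps=()):
    """Logistic curve plus additive steps.

    ASSUMPTION: each jump is a pair (t_i, a_i) that adds a_i to the curve
    for all t >= t_i. Pass 1 to 4 pairs for the 1- to 4-jump specifications.
    """
    steps = sum(a_i for t_i, a_i in jumps if t >= t_i)
    return logistic(t, t0, k, y_min, y_max) + steps

def gompertz(t, t0, k, y_min, y_max):
    """Asymmetric sigmoid (ASSUMED textbook form of the Gompertz function)."""
    return y_min + (y_max - y_min) * math.exp(-math.exp(-k * (t - t0)))

def saturation_level(y_max, jumps=()):
    """Limit of logistic_with_jumps as t grows: y_max plus all jump sizes."""
    return y_max + sum(a_i for _, a_i in jumps)

# Illustrative (made-up) parameters: two jumps on a curve saturating at 10.
jumps = [(2.0, 3.0), (4.0, 1.5)]
print(logistic_with_jumps(3.0, 0.0, 1.0, 0.0, 10.0, jumps))
print(saturation_level(10.0, jumps))
```

Under this assumed form, the saturation level of a curve with jumps is the logistic asymptote plus the sum of the jump sizes, since every step has been passed once `t` is large.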
The best fit is almost always obtained through the sigmoid functions with jumps. The linear fit produces the lowest mean-square error for just 1 country (Estonia), and the Gompertz function for 19 countries. The remaining countries are fit with a sigmoid function with no jump (2 countries), one jump (41), two jumps (44), three jumps (58), four jumps (84), or one ramp (4).
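The selection step described above, choosing the specification with the lowest mean-square error, can be sketched as follows. This is an illustration under stated assumptions, not the fitting code itself: the candidate curves are supplied already fitted, the data are synthetic, and the names `mean_squared_error` and `select_best` are hypothetical.

```python
import math

def logistic(t, t0, k, y_min, y_max):
    """Symmetric sigmoid rising from y_min to y_max, with midpoint at t0."""
    return y_min + (y_max - y_min) / (1.0 + math.exp(-k * (t - t0)))

def mean_squared_error(model, times, observed):
    """Average squared residual of a fitted curve over the time series."""
    return sum((model(t) - y) ** 2 for t, y in zip(times, observed)) / len(times)

def select_best(candidates, times, observed):
    """Return the name of the lowest-MSE candidate and all the scores."""
    scores = {name: mean_squared_error(f, times, observed)
              for name, f in candidates.items()}
    return min(scores, key=scores.get), scores

# Synthetic series: logistic growth with one step of 20 units at t = 7.
times = [float(t) for t in range(11)]
observed = [logistic(t, 5.0, 1.0, 0.0, 100.0) + (20.0 if t >= 7.0 else 0.0)
            for t in times]

# Hypothetical, already-fitted candidates (parameter estimation is not shown).
candidates = {
    "linear": lambda t: 12.0 * t,
    "logistic, no jump": lambda t: logistic(t, 5.0, 1.0, 0.0, 120.0),
    "logistic, one jump": lambda t: logistic(t, 5.0, 1.0, 0.0, 100.0)
                                    + (20.0 if t >= 7.0 else 0.0),
}

best, scores = select_best(candidates, times, observed)
print(best)
```

In this toy case the one-jump curve reproduces the synthetic data, so it is selected; with real contribution histories each candidate would first need its parameters estimated by a nonlinear least-squares routine.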
Where the model suggests that completeness is greater than one (i.e., the asymptote lies slightly below the maximum observed value), we code completeness as 1.0. In 22 countries, the model estimated completeness to be greater than 1.05. These are: Syrian Arab Republic, Ghana, Afghanistan, Burkina Faso, Malawi, Yemen, Haiti, Montenegro, Swaziland, New Caledonia, Gambia, Timor-Leste, Martinique, Cabo Verde, Brunei Darussalam, Barbados, Virgin Islands (U.S.), Dominica, Saint Vincent and the Grenadines, Kiribati, Liechtenstein, and Holy See. These countries tend to have experienced relatively recent rapid growth in the road network, and thus the parametric fit is likely to be less reliable, which highlights the value of using two independent methods to estimate completeness.
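The coding rule above can be illustrated with a short sketch. It assumes that raw completeness is the ratio of the observed road length to the fitted saturation level, which is implied by the text but not stated as a formula; the function name `coded_completeness` and the example lengths are hypothetical.

```python
def coded_completeness(observed_length, saturation_level, flag_threshold=1.05):
    """Cap completeness at 1.0 and flag fits that overshoot the threshold.

    ASSUMPTION: raw completeness = observed length / fitted asymptote.
    Returns (coded value, whether the raw estimate exceeded flag_threshold).
    """
    raw = observed_length / saturation_level
    return min(raw, 1.0), raw > flag_threshold

# Made-up lengths in km, each against a fitted asymptote of 1000 km.
print(coded_completeness(830.0, 1000.0))   # below saturation
print(coded_completeness(1020.0, 1000.0))  # slightly above: coded as 1.0
print(coded_completeness(1100.0, 1000.0))  # above 1.05: coded as 1.0 and flagged
```

The flag corresponds to the group of countries listed above, for which the parametric estimate should be treated with more caution.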

B Sensitivity check: the lengths of missing edges
We analyzed the length of existing vs. missing edges formally for five randomly selected countries: Great Britain, Malta, Angola, French Guiana, and Djibouti. We compared the edge lengths of ways that existed in the OSM database on January 1, 2016 with those in a version of the OSM database from November 7, 2016. In both cases, we used the osm2po segmenter (osm2po.de) and restricted the database to the OSM tags described in the section "Saturation of contributions" of the main text. In each country, the average edge length became shorter over the ∼10-month period, implying that missing ways also tend to be shorter than existing ones. This makes sense given that longer edges are easier to include when tracing aerial imagery, and are less likely to be overlooked. Specifically, the reduction in average edge length was 1.6% (Great Britain), 0.7% (Malta), 18.1% (Angola), 7.1% (French Guiana), and 11.2% (Djibouti). Note that the reduction in edge length is not solely due to newly added roads being shorter, but also arises when an existing edge is split by a newly added intersecting street.
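The splitting effect noted above can be seen in a small worked example. The sketch below uses made-up edge lengths and a hypothetical helper `percent_reduction`; it is an arithmetic illustration only and does not use OSM data or osm2po.

```python
from statistics import fmean

def percent_reduction(before, after):
    """Percentage fall in mean edge length between two snapshots."""
    return 100.0 * (fmean(before) - fmean(after)) / fmean(before)

# Earlier snapshot: three edges with a mean length of 200 m.
before = [100.0, 200.0, 300.0]

# Later snapshot: a new 200 m street (equal to the earlier mean) crosses the
# 300 m edge at its midpoint, splitting it into two 150 m edges.
after = [100.0, 200.0, 150.0, 150.0, 200.0]

print(percent_reduction(before, after))
```

Here the new street is no shorter than the earlier average, yet the mean edge length still falls from 200 m to 160 m purely because an existing edge was split.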

L Citation
For any use of data or code, cite the original PLOS ONE paper: Barrington-Leigh, Christopher and Millard-Ball, Adam (2017), "The world's user-generated road map is more than 80% complete," [citation to be updated...]

M Contact
For further questions, please contact: Chris Barrington-Leigh, McGill University; Adam Millard-Ball, University of California, Santa Cruz.