The authors have declared that no competing interests exist.
Road accidents are one of the main causes of death around the world and yet, from a time-space perspective, they are a rare event. To help prevent accidents, a metric of how concentrated road accidents are in a city could help determine whether most accidents are confined to a small number of places (hence, the environment plays a leading role) or dispersed over the city as a whole (hence, the driver has the biggest influence).
Here, we apply a new metric, the Rare Event Concentration Coefficient (
In terms of their concentration, about 5% of the road junctions are the site of 50% of the accidents while around 80% of the road junctions expect close to zero accidents. Accidents which occur in regions with a high accident rate can be considered to have a strong component related to the environment and therefore changes, such as a road intervention or a change in the speed limit, might be introduced and their impact measured by changes to the
According to the World Health Organization (with data available at
Broadly speaking, road accidents have three potential causes: firstly, it could have something to do with the
How do we distinguish whether a certain region has an increased probability of being the site of an accident? Clearly, the road geometry, road obstacles and the level of traffic have an impact on the distribution of road accidents, but these tend to remain unchanged for long periods of time and are specific to a certain area, which makes any comparison between different cities, or even between areas of a city, quite complicated.
If, for example, we analyse data and find a specific junction with several accidents, would that be enough to suggest that it is necessary to reduce the speed limit or put in a road intervention scheme? Is there a threshold as to the
Numerous studies have been conducted to identify the spatial patterns of road traffic accidents and develop techniques to identify crash-prone locations using, for instance, Bayesian inference [
There are, however, two technical aspects of heat maps which are often ignored: when we say that a location is “hot”, what are we comparing it with, and to what degree is the observed heat map the result of randomness? Randomness matters for the spatial distribution because every point process, no matter how it is generated and whatever the underlying distribution, will produce some observations that are relatively close to each other. Thus, even random points (where the term ‘random’ is used here for a uniform distribution) might be interpreted as having a “hot region” (
The underlying uniform distribution has the property that every region is expected to contain a number of points proportional to its area. Thus, any apparent concentration observed in the map, and any region with a higher or lower number of points, is only the result of randomness and not the result of a higher probability of observing a point in that region.
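As a minimal sketch of this point (not part of the original analysis), the following Python snippet scatters points uniformly over a unit square and counts them on a regular grid. The function name `simulate_uniform_counts`, the number of points and the grid size are illustrative assumptions.

```python
import random
from collections import Counter

def simulate_uniform_counts(n_points, grid_size, seed=0):
    """Drop n_points uniformly on the unit square and count them per grid cell."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        # rng.random() is in [0, 1), so the indices stay within the grid
        counts[(int(y * grid_size), int(x * grid_size))] += 1
    # Return one count per cell, including the empty cells
    return [counts.get((r, c), 0)
            for r in range(grid_size) for c in range(grid_size)]

# Hypothetical setting: 500 points on a 10 x 10 grid, so 5 points expected per cell
counts = simulate_uniform_counts(n_points=500, grid_size=10)
print("expected per cell:", 500 / 100)
print("min and max observed:", min(counts), max(counts))
```

Every cell has the same expected count, yet the observed minimum and maximum will typically differ noticeably. A heat map of such data would show apparent "hot" and "cold" cells that reflect nothing but chance.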
Although a heat map offers a visual tool for representing road accidents, it can lead to misleading conclusions when the random element in the location of road accidents is not considered. The crucial difference between a point process generated by a uniform distribution and one generated by a different distribution is frequently undetectable by simple visual inspection. A similar situation occurs when a single road is considered: an apparent concentration of accidents will appear, no matter how random or concentrated road accidents are. A formal statistical test against
Road accidents might happen due to a mixture of environmental elements, for example, obstructed visibility, excessive speed of road users, the curvature or quality of the roads, the street lighting and more. These conditions tend to repeat almost identically day after day, and so we expect to observe particular road junctions or segments with a much higher number of accidents than others if the environment is the main cause. However, accidents might also happen because of factors related to the driver or simply because of luck, and interventions aimed at the road rather than the driver would probably not reduce this type of accident. A natural way to detect whether road accidents might be attributed to elements of the road rather than the driver is through their concentration. If an element of the environment increases the risk, then more accidents would occur in that specific location than elsewhere and therefore a high concentration should be observed.
The degree of concentration of events has been shown to play a crucial role in other aspects, such as wealth [
Although crime and road accidents are fundamentally different events, they both share a low frequency, a high degree of concentration and the fact that both are, to a certain extent, unpredictable. Thus, both areas of research can use the tools developed to deal with low-frequency but highly concentrated events.
Statistically speaking, one of the things that make road accidents (as well as crime) hard to analyse is their low frequency. In London, for example, the road junction with the highest number of accidents has (just over) one accident every month, which makes accidents highly unpredictable and statistically hard to deal with. No relevant pattern, in terms of the day of the week or the time of the day of road accidents, can realistically be observed when the frequency of such events is so low. Moreover, since road accidents are low-frequency events, we observe that the majority of road segments (or intersections) suffered no accidents within the time period of the analysis. Hence, the Gini coefficient
The objective of this work is to present the
Two sources of information and two types of analysis are used here to compare the concentration of road accidents. Firstly, data available from the Transport for London (TFL) website (available at
Road accident data has, in general, two issues. A considerable number of non-fatal injury accidents are not reported to the police and are therefore not included in the available data. However, issues of under-reported accidents [
The data from Transport for London contains information on road traffic collisions that involve personal injury occurring on public highways and which have been reported to the police. Data is collected by the police at the scene of an accident or, in some cases, reported by a member of the public at a police station, then processed and passed on to Transport for London. The data, taken between 2005 and 2014, includes 242,782 unique collisions, with
Category | Fatal | Serious | Slight | Total |
---|---|---|---|---|
Frequency | 1,670 | 27,788 | 213,324 | 242,782 |
% | 0.7 | 11.4 | 87.9 | 100 |
To take into account only the most urban parts of the city, only the central area of London is considered here, which accounts for 70% of the road accidents registered by TFL.
The motorway data considered here contains road traffic collisions registered on motorways in Mexico. The data is divided by motorway and records, for each accident registered by the police, the distance from the starting point of the highway. Unfortunately, the data does not record the direction of travel in which the accident occurred.
The motorways analysed have Mexico City as their starting point, connecting the capital of Mexico with five large cities: Cuernavaca, Toluca, Pachuca, Puebla and Querétaro (
Schematic representation of the nine roads which connect Mexico City and the five main cities in its peripheral region.
The length of the motorway and the vehicle flow rate are different for each of the 9 motorways considered. Both of these factors become relevant when it comes to studying road accidents. Longer roads or those with a higher number of vehicles are expected to have more accidents even if the risk for a driver is the same as on a shorter or less used road. Therefore, the flow, measured in
Taking into account the length of the road and the number of cars using it allows different roads to be compared. For instance, in
Destination | Federal Road length (km) | Federal Road flow | Federal Road accidents | Federal Road victims | Federal Road fatal | Toll Road length (km) | Toll Road flow | Toll Road accidents | Toll Road victims | Toll Road fatal |
---|---|---|---|---|---|---|---|---|---|---|
Cuernavaca | 60.5 | 467.9 | 105 | 159 | 22 | 70.7 | 1204.7 | 106 | 128 | 23 |
Toluca | 55 | 764.5 | 117 | 90 | 20 | 55 | 1589.4 | 46 | 45 | 15 |
Pachuca | 62.5 | 1761.3 | 162 | 161 | 29 | 62.5 | 1083.3 | 62 | 99 | 20 |
Puebla | 121 | 2736.3 | 49 | 63 | 13 | 121 | 2725.7 | 162 | 309 | 46 |
Querétaro | — | — | — | — | — | 164 | 1548.8 | 293 | 362 | 64 |
The accident risk (number of accidents per vehicle-kilometre travelled) and the lethality of the accidents vary considerably between roads. The road with the highest accident risk (the Federal Road between Mexico City and Cuernavaca) is 12.5 times more prone to accidents and 9.9 times more likely to have lethal accidents than the safest road (the Federal Road between Mexico City and Puebla).
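The ranking of roads by accident risk can be sketched as follows. This is an illustration only: it assumes that exposure is proportional to length multiplied by flow, and it uses the length, flow and accident columns of the table above as they stand. The flow units are not restated here, so only the ordering of the roads is meaningful in this sketch, not the absolute values. The names `roads` and `accident_risk` are hypothetical.

```python
# (length, flow, accidents) per road, copied from the table above
roads = {
    "Federal Cuernavaca": (60.5, 467.9, 105),
    "Federal Toluca": (55.0, 764.5, 117),
    "Federal Pachuca": (62.5, 1761.3, 162),
    "Federal Puebla": (121.0, 2736.3, 49),
    "Toll Cuernavaca": (70.7, 1204.7, 106),
    "Toll Toluca": (55.0, 1589.4, 46),
    "Toll Pachuca": (62.5, 1083.3, 62),
    "Toll Puebla": (121.0, 2725.7, 162),
    "Toll Queretaro": (164.0, 1548.8, 293),
}

def accident_risk(length, flow, accidents):
    """Accidents per unit of exposure, with exposure assumed to be length * flow."""
    return accidents / (length * flow)

risk = {name: accident_risk(*values) for name, values in roads.items()}
highest = max(risk, key=risk.get)
lowest = min(risk, key=risk.get)
print("highest risk:", highest)
print("lowest risk:", lowest)
```

Under this assumption, the road with the highest risk is the Federal Road to Cuernavaca and the one with the lowest risk is the Federal Road to Puebla, which agrees with the ordering described in the text.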
It is important to determine when two accidents have occurred at the same location. Different levels of data aggregation have been used in previous studies, from countries, provinces, counties, road segments, a point pattern process, road junctions and segments of a road with various lengths [
The hypothesis that road accidents are homogeneously distributed (known as Complete Spatial Randomness or CSR) is easily rejected [
In the case of the urban space, we tessellate the region of analysis, that is, the city is divided into nearly 30,000 non-overlapping, regular hexagons, and the number of accidents within each hexagon is counted. A hexagonal tessellation is frequently used in cartography since it offers advantages in terms of the visualisation [
In the case of the motorway data in Mexico, we divide the highway into non-overlapping segments of 500 metres and count the number of accidents within each segment. Due to the precision of the data, smaller segments do not group accidents correctly and larger segments are not refined enough to identify a specific location of a highway. Also, 500 metres has been frequently used in other studies when a highway is partitioned [
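A minimal sketch of this segmentation, assuming each accident is given as a distance in kilometres from the starting point of the highway. The function name `count_per_segment` and the example distances are hypothetical.

```python
import math

def count_per_segment(distances_km, road_length_km, segment_km=0.5):
    """Count accidents in consecutive, non-overlapping segments of a road."""
    n_segments = math.ceil(road_length_km / segment_km)
    counts = [0] * n_segments
    for d in distances_km:
        if not 0 <= d <= road_length_km:
            raise ValueError(f"distance {d} is outside the road")
        # An accident exactly at the end of the road falls in the last segment
        index = min(int(d // segment_km), n_segments - 1)
        counts[index] += 1
    return counts

# Hypothetical accident locations on a 121 km road
counts = count_per_segment([0.1, 0.4, 0.7, 120.9, 121.0], road_length_km=121.0)
print(len(counts), counts[0], counts[1], counts[-1])
```

With 500-metre segments, a 121-kilometre road is split into 242 units, most of which will contain no accidents at all.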
Although using either a tessellation (for the urban data) or a segmentation of the road (for the motorway data) has its disadvantages (such as a potential autocorrelation of the number of accidents), it does allow a region to be clearly identified, nearby accidents to be clustered and different levels of refinement to be considered. Partitioning the space in this way transforms the data from a continuous measurement of the location of each road accident into a non-negative discrete variable, which is easier to analyse.
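For the urban case, assigning an accident to a hexagonal tile can be sketched as below. The sketch assumes planar coordinates in metres, hexagons with sides of 40 metres in a "pointy-top" orientation, and axial indices `(q, r)` obtained by cube rounding. The orientation, the indexing scheme and all function names are assumptions made for illustration, not a description of the procedure actually used.

```python
import math
from collections import Counter

def hex_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon (cube rounding)."""
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the coordinate with the largest rounding error so that x + y + z = 0
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz

def point_to_hex(x, y, side=40.0):
    """Axial (q, r) index of the pointy-top hexagon that contains the point (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / side
    r = (2 / 3 * y) / side
    return hex_round(q, r)

def hex_centre(q, r, side=40.0):
    """Planar coordinates of the centre of hexagon (q, r)."""
    return side * math.sqrt(3) * (q + r / 2), side * 1.5 * r

# Hypothetical accident coordinates in metres
points = [(0.0, 0.0), (5.0, -3.0), (70.0, 0.0), (72.0, 4.0), (300.0, 200.0)]
tile_counts = Counter(point_to_hex(x, y) for x, y in points)
print(tile_counts)
```

Tiles that never appear in `tile_counts` have a count of zero, so the full list of tiles covering the region of analysis has to be kept separately if the zeros are to be included.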
Partitioning of Central London into 29,600 hexagonal tiles, with sides of 40 metres, and the count of accidents between 2005 and 2014.
The number of accidents within each motorway segment or within each hexagonal pixel, during a certain period of time (two years in Mexico and 10 years for the London data), might be equal to zero for obvious reasons (for example, for tiles which overlay a river or a park) or might be much higher in regions with a higher volume of traffic [
Using a Poisson distribution for the number of road accidents observed on each segment has conceptual advantages. Firstly, the expected number of road accidents on a segment is given simply by its rate λ
In the case of the urban setting, two neighbouring tiles might have similar rates, especially if the same road goes through both of them. In the case of the analysis of motorways, two neighbouring segments might also have similar rates if they experience accidents due to similar causes. Although in our context there is a clear spatial structure that is highly relevant to the problem, we focus on the rates in each of the tiles, and we simply assume that each tile has a fixed accident rate.
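To illustrate why a fixed rate per tile is not the same as the observed count, the sketch below evaluates the Poisson probability of observing no accidents for a few hypothetical rates. The function name `poisson_pmf` and the example rates are assumptions chosen for illustration.

```python
import math

def poisson_pmf(k, rate):
    """Probability of observing exactly k events under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

# Hypothetical accident rates for four tiles over the period of analysis
for rate in [0.0, 0.5, 2.0, 10.0]:
    print(f"rate {rate:>4}: P(no accidents) = {poisson_pmf(0, rate):.4f}")
```

A tile with a small but positive rate will often record zero accidents, so an observed zero does not by itself imply a rate of zero.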
With this approach, we move away from the observed count data for road accidents into the analysis of the rates, λ
To model the inhomogeneous distribution of accident rates, we assume that the
The distribution of the rates
The procedure of considering a discrete set of observations, assuming they suffer different rates and then measuring the concentration using the
A procedure to obtain a confidence interval for the observed
The Lorenz curve [
A: The accident rate (
The degree to which road accidents are spatially concentrated is surprisingly high. In Central London, 32% of the accidents happen in only 2.4% of the road junctions, and the concentration is even higher if we focus only on the Serious and Fatal categories.
Category | Fatal | Serious | Slight | Total |
---|---|---|---|---|
Concentration coefficient | 0.8712 | 0.8198 | 0.8057 | 0.8055 |
A value of the
Considering only the Serious and Fatal road accidents, the mixture model indicates that around 64% of the tiles have a rate equal to zero. There are, on the other hand, a few tiles (roughly 0.4% of the surface or 109 road junctions) which have an accident rate of almost 11 over the ten years of data. This means that, in each of the 109 tiles in this small region, we expect roughly one serious or fatal accident every year.
Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
size (%) | 64.2 | 22.7 | 2.1 | 1.9 | 1.7 | 1.5 | 1.3 | 1.2 | 1.0 | 1.0 | 1.0 | 0.4 |
rate | 0.000 | 0.488 | 0.823 | 1.159 | 1.517 | 1.906 | 2.337 | 2.839 | 3.466 | 4.359 | 5.860 | 10.950 |
Serious and Fatal road accidents have a surprisingly high degree of concentration. The mixture model indicates that nearly half of these accidents happen in less than 5% of the tiles considered. However, another relevant feature is that nearly 25% of the Serious and Fatal road accidents occur in tiles in which we expect only one accident every twenty years. Accidents which occur at road junctions with such a small rate can perhaps not be attributed to the road itself and probably occurred due to causes related to the driver (such as alcohol consumption or driving when fatigued).
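These shares can be checked directly from the group sizes and rates of the mixture model reported in the table above. The sketch below weights each rate by its group size to obtain the share of expected accidents per group. It also computes a Gini-type coefficient of the rate distribution from its Lorenz curve. Treating that Gini-type value as a summary of concentration is an assumption of this sketch, and the result need not coincide with any coefficient reported in the text. All names are hypothetical.

```python
# Group sizes (% of tiles) and rates, copied from the mixture model table
sizes = [64.2, 22.7, 2.1, 1.9, 1.7, 1.5, 1.3, 1.2, 1.0, 1.0, 1.0, 0.4]
rates = [0.000, 0.488, 0.823, 1.159, 1.517, 1.906,
         2.337, 2.839, 3.466, 4.359, 5.860, 10.950]

# Expected accidents contributed by each group, as a share of the total
weights = [s * r for s, r in zip(sizes, rates)]
shares = [w / sum(weights) for w in weights]

def top_share(k):
    """Share of tiles and share of expected accidents in the k highest-rate groups."""
    return sum(sizes[-k:]) / sum(sizes), sum(shares[-k:])

def gini_from_mixture(sizes, rates):
    """Gini-type coefficient of a discrete rate distribution via its Lorenz curve."""
    pairs = sorted(zip(rates, sizes))            # groups in increasing order of rate
    population = sum(s for _, s in pairs)
    mass = sum(r * s for r, s in pairs)
    area, cumulative = 0.0, 0.0
    for r, s in pairs:
        new_cumulative = cumulative + r * s / mass
        area += (s / population) * (cumulative + new_cumulative) / 2  # trapezoid
        cumulative = new_cumulative
    return 1 - 2 * area

tiles, accidents = top_share(5)
print(f"top {tiles:.1%} of tiles hold {accidents:.1%} of expected accidents")
print(f"low-rate group (rate 0.488) holds {shares[1]:.1%} of expected accidents")
print(f"Gini-type coefficient of the rates: {gini_from_mixture(sizes, rates):.3f}")
```

The five highest-rate groups cover 4.6% of the tiles, which corresponds to the "less than 5%" figure, while the group with a rate of 0.488 corresponds to tiles with roughly one expected accident every twenty years.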
The
Year | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 |
---|---|---|---|---|---|---|---|---|---|---|
0.813 | 0.825 | 0.824 | 0.825 | 0.826 | 0.831 | 0.831 | 0.828 | 0.828 | 0.821 |
Tiles with the highest rates in London have specific environmental factors which contribute to creating more dangerous roads. For instance, certain Underground stations which are transportation hubs, with a large number of pedestrians, are among the tiles with the highest rates in the city, such as Elephant and Castle, Hyde Park Corner and Camden Town. Some roads with a high flow also have a consistently high accident rate, such as Euston Road and Kingsland Road (the A10, which is a main arterial road). Finally, major commercial streets, such as Oxford Street, are also among the locations with the highest accident rates.
For the Mexican motorway data, comparing the distribution of the accident rates in the nine highways separately reveals that each road has a different pattern. In the case of the Federal Road between Mexico City and Puebla, the
Road accidents are rare events and there is a need to use adequate tools to deal with them. The Federal Road between Mexico City and Puebla has the lowest possible degree of concentration, but it is only when we look at the
Misleading interpretations might also be obtained from the Gini coefficient computed directly from the number of road accidents. For instance, the Federal Road between Mexico City and Puebla gives a value of
Accidents have a low frequency and so, in the case of the Federal Road between Mexico City and Puebla, we are considering only 49 accidents distributed along 242 units of 500 metres (121 kilometres of road). Because of this low frequency, at least 79.7% of the observations are equal to zero. In general, a low frequency of events (a high count of observations with zero events) increases the Gini coefficient: the share of events for a great part of the population is zero, which implies more inequality in their distribution. However, by taking into account the distribution of the rates of road accidents on the Federal Road between Mexico City and Puebla, and not just the number of road accidents, the results show that almost every segment of that road has the same accident rate and there is practically no concentration of accidents along that road.
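The effect of the zeros on the Gini coefficient can be illustrated with a short sketch. It computes the Gini coefficient of the counts for the most even allocation possible (49 segments with one accident each and 193 segments with none) and for one simulated allocation in which every segment is equally likely to receive each accident. The function name `gini`, the simulation and the seed are illustrative assumptions.

```python
import random

def gini(values):
    """Gini coefficient of non-negative values (mean absolute difference form)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    # Equivalent to sum_i sum_j |x_i - x_j| / (2 * n^2 * mean)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)

N_SEGMENTS, N_ACCIDENTS = 242, 49

# Most even allocation possible: no segment has more than one accident
even = [1] * N_ACCIDENTS + [0] * (N_SEGMENTS - N_ACCIDENTS)

# One simulated allocation in which all segments share the same rate
rng = random.Random(1)
simulated = [0] * N_SEGMENTS
for _ in range(N_ACCIDENTS):
    simulated[rng.randrange(N_SEGMENTS)] += 1

print("Gini, most even allocation:", round(gini(even), 4))  # 193/242
print("Gini, simulated uniform allocation:", round(gini(simulated), 4))
```

Even in the most even case the coefficient equals 193/242, close to 0.80, and any other allocation of the 49 accidents gives a value at least as high. A Gini coefficient computed on the counts therefore cannot distinguish a road with identical rates from a concentrated one.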
Another consequence of the low frequency of accidents is that the Gini coefficient computed directly from the number of accidents tends to give similar results between different roads, with small or negligible differences between them and, in the worst case scenario, with the wrong results and interpretation [
Other roads also have a certain degree of uniformity with regards to their accidents. The Federal Road between Mexico City and Pachuca, for instance, has
The nine roads in Mexico have a different rate distribution of their accidents (
Computing the
A: The accident rates (
Environmental factors that contribute to the chance of having an accident can be identified in the road segments with high accident rates. For instance, among the highest rate segments (
The accident rates along these two higher-rate sections have been identified using the CAMAN procedure and the
In total, these two segments (one on the Federal Road to Pachuca and the other on the Toll Road to Cuernavaca), which make up less than 0.7% of the 772.3 kilometres of roads considered, account for 4% of the road accidents.
In addition, the different values of the
A typical approach to determine the concentration/dispersion of a variable (for example, using the Gini coefficient) fails to work as a measure of the concentration of road accidents due to their low frequency and their high level of spatial concentration. The methodology presented here, considering the distribution of the rates and the
Results for the urban environment show that road accidents are highly concentrated, especially those that fall into the Serious and Fatal category. This result could be useful to policy makers: by focusing their resources on less than 5% of the road junctions, they would cover the regions where nearly half of that type of accident occurs.
Results for the motorway environment show a much lower degree of concentration. In the case of the Federal Road between Mexico City and Puebla, road accidents are distributed almost uniformly along the road, meaning that, statistically speaking, they have the smallest possible concentration. Also, the procedure introduced here, including the use of the
For a city planner, a quantitative tool such as the
The ability to identify regions of a road or of a city which have environmental factors that increase the risk of an accident enables infrastructure to be re-designed accordingly. For instance, in the case of London, these results might be used to justify the plans to transform Oxford Street, one of London’s roads with the highest accident rate and with the highest number of road fatalities, into a pedestrian street. More information about the transformation of Oxford Street into a pedestrian road is available at
The methodology presented here could be easily applied to other types of accidents by adjusting the parameters. For example, the tiling procedure could help a risk manager to identify whether there are regions in an industrial complex with an increased accident rate, and the
The original data is available at the
The original data is available at the
The results of the estimated rates are available on a public repository
The authors acknowledge the anonymous reviewers for their valuable suggestions.