The scaling of crime concentration in cities

Crime is a major threat to society's well-being but lacks a statistical characterization that could lead to uncovering some of its underlying mechanisms. Evidence of nonlinear scaling of urban indicators in cities, such as wages and serious crime, has motivated the understanding of cities as complex systems, a perspective that offers insights into resource limits and sustainability but usually neglects details of the indicators themselves. Notably, since the nineteenth century, criminal activities have been known to occur unevenly within a city; crime concentrates in such a way that most offenses take place in a few regions of the city. Though confirmed by different studies, this concentration lacks broad analyses of its characteristics, which hinders not only the comprehension of crime dynamics but also the proposal of sound countermeasures. Here, we developed a framework to characterize crime concentration which divides cities into regions with the same population size. We used disaggregated criminal data from 25 locations in the U.S. and the U.K., spanning from 2 to 15 years of longitudinal data. Our results confirmed that crime concentrates regardless of city and revealed that the level of concentration does not scale with city size. We found that the distribution of crime in a city can be approximated by a power-law distribution with exponent α that depends on the type of crime. In particular, our results showed that thefts tend to concentrate more than robberies, and robberies more than burglaries. Though criminal activities present regularities of concentration, we found that criminal ranks have the tendency to change continuously over time. These features support the perspective of crime as a complex system and demand analyses and evolving urban policies covering the city as a whole.


Criminal events
We used data sets of criminal occurrences at a disaggregated level that contain the longitude and latitude of each offense. We obtained these data from 19 cities in the United States and 6 police forces (constabularies) in the United Kingdom. For the U.K., the data were acquired from the UK Government website 1, which makes available monthly crime data, published by the UK Home Office, going back to 2010. The data on crime in U.S. cities were retrieved from the websites of the respective police offices of each considered city, as described in Table 1.
Although the aforementioned data sets have their own particularities, each criminal event in any of them is characterized by the following:
• type - the category of the criminal event;
• address - the address where the crime occurred;
• location - the latitude and longitude where the crime occurred.

Table 4: The terms grouped for theft in each considered data set from the U.S.

City | Terms
Seattle | 'THEFT-AUTO PARTS', 'THEFT-PRSNATCH', 'THEFT-OTH', 'THEFT-CARPROWL', 'THEFT-BUILDING', 'THEFT-PKPOCKET', 'THEFT-BICYCLE', 'THEFT-AUTOACC

Police forces in the U.K. employ the same terms for crime regardless of region. Therefore, we grouped the categories of offenses in the U.K. the same way for all considered regions, as described in Table 5.

Geospatial information
In the case of the U.S. cities, we obtained the boundaries of the U.S. states from the U.S. Census Bureau 3. We used the TIGER (Topologically Integrated Geographic Encoding and Referencing) shapefiles at block granularity (as delimited in 2010). To keep only the regions of the cities considered in the study, we clipped each shapefile with the bounding box of each city. The bounding boxes were retrieved from the OpenStreetMap initiative 4. For the U.K. data, we gathered the boundaries of the jurisdiction of each police force (from December 2011) 5, then clipped them with the boundaries (super generalized clipped boundaries in England and Wales) of the Lower Layer Super Output Areas (LSOAs) 6.
In order to carry out spatial analyses, each crime data set has to be in the same projection as its respective boundaries. Since most of the crime data sets and boundaries share the same spatial reference (EPSG:4326), reprojection was needed for only a few data sets, as described in Table 6.

Population
We gathered the total resident population in the smallest spatial units available for the considered locations from official censuses. In the case of the U.S. cities, we used the 2010 census data from the U.S. Census Bureau 7, which provides the total population (P1) at block level. For the locations in the U.K., our analyses were done with the 2011 census data at LSOA level provided by the Office for National Statistics 8.

Splitting cities
To split a city into $k$ regions with the same population size, we first create a graph based on the spatial and census data in which each node represents the same amount of population. Then, we partition the graph into parts with the same number of nodes, so that the total population in each region is also the same. The graph is constructed from the cells of the Voronoi diagrams derived from random coordinates generated uniformly within each shape in the city. The number of points created in each shape $s_i$ is proportional to its resident population $p_i$. Pseudocode 1 describes the main steps we employ to split locations. The core of our procedure consists mainly of three functions:
• GenerateRandomCoordinatesOnShape(n=number of coordinates, s=shape) - This function returns a list containing n coordinates that are generated uniformly at random within a given shape s.
• ClippedVoronoi(c=list of coordinates, s=shape) - This function returns a list with the shapes of the cells of the Voronoi diagram derived from the coordinates c in s, after postprocessing the cells by clipping them with the shape s. For our experiments, in order to create the diagrams, we employed the library qhull 9, which is wrapped in the pyhull library 10.
• Partition(g=graph, k=number of partitions) - This function splits a graph into k parts of the same size while minimizing the number of edges between nodes of different parts, then returns a list $d$ whose element $d_i$ is the index $j \in [1, k]$ of the partition of node $n_i$. In this work, we used the KaFFPa (Karlsruhe Fast Flow Partitioner) algorithm to partition the graphs [1].
Finally, we group the shapes $s$ from the output of SplitLocation based on their partition index $j \in [1, k]$. We define each of these $k$ groups as a region $r_j$ in the city. For the purpose of our work here, we say that a criminal event occurred in region $r_j$ if the offense took place on a shape that belongs to the group of shapes of region $r_j$.
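The point-generation step of the splitting procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: `point_in_polygon`, `random_points_in_polygon`, and `points_per_shape` are hypothetical names, shapes are assumed to be simple polygons given as vertex lists, and the rule of one point per `people_per_point` residents is an assumed stand-in for the granularity parameter of Pseudocode 1. The Voronoi construction (qhull) and the graph partitioning (KaFFPa) are not reproduced.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the simple polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_points_in_polygon(n, polygon, rng):
    """Draw n points uniformly inside the polygon by rejection sampling."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < n:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points

def points_per_shape(populations, people_per_point):
    """Points per shape, proportional to resident population (assumed rule)."""
    return [round(p / people_per_point) for p in populations]

# Toy input: two shapes with made-up populations.
rng = random.Random(42)
shapes = [
    [(0, 0), (2, 0), (2, 1), (0, 1)],  # rectangle
    [(2, 0), (4, 0), (3, 2)],          # triangle
]
populations = [300, 100]
counts = points_per_shape(populations, people_per_point=10)
seeds = [random_points_in_polygon(n, s, rng) for n, s in zip(counts, shapes)]
```

Each generated point would then seed one Voronoi cell, so that every cell, and hence every graph node, stands for roughly the same number of residents.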

The number of splits and the analysis of crime
To analyze crime in a given city $c$ with the method described in Section 2, we have to choose the number of regions $R$ into which the city will be divided. This value has to be chosen in such a way that the aggregation level leads to units that represent the place. The analysis of crime at the wrong geographic unit may lead to an incorrect understanding of crime dynamics [2]. This problem can arise when examining larger spatial levels, which might hide lower-order variability (i.e., the averaging problem), a 150-year-old observation documented by Glyde [3]. A focus on micro levels, on the other hand, might obscure the importance and impact of larger community and neighborhood effects [2].
In order to find a suitable split for the cities, we divided each city into an increasing number of regions and analyzed the number of regions without any crime. We found that the number of regions with at least one crime, $R_{n \geq 1}$, increases with the total number of regions $R$ until it saturates at a certain value $u$, such that $R_{n \geq 1}(r) = u$ for $r \geq r_u$ (shown in Figure 1). In other words, dividing a city into more regions beyond this point does not change the number of regions with crime. A plausible reason is the accuracy level used in police departments when an offense is registered in the criminal system. In an ideal crime data set, the number of regions with crime would increase steadily with the number of regions until the number of regions equals the number of offenses (i.e., each criminal event has its own region). However, such ideal data are unlikely to exist due to the very nature of the data. Since we are working with spatial data, limitations in the apparatus used to record events, such as coordinate look-up in GIS systems or GPS receivers, may introduce inaccuracies. For instance, an office could use a GIS system that provides the same coordinates for a location regardless of where this location is on a street. Moreover, criminal data contain sensitive information about criminals, victims, and ongoing investigations, so police offices might intentionally decrease the granularity of the data for privacy purposes.

Pseudocode 1: Given a location L composed of different shapes b_i and a real-valued quantity p_i about each b_i, split L into k regions such that the sums of the quantities in the regions are roughly equal.
    input: list of shapes b; list of real numbers p; number of regions k
    output: list of shapes s; list of integers d
    parameter: granularity level r (default = 1.0)
    ...
    Create a graph G with |s| disconnected nodes
    foreach node n_j in G do
        foreach node n_k in G do
            if s_j is spatially adjacent to s_k then
                Create an edge between n_j and n_k
Nevertheless, the procedures that each office carries out to record a criminal event lead to different levels of accuracy in the data sets. Analyses using geographic units smaller than the limits found in each city can be biased by the arbitrary recording system of each office. To avoid biasing our results with such procedures, we set $R_c = \rho r_u$ with $\rho = 0.9$. The vertical lines in Figure 1 mark the values of $R_c$ for the different types of crime. Some statistics of crime for each location after splitting are described in Table 7.
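The choice of $R_c$ can be illustrated with a short sketch. It assumes that the curve of regions with at least one crime is available as two parallel lists, that the last measured value lies on the plateau, and that the plateau is exact (no tolerance); the function names, the rounding of $\rho r_u$ to an integer, and the toy numbers are assumptions made for illustration only.

```python
def saturation_point(num_regions, regions_with_crime):
    """Return (r_u, u): the smallest number of regions from which the
    count of regions with at least one crime stays at its final value u."""
    u = regions_with_crime[-1]
    r_u = num_regions[-1]
    # Walk backwards while the curve stays on the plateau.
    for r, value in zip(reversed(num_regions), reversed(regions_with_crime)):
        if value != u:
            break
        r_u = r
    return r_u, u

def chosen_number_of_regions(r_u, rho=0.9):
    """R_c = rho * r_u, rounded to an integer (the rounding is an assumption)."""
    return round(rho * r_u)

# Toy curve: made-up numbers, not taken from Figure 1.
num_regions = [100, 200, 400, 800, 1600, 3200]
regions_with_crime = [100, 198, 380, 650, 650, 650]
r_u, u = saturation_point(num_regions, regions_with_crime)
R_c = chosen_number_of_regions(r_u)
```

For this toy curve the plateau starts at 800 regions with 650 of them containing crime, so the sketch selects 720 regions.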

Arrangements
Each city $c$ can be divided into $R_c$ regions in different ways, or arrangements. In fact, the numbers in Table 7 relate to one particular arrangement of divisions in each city. Since we want to analyze crime in a city and not in an arbitrary arrangement of divisions, we create distinct arrangements of $R_c$ regions for each city $c$. For that, we employ a stochastic partitioning algorithm within the method described in Section 2 for splitting the cities. We use the KaFFPa (Karlsruhe Fast Flow Partitioner) algorithm to partition a city 30 times using different seeds for the random number generator [1]. Hence, for each city $c$ we first generated 30 arrangements, each comprising $R_c$ same-population divisions of the city, and then, for each arrangement, aggregated the occurrences of crime by type of crime, such as theft, burglary, and robbery.

Model
We followed the procedures described by Clauset et al. to select the models for the distributions of crime in the considered locations [4]. For each empirical distribution of crime in an arrangement of a city, we follow these steps: (1) we estimate $x_{\min}$ and $\alpha$ of the power law; (2) we calculate the goodness-of-fit of the power-law model; (3) we fit the data with the following distributions: truncated power law (TP), lognormal (LN), exponential (EX), and stretched exponential (SE); and (4) we compare the power-law model with the other models using the likelihood ratio test. To estimate the parameters of the power law, we employed the methods described by Clauset et al., which are implemented in the Python library powerlaw [4,5]. In step (4), we do not trust the result of the test when the associated p-value is greater than 0.1. In step (2), we reject the power-law model when the estimated p-value is less than 0.1. Since we have different arrangements for each city for a given type of crime, we define the score of the power law as the fraction of arrangements in which the power-law model was not rejected for a city-crime pair (see Table 8). Furthermore, for each city, we rule out the power law as a description of the distribution of crime if the score is less than 0.9. In our experiment, this happens when the power-law model is not rejected in fewer than 27 of the 30 arrangements. Our results showed that the score requirement was not attained in only 1 location for thefts, 3 locations for robberies, and 1 location for burglaries, out of the 25 considered locations (see the bold scores in Table 8).
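Step (1) can be illustrated with the continuous maximum-likelihood estimator given by Clauset et al. for a fixed $x_{\min}$, $\hat{\alpha} = 1 + n \left[\sum_i \ln(x_i / x_{\min})\right]^{-1}$. The sketch below is only an approximation of what the study does: crime counts are discrete, and the powerlaw library also selects $x_{\min}$, which is not reproduced here. The function names and the synthetic sample are assumptions.

```python
import math
import random

def alpha_mle(data, x_min):
    """Continuous power-law MLE for a fixed x_min:
    alpha = 1 + n / sum(ln(x / x_min)) over the tail x >= x_min."""
    tail = [x for x in data if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum

def sample_power_law(n, alpha, x_min, rng):
    """Inverse-transform sampling from a continuous power law."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]

# Synthetic check: recover a known exponent from simulated data.
rng = random.Random(7)
sample = sample_power_law(20000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat = alpha_mle(sample, x_min=1.0)
```

With 20,000 synthetic points the estimate should land close to the true exponent of 2.5; the standard error of this estimator is approximately $(\alpha - 1)/\sqrt{n}$.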
For each set of arrangements, we count the number of times each alternative distribution is (or is not) statistically favored over the power law. Table 8 summarizes these counts, in which $l$ denotes the number of times the alternative is favored, while $w$ denotes the number of times the power law is favored. Note that, for each alternative model, type of crime, and city, the sum $w + l$ is not necessarily equal to the number of arrangements $n_a$. In fact, $n_a - (w + l)$ is the number of times the results of the likelihood ratio test cannot be trusted due to a large p-value [4]. Here we consider that an alternative is favored over the power law in a city if $l/n_a > 2/3$ (i.e., if the alternative is favored more than 2/3 of the time). In our case, with $n_a = 30$, this requirement means $l \geq 21$. Still, if more than one alternative is favored over the power law in the city, we say that the one with the highest $l$ is favored over the others. As described in Table 8, the truncated power law was favored over the power law in 4 locations for thefts, 5 locations for robberies, and 2 locations for burglaries. Table 9 summarizes the estimated parameters for each city-crime pair, and Figures 2-4 depict the complementary cumulative distribution function (CCDF) of the distributions with the estimated parameters along with the empirical CCDF for each city.
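The two decision rules (the power-law score with its 0.9 threshold, and the 2/3 rule for alternatives) can be written compactly. This is a sketch with hypothetical function names and made-up inputs; the p-values and likelihood ratio outcomes themselves would come from the fitting procedure.

```python
def power_law_score(gof_p_values, threshold=0.1):
    """Fraction of arrangements in which the power law is not rejected,
    i.e. the goodness-of-fit p-value is at least the threshold."""
    kept = sum(1 for p in gof_p_values if p >= threshold)
    return kept / len(gof_p_values)

def favored_alternative(losses, n_arrangements):
    """`losses` maps each alternative model to l, the number of arrangements
    in which it is favored over the power law. Return the alternative with
    the highest l among those with l / n_a > 2/3, or None if there is none."""
    # Integer form of l / n_a > 2/3 avoids floating-point edge cases.
    candidates = {m: l for m, l in losses.items() if 3 * l > 2 * n_arrangements}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

# Made-up example for one city-crime pair with 30 arrangements.
p_values = [0.5] * 27 + [0.05] * 3
score = power_law_score(p_values)
power_law_ruled_out = score < 0.9
best = favored_alternative({"TP": 22, "LN": 21, "EX": 3, "SE": 20}, 30)
```

In this example the power law is kept (27 of 30 arrangements pass), and the truncated power law is the favored alternative because it has the highest count among those above 20.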
As highlighted in the main text, the exponents of the distributions of burglary and robbery tend to be high, which means that criminal events of these types tend to concentrate less. The high-valued exponents must be taken with caution due to the behavior of the concentration as a function of $\alpha$ (Figure 5): the concentration rapidly vanishes as $\alpha$ increases and approaches levels similar to those of an exponential distribution, that is, there is almost no concentration.
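To see why concentration fades as $\alpha$ grows, one can use a standard property of the continuous power law with $\alpha > 2$: the top fraction $p$ of units holds a share $p^{(\alpha - 2)/(\alpha - 1)}$ of the total. The sketch below uses this measure purely for illustration; it is an assumption and not necessarily the concentration measure plotted in Figure 5.

```python
def top_share(p, alpha):
    """Share of the total held by the top fraction p of units when sizes
    follow a continuous power law with exponent alpha > 2."""
    if alpha <= 2:
        raise ValueError("the mean diverges for alpha <= 2")
    return p ** ((alpha - 2.0) / (alpha - 1.0))

# Share of events in the top 20% of regions for increasing exponents.
shares = {alpha: top_share(0.2, alpha) for alpha in (2.2, 3.0, 5.0, 10.0)}
```

Under this measure the share held by the top 20% falls from roughly three quarters at $\alpha = 2.2$ toward the no-concentration value of 20% as $\alpha$ increases.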

Independence test for alpha vs population
To evaluate the relationship between the concentration of crime in a city and the population size of this city, we use a test of independence proposed by Hoeffding [6,7]. The null hypothesis here is that two random variables $X$ and $Y$ are independent, that is:
$$H_0: F_{X,Y}(x, y) = F_X(x)\,F_Y(y) \quad \text{for all } (x, y),$$
where $F_{X,Y}(x, y)$ is the joint distribution of $X$ and $Y$, and $F_X(x)$ and $F_Y(y)$ are their respective marginal distributions. For the alternative that $X$ and $Y$ are dependent, we reject $H_0$ at the $\alpha$ level of significance if $D \geq d_\alpha$, where $D$ is the Hoeffding test statistic and $d_\alpha$ satisfies $P(D \geq d_\alpha) = \alpha$. Since we deal with a small sample size, we use the HoeffD function from the NSM3 library in R developed by [6].
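For readers who want to see how the statistic is built, the following sketch computes Hoeffding's $D$ for paired samples without ties, using the unscaled form whose maximum is 1/30. It is an illustrative re-implementation with a hypothetical function name; whether its scaling and tie handling match the HoeffD function exactly is an assumption, and the sketch does not compute critical values $d_\alpha$.

```python
def hoeffding_d(xs, ys):
    """Hoeffding's D for paired samples without ties (requires n >= 5)."""
    n = len(xs)
    if n != len(ys) or n < 5:
        raise ValueError("need two samples of equal length n >= 5")
    # Ranks (1-based); valid only when there are no ties.
    rx = [sum(1 for v in xs if v <= x) for x in xs]
    ry = [sum(1 for v in ys if v <= y) for y in ys]
    # c[i]: number of pairs (x_j, y_j) with x_j < x_i and y_j < y_i.
    c = [sum(1 for xj, yj in zip(xs, ys) if xj < xi and yj < yi)
         for xi, yi in zip(xs, ys)]
    q = sum((r - 1) * (r - 2) * (s - 1) * (s - 2) for r, s in zip(rx, ry))
    r_term = sum((r - 2) * (s - 2) * ci for r, s, ci in zip(rx, ry, c))
    s_term = sum(ci * (ci - 1) for ci in c)
    numerator = q - 2 * (n - 2) * r_term + (n - 2) * (n - 3) * s_term
    return numerator / (n * (n - 1) * (n - 2) * (n - 3) * (n - 4))

# Perfectly monotone data attain the maximum value 1/30.
xs = [1, 2, 3, 4, 5, 6, 7]
d_increasing = hoeffding_d(xs, [x ** 2 for x in xs])
d_decreasing = hoeffding_d(xs, [-x for x in xs])
```

Both the increasing and the decreasing example give $D = 1/30$, since the statistic measures dependence of any direction; values near zero, like those obtained for the exponents and population sizes, give no evidence against independence.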
In this analysis, we focused only on the U.S. cities in order to keep all the data points in the same urban system [8]. For that, we used the mean $\alpha$ over the 30 arrangements for each city-crime pair and the population size from the 2010 U.S. Census. We found $D = 0.000067$ for theft, $D = -0.001260$ for robbery, and $D = -0.000267$ for burglary. Therefore, we do not reject the null hypothesis at the 95% confidence level ($d_\alpha = 0.003$).

Table 9: Estimated parameters for the power-law distributions and the truncated power-law distribution. Locations where the truncated power law is favored over the power law are indicated by the λ of the distribution.

Entropy of rank
To analyze the dynamics of crime, we first split the data based on quantity (amount-based) and on time (time-based), then created ranks of criminal regions (i.e., we ordered the regions by their number of crimes) for each split of the data, and finally calculated the entropy of the distribution of each position in the rank. The procedure to calculate the entropy $H^c_r$ of the rank $r$ of the city $c$ is described in Figure 7, specifically for the time-based rank $r_t$; the amount-based case $r_a$ follows an analogous procedure.
For the purposes of our analysis, we split the data by $t_w = 7$ days in the $r_t$ case and by $a_w = a^c_w$ records in the amount-based case $r_a$, where $a^c_w$ is the split size (e.g., every 50 criminal records, or every 25 records) at which the entropy is the lowest for each city $c$. To find $a^c_w$, we first measured the entropy of the first position, $H^c_a(1)$, while increasing $a_w$ for each arrangement of the $R_c$ split of each location $c$; second, we used Tukey's test to find the set $s^c_{a_w}$ containing the values of $a_w$ for which the mean of $H^c_a(1)$ is statistically equal to the lowest (with 95% confidence); finally, we defined $a^c_w = \min(s^c_{a_w})$ (Fig 4A of the main text). Figure 8 depicts the influence of the split size $a_w$ on the measured entropy of the rank $r_a$ with respect to thefts for all considered cities.
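The entropy of a rank position can be sketched as follows, assuming the data have already been split into windows and aggregated into per-region counts. The function names, the tie-breaking rule, and the use of base-2 logarithms are assumptions; Figure 7 defines the actual procedure.

```python
import math
from collections import Counter

def rank_regions(counts):
    """Order region ids by descending crime count (ties broken by region id)."""
    return sorted(counts, key=lambda region: (-counts[region], region))

def rank_entropy(windows, position):
    """Shannon entropy (bits) of which region occupies `position` (0-based)
    in the rank, taken over all windows."""
    occupants = Counter(rank_regions(w)[position] for w in windows)
    total = sum(occupants.values())
    return -sum((k / total) * math.log2(k / total) for k in occupants.values())

# Made-up weekly counts for three regions over four windows.
windows = [
    {"A": 9, "B": 4, "C": 1},
    {"A": 7, "B": 8, "C": 0},
    {"A": 6, "B": 2, "C": 1},
    {"A": 3, "B": 5, "C": 2},
]
h_first = rank_entropy(windows, 0)
h_last = rank_entropy(windows, 2)
```

Here regions A and B alternate in first place, giving one bit of entropy for that position, while region C is always last, giving zero entropy.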

Clustering cities
To create a hierarchy of cities based on their dynamics with respect to criminal events across the regions in the cities, we clustered the entropies of the ranks using agglomerative hierarchical clustering with the average as the method to calculate the distance between clusters, known as the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm. We chose the Euclidean distance as the distance metric. Here we used the entropy of the positions in the rank as the feature vector. For the clustering, we focused on the $r_t$ rank and only used positions $i > 20$, since the values stabilize over the positions. In order not to bias towards the high entropy values, we normalized the feature vector as follows:
$$\hat{H}^c_t(k) = \frac{H^c_t(k)}{\frac{1}{|C|} \sum_{c' \in C} H^{c'}_t(k)},$$
where $C$ is the set of considered cities; that is, we normalized each position $k$ by the sample average of the entropy of position $k$ among all considered cities. To perform this analysis, we used the implementation of the algorithm available in the scipy library 13 for Python.

The amount of violent crimes in cities has been shown to scale superlinearly (β ≈ 1.15) with city size [9,10]. Scaling laws in cities, however, depend on the definition of the city as well as on the model for the fluctuations around population size $N$, that is, the conditional $\Pr(Y \mid N)$ [11,12]. Intriguingly, criminal allometries also relate to the type of crime. For instance, (A) burglaries scale linearly with population size regardless of fluctuation model, whereas (B) robberies exhibit sublinearity or superlinearity depending on the model used. In the case of thefts, (C-D) we found a superlinear increase with population size, independent of fluctuation model (see Table 10).
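Returning to the clustering step, the normalization of the entropy feature vectors and the Euclidean distance between cities can be sketched in plain Python. The city labels and entropy values are made up, and the function names are hypothetical; the hierarchical clustering itself is left to scipy.

```python
def normalize_by_position(entropies):
    """`entropies` maps city -> list of H(k); divide each H(k) by the mean
    of position k over all cities."""
    cities = list(entropies)
    n_positions = len(entropies[cities[0]])
    means = [sum(entropies[c][k] for c in cities) / len(cities)
             for k in range(n_positions)]
    return {c: [entropies[c][k] / means[k] for k in range(n_positions)]
            for c in cities}

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# Made-up entropies for three hypothetical cities and three rank positions.
entropies = {
    "city_a": [4.0, 4.2, 4.4],
    "city_b": [2.0, 2.1, 2.2],
    "city_c": [3.0, 3.3, 3.0],
}
normalized = normalize_by_position(entropies)
distance_ab = euclidean(normalized["city_a"], normalized["city_b"])
```

The normalized vectors, stacked as rows of a matrix, could then be passed to `scipy.cluster.hierarchy.linkage` with `method="average"` and `metric="euclidean"` to obtain the UPGMA hierarchy.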