
Spatial interactions in urban scaling laws


Analyses of urban scaling laws assume that observations in different cities are independent of the existence of nearby cities. Here we introduce generative models and data-analysis methods that overcome this limitation by modelling explicitly the effect of interactions between individuals at different locations. Parameters that describe the scaling law and the spatial interactions are inferred from data simultaneously, allowing for rigorous (Bayesian) model comparison and overcoming the problem of defining the boundaries of urban regions. Results in five different datasets show that including spatial interactions typically leads to better models and a change in the exponent of the scaling law.

1 Introduction

One of the pillars of the study of cities as complex systems is the existence of statistical laws that apply “universally” to urban regions in different locations [1–4]. Examples include Zipf’s law of city sizes, the gravitational law of population movement, and (the focus of this paper) scaling laws $y \sim x^{\beta}$ (1) between observables y and the population x of cities. All these laws have their origin in the first half of the 20th century and continue to be investigated in increasingly rich datasets [5–7]. In particular, the scaling law (1) has been discussed for the area of cities since the 1940s [8], can be viewed as a form of increasing returns to scale [1, 9], and has been the subject of many recent studies [4, 10–15].

Originally, urban laws were seen as akin to the empirical laws of classical mechanics, the basis of a sociophysics theory [7, 8]. A modern trace of this simplistic view is the fact that models and explanations of the origin of these laws are typically presented independently from the statistical analysis in support of their validity, e.g., the data analysis supporting (1) is based on straight-line fits of log y vs. log x regardless of the explanation for its appearance. This undermines the statistical nature of the laws (evident from the large fluctuations) and is unable to select between the many alternative models that “explain” their origin (which often predict different fluctuations and can thus be tested).

The need for careful data-analysis methods to investigate statistical laws in complex systems has been extensively discussed for power-law distributions such as Zipf’s law [16–18]. Similar scrutiny is being applied to the methods used in scaling laws in urban systems [13, 14, 19–21] and reveals the limitations of the traditional linear-fitting approach: it relies on several simplifying assumptions, it is unable to deal with y = 0 in the data, it makes it difficult to compare to alternative models and to assess whether the scaling is indeed non-linear (β ≠ 1), and it treats each city equally so that results are sensitive to cut-offs and fluctuations in the data of the many small cities. These limitations motivated us to introduce in Ref. [19] a model of urban scaling that focuses on individuals instead of cities, effectively giving more weight to the largest cities. Fig 1 compares this and alternative fitting models for the dependence of the Gross Domestic Product (GDP, y) on the population of cities (x) in two countries.

Fig 1. Urban scaling laws (1) based on different models.

The GDP y of different municipalities in Brazil (left) and metropolitan areas in the USA (right) is shown as a function of their population x. The straight lines correspond to different models: linear fit of the data, the Per capita model (P) and the City model (C), see Eq (11). The estimated scaling exponents β of the different models are shown in the legend. Cities close to large urban areas are highlighted. The City model gives more weight to larger cities and therefore leads to a value of β that is different from the linear fit [19].

A limitation of data-analysis methods for scaling laws (1) that persists is that they ignore the crucial element of any urban data: their spatial component [1]. The importance of location for scaling laws has been recognized [11] and modelled [15], but not incorporated into the data analysis. Linear fitting and all methods proposed in Ref. [19] assume that observations in different cities are independent of each other and thus independent of their location. Not surprisingly, the scalings show spatially-correlated fluctuations [11] and are sensitive to the definition of city boundaries [13, 14]. For instance, in the results in Fig 1 (left panel) we highlight one of Brazil’s municipalities (“São Caetano do Sul”-SP) that lies within Brazil’s largest metropolitan area (around “São Paulo”-SP). We see that the GDP of this municipality is much larger than expected by any of the models and it is natural to suspect that this is at least partially due to its proximity to other urban areas. This effect is enhanced by the fact that Brazil’s data is aggregated according to administrative areas (municipalities), which often do not reflect connected urban regions. Still, the problem of defining appropriate urban areas is not trivial [5, 14] and spatial proximity should play a role regardless of the chosen urban unit. In fact, Fig 1 (right panel) shows that in the USA, where data is given for metropolitan areas, a similar effect appears (e.g., “San-José-Santa Clara”-CA close to “San Francisco”-CA, or “Trenton”-NJ between “New York City”-NY and “Philadelphia”-PA).

Here we propose the first framework to investigate scaling laws (1) that accounts simultaneously for the following three crucial points: (i) it is based on generative models (Sec. 2); (ii) it accounts for spatial interactions between different urban areas; and (iii) it allows for rigorous statistical analyses (Sec. 3), including model comparison and the inference of parameters. Results in 5 datasets from Brazil and USA show (Sec. 4) that, in most cases, models that account for spatial interactions provide a better description of the data and that the scaling exponent β depends on the spatial scaling, in agreement with previous observations [13, 14] of the dependence of β on the urban unit.

2 Model

2.1 Generative process

We are interested in modelling the process that generates data compatible with the scaling law (1). The starting point of our model is the widespread interpretation that Eq (1) reflects a change in people’s efficiency (or consumption) depending on the amount of interactions available to them [12]. Accordingly, we consider a generative process in which tokens (e.g. a patent, a dollar of GDP, a piece of infrastructure) are assigned to (produced or consumed by) an individual person j with probability $p_p(j)$.

Consider j = 1, …, M persons living in i = 1, …, N cities, in which the population of city i is given by $x_i$ and the total population is $X \equiv \sum_i x_i$. A total of $Y \equiv \sum_i y_i$ tokens are (randomly) assigned to the X persons. In the absence of any other information, this defines our first (null) model:

  1. (P) Per-capita model: All tokens Y are distributed with equal probability to all persons j as in a constant per-capita attribution, $p_p(j) = 1/X$. In this case, the probability $p_c(i)$ that a token is attributed to city i is given by $p_c(i) = \sum_{j} p_p(j)\,\delta(c(j) - i) = x_i / X$ (2), where c(j) is the city in which j lives and δ(x) = 1 for x = 0 (otherwise δ(x) = 0).

This model corresponds to a linear (trivial) scaling law, β = 1 in Eq (1). A super-linear β > 1 (sub-linear β < 1) scaling is obtained if a token is more likely to be assigned to someone living in a more (less) populous city. In this spirit, in Ref. [19] we assumed that the probability that a token is assigned to person j depends on the population around j as $p_p(j) \propto x_{c(j)}^{\beta-1}$ (3).

Here we generalize this idea to account for spatial interactions between j and other individuals j′ that live in other cities (i.e., c(j) ≠ c(j′)). We introduce a quantity $A_j$, defined as the total attractiveness of individual j due to all its interactions, and use it as a weight on the probability of assignment of a token as $p_p(j) = A_j^{\beta-1} / Z(\beta)$ (4), where Z(β) is the normalization constant (i.e., $Z(\beta) = \sum_j A_j^{\beta-1}$). If β = 1, the probability $p_p(j)$ is the same for all j as in the per-capita model and we recover Eq (2). For β > 1, $p_p(j)$ grows with the interactions $A_j$ in line with a super-linear scaling. For β < 1, $p_p(j)$ decays with $A_j$ in line with a sub-linear scaling.
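The generative process can be illustrated with a minimal Python sketch. It assumes that the assignment probability has the form $p_p(j) \propto A_j^{\beta-1}$, which is the form consistent with the text (uniform at β = 1, growing with $A_j$ for β > 1). The function names, the attractiveness values, and the random seed are hypothetical and serve only as an illustration.

```python
import random

def token_probabilities(attractiveness, beta):
    """Assumed form of Eq (4): p_p(j) = A_j**(beta - 1) / Z(beta)."""
    weights = [a ** (beta - 1.0) for a in attractiveness]
    z = sum(weights)  # normalization constant Z(beta)
    return [w / z for w in weights]

def assign_tokens(attractiveness, beta, n_tokens, seed=0):
    """Assign n_tokens independently to persons; return the count per person."""
    rng = random.Random(seed)
    probs = token_probabilities(attractiveness, beta)
    counts = [0] * len(probs)
    for j in rng.choices(range(len(probs)), weights=probs, k=n_tokens):
        counts[j] += 1
    return counts

# Hypothetical attractiveness values for four persons (two well connected,
# two poorly connected); these numbers are made up for illustration.
A = [10.0, 10.0, 2.0, 2.0]

print(token_probabilities(A, beta=1.0))   # per-capita model: all persons equal
print(token_probabilities(A, beta=1.2))   # super-linear: connected persons favoured
print(assign_tokens(A, beta=1.2, n_tokens=1000))
```

Summing the counts over the persons of each city gives one synthetic realization of the data $y_i$.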

The attractiveness of an individual $A_j$ certainly depends on a multitude of factors that could be included in the model, depending on data availability and research interest. Here, we focus on pairwise interactions $a_{j,j'}$ between individuals j and j′ separated by a distance $d = d_{j,j'}$. We obtain $A_j$ as the total interaction of j with all other individuals j′ by summing over all j′: $A_j = \sum_{j' \neq j} a_{j,j'}$ (5).

The distance $d_{j,j'} \ge 0$ does not need to be a distance in a mathematical sense and, in practice, depends on the availability of data. Below we use the geographic (geodesic) distance between cities (another natural choice would be the commuting time). The pairwise (spatial) interaction $a_{j,j'}$ is discussed below and will lead to three different specific models.

2.2 Spatial interactions

In order to explore the formalism above we now consider simple dependencies of the pairwise interaction a(d) on the distance $d \equiv d_{j,j'}$ between two persons j, j′. In general, we are interested in functions a(d) that monotonically decay with d from a(0) = 1 to $\lim_{d \to \infty} a(d) = 0$. Choosing another value at a(0) leads to the same results because of the normalization of $p_p(j)$ in Eq (4). Our framework can be applied to any function a(d) suitable to model spatial relationships; the data will reveal which one is more suitable.

The simplest choice of a(d) is

  1. (C) City model: $a_C(d) = \delta(d)$ (6), i.e., $a_C(d) = 1$ for d = 0 and $a_C(d) = 0$ otherwise, in which interactions occur only within the same city (d = 0). From Eq (5) we get $A_j = x_{c(j)}$, i.e., we recover the scaling law (1) and Eq (3) (the model of Ref. [19], Sec. 4.2).

Spatial interactions beyond city limits can be incorporated using more general functions a(d). Here we start this investigation with functions a(d;α) that depend on a single parameter α that is measured in the same units as d (e.g., km) and sets a scale for spatial interactions such that a(α;α) = 1/2 (i.e., at a distance d = α the interactions decay to a factor 0.5 of the interaction within the same city, d = 0). Furthermore, we wish to recover the choice (6) in the limit of small α, i.e., $a(d;\alpha) \to a_C(d)$ in Eq (6) for α → 0+. Two choices of a(d;α) that satisfy these properties (and also a(0;α) = 1 and $\lim_{d \to \infty} a(d;\alpha) = 0$ for any α) are:

  1. (G) Gravitational model: $a_G(d;\alpha) = \frac{1}{1 + (d/\alpha)^2}$ (7), inspired by models of gravitational interactions (for large d the interactions decay as $a \sim 1/d^2$; one can also replace the power 2 by an additional parameter) [3, 8, 15].
  2. (E) Exponential model: $a_E(d;\alpha) = e^{-\ln(2)\, d/\alpha} = 2^{-d/\alpha}$ (8).
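The properties required of a(d;α) can be checked numerically. The sketch below assumes one functional form for each model that satisfies the constraints stated above (a(0;α) = 1, a(α;α) = 1/2, decay to zero for large d, and a ∼ 1/d² for the gravitational case); the function names and the value of α are hypothetical.

```python
def a_city(d):
    """City model (C): interactions only within the same city (d = 0)."""
    return 1.0 if d == 0 else 0.0

def a_gravitational(d, alpha):
    """Gravitational model (G), assumed form 1 / (1 + (d/alpha)**2)."""
    if alpha == 0:
        return a_city(d)  # limit alpha -> 0+ recovers the city model
    return 1.0 / (1.0 + (d / alpha) ** 2)

def a_exponential(d, alpha):
    """Exponential model (E), assumed form 2**(-d/alpha)."""
    if alpha == 0:
        return a_city(d)  # limit alpha -> 0+ recovers the city model
    return 2.0 ** (-d / alpha)

alpha = 15.0  # km, hypothetical interaction scale
for name, a in [("G", a_gravitational), ("E", a_exponential)]:
    # Values at d = 0, d = alpha, and d = 10 * alpha
    print(name, a(0.0, alpha), a(alpha, alpha), a(10 * alpha, alpha))
```

Both kernels agree at d = 0 and d = α by construction and differ in their tails: the exponential form decays much faster than the gravitational one for d ≫ α.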

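For α → ∞, the distances do not matter, everyone is equally linked to everyone else, and the P-model is retrieved. Altogether, the four models discussed above are summarized in Table 1 and satisfy the two limits stated above: $a_{G,E}(d;\alpha) \to a_C(d)$ for α → 0+ (C model) and $a_{G,E}(d;\alpha) \to 1$ for α → ∞ (P model).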

2.3 Likelihood

We now discuss how the likelihood of our models can be computed from the data. We assume that $(x_i, y_i)$ data is available at locations i = 1, …, N. We denote the locations i as cities but we stress that this does not need to correspond to any urban definition of cities, as the spatial interaction between different regions can be accounted for explicitly in our models by choosing an appropriate function a(d). We assume also that a measure of distance $d_{i,i'}$ between all pairs of cities is available (e.g., the geodesic distance between the centroids of the cities).
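As an illustration of how a distance matrix between locations could be built from centroid coordinates, the following sketch uses the haversine (great-circle) formula on a sphere with Earth's mean radius. This is an approximation to the geodesic distance and an assumption of this example, not necessarily the computation used for the results; the coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding slightly above 1 for antipodal points
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def distance_matrix(coords):
    """Symmetric matrix of pairwise distances for a list of (lat, lon) pairs."""
    n = len(coords)
    return [[haversine_km(*coords[i], *coords[k]) for k in range(n)] for i in range(n)]

# Hypothetical centroids (latitude, longitude) of three locations
centroids = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
for row in distance_matrix(centroids):
    print([round(d, 1) for d in row])
```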

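Besides their location (city), individuals are indistinguishable. Therefore, the probability $p_c(i)$ that a token is assigned to city i is given by a sum of $p_p(j)$ over the persons j in city i (i.e., c(j) = i), which contains exactly $x_{c(j)} \equiv x_i$ terms: $p_c(i) = \sum_{j:\,c(j)=i} p_p(j) = \frac{x_i A_i^{\beta-1}}{Z(\beta)}$, with $Z(\beta) = \sum_{i'=1}^{N} x_{i'} A_{i'}^{\beta-1}$ (9), where we used Eq (4) and consider that $x_i \gg 1$ for all i. The last equation defines the attractiveness of an individual in city i as $A_i = \sum_{i'=1}^{N} a(d_{i,i'})\, x_{i'}$ (10).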

This can also be thought of as the number of effective interactions available to an individual in city i, so that $A_i = x_i$ in the city model (6) and $A_i \ge x_i$ otherwise (e.g., for the gravitational and exponential models). It depends only on the populations $x_i$ of all cities and on the distances $d_{i,i'}$ between cities, e.g. through Eqs (7) or (8), and therefore $A_i$ can be computed independently of the data $y_i$.
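A short sketch of this computation follows, assuming that the attractiveness of city i is the kernel-weighted sum of the populations of all cities, $A_i = \sum_{i'} a(d_{i,i'})\, x_{i'}$, and assuming a gravitational kernel of the form $1/(1+(d/\alpha)^2)$. The populations, distances, and function names are hypothetical.

```python
def a_gravitational(d, alpha):
    """Assumed gravitational kernel: 1 at d = 0 and 1/2 at d = alpha."""
    return 1.0 / (1.0 + (d / alpha) ** 2)

def attractiveness(populations, distances, kernel):
    """A_i = sum over all cities i' of kernel(d[i][i']) * x[i']."""
    n = len(populations)
    return [
        sum(kernel(distances[i][k]) * populations[k] for k in range(n))
        for i in range(n)
    ]

# Hypothetical system of three cities (populations and distances in km)
populations = [1000, 200, 50]
distances = [[0.0, 10.0, 100.0],
             [10.0, 0.0, 95.0],
             [100.0, 95.0, 0.0]]

# City model: only d = 0 contributes, so A_i = x_i
A_city = attractiveness(populations, distances, lambda d: 1.0 if d == 0 else 0.0)
# Gravitational model with a hypothetical alpha = 10 km
A_grav = attractiveness(populations, distances, lambda d: a_gravitational(d, 10.0))

for x, ac, ag in zip(populations, A_city, A_grav):
    print(x, ac, round(ag, 2), round(ag / x, 2))
```

In this toy example the ratio $A_i / x_i$ is largest for the small city next to the large one, which is the qualitative effect the spatial models are meant to capture.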

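The expected number of tokens in city i is given by $\mathbb{E}[y_i] = Y\, p_c(i) = Y\, \frac{x_i A_i^{\beta-1}}{Z(\beta)}$ (11).

The probability of observing $y_i$ tokens in each city of size $x_i$ is a multinomial distribution $P(y_1, \ldots, y_N) = \frac{Y!}{\prod_{i=1}^{N} y_i!} \prod_{i=1}^{N} p_c(i)^{y_i}$ (12).

This corresponds to the likelihood P(D|M, θ) of the data $D \equiv \{y_1, \cdots, y_N\}$ (the populations $(x_1, \cdots, x_N)$ are fixed) for a given model class M and given parameters θ. It is convenient to write the log-likelihood as $\log P(D|M,\theta) = \log Y! - \sum_{i=1}^{N} \log y_i! + \sum_{i=1}^{N} y_i \log p_c(i)$ (13).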
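The log-likelihood can be evaluated directly with log-gamma functions. The sketch below assumes the multinomial form described above with city probabilities $p_c(i) \propto x_i A_i^{\beta-1}$; the function names and the toy numbers are hypothetical. Cities with $y_i = 0$ are handled without special treatment, in contrast to log-log linear fits.

```python
import math

def city_probabilities(x, A, beta):
    """Assumed form: p_c(i) = x_i * A_i**(beta - 1) / Z(beta)."""
    w = [xi * Ai ** (beta - 1.0) for xi, Ai in zip(x, A)]
    z = sum(w)
    return [wi / z for wi in w]

def log_likelihood(y, x, A, beta):
    """Multinomial log-likelihood: log Y! - sum log y_i! + sum y_i log p_c(i)."""
    p = city_probabilities(x, A, beta)
    total = sum(y)
    ll = math.lgamma(total + 1) - sum(math.lgamma(yi + 1) for yi in y)
    # Terms with y_i = 0 contribute nothing to the last sum
    ll += sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi > 0)
    return ll

# Hypothetical data: populations, attractiveness values, and token counts
x = [1000, 200, 50]
A = [1100.0, 700.0, 60.0]
y = [900, 100, 0]

for beta in (1.0, 1.2):
    print(beta, log_likelihood(y, x, A, beta))
```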

3 Data analysis

3.1 General framework

The models described above contain strong simplifying assumptions: our focus on the scaling relationship led to the assumptions that individuals are identical and that the token assignments are independent. Therefore, our approach here is not to test whether the data is compatible with the models (we know it is not) but instead to compare the different models. The incompatibility is expected for the following reason. While in linear fitting the number of observations equals the number of cities, our models focus on individuals j and tokens of output y ($X = \sum_i x_i$, $Y = \sum_i y_i$), so that the number of observations is much larger and the expected fluctuations (for large cities) are much smaller than the fluctuations in the data. The models account only for fluctuations of the (random) assignment of tokens and neglect fluctuations (present in the data) due to measurement imprecision and due to factors that are not part of our models. This means that instead of the likelihood P(D|M, θ) that models generate the data $D = \{y_1, \cdots, y_N\}$, computed in the previous section, we should focus on what the data D tells us about the model class M ∈ {P, C, G, E} and its parameters θ = {α, β}. This is done based on the (posterior) probability $P(M, \theta | D) = \frac{P(D|M,\theta)\, P(M,\theta)}{P(D)}$ (14), computed from the three terms on the right-hand side:

  • P(D) depends only on the data, acts as a normalization, and does not affect the choice between models.
  • P(M, θ) is the prior probability and is taken flat so that no a priori preference is given to any model. Specifically, we write P(M, θ) = P(θ|M)P(M) with P(M) = 1/4 and constant P(θ|M) in 0 ≤ β ≤ 2 and $0 \le \alpha \le \alpha_{\max}$, where $\alpha_{\max}$ is an arbitrary maximum distance (we use $\alpha_{\max}$ = 6,371 km, Earth’s radius). This implies that P(θ|M) for the models P, C, G, E is 1, 1/2, $1/(2\alpha_{\max})$, and $1/(2\alpha_{\max})$, respectively.
  • P(D|M, θ) is the likelihood and is evaluated numerically from Eq (13). This is facilitated by two observations: (i) the first two terms in the log-likelihood (13) are independent of the models so that the variation across M and θ depends only on the last term; (ii) in this last relevant term, the parameter α enters only in $A_i$ through the dependence on a(d), so that for a fixed α the dependence on the matrix $d_{i,i'}$ is reduced to the vector $A_i$. It is thus computationally more efficient to fix α, compute $A_i$ once, and then consider variations in β.

3.2 Estimation of parameters θ = {α, β}

The best parameters θ = {α, β} of a given model M are the ones that maximize the posterior P(θ|D, M). Since the priors are constant, this is equivalent to the maximization of the log likelihood (13) in the space of admissible parameters set by the priors.
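The maximization can be sketched as a grid search that exploits the structure of the likelihood: for each α the attractiveness vector is computed once and only then is β varied. The sketch assumes $p_c(i) \propto x_i A_i^{\beta-1}$ and a gravitational kernel of the form $1/(1+(d/\alpha)^2)$; the grids, the toy cities, and all function names are hypothetical, and a grid search is only one of several possible optimizers.

```python
import math

def kernel_g(d, alpha):
    """Assumed gravitational kernel; alpha = 0 gives the city model."""
    if alpha == 0:
        return 1.0 if d == 0 else 0.0
    return 1.0 / (1.0 + (d / alpha) ** 2)

def attractiveness(x, dist, alpha):
    """A_i = sum over i' of kernel(d[i][i'], alpha) * x[i']."""
    n = len(x)
    return [sum(kernel_g(dist[i][k], alpha) * x[k] for k in range(n)) for i in range(n)]

def model_term(y, x, A, beta):
    """Model-dependent part of the log-likelihood: sum_i y_i * log p_c(i)."""
    log_w = [math.log(xi) + (beta - 1.0) * math.log(Ai) for xi, Ai in zip(x, A)]
    m = max(log_w)
    log_z = m + math.log(sum(math.exp(lw - m) for lw in log_w))
    return sum(yi * (lw - log_z) for yi, lw in zip(y, log_w))

def fit(y, x, dist, alphas, betas):
    """Return (log-likelihood term, alpha, beta) of the best grid point."""
    best = (-math.inf, None, None)
    for alpha in alphas:
        A = attractiveness(x, dist, alpha)  # computed once per alpha
        for beta in betas:
            ll = model_term(y, x, A, beta)
            if ll > best[0]:
                best = (ll, alpha, beta)
    return best

# Hypothetical cities; y is set to the rounded model expectation for known
# parameters so that the search can be checked against them.
x = [100000, 20000, 5000, 1000]
dist = [[0.0, 12.0, 30.0, 400.0],
        [12.0, 0.0, 25.0, 410.0],
        [30.0, 25.0, 0.0, 380.0],
        [400.0, 410.0, 380.0, 0.0]]
true_alpha, true_beta, Y = 20.0, 1.2, 1_000_000
A_true = attractiveness(x, dist, true_alpha)
w = [xi * Ai ** (true_beta - 1.0) for xi, Ai in zip(x, A_true)]
y = [round(Y * wi / sum(w)) for wi in w]

alphas = [0.0, 5.0, 10.0, 20.0, 40.0, 80.0]   # km, within the prior range
betas = [i / 100 for i in range(201)]         # 0 <= beta <= 2
print(fit(y, x, dist, alphas, betas))
```

In practice the grids would be refined around the best point, or replaced by a numerical optimizer restricted to the admissible parameter ranges.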

3.3 Model selection

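In the comparison of the different model classes M we account for the fact that models have different (numbers of) parameters θ by computing P(M|D) or, equivalently, the description length $\mathcal{D} = -\log P(D, M) = -\log P(D|M) - \log P(M)$ (15), where $P(D|M) = \int P(D|M,\theta)\, P(\theta|M)\, d\theta$ is obtained by integrating over all parameters θ of model M.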

The description length corresponds to the size (in number of bits, for base-2 logarithms) of the optimal encoding of data and model [22]. Since the priors P(θ|M) and P(M) are constant, the crucial computational step is the integration of the likelihood over the parameters θ. When the number of observations $Y = \sum_i y_i$ is large (often the case for relevant urban scaling analysis), the likelihood is expected to be sharply peaked around the maximum-likelihood parameters θ. In this case, the description length is dominated by the maximum log-likelihood and further approximations can be used to compute it (e.g., the Bayesian Information Criterion). However, one should be careful when using these approximations to compare non-nested models (e.g., G and E) and around parameters θ at which the priors are discontinuous (as in the relevant case of α = 0).
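A minimal numerical sketch of this integration is given below. It assumes that the description length is $-\log_2 P(D, M)$, that the prior P(θ|M) is flat, and that the log-likelihood has been evaluated on a uniform grid covering the prior support, so that the integral reduces to an average of the likelihood over the grid. The log-sum-exp trick avoids numerical underflow; the function names and the log-likelihood values are hypothetical.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def description_length_bits(log_liks, prior_model=0.25):
    """Approximate -log2 P(D, M) for a flat prior over a uniform grid.

    log_liks: natural-log likelihoods ln P(D|M, theta) on the grid points.
    prior_model: P(M), here 1/4 for four model classes.
    """
    # Grid average of the likelihood approximates the integral over theta
    log_evidence = log_sum_exp(log_liks) - math.log(len(log_liks))
    return -(math.log(prior_model) + log_evidence) / math.log(2.0)

# Hypothetical log-likelihoods: a model without free parameters (one value)
# and a model with one parameter whose likelihood peaks on the grid.
simple = [-1000.0]
flexible = [-990.0 - 50.0 * (k - 10) ** 2 for k in range(21)]

print(description_length_bits(simple))
print(description_length_bits(flexible))
```

The averaging over the grid is what penalizes additional parameters: a higher maximum likelihood lowers the description length only if it is not outweighed by the fraction of the prior range over which the likelihood is negligible. For sharply peaked likelihoods the grid must be fine enough to resolve the peak.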

4 Results

4.1 Data

We apply the models and data-analysis methods described above to five datasets from two different countries. For Brazil, the data on three observables y (GDP, deaths due to external, i.e. non-natural, causes, and deaths due to AIDS) in the year 2010 are given for thousands of municipalities (administrative boundaries). For the USA, the data on two observables y (GDP and miles of roads) in the year 2013 are given for hundreds of metropolitan areas. The USA cases can be considered paradigmatic examples of super- and sub-linear urban scaling laws [12]. In both countries, the average distance between two urban units is thousands of km. The results of our analysis are reported in Table 2. The data were collected from censuses and governmental agencies, were used in Refs. [12, 17], and are available with further documentation and all codes used in this paper in Ref. [23].

4.2 The effect of α

We start by investigating the central question of this paper: does spatial proximity between cities help to explain the observations y studied in urban scaling? And, if so, does it affect the scaling exponent β? The results in Fig 2 demonstrate that the answer to both questions is positive in most (but not all) cases. The left panels of the figure show that the value of the (maximum likelihood) exponent β for a fixed α changes significantly with α. The right panels show that often (Brazil GDP, USA Roads, but not USA GDP) the best model is observed for α > 0. In these cases, there is an interval in α for which the model with geographic distance has a larger likelihood than the α = 0 case, compatible with the idea that the spatial scales we are accounting for in this interval are meaningful (i.e., distances of 10–100 km that are relevant to interacting people).

Fig 2. Spatial dependence affects the choice of the scaling parameter β.

The value of α (measured in kilometres) is varied and the most likely value of β is estimated for each α. The three left panels show the value of β and the three right panels indicate the likelihood of the different models M indicated in the legend. The panels on the top correspond to GDP data from Brazil; the best model is M = G with α = 14.6 and β = 1.24. The panels in the centre correspond to GDP data from the USA; the best model is the city model M = C (obtained for α = 0) with β = 1.12. The panels on the bottom correspond to data on road miles in the USA; the best model (largest likelihood) is M = G with α = 20.4 and β = 0.75.

The addition of spatial interactions does not trivialize the urban scaling law, in contrast to the effect of city boundaries reported in Ref. [14]. In fact, the non-linear scaling exponent β is often enhanced by the spatial relation α, i.e., super- (sub-) linear scalings β > 1 (β < 1) in the usual approach (at α = 0) show an even larger (smaller) value of β for the maximum-likelihood value of α. For instance, for Brazil GDP the estimates of β in the non-spatial models are 1.05 (linear fitting) and 1.17 (city model), while in the spatial models they are 1.24 (gravitational model) and 1.21 (exponential model). The same effect is observed in the case of sub-linear scaling in the data for USA road lengths, see Table 2.

4.3 Comparing different models

In all our five datasets the models with non-linear scaling (C, G, and E, for which β ≠ 1) are preferred over the per-capita (P) model (negative in Table 2). In four of the five datasets, the models with spatial interactions (α ≠ 0 in the G and E models) are preferred over the one (C model) that ignores them. The exception is the case of USA GDP, for which the estimated value of α is zero for the G model and very small (1.65 km) for the E model. The description length of the C model is smaller than the one in the G, E models by 2 and 3 bytes, respectively, indicating that the larger likelihood of the data obtained with α = 1.65 in the E model is not sufficient to justify its increased model complexity.

The comparison of the Gravitational and Exponential models reveals that both show a very similar behaviour as a function of α (Fig 2), similar inferred model parameters α and β, and similar description lengths (Table 2). This indicates that the conclusions are not very sensitive to the functional form of a(d) used to account for spatial interactions (as long as it satisfies the natural constraints we used to propose a(d)). The most important distinction we found is between models that ignore spatial interactions (linear fitting, C model, and α = 0) and those that account for them (α > 0 in the G and E models).

4.4 Increased interactivity

We now investigate how the spatial models introduced here change the number of effective interactions of individuals. In the introduction we discussed how the GDP of cities close to large urban areas was underestimated. Our analysis reveals that spatial interactions were not a strong factor in the USA GDP data overall. This was different for Brazil GDP, where the best model is the Gravitational model with α = 14.6. For these parameters, in Fig 3 we show the increased attractiveness (or number of interactions, $A_i$ in Eq (10)) that individuals in different cities in Brazil experience. It fluctuates significantly from city to city because it is an intricate function of the location of all cities, but it is clear that smaller cities are more affected than larger cities.

Fig 3. Accounting for spatial interactions increases the attractiveness of individuals in small cities.

The attractiveness $A_i$ in Eq (10) divided by the population $x_i$ is shown as a function of $x_i$ for the different Brazilian municipalities i. The horizontal black line corresponds to the case in which spatial interactions are ignored (α = 0 ⇒ $A_i = x_i$). The dots correspond to the result of the Gravitational model with the maximum likelihood parameters obtained for the case of GDP (see Table 2).

For the case of “São Caetano do Sul”, the attractiveness of the inhabitants of this municipality is 43.6 times larger than when assuming that interactions occur only within the municipality (i.e., A = 43.6 x for α = 14.6 in M = G). The GDP of this city is 11.0 BR$ (Billion reais), much larger than the per-capita expectation of 2.1 BR$. The city model improves this expectation to 2.7 BR$, still too low but better than the linear-fit estimation of 1.6 BR$. The best spatial model (G model with α = 14.6 and β = 1.24) improves the prediction to 4.5 BR$. Therefore, we conclude that spatial interactions can explain a considerable amount of the GDP of this municipality, even more than the inclusion of the non-linear scaling (β ≠ 1), but that other factors remain significant.

5 Discussions

We introduced models of urban scaling laws that account for spatial interactions between individuals in different locations and that allow for rigorous statistical inference and model comparison. Results in five datasets reveal that spatial interactions between cities lead to improved models and change the estimation of the urban scaling parameter β. Our approach shows how the problem [13, 14] of the effect of the definition of the urban unit (city boundaries) on scaling laws can be solved by including spatial interactions between different locations explicitly in the model and inference.

The framework introduced in this paper can be extended to account for more sophisticated models (of interactions), beyond the four simple models introduced here. This could include more detailed information about the proximity and connectivity between different urban areas (e.g., commuting time) and incorporate ideas proposed in models of scaling laws [15], in models of the growth of cities [2, 3], and in methods to define boundaries of urban regions [5, 14]. It would be interesting to use these models to explore datasets at different spatial resolutions (e.g., neighbourhoods) and when additional information on the population in each location is available. The crucial point is that additional parameters and models for interactions should be inferred from the data together with the parameter β of the urban scaling law, avoiding arbitrary choices and leaving to the data and model-comparison techniques the choice between different approaches.


Somwrita Sarkar and Elsa Arcaute contributed with stimulating discussions.


  1. Fujita Masahisa, Krugman Paul R., and Venables Anthony J. The Spatial Economy: Cities, Regions, and International Trade. MIT Press, 2001.
  2. Batty Michael. The New Science of Cities. MIT Press, 2013.
  3. Barthelemy Marc. The Structure and Dynamics of Cities. Cambridge University Press, 2016.
  4. Rybski Diego, Arcaute Elsa, and Batty Michael. Urban Scaling Laws. Environment and Planning B: Urban Analytics and City Science 46 (9): 1605–10 (2019).
  5. Rozenfeld Hernán D., Rybski Diego, Gabaix Xavier, and Makse Hernán A. The Area and Population of Cities: New Insights from a Different Perspective on Cities. American Economic Review, 101(5), 2205–25 (2011).
  6. Simini Filippo, González Marta C., Maritan Amos, and Barabási Albert-László. A Universal Model for Mobility and Migration Patterns. Nature 484 (7392): 96–100 (2012). pmid:22367540
  7. Barnes Trevor J. and Wilson Matthew W. Big Data, Social Physics, and Spatial Analysis: The Early Years. Big Data & Society 1 (1): 2053951714535365 (2014).
  8. Stewart John Q. Suggested Principles of ‘Social Physics’. Science, 106 (2748): 179 (1947). pmid:17749163
  9. Sarkar Somwrita, Phibbs Peter, Simpson Roderick, and Wasnik Sachin. The Scaling of Income Distribution in Australia: Possible Relationships between Urban Allometry, City Size, and Economic Inequality. Environment and Planning B: Urban Analytics and City Science 45 (4): 603–22 (2018).
  10. Bettencourt Luís M. A., Lobo José, Helbing Dirk, Kuhnert C., and West Geoffrey B. Growth, innovation, scaling, and the pace of life in cities. Proceedings of the National Academy of Sciences, 104(17):7301–7306, 4 2007. pmid:17438298
  11. Bettencourt Luís M. A., Lobo José, Strumsky Deborah, and West Geoffrey B. Urban scaling and its deviations: Revealing the structure of wealth, innovation and crime across cities. PLoS ONE, 5(11):e13541, 11 2010. pmid:21085659
  12. Bettencourt Luís M. A. The origins of scaling in cities. Science, 340(6139):1438–1441, 2013. pmid:23788793
  13. Louf Rémi and Barthelemy Marc. Scaling: lost in the smog. Environment and Planning B: Planning and Design, 41(5):767–769, 10 2014.
  14. Arcaute Elsa, Hatna Erez, Ferguson Peter, Youn Hyejin, Johansson Anders, and Batty Michael. Constructing cities, deconstructing scaling laws. Journal of The Royal Society Interface, (i):3–6, 2015. pmid:25411405
  15. Ribeiro Fabiano L., Meirelles Joao, Ferreira Fernando F., and Neto Camilo R. A Model of Urban Scaling Laws Based on Distance Dependent Interactions. Royal Society Open Science 4 (3): 160926 (2017). pmid:28405381
  16. Clauset Aaron, Shalizi Cosma R., and Newman Mark E. J. Power-Law Distributions in Empirical Data. SIAM Rev. 51, 661 (2009).
  17. Gerlach Martin and Altmann Eduardo G. Testing statistical laws in complex systems. Phys. Rev. Lett. 122, 168301 (2019). pmid:31075025
  18. Corral Álvaro, Udina Frederic, and Arcaute Elsa. Truncated Lognormal Distributions and Scaling in the Size of Naturally Defined Population Clusters. Physical Review E 101(4): 042312 (2020). pmid:32422775
  19. Leitao Jorge C., Miotto Jose M., Gerlach Martin, and Altmann Eduardo G. Is this scaling nonlinear? Royal Society Open Science 3, 150649 (2016). pmid:27493764
  20. Shalizi Cosma R. Scaling and hierarchy in urban economies. arXiv:1102.4101, 2011.
  21. Finance Olivier and Cottineau Clémentine. Are the Absent Always Wrong? Dealing with Zero Values in Urban Scaling. Environment and Planning B: Urban Analytics and City Science 46(9): 1663–77 (2019).
  22. Grünwald Peter D. The Minimum Description Length Principle. MIT Press, 2007.
  23. The data and codes used in this paper are available at: