
Urban scaling with censored data

  • Inês Figueira ,

    Contributed equally to this work with: Inês Figueira, Rayan Succar

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Center for Urban Science and Progress, New York University Tandon School of Engineering, Brooklyn, New York, United States of America, Department of Mechanical and Aerospace Engineering, New York University Tandon School of Engineering, Brooklyn, New York, United States of America

  • Rayan Succar ,

    Contributed equally to this work with: Inês Figueira, Rayan Succar

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Center for Urban Science and Progress, New York University Tandon School of Engineering, Brooklyn, New York, United States of America, Department of Mechanical and Aerospace Engineering, New York University Tandon School of Engineering, Brooklyn, New York, United States of America

  • Roni Barak Ventura,

    Roles Data curation, Investigation, Writing – original draft, Writing – review & editing

    Affiliations Center for Urban Science and Progress, New York University Tandon School of Engineering, Brooklyn, New York, United States of America, Department of Mechanical and Aerospace Engineering, New York University Tandon School of Engineering, Brooklyn, New York, United States of America, School of Applied Engineering and Technology, New Jersey Institute of Technology, Newark, New Jersey, United States of America

  • Maurizio Porfiri

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Resources, Supervision, Writing – original draft, Writing – review & editing

    mporfiri@nyu.edu

    Affiliations Center for Urban Science and Progress, New York University Tandon School of Engineering, Brooklyn, New York, United States of America, Department of Mechanical and Aerospace Engineering, New York University Tandon School of Engineering, Brooklyn, New York, United States of America, Department of Biomedical Engineering, New York University Tandon School of Engineering, Brooklyn, New York, United States of America

Abstract

In the realm of urban science, scaling laws are essential for understanding the relationship between city population and urban features, such as socioeconomic outputs. Ideally, these laws would be based on complete datasets; however, researchers often face challenges related to data availability and reporting practices, resulting in datasets that include only the highest observations of the urban features (top-k). A key question that emerges is: Under what conditions can an analysis based solely on top-k observations accurately determine whether a scaling relationship is truly superlinear or sublinear? To address this question, we conduct a numerical study that explores how relying exclusively on reported values can lead to erroneous conclusions, revealing a selection bias that favors sublinear over superlinear scaling. In response, we develop a method that provides robust estimates of the minimum and maximum potential scaling exponents when only top-k observations are available. We apply this method to two case studies involving firearm violence, a domain notorious for its suppressed datasets, and we demonstrate how this approach offers a reliable framework for analyzing scaling relationships with censored data.

Author summary

Over the past two decades, urban scaling has become essential for understanding the rural-urban continuum by quantifying how urban characteristics depend on a city’s population size. For example, more populous cities are expected to have more patents and wages per capita, but fewer gas stations and road surfaces. Nonetheless, reliance on incomplete datasets about urban features systematically skews the conclusions derived from this theory. This issue is particularly relevant for features related to health outcomes, which are regularly obtained from partially censored datasets. For instance, data on firearms in the United States remain inaccessible to the public. To address this limitation, we developed a framework that enables urban researchers to draw reliable conclusions about urban scaling, even when dealing with censored datasets. We demonstrate this framework with data on firearm homicide and the number of firearms recovered by authorities in American cities.

1 Introduction

Scaling laws are ubiquitous in nature, describing many of the phenomena and processes that surround us. A scaling law summarizes the behavior of a system through a power-law, connecting certain properties of the system with its size [1]. Scaling laws have been instrumental in characterizing relationships across a wide range of domains, including biological and physical systems. For example, Kleiber’s law illustrates how metabolic rates of organisms scale with their body mass [2]. Likewise, scaling laws in the field of ecology indicate that the number of species supported by an ecosystem relates to its area [3]. In the ideal gas law, scaling describes the relationships between pressure, volume, temperature, and the number of molecules [4].

As urbanization rates are ever-increasing [5], understanding the scaling of urban features with city population is critical to urban science, management, and planning. Many scaling relationships between the population of a city X and an urban feature Y have been documented, which has led to the development of urban scaling theory. Given N cities, an urban scaling law takes the form $Y_i = C X_i^{\beta} e^{\varepsilon_i}$, with i = 1, …, N, where C is a common baseline, β is the scaling exponent that describes how an urban feature varies with city size, e is Napier’s constant, and εi represents the deviation of city i from its nominal behavior [6]. The scaling parameters C and β are typically computed by logarithmically transforming the scaling law to ln Yi = ln C + β ln Xi + εi and fitting a linear model [6].
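The log-transformed fit described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `fit_scaling_law` and the synthetic parameters (baseline 2.0, exponent 7/6, noise level 0.3, log-normal populations) are assumptions chosen only for the example.

```python
import numpy as np

def fit_scaling_law(population, feature):
    """Fit ln Y = ln C + beta * ln X by ordinary least squares.

    Returns the estimated baseline C and scaling exponent beta.
    Both inputs must be strictly positive (the logarithm of zero is undefined).
    """
    log_x = np.log(np.asarray(population, dtype=float))
    log_y = np.log(np.asarray(feature, dtype=float))
    # np.polyfit returns coefficients from highest degree to lowest.
    beta, log_c = np.polyfit(log_x, log_y, deg=1)
    return float(np.exp(log_c)), float(beta)

# Synthetic cities (all numbers below are illustrative assumptions).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=11.0, sigma=1.0, size=500)   # city populations
eps = rng.normal(0.0, 0.3, size=500)                # deviations eps_i
Y = 2.0 * X ** (7 / 6) * np.exp(eps)                # superlinear feature

C_hat, beta_hat = fit_scaling_law(X, Y)
print(f"estimated exponent: {beta_hat:.3f}")
```

With a complete dataset of this size, the estimated exponent should land close to the value used to generate the data.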

Researchers have shown that urban features can scale differently with population size, reflecting systematic relationships across urban and societal metrics. Empirical studies demonstrate that socioeconomic features such as GDP, property values, patents, homicides, and violent crimes exhibit a superlinear dependence on city population (β > 1) [5–12], meaning that larger (smaller) cities exhibit higher (lower) rates of these features per capita. In contrast, the space occupied by urban infrastructure such as roads, cables and built area scales sublinearly with city population (0 < β < 1) [13, 14]. Household and individual needs like total employment, housing, and water consumption, instead, typically show a linear dependency on city population (β = 1) [5, 15].

Over the years, several studies have refined urban scaling and expanded its framework to address methodological limitations. For example, Bettencourt et al. distinguished cross-sectional from temporal scaling to capture temporal dynamics beyond pure scale effects [16]. Cross-sectional scaling compares cities at a fixed point in time, whereas temporal scaling tracks changes within cities but can be unstable in cities with slow or negative growth. Finance and Cottineau addressed the issue of null observations in cities during scaling analysis [17]. Although these values may be valid (for example, a city where no patents were filed), the standard practice was to remove them from the analysis, as the logarithm of zero is undefined [18]. The authors explored alternative methods to ordinary least squares (OLS) for fitting urban models to avoid the exclusion of zero counts. Xiao and Gong argued that spatial dependencies exist between cities that are geographically proximate [19]. They designed a spatial filtering method to account for such dependencies in urban scaling and found that models that do not account for spatial interactions may overestimate GDP in developed regions and underestimate it in underdeveloped ones. In spite of the great strides made in the growing field of urban scaling, the vast majority of existing analyses assume access to a complete data set when fitting a model.

When working with city-level data, access to complete datasets becomes a common challenge. One cause of incomplete data is the obligation of government agencies to prevent the identifiability of sensitive information. For example, the Centers for Disease Control and Prevention Wide-Ranging Online Data for Epidemiological Research (CDC WONDER) publishes data on the underlying causes of death among United States (U.S.) citizens. They provide the yearly counts of each cause of death at the resolution of the entire country, states, and counties. However, to protect individuals’ privacy, the agency suppresses counts of nine and lower. Hence, urban scaling research on causes of death in the U.S. is difficult to perform. Similarly, the Tiahrt Amendments [20] impose restrictions on the reporting of data by the U.S. Bureau of Alcohol, Tobacco, Firearms and Explosives (ATF), limiting the disclosure of trace data related to firearms used in crimes to the public. Instead of sharing complete data, the ATF is only allowed to report limited information, such as the top ten cities in each state with the highest number of gun recoveries and the total number of firearms recovered in that state. For both the CDC WONDER and ATF cases, data are censored because they fall below a certain threshold, a situation known as “left-censoring”. Such data censoring poses a serious challenge to urban scaling studies on firearm recoveries in the U.S.

Data on cities may also be incomplete due to “missingness”, where data points are not available because they are not recorded. The reasons underlying missing data are commonly known as “missing data mechanisms”. These mechanisms, as described in [21], fall into three categories: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Data MCAR occurs when there is no relationship between whether a data point is missing and any values in the dataset, either missing or observed. When the probability of a missing value is dependent on other observed variables but not the value itself, it is considered MAR. In the case of MNAR, the missingness is systematically related to unobserved data or factors not measured by the researcher. For instance, for the CDC WONDER or ATF datasets, data are missing not at random as they are not available when falling below a certain threshold.

Various methods have been devised to address the issue of incomplete data. Recent methodological research [22, 23] has focused on maximum likelihood estimation (MLE) [24, 25], Bayesian estimation [26, 27], and multiple imputation [28, 29]. However, most advanced statistical imputation methods mainly aim at imputing MCAR and MAR data and are not suitable for MNAR data [30]. Some statistical methods have also been developed for regression analyses when data are MNAR, such as the Tobit model and its variations [31], Powell quantile estimators [32], or other nonparametric estimators [33]. While effective, these methods are quite general and fail to utilize key information provided by the reporting entity that may be accessible to researchers (for example, the sum of the censored data). Moreover, in the context of urban scaling, the primary focus of a model is whether scaling is superlinear or sublinear, making the precise value of a scaling exponent less critical than its bounds.

In this paper, we aim to address censored data in the context of urban scaling. We focus on data related to firearms and mortality, only available for the highest (“top-k”) observations due to privacy reasons. We propose a rigorous, yet simple, method tailored for urban scaling analysis that estimates scaling behavior. Along with the top-k observations, the method incorporates the total counts of the feature across the dataset in the form of a constraint, taking advantage of the aggregated observations reported in existing datasets. By solving an optimization problem, we bound the regression slope by providing its minimum and maximum possible values. This approach not only simplifies the estimation process compared to existing methods, but also provides robust bounds necessary for determining whether an urban feature scales superlinearly or sublinearly. Our method offers a powerful tool for urban researchers, ensuring reliable assessment of scaling behaviors even when working with incomplete data.

In the following, we first conduct numerical simulations using both complete and incomplete synthetic datasets to explore how the use of incomplete data could bias the estimation of scaling laws. We then present an algorithm that iteratively distributes missing values to unknown cities. We apply the developed framework to two case studies. In the first, we inspect suppressed data on firearm homicides from CDC WONDER and complete data from the National Center for Health Statistics’ (NCHS) Restricted-Use Vital Statistics Data. We compare the estimates of the scaling exponent when using the incomplete and complete data and validate our β-bounding method. In the second case study, we apply the bounding method to the partially reported data to conclude whether firearms recovered by the ATF follow a superlinear or sublinear scaling. Our results demonstrate the value of this bounding process in the study of urban scaling laws when datasets suffer from censored observations.

2 Results

2.1 Assessing bias in urban scaling due to censored data

As a first step to understand how incomplete data can bias the estimation of scaling laws and the inference of superlinearity and sublinearity, we conduct a numerical study using both complete and incomplete synthetic datasets. We simulate the typical case of health-related outcomes where data are only available for a subset of k cities with the highest values of the urban feature reported (top-k), and no other information is given regarding other cities except for the total value of the outcome variable in larger spatial units (as reported by CDC WONDER and ATF).

We aim to quantify the deviation of the estimated regression slope $\hat{\beta}^k$ (where a hat refers to an estimated value and superscript “k” denotes the known partial data) from the true value β due to censored data. To this end, we compute the error of the estimation ($\hat{\beta}^k - \beta$) over a range of changes to key factors that could impact the estimation of β, including the true scaling law exponent (β), proportion of known data (top-k%), standard deviation of the error (σ), and complete dataset size (N). In addition, we consider two distributions for the population data: normal and log-normal. We generate random synthetic observations while systematically varying these parameters in a factorial design (see Methods for details).

First, by using censored data, we find that the error of the estimation of β can be relatively high, and that it is similar for different values of β (Fig 1A). Interestingly, we find that the error of the estimation is asymmetric and biased toward sublinear scaling, such that one is more likely to infer a sublinear scaling relationship even when a truly superlinear one exists. This asymmetry is engendered by the selection of the top-k cities based on their urban feature (Fig 1B). Specifically, the top-k cities are more likely to have a positive residual with respect to the linear fit on the complete dataset, so that considering only them leads to underestimation of the scaling exponent. In agreement with our expectations, we find that regardless of the population distribution (normal or log-normal) or the value of β, the magnitude of the error tends to increase as the percent of known data becomes smaller (Fig 2A and 2B), and as the standard deviation of the noise increases (Fig 2C and 2D). The error does not change with the size of the complete dataset (Fig 2E and 2F), although we notice that for larger datasets, the variance of the estimator decreases. Such a decrease does not guarantee the consistency of $\hat{\beta}^k$ (see Section A of S1 Appendix).
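The selection effect described above can be reproduced with a small simulation. The sketch below is only an illustration under stated assumptions: it draws log-normal populations, uses an arbitrary baseline, and keeps the top 10% of cities ranked by the urban feature; the function names and all numeric settings are hypothetical and do not reproduce the factorial design of the study.

```python
import numpy as np

def ols_slope(log_x, log_y):
    """Slope of the least-squares line through (log_x, log_y)."""
    return float(np.polyfit(log_x, log_y, 1)[0])

def topk_slope_error(beta, n_cities=1000, top_frac=0.1, sigma=0.5, seed=0):
    """Error (estimate minus truth) of the slope fitted on top-k cities only.

    Cities are ranked by the urban feature Y, mimicking reporting rules
    that disclose only the highest observations.
    """
    rng = np.random.default_rng(seed)
    log_x = rng.normal(11.0, 1.0, size=n_cities)         # assumed populations
    noise = rng.normal(0.0, sigma, size=n_cities)
    log_y = np.log(1e-3) + beta * log_x + noise          # assumed baseline C
    k = max(2, int(round(top_frac * n_cities)))
    top = np.argsort(log_y)[-k:]                         # indices of top-k Y
    return ols_slope(log_x[top], log_y[top]) - beta

# Average error over repeated synthetic datasets for a superlinear law.
errors = [topk_slope_error(7 / 6, seed=s) for s in range(200)]
mean_error = float(np.mean(errors))
print(f"mean error with top 10% of cities: {mean_error:.3f}")
```

In this setting the mean error comes out negative, in line with the underestimation discussed in the text; setting `top_frac=1.0` recovers the complete-data fit for comparison.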

Fig 1. Bias in estimating the urban scaling exponent with censored data.

(A) Assessment of the estimate of the scaling exponent ($\hat{\beta}^k$) from data generated using a true scaling law ($Y_i = C X_i^{\beta} e^{\varepsilon_i}$ for i = 1, ⋯, N) with X following either a normal distribution (blue) or log-normal (orange), as a function of the true scaling exponent. The proportion of known data points is selected based on the k-highest percent value of the response variables Y. The violin plots represent the distribution of the error, while the boxes inside represent the first (Q1) and third (Q3) quartiles, and their whiskers extend to 1.5 times the interquartile range from Q1 and Q3. Each violin plot contains 500 data points. For each violin plot, we also report the true positive rate (TPR) for the inference of sublinear (β < 1) and superlinear (β > 1) scaling. (B) Illustration of the reason for bias towards sublinear scaling discovered in (A). Using a censored dataset that only uses the top values of a selected urban feature (red-filled circles) incorrectly discounts observations in the complete dataset (open circles) that have negative residual with respect to the true fit (black dashed line), thereby leading to biased model estimation (red solid line).

https://doi.org/10.1371/journal.pcsy.0000029.g001

Fig 2. Factors influencing bias in estimating the urban scaling exponent with censored data.

Assessment of the estimate of the scaling exponent ($\hat{\beta}^k$) from data generated using a true scaling law ($Y_i = C X_i^{\beta} e^{\varepsilon_i}$ for i = 1, ⋯, N, and β = 5/6 or β = 7/6) with X following either a normal distribution (blue) or log-normal (orange), as a function of (A-B) proportion of known data, (C-D) standard deviation of the true error, and (E-F) complete dataset size. The proportion of known data points is selected based on the k-highest percent value of the response variables Y. The violin plots represent the distribution of the error, while the boxes inside represent the first (Q1) and third (Q3) quartiles, and their whiskers extend to 1.5 times the interquartile range from Q1 and Q3. Each violin plot contains 500 data points. For each violin plot, we also report the true positive rate (TPR) for the inference of sublinear (β = 5/6) and superlinear (β = 7/6) scaling.

https://doi.org/10.1371/journal.pcsy.0000029.g002

In all of the simulations, we consider whether regressing with incomplete data causes urban scaling classification errors by looking at the true positive rate (TPR) for true superlinear and sublinear scaling relationships (Figs 1 and 2). The TPR measures the proportion of sublinear (superlinear) cases correctly identified by a model as such, allowing us to evaluate the performance of hypothesis testing regarding sublinear or superlinear dependence on population. For instance, in the case of true superlinear scaling relations, the TPR represents the proportion of correctly identified superlinear relations when only a certain proportion of the data is known (β > 1 and $\hat{\beta}^k > 1$; see Methods). Due to the asymmetry in the errors (β is underestimated by $\hat{\beta}^k$), we find that the TPR for superlinear scaling is less than that for sublinear scaling, potentially being as low as zero.
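A rough way to see how the TPR behaves is to repeat the top-k fit many times and count correct classifications. The sketch below is hedged on several points: the decision rule (an exponent is called superlinear when the lower end of a normal-approximation 95% confidence interval exceeds one, and sublinear when the upper end is below one) is an assumption made for illustration, since the exact test is specified in Methods; the function names and simulation settings are likewise hypothetical.

```python
import numpy as np

def slope_with_ci(log_x, log_y, z=1.96):
    """OLS slope with a normal-approximation 95% confidence interval."""
    n = len(log_x)
    dx = log_x - log_x.mean()
    sxx = np.sum(dx ** 2)
    slope = np.sum(dx * log_y) / sxx
    intercept = log_y.mean() - slope * log_x.mean()
    resid = log_y - (intercept + slope * log_x)
    se = np.sqrt(np.sum(resid ** 2) / (n - 2) / sxx)   # standard error of slope
    return slope, slope - z * se, slope + z * se

def true_positive_rate(beta, top_frac=0.1, n_cities=1000, sigma=0.5,
                       n_rep=200, seed=0):
    """Share of censored fits that recover the true regime of beta."""
    rng = np.random.default_rng(seed)
    k = max(3, int(round(top_frac * n_cities)))
    hits = 0
    for _ in range(n_rep):
        log_x = rng.normal(11.0, 1.0, n_cities)          # assumed populations
        log_y = beta * log_x + rng.normal(0.0, sigma, n_cities)
        top = np.argsort(log_y)[-k:]                     # top-k by feature
        _, lo, hi = slope_with_ci(log_x[top], log_y[top])
        # Assumed rule: the whole interval must lie on the correct side of 1.
        hits += int(lo > 1.0) if beta > 1.0 else int(hi < 1.0)
    return hits / n_rep

tpr_super = true_positive_rate(7 / 6)
tpr_sub = true_positive_rate(5 / 6)
print(f"TPR superlinear: {tpr_super:.2f}, TPR sublinear: {tpr_sub:.2f}")
```

Because the censored estimate is pulled downward, one would expect the superlinear rate to fall well below the sublinear one in this toy setting.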

In Section B of S1 Appendix, we present results analogous to those in Fig 2 for β = 2/3 and 4/3, where similar trends are observed. We also show the relationship between the error in the estimation of the scaling exponent when using censored data and the coefficient of determination of the censored data estimation $(R^k)^2$, where we see that the higher $(R^k)^2$, the lower the bias.

2.2 Greedy algorithm to bound the scaling exponent

We devise a general bounding framework that uses a greedy optimization to estimate the minimum and maximum possible scaling exponents, $\hat{\beta}_{\min}$ and $\hat{\beta}_{\max}$. By computing these bounds, we aim to reach a more reliable conclusion about a scaling behavior, while effectively addressing the biases encountered when using OLS on the censored data. Within a system of N cities, we address the case in which the researcher has access only to urban measurements in a subsystem of k < N cities, and to the total count of the urban feature across all N cities. In order to find the upper bound of the scaling exponent ($\hat{\beta}_{\max}$), we solve the constrained optimization problem

$$\hat{\beta}_{\max} = \max_{\mathbf{Y}^{uk}} f_{\beta}(\mathbf{X}, \mathbf{Y}^{k}, \mathbf{Y}^{uk}) \quad \text{subject to} \quad \sum_{i=1}^{k} Y_i^{k} + \sum_{i=1}^{N-k} Y_i^{uk} = Y_{\mathrm{tot}}, \quad Y_{\min,i} \le Y_i^{uk} \le Y_{\max,i}, \tag{1}$$

where the column vector $\mathbf{X} = [X_1, \cdots, X_N]^T$ contains the population sizes of all N cities, $\mathbf{Y}^{k} = [Y_1^{k}, \cdots, Y_k^{k}]^T$ comprises the k known values of the urban feature, and $\mathbf{Y}^{uk} = [Y_1^{uk}, \cdots, Y_{N-k}^{uk}]^T$ consists of the N − k unknown values for which we are optimizing (superscript “uk” denotes unknown). Similar to city population data, we also consider the urban features to be positive integer numbers. We denote vectors and matrices in bold and use T for matrix transpose. The function $f_{\beta}(\mathbf{X}, \mathbf{Y}^{k}, \mathbf{Y}^{uk})$ represents the OLS estimator of the scaling exponent (for further details, see Methods).

In this greedy approach, we pose that the sum of $Y_i^{k}$ and $Y_i^{uk}$ over i is equal to the total of the urban feature, $Y_{\mathrm{tot}}$. In addition, we constrain each $Y_i^{uk}$ between Ymin,i and Ymax,i, the values of which will depend on the reporting and censoring process. The lower bound of the scaling exponent ($\hat{\beta}_{\min}$) can be written equivalently to Eq (7) (see Methods), with “min” instead. Once obtained, the upper and lower bounds can be used to verify the validity of inferences based on partial datasets. In fact, $\hat{\beta}_{\max} < 1$ will offer backing to the inference of sublinear scaling and $\hat{\beta}_{\min} > 1$ to the inference of superlinear scaling. Some insight into the optimal $\mathbf{Y}^{uk}$ can be garnered by linearizing the objective function and solving the optimization problem analytically (see Section C of S1 Appendix). Such an analysis suggests that bigger cities should be assigned values close to Ymax,i and smaller cities values close to Ymin,i, thereby maximizing the contrast between them.
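One way to picture the bounding problem is the unit-by-unit allocation sketched below. It is an assumption-laden illustration rather than the algorithm detailed in Methods: it starts every unknown city at its lower limit and hands out the remaining count one unit at a time to whichever city changes the OLS slope most in the desired direction. It relies on the fact that, for fixed populations, the OLS slope is a weighted sum of the ln Y values. The function name, the common limits of one and nine, and the toy numbers are all hypothetical, and a greedy pass of this kind is a heuristic that is not guaranteed to reach the global optimum.

```python
import numpy as np

def greedy_slope_bound(pop_known, y_known, pop_unknown, total,
                       y_min=1, y_max=9, maximize=True):
    """Greedy bound on the log-log OLS slope with censored counts.

    Unknown counts are integers in [y_min, y_max] (y_min >= 1) and,
    together with the known counts, must add up to `total`.
    """
    pop_known = np.asarray(pop_known, dtype=float)
    pop_unknown = np.asarray(pop_unknown, dtype=float)
    y_known = np.asarray(y_known, dtype=float)

    # For fixed ln X, the OLS slope equals sum_i w_i * ln Y_i.
    log_x = np.log(np.concatenate([pop_known, pop_unknown]))
    centered = log_x - log_x.mean()
    w = centered / np.sum(centered ** 2)
    w_unknown = w[len(pop_known):]

    y_unknown = np.full(len(pop_unknown), y_min, dtype=np.int64)
    remaining = int(total - y_known.sum() - y_unknown.sum())
    capacity = int(len(pop_unknown) * (y_max - y_min))
    if remaining < 0 or remaining > capacity:
        raise ValueError("total is incompatible with the assumed limits")

    sign = 1.0 if maximize else -1.0
    for _ in range(remaining):
        # Change in slope from adding one unit to each unknown city.
        gain = sign * w_unknown * np.log((y_unknown + 1) / y_unknown)
        gain[y_unknown >= y_max] = -np.inf     # city already at its cap
        y_unknown[np.argmax(gain)] += 1

    log_y = np.log(np.concatenate([y_known, y_unknown]))
    return float(w @ log_y), y_unknown

# Toy example: three reported cities, four suppressed ones (made-up data).
pop_known = [2_000_000, 900_000, 400_000]
y_known = [120, 45, 30]
pop_unknown = [150_000, 80_000, 40_000, 20_000]
total = 215   # 195 reported plus 20 spread over the suppressed cities

beta_max, y_for_max = greedy_slope_bound(pop_known, y_known, pop_unknown, total)
beta_min, y_for_min = greedy_slope_bound(pop_known, y_known, pop_unknown, total,
                                         maximize=False)
print(beta_min, beta_max, y_for_max, y_for_min)
```

In this toy example the allocation for the upper bound pushes the two largest suppressed cities to nine and leaves the two smallest at one, which matches the contrast-maximizing pattern suggested by the linearized analysis.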

2.3 Case studies of urban scaling with censored data

To demonstrate the value of our bounding scheme in urban research, we apply it to two real datasets with partial observations: firearm homicides from the CDC and firearms recovered by the ATF. In the CDC case study, we obtained access to the uncensored dataset from the National Center for Health Statistics (NCHS) [34], allowing us to validate the scaling conclusions. Such a privilege is not granted in the ATF case study. Applying our framework to these datasets, we not only gain a deeper understanding of firearm-related violence and crimes in the U.S., but also demonstrate how this optimization process can be generalized to other censored datasets for estimating scaling laws.

2.3.1 Firearm homicides.

Similar to Bettencourt et al. [16], we perform cross-sectional scaling of firearm homicides with population for U.S. cities, over the five-year period between 2016 and 2020 (Fig 3). The results are presented for cities, encompassing both Metropolitan Statistical Areas (MSAs) and Micropolitan Statistical Areas (MicroSAs). While urban scaling relations are highly sensitive to the spatial boundaries defining a city [35], there is no standardized definition for a city in the U.S. Consequently, both MSAs and MicroSAs are commonly used as functional cities in analyses [36].

Fig 3. Urban scaling exponent of firearm homicides in the U.S. MSAs and MicroSAs (2016–2020).

Yellow dots and orange diamonds represent the minimum ($\hat{\beta}_{\min}$) and maximum ($\hat{\beta}_{\max}$) scaling exponent, respectively, obtained by implementing the optimization strategy on the reported data (CDC suppresses firearm homicides in cities where there are fewer than ten incidents). These serve as bounds for the actual $\hat{\beta}$ (dark purple open circles) and the $\hat{\beta}^k$ (light purple squares) obtained using only the reported data; horizontal lines (whiskers) denote the limits of the 95% confidence interval. The horizontal dashed line represents the limit above which the scaling relation is superlinear.

https://doi.org/10.1371/journal.pcsy.0000029.g003

Urban scaling for firearm homicides in the U.S. exhibits a power-law relation with city population for both the censored and complete datasets. Using a censored dataset leads to the inference of a sublinear relationship across all years, with the true exponent being consistently underestimated (Fig 3). With the complete dataset, $\hat{\beta}$ reflects a strictly sublinear relationship for all years, except in the year 2020. In this year, when the reported MSAs and MicroSAs account for about three quarters of the total firearm homicides, the 95% confidence interval of $\hat{\beta}$ is [0.921;1.013] (Table 1). Given the confidence interval, we cannot reject the hypothesis that β = 1. We also note that the coefficient of determination of the complete model ($R^2$) is larger than that of the partial data ($(R^k)^2$), indicating that using OLS regression on the complete dataset could yield better-fitted results (Table 1).

Table 1. Results on urban scaling exponent for firearm homicides in the U.S. MSAs and MicroSAs from 2016 to 2020, using suppressed and complete data.

https://doi.org/10.1371/journal.pcsy.0000029.t001

We apply our bounding scheme assuming each suppressed county had between one and nine counts of homicide. Our results indicate that the lower bound $\hat{\beta}_{\min} < 1$ and the upper bound $\hat{\beta}_{\max} > 1$ across all years, so that when working with partial data, one should be prudent in interpreting the results (Fig 3 and Table 1). In particular, the fact that the upper bound is always greater than 1 indicates that one should not exclude the possibility that an inference based on partial data is incorrect. This is the case for the year 2020, when partial data would yield a sublinear $\hat{\beta}^k$ with confidence interval [0.492;0.697], whereas the real data are instead supportive of a linear scaling, with a 95% confidence interval for $\hat{\beta}$ of [0.921;1.013].

2.3.2 Recovered firearms.

In the second case study, we investigate the scaling of firearms recovered across the U.S. in 2022 with city population. These yearly data are made available by the ATF, where the top-k cities per state with the most firearms recovered are reported, along with the total number of firearms recovered in the entire state. Using only the reported values, it is difficult to conclude whether firearm recoveries scale sublinearly or superlinearly with population across the U.S. states. The small sample size (10 cities for each state except Vermont and Washington) does not allow for precise estimation, resulting in wide confidence intervals (Table 2).

Table 2. Estimates of the scaling exponent ($\hat{\beta}^k$) for recovered firearms in each state of the U.S. (except Hawaii) and the District of Columbia (D.C.), for the year 2022 based on ATF reported data, along with the corresponding bounds from the optimization.

https://doi.org/10.1371/journal.pcsy.0000029.t002

To address this issue and bound the exponent β, we apply the developed optimization algorithm with the assumption that each city has at least one firearm recovered. For 11 of the 49 states (all states except Hawaii, see Methods), it is not possible to apply the optimization scheme since the number of cities other than the reported top-k exceeds the number of recovered firearms outside of the top-k cities, violating the underlying assumption. Out of the remaining 38 states, only three (Arizona, California, and Rhode Island) have $\hat{\beta}_{\max} > 1$. Therefore, we cannot reject the hypothesis of superlinearity or linearity for these states. For the remaining states, $\hat{\beta}_{\max} < 1$, indicating a sublinear behavior of firearm recoveries with respect to city population.
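The feasibility condition above reduces to simple arithmetic: the firearms left over after subtracting the reported top-k counts from the state total must be enough to give every unreported city at least one. A minimal sketch follows; the function name and the numbers are hypothetical and are not taken from the ATF data.

```python
def bounds_are_computable(state_total, reported_counts, n_cities_in_state,
                          y_min=1):
    """Check whether every unreported city can receive at least y_min.

    state_total:        total recoveries reported for the state
    reported_counts:    counts for the disclosed top-k cities
    n_cities_in_state:  number of cities considered in the state
    """
    n_unreported = n_cities_in_state - len(reported_counts)
    leftover = state_total - sum(reported_counts)
    return leftover >= y_min * n_unreported

# Made-up example: ten reported cities that account for 900 recoveries.
reported = [300, 150, 100, 90, 70, 60, 50, 40, 25, 15]

# 1,000 recoveries in total and 60 cities: 100 left for 50 cities, feasible.
print(bounds_are_computable(1000, reported, 60))    # True

# Same total but 200 cities: 100 left for 190 cities, infeasible.
print(bounds_are_computable(1000, reported, 200))   # False
```

When the check fails, at least one unreported city must have had zero recoveries, so the assumption of one or more firearms per city cannot hold.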

Fig 4 shows the bounds $\hat{\beta}_{\min}$ and $\hat{\beta}_{\max}$ for the scaling relation when considering the combined 38 states and the District of Columbia (D.C.), reflecting the trend of sublinearity in the country. For this case study, we numerically explore the global optimality of the solution through exhaustive perturbations (see Section D of S1 Appendix).

Fig 4. Urban scaling results for recovered firearms in the U.S. in 2022 after optimization.

The dots identify the optimal number of recovered firearms as a function of the population in 28,970 Census Incorporated Places and Minor Civil Divisions. The number of unknown recovered firearms in each of the 38 states and D.C. was optimally distributed among the different states to compute the minimum (A) and the maximum (B) scaling exponent β. All places were assumed to have at least one recovered firearm. Of the 49 states, it was not possible to compute $\hat{\beta}_{\min}$ and $\hat{\beta}_{\max}$ for 11. This issue arises because, in these 11 states, the number of cities not in the top-k exceeds the number of recovered firearms there, indicating that some of them had zero firearms recovered. The bounding procedure used to compute $\hat{\beta}_{\min}$ and $\hat{\beta}_{\max}$ operates under the assumption that every city within a state has at least one firearm recovered. When this assumption is violated, these bounds cannot be computed.

https://doi.org/10.1371/journal.pcsy.0000029.g004

2.4 Sensitivity analysis

The proposed bounds may be prone to error due to noise in the data. In order to assess the robustness of these bounds, we conduct Monte Carlo simulations to estimate 95% confidence intervals in the ATF dataset. We perform two variations of the simulations. The first (Method 1) assumes that the sum of the simulated values is within 5% of the real data, and the second (Method 2) that the top-k% of the synthetic data matches the real data. Each of these methods preserves different characteristics of the data (see Methods for details) and allows us to estimate the 95% confidence intervals of $\hat{\beta}_{\min}$ and $\hat{\beta}_{\max}$. We observe narrow confidence intervals for both simulations, indicating robustness of the bounding scheme (Table 2). For all states where $\hat{\beta}_{\max} < 1$, the confidence intervals are below 1, reinforcing our claim of sublinear scaling (Table 2). For Rhode Island, despite $\hat{\beta}_{\max} > 1$, the confidence intervals from both methods are below 1, indicating potential ambiguity in the scaling interpretation for this state.

3 Discussion

Urban scaling is a fundamental tool used in urban science, yielding interesting power laws that capture the relationship between urban features and city population. Ideally, urban scaling needs a complete dataset to derive accurate scaling exponents; however, legal and ethical considerations often lead to censoring of data, thereby presenting significant challenges to the estimation of urban scaling relationships. Censoring affects cities differently depending on their count of an urban feature: small cities are more likely to have small values of some urban features, potentially below the minimum that agencies can share with the public.

In numerical simulations, we explore multiple factors that could impact the estimation of the scaling exponent. Our results indicate that two of these factors critically affect the estimation of scaling exponents: the proportion of known data and the variance of the noise. While their role in the estimation of scaling is intuitive as both these factors determine the quality of a dataset, we also find that scaling exponents are consistently underestimated. Therefore, one is more likely to correctly infer a sublinear relationship and fail to infer a superlinear one. Arguably, performing OLS fitting using a top-k dataset leads to systematic underestimation of the scaling exponent. For sufficiently dispersed datasets (ones with high noise), the cities experiencing the largest values of the urban feature under investigation may not be the most populous ones. Thus, a linear model with only the top-k cities could omit cities with large populations but values of the urban feature lower than the top-k. These cities have a negative residual with respect to the fit on the complete dataset; discarding them will lead to underestimating the scaling exponent. In real datasets, such a discrepancy may also result from data segmentation, where different population segments have been found to exhibit different scaling behaviors [37–39]. To address the biases that result from censored data, we devise a bounding method that determines the minimum and maximum possible scaling exponents, and apply it to two case studies.

The first case study focuses on the scaling of firearm homicides over the five-year period between 2016 and 2020, where we compare the performance of a left-censored dataset against that of a complete one. For the complete dataset, we find a sublinear relationship for all years except for 2020, when the COVID-19 pandemic started and an increase in firearm purchases and violence has been documented [40, 41]. In 2020, the data are, in fact, indicative of a linear scaling. Using the left-censored dataset, we are not able to recover such a change in time, whereby we consistently register sublinear scaling of firearm homicides for all the years. Our bounding scheme successfully casts doubt on the validity of the sublinear trend. In fact, our lower bound is below one and our upper bound is above one, so that prudence is needed when drawing conclusions on the scaling exponent with partial data. Interestingly, aggregating the data over multiple years to mitigate zero counts may not resolve the issue of data missingness in the scaling. First, aggregation could skew the inference towards superlinear scaling, by systematically undercounting firearm homicides in small cities without affecting the counting in large cities. Second, the aggregation does not capture time trends in the scaling, such as the one observed herein due to the COVID-19 pandemic (see Section E of S1 Appendix). Both these factors are likely the reasons for which several studies support firearm homicides to be more frequent in urban rather than rural settings [12, 42–45].

In the second case study, we investigate the scaling of recovered firearms in the year 2022. In the absence of a complete dataset, the fit of an OLS model produces extremely wide confidence intervals, ranging in some cases from negative values to values greater than one. Thus, conclusive interpretations of scaling behavior become virtually impossible. However, implementing our proposed bounding scheme allows us to shed light on the scaling of firearm recoveries. Our results support the sublinear scaling of firearm recoveries in the U.S., hinting that firearms might be more prevalent in rural areas. This notion aligns with the sublinear behavior of firearm ownership and federal firearm-selling licenses reported by Succar and Porfiri [12]. Similarly, a recent Pew Research Center survey has shown that 46% of people who reside in rural areas reported themselves as firearm owners, compared to 19% of people who live in urban areas [46]. The observed sublinear scaling in firearm recoveries could also be attributed to varying strategies for tracking and recovering firearms across different jurisdictions. The Tiahrt Amendment prohibits federal agencies from creating searchable firearm databases, making the ATF’s firearm recovery efforts extremely inefficient [47]. Under these circumstances, records of completed firearm sales have become invaluable for regional law enforcement, especially when maintained and retained permanently in a central database. For instance, handgun sales records in California are stored in a state Department of Justice database, enabling law enforcement agencies to swiftly trace the ownership of handguns recovered in crimes [48]. California is also one of the three states where we observe a potential superlinear relationship between city population and the number of firearms recovered by the ATF.
Additionally, it is tenable that recovering firearms in smaller cities is easier than in larger ones due to familiarity among locals [49, 50], their investment in creating a safe environment through community policing [51, 52], and higher trust and cooperation between citizens and authorities [51].

While both case studies demonstrate its value in firearm research, our bounding method could also be implemented in domains other than urban science. For example, recent work suggests that metabolic rates of eusocial systems scale sublinearly with the mass of a colony [53, 54]. Yet, practical limitations have hindered validation of this proposition across a wide range of colony sizes, as measurements of metabolic rates for small colonies are difficult to capture by typical respirometry apparatuses. Similarly, performing experiments on large colonies is challenged by housing requirements in the laboratory. As such, data in these metabolic studies are left- and right-censored. Our approach could help overcome those data limitations by bounding the scaling exponents of partial datasets and inferring the metabolism laws of colonies. Another possible application is in the field of environmental studies, where concentration of pollutants or chemicals is often left-censored because analytical instruments have detection limits below which pollutants cannot be accurately measured [55]. Instead of reporting an exact concentration, values below the detection limit are commonly recorded as “less than” the limit or the percent detected [56, 57]. By applying our approach to these left-censored datasets, environmental scientists could bound the scaling exponents that describe the relationship between chemical concentrations and various environmental factors.

Our study has five significant limitations. First, in the numerical simulations we consider the residuals to be normally distributed. This assumption may not always be appropriate [58], hindering the generalization of our conclusions to scenarios where the errors do not follow a normal distribution. The second limitation concerns the acquisition of data on city populations in the firearm recoveries case study. The ATF does not have a consistent definition of a city. While most of the cities included in the top-k list correspond to census incorporated places and minor civil divisions, some do not. This is the case for 22 areas, such as Eagle River in Alaska (a community within the Municipality of Anchorage). In our analysis we consider only census incorporated places and minor civil divisions [59], thereby excluding these 22 other areas. This inconsistency in defining cities complicates the analysis, making it challenging to accurately define all possible cities not mentioned in the top-k list. The third limitation relates to our bounding method’s assumption that there is at least one observation in each city. For the ATF case study, we were unable to bound the scaling exponent for 11 states because the number of cities not included in the top-k list exceeds the number of firearms recovered in those areas. This limitation could be addressed in a future study by combining the proposed approach with the work of Finance and Cottineau [17], who employ estimation techniques to handle datasets with zero counts, so that our bounding method accounts for the possibility of zero observations. Fourth, our approach assumes that cities are independent of each other, in line with classical urban scaling theory. As a result, we apply standard OLS for the estimation. We envision integrating our approach with the one proposed by Xiao and Gong [19] to account for spatial interactions between cities, by generalizing the objective function of our optimization.
Finally, the proposed optimization framework based on a greedy algorithm was developed for scaling with specific cases, which may limit the generalization of the algorithm to other problems. These problems may include datasets with large variances or a high number of outliers, different types of constraints, or scaling that requires estimators other than OLS.

In conclusion, our work identifies a potential flaw in the current use of partial data to draw conclusions about scaling relationships in urban data. We offer compelling evidence that censored data may lead to inaccurate predictions of scaling exponents, where superlinear relationships could be erroneously identified as sublinear ones. We put forward a simple methodology to bound the scaling exponent from censored observations, based on the solution of a constrained optimization problem that assumes absence of zeros in the dataset and leverages information on the sum of all counts. We propose that future reporting of urban scaling relationships in technical papers (especially sublinear ones) include explicit information about the number of inaccessible data points along with an estimation of the expected effect of such data missingness. The latter can be pursued through the implementation of a bounding scheme like the one proposed in this work (when possible) or stress tests on the scaling exponent through Monte Carlo simulations.

4 Methods

4.1 Urban scaling law

Given N cities, an urban scaling law is a relationship between some urban feature of interest and the city population of the form

$$Y_i = C X_i^{\beta} e^{\varepsilon_i}, \tag{2}$$

where i = 1, …, N, Yi and Xi are the urban feature and population size for city i, C is a common baseline, β is the scaling exponent, and εi is the deviation of city i from its nominal behavior. This scaling law can be written in linear form [15],

$$y_i = \alpha + \beta x_i + \varepsilon_i, \tag{3}$$

where we introduce log-transformed variables yi = ln Yi, xi = ln Xi, and α = ln C. Since urban scaling relations are linear on the log-log scale, we can estimate the parameters of the scaling relationship by using OLS, which minimizes the sum of squared errors [6]. Such a minimization yields

$$\hat{\beta} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}, \tag{4}$$

$$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}, \tag{5}$$

where $\hat{\alpha}$ and $\hat{\beta}$ are the estimated values of α and β, and $\bar{x}$ and $\bar{y}$ are the averages of the log-transformed population size and urban feature, respectively. One of the limitations of OLS regression is that it requires complete data for all variables included in the model to ensure unbiased estimation. If there are missing data points, OLS may result in biased and unreliable regression coefficients [21].
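As a minimal sketch of the OLS estimation described above, the snippet below fits the log-transformed scaling law to a small set of cities. The function name `fit_scaling_law` and the noise-free toy data (C = 2, β = 0.8) are assumptions made for illustration only; they are not taken from the study.

```python
import math


def fit_scaling_law(populations, features):
    """Return (alpha_hat, beta_hat) from OLS on log-transformed data."""
    x = [math.log(p) for p in populations]
    y = [math.log(f) for f in features]
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    # Slope: cross-deviation sum of (x, y) divided by the squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    # Intercept: the fitted line passes through the point of averages.
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical, noise-free data obeying Y = 2 * X**0.8.
populations = [1e3, 1e4, 1e5, 1e6]
features = [2.0 * p ** 0.8 for p in populations]
alpha_hat, beta_hat = fit_scaling_law(populations, features)
```

With noise-free data the fit recovers the exponent and the baseline (up to floating-point error); with real data, the residuals εi determine how far the estimates depart from the underlying law.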

4.2 Assessing bias in urban scaling due to censored data

The synthetic data are simulated according to a true scaling law, $y_i = \beta x_i + \varepsilon_i$, with zero intercept (ln C = 0). This true scaling law serves as a baseline for comparing estimated scaling laws when using only the known data points, allowing us to quantify the bias more accurately. The synthetic data are generated to account for different scenarios of scaling that could be produced by a real dataset. Specifically, we identify the population distribution, the true slope (β), standard deviation of the error (σ), size of the dataset (N), and the proportion of known data points (top-k%) out of the dataset as parameters that could meaningfully alter the estimation of the scaling exponent.

To assess whether the marginal distribution of X affects the estimated $\hat{\beta}$, two population distributions are considered, normal and log-normal, and sampled using numpy (version 1.26.4; [60]). For each population distribution, we employ a factorial design varying the other four parameters: β ∈ {2/3, 5/6, 7/6, 4/3}, σ ∈ {0.01, 0.05, 0.1}, N ∈ {100, 500, 3000}, and top−k% ∈ {25%, 50%, 75%}. The values of β were selected based on the literature on urban scaling laws, which have helped identify typical scaling exponents as a function of the city organization and type of urban feature [13, 61]. In total, the factorial design for each distribution contains 108 combinations (216 in total).
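The size of the factorial design can be checked by enumerating it directly. The sketch below uses only the parameter values listed above; the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Parameter grids as listed in the text.
betas = [Fraction(2, 3), Fraction(5, 6), Fraction(7, 6), Fraction(4, 3)]
sigmas = [0.01, 0.05, 0.1]
sizes = [100, 500, 3000]
known_fractions = [0.25, 0.50, 0.75]
distributions = ["normal", "log-normal"]

# Every combination of the four parameters, for one population distribution.
design = list(product(betas, sigmas, sizes, known_fractions))
# The same design repeated for both population distributions.
full_design = list(product(distributions, design))

repetitions = 500
total_observations = len(full_design) * repetitions
```

Here `len(design)` is 4 × 3 × 3 × 3 = 108 and `len(full_design)` is 216, so 500 repetitions of the entire design give 108,000 estimates.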

For each possible combination in the factorial design, linear regression is performed on the subset of the known top−k% data points to obtain a value of $\hat{\beta}$. We simulate the experiment on the entire design 500 times, totaling 108,000 observations. To further assess how the bias resulting from using censored data affects the estimation of scaling relationships, we also look at the TPR of real superlinear and sublinear scaling relations. Specifically, for sublinear scaling cases (β < 1), we consider estimates as true only when the upper bound of the confidence interval of $\hat{\beta}$ is below one. Similarly, for superlinear cases (β > 1), we consider estimates as true only when the lower bound of the confidence interval of $\hat{\beta}$ is above one. TPR is then computed as the fraction of correct estimates out of all 500 estimates.
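The following sketch illustrates one cell of such an experiment for a superlinear law (β = 7/6) and compares the complete dataset with its top-50% subset. Several choices are assumptions made for illustration and are not taken from the study: log-populations are drawn from a normal distribution with mean 10 and unit standard deviation, the noise level σ = 0.5 is deliberately larger than the values in the factorial design so that the effect is visible with few repetitions, only 200 repetitions are run, and the confidence bound uses a normal approximation with z = 1.96.

```python
import math
import random


def ols_slope_and_se(x, y):
    """OLS slope of y on x and its standard error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(rss / (n - 2) / s_xx)


def run_experiment(beta, sigma, n, known_fraction, repetitions, seed=0):
    """Collect (slope, se) pairs for the complete and the top-k datasets."""
    rng = random.Random(seed)
    k = int(n * known_fraction)
    full, top_k = [], []
    for _ in range(repetitions):
        x = [rng.gauss(10.0, 1.0) for _ in range(n)]  # assumed log-populations
        y = [beta * xi + rng.gauss(0.0, sigma) for xi in x]  # zero intercept
        # Left-censoring: keep only the k cities with the largest feature value.
        kept = sorted(zip(y, x), reverse=True)[:k]
        full.append(ols_slope_and_se(x, y))
        top_k.append(ols_slope_and_se([p[1] for p in kept], [p[0] for p in kept]))
    return full, top_k


def superlinear_tpr(fits, z=1.96):
    """Share of fits whose lower confidence bound exceeds one."""
    return sum(slope - z * se > 1.0 for slope, se in fits) / len(fits)


beta_true = 7 / 6
full, top_k = run_experiment(beta_true, sigma=0.5, n=500,
                             known_fraction=0.5, repetitions=200)
mean_full = sum(s for s, _ in full) / len(full)
mean_top_k = sum(s for s, _ in top_k) / len(top_k)
tpr_full = superlinear_tpr(full)
tpr_top_k = superlinear_tpr(top_k)
```

Under these assumptions, the mean slope from the complete data stays close to the true exponent, whereas the mean slope from the top-k subset falls well below it, and the superlinear relation is rarely detected from the censored data.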

4.3 Greedy algorithm to bound the scaling exponent

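The bounding method consists of optimizing over Yuk to estimate the scaling exponent using the OLS estimator derived from Eq (4), denoted as

$$f_\beta(Y_{uk}) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}, \tag{6}$$

where yi = ln Yi, and the counts Yi are taken from the known vector Yk for the k reported cities and from Yuk for the remaining N − k cities.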

To find the maximum or minimum regression slopes, we construct the unknown observations Yuk. As an initial step, we assign Yuk,i = Ymin,i, where Ymin,i ≥ 1 in accordance with the assumption that all cities must have non-zero values for their feature, which may be violated in reality. To find the Yuk entries that result in $\beta_{\max}$ ($\beta_{\min}$), we iteratively increase the value of each entry by one, without surpassing Ymax,i, and seek the largest increase (decrease) of fβ. In other words, for each iteration over the entries of Yuk, we compare the values of fβ for all updated entries and identify the entry that results in the largest (or smallest) value of fβ. If two or more entries produce the same result for fβ, the algorithm will select the first entry that appears in the order of iteration. We end the process when the sum of known and unknown values matches the reported total count. This greedy scheme is detailed in Algorithms 1 and 2. To gain a better intuition about the procedure, we describe it using the following equation:

$$Y_{uk}^{(t)} = Y_{uk}^{(t-1)} + \underset{e \in \xi,\; Y_{uk}^{(t-1)} + e \,\le\, Y_{\max}}{\arg\max}\; f_\beta\!\left(Y_{uk}^{(t-1)} + e\right), \tag{7}$$

for t = 1, ⋯, tf, where $Y_{uk}^{(0)} = Y_{\min}$, tf is the number of unit increments needed for the sum of known and unknown values to match the reported total, and ξ is the set of all standard basis vectors of length N − k, that is, {[1, 0, …, 0]T, …, [0, 0, …, 1]T}. The maximum regression slope is found during the last iteration, $\beta_{\max} = f_\beta\big(Y_{uk}^{(t_f)}\big)$. The optimization in Eq (7) is executed through exhaustive search, that is, searching over the entire set ξ.

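Algorithm 1 Greedy algorithm to find $\beta_{\max}$

Input: x, Yk, Ymin, Ymax, tf

Output: $\beta_{\max}$

Initialization: Yuk ← Ymin

for iteration from 1 to tf do

  fmax ← −∞

  imax ← 0

  for i from k + 1 to N do

   Yaux ← Yuk

   Yaux,i ← Yaux,i + 1

   Ensure: Yaux,i ≤ Ymax,i

   if fβ(Yaux) > fmax then

    fmax ← fβ(Yaux), imax ← i

   end if

  end for

  Yuk,imax ← Yuk,imax + 1

end for

return fβ(Yuk)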

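Algorithm 2 Greedy algorithm to find $\beta_{\min}$

Input: x, Yk, Ymin, Ymax, tf

Output: $\beta_{\min}$

Initialization: Yuk ← Ymin

for iteration from 1 to tf do

  fmin ← +∞

  imin ← 0

  for i from k + 1 to N do

   Yaux ← Yuk

   Yaux,i ← Yaux,i + 1

   Ensure: Yaux,i ≤ Ymax,i

   if fβ(Yaux) < fmin then

    fmin ← fβ(Yaux), imin ← i

   end if

  end for

  Yuk,imin ← Yuk,imin + 1

end for

return fβ(Yuk)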
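The greedy procedure described in this section can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' R/Rcpp implementation: the function names, the `maximize` flag, and the six-city example (three reported cities, three unknown ones, a reported total of 250) are hypothetical. Ties are resolved in favor of the first entry in iteration order, as stated in the text.

```python
import math


def ols_slope(x, y):
    """OLS slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def greedy_bound(pop_known, y_known, pop_unknown, y_min, y_max, total,
                 maximize=True):
    """Greedily allocate the unreported counts to bound the scaling exponent."""
    x = [math.log(p) for p in list(pop_known) + list(pop_unknown)]
    y_uk = list(y_min)  # start every unknown city at its lower bound
    steps = total - sum(y_known) - sum(y_uk)  # unit increments left to assign
    if steps < 0 or steps > sum(y_max) - sum(y_min):
        raise ValueError("total is incompatible with the bounds on unknown counts")

    def slope_for(candidate):
        return ols_slope(x, [math.log(v) for v in list(y_known) + candidate])

    for _ in range(steps):
        best_index, best_value = None, None
        for i in range(len(y_uk)):
            if y_uk[i] + 1 > y_max[i]:
                continue  # this city cannot take another unit
            trial = y_uk.copy()
            trial[i] += 1
            value = slope_for(trial)
            # Strict comparison keeps the first entry when values are tied.
            if best_value is None or (value > best_value if maximize
                                      else value < best_value):
                best_index, best_value = i, value
        y_uk[best_index] += 1
    return slope_for(y_uk), y_uk


# Hypothetical example: three reported cities and three censored ones.
pop_known, y_known = [1_000_000, 500_000, 200_000], [120, 70, 30]
pop_unknown = [100_000, 50_000, 20_000]
y_min, y_max = [1, 1, 1], [30, 30, 30]
beta_max, alloc_max = greedy_bound(pop_known, y_known, pop_unknown,
                                   y_min, y_max, total=250, maximize=True)
beta_min, alloc_min = greedy_bound(pop_known, y_known, pop_unknown,
                                   y_min, y_max, total=250, maximize=False)
```

In this example all censored cities are smaller than the average city on the log scale, so the upper bound is reached by assigning every remaining unit to the largest censored city, while the lower bound spreads the units toward the smallest ones.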

4.4 Case studies of urban scaling with censored data

4.4.1 Firearm homicides.

Firearm homicide data are obtained from the CDC WONDER database and NCHS’s Restricted-Use Vital Statistics Database. For both data sets, we query for incidents of firearm homicides using the following ICD-10 Codes: X93 (Assault by handgun discharge), X94 (Assault by rifle, shotgun, and larger firearm discharge), and X95 (Assault by other and unspecified firearm discharge). We filter the data for years between 2016 and 2020, and group the results by year and county. Population counts in each county are returned with the query.

We conduct scaling analyses for each year, at the level of MSA and MicroSA. We begin with an OLS regression on logarithmically transformed variables to compute the scaling exponent from the left-censored dataset and from the complete dataset. The bounds $\beta_{\min}$ and $\beta_{\max}$ for the scaling exponent are estimated using the greedy optimization algorithm described earlier for the censored data. Cities with null values are removed from the analysis.

For the MSA and MicroSA level analysis, we first convert county level data to MSAs and MicroSAs. We rely on the U.S. Bureau of Labor Statistics’ Quarterly Census of Employment and Wages County-MSA-CSA Crosswalk [62] to aggregate counts of firearm homicides in counties to MSAs and MicroSAs, based on county codes. The total number of homicides in all the MSAs and MicroSAs is also reported. After grouping the counties into their respective MSA or MicroSA, we take the ones that do not have any suppressed counties and construct the vector Yk. Each element of Yuk consists of an MSA/MicroSA that has at least one suppressed county. Let hi represent the total homicides reported in MSA/MicroSA i, and sci represent the number of suppressed counties. Within each MSA/MicroSA i, the entries of Yuk are constrained by Ymin,i = hi + sci and Ymax,i = hi + 9sci, since the CDC suppresses values between one and nine for each county, while reporting the counties with zero homicides. For example, if MSA/MicroSA i has three suppressed counties and 14 homicides reported in total, we constrain the entry of Yuk for i to the range 17 to 41, corresponding to one or nine homicides in each of the suppressed counties.
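The construction of these bounds can be illustrated with a short sketch. The record layout (pairs of area label and county count, with `None` marking a suppressed county), the function name, and the toy records are assumptions made for illustration; they do not reflect the format of the CDC or BLS files.

```python
def bounds_per_area(county_records, low=1, high=9):
    """Map each area to (Y_min, Y_max) given its county-level counts.

    county_records: iterable of (area, count), with count=None if suppressed.
    """
    reported, suppressed = {}, {}
    for area, count in county_records:
        reported.setdefault(area, 0)
        suppressed.setdefault(area, 0)
        if count is None:
            suppressed[area] += 1  # value hidden, known to lie in [low, high]
        else:
            reported[area] += count
    return {area: (reported[area] + low * suppressed[area],
                   reported[area] + high * suppressed[area])
            for area in reported}


# Hypothetical records: area "A" has 14 reported homicides and three
# suppressed counties; area "B" has no suppressed county.
records = [("A", 10), ("A", 4), ("A", None), ("A", None), ("A", None),
           ("B", 25), ("B", 0)]
bounds = bounds_per_area(records)
```

Area "A" reproduces the example in the text, with bounds of 17 and 41, while area "B" has coinciding bounds and would therefore enter Yk rather than Yuk.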

4.4.2 Recovered firearms.

For the analyses of the total number of firearms recovered, we manually collect data from the “U.S. Firearms Trace Data by State” provided by the ATF [63]. The dataset includes the total number of firearms recovered and traced by state in 2022, along with the top-k cities in terms of recoveries within each state (k=10 for all states, except for Vermont and Washington where k=15 and 11, respectively). Due to limited data on population size in its cities, Hawaii is excluded from this analysis. The population data are collected from the Census “Incorporated Places and Minor Civil Divisions Datasets” [59]. There is no standardized definition for a city in the U.S. [36]: the cities included in this dataset encompass various administrative divisions such as incorporated places, minor civil divisions, and census-designated places, among others, leading to an inconsistent definition of what constitutes a city.

The scaling estimates for each state are calculated using OLS regression on logarithmically transformed data. The bounds $\beta_{\min}$ and $\beta_{\max}$ could not be computed for 11 states because they have more cities than recovered firearms, indicating that in some places no firearm is recovered. This situation is not accounted for in the algorithms because we assume that the number of firearms recovered in each of the unknown cities is between one and the smallest of the top-k, that is, Ymin,i = 1 and Ymax,i = min(Yk).
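A simple feasibility check makes this restriction concrete. In the sketch below, the function name and all numbers are hypothetical; the upper feasibility condition (the unreported recoveries cannot exceed what the unknown cities could hold at their maximum) is an additional check implied by the bounds, not one stated in the text.

```python
def unknown_city_constraints(total_recovered, top_k_counts, n_cities):
    """Bounds for the unreported cities of one state and a feasibility flag."""
    n_unknown = n_cities - len(top_k_counts)
    remaining = total_recovered - sum(top_k_counts)  # recoveries outside top-k
    y_min, y_max = 1, min(top_k_counts)
    # Every unknown city needs at least y_min and at most y_max recoveries.
    feasible = n_unknown * y_min <= remaining <= n_unknown * y_max
    return {"n_unknown": n_unknown, "remaining": remaining,
            "y_min": y_min, "y_max": y_max, "feasible": feasible}


# Hypothetical state with 1,000 recoveries, 840 of them in its top-10 cities.
top_10 = [300, 150, 100, 80, 60, 50, 40, 30, 20, 10]
few_cities = unknown_city_constraints(1000, top_10, n_cities=60)
many_cities = unknown_city_constraints(1000, top_10, n_cities=400)
```

With 60 cities, the 160 unreported recoveries can be spread over the 50 unknown cities, whereas with 400 cities there are more unknown cities (390) than unreported recoveries, so at least one city must have a zero count and the bounds cannot be computed under the stated assumption.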

To extend the scaling analyses to the entire U.S., we must account for the fact that each state has a different total number of recoveries. We re-define in Eq (7) as (8)

Here, the observations are organized into G states such that there are kj reported cities out of the total of Nj cities in state j. We apply the optimization algorithm with vectors Yk and Yuk being constructed by stacking each state’s known and unknown vectors, respectively, where j = 1, …, 38 is the index of each state. We constrain the unknown entries of each state j to lie between one and the smallest of the top-k values reported for that state, where we account for the different states having different constraints depending on the top cities reported.

4.5 Sensitivity analysis

To investigate the effects of small perturbations on the optimal bounds, we compute the 95% confidence intervals for the ATF case study for each state separately. We rely on Monte Carlo simulation to estimate the variance and the confidence intervals. Specifically, for each simulation, we generate a set of data points that resembles the known real data reported by the ATF by sampling k synthetic urban features from the power-law distribution whose parameters are estimated from the real known data. Each time we sample the vector of synthetic known values, we optimize accordingly to obtain a distribution for $\beta_{\min}$ and $\beta_{\max}$. We estimate the variance of $\beta_{\min}$ and $\beta_{\max}$ from 1,000 realizations of the Monte Carlo simulations using the var(⋅) function in R. Assuming the distributions of $\beta_{\min}$ and $\beta_{\max}$ to be Gaussian, we compute the confidence intervals using the standard normal approximation, which calls for scaling the standard error by 1.96 [64].
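The last step, turning Monte Carlo realizations of a bound into a 95% confidence interval, can be sketched as follows. The function name and the five toy realizations are hypothetical, and the sketch assumes that the interval is centered on the bound estimated from the real data and that the standard error is the standard deviation of the realizations. Like `var(⋅)` in R, `statistics.variance` returns the sample variance with an n − 1 denominator.

```python
import math
import statistics


def normal_confidence_interval(estimate, realizations, z=1.96):
    """Normal-approximation interval: estimate +/- z * standard error."""
    standard_error = math.sqrt(statistics.variance(realizations))
    return estimate - z * standard_error, estimate + z * standard_error


# Hypothetical Monte Carlo realizations of one bound (1,000 in the study).
realizations = [0.70, 0.72, 0.74, 0.76, 0.78]
lower, upper = normal_confidence_interval(0.74, realizations)
```

For these toy values the sample variance is 0.001, so the interval extends about 0.062 on either side of the estimate.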

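We note that sampling according to the estimated power law may not preserve the sum of the unknown values. We propose two methods to address this issue. In Method One, we disregard samples with more than a 5% difference with respect to the sum of the known data, specifically

$$\frac{\left|\sum_{i=1}^{k} \tilde{Y}_{k,i} - \sum_{i=1}^{k} Y_{k,i}\right|}{\sum_{i=1}^{k} Y_{k,i}} > 0.05, \tag{9}$$

where $\tilde{Y}_k$ denotes the sampled vector of known values.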

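Hence, while optimizing, we assume that the sum of the entries of Yuk equals the difference between the reported total and the sum of our generated indicators,

$$\sum_{i=k+1}^{N} Y_{uk,i} = Y_{\mathrm{tot}} - \sum_{i=1}^{k} \tilde{Y}_{k,i}, \tag{10}$$

where $Y_{\mathrm{tot}}$ is the reported total and $\tilde{Y}_k$ is the vector of generated known values.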

In Method Two, we posit that (11) retaining all samples and ensuring that the top-k% of the generated known data matches the real one, (12)

Supporting information

S1 Appendix. This appendix consists of five sections that provide additional details supporting the claims made in the main manuscript.

Section A: Assessing consistency in urban scaling. Section B: Assessing bias in urban scaling with alternative 12 values of β and N. Section C: Optimization problem. Section D: Validity of the greedy algorithm solution. Section E: Urban scaling of firearm homicides with complete data.

https://doi.org/10.1371/journal.pcsy.0000029.s001

(PDF)

Acknowledgments

The authors would like to thank Christopher Buglino for useful discussion on the methodology and for porting the optimization algorithm from R to Rcpp.

References

  1. Barenblatt GI. Scaling. 1st ed. Cambridge: Cambridge University Press; 2003.
  2. West GB, Brown JH, Enquist BJ. A general model for the origin of allometric scaling laws in biology. Science. 1997;276(5309):122–126. pmid:9082983
  3. García Martín H, Goldenfeld N. On the origin and robustness of power-law species–area relationships in ecology. Proc Natl Acad Sci USA. 2006;103(27):10310–10315. pmid:16801556
  4. Sengers JMHL, Greer WL, Sengers JV. Scaled equation of state parameters for gases in the critical region. J Phys Chem. 1976;5(1):1–52.
  5. Bettencourt LM, Lobo J, Helbing D, Kühnert C, West GB. Growth, innovation, scaling, and the pace of life in cities. Proc Natl Acad Sci USA. 2007;104(17):7301–7306. pmid:17438298
  6. Bettencourt LM. Introduction to urban science: evidence and theory of cities as complex systems. 1st ed. Cambridge: MIT Press; 2021.
  7. Oliveira M. More crime in cities? On the scaling laws of crime and the inadequacy of per capita rankings–a cross-country study. Crime Sci. 2021;10(1):27.
  8. Lobo J, Bettencourt LM, Strumsky D, West GB. Urban scaling and the production function for cities. PLOS One. 2013;8(3):e58407. pmid:23544042
  9. Alves LG, Ribeiro HV, Lenzi EK, Mendes RS. Distance to the scaling law: a useful approach for unveiling relationships between crime and urban metrics. PLOS One. 2013;8(8):e69580. pmid:23940525
  10. Meirelles J, Neto CR, Ferreira FF, Ribeiro FL, Binder CR. Evolution of urban scaling: Evidence from Brazil. PLOS One. 2018;13(10):e0204574. pmid:30286102
  11. Bilal U, de Castro CP, Alfaro T, Barrientos-Gutierrez T, Barreto ML, Leveau CM, et al. Scaling of mortality in 742 metropolitan areas of the Americas. Sci Adv. 2021;7(50):eabl6325. pmid:34878846
  12. Succar R, Porfiri M. Urban scaling of firearm violence, ownership and accessibility in the United States. Nat Cities. 2024;1(3):216–224.
  13. Bettencourt LM. The origins of scaling in cities. Science. 2013;340(6139):1438–1441. pmid:23788793
  14. Angel S, Parent J, Civco DL, Blei A, Potere D. The dimensions of global urban expansion: Estimates and projections for all countries, 2000–2050. Prog Plann. 2011;75(2):53–107.
  15. Bettencourt LM, Lobo J. Urban scaling in Europe. J R Soc Interface. 2016;13(116):20160005. pmid:26984190
  16. Bettencourt LM, Yang VC, Lobo J, Kempes CP, Rybski D, Hamilton MJ. The interpretation of urban scaling analysis in time. J R Soc Interface. 2020;17(163):20190846. pmid:32019469
  17. Finance O, Cottineau C. Are the absent always wrong? Dealing with zero values in urban scaling. Environ Plan B Urban Anal City Sci. 2019;46(9):1663–1677.
  18. Leitao JC, Miotto JM, Gerlach M, Altmann EG. Is this scaling nonlinear? R Soc Open Sci. 2016;3(7):150649. pmid:27493764
  19. Xiao Y, Gong P. Removing spatial autocorrelation in urban scaling analysis. Cities. 2022;124:103600.
  20. 114 Congress. 114 HR 1449 IH: Tiahrt Restrictions Repeal Act; 2015 [cited 2024 Aug 3]. Available from: https://www.congress.gov/bill/114th-congress/house-bill/1449.
  21. Little RJ, Rubin DB. Statistical analysis with missing data. vol. 793. 3rd ed. Hoboken, New Jersey: John Wiley & Sons; 2019.
  22. Enders CK. Missing data: An update on the state of the art. Psychol Methods. 2023. pmid:36931827
  23. Enders CK. Applied missing data analysis. 2nd ed. New York: Guilford Publications; 2022.
  24. Savalei V, Falk CF. Robust two-stage approach outperforms robust full information maximum likelihood with incomplete nonnormal data. Struct Equ Modeling. 2014;21(2):280–302.
  25. Lüdtke O, Robitzsch A, West SG. Analysis of interactions and nonlinear effects with missing data: A factored regression modeling approach using maximum likelihood estimation. Multivar Behav Res. 2020;55(3):361–381.
  26. Du H, Enders C, Keller BT, Bradbury TN, Karney BR. A Bayesian latent variable selection model for nonignorable missingness. Multivar Behav Res. 2022;57(2-3):478–512. pmid:33529056
  27. Levy R, Mislevy RJ. Bayesian psychometric modeling. New York: Chapman and Hall/CRC; 2017.
  28. Grund S, Lüdtke O, Robitzsch A. Multiple imputation of missing covariate values in multilevel models with random slopes: A cautionary note. Behav Res Methods. 2016;48:640–649. pmid:25939979
  29. Chan KW, Meng XL. Multiple improvements of multiple imputation likelihood ratio tests. Stat Sin. 2022;32(3):1489–1514.
  30. Baraldi AN, Enders CK. An introduction to modern missing data analyses. J Sch Psychol. 2010;48(1):5–37. pmid:20006986
  31. Amemiya T. Advanced econometrics. Cambridge: Harvard University Press; 1985.
  32. Powell JL. Censored regression quantiles. J Econom. 1986;32(1):143–155.
  33. Lewbel A, Linton OB. Nonparametric censored regression; 1998. Available from: https://elischolar.library.yale.edu/cowles-discussion-paper-series/1434/.
  34. Centers for Disease Control and Prevention. 1999-2020: Underlying Cause of Death by Bridged-Race Categories; 2013 [cited 2024 Aug 15]. Available from: https://wonder.cdc.gov/Deaths-by-Underlying-Cause.html.
  35. Molinero C, Thurner S. How the geometry of cities determines urban scaling laws. J R Soc Interface. 2021;18(176):20200705. pmid:33726542
  36. Bettencourt L, West G. A unified theory of urban living. Nature. 2010;467(7318):912–913. pmid:20962823
  37. Hanley QS, Lewis D, Ribeiro HV. Rural to urban population density scaling of crime and property transactions in English and Welsh parliamentary constituencies. PLOS One. 2016;11(2):e0149546. pmid:26886219
  38. Ribeiro HV, Hanley QS, Lewis D. Unveiling relationships between crime and property in England and Wales via density scale-adjusted metrics and network tools. PLOS One. 2018;13(2):e0192931. pmid:29470499
  39. Sutton J, Shahtahmassebi G, Ribeiro HV, Hanley QS. Rural–urban scaling of age, mortality, crime and property reveals a loss of expected self-similar behaviour. Sci Rep. 2020;10(1):16863. pmid:33033349
  40. Schleimer JP, McCort CD, Shev AB, Pear VA, Tomsich E, De Biasi A, et al. Firearm purchasing and firearm violence during the coronavirus pandemic in the United States: a cross-sectional study. Inj Epidemiol. 2021;8:1–10. pmid:34225798
  41. Sun S, Cao W, Ge Y, Siegel M, Wellenius GA. Analysis of firearm violence during the COVID-19 pandemic in the US. JAMA Netw Open. 2022;5(4):e229393–e229393. pmid:35482307
  42. Branas CC, Nance ML, Elliott MR, Richmond TS, Schwab CW. Urban–rural shifts in intentional firearm death: different causes, same results. Am J Public Health. 2004;94(10):1750–1755. pmid:15451745
  43. Crifasi CK, Merrill-Francis M, McCourt A, Vernick JS, Wintemute GJ, Webster DW. Association between firearm laws and homicide in urban counties. J Urban Health. 2018;95:383–390. pmid:29785569
  44. Siegel M, Solomon B, Knopov A, Rothman EF, Cronin SW, Xuan Z, et al. The impact of state firearm laws on homicide rates in suburban and rural areas compared to large cities in the United States, 1991-2016. J Rural Health. 2020;36(2):255–265. pmid:31361355
  45. Reeping PM, Mak A, Branas CC, Gobaud AN, Nance ML. Firearm death rates in rural vs urban US counties. JAMA Surg. 2023;158(7):771–772. pmid:37099312
  46. Parker K, Horowitz JM, Igielnik R, Oliphant JB, Brown A. America’s complex relationship with guns. Pew Research Center’s Social and Demographic Trends Project. Pew Research Center; 2017 Jun 22 [cited 2024 Aug 16]. Available from: https://www.pewresearch.org/social-trends/2017/06/22/the-demographics-of-gun-ownership.
  47. Horwitz S, Grimaldi JV. ATF’s oversight limited in face of gun lobby. Washington Post; 2010 Oct 26 [cited 2024 Aug 2]. Available from: https://www.washingtonpost.com/wp-dyn/content/article/2010/10/25/AR2010102505823.html?sub=AR.
  48. Giffords Law Center to Prevent Gun Violence. Maintaining Records of Gun Sales in California; 2023 [cited 2024 Aug 7]. Available from: https://giffords.org/lawcenter/state-laws/maintaining-records-of-gun-sales-in-california/.
  49. Hummon DM. In: Altman I, Low SM, editors. Community attachment. Boston, MA: Springer US; 1992. p. 253–278.
  50. Belanche D, Casaló LV, Rubio MA. Local place identity: A comparison between residents of rural and urban communities. J Rural Stud. 2021;82:242–252.
  51. Sozer MA, Merlo AV. The impact of community policing on crime rates: Does the effect of community policing differ in large and small law enforcement agencies? Police Pract Res. 2013;14(6):506–521.
  52. Everytown Research & Policy. Community-Led Public Safety Strategies; 2022 [cited 2024 Aug 14]. Available from: http://www-cs-faculty.stanford.edu/~uno/abcde.html.
  53. Waters JS, Holbrook CT, Fewell JH, Harrison JF. Allometric scaling of metabolism, growth, and activity in whole colonies of the seed-harvester ant Pogonomyrmex californicus. Am Nat. 2010;176(4):501–510. pmid:20735259
  54. Porfiri M, De Lellis P, Aung E, Meneses S, Abaid N, Waters JS, et al. Reverse social contagion as a mechanism for regulating mass behaviors in highly integrated social systems. PNAS Nexus. 2024;3(7). pmid:38962249
  55. Hernandez-Vargas G, Sosa-Hernández JE, Saldarriaga-Hernandez S, Villalba-Rodríguez AM, Parra-Saldivar R, Iqbal HM. Electrochemical biosensors: a solution to pollution detection with reference to environmental contaminants. Biosensors. 2018;8(2):29. pmid:29587374
  56. Shen G, Preston W, Ebersviller SM, Williams C, Faircloth JW, Jetter JJ, et al. Polycyclic aromatic hydrocarbons in fine particulate matter emitted from burning kerosene, liquid petroleum gas, and wood fuels in household cookstoves. Energy Fuels. 2017;31(3):3081–3090. pmid:30245546
  57. Yu Y, Katsoyiannis A, Bohlin-Nizzetto P, Brorstrom-Lunden E, Ma J, Zhao Y, et al. Polycyclic aromatic hydrocarbons not declining in Arctic air despite global emission reduction. Environ Sci Technol. 2019;53(5):2375–2382. pmid:30746937
  58. Sutton J, Shahtahmassebi G, Hanley QS, Ribeiro HV. A heteroscedastic Bayesian generalized logistic regression model with application to scaling problems. Chaos Solit Fractals. 2024;182:114787.
  59. United States Census Bureau. Incorporated Places and Minor Civil Divisions Datasets: Subcounty Resident Population Estimates: April 1, 2020 to July 1, 2023 (SUB-EST2023); 2024 [cited 2024 May 11]. Database: City and Town Population Totals: 2020-2023 [Internet]. Available from: https://www.census.gov/data/tables/time-series/demo/popest/2020s-total-cities-and-towns.html.
  60. Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. Array programming with NumPy. Nature. 2020;585(7825):357–362. pmid:32939066
  61. Ortman SG, Cabaniss AH, Sturm JO, Bettencourt LM. The pre-history of urban scaling. PLOS One. 2014;9(2):e87902. pmid:24533062
  62. United States Bureau of Labor Statistics. COUNTY-MSA-CSA CROSSWALKS, 1990-2012 & 2013-2023; 2024 [cited 2024 July 7]. Database: City and Town Population Totals: 2020-2023 [Internet]. Available from: https://www.bls.gov/cew/classifications/areas/county-msa-csa-crosswalk.htm.
  63. Bureau of Alcohol, Tobacco, Firearms and Explosives. U.S. Firearms Trace Data by State; 2022 [cited 2024 May 11]. Database: Data & Statistics [Internet]. Available from: https://www.atf.gov/resource-center/data-statistics.
  64. Devore JL, Berk KN, Carlton MA, et al. Modern mathematical statistics with applications. vol. 285. 3rd ed. Cham, Switzerland: Springer; 2012.