Demarcating Geographic Regions using Community Detection in Commuting Networks

We find agglomerations of U.S. counties that are partitioned by commuting patterns by representing inter-county commuting patterns as a weighted network. To do so, we develop a community detection method based on the configuration model to identify significant clusters of nodes in a weighted network that prominently feature self-loops which represent same-county commuting. Application of this method to county level commuting data from 2010 yielded regions that are significantly different from existing delineations such as Metropolitian Statistical Areas and Megaregions. Our method identifies regions with varying sizes as well as highly overlapping regions. Some counties may singularly define a region, while others may be part of multiple regions. Our results offer an alternative way of categorizing economic regions from existing methods and suggest that geographical delineations should be rethought.


Introduction
Delineations of agglomerations of geographic objects are usually determined by their analytic and political purposes (Jones and Paasi, 2013;Paasi, 2013). For example, watershed boundaries are determined by hydrological flows, councils of governments are based on political connections among jurisdictions, and labor market areas are based on matching employment opportunities with worker residences. On the other hand, megaregions are identified by the Regional Planning Association (RPA) based on linkages between economic systems, infrastructure patterns, and projected population growth (Hagler, 2009). These agglomerations are regions, which are fundamental objects in regional science (Pike, 2013). In this paper we present a method to delineate regions using community detection methods from network science. One of the key innovations of this approach is to identify multiple overlapping regions of different types and at different scales in the same statistical inference framework. We also extend community detection methods to include self-loops that have typically been ignored in other community detection work. However, self-loops are of great importance in commuting networks and the more general context of spatial networks that characterize collective behaviors of agglomerations of social or physical particles.
Commuting networks have been used in many different settings (see e.g. Manley, 2014;Tonev et al., 2017;Nelson and Rae, 2016). For example, the U.S. Office of Management and Budget (OMB) uses commuting networks to define core based statistical areas (CBSA). Surrounding counties are added to a CBSA core county (population thresholds of 10,000 or 50,000 in urban areas for micropolitan (µSA) and metropolitan areas (MSA)), if social and economic ties (as defined by commuting to the core county) are more than 25% of the total ties. Adjacent CBSAs are merged into a single CBSA when the central county or counties of one CBSA qualify as an outlying county or counties to the other CBSAs. The county to CBSA relationship is a one-to-one relationship; i.e. a county belongs to only one CBSA. If a county satisfies threshold requirements for multiple CBSAs, then it is allocated to the CBSA that has the strongest ties unless it is a core county. Furthermore, CBSAs are necessarily geographically contiguous ({Office of Management and Budget}, 2010).
The core-based approach to identifying urban regions has a long history (since 1950s). Key to this approach is the identification of a core and its connections with the hinterland (nodal communities) (Nystuen and Dacey, 1961). Research has focused on the core as the central subject of study rather than acknowledging the polycentric regional forms that have come to dominate regions around the world (Fowler et al., 2018). Since the1980s, research has shown that the majority of the commuting flows in urban systems are lateral (i.e. between different parts of suburbs and hinterland) rather than core-centered (Plane, 1981). Methods have been proposed to account for these peripheral commuting patterns and use them to delineate regions (e.g. Tong and Plane, 2014).
In addition to this issue of core focus, another problem of unique membership persists in standard delineations. While objects can logically belong to multiple agglomerations, many of the applications tend to assign them to one by breaking membership ties. Many counties have large numbers of commuters to different cities that are relatively close to one another, especially in densely urbanised regions (Han and Goetz, Han and Goetz;Kim et al., 2017). In such instances, it is useful to relax the unique relationship condition between an object and the agglomeration it belongs to.
The other major issue that has received less attention in the literature (both in the context of regional demarcation as well as in the area of network science) is the idea of self-connection. Many agglomeration delineations in both network science and regional science have focused on the dyadic connections between two nodes/counties. However, commuting networks have significant self-loops (i.e. commuters within a county). In fact, 56% of the total commuters in 2010 in the US commuted within the same county they resided in. Ignoring this large commuting pattern skews the results of agglomerations. Since some nodes may be preferentially attached to themselves (measured by the weight of the self-loop), they should be treated as their own agglomerations known as monads (section 2.5). To this end, we propose a methodology that accounts for these three different types of regions that account for multiple memberships, core-based nodal communities and monads (see figure 1).
Networks are used to model the relational structures between individual units of an observed system. A multitude of data structures may be conceived of as networks from the biological, physical, and social sciences. Over the last few years, owing to the explosion in data from a host of areas including social networks, information networks such as the Internet and biochemical networks such as gene regulatory networks, there has been a concerted inter-disciplinary approach to understand this data, see (Newman, 2018(Newman, , 2003Boccaletti et al., 2006;Durrett, 2007;Van Der Hofstad, 2009). Due to the inherently relational nature of the commuting data, network methods are a naturally fitting approach to identify clusters.
In this paper, we present a method to identify clusters in a weighted network with self-loops. Using this method, we analyze the commuting between counties in the United States in 2010 to identify the clusters of counties that are significantly connected with themselves and with others. We proceed by describing the methodology and theoretical basis. We then present and discuss the results of regional delineation. We conclude with the limitations and furture research directions.

Methodology
Community detection is an approach to divide a set of nodes in a given network into clusters that are strongly connected within clusters as opposed to between clusters. Many techniques have been proposed for unweighted or binary data ranging from modularity optimization (Girvan and Newman, 2002;Clauset et al., 2005), structured stochastic blockmodels (Holland et al., 1983;Nowicki and Snijders, 2001;Peixoto, 2018;Yan et al., 2014), and extraction methods (Zhao et al., 2012;Lancichinetti et al., 2011). Most of these methods are based on a null network model. Typically, one constructs a network model that preserves some aspects of the observed network (in the context of unweighted networks, this is most often the degree distribution of the network, see Section 2.2 for details) but creates a network with no inherent clustering tendency. One then compares the observed network with this null model to extract subsets which seem more densely connected within the subset as compared to the null model.
The Configuration Model on a simple graph, first introduced by Bollobas (1980) and Bender and Canfield (1978), is a probability measure on a family of multigraphs that preserves the degree sequence. The input to the model is an observed graph from which we extract the degree sequence namely a list consisting of the vertices and their corresponding degrees. The model then constructs a random graph as follows: start with the degrees of nodes with d u denoting the degree of vertex u; associate every vertex u is with d u "stubs". One then performs a uniform matching on these stubs to form full edges, thus resulting in a random graph with the prescribed degree sequence but without any other inherent clustering tendency. The relative proclivity of each node to form ties is determined purely upon its degree.
Many of the aforementioned community detection methods utilize the configuration model as the null (Lancichinetti et al., 2011;Palowitch et al., 2018;Fosdick et al., 2018). We significantly extend the methodology developed by a subset of the authors (Palowitch et al., 2018) for weighted network data by developing a new null model for weighted networks with self-loops. The outcome of the methodology reveals both significantly connected communities, monads , as well as well as nodal communities (see figure  1).
In the following sections, we elaborate on our methodology by first defining the terminology in section 2.1. We then describe the null model in section 2.2 paying close attention to the adjustments for weighted self-loops. In section 2.3 we discuss the variance parameters of the latent variable associated with the null model. We describe the community detection steps (initialization, update, filtering) in section 2.4 and describe the post-processing step to separate the nodal communties from non-nodal communities in section 2.6. In addition to communities, since monads are important, we describe their identification in section 2.5. We finally describe methods to benchmark the results with other delineations in section 2.7.

Notation
Let an undirected weighted network on n nodes be denoted by the triple G = ([n], A, W), where [n] := {1, 2, . . . , n} is the set of n labeled nodes; A = (A uv ) is an n×n square, symmetric adjacency matrix with A uv = 1 if there is an edge between u and v, and A uv = 0 otherwise. Since we are interested in networks with self-loops, we assume A uu ≡ 1 for all u ∈ [n]. W = (W uv ) is another symmetric matrix representing (non-negative) weights of edges between vertices with W uv denoting the weight between u, v ∈ [n] with W uv ≡ 0 if there is no edge between u and v.
We let d u = v∈ [n],v =u A uv denote the degree of a vertex, which specifically is the total edges connecting to u but not u itself. The total strength of a node u is defined as: s u = v∈[n] W uv . We let d T = u∈[n] d u and s T = u∈[n] s u denote the total degree and weight of the network respectively. We define u to be the propensity of the node to connect to itself by We let the baseline propensity of the self-loop ratio for the entire network be: We let d = (d 1 , . . . , d n ) and s = (s 1 , . . . , s n ) denote the degrees and strengths of nodes in [n] respectively.

Null model for weighted networks with self-loops
A null model in the context of community detection is a random network model which preserves some aspects of an observed network but without any explicit community structure. The most common null model used in the context of unweighted networks is the configuration model as described in the previous section. Once one has a null model for the network, communities based on the null model are groups of nodes that deviate from the notion of a baseline by being more connected to each other than expected under the null.

Related Null Models
Various functionals are used to measure the deviation of a set or a partition of the entire node set from the null model, the most popular among these being the modularity score Girvan and Newman (2002); Clauset et al. (2005). One can then try to optimize such scores to find the best partitions. (Fosdick et al., 2018) introduced a framework for configuration models that accounts for self loops and used a modularity optimization approach for community detection as an application, but do not focus on weighted networks whose most prominent edges are self loops. Another null-model based approach is to assess the statistical significance of the deviation of subsets from what one would expect under the null, correcting these estimates for multiple testing, and then extracting communities that appear to be more significantly connected than under the null. Several approaches have implicitly utilized the notion of deviation against null partitions such as likelihood ratios (Yan et al., 2014) or Bayes factors (Peixoto, 2018).
Significance-based testing was directly pursued in the context of unweighted networks in Wilson et al. (2014) and weighted networks in Palowitch et al. (2018), but do not account for networks with strong self-looping characteristics. We extend the significance testing based approach in which we now extend both in scope and application by adjusting for self-loops in a dataset with prominent weighted self-loops.

Model Construction
We start by describing the null model for weighted networks with self-loops that will serve as a comparative model for an observed weighted network G = ([n], A, W). The model is indexed by a family of parameters θ = (d, s, κ SL , κ nSL , a, b) where d, s are as before the degree and weight sequence of the observed network respectively, κ SL , κ nSL > 0 are parameters that control the variance of self-loop and non self-loop edge distribution in the null model and a, b > 0 are parameters constrained by the relation a/(a + b) = p where p is as in (1) is the global self-looping tendency of the observed graph. Implicitly we will fix two distributions F SL and F nSL on R + with mean one and variance κ SL and κ nSL respectively. Using the above ingredients we construct a random weighted graph G = ([n],Â,Ŵ) as follows: (i) Network topology: By designÂ uu = 1 for all u ∈ [n]. For all u = v we let (ii) Self-loop edges: For self-loop edges, we generate edge strengths as follows: First for each vertex u ∈ [n] (independently across vertices), we generate its self-loop propensityˆ u ∼ Beta(a, b) (i.e. a Beta distribution). Next we generate ξ uu from distribution F SL (independent ofˆ u ). Hence, we modelŴ (iii) Non self-loop edges: For u = v generate edge strengths as follows: first ifÂ uv = 0 from step (i) then letŴ uv = 0. IfÂ uv = 1 then let ξ uv ∼ F nSL and let where each q uv represents the following ratio of strengths and degrees of u and v: Writing D(u) and S(u) for the degree and strength of vertex u in the random graph it is easy to check that E(D(u)) = d u , E(S(u)) = s u , E(Ŵ uu ) = ps u .
Thus the model preserves (on average) the degrees and strengths of the observed graph without any other specific notion of clustering. Further in the null model, each vertex has no particular preferential notion of self-loopiness, rather on the average the self-loopiness density of each vertex is the same as the observed graph. Palowitch et al. (2018) use a method-of-moments estimator to specify parameters. We extend this assumption and posit two variance parameters to capture the variation in interconnectivity as well as self-looping tendencies of the null model. We specify two types of variables to describe the uncertainty arising out of the strengths of the nodes' connection propensities. Recall that we denote κ SL as the variance of self-looping edge weight distribution (with distribution F SL ) and κ nSL as the variance of the non-self-looping edge weight distribution (with distribution F nSL ). Both of these variables have mean one to ensure the preservation of the strengths and degrees for the configuration null model.

Variance Parameters
The expressions of κ SL and κ nSL are as follows: whereσ 2 p represents the estimated variance of u using empirical method-of-moments.
κ nSL represents the variation in relative weights between two edges when we know that the strengths (total sum of weights) are the same), andκ SL represents the variation within a single self-directed edge. Details on the derivations of these parameters can be found in the Appendix A.

Beta Random Variable to Model Self Looping Propensity
We specify u as adhering to a Beta(a, b) distribution with mean p and with variance equal toσ p 2 , the sample variance of p. The beta distribution is supported on [0, 1]. We are considering the proportion of self-looping commuters in each node as being made up of the averages of decisions to either commute in-county or out-of-county by a host of commuters, and have found that the beta distribution is a suitable description of such data. We have verified that the empirical distribution ofˆ u closely match the simulated values ofˆ u , except for a few nodes that are 0% or 100% self-commuting (Appendix B).
The variance of u is approximated using its sample varianceσ 2 p , which is calculated with a methodof-moments estimator, as in equation 8.

Community Detection Algorithm
The Continuous Configuration Model Extraction (CCME) algorithm used here can be split into three general phases:initialization, update, and filtering. These steps compose the general procedure of iterative testing. Significant communities as imputed by the iterative testing algorithms on networks are those with cross-edges that deviate considerably from the expected edges under a null model. Significant communities are determined by repeatedly applying an iterative search algorithm that starts with a seed set B 0 , finds all nodes with edges connecting to the seed set, then summing up the edge-weights as a test statistic. The test statistic is then evaluated against the expected values of the sums of the weights in the set under the null model (described in appendix A.4) with respect to each node in the starting seed set B 0 , and imputes a p-value for each node.
Each p-value from the whole set is then rejected after being filtered by the Benjamini-Hochberg correction. The nodes with p-values that are rejected after the correction are extracted as the significant nodes upon the first iteration and are used as the initial seed sets for the next iteration step. The final set B is extracted when the node-set becomes stable: when at some iteration step k, B k = B k+1 . Nodes in the final set have stronger affiliation with each other and have less edge connections with all other nodes outside the set.

Initialization
We initialize (step 1) sets B 0 in our application by setting counties with high commuting volume as seed nodes/counties. We select counties that have above 20,000 self-commuters as seed counties. For each seed county, we then find all counties which have commuters connected to the seed county. The seed county/node in addition to counties connected to it are used as the initializing sets B 0 . The seeds are largely irrelevant to the final outcome: the final outcomes are essentially same regardless of what the initial counties selected are , so long as a majority of the high-volume counties are included.

Update
Stable communities are found using iterative node-set updating scheme based on the p-value of the connectivity between a single node u ∈ [n], a candidate (testing) set B ∈ G. We denote S(u, B, G) as the connectivity of a single node (in this case, county) to the set of nodes which is hypothesized to be a community: When the observed value of S(u, B, G) significantly exceeds the expected sum of weights under the null model, then there is evidence to support that there is some additional structure undergirding the set of nodes than that which is posited by the null model, which attributes connectivity between sets of nodes as only dependent on the strengths and degrees of the aggregations of the nodes themselves.
The p-value representing the significance of node-set is given by: In the above equation G is fixed but G is random with distribution given by the null model P and each B represents the candidate set to be tested. When the observed value of S(u, B, G) is much larger than the expected value, expressed as S(u, B, G), the p-value is low. Low p-values are rejected in an iterative fashion as to allow the formation of node-sets with edges that are consistently significantly connected to each other. We define these sets as communities.
The iterative method is described as follows: for each u ∈ [n], given a set B (or denoted by B k at k th iteration if each iteration step is indexed), we find all counties that are connected to the present set, then p-values are imputed for members of the candidate set B and repeatedly tested until the set becomes stable upon sequential iterations: (i) calculate p-values p = p(u, B, G). P-value are calculated using a normal approximation of the values of S(u, B, G). Details on this part of the procedure are given in appendix A.4 (ii) obtain threshold τ (p) using a Benjamini-Hochberg multiple testing procedure . The procedure is used for sets of p-values that are obtained through multiple hypothesis testing. The rejection method ensures that the expected number of falsely rejected hypotheses divided by the total number of rejected hypotheses ( false discovery rate, or FDR) has a maximum percentage of α (Benjamini and Hochberg, 1995) . A false discovery rate threshold α of 5% is common in many applications, however, for community detection, however, we find empirically that such a threshold should be lower to avoid excess overlaps.
(iii) the next set reached by the iteration is defined as B = {u : p(u, B, G) ≤ τ (p)} The above steps are iterated with B replacing B until we reach a fixed point. We set α = .01 at each step of iterative testing to perform community detection. The threshold can be made higher or lower, and such adjustments do not change the results drastically (see supplement for details), but the threshold of .01 appears to to optimal for maximizing coverage and minimizing overlaps.

Filtering
After obtaining J initial communities, with the j th labeled B j , we filter communities first by redundancy, then we differentiate between types of clusters using the clustering coefficients of nodes. Redundant sets that have a high Jaccard overlap similarity are filtered and removed. A Jaccard similarity index of two sets is defined as the ratio of the size of common elements between the two sets over the total distinct elements, or concisely expressed as J(A, B) = |A B|/|A B| for two sets A, B (Jaccard, 1901) . Such an index for measuring overlaps is utilized in the same way for filtering clusters in (Palowitch et al., 2018).
For each pair of found communities, labeled B i , B j , we evaluate their Jaccard similarity J(B i , B j ). If it is above a given pre-set threshold τ , then the clusters are redundant and we select a preferred cluster by calculating the average weight per connection between nodes. We use a simple formula for a given node-set B: K(B) roughly measures the average sum of weights among cross-edges per node within a community B. A higher K(B i ) compared to K(B j ) signifies more interconnectivity between nodes in B i , and thus makes B i redundant compared to B j if there is ample overlap between the two sets. In this application, when comparing B i , B j , we choose whichever K(B i ) or K(B j ) is higher. In the implementation of CCME, we set the τ parameter to be 0.80, meaning that for two communities with more than 80% overlap, the one with higher average connectivity is used, while the other is filtered out.

Detection of Monads
One unique feature of geographical commuting networks is that many prominent nodes that have large strengths are not found in communities because most of the residents stay within their counties to commute. We define the degree of monadicity of a given node as the following: Assuming the observed graph originated from the null model , the variable I u measures two things. Firstly, I u measures how much more u is for a given u than the global mean self-looping tendency p. Secondly, I u measures (assuming the network did originate from the null model) how much more the latent ξ uu component ofŴ uu is than its expected value of one ( recall that self-loop weights are modeled asŴ uu =ˆ u ξ uu s u from equation 3).
The exact form of the variance of I u is difficult to calculate, but we use a simulation method to approximate a p-value for I u . We first determine estimates for a, b from the mean and variance of u under the null model. Given p, and the sample standard deviationσ 2 p , we find estimates for a, b as follows, by solving for the concentration parameters of the beta distribution from equation 9: We use these estimated parameters to simulate a measurement of how extremeÎ u of a given node is, compared to what would be expected assuming random generation from a beta(â,b) distribution. If the the actual value I u is large, as measured by whether it is above the α-th quantile of the simulated values, then the node is deemed significant because it consistently exceeds what is to be expected from if u was randomly generated from a distribution of the same parameters. Such a simulation-based method to approximate p-values is commonly used in the sciences (Ewens, 2003).
The process to find significantly monadic nodes may be concisely described by the set of the following procedures. First,we simulate˜ u from beta(â,b) for every node (county). We then obtain the empirical distributions of how monadic a given node is by computing the empirical distribution ofÎ u = W uu −˜ u s u . Following this step, we consider I u = W uu − ps u . If I u is in the 1 − α th tail of the distributionÎ u then the node is a monad at this instance of simulation. We repeat the procedure 10,000 times and the nodes that have been identified 10,000 times are conclusively categorized as monads. In practice, we set α equal to .05 for this test. The resultant group of nodes are significantly monadic at the 5% significance level.

Differentiating Nodal Communities from Non-nodal communities
In the post-processing phase, we identify nodal communities within the statistically significant communities detected by using the local clustering coefficient as defined by Opsahl and Panzarasa (2009) for weighted networks in the post-processing phase. For an unweighted network, the local clustering coefficient of a node u is based on fraction of number of present ties over the total number of possible ties between the node's neighbours. Thus root node of a tree-like structure , where there is a single core node that is connected to all its peripheral nodes, but none of the peripheral nodes connect to each other, will have low coefficient, while a node in a complete graph has a coefficient of 1.
For a weighted network, the minimum clustering coefficient of a particular node u in a set J is defined as A triplet defined as δ(u, v, w) is any three nodes that have at least one edge to at least one other node in the triplet. A closed triangle ∆(u, v, w) is a set of three nodes that have edges which are all connected with each other. The ratio C min (u, J) is the ratio of sum of all closed triangles incident on u. If the sum of minimum values of triangles are low compared to those of all triplets within a cluster, then that implies the presence of dominant node that projects high edge weights across many peripheral nodes. As such, a low clustering coefficient signifies that communities are bound together by a common node, while a high C min (u, J) signifies that the nodes are all connected to each other.
We determine the overall clustering coefficient of a community by calculating an average of 1 − C min (u, J) within set J weighted by the proportion of the sum of weights of each node u divided by the sum of total strengths in J: This metric is constructed to capture how tree-like the highest-strength nodes in a given cluster is. The lower C min (u, J) is, the more tree-like the node is. 1 − C min (u, J) weighted by the strengths of nodes in a given cluster assigns a value of how tree-like and strong a node is. Summing these values gives an overall measure of how monocentric a cluster is, as the strongest nodes tend to have the smallest C min (u, J). The empirical distribution of the above coefficient is bimodal (see Appendix fig B) with a split around 0.4. We use this to identify nodal communities.

Methods to compare communities with other delineations
We compare our results with other existing delineations (in particular OMB's Metropolitan areas) by using the Fuzzy Rand Index (FRI) (Chakraborty et al., 2017). The Fuzzy Rand Index is a metric that measures the similarity of two covers, C 1 and C 2 are. A cover C is the assignment of vertices of a graph into K groups. Our model allows our vertices, counties, to be assigned to multiple groups, or communities. Counties that are not assigned to any community will belong to their own group, a community of counties that do not belong to any other group. Let V represent the set of vertices and P represent the fuzzy partition of V , a cover. Each element v ∈ V is characterized by its membership vector P is the degree of membership of v in the i th community P i . The norm of P (u) − P (v) in a cover with N communities will be defined to be ||P (u) − P (v)|| = i∈1,...,N |P i (u) − P i (v)|/2. For some cover C, the similarity measure between two nodes u, v is defined as E C (u, v) = 1 − ||P (u) − P (v)||. The distance measure between two covers, C 1 and C 2 of a network V is defined as: Likewise, the similarity measure between two covers is 1 − d(C 1 , C 2 ).
If a county, u, belongs to n communities, then its membership vector will be 1/n for each group that it belongs to and 0 for all other groups. The MSAs are disjoint so the membership vectors for counties in this cover will be 1 for the group it belongs to and 0 otherwise.

Data Description
We downloaded the data from the Local Origin Destination Employment Statistics (LODES) by US Census. This data contains commuter data between census tracts for all of the continental United States in the year 2010, which are then aggregated to the county level. The data is stored as an undirected and weighted network with self-loops such that each node represents a county and each edge represents the number of commuters between the connected counties. Edges with fewer than 100 commuters are removed from the network. Commuters who travel more than 100km (distance between population weighted centroids) are also ignored to remove the effect of telecommuters or supercommuters. Such preprocessing was also done in Nelson and Rae (2016). The resulting network has 3,091 nodes and 17,222 edges.

Results
We use the term 'clusters' to refer to all groups of counties or single counties that were imputed from our detection procedures, 'communities' to refer to clusters that are contain more than one county, and 'nodal communities' to refer to clusters that have strong nodal centers with little lateral commuting and non-nodal communities to refer to communities that have strong peripheral commuting characteristics. We refer to 'monads' as those counties that are strongly connected to themselves.  From the 3,091 total number of counties, we found a total of 182 significant communities. Of these clusters, 14 are nodal communities, 78 are non-nodal communities, and 90 are monads. Together they cover 90.3 % of the population and commuters (93 % within-county, 87 % between-county ). The method simultaneously delineates both small (such as singletons and dyads) and large clusters consisting of hundreds of counties (see table 1. For example, Santa Barbara and San Luis Obispo counties in California are strongly connected to one another and are separate from other clusters. Of the regions identified, 99 are dyads or singleton counties. 68 of these clusters are medium sized, comprising between 2 and 50 counties. In 20 other instances, our method identifies clusters consisting of 50 or more counties that span several states. These tend to be polycentric regions centered around multiple large cities such as Philadelphia, Washington DC, and Baltimore (see figure 2). Out of the 182 total clusters, the average size of a cluster is 24 counties, with a standard deviation of 59. The median size, however, is only 2, signifying that the majority of imputed clusters are monads. When only considering communities, the average size is 49, with a standard deviation of 78. The median size of communities is 15, suggesting that the distribution of community sizes is right-skewed (see table 1). In general, many of the counties belong to overlapping clusters, with some belonging to as many as six different ones (see figure 3) The parameters associated with the null model are as follows: p, roughly the mean of within-county commuters (though weights between cross-edges are double counted, the proportion of same-county commuters is 56%) is .46.σ 2 p , the sample standard deviation of u is .04, signifying that around half of commuting volume in the US are within the same county, and that the dispersion about the mean is fairly tight. The estimates for concentration parameters a and b of the beta variable u derived from p andσ 2 p are respectively 3.86 and 4.50, signifying that the self-looping tendencies of counties is not sparse. κ SL andκ nSL are .08 and .79 respectively, signalling a much higher variance for the latent variable undergirding between-county commute volume than for same-county commuting.

Benchmarking with other delineations
In general, the clusters found by our proposed method are much larger than existing delineations and account for much more of the inter-county commuting activity. We define the coverage rate as the number of commuters between all edges of the given sets of clusters divided by the sum of the total edges. The criterion captures how much of the total commuting activity is 'captured' by different modes of aggregation. The rate of coverage for all inter-county commuting was 86 % for all clusters, compared to only 48 % from OMB delineated MSAs and 77 % from megaregions. The coverage rate for all within county, or same-county, commuting was 92 % from clusters, compared to 86 % from MSAs and 74 % from megaregions.
Similarities between clusters and MSAs are assessed by using the Fuzzy Rand Index (Section 2.7) and by comparing size and volume characteristics in table 2. Clusters and MSAs are most similar in West North Central and Mountain regions and most different in East South central. In the Mountain and Pacific regions, the number of clusters are nearly the same, and mean commuter volumes are more comparable than in the eastern states. Communities are most consistent with MSA delineations in large urban areas (figure 5). The size differences between the detected clusters and other delineations are muted in the regions around population centers such as Chicago, New York and Texas cities. The more populous and denser a core urban area is, the more concentrated a community becomes. In Texas, for example, the economic regions of major cities Austin, Houston, Dallas, and San Antonio are all detected by our method, but regions are larger and overlapping. The Austin region is the only Texas community is drastically differs from MSA, perhaps indicating greater economic influence of the region that demarcated by the MSA. The Houston region can be split into coastal and inland commuting zones, which overlap at the core Houston area. The overlapping counties between Austin and San Antonio areas include Bastrop and Hays counties and the communities identify the dual memberships of these counties in larger cities' economic spheres of influence. Each of the four cities are also quantified as monads (not shown), signifying the relationship between large urban centers and their associated suburbs and hinterlands (Nystuen and Dacey, 1961). However, the larger, more diffuse nature of the communities signals emergent lateral commuting behavior between these peripheral counties (Plane, 1981).
The communities associated with New York all include the five boroughs; and two of the commu- Figure 3: Heatmap of frequencies of each county to appear in any cluster (community, nodal cluster, or monad) nities are similar in size with the MSA, although the communities do not incorporate as much of New Jersey as the MSA. One community in particular one stretches North to upstate NY and part of New Hampshire. In California, an imputed community in the Bay area region is similar to the San Francisco-Oakland-Fremont MSA, but the community amalgamates counties from San Jose-Sunnyvale-Santa Clara, CA MSA (see figure 4). The inter regional commuting among these counties warrant this amalgamation. Thus in the Bay Area case, the community and MSAs are alike in the core area, but the community includes more peripheral counties. The same could be said for many other urban-centric areas, such as the community surrounding Minneapolis, which encompasses the MSA but incorporates more peripheral counties (figure 5).
Megaregions, in general, are larger than communities or nodal clusters, though with some exceptions. The largest megaregion is in the Midwest connecting Minneapolis from the west to Western Pennsylvania in the east, comprising 388 total counties. The largest community yields a similar number of counties at 356 but is located in the Georgia-Tennesee region, centered around Atlanta (see table 1). The largest communities, though similar in size to megaregions, appear to be in the southern central part of the US, where megaregions cover the least amount of ground in (fig 2). Though megaregions tend to be larger than the average community, the megaregions in the South and central part of the US still do not take into account the counties with smaller commuter volumes. The coverage rate of commuting volumes is lower than communities for both within-county and between-county cases. The CCME algorithm tends to find larger communities in counties that are more tightly connected by commuting, while megaregions tend to identify clusters of large population centers that are relatively proximal. The Piedmont megaregion stretches from Atlanta, GA to central NC, and only comprises around 150 counties while most communities around that region are more concentric in shape and spread towards smaller, more   Table 2: Characteristics of clusters (monads, nodal and non-nodal communities) compared to MSA for each Census Division. Cliques (non-nodal communities) are also included in the comparison of coverage rural areas but contain many more counties, upwards of 350. Megaregions are also altogether missing in the central US, where very large clusters are also found. Perhaps because cities like Tulsa do not have massive centers of population, they are not accounted for by megaregions. However, these cities , though less populous, are still influential commuter hubs that facilitate the formation of large regions that are tied together by commuting.
The mean sizes of communities far exceed those of MSAs in all Census Divisions, except in the western states, where the communities are only somewhat larger than MSAs (see table 2). In the East South Central Division, the average cluster size is the largest at 59. In the Mountain and Pacific regions, however, the sizes of clusters is almost on par with the number of MSAs. The largest delineated region centers around the Atlanta area in East South Central region, which is where the largest MSA is similarly located.
Communities on average cover much more commuting volume than MSAs. In the mountain and pacific states, communities cover much less of the commuting volume because most populous counties are classified as monads. However, when taking into account monads, the coverage is higher than MSAs in all regions. The proportion of commuting volume is nearly entirely captured by the clusters from CCME in the East South Central and Midatlantic divisons. 97 % of total commuting activity is accounted for by clusters (table 2). Communities and clusters mostly cover above 80 % of the commuting volume, while the MSAs cover on less than 70 %.
Many of the communities in the midwest and south have many overlaps. For example, Cook and Tulsa counties are members of 5 different communities. North Carolina is covered by three large, heavily overlapping communities. Nodal clusters are regions that have strong central attractors but little peripheral commuting. Most nodal clusters are found in the Mountain West, where the counties are on average much larger and more sparsely settled. Monadic regions are much more common along the western US because the larger size of counties. Most of the major cities are identified as monads, including Cook, Harris , and Los Angeles. Notably, New York County (Manhattan) and King County (Brooklyn) are not monads because the counties that comprise New York City are all they are highly interconnected to each other (see figure 2)

Discussion
Nearly all U.S. major urban areas are grouped into clusters, though not with the same extent as what other delineations would suggest. Counties like St Louis, MO and Fulton, GA (Atlanta), connect to many such smaller, less populated counties. The cities that surrounded by such counties yield much larger clusters than more urban areas such as New York or Harris (Houston.) On the other end, counties in Southern California is not grouped into a community at all but are identified as individually distinct monadic regions because counties in Southern California are on average have much higher rates of same-county commuting. In Northern California, however, one community shows cities like Sacramento , Stockton, are all grouped with the Greater Bay Area (figure 4). The clusters are heavily overlapping: similarly, some counties in the Bay Area are also part of communities associated with the Sacramento metropolitan area.
New York City is linked to upstate New York in some communities, Philadelphia is linked to Baltimore and Washington DC, and Baltimore/Washington DC are more linked to each other and to other urban centers along the eastern seaboard, stretching down to North Carolina. However, the Northeast and Midatlantic regions yield smaller communities on average compared to the South and Midwest. Because the commuting volumes in the Northeast are higher and counties are smaller, the counties in this region tend to cluster around each other in medium-sized communities of usually between 20-30 counties, which are larger than those in the West, but smaller compared to those in the South and Midwest.
Communities have some similarities to megaregions in the Midatlantic. New York, Philadelphia, Washington DC and Baltimore are all grouped in the same megaregion, like several communities (figure 2). However, megaregions also do not account for the interconnected nature of Baltimore and Washington DC to Eastern North Carolina. That the clusters from the CCME method allows for overlaps allows one to say that both Eastern North Carolina and New York are connected to Washington DC significantly, but without being connected to each other, allows different urban areas to be flexibility classified in different regions.
The wide range of communities' sizes is due to the different area sizes of counties as well as the differences in population along different regions of the United States. A county in California or Colorado tends to be at least a few times larger than a county in a state like Georgia. Small communities are mostly found in the Western part of the Contiguous US, while the largest clusters are found in the South and Central US.
The imputed clusters are larger than MSAs except for in populous regions. CCME naturally finds larger communities when inter-county commuting is relatively high compared to total commuting and smaller communities when large population centers are included. Communities have large variation in size partly because they account for the power-law distribution of human populations. Compared to MSAs, which necessarily must be at most a few counties, (excepting some metro regions like the Atlanta MSA) and megaregions , which (excepting regions in the West like Southern California) necessarily must be large, community sizes are much more variable. That a single county can be compared to a massive region is a natural adjustment to the highly uneven dispersion of population in the US; this is a natural feature of results from an unsupervised method such as CCME. Los Angeles county is the most populous county in the US with over 10 million inhabitants and 4 million commuters, which more commuters than all counties combined in most states. LA County is identified as monadic and is within the same class of comparison with the 300-county communities stretching across several states in the South. Such a comparison is not tenable in considering only small or large regions with each other as constrained by the MSAs or megaregion delineations.
Communities found by our proposed method account for much more of the total cross-county commuting than MSAs. The most significant improvement that the CCME method has over MSA demarcation is in accounting for all the inter-county commuting in the Southeast, Northeast, and Midwest parts of the US. These regions are where communities and MSAs are most dissimilar according to the FRI and exhibit the largest relative disrepancies in the size, volume, and number of delineations (table 2 ). MSAs in these regions capture between 50-70 % compared to over 90 % from communities. Counties in these regions are most interconnected by commuting and their corresponding MSAs may be classifying counties in too fragmented a manner to effectively describe the highly interconnected nature of the regions (table 2). In the Central US and the South, where the clusters are massive, commuting patterns within counties are more spread out across counties.
The CCME method produces clusters with high variation in size. The most populous MSAs house similar counties as communities (see figure 5). However, the sizes of clusters can be as large as megaregions, although the clusters tend to cover different types of regions. Megaregions capture cities with large overall commuting volumes that are closely located to each other, but large communities capture regions that are closely linked by commuting. The larger the community, the smaller the absolute strength of each member tends to be, but the higher the relative strength each linkage is. In the case of the South and Central US, counties tend to be small and rural and commuting behavior tend to be across counties, thereby more counties tend to be more interconnected in clusters. Such aggregations have not been depicted in existing official regional delineations may be a novel contribution of the method in this study.
Though detection of singletons may be antithetical to typical notions of community in network theory, they are very important in the this particular application of spatial commuting networks with strong selflooping tendencies. The of monads is closely tied to the prevalence of self-looping behavior, and this study calls into attention the role of self loops in community detection of networks modelling human mobility with spatial constraints. As described in Clauset et al. (2009), most network data arising from nature adhere to power laws, and commuting flows, which are reflective of human populations, are no exception. This study shows that power laws in spatial settings are intricately linked to self-referential behavior. Many studies have described human populations s adhering to heavy-tailed distributions such as the Zipf Law (Newman, 2005;Barabási and Albert, 1999), but this study is among the first to indicate how might regional delineations arise to adjust for such phenomena.
Strongly self-commuting counties that are identified as monads can also be classified as clusters are they are found using tests of similar hypotheses in evaluating how unlikely the size of the total commuting activity in a given geographical unit is compared to a scenario where commuting activity was generated at random subject to constraints arising from the weighted configuration model. The null hypothesis for monadicity parallels the null hypothesis for each community connectivity in evaluating whether a test node is significantly more strongly connected to other nodes, or itself, than expected.
Monads tend to be parts of communities and tend to have multiple community memberships: nodes with high strengths extend influence both within county and between county. This may be explained by viewing high volumes of within-county commuting as a driving force for outward expansion, first becoming nodal clusters, then to tightly interconnected communities. One may conceive of a sequential toy model that offers a compromise between the concentric and lateral models. Population concentrates at first in a core area, then economic activity spreads out nodally. However, the once-peripheral areas begin to commute amongst themselves after a certain point (Plane, 1981). Identifying certain nodes as simultaneously monads and members in a cluster gives more insight to the type of effect that the self-looping behavior of single node exerts on connectivity patterns around the node.

Limitations and Further Research
Several limitations accompany applying CCME to demarcate regions using commuting networks. Though almost entirely unsupervised, there are several tuning parameters that must be specified, such as the threshold of commuting volume cutoff, the overlap parameter, and the p value. However, sensitivity analysis (supplementary materials) over a range of p-values and overlap thresholds demonstrates that results do not change much when tuning parameter are tweaked. The post processing step of determining the nodal communities does not capture the extent of monocentricity that arise from areas that we expect to exhibit these properties such as Multnomah County (Portland) and Hennepin County (Minneapolis).
Several fruitful directions of further research may emerge from this study of commuting regions. This study documents the influence of self loops in spatial networks describing complex patterns arising from the collective behavior of many individuals. As network data in these realms become more and more available, we expect emergence of more methods accounting for self-looping behavior in networks to arise. Another extension will be to extend the methodology proposed in this study to directed networks as to model the orientation of commuter flows. A further extension of the method described in this study is to use a temporal model to measure change in communities across time. One such application is to look at several time points and to track the evolution of clusters imputed by CCME. Another avenue is to investigate the characteristics of power-law distribution of populations that are embedded within the commuting networks and how distributional characteristics of power laws may play in community detection.
Much of the three types of communities hints at a relationship between monads, nodal clusters, and communities. Future studies may explore the dynamics of a core-periphery evolutionary model in which initial cores (monads) branch out into nodal clusters, which become further interconnected into communities, across time. Further research may also necessitate a more theoretically justified way of quantifying nodal clusters.

Conclusions
Traditional delineations of geographic regions have relied on agglomerations of smaller geographies, historical and political boundaries, separating edges and central foci. The boundary characterization is important not only for scientific purposes of tracking and tracing historical evolution of urban systems but also for administrative purposes of allocating infrastructure investments and formulating economic development strategies. For example, boundaries of metropolitan areas in the United States are artifacts of delineation definitions, yet are central to tracking demographic and economic changes, funding allocations, determination of fair market rents, housing subsidies that depend on area median income and a host of other federal and state programs, even when the agencies caution their use for non-statistical purposes. These delineations are central but invisible to the lives of many. In this paper, we provide a robust method for accounting for multiple regions that a place can belong to.
The main methodological contribution of this paper is in augmenting CCME to account for self loops. This paper also is one of the first forays of applied network analysis that makes much use of selflooping characteristics of spatial networks. The application of the model suggests that regional clusters in commuting networks are far larger than other existing delineations. It also suggests that accounting for within-county commuting patterns dramatically changes our understanding of the metropolitan structure of the United States. Furthermore, allowing for regions to overlap allows us to create institutional structures and policies that are tailored not only to singular geographical entities, but to multitudinous, interacting identities that space and place assume.
To calculate I, we note that, Var(W uv |A uv ) = q 2 uv Var((1 − u )ξ uv |A uv ) Therefore, calculating I first necessitates calculating the conditional variance of (1− u )ξ uv , given that there exists an edge, under an expectation. As shorthand, we define the operation Var A (·) := Var(·|A uv ) and E A (·) := E A [·|A uv ].
Putting the two equations 15 and 16 together, we are able to solve for the variance of W uv from equation 12, and substituting the expression for P(A uv ) from equation 2, Var(W uv ) = E[Var(W uv |A uv )] + Var(E[W uv |A uv ]) = q 2 uv · (σ 2 p + (1 − p) 2 ) · κ nSL + σ 2 p ) · P(A uv ) + q 2 uv P(A uv )(1 − P(A uv ))) = q 2 uv P(A uv )((σ 2 p + (1 − p) 2 ) · κ nSL + σ 2 p + 1 − P(A uv )) = r uv (σ 2 p + (1 − p) 2 ) · κ nSL + σ 2 p + 1 − A.3 Variance of ξ uv : κ nSL Now we construct a similar method of moments estimator for κ nSL as was done for κ SL . However, we also make use of the expression for the conditional variance of W uv given the existence of an edge in equation 14. After solving for κ nSL in the above equation and substituting unknown variables with their estimates, then changing the approximation to an equation, we derive the estimateκ nSL for κ nSL , thus obtaining the estimate as given in equation 6: In this section we detail the calculation of the expectation and variance of the relative strength S(u, B, G) of a given node-set B used in iterative testing, described in the body of the text in Section 2.4.2.
Taking the expectation of each ξ uv gives the following expression for the strength of the node set We have found, in section A.2, that the variance of a given W uv is expressed as: Var(W uv ) = r uv (σ 2 p + (1 − p) 2 )κ nSL + σ 2 p + 1 − Hence in each step of iterative testing in the update step of CCME, the normal p-value is used to iteratively reject insignificant nodes in a candidate community. Assumptions for and proofs of this Central Limit Theorem can be found in (Palowitch et al., 2018).