Statistical Analysis of Bus Networks in India

In this paper, we model the bus networks of six major Indian cities as graphs in L-space and evaluate their statistical properties. While airline and railway networks have been extensively studied, a comprehensive study of the structure and growth of bus networks is lacking. In India, where bus transport plays an important role in day-to-day commuting, it is of significant interest to analyze the topological structure of bus networks and to answer basic questions about their evolution, growth, robustness and resiliency. Although the small-world property is commonly observed, our analysis reveals a wide spectrum of network topologies arising from significant variation in the degree-distribution patterns of the networks. We also observe that these networks, although robust and resilient to random attacks, are particularly degree-sensitive. Unlike real-world networks such as the Internet, WWW and airline networks, which are virtual, bus networks are physically constrained. Our findings therefore shed light on the evolution of such geographically constrained networks, which will help in designing more efficient bus networks in the future.


Introduction
From the neural architecture of the brain to the patterns of social interactions, many physical systems and real-world phenomena are being formulated as network models [1][2][3][4][5][6][7][8]. These models are complex because of their size and the emergent properties that arise from their inter-nodal connections. Any physical, chemical, biological or social system can be visualized as a complex network; the constituent elements are known as nodes, and the interactions between them are identified as links. Based on the nature of the links, these networks can be broadly classified into virtual and spatial networks. In the former category, the links are physically absent, e.g., social networks or collaboration networks, whereas in the latter the links are physically present, e.g., geographically embedded road or railway networks [9][10][11][12][13]. Between these two broad classes there exist networks in which the links, although physically absent, are geographically constrained. The structure of real-world networks such as bus networks or electric power grids depends upon the structure of the physically constrained, geographically embedded networks on which they grow and evolve.
[...] monorails) have been shown to exhibit scale-free behaviour with varying values of the power-law exponent, γ [16][28][29][30]. Airline and metro networks show scale-free degree-distribution patterns, whereas the degree distributions of bus and rail networks tend more towards exponential patterns. This contrasting behaviour could be attributed to the following two observations: (i) airline networks are not bounded by geographical constraints, and (ii) metro networks are local, often catering to a part of the city, whereas bus and railway networks are global, as they are spread throughout the entire city or state and sometimes across the entire country. For India specifically, exhaustive studies of public transit networks as a whole are yet to be conducted.
Previous work has shown that the pattern of nodal connectivity of the Indian Railway Network (IRN) differs drastically from that of the Airport Network of India (ANI) [11,25]. The nature of Indian bus networks remains understudied.
Bus transport networks have been studied elsewhere. Analysis of the statistical properties of bus transport networks (BTNs) in China revealed their scale-free degree distribution and small-world properties. The presence of nontrivial clustering indicated a hierarchical and modular structure in the BTN. A weighted analysis of the network was done considering routes as nodes and weights as the number of common stations between the routes. The weight distribution followed a heavy-tailed power law, and the strength and degree were linearly dependent [31]. In another study, an empirical investigation was conducted on the BTNs of four major cities of China. When analyzed using P-space topology, the degree distribution was exponential, indicating a tendency for random attachment of the nodes. The authors also evaluated two statistical properties of BTNs, viz., the distribution of the number of stops in a bus route (S) and the number of bus routes a stop joins (R). While the former had an exponential functional form, the latter had asymmetric unimodal functional forms [32]. The statistical analysis of the urban public bus networks of two Chinese cities, Beijing and Chengdu, revealed scale-free topology and small-world characteristics. The presence of more hubs in the Beijing network led to a comparatively smaller exponent of the degree distribution and a larger clustering coefficient. The similar location of bus stops in the two cities led to a hierarchical structure, denoted by power-law behaviour (with nearly the same exponents) of the weights characterizing the passenger flows [33]. The rail (RTS) and bus (BUS) transportation systems in Singapore were studied from both topological and dynamic perspectives. The stations in RTS had a high average degree, indicating high connectivity amongst them, while the BUS had a small average degree. Both networks had an exponential degree distribution, indicative of randomly evolved connectivity.
The strength of nodes, defined as the sum of the weights of incident edges, appeared scale-free for both networks, indicating the existence of high-traffic hubs. The BUS network exhibited small-world characteristics and had a hierarchical star-like topology. RTS had slightly negative topological assortativity, while the weighted BUS displayed a disassortative nature [34]. An extended space (ES) model with information on the geographical location of bus stations and routes was used to analyze the spatial characteristics of bus transport networks (BTNs) in China [35]. The ES model consisted of directed weighted variations of the L- and P-space networks, designated as ESL and ESP networks respectively, and the symmetry-weighted ESW network that stored information on the short-distance station pairs (SSPs). Often, two bus stations which are geographically close to each other may not have any direct bus route link between them. Such stations, which are at walkable distances from each other, are defined as SSPs. The SSPs greatly influence the BTNs by reducing the transfer times as well as the number of bus routes. The average clustering coefficient of the ESW networks was considerably large, denoting a nearly circular location of the SSPs around a station. The majority of the route sections in the bus routes were short, while a few route sections connecting city downtowns and satellite towns, or special-purpose BRT routes, were long, leading to a power-law edge-length distribution in the ESL networks.
The majority of the above studies have looked into the structural properties of bus networks in both L- and P-spaces. The ESW network is one such network which addresses network redundancy due to the geographical placement of the nodes. In this paper, we present a comparative study of the bus networks of six major Indian cities, namely Ahmedabad (ABN), Chennai (CBN), Delhi (DBN), Hyderabad (HBN), Kolkata (KBN) and Mumbai (MBN). In order to understand the structure of bus networks in India, we calculate various metrics, such as clustering coefficients, characteristic path lengths, degree distribution and assortativity. We also simulate network robustness and resiliency by first removing nodes at random, followed by targeted removal based on degree, closeness, and betweenness. This provides us with interesting results on network (nodal) redundancy, as well as structural invariance.

Methodology
The bus network (BN) datasets were obtained from the government websites of Ahmedabad Bus Rapid Transit System for Ahmedabad (ABN), Metropolitan Transport Corporation for Chennai (CBN), Delhi Transport Corporation for Delhi (DBN), Andhra Pradesh State Road Transport Corporation for Hyderabad (HBN), Calcutta State Transport Corporation for Kolkata (KBN) and Brihanmumbai Electricity Supply and Transport for Mumbai (MBN). Every stop is considered a node, and the routes joining the stops form the set of links. We define a graph $G = (N, L)$, where the set $N = \{n_1, n_2, n_3, \ldots\}$ with each $n_i$ a bus stop, and the set $L = \{l_1, l_2, l_3, \ldots\}$, where each $l$ connects a node pair $(n_i, n_j)$. The set of nodes belongs to the n-dimensional Euclidean space, $\mathbb{R}^n$, and the set of links forms the Cartesian product over $\mathbb{R}^n$. We define the set of routes as the set $R$ such that $\bigcup_i l_i \in R$ for some $i$. In order to analyze the networks, we generate the graph adjacency matrix, $A_{ij}$, such that any matrix element $a_{ij}$ of $A_{ij}$ is equal to one or zero depending upon whether a connecting link exists between the node pair $(i, j)$. The degree of any node is given by $k_i = \sum_j a_{ij}$. The above formulation generates an L-space network without weights. In order to assign weights, we calculate the route overlaps between a pair of nodes, which we call edge weights, $w_{ij}$. The degree-strength matrix is given by $W_{ij}$, and the weighted degree or node strength is given by $s_i = \sum_j w_{ij}$. Since transport flows in both directions, we consider the network links to be undirected. The local clustering coefficient is given by $C(i) = \frac{\sum_{j,k} a_{ij} a_{jk} a_{ki}}{k_i (k_i - 1)}$, where $a_{ij}$ is the link connecting node pair $(i, j)$, and $k_i$ is the number of neighbours of the node $n_i$. The neighbourhood, $N_i$, of a node $n_i$ is defined as the set of its immediately connected neighbours, $N_i = \{n_j : l_{ij} \in L \lor l_{ji} \in L\}$. For the complete network, Watts and Strogatz defined a global clustering coefficient, $C = \sum_i C_i / n$ [5,6,36].
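The L-space construction described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code: the function names and the toy `routes` list are hypothetical, and the edge weight is assumed to be the number of routes that share a link, which is one plausible reading of "route overlap".

```python
from collections import defaultdict
from itertools import combinations


def build_l_space(routes):
    """Return {node: {neighbour: weight}} for an undirected L-space graph.

    Consecutive stops on a route are linked. The weight w_ij counts how many
    routes use the link (i, j); this reading of "route overlap" is an assumption.
    """
    w = defaultdict(lambda: defaultdict(int))
    for stops in routes:
        seen = set()  # count each link at most once per route
        for a, b in zip(stops, stops[1:]):
            link = frozenset((a, b))
            if a == b or link in seen:
                continue
            seen.add(link)
            w[a][b] += 1
            w[b][a] += 1
    return {n: dict(nbrs) for n, nbrs in w.items()}


def degree(graph, i):
    """k_i: number of distinct neighbours of node i."""
    return len(graph[i])


def strength(graph, i):
    """s_i: sum of the weights of the links incident on node i."""
    return sum(graph[i].values())


def local_clustering(graph, i):
    """Fraction of neighbour pairs of i that are themselves linked."""
    nbrs = list(graph[i])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(graph):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(graph, i) for i in graph) / len(graph)


# Hypothetical toy data: three routes over five stops.
routes = [["A", "B", "C", "D"], ["B", "C", "E"], ["A", "C"]]
g = build_l_space(routes)
print(degree(g, "C"), strength(g, "C"))  # 4 5
print(local_clustering(g, "C"), average_clustering(g))
```

In the toy data, the link B-C is used by two routes, so stop C has four neighbours but a strength of five.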
Another important measure is the characteristic path length, $l_{av}$, which is defined as the average number of nodes crossed along the shortest paths for all possible pairs of network nodes.
The average distance from a certain vertex to every other vertex is given by $d_i = \frac{1}{N-1} \sum_{j \neq i} d_{ij}$, where $d_{ij}$ is the length of the shortest path between nodes $i$ and $j$. Then, $l_{av}$ is calculated by taking the median of all the calculated $d_i$, $\forall i \in \mathbb{R}^n$. In order to check the small-world property, we generate random graphs of the same size, i.e., keeping the network size $N$ constant. However, the network topology of a random graph is governed by a wiring probability, $p_w$, which determines the connectedness of the network (or the number of edges of the network). In order to generate random networks of comparable sizes (similar number of nodes and edges), we calculate the wiring probability as $p_w \frac{N^2}{2} \geq N$. The centralities, betweenness and closeness, tell us the relative importance of nodes in the network. The betweenness centrality of any node is calculated as $C_B(i) = \sum_{s \neq i \neq t} \frac{\sigma_{s,t}(i)}{\sigma_{s,t}}$, where $\sigma_{s,t}$ is the number of shortest paths connecting $s$ to $t$ and $\sigma_{s,t}(i)$ is the number of shortest paths connecting $s$ to $t$ that pass through $i$ (the indices $s$ and $t$ run over all $N$). Likewise, the closeness centrality $C_C(i)$ of any node is calculated from the lengths of the shortest paths from that node to every other node; the average closeness is the harmonic mean of the shortest paths from any node to every other node. In weighted networks, the edge weights are usually considered as cost functions; therefore, the larger the edge weight, the lower the node's closeness, as the cost of travel would be large. However, in our case the edge weights play an altogether different role, signifying the 'ease' of travel. Hence, we take the inverse of the edge weights when calculating the weighted closeness, $C_C^w$, as is done for collaboration networks. The degree assortativity, or the Pearson correlation coefficient of degree between pairs of linked nodes, is given by $r = \frac{1}{\sigma_q^2} \sum_{jk} jk\,(e_{jk} - q_j q_k)$, where $e_{jk}$ is the joint probability distribution of the remaining degrees of the two vertices at either end of a randomly chosen edge, with $\sum_{j,k} e_{jk} = 1$ and $\sum_j e_{jk} = q_k$.
Here, $q_k$ is the normalized distribution of the remaining degrees, and $\sigma_q^2$ is the variance of the distribution $q_k$, given by $\sigma_q^2 = \sum_k k^2 q_k - \left[\sum_k k q_k\right]^2$ [37]. The degree distribution $P(k)$, or the strength distribution $P(s)$, gives the probability of finding a node with degree $k$, or degree strength $s$, in the network. This is the ratio of the number of nodes in the network with degree equal to $k$, or degree strength $s$, to the size of the network, $N$. The degree distribution is observed to follow a heavy-tailed function. The equations for the power-law or exponential fits (see Table 1) are calculated using Maximum Likelihood Estimation (MLE), and the Kolmogorov-Smirnov test is employed to check for goodness of fit [38]. The degree-strength correlation is evaluated using a linear-regression model, and the least-square error is calculated.
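The fitting step can be illustrated with a short sketch. It is not the authors' procedure, whose details (discrete or continuous estimator, choice of lower cut-off) are not given in the text. The sketch assumes the continuous maximum-likelihood estimators for a power law and a shifted exponential above a hypothetical cut-off `x_min`, and uses the Kolmogorov-Smirnov distance between the empirical and fitted cumulative distributions to compare them. All function names are illustrative.

```python
import math


def fit_power_law(xs, x_min):
    """Continuous MLE of gamma for P(x) ~ x^(-gamma), x >= x_min.

    Assumes at least one sample lies strictly above x_min.
    """
    tail = [x for x in xs if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


def fit_exponential(xs, x_min):
    """MLE of lam for P(x) ~ exp(-lam * x), x >= x_min."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 / (sum(tail) / len(tail) - x_min)


def power_law_cdf(gamma, x_min):
    """Cumulative distribution of the fitted power law."""
    return lambda x: 1.0 - (x / x_min) ** (1.0 - gamma)


def exponential_cdf(lam, x_min):
    """Cumulative distribution of the fitted shifted exponential."""
    return lambda x: 1.0 - math.exp(-lam * (x - x_min))


def ks_distance(xs, x_min, cdf):
    """Largest gap between the empirical CDF of the tail and a model CDF."""
    tail = sorted(x for x in xs if x >= x_min)
    n = len(tail)
    gap = 0.0
    for idx, x in enumerate(tail):
        model = cdf(x)
        gap = max(gap, abs(model - idx / n), abs(model - (idx + 1) / n))
    return gap


# Synthetic sample: evenly spaced quantiles of a power law with gamma = 2.5.
n, true_gamma, x_min = 2000, 2.5, 1.0
sample = [x_min * (1.0 - (i + 0.5) / n) ** (-1.0 / (true_gamma - 1.0))
          for i in range(n)]

gamma_hat = fit_power_law(sample, x_min)
lam_hat = fit_exponential(sample, x_min)
ks_pl = ks_distance(sample, x_min, power_law_cdf(gamma_hat, x_min))
ks_exp = ks_distance(sample, x_min, exponential_cdf(lam_hat, x_min))
print(gamma_hat, ks_pl, ks_exp)
```

The model with the smaller Kolmogorov-Smirnov distance describes the tail better. Node degrees are integers, so a discrete estimator may be more appropriate for real data.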

Results
In Fig 1, we plot the network structure using force-directed algorithms. The figure compares the structural construct of the networks and shows the nature of connectivity between the nodes in the different networks. While DBN is densely packed, CBN, HBN and KBN are sparse. The network structure of MBN is particularly striking. The long branches with multiple intermediate nodes seen in the figure cause the characteristic path length, $l_{av}$, of MBN to increase abnormally (see Table 1). We also calculate the modularity of the networks to identify community structure. Networks with high modularity have dense connections between nodes within the same modularity class but sparse connections between nodes in different modularity classes. In order to identify communities, we colour-code the nodes based upon their modularity classes. Community detection in bus networks helps us in identifying the different zones of operation. As many as six communities were identified for CBN and MBN, whereas fewer (four or less) communities were identified for ABN, DBN, HBN and KBN.
Table 1. Statistical data for the bus routes of six major Indian cities ($N$ = number of nodes, $L$ = number of edges, $l_{av}$ = characteristic path length, $C_{av}$ = average clustering coefficient, $l^{rand}_{av}$ = characteristic path length of an equivalent random network, $C^{rand}_{av}$ = average clustering coefficient of an equivalent random network, $\gamma$ = power-law exponent, $r$ = degree assortativity, $\langle k \rangle$ = average node degree, and $\langle s \rangle$ = average weighted degree or degree strength).
In Table 1, we present the statistical analysis for the various networks in tabular form. It can be seen from the table that the network sizes of all the cities are comparable to each other, except that of KBN, because CSTC is localized and operates as a subdivision of the West Bengal Surface Transport Corporation (WBSTC), which operates buses in the entire state.
The network density, ρ, which is the ratio of the number of edges in a given network to that of the corresponding completely connected graph, varies from 0.001 to 0.006. An interesting feature is the variation of the characteristic path length, $l_{av}$, from as low as 3.87 to as high as 10.02. In order to get a deeper insight into the structure of these networks, we carried out a weighted analysis by assigning to each pair of nodes a weight corresponding to the overlap of routes connecting that pair, which helps us understand the potential flow of traffic between them. The weighted degree of a node, or its strength, is observed to follow a heavy-tailed distribution on a double logarithmic scale, and the node strength and node degree are found to be related non-linearly. This implies that the potential traffic at a node due to route overlaps grows faster than linearly with the actual number of routes it is connected to [36].

Bus Networks
We observe that the average clustering coefficient, $C_{av}$, also shows a remarkable variation, from 0.07 to as high as 0.26. We check for the presence of the small-world phenomenon in the above networks by generating random graphs with the same number of nodes and a comparable number of edges, and calculate the characteristic path length, $l^{rand}_{av}$, and the average clustering coefficient, $C^{rand}_{av}$, in each case. Upon comparing with the data in Table 1, we find that $C_{av} \gg C^{rand}_{av}$ each time, whereas $l_{av}$ is comparable to $l^{rand}_{av}$. Based upon these comparisons, we can state that the bus networks show the small-world phenomenon. As we discussed earlier, the L-space formulation merely gives the relationship between bus stops and bus routes, whereas it is the P-space formulation which helps in determining the number of transfers, or in this case, the number of bus changes. We can estimate the number of bus changes required by looking at the average number of bus stops present in each of the routes. CBN and MBN typically show the largest characteristic path lengths in L-space.
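The small-world comparison above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code. It measures path length in hops, takes the median over nodes of the mean distance to the reachable nodes, and draws the reference graph with a fixed number of edges instead of a wiring probability, so that the edge count matches exactly. The function names and the small ring example are hypothetical.

```python
import random
from collections import deque
from statistics import mean, median


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def characteristic_path_length(adj):
    """Median over nodes of the mean hop distance to reachable nodes."""
    per_node = []
    for i in adj:
        others = [h for n, h in bfs_distances(adj, i).items() if n != i]
        if others:
            per_node.append(mean(others))
    return median(per_node)


def average_clustering(adj):
    """Mean local clustering coefficient (0 for nodes of degree < 2)."""
    total = 0.0
    for nbrs in adj.values():
        nb = list(nbrs)
        k = len(nb)
        if k >= 2:
            links = sum(1 for a in range(k) for b in range(a + 1, k)
                        if nb[b] in adj[nb[a]])
            total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def random_graph(n_nodes, n_edges, seed=0):
    """Random graph with fixed node and edge counts.

    n_edges must not exceed n_nodes * (n_nodes - 1) / 2.
    """
    rng = random.Random(seed)
    adj = {i: set() for i in range(n_nodes)}
    placed = 0
    while placed < n_edges:
        u, v = rng.sample(range(n_nodes), 2)
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            placed += 1
    return adj


# Hypothetical example: a 6-node ring compared with a random graph
# that has the same number of nodes and edges.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
reference = random_graph(6, 6, seed=1)
print(characteristic_path_length(ring), average_clustering(ring))
print(characteristic_path_length(reference), average_clustering(reference))
```

A network is usually called small-world when its clustering is much larger than that of the reference graph while its path length stays comparable.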
As discussed earlier, the node-degree distribution plays an important role in understanding the structure and evolution of complex networks. In Fig 2, we plot the degree-strength distribution for all the networks on a double logarithmic scale. The degree-distribution patterns show mostly heavy-tailed characteristics, with MBN showing a slight deviation from power-law behaviour. In Fig 3, we plot the centrality distributions (closeness and betweenness), $P(C_C)$ and $P(C_B)$, in the first two rows for ABN, HBN (scale-free) and MBN (non scale-free) on a double logarithmic scale to contrast scale-free networks with non scale-free ones. We find that the distribution function follows an exponential decay given by $P(C_C) \sim \exp(-\lambda C_C)$ (similarly for $C_B$), where the value of the exponent $\lambda$ is shown in each of the plots. In the last row, we plot the variation of betweenness centrality with the degree of a node, which follows a power-law relationship, $C_B \sim k^{\alpha}$, with the magnitude of the exponent $\alpha$ also shown in the plots.
In Fig 4, we plot the response of the network's characteristic path length, $l_{av}$, to random and systematic perturbation. We simulate the robustness and resiliency of the networks by modeling perturbations as node removals. Due to their strong assortative nature, MBN and CBN disintegrate into separate components very quickly, whereas the other networks remain connected up to at least 4% of node removals. In all cases, the nodes removed in the targeted attacks prove crucial for the network to remain connected. In the regime of up to 4% node removal, a closer look reveals that the magnitude of $l_{av}$ does not change much (at most it increases by one 'hop'). Finally, in Figs 5 and 6, we plot the degree distribution for ABN and MBN after removing 20%, 40%, and 60% of the nodes in order to check the invariance of the topological structure of these networks. We choose ABN and MBN as these two networks are topologically different.
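The perturbation experiment can be sketched as below. The sketch is illustrative only and makes two assumptions that are not taken from the text. Targeted removal ranks nodes once by their degree in the intact network, without recomputing after each removal. The response is summarized by the fraction of surviving nodes in the largest connected component, which is a simpler stand-in for tracking the characteristic path length. The function names and the star-shaped toy network are hypothetical.

```python
import random


def remove_nodes(adj, victims):
    """Return a copy of the graph without the victim nodes and their links."""
    victims = set(victims)
    return {u: {v for v in nbrs if v not in victims}
            for u, nbrs in adj.items() if u not in victims}


def largest_component_fraction(adj):
    """Share of the remaining nodes that lie in the largest component."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best / len(adj) if adj else 0.0


def attack(adj, fraction, strategy="random", seed=0):
    """Remove a fraction of nodes, at random or in order of decreasing degree."""
    n_remove = int(round(fraction * len(adj)))
    if strategy == "degree":
        # Assumption: degrees are ranked once, on the intact network.
        ranked = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
        victims = ranked[:n_remove]
    else:
        victims = random.Random(seed).sample(sorted(adj), n_remove)
    return remove_nodes(adj, victims)


# Hypothetical toy network: one hub (node 0) linked to nine other stops.
star = {0: set(range(1, 10))}
star.update({i: {0} for i in range(1, 10)})

for strategy in ("random", "degree"):
    damaged = attack(star, 0.1, strategy=strategy, seed=42)
    print(strategy, largest_component_fraction(damaged))
```

In the toy network, removing the single highest-degree node isolates every remaining stop, whereas a random removal most often hits a leaf and leaves the rest connected.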

Discussion
In this paper, we analyzed the statistical properties of the bus routes of six Indian cities, namely Ahmedabad, Chennai, Delhi, Hyderabad, Kolkata, and Mumbai. Our analysis suggests that the bus networks show a wide spectrum of topological structures, from power-law to exponential, with varying magnitude of the power-law exponent $\gamma$. Ahmedabad (ABN) is particularly interesting in this regard because it has a BRTS (Bus Rapid Transit System) with dedicated lanes, a type of public transit system that is yet to be introduced at a large scale in India. ABN's BRTS thus holds a structural advantage through the presence of many hubs to which extreme routes are connected, a structure similar to that of the WWW or the airline networks (WAN and ANI) [25,36]. As we saw in the earlier sections, CBN and MBN do not show the small-world property in L-space. They do, however, show the small-world property in terms of transfers (P-space topology), as the majority of places can be reached by making as few as 1 or 2 bus changes [39]. The structural relationship between bus stops observed in the strength-distribution plots in Fig 2 is of particular interest. In Fig 2, we plot the weighted degree distribution of the networks, which captures the strength of the nodes with respect to the traffic handled in terms of the number of routes. In order to check for correlations between node degree, $k$, and node weighted degree, $s$, we plot them on a double-logarithmic scale. Interestingly, ABN shows a strong correlation, $s \sim k^{\beta}$ with $\beta = 1.27$ and $R^2 = 0.91$, whereas the other networks fail to show such strong relationships (CBN, KBN and HBN show similar relationships with $\beta \sim 1.44$ to $2.08$, but with lower correlation coefficients, $R^2 \sim 0.60$ to $0.74$). The degree distribution of ABN has a power-law exponent $\gamma$ of 2.47, whereas the degree-strength exponent $\beta$ is found to be 1.27.
This implies that the strength of a node increases faster than its degree, indicating a sense of order in ABN, where higher-degree nodes, for example large or important bus stops, handle heavy traffic because the majority of the routes pass through them. This is missing in the other networks, where the edge weights or routes seem to be more randomly distributed. Also, the topological structure of the road network in the city of Ahmedabad shows a scale-free degree distribution with γ = 2.5 and a characteristic path length of 5.20, which is very similar to ABN [10] (see Table 1).
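The degree-strength relationship can be estimated with an ordinary least-squares fit on the logarithms of both quantities. The sketch below assumes that this is what fitting on a double-logarithmic scale amounts to, since the text does not spell out the procedure. The function name and the synthetic data are hypothetical, and the exponent 1.27 is reused from the text purely to show that the fit recovers a known value.

```python
import math


def fit_scaling_exponent(ks, ss):
    """Least-squares fit of log(s) = log(c) + beta * log(k).

    Returns (beta, c, r_squared). Assumes all k and s are positive and
    that the degrees are not all identical.
    """
    xs = [math.log(k) for k in ks]
    ys = [math.log(s) for s in ss]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    prefactor = math.exp(my - beta * mx)
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return beta, prefactor, r_squared


# Synthetic, noise-free data following s = 3 * k^1.27.
degrees = [1, 2, 4, 8, 16]
strengths = [3.0 * k ** 1.27 for k in degrees]
beta, c, r2 = fit_scaling_exponent(degrees, strengths)
print(beta, c, r2)
```

With real data the points scatter around the fitted line, and the coefficient of determination indicates how much of the variation in strength the degree explains.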
In Fig 3, we plot the centrality distributions for betweenness ($C_B$) and closeness ($C_C$). We consider betweenness and closeness because they play a crucial role from a transportation perspective. $C_C$ is a measure of a node's relative importance in the network due to the existence of shortest paths from that particular node to every other node in the entire network. $C_B$, on the other hand, measures the extent to which a node acts as a bridge connecting different parts of the network. When traveling from one node to another, it is often beneficial to get to the node with the highest value of $C_C$ first if a direct path does not exist between the origin-destination pair. Often the transportation network of a city is planned in such a way that the hubs allow the maximum number of routes to pass through them, and all other nodes in the network are easily reachable from them. Since centrality is positively correlated with node degree, the hubs in a network also tend to have the largest degrees. We found this pattern in all the networks ($C_B \sim k^{\alpha}$); however, in DBN and MBN the relationship between degree and centrality is not conclusive because of noise in the network due to random attachment of nodes (see Fig 3, last row). The noise, or the presence of redundant nodes (links) due to random attachment of nodes in the network, causes the degree-distribution patterns to shift from a purely power-law decay to truncated power-law and exponential decays. The presence of these redundant nodes increases the degree of non-central nodes, which is observed in the degree-centrality plots (see Fig 3). Because of their random placement, these nodes tend to appear at random places in the network, hindering direct connectivity between the hubs. The networks (except CBN and MBN) therefore show disassortative or weakly assortative behaviour.
We also observe that the centrality-distribution functions follow an exponential decay, $P(C_C) \sim \exp(-\lambda C_C)$ (similarly for $C_B$), which shows that the nodes in a network are different, i.e., some nodes are more 'central' than others. An interesting observation is that nodes in the networks tend to connect preferentially to existing high-degree nodes, whereas such a preferential attachment rule is missing when, for example, node betweenness is considered as the metric. A close observation of Fig 3 reveals that nodes with high betweenness certainly have high degrees; however, the reverse is not true. Some nodes do not play any significant role in the network's overall functionality, i.e., they are redundant. In Fig 4, we evaluate the network's response to external perturbations by random and directed removal of nodes. We fix an important measure, $l_{av}$, and check its variation with the percentage of nodes (bus stops) removed. As we saw earlier, CBN and MBN, due to their strong assortative behaviour, seem to be very sensitive to node removals as they quickly disintegrate, whereas ABN, DBN, HBN and KBN do not show any significant change in $l_{av}$ up to 4% of node removal. This amounts to roughly 40-70 redundant nodes which, if removed, could significantly reduce the cost of construction, operation, and maintenance in the network. However, accessibility for all users has to be carefully studied before removing any node. We also observe that the clustering coefficient $C$ varies inversely with the node degree, which implies that nodes with low clustering coefficients tend to have higher degrees and vice versa. This is because nodes (bus stops) having a higher degree will be part of multiple bus routes, whereas those bus stops through which fewer bus routes pass will have a lower degree.
Thus, nodes in the latter case are more likely to form clusters than those which are connected to multiple bus routes.
Finally, in Figs 5 and 6, we observe that the topological structure of the networks is preserved when the networks are subjected to a large number of node removals. It can also be clearly seen that the networks are degree-sensitive. Degree-biased node removal causes the heavy tails in the degree topology to disappear, signifying a gradual decrease in the number of hubs. Interestingly, a similar effect is also observed when the nodes are removed based upon their betweenness centralities. Although this effect is less significant, it is still larger than that of closeness-biased and random node removals. In Fig 6, we plot the degree distributions for MBN with respect to the percentage of nodes removed. In the case of MBN, it is particularly interesting to note that the degree-distribution plots, which originally showed a better fit for an exponential distribution (Fig 2), evolve into a scale-free topology (as can be observed from the straight-line slope on the double-logarithmic scale) with varying power-law exponent, γ, when nodes are removed. At 20% node removal, MBN starts showing a heavy-tailed degree topology. This phenomenon could be attributed to the reduction of noise (randomness of connectivity and nodal redundancy) due to the removal of nodes.
Also, from Table 1, we observe that the bus networks, like other surface transport networks, are assortative in nature, with the exception of HBN and KBN, which show weak disassortative behaviour. The strong assortativity observed in these networks results in increased characteristic path lengths. Since the nodes (bus stops) are spatially distributed throughout the city, the tendency of nodes to attach to nodes with similar statistical properties causes the characteristic path lengths to increase significantly. From a transportation perspective, assortative mixing is beneficial as it allows direct connectivity between hubs. However, it also increases the number of hops in traversing from any given source to a destination within the network. In terms of transfers, the small-world property is retained, yet the traveling time between any random origin-destination pair will increase due to delays associated with numerous intermediate stops.
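The degree assortativity discussed here can be computed directly as a Pearson correlation between the degrees found at the two ends of every link. The sketch below is an illustration under that reading, not the authors' implementation. It uses full degrees rather than remaining degrees, which gives the same coefficient because subtracting one from every degree does not change a Pearson correlation. The function name and the two toy graphs are hypothetical.

```python
def degree_assortativity(adj):
    """Pearson correlation of the degrees at the two ends of each link.

    Each undirected link is counted once per direction, which makes the
    two degree samples symmetric. Returns NaN when every node has the
    same degree or the graph has no links, since the correlation is
    then undefined.
    """
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    if not xs:
        return float("nan")
    m = sum(xs) / len(xs)  # both samples share this mean by symmetry
    var = sum((x - m) ** 2 for x in xs)
    if var == 0:
        return float("nan")
    cov = sum((x - m) * (y - m) for x, y in zip(xs, ys))
    return cov / var


# Hypothetical toy graphs: a star (hub linked only to leaves) and a 4-node path.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(degree_assortativity(star))  # -1.0
print(degree_assortativity(path))  # -0.5
```

Positive values indicate that high-degree stops tend to be linked to other high-degree stops, while negative values, as in the star, indicate that hubs are linked mainly to low-degree stops.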
As noted earlier, bus networks form a specific class of complex networks that grow and evolve over physically constrained spatial networks. Road intersections are usually separated by a distance which is geographically much smaller than the distance between bus stops; therefore, our results emphasize that transportation undoubtedly brings the world closer. What we observe in this study is that bus networks show scale-free topology and the small-world property in the number of transfers. Also, from the above analysis we observe that the bus networks, although structurally different, show similar as well as self-similar topological structures. All the networks show scale-free topology except MBN, which shows a slight deviation towards an exponential distribution. The presence of heavy tails in the degree-distribution plots implies a preferential attachment rule, the tendency of high-degree nodes to cluster with low-degree nodes reveals a hierarchical organization, and the stability of the characteristic path length under gradual removal of nodes reveals the presence of nodal redundancy in the network. The non-trivial relationship between the degree and the centrality of the nodes not only reveals the presence of a self-similar topology, but also helps in understanding how the placement of nodes in a network is topologically governed by the presence of geographical constraints. While nodes that rank high on centrality measures have a higher number of connections, the reverse is not true. The relatively larger characteristic path lengths of CBN and MBN can be justified by looking at the exact placement of the nodes on the map. The presence of water bodies has resulted in a more linear, rather than radial, evolution of these cities and hence of their transportation networks [39].
The present study opens before us new horizons for efficient transportation network designing and planning. Questions such as: what are the statistical properties of the network that will ensure efficiency or how network topology is related to the statistical properties and vice-versa would be both challenging and worthwhile to answer. It would be exciting to come up with innovative models to capture the growth and evolution of real-world large scale public transit networks. Developing generative methods to reduce noise (in the network) due to random node attachment by including geographic and socio-economic constraints such as demand, flow, and cost, to maximize certain network parameter(s) or node-utility function(s) based on the above constraints is another promising area of future work.