Complex Network Theory Applied to the Growth of Kuala Lumpur’s Public Urban Rail Transit Network

Recently, the number of studies applying complex network theory to transportation has increased steadily as scholars from various fields analyze traffic networks. Nonetheless, research on rail network growth remains relatively rare. This research examines the evolution of the Public Urban Rail Transit Networks of Kuala Lumpur (PURTNoKL) based on complex network theory and covers both the topological structure of the rail system and future trends in network growth. In addition, network performance under different attack strategies is assessed. Three topological network characteristics are considered: connections, clustering and centrality. In PURTNoKL, we found that the total numbers of nodes and edges exhibit a linear relationship and that the average degree stays within the interval [2.0488, 2.6774] with heavy-tailed distributions. Over the evolutionary process, the cumulative probability distribution (CPD) of degree and the average shortest path length fit an exponential distribution and a normal distribution well, respectively. Moreover, PURTNoKL exhibits clear cluster characteristics; most of the nodes have a 2-core value, and the CPDs of closeness centrality and betweenness centrality follow a normal distribution function and an exponential distribution, respectively. Finally, we discuss four different types of network growth styles and the line extension process, which reveal that the rail network's growth should likely be based on the nodes with the largest shortest-path lengths and that network protection should emphasize the nodes with the largest degrees and the highest betweenness values. This research may enhance the networkability of the rail system and better shape the future growth of public rail networks.


Introduction
The traffic network represents the fundamental structure of a city. As the carrier of the city's transportation activities and functions, it can be considered the aorta of the city's economy and operational development, and it is the city's most crucial subsystem for ensuring that the city runs in a stable and orderly fashion. Traffic is highly connected to and influenced the evolution of the Public Urban Rail Transit Network of Kuala Lumpur (PURTNoKL). We also present a dynamic cascade process to show how limited structural change can affect the entire network and to determine which components are most valuable within the network and which have the potential for future development. This analysis thus aims to enhance the PURTNoKL's networkability with limited capital input and to ensure that its future growth is shaped practically and scientifically.
connected to the Ampang LRT Line by 2 stations. This line runs straight through the Bukit Bintang area, which is home to the most developed shopping malls in the city.
The MRT Sungai Buloh-Kajang Line, which is expected to be completed in 2017, will link the entire network from west to east, starting from Sungai Buloh station, running across KL Sentral and the Bukit Bintang area, and connecting the Kelana Jaya, Ampang, Sri Petaling and KL Monorail Lines before terminating at Kajang station.

Methods
In this section, we review and introduce the indicators used in the network design of public rail systems to test and calculate the performance and status of the PURTNoKL. First, we reviewed the static characteristics of networks. The network was represented using graph theory and was analyzed along the PURTNoKL's natural growth process. We then developed a network growth matrix as a time series. Fig 1 shows the basic network structure and network development plan of the PURTNoKL. Essential data from this network growth process were analyzed using the space-L method (the primary method), in which stations are nodes and two stations are linked if they are consecutive on a line, to illustrate topological performance. Fig 2 shows the evolution of the topological network structure as the service area boundary expands and nodes are added. The weight and length of each edge were set to 1, which simplified the measurements and illustrated the topological importance of the nodes. The same station was given a different notation on each line it serves; therefore, KL Sentral station appears on many lines. Unlike certain previous approaches, this assumption explains the performance of a single node across various lines.
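The space-L representation can be illustrated with a short Python sketch. It assumes each line is supplied as an ordered list of station names; the line names and stations in the example are hypothetical, not PURTNoKL data, and `build_space_l` is a helper name introduced here. For simplicity, the sketch merges a transfer station into a single node shared by all its lines rather than giving it a separate notation on each line as described above.

```python
from collections import defaultdict

def build_space_l(lines):
    """Build an unweighted space-L graph as a dict of neighbour sets.

    lines: mapping of line name -> ordered list of station names.
    Consecutive stations on a line are adjacent; every edge has
    weight/length 1, so only the set of neighbours is stored.
    """
    adj = defaultdict(set)
    for stations in lines.values():
        for a, b in zip(stations, stations[1:]):
            if a != b:  # no self-connections
                adj[a].add(b)
                adj[b].add(a)
    return dict(adj)

# Hypothetical toy fragment; station sequences are illustrative only.
toy_lines = {
    "Line A": ["S1", "S2", "Hub", "S3"],
    "Line B": ["S4", "Hub", "S5"],
}
g = build_space_l(toy_lines)
n_nodes = len(g)
n_edges = sum(len(nbrs) for nbrs in g.values()) // 2
print(n_nodes, n_edges)  # 6 5
```

Keying nodes by (station, line) pairs instead would reproduce the per-line notation, at the cost of deciding how the copies of a transfer station are joined.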
Next, we simulated the network growth process by adding different numbers of nodes to observe the dynamic changes in the network and to identify areas that are relatively stable and unaffected by change; the stations in these areas can be considered potential growth points. Finally, we simulated attacks on the network to determine its robustness.

1 Network representation
Using graph theory, the PURTNoKL can be represented as an undirected connected network $G = \langle V, E \rangle$, where $V = \{v_i \mid i \in I = \{1, 2, \ldots, N\}\}$ is the set of nodes, N is the number of nodes, and $E = \{e_{ij} = (v_i, v_j) \mid i, j \in I\}$ is the set of edges, i.e., unordered pairs of elements of V. The number of edges is denoted as M. The adjacency matrix of the network is $A = [a_{ij}]_{N \times N}$, which records the connection between nodes $v_i$ and $v_j$: $a_{ij} = 1$ if $e_{ij} \in E$ and $a_{ij} = 0$ otherwise, with $a_{ii} = 0$ to remove any self-connections. A is therefore symmetric and non-negative. $G_{2017}$ denotes the rail transit network system in 2017.
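As a minimal illustration of these definitions (a sketch with a hypothetical helper, not the authors' code), the adjacency matrix can be built from an edge list with nodes labelled 0 to N − 1. A self-loop in the input is dropped so that $a_{ii} = 0$, and symmetry follows from setting both $a_{ij}$ and $a_{ji}$.

```python
def adjacency_matrix(n, edges):
    """Return the N x N adjacency matrix A = [a_ij] of an undirected graph.

    n: number of nodes, labelled 0..n-1.
    edges: iterable of unordered pairs (i, j).
    """
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            continue  # a_ii = 0: self-connections are removed
        A[i][j] = 1
        A[j][i] = 1  # undirected, so A is symmetric
    return A

A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3), (1, 1)])
is_symmetric = all(A[i][j] == A[j][i] for i in range(4) for j in range(4))
M = sum(map(sum, A)) // 2  # each edge appears twice in A
print(is_symmetric, M)  # True 3
```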

Classical traffic indicators.
Here, we choose the number of nodes and edges, the complexity, the connectivity, the network loops and the availability of loops as related indices. The ratio between M and N, $\beta = M/N$, reveals the complexity of network growth. Similarly, connectivity, a connection indicator proposed by Kansky [3], measures the ratio between the actual number of edges M and the maximum possible number of edges. The maximum number of edges in a planar network, $M_{max}$, can be derived from Euler's formula as $M_{max} = 3N - 6$. Therefore, the connectivity index can be defined as $\tau = M/(3N - 6)$. According to graph theory, the number of loops in a network can be calculated by the formula presented by Ore [31]: $M_{loops} = M - N + 1$. The availability of loops was also introduced by Kansky [3]. This value is calculated as the ratio between the number of existing loops and the maximum possible number of loops in a planar network, $\alpha_{loops} = (M - N + 1)/(2N - 5)$.
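A worked example of the classical indices follows as a Python sketch. The network size (N = 10, M = 12) is hypothetical and chosen only to make the arithmetic easy to check, and the formulas assume a connected planar network with N ≥ 3. Here $\beta = 12/10 = 1.2$, $\tau = 12/24 = 0.5$, $M_{loops} = 3$ and $\alpha_{loops} = 3/15 = 0.2$.

```python
def classical_indices(n, m):
    """Kansky/Ore-style indices for a connected planar network (n >= 3)."""
    beta = m / n                 # complexity, beta = M / N
    tau = m / (3 * n - 6)        # connectivity, M / M_max with M_max = 3N - 6
    loops = m - n + 1            # number of independent loops
    alpha = loops / (2 * n - 5)  # availability of loops
    return beta, tau, loops, alpha

# Hypothetical network with 10 nodes and 12 edges.
beta, tau, loops, alpha = classical_indices(10, 12)
print(beta, tau, loops, alpha)  # 1.2 0.5 3 0.2
```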

Complex network indicators.
There are many indicators in complex network theory. Here, we present indicators that illustrate the topological network characteristics and their spatial implications for transit networks in three categories: connection, clustering, and centrality.
Connection indicators. The degree $k_i$ of node $v_i$ is the number of edges connected to it; for example, if node 1 has only one link, to node 2, then $k_1 = 1$. The average degree is $\langle k \rangle = 2M/N$, and the total degree of network connection is denoted as TD. The degree distribution P(k) is the probability that a randomly chosen node $v_i$ has degree $k_i = k$. The cumulative degree distribution is defined as $P_k = \sum_{k' \ge k} P(k')$; in other words, it is the fraction of nodes whose degree is not less than k. Given two nodes $v_i, v_j \in V$, let $d_{ij}$ be the length of the shortest path between them; the network diameter D is the largest of these shortest-path lengths, $D = \max_{i,j} d_{ij}$. The average path length of the network is then $APL = \frac{2}{N(N-1)} \sum_{i<j} d_{ij}$. The efficiency of the global network [14], E(G), is the average of the inverse shortest-path lengths over all pairs of nodes $v_i$ and $v_j$, $E(G) = \frac{2}{N(N-1)} \sum_{i<j} \frac{1}{d_{ij}}$; it shows the average efficiency of transit flow or information between nodes in the network.
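Because every edge has length 1, the connection indicators can be computed with breadth-first search. The sketch below assumes a connected, undirected graph stored as a Python dict mapping each node to the set of its neighbours; the function names are illustrative. Summing over ordered pairs and dividing by N(N − 1) is equivalent to the factor 2/(N(N − 1)) over unordered pairs in the APL and E(G) formulas.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths from source when every edge has length 1."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def connection_indicators(adj):
    """Return <k>, the cumulative degree distribution P_k, D, APL and E(G)."""
    n = len(adj)
    degrees = {v: len(nbrs) for v, nbrs in adj.items()}
    m = sum(degrees.values()) // 2
    avg_k = 2 * m / n
    # P_k: fraction of nodes whose degree is not less than k
    cpd = {k: sum(1 for d in degrees.values() if d >= k) / n
           for k in sorted(set(degrees.values()))}
    total_d, total_inv, diameter = 0, 0.0, 0
    for v in adj:
        for u, d in bfs_distances(adj, v).items():
            if u != v:
                total_d += d
                total_inv += 1 / d
                diameter = max(diameter, d)
    pairs = n * (n - 1)  # ordered pairs: each unordered pair counted twice
    return avg_k, cpd, diameter, total_d / pairs, total_inv / pairs

# Hypothetical four-station path 0-1-2-3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
avg_k, cpd, D, apl, eff = connection_indicators(path)
print(avg_k, cpd, D, round(apl, 4), round(eff, 4))
# 1.5 {1: 1.0, 2: 0.5} 3 1.6667 0.7222
```

For this path, the six pairwise distances sum to 10, so APL = 10/6 ≈ 1.6667 and D = 3.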
Clustering indicators. Watts and Strogatz [32] introduced the clustering coefficient to characterize the degree of clustering in a network. This coefficient measures the extent to which the neighbours of a node $v_i$ are also connected to one another and is defined as $C(v_i) = \frac{2e_i}{m_i(m_i - 1)}$, where $m_i$ is the number of neighbours of node $v_i$ and $e_i$ is the number of edges that actually exist among these neighbours. $C(v_i)$, called the local clustering coefficient, takes values in the unit interval [0, 1]. The global clustering coefficient is the average over all nodes, $C(G) = \frac{1}{N}\sum_{i} C(v_i)$. The k-core partition, a concept introduced by Seidman [33] in the field of social networks, is used to represent the evolution of a network and to depict the area that has a crucial influence on that network. In some cases, it can also explain why some nodes with a higher degree are less important in a network. The k-core of graph G is the maximal connected subgraph in which every vertex has degree at least k within that subgraph.
Centrality indicators. Degree centrality is defined as $DC(v_i) = \frac{k_i}{N-1}$, i.e., the number of links incident on a node normalized by the maximum possible number; it reflects the importance of node $v_i$ in spatial terms, indicating that a node with more neighbours is more important in a network. Closeness centrality ($C_{closeness}$) is the reciprocal of the average distance between node $v_i$ and all other nodes: the average distance is defined as $d_i = \frac{1}{N-1}\sum_{j \ne i} d_{ij}$, and $C_{closeness}(v_i) = 1/d_i$. This index means that a node closer to the other nodes is more important in the network; it describes the relative location of a node. Betweenness centrality ($C_{betweenness}$) was originally defined by Freeman [34] in terms of the number of shortest paths between two separate nodes s and t that pass through node $v_i$; it reflects the load on node $v_i$ and can alternatively be understood as the controllability of the node. The more routes that pass through a node, the easier it is to control the flow of transit to other nodes. On this basis, betweenness centrality can be expressed as $C_{betweenness}(v_i) = \sum_{s \ne v_i \ne t} \frac{\sigma_{st}(v_i)}{\sigma_{st}}$, where $\sigma_{st}$ is the total number of shortest paths between s and t and $\sigma_{st}(v_i)$ is the number of those paths that pass through $v_i$.
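The three centrality indices can be sketched in Python as follows, for a connected, unweighted, undirected graph given as a dict of neighbour sets (N ≥ 3). Betweenness is accumulated with Brandes' algorithm, a standard technique substituted here for illustration; the division by (N − 1)(N − 2)/2 is our assumption, made because the betweenness values reported for PURTNoKL lie between 0 and 1.

```python
from collections import deque

def centralities(adj):
    """Degree, closeness and (normalised) betweenness centrality."""
    n = len(adj)
    degree = {v: len(adj[v]) / (n - 1) for v in adj}
    closeness, between = {}, {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, recording shortest-path counts (sigma) and predecessors
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    sigma[v] = 0
                    preds[v] = []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        closeness[s] = (n - 1) / sum(dist.values())  # 1 / average distance
        # Back-propagate pair dependencies in order of decreasing distance
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                between[w] += delta[w]
    # Each unordered pair (s, t) was counted twice, once from each end,
    # so dividing by (n-1)(n-2) equals normalising by (n-1)(n-2)/2 pairs.
    scale = 1 / ((n - 1) * (n - 2))
    between = {v: b * scale for v, b in between.items()}
    return degree, closeness, between

# Hypothetical four-node star: hub 0 with leaves 1, 2, 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
deg, clo, btw = centralities(star)
print(deg[0], clo[1], round(btw[0], 4))  # 1.0 0.6 1.0
```

In the star, every shortest path between two leaves passes through the hub, so its normalised betweenness is 1, while a leaf's closeness is 3/(1 + 2 + 2) = 0.6.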

Cascading failures
When the rail network is affected by internal and external factors, the capacity of some nodes or edges may be exceeded, causing failures or errors. Because these nodes and edges are connected to the surrounding nodes and edges, the failures and errors are amplified and spread according to certain rules. This ripple effect eventually leads to congestion in part of the network or even the collapse of the entire network. This phenomenon is called cascading failure [35]. Most researchers focus on the connectivity and reliability of networks and test their robustness [16][17][18][19][20][21]. In this study, we also test the robustness of the PURTNoKL by applying several attack strategies, including node and edge elimination strategies based on the network indices shown in section 3.2.
Numerical Analysis
4.1 The description of the topological evolution process
4.1.1 Steady increase in network size. Based on the related functions described in sections 2 and 3, the basic properties and topological characteristics of PURTNoKL are provided in Table 1, excluding the centrality of nodes, which will be analyzed in the next section. Fig 3 shows the relationship between the number of nodes N and the number of edges M in the network. The results presented in Table 1 show a clear linear relationship between these two indices: the fitted line is y = 1.4x − 9.5 with $R^2 = 0.9905$ (Fig 3), indicating a very good fit. This function illustrates that the growth of nodes and of the connections between them obeys a linear rule, which lays the foundation for broader network prediction.
Fig 6A shows that the degree distribution of PURTNoKL is strongly concentrated: nodes with a degree value of 2 make up a high proportion of existing nodes. At each phase of development, their proportion was greater than 57%, and this value will reach its peak of 69% by 2017. At the network-generating phase ($G_{1995}$), the largest node degree was only 3, but this figure will increase to 9 by 2017. These figures illustrate that links between nodes arise primarily from stations on the same rail line and that the intensity of network development cannot keep pace with the speed of the network's spatial expansion. Since 2002, the distribution pattern has stabilized into a heavy-tailed distribution. The degree's cumulative probability distribution (CPD) was calculated and is shown in Fig 6B. It fits an exponential distribution defined by the function $P(k) = \mu_1 e^{-\mu_1 x}$. The scaling factor $\mu_1$ falls within [1.8, 2.6], and here we take $\mu_1 = 2$. The average degree $\langle k \rangle$ was also calculated. According to the data, the average degree falls within the interval [2.0488, 2.6774], and the arithmetic mean is 2.5506.
This index increased from 1995 until it peaked in 2004. Since then, it has fluctuated slightly with a slow overall decrease, and it will reach 2.5309 in 2016 and 2.5699 in 2017.
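The line fitted in Fig 3 is presumably an ordinary least-squares fit of M against N. The sketch below shows that computation together with the coefficient of determination $R^2$; the four (N, M) pairs are hypothetical values for illustration only, not the Table 1 data.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = a*x + b, returning (a, b, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical (N, M) snapshots, for illustration only; not the Table 1 values.
nodes = [15, 25, 35, 45]
edges = [12, 26, 39, 54]
a, b, r2 = linear_fit(nodes, edges)
print(round(a, 2), round(b, 2), round(r2, 4))  # 1.39 -8.95 0.9993
```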
As Table 1 shows, APL reached its minimum, 9.9920, in $G_{2004}$. APL will increase to 12.2693 in 2016, when it will reach its highest value.
Table 1 indicates that with the expansion of the network, the global clustering coefficient C(G) increased from 0.0732 to 0.1326, peaking in 2004. It has since decreased and will reach 0.0908 in 2017. In other words, the PURTNoKL had its tightest topological structure in 2004, and the structure has become looser since then. Fig 8 shows the variation in the clustering coefficient $C(v_i)$. In general, the entire network clusters around 5 points; the most distinct cluster is centered on KL Sentral station, indicated by station numbers 100 to 120. The k-core partition reveals a similar clustering pattern; most of the nodes fall within the 2-core, and KL Sentral station has the largest k-core value (see Fig 9).
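The local and global clustering coefficients and the k-core values discussed above can be computed as sketched below, assuming a graph stored as a dict of neighbour sets. Two choices are ours: nodes with fewer than two neighbours are given C = 0, and core numbers are obtained by repeatedly removing a node of minimum remaining degree (a standard peeling procedure). The four-node example graph is hypothetical.

```python
def clustering(adj):
    """Local C(v_i) = 2 e_i / (m_i (m_i - 1)) and global C(G) = mean of C(v_i)."""
    local = {}
    for v, nbrs in adj.items():
        m = len(nbrs)
        if m < 2:
            local[v] = 0.0  # assumption: undefined coefficient set to 0
            continue
        # Each edge among the neighbours is seen from both ends, hence // 2
        e = sum(1 for u in nbrs for w in adj[u] if w in nbrs) // 2
        local[v] = 2 * e / (m * (m - 1))
    return local, sum(local.values()) / len(adj)

def core_numbers(adj):
    """k-core value of every node, by peeling nodes of minimum remaining degree."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining, core, k = set(adj), {}, 0
    while remaining:
        v = min(remaining, key=deg.get)
        k = max(k, deg[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return core

# Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to 0.
tri = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
local, cg = clustering(tri)
core = core_numbers(tri)
print(round(cg, 4), dict(sorted(core.items())))  # 0.5833 {0: 2, 1: 2, 2: 2, 3: 1}
```

In the example, the triangle 0-1-2 forms the 2-core, while the pendant node 3 belongs only to the 1-core.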

Urban Rail Transit Networks and Complex Network
The overall average degree centrality is 0.0213, and the average for $G_{2017}$ is 0.0134. Unlike their k-core values, most nodes in the 1-core and 2-core have the same degree centrality (Fig 10). As the network grows, nodes that once had a high degree centrality become less important. In 2017, two important areas will develop higher k-core values and experience a change in their degree centrality: Bank Negara-Kuala Lumpur-KL Sentral-Seputeh-Salak Selatan (station numbers 7 to 11) and Bandaraya-Masjid Jamek-Plaza Rakyat (station numbers 47 to 49).
The average closeness value for the network was 0.0991, and it will reach 0.0870 in 2017. The cumulative probability distributions of closeness centrality are best fit by a normal distribution function, $F(x; \mu_3, \sigma_3) = \frac{1}{\sigma_3\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(t - \mu_3)^2/(2\sigma_3^2)}\,dt$ (Fig 11). The larger the index value, the greater the impact or service area of a node. This value can also be employed to estimate the prevalence of traffic congestion in a network [36,37]. In general, the lowest closeness centrality values will be recorded in 2017; however, as discussed above, the network clusters around five points. Notably, with the operation of a new line in 2017, the area linking Surian-Mutiara Damansara-Bandar Utama-TTDI-Phileo Damansara will undergo improvements in regional development because its closeness centrality will increase (the average value will be approximately 0.1). Using the same method, we found that the average betweenness centrality value was 0.0836, and the maximum value, 0.5216, occurred in 1995 at Kuala Lumpur station. At that time, the network consisted of only two lines, so a central node was able to control more of the transit in the network. The data show that the betweenness of KL Sentral Station is 0.2790 in 2017, which corresponds to the real situation. The CPD of betweenness centrality was calculated and is shown in Fig 12. An exponential distribution best fits these data and is described by the function $P(k) = \mu_4 e^{-\mu_4 x}$, with scaling factors $\mu_4 \in [0.0593, 0.2544]$. We selected $\mu_4 = 0.1$ for modelling purposes. In 1995, the CPD formed a nearly straight line because the network mirrored a tree structure and was less complex. The changing trend shows that $\mu_4$ becomes smaller as the network expands and that an increase in nodes reduces betweenness centrality.

Network growth
Prior to this work, planners mainly considered the traditional effects of adding nodes to a network, including the influence of nearby nodes on traffic diversion, the cost of time and surrounding land prices. Our findings suggest that another effect should also be considered. Based on the topological network structure and related indices, four typical growth styles affect the PURTNoKL, and optimizing growth requires a different strategy.
4.2.1 Four growth styles. The first network growth style is the single station add-on. Adding nodes to the network shows that nodes can work separately as push-nodes and pull-nodes. The addition of a push-node may decrease APL (here, we selected APL as the measurement indicator because it is inversely related to E(G), the efficiency of the global network), which makes the network more efficient. This effect was seen when KLCC and Pasar Seni stations were added in 1998 and when Mid Valley station was added in 2004. By contrast, pull-nodes (e.g., Kepong Central station, added in 2006) cause the network to become less efficient (see the calculations shown in Table 1). Table 1 and Fig 13 illustrate that when we add one node to $G_{2017}$, APL changes dramatically; adding nodes to particular stations or lines will affect the network structure and make the network run much less efficiently (APL increases; points lie above the black triangle line) or more efficiently (points lie below the line). When APL decreases, there is potential for network extension.
connect the Seremban Line with the Sri Petaling Line. This process yields a more complex network structure, and the connecting stations become vital transfer stations that allow people to efficiently change lines.
Line extensions, such as those on the Sri Petaling and Kelana Jaya Lines, resulted in smaller connectivity and larger APL values; in addition, the topological structure became simpler and less efficient. Most importantly, when individual nodes are added one at a time at the end of a rail line, the change in the network becomes a line extension built from successive single station add-ons.
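The push-node and pull-node distinction can be checked numerically by attaching one hypothetical new station to an existing station and comparing APL before and after. The sketch below is illustrative: the new node label `NEW` and the 11-station path are assumptions, and a negative change in APL marks a push-node.

```python
from collections import deque

def apl(adj):
    """Average shortest-path length of a connected, unweighted graph."""
    n, total = len(adj), 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))  # ordered pairs

def add_on_effect(adj, station, new="NEW"):
    """Change in APL when one hypothetical new station is linked to `station`.

    Negative: push-node (network becomes more efficient).
    Positive: pull-node (network becomes less efficient).
    """
    grown = {v: set(nbrs) for v, nbrs in adj.items()}
    grown[new] = {station}
    grown[station].add(new)
    return apl(grown) - apl(adj)

# Hypothetical 11-station path 0-1-...-10 (APL = 4.0).
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 10} for i in range(11)}
print(round(add_on_effect(path, 5), 4), round(add_on_effect(path, 0), 4))
# -0.0455 0.3333
```

Attaching the new station at the middle of the path lowers APL from 4.0 to about 3.9545, whereas attaching it at an end raises APL to about 4.3333.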
We investigated what would occur when different numbers of nodes, counted by m, were added to the system based on $G_{2017}$. For m = 1, m = 4, m = 8 and m = 12, we observed dynamic changes in D and identified specific areas that were relatively stable and unlikely to be significantly affected by the addition of the nodes. These points have the greatest potential for network extension, even after adding a new line, because adding nodes at these points will not change the diameter of the network. Fig 14 indicates that stations on the Port Klang Line (Putra to Setia Jaya, stations 21 to 30; the station list is given in S1 Table) and the Sungai Buloh-Kajang Line (the points near KL Sentral station, stations 170 to 180) have a more stable structure. The data also indicate that 8 nodes on the Ampang Line (Sentul Timur to Sungai Besi, stations 60 to 70), the Sri Petaling route (Cheras to Sri Petaling, stations 72 to 77) and the Kelana Jaya Line (KLCC to Subang Jaya, stations 78 to 90) have extension potential.
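The diameter test can be sketched as follows. The text does not state how the m added nodes are connected, so this sketch assumes they form a simple chain hanging from the candidate station, as in a line extension; the chain's node labels and the seven-station path in the example are hypothetical.

```python
from collections import deque

def eccentricity(adj, s):
    """Largest shortest-path distance from station s (edge length 1)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    return max(eccentricity(adj, s) for s in adj)

def stable_points(adj, m):
    """Stations where attaching a chain of m hypothetical nodes leaves D unchanged."""
    d0 = diameter(adj)
    stable = []
    for s in adj:
        grown = {v: set(nbrs) for v, nbrs in adj.items()}
        prev = s
        for i in range(m):
            node = ("new", s, i)  # hypothetical label for the i-th added node
            grown[node] = {prev}
            grown[prev].add(node)
            prev = node
        if diameter(grown) == d0:
            stable.append(s)
    return stable

# Hypothetical seven-station path 0-1-...-6 with D = 6.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 6} for i in range(7)}
print(stable_points(path, 2), stable_points(path, 3))  # [2, 3, 4] [3]
```

Because distances among existing stations do not change, the new diameter equals max(D, ecc(s) + m), where ecc(s) is the largest distance from station s; a station is therefore stable exactly when ecc(s) + m ≤ D. The brute-force loop is kept for clarity.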
Most of the points with higher extension potential are located near the city's economic center, meaning that the rail network may contribute to the rich-get-richer phenomenon. In general, these points obey the degree-based preferential attachment mechanism. We also selected m = 12 based on the APL of the network, chose particular nodes as fixed nodes, and identified every extension line that passed through those nodes to determine the changing trend. This type of extension may be treated as a new line added to the network or as a connection to related lines.
4.2.2 Network growth trend. Based on the situations discussed above, we kept the number of initial nodes of $G_{2017}$ unchanged and identified six different strategies to simulate the growth process. Strategy 1 calculates the betweenness centrality of every node in the network and connects the nodes with the smallest betweenness centrality values. If more than one pair of nodes has the lowest values, or if the pair is already linked, another pair with the same small value is randomly chosen, and a new network is created. The new network is then recalculated to obtain updated betweenness centrality values for every node, and new links are added following the same strategy. This process is repeated until the number of added links equals one-half of the original number of edges, i.e., $f_a = M'/M_0 = 0.5$, where M' is the total number of new links added and $M_0$ is the total number of edges of $G_{2017}$. Strategy 2 computes the shortest path lengths between all pairs of nodes and then connects the two nodes with the largest value. If more than one pair of nodes has the longest shortest path length, one of these pairs is randomly selected and linked. As in Strategy 1, the new network is recalculated and links are added until $f_a$ is reached. Strategy 3, proposed by Huang and Chow [38], builds on Strategy 2: among the pairs of nodes p and q with the largest shortest path lengths, it computes $Q = k_p k_q$, where $k_p$ and $k_q$ are the degrees of nodes p and q, respectively, and connects the pair with the smallest value of Q. If there is more than one such pair, or if the two nodes are already linked, another pair is randomly chosen. The new network is then recalculated and links are added until $f_a$ is reached.
Strategies 4, 5 and 6 are based on the smallest degree, the smallest clustering coefficient and the smallest closeness centrality of nodes, respectively; each connects the nodes with the lowest value and then follows the same procedure as outlined in Strategy 3.
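Strategies 2 and 3 can be sketched in Python as follows; this is an illustrative reconstruction under stated assumptions, not the authors' implementation. It assumes a connected graph stored as a dict of neighbour sets, considers only pairs that are not already linked, breaks ties with a seeded random generator, and uses M' = round(f_a · M_0) as the number of links to add. The other strategies would differ only in the rule that selects the pair.

```python
import random
from collections import deque

def all_distances(adj):
    """All-pairs shortest-path lengths (edge length 1) via repeated BFS."""
    out = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        out[s] = dist
    return out

def grow(adj, f_a, use_degree_product=False, seed=0):
    """Add M' = round(f_a * M_0) links one at a time, recalculating each step.

    Strategy 2: link a non-adjacent pair with the largest shortest-path length.
    Strategy 3 (use_degree_product=True): among those pairs, keep the ones
    with the smallest Q = k_p * k_q. Remaining ties are broken at random.
    """
    rng = random.Random(seed)
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    m0 = sum(len(nbrs) for nbrs in g.values()) // 2
    for _ in range(round(f_a * m0)):
        dist = all_distances(g)
        nodes = list(g)
        candidates = [(nodes[i], nodes[j])
                      for i in range(len(nodes)) for j in range(i + 1, len(nodes))
                      if nodes[j] not in g[nodes[i]]]
        if not candidates:
            break  # the graph is already complete
        longest = max(dist[p][q] for p, q in candidates)
        best = [(p, q) for p, q in candidates if dist[p][q] == longest]
        if use_degree_product:
            q_min = min(len(g[p]) * len(g[q]) for p, q in best)
            best = [(p, q) for p, q in best if len(g[p]) * len(g[q]) == q_min]
        p, q = rng.choice(best)
        g[p].add(q)
        g[q].add(p)
    return g

def apl_and_diameter(adj):
    """Return (APL, D) for a connected graph."""
    dist = all_distances(adj)
    n = len(adj)
    total = sum(sum(d.values()) for d in dist.values())
    return total / (n * (n - 1)), max(max(d.values()) for d in dist.values())

# Hypothetical ten-station path 0-1-...-9 (M_0 = 9); f_a = 0.1 adds one link.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 9} for i in range(10)}
before = apl_and_diameter(path)
after = apl_and_diameter(grow(path, 0.1))
print([round(x, 4) for x in before], [round(x, 4) for x in after])
# [3.6667, 9] [2.7778, 5]
```

On this ten-station path, Strategy 2 first links the two end stations, turning the path into a ring: D drops from 9 to 5 and APL from about 3.6667 to about 2.7778.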
Figs 15 and 16 show network growth under the six strategies outlined above. When links are added to the network, APL decreases regardless of strategy; however, Strategy 2 produces a significant decrease in both the APL and D of the network: when edges amounting to 10% of the total number of edges are added, APL declines by 50% and D decreases by 60%. This result indicates that a network's efficiency and structure can be heavily affected by linking the nodes with the largest shortest path lengths; it therefore suggests that the network should be expanded at these nodes.

Network failures
Network cascading failures were investigated to measure the response of the rail network to different attack strategies. First, we used the largest-degree attack strategy. Analyzing a total of 17 networks from $G_{1995}$ to $G_{2017}$, we found that the network became less robust as it expanded: when 5% of existing nodes were removed (Fig 17), the fraction of remaining edges dropped by approximately 40% in $G_{2017}$. If this situation were to occur in real life, the entire rail network would rapidly be paralyzed. As the network grows, nodes with larger degrees make up a smaller share of all nodes, yet removing them sharply decreases the network's efficiency. Plotting the ratio of the recalculated number of edges to the initial number of edges on the y-axis shows that, although this type of growth obeys the degree-based preferential attachment mechanism and costs less, it does not benefit future development. Next, we applied several different attack strategies to analyze the changing trend of $G_{2017}$. The 1st strategy is a node elimination strategy based on the highest clustering coefficient: the node with the highest clustering coefficient is removed first, and the network is then recalculated to determine the next node with the largest value to be removed from operation. If two or more nodes have the largest value, one is randomly selected for elimination. The 2nd strategy is the largest-degree node elimination strategy; it prioritizes the removal of the node with the greatest degree. The 3rd strategy is random node elimination: one node is randomly selected and removed from $G_{2017}$, the network's performance is recalculated, and the process is repeated. The 4th strategy is similar to the 3rd, except that the random attacks eliminate edges.
The 5th strategy eliminates nodes in order of betweenness, so that the node with the highest betweenness is removed first. Fig 18 suggests that the largest-degree node elimination strategy (Strategy 2) affects $G_{2017}$ most seriously: removing 5% of the nodes produces a 20% decline in the fraction of edges. The highest-betweenness node elimination strategy can also effectively destroy the network. Therefore, to enhance network protection, additional investment and attention should be directed to the nodes with the largest degrees and highest betweenness values.
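The recalculated largest-degree attack and the random node attack can be sketched as below, measuring the fraction of edges that survive after a given fraction of nodes is removed. The helper `attack`, its parameters and the hub-and-spoke example are assumptions for illustration; the clustering-coefficient and betweenness strategies would replace only the rule that selects the node to remove.

```python
import random

def attack(adj, fraction, strategy="degree", seed=0):
    """Remove round(fraction * N) nodes; return the fraction of edges that remain.

    strategy="degree": repeatedly remove a node with the largest current
    degree, recalculating degrees after each removal (ties broken at random).
    strategy="random": remove nodes uniformly at random.
    """
    rng = random.Random(seed)
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    m0 = sum(len(nbrs) for nbrs in g.values()) // 2
    for _ in range(round(fraction * len(adj))):
        if strategy == "degree":
            top = max(len(nbrs) for nbrs in g.values())
            victim = rng.choice([v for v, nbrs in g.items() if len(nbrs) == top])
        else:
            victim = rng.choice(list(g))
        # Deleting a node also deletes every edge incident to it
        for u in g.pop(victim):
            g[u].discard(victim)
    return (sum(len(nbrs) for nbrs in g.values()) // 2) / m0

# Hypothetical hub-and-spoke network: hub 0 with nine leaf stations.
star = {0: set(range(1, 10)), **{i: {0} for i in range(1, 10)}}
print(attack(star, 0.1))  # 0.0
print(attack(star, 0.1, strategy="random"))
```

Removing 10% of the nodes of this ten-node star with the degree strategy deletes the hub and leaves no edges, whereas a random removal usually deletes a leaf and leaves 8/9 of the edges.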

Conclusion
Constructing the topological network structure of the PURTNoKL using complex network theory reveals its network growth trend. This novel approach to transit network mapping indicates that the network's expansion should be based on the nodes with the largest shortest-path lengths. It also informs network protection by revealing that the nodes with the largest degrees and the highest betweenness values are most important to the network's operation. These findings can contribute to the design of the rail network. We also calculated related network indices and topological network characteristics such as connection, clustering, and centrality. These calculations allow for a deep analysis and evaluation of rail and road networks, which may be useful in the urban planning processes of other countries, particularly in the analysis of traffic demand and the macro-analysis of local land use. However, the study has some shortcomings. For example, we treated the network as an undirected graph and did not calculate the distances between nodes or assess traffic flows.
Supporting Information
S1 Table. The list of stations. (PDF)