Generation of 2-mode scale-free graphs for link-level internet topology modeling

Comprehensive analysis that aims to understand the topology of real-world networks and the development of algorithms that replicate their characteristics has been significant research issues. Although the accuracy of newly developed network protocols or algorithms does not depend on the underlying topology, the performance generally depends on the topology. As a result, network practitioners have concentrated on generating representative synthetic topologies and utilize them to investigate the performance of their design in simulation or emulation environments. Network generators typically represent the Internet topology as a graph composed of point-to-point links. In this study, we discuss the implications of multi-access links on the synthetic network generation and modeling of the networks as bi-partite graphs to represent both subnetworks and routers. We then analyze the characteristics of sampled Internet topology data sets from backbone Autonomous Systems (AS) and observe that in addition to the commonly recognized power-law node degree distribution, the subnetwork size and the router interface distributions often exhibit power-law characteristics. We introduce a SubNetwork Generator (SubNetG) topology generation approach that incorporates the observed measurements to produce bipartite network topologies. In particular, generated topologies capture the 2-mode relation between the layer-2 (i.e., the subnetwork and interface distributions) and the layer-3 (i.e., the degree distribution) that is missing from the current network generators that produce 1-mode graphs. The SubNetG source code and experimental data is available at https://github.com/netml/sonet.


Introduction
As the largest human-made complex network, the Internet grows with no central authority. Internet connectivity is provided by tens of thousands of Autonomous Systems (AS), organizations that maintain a physical network, or a group of networks. Each AS is assigned a unique identification number to be employed with the Border Gateway Protocol (BGP), and possesses Internet Protocol (IP) address ranges to be utilized for unique identification of devices across the Internet and routing of the network traffic. A large number of decentralized AS, which vary in size and geographic footprint, connect individuals, businesses, and organizations. Each

PLOS ONE
PLOS ONE | https://doi.org/10.1371/journal.pone.0240100 November 9, 2020 1 / 23 a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 degree of a node is typically defined as the number of nodes it is connected and is typically higher than the number of network interfaces of the device. A 2-mode graph representation reflects the actual interfaces of a node as well as it's one-hop connectivity via subnetworks. Modeling of Internet topology as a 2-mode graph helps us capture the building blocks of the networks along with the large-scale characteristics of the Internet backbone.
As we are interested in the link-level Internet topology, we focus on the interface distribution and subnetwork distribution as two metrics to model the underlying connectivity in addition to the commonly utilized degree distribution. Interface distribution reflects the number of network interfaces of devices (such as routers, servers, etc) and plots the number of devices with a given number of interface count. Likewise, subnetwork distribution exhibits the number of network interfaces connected to subnetworks (i.e., point-to-point links or multi-access links) and plots the number of subnetworks with a given number of attached devices. The degree distribution ignores the access medium over which a single interface connects multiple nodes over a subnetwork. A link-layer device (e.g., a bus, switch, or FDDI ring) or a link-layer network (e.g., a switch forms the single collision domain of the subnetwork. Note that "degree" indicates the number of one-hop neighbors, and is proportional to the number of interfaces and the size of the subnetworks those interfaces are attached. In order to produce realistic synthetic topologies, we analyzed the backbone Internet topologies sampled by the Autonomous System Mapper (ASM) [22]. Our analysis of several AS reveals that many have power-law distribution patterns in the degree, subnetwork, and interface distributions [20]. In this study, we derive the power-law exponent of the degree, subnetwork and interface distributions. We also derive the condition for the power-law exponent ranges that ensure the existence of a connected network when interface and subnetwork distributions are power-laws. We then utilize these results in the SubNetwork Generator (SubNetG) topology generator.
Focusing on the relation between layer 2 (i.e., subnetwork and interface) and layer 3 (i.e., degree) in the Internet, this study aims to provide a link-level network topology generation mechanism focusing on the building blocks of communication networks. Consideration of the multi-access links in the synthetic topology generation addresses a missing level of granularity in the Internet topology models. Additionally, modeling multi-access links based on genuine Internet measurements produces synthetic topologies that better capture the underlying characteristics of the backbone networks.
In the rest of the paper, Section 2 summarizes synthetic network generation approaches. Section 3 summarizes the missing component of the current Internet topology generators. Section 4 introduces the SubNetwork Generator (SubNetG) that produces 2-mode graphs where the distributions are power-law. Section 5 presents evaluations of SubNetG, and Section 6 concludes the paper.

Related work
In this section, we present an overview of synthetic network generation approaches that could be employed to represent Internet topologies.
Random network models: Initial network generation relied on traditional random network frameworks such as the Erdos-Renyi model [13] where nodes are randomly interconnected. Random network model is not a good representation of the Internet topology due to its failure to capture many crucial properties such as the heavy-tailed degree distribution and high clustering.
Hierarchical network models: Hierarchical topology generators mimic network deployment practices. Tiers [23] captures the hierarchical aspect of the Internet by implementing the network hierarchy where nodes are linked with a minimum spanning tree at LAN and MAN levels. Similarly, GT-ITM [24] generates hierarchical networks by building transit and stub domains. GT-ITM generates a connected random graph in which each node is considered as a transit domain and then grows each domain to contain a random graph. After expanding the operation for n-levels, a number of random graphs are generated and connected to each node as stubs. Finally, IGEN [25] implements Internet engineering heuristics to populate networks based on design choices. While focusing on the network growth processes, hierarchical network generators miss large-scale characteristics of the Internet.
Small-World network models: Many real-world networks, including the Internet, have shown to exhibit the small world characteristic, i.e. high clustering and low characteristic path length. Watts-Strogatz model interpolates ordered lattices with large clustering coefficients and purely random networks with small average path lengths to produce small world networks [14]. Although small world networks obtain high clustering and low average path length, they lack degree characteristics observed in the Internet topologies.
Scale-Free network models: Internet topologies are shown to exhibit power-law degree distribution at AS level and router level [21,26]. These studies shifted the attention to degreebased generators [27]. In order to bridge the gap between the local and global properties of the Internet, statistical physics-based approaches were proposed [28]. Preferential attachment mimics network growth where edges are not placed randomly but have a tendency to connect to high degree nodes [29]. The Boston university Representative Internet Topology gEnerator (BRITE) [6] generates networks with a power-law degree distribution and allows localitybased preferential attachment to generate hierarchical networks. BRITE also utilizes the Erdos-Renyi model where the probability of the existence of a link between two nodes is inversely proportional to the distance between the nodes. Inet [30] produces synthetic Internet graphs that have power-law degree distributions. Jellyfish [31] uses a core formed around central nodes to obtain topologies that have core-periphery structures.
dK-Series network models:dK-series provides a basis to characterize a graph [32]. For an n node network 0K-graph only matches the average degree, 1K-graph matches the degree distribution, 2K-graph matches the Joint Degree Distribution, and so on. The nK-graph is isomorphic of the original graph. Researchers have introduced a methodology for the rescaling process to produce different sized graphs with the same 2K-series characteristics [33], but there is no known efficient generation mechanism for higher dK matches.
Dual Internet topology generator: Center for Applied Internet Data Analysis (CAIDA) introduced a scalable tool that generates dual Internet topologies that aim to capture both router-level and AS-level network characteristics [34]. They use the methods from [35] to generate a network of ASes and methods from [36] to generate a router network for each AS. Nonetheless, the generated networks do not reflect the 2-mode link-level characteristics.

Link-level internet characteristics
Researchers have studied various graph metrics to summarize reference graphs. Although many of these metrics (such as degree distribution, clustering, and characteristic path length) are essential to specify a network, there may be metrics that are better suited for a specific domain. Current topology generators do not consider multi-access links and model all subnetworks as point-to-point links. This may become an important hurdle for achieving accurate network simulations/emulations. In this study, in addition to the commonly utilized degree distribution, we focus on the interface and subnetwork distributions to capture the link-level connectivity of the Internet backbone networks. Subnetworks provide link-level connectivity among a set of device interfaces.
Before examining the interface and subnetwork distributions, we first illustrate these distributions on a toy network. Fig 1 shows two different topologies with the same node size. Network 1 is composed of only point-to-point links, whereas Network 2 involves multi-access links that enable one-hop connectivity to multiple nodes through a single link. Nodes belonging to the same subnetworks are marked with the same colors. In both topologies, nodes have the same node degrees, and hence the degree distributions are the same. However, when the number of interfaces is compared, the distinction of multi-access links becomes clear. Although router R has the same degree of 5 in both of networks, its interface count is 5 in Network 1 but 2 in Network 2. While both networks have the same degree distribution, their interface distributions are considerably different as shown in Fig 2. To show the difference of the network representations in practice, we perform a maximum throughput simulation based on the Internet2 [37] topology, backbone of the academic networks in the U.S. Network throughput is an important metric to measure the maximum amount of data that could be sent from a source to a destination. Gathering the ground truth information from configuration files of Internet2 routers [38], we found that Internet2 backbone had 440 nodes (i.e., routers and servers) connected over 90 subnetworks. We observe 60% of the Internet2 subnetworks are multi-access links with more than two attached network interfaces.
We use ns-3 network simulator [39] to simulate the throughput where the Internet2 backbone is modeled as a 1-mode graph (where routers are assumed to be directly interconnected via point-to-point links) and a 2-mode graph (i.e., network interfaces are connected over  subnetworks). We then assume data traffic is sent between 10 random source and destination pairs and repeat each scenario 100 times. We use the same source-destination pairs and assume a link bandwidth of 1 Gb for transferring data in both representations. Table 1 shows the 5-number summary of cumulative throughput between randomly selected node pairs. We observe that 1-mode representation of the topology, which ignores the underlying subnetworks, produces an inflated bandwidth. The traditional 1-mode model of the internet topologies ignores the underlying multi-access links that share the communication link among multiple subnetwork interfaces and hence has a single collision domain that limits simultaneous data transfers. Hence, the 1-mode model produces results that are significantly higher than the achievable bandwidth that is reflected in the 2-mode model. This experiment shows the importance of the 2-mode graph modeling of the link-level Internet.
We showed the effect of neglecting multi-access links and the corresponding collision domain between the systems connected over a shared link. Overall, current topology generators that only use point-to-point links considerably overestimate the maximum throughput of networks, even if they could capture the cliquishness of the underlying topologies. As the inflation is by an order of n 2 for n interfaces, the difference in the bandwidth of the simulations will be exponentially higher for larger subnetworks.

SubNetG: SubNetwork generator
In this section, we present the SubNetwork Generator (SubNetG) to produce synthetic topologies that reflect the link-level characteristics of the Internet backbone. In particular, we generate network topologies in the proximity of the desired network size, node interface distribution, and subnetwork size distribution. The generation process also converges the degree distribution to the measured power-law distribution.

Obtaining power-law distribution with a cutoff
Power-law distribution is a distribution where frequency of attributes vary as a power of the attribute, and follows the exponential form F i = Ai −α where i indicates the attribute such as degree, A is the scaling coefficient and α is the power-law exponent. In a power-law distribution, there are considerably abundant small values (i.e., F i is high for small values of i) while extremely large values are rare but possible. As log(F i ) = log(Ai −α ) = log(A) + (−α)log(i), we observe a power-law distribution as a line in the log-scale where α exponent determines the slope of the curve. While α exponent uniquely determines the distribution, the scaling coefficient A reflects the network size.
We could generate a power-law distribution with any desired α using the transformation method [40], such that, one can produce a number of uniformly distributed random numbers within the 0 � r < 1 range and then convert the numbers using x = (1 − r) −1/(α−1) transformation equation. The resulting distribution would be a power-law within the range of 1 � x < 1 and the given power-law exponent α. This approach would yield a set of numbers having a distribution of the given power-law exponent. However, with this method, we can only generate interface and subnetwork distributions with a given number of interfaces, but we don't have direct control over the number of nodes.
Size i ++ Algorithm 1 generates a set of numbers with the specified number of nodes N and powerlaw exponent α. The min and max parameters are used to adjust the minimum and maximum degrees in the distribution. Routers and switches have a physical footprint, and hence having a router or switch with a very large number of interfaces is impractical. The max interface routers or subnetworks is determined based on extensive measurement of ASes [22]. Similarly, an optional parameter min is used to adjust the minimum degree nodes for subnetworks since subnetworks have at least two interfaces. If min is not specified, a value of 1 is used.
In Line 2 of the algorithm, a temporary distribution curve is generated with the given power-law exponent α but with a much larger distribution coefficient A. In Line 3, the integral of the temporary distribution curve is calculated. Then, the algorithm iterates number of nodes N times to obtain a skewed probability distribution using the temporary curve. Line 5, generates uniformly distributed random numbers so that the loop in Lines 7-9 determines the number of interfaces for the node. Finally, Line 10 increments the distribution count of the determined size i.
At the end of the algorithm, the sum of values in the final distribution ∑ i Ai −α will approximate the desired network size N. Downscaling in integers introduces error due to rounding, and the sum of values might be a bit different than the intended distribution. Hence, the final network size might be different from the expected network size by a couple of nodes.
Using Algorithm 1, we generated multiple networks with various number of nodes (i.e., 1K, 10K, and 100K) and power-law exponents of 2, 2.5, and 3. Fig 3 shows the Probability Distribution Function (PDF) and Complementary Cumulative Distribution Function (CCDF) of interface distributions for networks without any min/max cutoff using power-law exponents of 2, 2.5, and 3 respectively. Similarly, Fig 4 presents the subnetwork distribution which uses a min subnetwork size of 2 without a max cutoff. As observed in the figures, the generated interface and subnetwork sizes can be unrealistically high especially for low power-law exponents.
The max cutoff shifts the power-law distribution of the network and depends on the network size and the expected power-law exponent. To determine the produced alpha value change based on the cutoff and input alpha value, we generated 3 sets of networks with 1k, 10k and 100k nodes and a cutoff value of 40% of the network size. In each set, we run the algorithm 15 times with varying power-law exponents and measured the resulting power-law exponent due to the max cut-off. Fig 5 shows the median alpha values of the resulting distributions and their exponential regression.
We analyzed the largest AS networks and their interface and subnetwork sizes [22]. The maximum interface size for the largest AS networks was around 2 12 . Similarly, the maximum subnetwork size was around 2 15 . Hence, we picked these values as the max cutoff for the network generation.

2-mode graph generation approach
Subnetwork based Internet topologies cannot be modeled as a conventional 1-mode graph as it requires a distinction between routers and subnetworks. A hypergraph H = (N,S) is a generalized graph form where N is the set of nodes and S is the set edges. Each element of S represents a subset of N, which is connected through the same subnetwork. Hypergraphs are also  illustrated using 2-mode bipartite graphs. In order to maintain the subnetwork relations among the nodes, we use an undirected bipartite graph where vertices are either nodes (e.g., a router, a server, or a computing device with one or multiple IP addresses) or subnetworks, as shown in a toy network in Fig 6. Each node is attached to at least one subnetwork (shown as clouds), and each subnetwork is attached to at least two nodes (shown as routers). The number of attachments for each subnetwork and node is shown on the figure.
In order to generate a network, the number of nodes N, the power-law exponent of interface distribution α I , and the power-law exponent of subnetwork distribution α S are provided by the user or obtained from the measurement data. Given the slope α I , the area beneath the interface distribution curve should match N (i.e., ∑ i ID i = N where ID i indicates the number of nodes with i interfaces) [2]. In Fig 6, there are 2 nodes with 3 interfaces (ID 3 = 2), 2 nodes with 2 interfaces (ID 2 = 2), and 2 nodes with 1 interface (ID 1 = 2). Subsequently, we can compute the scaling coefficient A I of the interface distribution.
Once the interface distribution is determined based on Algorithm 1, the number of interfaces (i.e., I) can be calculated from I = ∑ i i � ID i . The number of interfaces on all nodes I is equal to the sum of subnetwork sizes (i.e., S) so that all nodes and subnetworks are interconnected. That is X Subsequently, for a given subnetwork distribution exponent α S , we can determine the scaling coefficient A D of the subnetwork distribution.

Dependence of distributions
In this section, we analyze the dependency of degree distribution to the interface and subnetwork distributions when all are power-laws. As the degree distribution is determined from the 1-mode projection of the nodes in the 2-mode graph, it is dependent on the underlying distributions of the 2-mode graph. Once the power-law exponents of interface distribution (i.e., α I ) and subnetwork distribution (i.e., α S ) are determined, we can compute the average degree (i.e., < k >) in the network as follows. A subnetwork of size j, by definition, connects j nodes in the one-hop distance. Hence, once a node connects to this subnetwork, its degree k increases by j − 1. Subsequently, the total degree contribution of the subnetwork to the network is j � (j − 1). When we consider all of the subnetworks in the network, the total degree ∑k can be calculated as ∑ j SD j � j � (j − 1) where SD j indicates the number of subnetworks with j devices. Dividing this value by the number of nodes (i.e., N) gives the average degree < k > of the network as follows.
Similarly, the degree distribution can be utilized to calculate the number of nodes. That is, ∑ k k � DD k = < k > � N where DD k indicates the number of nodes with degree k. Hence, Additionally, for a given power law exponent α and a number of nodes N, we can calculate the A coefficient. Therefore, the power law exponent of the degree distribution (i.e., α D ) can be calculated for a given set of N, α I and α S parameters using the Eqs 2 and 3 and the power-law formula of F i = Ai −α . Fig 7 presents the power-law exponent of the degree distribution α D with respect to a given pair of subnetwork α S and interface α I distributions assuming ideal power-law distributions. Note that subnetwok distribution assumes that there is no subnetwork of size 1 (i.e., SD 1 = 0) and hence the plot is not symmetrical. We calculate the power-law exponent of the degree distribution α D for all interface α I and subnetwork α S combinations with an increment of 0.1. Note that some of the combinations in the vicinity of 1.0 x 1.0 have a α D = 0 (shown as black color) since they are infeasible. In a power-law distribution, when 1 < α < 2, the first moment (i.e., the average < k >) is infinite along with all the higher moments [40]. Similarly, when 2 < α < 3, the first moment is finite, but the second (i.e., the variance) and higher moments are infinite. Since such distributions contain extremely large values [41], obtaining a feasible configuration of interface and subnetworks (see Fig 6) becomes impossible.

Network connectivity
Utilizing pure probabilistic generation methods after assigning certain target degree to nodes performs poorly in terms of connectivity [32]. During our experiments, we observed that random matching of interfaces to subnetworks result in disconnected graphs and that the giant component shrinks as the subnetwork and interface distributions become steeper. In this section, we analyze issues in ensuring connectivity between all subnetworks and nodes during the 2-mode graph generation.
Generating a connected network with power-law degree distribution is not possible for all α I and α S pairs. Intuitively, if the ratio of single interface routers increases, it becomes harder or even impossible to generate a connected network. Similarly, having subnetworks with only two interfaces limit the number of configurations. Overall, every node with more than one interface can utilize its first interface to attach to the current giant component and each of the other interfaces to attach a new subnetwork. Hence, the condition that guarantees the existence of a connected configuration can be formulated as where S indicates the total number of interfaces in all of the generated subnetworks. Fig 8 plots the connectivity with respect to the power-law exponents of the subnetwork and interface distribution where the black line indicates above which connected networks cannot be generated. Similarly, the red line shows the region below which networks are infeasible irrespective of connectivity as discussed in Section 4.3. The figure also presents the power-law exponents of the interface and subnetwork distributions of the AS sampled by ASM [22] (also employed in the evaluations of Section 5.2) and observe them to be within the feasible region. When the interface and subnetwork distribution exponents are above the black line, there is no connected graph satisfying both power-law distributions.
As the slopes become steeper, the ratio of single interface nodes and/or ratio of smaller subnetworks increases, and eventually connectivity becomes infeasible. Analysis of the contour and the axes reveal that connectivity is affected more by the subnetwork distribution compared to the interface distribution. Note that, as the networks are generated through a random process and the obtained power-laws are not perfect the boundaries are not hard boundaries and

PLOS ONE
Generation of 2-mode scale-free graphs one may obtain a connected network with power-laws above the line. However, it becomes harder to find a connected network configuration as power-law exponents are closer to either boundary.

Generation methodology
Precise matching of the distributions can be challenging to achieve due to the discrete nature of the graphs [40]. After calculating all distributions using Algorithm 1, we generate vertices of the bipartite graph without any edges. We assign the interface count of each node and the size of each subnetwork according to the previously calculated ID i and SD j , respectively. We mix the order of the nodes and subnetworks to eliminate bias (i.e., obtain non-assortative graphs). If both subnetworks and nodes are ordered with respect to their size, the assortative mixing of the resulting graph would be high [42]. However, our analysis of measured AS networks indicate non-assortative connectivity in terms of attachment between low/high interfaces and subnetworks.
Algorithm 2: GenerateNetwork(Nodes, Subnetworks) Algorithm 2 provides the pseudo-code for the network generation from a given set of Nodes and Subnetworks that are produced with the Algorithm 1 using the interface and subnetwork distributions, respectively. In order to ensure connectivity (see Section 4.4), we employ a Connected set of subnetworks that are connected to the main component so far. As the network is modeled as a bipartite graph, SubNetG grows the network expanding the connected component to have a path between any subnetwork pairs. The design relies on the idea that at least one interface of each node is connected to a subnetwork that is already part of the connected component, while the rest of the interfaces of the node are connected to random subnetworks.
After initializing Edges to an empty set (Line 1), we assign a random Subnetwork to the Connected component (Line 2). The outer loop in Line 3 iterates over all Nodes that were randomly sorted and the inner loop (Line 10) iterates over each interface of the node (i.e., Nodes i ) after the first one (i.e., Nodes 1 i ) is connected to the Connected component (Lines 4-8). A random subnetwork is selected to be connected (Lines [11][12][13][14][15][16] and the selected subnetwork is appended to the Connected component (Line 17). Lines 5-8 and Lines 12-16 ensure that the selected subnetwork has room for a new node connection, and there is not already an edge between the node and subnetwork. If the subnetwork is full, it is removed from the respective list so that it is not redundantly selected (lines 6-7 and 14-15). Finally, the selected subnetwork is connected to the node's interface (Line 18). The algorithm terminates when all interfaces of all nodes have been considered for attaching to the subnetworks. Note that in practice we also need to check for the parameters to ensure feasibility before execution and consider the lack of subnetworks especially in the Connected component during network configuration. The time complexity of the algorithm is O(I + S) where I is the total number of interfaces in the network, and S is the number of subnetworks. While there are two outer loops at line 3-4, each node interface is processed only once. Likewise, the internal loops at lines 7 and 12 are executed only once for each subnetwork to be removed from the list when it becomes full.

Evaluation
In this section, we present samples of 2-mode power-law network generation with SubNetG. In Section 5.1, we analyze whether current topology generators reflect link-level subnetwork characteristic. In Section 5.2, we evaluate the data we obtained from large backbone Autonomous Systems sampled across the Internet. In Section 5.3, we show the results for the synthetic topologies that we generate using SubNetG.

Analysis of network generators
In this section, we assess whether network generators capture the link-level characteristic (i.e., subnetwork connectivity) of the Internet topology. As commonly employed network generators produce 1-mode graphs [43], we transform the 2-mode network of link-level Internet (i.e., subnetworks and nodes) into a 1-mode network between routers. Hence, we model subnetworks as cliques between all attached nodes, as shown in Fig 10. Note that this estimation is not perfect as neighboring subnetworks (such as three point-to-point links between the

PLOS ONE
same triple of nodes) may incorrectly be assumed as a single subnetwork (i.e., a three node subnetwork).
In order to analyze the clique distribution, we sampled the subnetwork based Internet2 backbone topology [37]. Note that, even though router-level topology of other networks are publicly shared, they do not provide the subnetwork information that is needed for a ground truth comparison. We convert the subnetwork topology to a point-to-point graph by replacing multi-access links of each subnetwork with a clique of links among the subnetwork nodes. Finally, we run a clique search on the graph and compare the distributions with the actual Internet2 Subnetwork distribution to analyze how successful the clique search approach captures the multi-access links.
Internet2 Subnetwork curve in Fig 11 illustrates the subnetwork distribution of the Internet2 backbone. The result of clique search is shown as the Internet2 Clique curve. We observe a slight difference between the subnetwork and clique distributions, i.e., at 2 and 3. In clique search, all point-to-point triangles are assumed to be a multi-access link with three nodes. Note that, similar incorrect assumptions in larger cliques have a negligible probability of occurrence as this will require all nodes in both subnetworks to be attached to each other. We also utilize the Orbis topology generator [33], to produce the same size graph with Internet2 Clique as the reference graph. Orbis can produce synthetic topologies of any size with the exact 1k or 2k distribution (marked as Generated in Fig 11), and rewire the original graph while preserving the 2k or 3k distributions (marked as Rewired in the Figure). Although the reference Internet2 graph includes up to 20-cliques, 1k or 2k synthetic topologies do not preserve the clique distribution. We observe that 3k rewiring preserves the clique distribution, but a close examination revealed that cliques larger than 3 were not rewired and remained intact.
Finally, we produced topologies replicating the characteristics of sample AS networks in Table 2. We generated synthetic graphs using Inet [30] and BRITE [6] with both of the Waxman and Barabasi-Albert (BA) models. For BA preferential attachment model, BRITE uses an m parameter to indicate the number of connections a new node makes. As the m value is increased, network density increases. However, this did not result in a significant increase in the size of cliques in the topology. Fig 12 shows the clique distributions of a sample AS 8928. As shown in the figure, both the number of cliques and the clique sizes in the real topology is significantly larger than the ones in the generated topologies. The generated topologies have clique sizes up to 20, but as seen from the figure, measured topologies of the similar sizes can have clique sizes up to 200. Inet produces largest cliques among other generators but it has a completely different distribution compared to the measurements. Both the Internet2 Subnetwork curve in Fig 11 and clique sizes and clique distribution of the measured AS in Fig 12 illustrate the frequencies of multi-access links in the Internet2 and real Internet topology, hence it shows the existence of subnetworks in the backbone. Overall, we observe none of the analyzed generators capture the subnetwork characteristics of the sampled backbone Internet topologies.

Sample network topologies
In this section, we analyze the topological characteristics of sample AS networks obtained from measurements of the Internet backbone via Autonomous System Mapper (ASM) [22]. ASM collects partial traces to every observed IP address of an AS from all border routers it could identify for the AS. After filtering trace anomalies such as loops and bounce-backs [44], ASM identifies subnetworks [45], IP aliases [46], and unresponsive routers [47] to infer the underlying link-level network of the AS. Even though ASM collects the most comprehensive snapshot of backbone AS, there may still be unmapped regions of the network. While public measurement platforms such as CAIDA Ark [48] and RIPE Atlas [49] provide samples of the Internet backbone, they do not comprehensively map all interface IPs of an AS. In a previous study [43], we had relied on such public measurement platforms [22,48,50] to estimate the power-law parameters. In this study, we realized that their samples yield power-law parameters that were not within the feasible range as discussed in Section 4.4. Hence, we deployed ASM to better capture link-level connectivity of AS topologies.
Interface distribution presents the histogram of systems (e.g., router and server) with a given number of interfaces. The number of interfaces on a system is typically equal to the number of IP aliases for that system. ASM utilizes MIDAR [51] and ASIAR [52] tools to resolve IP aliases in the sample topologies. Fig 13 present the interface distributions of three sample AS. The power-law exponent value of most of the measured AS vary between 2 and 3, as shown in Table 2. We defaulted the interface distribution of SubNetG to the average of sampled AS, i.e., α = 2.5.
Subnetwork distribution presents a histogram of the subnetworks with a specific size, i.e., the number of interfaces attached to the subnetwork [38]. For a given i, SD i indicates the number of subnetworks with i attached systems. Note that the minimum i is 2, as there should be at least two interfaces attached to a subnetwork. This metric complements the interface distribution, where the total number of interfaces is equal to the sum of the subnetwork sizes. We use ASIAR [52] to infer the subnetworks of the AS based on the BGP announcements of the AS. Fig 14 presents the subnetwork distributions of three sample AS. The power-law exponent value of the sampled ASes is between 1.5 and 4, as shown in Table 2. We consider the average of samples, i.e., α = 2.4, as the default while generating synthetic topologies with SubNetG. Degree distribution is the most utilized metric to characterize a network. Degree distribution represents a histogram of the nodes with a certain degree and gives insight into the structure of the network. For each node, its degree is computed from the number of nodes that are within one-hop distance. For the sampled backbone AS, the power-law exponent α value is mostly within 2 and 3 range and has an average of 2.45. Fig 15 shows the degree distributions of three sample AS networks. Table 2 presents the power-law exponent for the Interface, Subnet, and Degree distributions of 12 AS mapped by ASM. As research has shown that visual verification of the linearity in the logarithmic scale may be misleading, we utilize [41] for fitting and verifying the observed power-laws. Note that, as the empirical data is not ideal power-law distributions, the α D measured and α D expected differ for individual AS measurements (such as in AS 9505, 33891 and 262589). Overall, the median and mean difference between the expected and measured α D is 0.07 and 0.09, respectively. Similarly, generated networks employ a random configuration of subnetworks and node interfaces after establishing a minimum spanning tree to ensure connectivity, and hence α D generated may differ from both as seen with AS 4826, 31133 and 262589. Overall, the median and mean difference between the expected and generated α D is 0.04 and 0.08, respectively, indicating that generated networks are closer to the expected distributions with the exception of a few outliers. One may generate new networks until a configuration within a threshold of α D is obtained. Generated networks are further analyzed in the next Section.

Synthetic topologies
In this section, we analyze synthetic topologies generated with the interface and subnetwork distributions of the measured backbone AS. While SubNetG matches the measured power-law exponents of the interface α I and subnetwork α S distributions, it does not directly match the degree distribution α D . We observe that the resulting power-law exponents of the degree distributions to be similar to the genuine networks as presented in the Table 2. Furthermore  We observe that, the generated interface and subnetwork distributions are different from the measured distributions even though the powerlaw exponents are the same. Particularly, the CCDF of measured distributions seem to contain greater perturbations, which is expected in empirical data [41].
Additionally, we generate synthetic topologies with 1K, 10K, 100K, and 1M nodes. Based on the averages of AS measurements presented in Section 3, we set α ID = 2.5 and α SD = 2.4. Fig 19 present the interface, subnetwork, and degree distributions, of the generated networks. Even though the generation method did not consider the degree distribution directly, the Although sample topologies presented in this study use data obtained from ASM [22], the presented algorithm is independent of the specific data set and can produce synthetic topologies with any feasible set of distribution. Generation parameters α DD , α ID and α SD can be selected from any measurement dataset and ported as the reference topology. Moreover, the user can supply any feasible (for feasible parameter space) set of parameters to be matched. Overall, we observe that produced networks have a similar interface, subnetwork, and degree distributions to the genuine topology they are modeled after.

Conclusion
Currently, synthetic network generators for the Internet topology ignore the multi-access links and model the network as consisting of point-to-point links. However, multi-access links such as FDDI ring, and Ethernet are widely deployed as link-layer technologies at the backbone networks. To assess the need for 2-mode modeling, we analyzed the impact of subnetworks on the Internet topologies by comparative graph structure analysis of current network topology generators and performed comparative network simulations. Our analyses on the previous network topology generators revealed that neither the subgraph structures nor the bandwidth related characteristics of the Internet topology are represented by the generated graphs. Additionally, we analyzed the interface and subnetwork size distributions of sample backbone AS in addition to the degree distribution that the current power-law based topology generators focus on. In our analysis of top ranked backbone AS, we observe that both subnetwork and interface distributions occasionally exhibit power-law characteristics similar to the degree distribution.
We introduced SubNetwork Generator (SubNetG) that captures both the link-level interface and subnetwork distributions and the network-level degree distribution. We showed that the degree distribution is uniquely defined for a given pair of subnetwork and interface distributions that are ideal power-laws. We also showed the necessary condition for obtaining a connected graph with all distributions being a power-law. Note that, the generation parameters (i.e., interface and subnetwork distributions and network size) can be estimated based on the measurement results or provided by the user. Finally, we present synthetic networks and show that the SubNetG captures subnetwork characteristics of the link-layer Internet topologies as well the as degree distribution. The SubNetG can be used for generating synthetic network topologies at link-level, which can be utilized for simulating network protocols for more realistic link-layer behavior (i.e., multi-access links) as well as analysis of link-layer topologies that reflect the interaction between layer 2 (i.e., subnetworks) and layer 3 (i.e., routers). As future work, the SubNetG algorithm can be improved to eliminate potential outliers, consider router-level metrics such as rich-clubs that would represent network cores, and generate subnetworks with other distributions such as exponential, log-normal, and Weibull.