Structural and spectral properties of generative models for synthetic multilayer air transportation networks

To understand airline transportation networks (ATN) systems we can effectively represent them as multilayer networks, where layers capture different airline companies, the nodes correspond to the airports and the edges to the routes between the airports. We focus our study on the importance of leveraging synthetic generative multilayer models to support the analysis of meaningful patterns in these routes, capturing an ATN’s evolution with an emphasis on measuring its resilience to random or targeted attacks and considering deliberate locations of airports. By resorting to the European ATN and the United States ATN as exemplary references, in this work, we provide a systematic analysis of major existing synthetic generation models for ATNs, specifically ANGEL, STARGEN and BINBALL. Besides a thorough study of the topological aspects of the ATNs created by the three models, our major contribution lays on an unprecedented investigation of their spectral characteristics based on Random Matrix Theory and on their resilience analysis based on both site and bond percolation approaches. Results have shown that ANGEL outperforms STARGEN and BINBALL to better capture the complexity of real-world ATNs by featuring the unique properties of building a multiplex ATN layer by layer and of replicating layers with point-to-point structures alongside hub-spoke formations.


Introduction
Airline Transportation Networks (ATNs) attracted more attention as they are generally more efficient, safer, and can easily connect remote areas compared to other means of travelling [1]. In particular, network science studies are surfacing primarily on the U.S. airline network, as it is one of the most advanced transportation infrastructures in the world blending services offered by the commercial, military, and public environments [2][3][4]. BINBALL model by enforcing both a differentiated layer set sizes as well as a hub-spoke layer model observed in the early studies at ATNs [44]. Moreover, a different approach is defined in the ANGEL model, which removes attention from the preferential attachment to gain more influence on the intra-and inter-layer structure of the multilayer synthetic network created [45]. A preliminary analysis and comparison of these synthetic models followed recently [46], which we use as inspiration for this work.

Contributions
Our goal in this work is an extensive analysis of the three synthetic models, BINBALL, STAR-GEN, and ANGEL, in terms of topological, resilience, and spectral properties. Although all three models were formulated to mimic the same reference network, they differ in their approaches. One major concern is to demonstrate that the pure preferential attachment approach, which is adopted in BINBALL and STARGEN, is not sufficient to imitate the complex structure of an airline transportation network, especially viewed as a multiplex. By contrast, being designed to generate a multiplex layer by layer and to balance between the number of hub-spoke and point-to-point structured layers, the ANGEL model lends itself as the most sophisticated generative model to effectively mimic a real-world ATN. Experimental results from the various stages of analysis we carried out, have indeed unveiled that only the ANGEL provides reliable approximations in all facets of the complex reference networks, while STAR-GEN and BINBALL perform comparably mainly on the macroscopic level, i.e., when viewing the entire multiplex. We compare the synthetic networks to the reference ones also in terms of resilience, in both site and percolation process scenarios, under different types of attack. Besides, we investigate on the failure effects in relation to the presence of both hub-spoke and point-to-point structures, which is a unique feature of the ANGEL model. Moreover, we analyze spectral and eigenfunction properties of the synthetic and reference networks, based on the Random Matrix Theory modeling approach. Even in this evaluation stage, ANGEL turns out to be the best suited model w.r.t. both EATN and USATN.
The remainder of the paper is organized as follows. Section Background gives an overview of the three methods under study as well as the reference networks we use for validation. Section Topological Analysis encompasses our extensive structural analysis of the synthetic networks generated by the three methods versus the real instances. Section Resilience Analysis is devoted to site and bond percolation process to validate the resilience behavior of the synthetic and reference networks, whereas Section Spectral Analysis contains our study on the eigenfunction properties of the synthetic and reference networks. Also, an insight into the efficiency of the three generative methods is provided in Section Running Times. Section Discussion summarizes the key traits of the three methods, highlights their features but also points out their limitations. Finally, in Section Conclusions, we summarize our contributions and give an outlook on what our work can be used for and how it could be continued.

Background
Before we commence our comprehensive study, we provide background information on both the synthetic network generation models and reference network data that we will use in this work.

Synthetic network generation models
As it is well-known in the complex network literature, a multiplex is a multilayer network, the layer graphs of which are defined on the same set of nodes.
In our setting, layers correspond to different airlines, nodes to airports, and edges to flight connections; this means that two or more flights for the same pair of airports may occur though referring to different airlines. Each node in the multiplex can be considered both from a local and a global perspective: this reflects on the notion of degree, which hence can be defined locally, i.e., the degree of a node within a particular layer graph or globally, i.e., the total degree of a node across all layers.
Let n, m, and l denote the total number of nodes, edges, and layers in the multiplex to be generated. BINBALL initializes the node set V L in each layer L by uniformly distributing the n nodes (i.e., by dividing the multiplex node-set V into possibly equally-sized subsets). At each iteration, an edge is created and added to a layer selected uniformly at random. To create an edge, the two end-nodes are selected in a preferential attachment manner according to their local and global degrees; more precisely, one end-node u is sampled from a probability distribution f L (V L , Θ) that is directly proportional to the local degree of u, and the other end-node v is sampled from a probability distribution f M (V, Θ) that is directly proportional to the global degree of v, where Θ denotes a set of parameters that might account for node weighting schemes. We refer to f L (�) and f M (�) as the local and global preferential attachment functions, respectively. It should be noted that BINBALL produces a multiplex composed of layers with similar sizes for both the node and edge sets. This limitation is overcome in STARGEN, which is designed to differentiate the growth of the layers, allowing for different sizes according to a non-uniform distribution P edgeL of the layers' edge counts. Moreover, STARGEN introduces local and global scaling factors in the probability functions f L and f M , respectively, so to impact on the diversification of the intra-layer topology based on the different weights assigned to each layer. Despite their differences, BINBALL and STARGEN share a common network generation scheme, which is captured in Algorithm 1. Note that we use superscripts (S) and (B) to distinguish between the preferential attachment functions and parameters used in STARGEN and BINBALL, respectively. We refer the interested reader to [6,44] for further details on the two models.

Algorithm 1: BINBALL and STARGEN models
Input: Total number of nodes n, edges m, and layers l desired in the multiplex, distribution P edgeL (uniform for BINBALL) for the layers' edge-set sizes Output: Layers L 1 , . . ., L l and the multiplex M {Initialize data structures} 1: Let L 1 , . . ., L l be empty graphs representing layers 2: Let M be the multiplex with n isolated nodes 3: for i = 1 to m do 4: Sample layer L from P edgeL 5: Sample node u from the local preferential attachment function f ðBÞ L ðV L ; Y ðBÞ Þ, resp. f ðSÞ L ðV L ; Y ðSÞ Þ 6: Sample node v from the global preferential attachment function f ðBÞ M ðV; Y ðBÞ Þ, resp. f ðSÞ M ðV; Y ðSÞ Þ 7: Add the edge (u, v) to L and M 8: Update local and global degrees of u and v 9: end for The hub-spoke layers produced by both BINBALL and STARGEN result in homogeneous structures due to the way a preferential attachment method is applied. However, as found in [45] based on a thorough investigation of the EATN network, most layers appear to show a mixture of both hub-spoke and point-to-point structures. Addressing this crucial aspect is a major focus in the ANGEL model, the algorithmic scheme of which is sketched in Algorithm 2.
Initially, ANGEL distributes the nodes of the multiplex among the layers (according to the distribution P layerN ), favoring their overlapping (according to the distribution P nodeL ). The next stage is the identification of hubs. For this purpose, since hubs are usually found as nodes with a central geographical location, a further notable enhancement introduced in ANGEL is that it incorporates the spatial location of the nodes in the network, by randomly distributing the nodes of a layer on a grid (with a shape proportional to the square root of the node-set size); next, a minimum spanning tree is computed according to the Euclidean distance between the nodes, and eventually used to identify the hubs of that layer. A hub-subnetwork is then formed by using the configuration model with respect to a degree sequence sampled uniformly between 1 and the total hub count in the multiplex.
Unlike BINBALL and STARGEN, which generate all layers simultaneously, ANGEL enables each layer to be formed separately from one another, in such a way that point-to-point structures in the layers are mimicked alongside the hub-spoke structures. For each layer, nodes are assigned to points on a grid and a minimum spanning tree is computed. For the point-to-point strategy, too long and too short distances are avoided while adding a number of edges randomly chosen in the range between one and the difference between the edge-set size of the replicated layer and the edge count of the minimum spanning tree calculated; the remaining edges are added according to a preferential attachment. To create the hub-andspoke structure of the layer, low degree nodes are iteratively connected with nodes close to the spatial center of the minimum spanning tree; then a certain percentage of edges sorted by increasing distance are chosen, with one end being a hub, and the remaining amount of edges is attached in such a way that low degree nodes but leaves are preferably linked with high degree nodes but hubs. The multiplex finally emerges as a multigraph obtained as the union of all nodes, discounting repetition, and all the edges in the layers, allowing the repetition from different layers. We refer the interested reader to [45] for further details.
Algorithm 2: An outline of the ANGEL model Input: Total number of nodes n, edges m, and layers l required in the multiplex, distribution P edgeL for the layers' edge-set sizes, distribution P nodeL of the node count per layer, distribution P layerN for the random selection of the number of layers a node appears in, and the percentage p of layers to be formed by the point-to-point strategy Output: Layers L 1 , . . ., L l and the multiplex M {Initialize data structures} 1: Let L 1 , . . ., L l be empty graphs representing layers 2: Let M be the multiplex with n isolated nodes {Assign nodes to layers} 3: for u 2 M do 4: Sample layer repetition count, r u , from the P layerN 5: Select r u different layers, according to P nodeL , to place u in 6: end for {Create hub-subnetwork} 7: Assign hubs to layers and create a multigraph on all hubs using configuration model {Create layers} 8: Assign number of edges to layers according to P edgeL 9: for i = 1 to bp � lc do 10: Call a point-to-point layer creation procedure for L i 11: Add all edges from L i to M 12: end for 13: for i = bp � lc to l do 14: Call the hub-spoke layer creation procedure for L i 15: Add all edges from L i to M 16: end for

Reference networks
The European ATN (EATN) was originally studied in [49] and extensively analyzed in our earlier work [45]. We refer the interested reader to the above works for further details, whereas here we provide an overview through a selection of statistics reported in Table 1.
The second reference ATN we consider is composed of the US domestic airline connections retrieved from https://openflights.org in 2018. This airline network, hereinafter referred to as the USATN, consists of 14 layers, 436 nodes, and 4483 edges. Table 2 summarizes the main structural characteristics of this network, for each layer and the multiplex in the bottom row. Figs 1 and 2 provide further details, which are described below. The leftmost plot in Fig 1 shows the number of nodes and edges as well as a boxplot with node degrees for each layer. The ordering on the x-axis corresponds to layers sorted by the node count. The next four boxplots compile the values for the density, the transitivity, the average shortest path length, and the average clustering coefficient, respectively, collected for all layers in Table 2. The rightmost plot captures the overlap across all layers, where each boxplot corresponds to one layer, say L, and consists of the values that are computed for every other layer in the network, L 0 6 ¼ L, where V L and V L 0 denote the set of nodes in layer L and L 0 , respectively. The ordering of the layers, i.e., on the x-axis, is determined by the median of the boxplots. The two plots from the left in Fig 2 display the cumulative histograms of the number of nodes and edges, previously listed in Table 2. The green curves represent the fittings to the exponential distributions that are used in the input for the ANGEL method (cf. Algorithm 2). Their parameters are listed in Table 3 alongside the KS-test statistics.
The remaining two plots to the right in Fig 2 concern the statistics on hubs. Nodes of this kind form the core of network structures that are typical for flight connections. The affinity of a graph G (with edge-set E) to build hub-spoke formations can be measured using the s-metric  value, s G m ¼ P ðu;vÞ2E degðuÞ � degðvÞ [50]. In [45], the s-metric formula is applied to set up an empirical definition of a hub. According to it, a node is identified as a hub if where V L and E L denote the set of nodes and edges in L, respectively. Next, the degree distribution of hub nodes within the sub-network of the USATN they induce is shown in the third diagram from the left in Fig 2. This substructure plays a special role in the ANGEL model [45] (cf. Algorithm 2, step 7) and is created after nodes and before edges are assigned to layers. The configuration model applied in that context is based on the assumption that degrees of nodes in this sub-network follow a uniform distribution. According to a KS-Test, this degree distribution fits the uniform distribution on the interval [0, 65] (shown in the plot) with the p-value of 0.6139 for the maximum negative deviation D − = 1.105.
Finally, the rightmost plot in Fig 2 refers to the term layer repetition count per node in Algorithm 2 and displays the histograms on how many layers a node shares. The orange line corresponds to the layer repetition count per hub, whereas the blue one to non-hubs. The fitting curve to the latter distribution, P layerN , is applied in the first step of the ANGEL model (cf. Algorithm 2, step 4), where nodes are allocated to the layers. We refer to Table 3 for the parameters and KS-test statistics of this exponential probability function.

Topological analysis
This section is divided into two subsections, each presenting the performance of the synthetic models under study-BINBALL, STARGEN, and ANGEL-against the reference networks, the EATN and the USATN, respectively. In addition, each subsection consists of two parts: the first takes into account the entire multiplex, the second focuses on the layers.
To begin with, let us provide some details of the evaluation methodology adopted. The three algorithms for the generation of synthetic networks are initialized with equal input data in terms of number of nodes, edges, and layers that come from the respective real network, as displayed in Table 4, alongside the additional input for the ANGEL algorithm. The remaining parameters are fixed as specified in [6,44,45].
The statistics we present base on 100 multiplex replicas generated for each synthetic model or one multiplex randomly selected from the sample. Each multiplex is represented by the set of layers it is composed of, and viewed as a multigraph. When an average line over the 100 samples is plotted, it is calculated in the following way: for each sample, y-values are sorted, then per x-value the average over 100 y-values is calculated. Usually, a thick colored line is the average and is drawn over the group of faded lines that correspond to 100 replicas in the background. The color code for the models and the reference is introduced in Fig 3 and carried throughout the section. Validation of the synthetic models versus the EATN Topological comparison of the multiplex networks. A macroscopic view of the synthetic multiplexes compared to the EATN is provided by the values compiled in Table 5. It displays, line by line, the minimum, average, and maximum values for the respective network statistics. The columns, except for the reference, relate to the minimum, mean, and maximum values per row, calculated over the 100 synthetic replicas with respect to the model given in the heading. As one can observe, the values of the synthetic networks differ from the EATN within a small range and especially the average values over the replicas stay close to the reference.
Next, we turn to a more detailed analysis of the generated multiplexes. In the first statistic we consider, the degree histograms shown in Fig 3, it can be noted the comparably good performance of ANGEL and STARGEN networks and the trailing off effect by the BINBALL model. This trend will continue.
The plots in Fig 4 give an insight into structural similarity aspects of the networks. Per plot, one instance from the sample of the considered model is randomly selected and compared with the reference. The darker the shading is, the higher the cosine similarity per node-pair is (i.e., the number of common neighbors of the two nodes normalized by the geometric mean of their degrees). Nodes on both axes are sorted by degree. The darkest zone in the lower right corner of the EATN triangle indicates that high degree nodes, i.e., hubs, are strongly connected among themselves. Apparently, this tendency is less pronounced for synthetic networks. However, ANGEL's relatively diverse transition stands out against the smoother textures of STAR-GEN and BINBALL. PDF e (x, l, s) as defined in Table 3 https://doi.org/10.1371/journal.pone.0258666.t004 In Fig 5, we present statistics on the sub-network of the multiplex induced by the hub nodes, cf. Eq (2). About 20% of the hubs in the reference network share their central role across the layers. ANGEL and STARGEN networks build hubs in number close to the EATN, but with fewer repetitions, as it can be read from the first plot on the left. Since airlines naturally offer connections between the central airports, the hub-subnetwork is a multigraph. According to the next chart in the row, every connection is offered twice on average in the EATN. When the repeated edges are discarded from the multigraph, the density is reduced by about a half. The same applies to ANGEL and STARGEN.
The high transitivity values confirm that the hubs are very well connected both in the EATN as in ANGEL and STARGEN networks. However, hubs in these replicas achieve higher maximum degrees than the reference, as the average degree distributions of the hub-subnetworks in the right plot show. Both histograms, for the ANGEL and STARGEN model, approach the uniform distribution. In ANGEL, it results from the application of the configuration model on uniformly distributed degrees in the creation of the hub sub-network (cf.  Algorithm 2, step 7). In STARGEN, the degree distribution is definitely a consequence of the preferential attachment method. Considering all charts in Fig 5, BINBALL replicas feature very poorly hub-spoke formations. Only a few hubs are counted, which leads to small graphs they span and deviations in the statistics. The weakness of BINBALL in generating networks spanned on hubs is confirmed by the low s-metric value (cf. Section Background), as shown in the bottom left plot in Fig 6. The boxplots show the s-metric values computed for the entire multiplex and normalized with the squared number of edges. Notice that the latter quantity is the same for each model or close to that in the reference. Both ANGEL and, in particular, STARGEN multiplexes consist of well-exposed hub-spoke formations, but not as strong as the EATN.
The middle chart of Fig 6 confirms the presence of hubs in all the considered networks. The degree assortativity is negative and close to 0 for ANGEL and STARGEN as in the EATN. Such values are expected in networks with hub-spoke structures, where low degree nodes are satellites of the strongly connected centers. Here again, we observe that hubs in BINBALL have the weakest attraction.
Finally, in the right plot in Fig 6, we show the average distributions of edge repetitions in the multiplex, i.e., how often one connection is offered by several airlines. As it can be noted, while the majority of the edges in all networks including the reference are simple, the replicas have fewer repeated edges, but with higher frequencies, compared to the EATN.

PLOS ONE
consolidated view on the degree distribution within the individual layers of a multiplex, the reference or a random selection from the 100 replicas.
Per chart, each boxplot shows all node degrees within one of the 37 layers of the multiplex. The ordering of the boxplots is performed with respect to the median followed by the mean value. Compared to the other models, ANGEL layers show the most diversity in the degrees within the layers, even if they have more leaves or high degree nodes than the reference. The STARGEN algorithm creates homogeneous layer formations where the majority of nodes have a degree less than 2. The uniform structure of BINBALL layers is striking again.
The next group of charts, Fig 9, presents histograms on node and edge numbers over a layer set per multiplex in the 100 sample. The thick line is the average distribution, the faded ones correspond to every single multiplex in the 100 sample. Although the fit to the reference for the node and edge number, P nodeL and P edgeL , is used in the input for ANGEL, this model struggles to mimic the outlier of the EATN, the layer with 128 nodes and 601 edges. The replication of layers with a smaller number of nodes and edges is more accurate. STARGEN offers the best performance in terms of mimicking the edge size, but mostly produces layers with higher node numbers than the reference, except for the largest. The BINBALL distributions are not surprising when recalling the uniform layer size constraints in the input.
The histograms in Fig 10 merge layer repetition counts per node. The value indicates how many layers a node belongs to. The nodes are divided into hubs (left plots) and non-hubs (right plots). Due to the insufficient number of hubs, the BINBALL curves are not plotted for hubs and hence the non-hub curve relates to almost all nodes. Considering hub repetitions in layers, we observe that STARGEN networks usually have hubs present in almost all layers, not like the EATN or ANGEL, and in ANGEL, fewer hubs with a higher repetition frequency than in the reference can be observed. Both ANGEL and STARGEN show overall a very good performance with respect to non-hub repetition counts. Recall that the fit to the reference curve for non-hubs, P layerN , is used in the input for ANGEL.
On the contrary, the structural characteristic of the EATN we consider next is difficult to replicate. The charts in Fig 11 offer another view on the layer node-set overlap. The boxplots correspond to the statistic defined in Eq (1), and express the percentage of nodes each layer shares with the others. For every chart except the rightmost one showing the reference network, a multiplex is randomly selected from the 100 replicas of the respective model. Compared to the reference, STARGEN and BINBALL form a quite uniform pattern. ANGEL shows a growing tendency, but only a few outliers reach the overlap over 50%, the majority of the EATN layers is around or close to.

Validation of the synthetic models versus the USATN
Topological comparison of the multiplex networks. At a first glance, the standard network statistics shown in Table 6 reveal that the USATN multiplex is more difficult to replicate by the synthetic models than the EATN. Remarkably, ANGEL, STARGEN, and BINBALL appear to be on average comparable to one another.
Considering the statistics plotted in Figs 12-15, ANGEL and STARGEN compete with each other as they have done in the validation against the EATN. Both fail to approximate the degree distribution of the USATN, however, follow its power-law fitting (Fig 12).
In Fig 13, the pattern created by the USATN in the charts for comparing the cosine similarity indicates that the overlap of the neighbors is greater, the higher the degree of the node. Recall that nodes are sorted by the increasing degree on both axes. This tendency can be observed in the synthetic models, and especially in ANGEL, but not to the extent as in the reference.
Plots in Fig 14 concern the graph induced by the hubs that have been previously identified in the layers. As we have learned from the validation against the EATN, nodes in BINBALL fail to be tagged as hubs. In ANGEL, only outliers reach the count of the reference, and it may

PLOS ONE
happen that very few hubs are present in the multiplex. The hub numbers in STARGEN networks stay steadily around half of the USATN. The small repetition rate of hubs in this reference network is difficult to replicate in the synthetic multiplexes. As the values of the densities and the degree distribution of the networks with discarded multiple edges (labeled as a simple graph in the middle and right plot) indicate, the hub sub-networks in ANGEL and STARGEN form connected structures comparable with the reference. However, from the deviations between the curve, we close on fewer repetitions of the edges in the synthetic multigraphs.
It is not surprising that the s-metric values of the multiplexes, presented in the left plot in Fig 15, are as high as when replicating the EATN. All synthetic models are designed to mimic airline networks with a predetermined affinity to form hub-spoke structures. However, they fail to reproduce the extremely high s-metric value of the USATN. The degree assortativity values of the replicas are also similar to those from the EATN, cf. the middle plot in the row. Clearly, the positive value for the USATN correlates to the cosine similarity charts, where congestion around high degree nodes has been observed. The plot to the right in Fig 15 confirms the presence of edges with a high multiplicity in the USATN, which has already been observed when analyzing the hub sub-network. The generative models, especially ANGEL, manage to produce multiplexes having edges with a comparably high repetition rate as the reference, but with a lower frequency.
Topological comparison of the layers. Following the outline of the analogous section when validating against the EATN, we look first at the layer structures created by the generative models.
In Fig 16, alongside the boxplots consolidating the s-metric in every layer within the 100 replicated multiplexes or in the 14 of the USATN, a layer having a minimum (top) and  Considering the boxplots with node degrees in a set of layers of one randomly selected multiplex from the 100 sample (Fig 17), we observe that although ANGEL layers have more leaves compared to the reference, they have nodes in a wider range of degrees. This diversity is missing in STARGEN. Here, either low or high degree nodes have the majority within a layer.
The homogeneous trend in STARGEN is continued with regard to the number of nodes in the layers (Fig 18), which remains in a narrow range between 100 and 200. The performance is much better when layer edge numbers are replicated. However, this is to be expected, since the model is initialized with the fit to the reference distribution studied here. The same applies to ANGEL. In both cases, the exponential fitting to the layer edge number distribution, P egdeL , leads to remarkably long tails.
As it can be observed in Fig 19, the hub repetition count per layer is highly volatile, in ANGEL as well as in STARGEN. Recall that in BINBALL no hubs are counted. Nevertheless, the average curve for ANGEL progresses close to but above the reference. Considerably higher hub repetitions can be observed in STARGEN. On the contrary, the non-hub curves of these two models meet the reference.

PLOS ONE
The statistic presented in Fig 20 proves a weakness of all generative models that already has been observed when validating the EATN. The proportion of nodes that a synthetic layer has in common with others remains in a small range or is almost the same as in case of STARGEN or BINBALL. Nevertheless, the ANGEL model is the best-performing method.

PLOS ONE
The performance of the BINBALL model on statistics concerning the layer structure (Figs 17-20) is comparable to the previous reference. Remarkably, the percentage of nodes in layer intersection is much higher than when replicating the EATN.
Individual layer replication using ANGEL. Layers in the STARGEN and BINBALL models grow simultaneously and cannot be distinguished as exact replicas of particular layers of the reference. Not as in the ANGEL model, where the multiplex is built layer by layer, and therefore it is possible to imitate individual layers of the replicated multilayer network. In Fig  21, we present a selection of statistics introduced in [45] for validation of ANGEL's singlelayer replication against the EATN, here computed with respect to the USATN.
From the left, the average path lengths, the transitivity, the s-metric, and the number of hubs (cf. Eq (2)) are displayed. In every plot, one boxplot collects the measured value calculated for 100 replicas of one USATN layer tagged on the x-axis. The black solid lines connect the values of the reference layers. In the second plot, the grey line relates to the density values, that result in the same for the synthetic and real layers. Despite a few larger discrepancies, the overall approximation of the metrics is noticeable.

Resilience analysis
Having reviewed the structural properties of the synthetic networks we turn to test their resilience. Our first goal is to compare the resilience of the synthetic networks to the reference ones. Moreover, we want to investigate a way of building an optimally structured airline network with minimal failure effects; for this purpose, we will resort to the flexibility of the ANGEL model to balance between the number of hub-spoke and the point-to-point layers.
Like any other type of network, air traffic networks are rated according to their resilience. The closure of a site (i.e., an airport), or even a loss of a link (i.e., a connection between two airports) can have a huge impact on an air transportation network. The enormous costs, delays in hours or even days, and the entire logistical background that has to be diverted should not be underestimated. Keeping track of the succession of the failure of sites or bonds that leads to the total breakdown of the network is also a related aspect to investigate.
In our study, we simulate and analyze various attack strategies for site as well as bond percolation tasks. In general, in addition to a random attack (i.e., based on a random ordering of either the nodes or the edges), we contrast it with a structured sequence of failures, such as starting with nodes with the highest degree, or edges that connect them. We now elaborate on each of the proposed percolation strategies.

Site percolation
We define a global strategy and a local strategy for site percolation. In the global approach, the nodes are detached according to their closeness centrality with respect to the multiplex, whereas the local strategy utilizes the degree centrality with respect to the layers as criterion. If a node belongs to several layers, the maximum degree value determines the order. In both cases, the global closeness and the local degree centrality, nodes are removed in either decreasing or increasing order from the multiplex and so its layers.

Bond percolation
In the bond percolation, we pursue two main strategies in addition to the random one. In the first approach, the edges are sorted according to the degree of their end nodes. Either the minimum or the maximum end degree is taken for the ordering, i.e., for any two edges e 1 = (u, v) and e 2 = (r, s) we consider ðu; vÞ � ðr; sÞ , fðdegðuÞ; degðvÞÞ < fðdegðrÞ; degðsÞÞ; where f here denotes a placeholder for function min or max.
In the case of multiple edges, the local degrees in the biggest layer with respect to the number of edges are considered. In the second strategy, the edges are selected for removal depending on their multiplicity. In every percolation scheme we pursue, edges are removed from the multiplex and layers one by one. If an edge is multiple, the connection in the multiplex is only broken after all multiplicities of the edge have been removed. Note that, since layers are simple graphs, the layer an edge belongs to is uniquely determined, therefore only one layer at a time is affected when a non-singular edge is detached. Figs 22 and 23 show the results of site and bond percolation processes on the synthetic networks compared to the references. Each figure displays five strategies that are arranged in rows. For the site percolation, we have the random attack in the first row followed by node arrangements according to the decreasing and increasing closeness centrality. The two last rows display the strategies based on the local degrees of the nodes. Similarly, in the bond percolation, the first row shows the random attack, and the next two rows apply to the ordering of the edges by degrees of their ends, i.e., the decreasing minimum end degree and the decreasing maximum end degree. In the last two rows, the multiplicity of the edges determines the sequence of their removal.

Validation against reference networks
To catch a glimpse from the global and local point of view on the percolation process, we consider the following statistics. In each row, the first plot shows the percolation process on the entire multiplex whereas the subsequent three plots compare the resilience of the layers. The average, minimum, and maximum size of the largest component over all layers are computed and iteratively updated after a node or edge has been removed. In every plot, the relation of the percentage size of the largest connected component (y-axis) to the percentage of removed items (x-axis) is captured. The curves of the synthetic models represent average values over 100 synthetic multiplexes.
Let us first consider the plots concerning the percolation on the multiplex (i.e., the first column of plots in each figure). Looking at the site percolation results on both the EATN and the USATN, there is evidence of negligible differences in the effects on both the models and the references when using a random attack or an attack based on an increasing-order criterion. When applying the strategies based on decreasing-order criteria, in the EATN scenario, the three synthetic models tend to diverge over mid-regimes, with STARGEN showing similar resilience as the reference network, whereas BINBALL and ANGEL networks are less comparable to the EATN as their resilience is higher than the reference. In the USATN scenario, the three models tend to diverge up to high regimes in some cases, with BINBALL again being less comparable to the resilience of USATN, and ANGEL and STARGEN behaving similarly to the reference in this respect. Overall, the synthetic models exhibit a stronger resilience, which can generally be seen as an advantage when simulating ATNs. The results on bond percolation also reveal a fairly homogeneous behavior for all synthetic networks and their respective references across all regimes of the removal fraction, with the exception of BINBALL showing better resilience with respect to 50% and 75% of removed nodes. Now, we consider the plots showing the percolation effects on layers, i.e., the average, minimum, and maximum size of the largest component over all layers shown in the second, third, and fourth columns of plots in each figure. Concerning the site percolation results, we observe from the 'mean' plots that all models tend to be comparably or less resilient than the EATN or the USATN when using a random or an increasing-order criterion strategy. However, the opposite tends to occur when using a decreasing-order criterion. Looking at the 'max' plots, the STARGEN resp. ANGEL resilience tends to be the closest to the EATN when using an increasing-resp. decreasing-order criterion; for the USATN, ANGEL is the closest to this reference regardless of the criterion. BINBALL tends to be more resilient than all other networks, synthetic and real, only for the 'min' case with decreasing-order criterion and only over lowmid regimes. Apart from that, it generally remains the less preferred model in terms of resilience.
As for the bond percolation results, BINBALL resilience is generally lower than all other models including both references, at least when measuring the 'mean' and 'max' size of the largest component over all layers. Also, ANGEL and STARGEN seem to have a similar impact, with the former better on 'mean' plots over all regimes, and slightly worse over some restricted mid or mid-high regimes on 'max' plots. For 'min' plots, ANGEL again outperforms the other models with respect to the EATN.

Validation against the network structure
When further evaluating the resilience of the multiplex networks, we take into account variations in the layer structure. Indeed, the airline networks we analyze consist of layers with hubspoke or point-to-point structures, and it might be intuitive that point-to-point formations must be more resilient against local attacks in comparison to those centralized around hubs. To validate this conjecture, we will refer to the ANGEL model, as it is the only synthetic model that is able to replicate layers with point-to-point structures alongside hub-spoke formations, the latter being defaults in STARGEN and BINBALL.
In Figs 24 and 25, we consider the same strategies and their representation as in the previous section. For each strategy (i.e., row in each plot), 100 samples of ANGEL are generated with the fixed input data as used for mimicking the reference network (i.e., # nodes, # edges, # layers, and distributions P edgeL , P nodeL , P layerN ) except for the percentage of layers building a point-to-point or a hub-spoke structure. The thresholds used for the percentage of the pointto-point layers are 10%, 50%, and 90%, and the leftover corresponds to the hub-spoke layers. In this setting, the 10% threshold is the reference as it is the proportion of the point-to-point layers in the EATN and the USATN and also the default in the ANGEL model.
At a first glance, we observe from every plot in the figures a trend whereby higher percentages of point-to-point layers correspond to better resilience of the network, although significant differences in the percolation impact cannot be always detected. More specifically, for

PLOS ONE
both the EATN and USATN scenarios, the above trend is more frequently observed for decreasing-order-based strategies in site percolation, while differences tend to be negligible over all strategies for bond percolation (with some deviations around mid regimes for the 'min' plots). Moreover, for site percolation, we observe an opposite yet less evident trend for increasing-order-based strategies. This can be explained since by removing nodes with lower closeness or local degree first, the hubs remain to the end and substantially determine the largest component, thus higher resilience is observed for lower percentages of point-to-point layers.
The above results confirm our expectations, although in many cases we can report null or small difference in the resilience behavior when using 50% (or even the default 10%) of pointto-point layers rather than the highest 90% of such layers. This suggests that networks generated by the ANGEL model can be equally resilient regardless of a particular setting of the point-to-point layer proportion, which is a unique feature of the ANGEL model against the competing ones.

Spectral analysis
In this section, we analyze spectral and eigenfunction properties of the BINBALL, STARGEN and ANGEL models, and contrast them with the properties of our evaluation real-world ATNs, i.e., the EATN and the USATN. To this purpose, we resort to Random Matrix Theory (RMT) modeling.

RMT models and measures
RMT has numerous applications in many different fields, from condensed matter physics to financial markets (e.g., [51]). In the case of complex networks, the use of RMT techniques might reveal universal properties. Among several studies available in the literature we can mention, as examples, that (1) the nearest-neighbor spacing distribution P(s) of the eigenvalues of the adjacency matrices of various network models follow Gaussian Orthogonal Ensemble (GOE) statistics [52]; (2) the P(s) and the entropic eigenfunction localization lengths of the adjacency matrices of Erdös-Rényi networks are universal for fixed average degrees [53]; (3) spectral and eigenfunction properties of multilayer networks [54], random rectangular graphs [55], and bipartite graphs [56] are universal for properly-defined scaling variables; and (4) RMT-based scaling analysis allows to predict the performance of network discovery algorithms [57].
In light of the above motivations, here we also use RMT modeling to analyze spectral and eigenfunction properties of heuristic and synthetic ATNs. Moreover, we consider an important modification to the standard adjacency matrix definition: We consider weights for vertices and edges in the layers of the ATNs studied here. Our main motivation to include weights, particularly random weights (i.e., statistically independent random variables drawn from a normal distribution with zero mean and variance one), is to retrieve well-known random matrices in the appropriate limits to use RMT results as a reference, see for instance [53][54][55][56]. With this prescription, the adjacency matrix of a completely disconnected network becomes a diagonal random matrix, known in RMT as the Poisson limit, whereas a member of the GOE is recovered for a fully connected network.
The adjacency matrix of the multiplex networks we consider here is a block matrix structured as follows. All blocks are n × n matrices, n being number of nodes of the multiplex. There are l × l blocks, l being the number of layers in the multiplex. Let A ij , i, j 2 {1, . . ., l}, correspond to the block matrix in row i and column j. A diagonal block, A ii , i 2 {1, . . ., l}, corresponds to the adjacency matrix of the i-th layer. A ij = A ji , i, j 2 {1, . . ., l}, i 6 ¼ j are the so called quasi-identity matrices: we set 1 on the diagonal entry a kk if node k belongs to both layers i and j. All other entries are zero. Finally, we replace all diagonal entries of the block adjacency matrix with independent Gaussian variables with zero mean and variance equal to 2. Clearly, since the adjacency matrices of the ATNs studied here are very sparse, we expect to observe a scenario close to the Poisson limit.
We use exact numerical diagonalization to obtain the eigenvalues λ n and eigenfunctions C n (n = 1. . .N) of the adjacency matrices of size N = n × l of the ATNs. Specifically, in our study we use the nearest-neighbor spacings s, the ratios of consecutive level spacings r, and the entropic eigenfunction localization lengths ℓ to characterize spectral and eigenfunction properties of the adjacency matrices of weighted ATNs. We define s, r, and ℓ as follows.
Let λ n be a set of ordered eigenvalues, then the corresponding nearest-neighbor spacings s n and the ratios r n are s n ¼ l nþ1 À l n D and r n ¼ minðs n ; s nÀ 1 Þ maxðs n ; s nÀ 1 Þ ; respectively, where Δ is the mean eigenvalue density and r 2 [0, 1]. The probability distribution functions of s and r in the Poisson limit, that we will use below as a reference, are as follows [58]: respectively. It is important to stress that P(s) is already a well accepted quantity to measure the degree of disorder in complex networks, however, the use of P(r) is relatively recent; see an example in [59]. The entropic eigenfunction localization length of the eigenfunction C n is given as [53]: where S n is the Shannon entropy of C n , defined as S n ¼ À P N m¼1 ðC n m Þ 2 lnðC n m Þ 2 . In (

Validation against the EATN
In Fig 26 we show spectral and eigenfunction properties of the EATN (first column) and of the corresponding synthetic models (BINBALL, STARGEN and ANGEL in second, third and fourth columns, respectively). Specifically we report: (first row) a single spectrum λ n ; (second row) the density of eigenvalues or density of states, DoS; (third row) P(s); (fourth row) P(r); and (fifth row) the probability distribution of the entropic eigenfunction localization lengths, P(ℓ). We note that for the EATN all distributions (DoS, P(s), P(r), and P(ℓ)) are computed from a single randomly-weighted network; while in the case of the synthetic models all distributions are constructed from 100 random networks. Also, it is important to mention that P(s), P(r), and P(ℓ) were computed from 50% of the states located at the center of the spectra, where the states are expected to be equivalent.

PLOS ONE
From this figure we can see that none of the spectral quantities shown (λ n , DoS, P(s) nor P (r)) can distinguish between different ATNs. Moreover, while the DoS departs slightly from a Gaussian function (expected in the Poisson limit) both the P(s) and P(r) match the predictions corresponding to the Poisson limit; see Eq (4), also shown in red dashed lines.
Additionally, in Table 7 we report the values of hri, that indeed are very close to hri P � 0.38629 (reported from numerical simulations of diagonal random matrices [58]). This means that the EATN, as well as the corresponding synthetic models, show the properties of almostdiagonal random matrices; that is, all their eigenfunctions should be strongly localized. In the case of the EATN this is confirmed by the P(ℓ) which has a huge peak at ℓ = 2.07 and decreases in and exponential-like way for increasing ℓ. However, the P(ℓ) for the synthetic models presents an unexpected characteristic, that is, well defined local maxima for given ℓ � 2.07. This local maxima in the P(ℓ) of the synthetic models reveals the existence of a well defined nonnegligible set of extended eigenfunctions. Now, in order to roughly characterize the extension of those eigenfunctions we compute the average value of ℓ, hℓi, but excluding the values of ℓ that contribute to the peak of P(ℓ) at ℓ � 2.07. The obtained values of hℓi (shown as vertical red-dashed lines in the lower panels of Fig 26) are reported in Table 7, also for the EATN. Finally, it is relevant to remark that, even with the presence of extended eigenfunctions, ANGEL provides the best approach to the EATN in the case of eigenfunction properties.

Validation against the USATN
In Fig 27, we show spectral and eigenfunction properties of the USATN. We used the same coding as in Fig 26. We also report the values of hri and hℓi obtained for the USATN and the corresponding synthetic models in Table 8.
The situation we observe for the USATN is quite similar to the one reported for the EATN, with two noteworthy aspects that are reported as follows: • Both P(s) and P(r) for BINBALL fall below the Poisson limit for small s and r, see the insets in the corresponding panels. While the P(s) and the P(r) for the USATN, STARGEN and ANGEL fall on top of P P (s) and P P (r), respectively. This means that STARGEN and ANGEL reproduce better than BINBALL the spectral properties of the USATN.
• The P(ℓ) of the USATN shows a maximum for ℓ � 2.07, observed in the EATN analysis for the synthetic models only. Moreover, when comparing hℓi, it is fair to say that ANGEL provides the best approach to the eigenfunction properties of USATN.

Running times
All three algorithms for synthetic creation of multiplexes were implemented in Python 3.6.0 and carried out on 2.2 GHz Intel Core i7, macOS 10.14.5, 8GB RAM. The running times of the routines broken down by the replicated reference are reported in Fig 28. To compare the sensitivity of the algorithms to the data volume, we scaled the original input data size (node, edge, and layer number) with the factors 0.25, 0.5, 0.75, and 1.0 respectively. The ticks on the x-axis

PLOS ONE
correspond to these factors and are labeled with the corresponding number of layers, nodes, and edges. As it can be seen from the plots corresponding to both reference networks, all three models exhibit running times that grow linearly with the increase in the data size. However, while STARGEN and BINBALL appear to be less sensitive to the data size due to their simpler methodology, ANGEL computing times show a relatively higher growth. Nonetheless, despite the higher complexity in the design compared to the other two models, the ANGEL algorithm is still efficient as it scales linearly with the data size.
It should be noted that the above results are consistent with the time computational complexity analysis of the three models. In fact, for both BINBALL and STARGEN, the main loop iterates over all m edges that are to be added to the multiplex, and in each pass, two vertices are randomly selected from one layer; therefore, both methods have a time cost of Oðm � n L max Þ, where n L max is the maximum number of vertices assigned to a layer. The running time of the ANGEL algorithm depends on the calculation of a minimum spanning tree on a complete graph spanned on all nodes in each layer of the hub-spoke structure. We observed that ðn L max Þ 2 is OðmÞ, i.e., ðn L max Þ 2 ¼ c � m, and c does not exceed the value 5 in the considered network data.
Therefore, the running time can be estimated by Oðl � m � logðn L max ÞÞ, where l is the number of layers. This explains the moderate growth of the running times of ANGEL in relation to the increasing data size, in particular the number of edges in the multiplex.

Discussion
In this section, we provide a summary of the main findings of the generative models for synthetic multilayer ATNs that we have considered in this work. Building upon our analysis in the previous sections, we have identified a number of features that are relevant to the three models, as reported in Table 9 that will be used as a guide to our discussion.
The first three features in the table refer to the main model-parameters besides the three basic dimensions of the multilayer network (i.e., number of nodes, edges and layers), namely the distribution P nodeL of the layer node-set size, the distribution P edgeL of the layer edge-set size, and the distribution P layerN of the per-node layer replication. In this regard, BINBALL can only produce network layers with similar sizes of both node and edge sets; this limitation is partially overcome by STARGEN as it can handle non-uniform edge counts of the layers. In ANGEL, this aspect is significantly enhanced as not only this model is able to control the repetition of nodes in the layers but also their overlapping. Overall, according to all the above parameters, ANGEL represents the most flexible model. The above key advantage of ANGEL is further strengthened by its unique capability of controlling the formation of point-to-point structures in the layers alongside the hub-spoke structures, where the latter are also generally more exposed than in STARGEN and especially BINBALL. Moreover, this ANGEL's feature is coupled with another distinguishing aspect, which enables a sort of incremental generation of the layer networks: indeed, unlike BINBALL and STARGEN, ANGEL is designed to generate layers separately from each other, which enables the model to mimic individual layers of the reference multilayer network.
In terms of network resilience, while STARGEN and ANGEL again outperform BINBALL -w.r.t. most of the attack strategies considered, under site and bond percolation scenarios-ANGEL can be equally resilient regardless of a particular setting of the point-to-point layer proportion, which is a unique feature of the ANGEL model against the competing ones. In addition, spectral analysis results based on RMT modeling w.r.t. both EATN and USATN modeling have shown that ANGEL is the best approach to characterize spectral and eigenfunction properties of the adjacency matrices of weighted ATNs.
In terms of efficiency, all methods scale linearly with the size of the network, with BINBALL and STARGEN having identical asymptotic time complexity, and ANGEL being sensitive to the number of layers of the network.
Despite some notable features shown by the generative models analyzed, particularly ANGEL, it should be noted that there are a number of limitations that encourage further research on synthetic modeling of ATNs.
For instance, STARGEN tends to be the closest to the reference in terms of edge-set size, but at the cost of a higher number of nodes per layer. ANGEL delivers much higher diversification in terms of node degrees and a better overall percentage of nodes shared among the layers than BINBALL and STARGEN, however the proportion of nodes that a synthetic layer has in common with others remains in a small range compared to the reference network layers. BIN-BALL replicas feature hub-spoke formations with very weak attraction by hubs, which is certainly not the case in ANGEL and STARGEN, however here, hub-spoke formations appear not as strong as in the reference networks. Moreover, although ANGEL focuses on structures of the layers in the multiplex (as suggested by its name, i.e., Air Network Generation Emphasizing Layers) and to replicate them it incorporates parameters to control node-set and edgeset sizes as well as the layer repetitions per node, this model still weakens when it comes to reproducing the interconnection between the layers. The problem lies in the difficulty of capturing this multidimensional relationship, and thus to measure and model it.

Conclusions
Our presented study has focused on the opportunity of leveraging generative models conceived for the creation of synthetic multilayer ATNs in order to analyze the properties of realworld large-scale ATNs. To this end, we have thoroughly analyzed the state of the art for such models, namely BINBALL, STARGEN, and ANGEL, against two benchmark ATNs, i.e., the European and the U.S. ATNs. Our assessment concerned the ability of the three methods to be compared with the structural, spectral and resilience properties of the reference networks on both the global level of the multiplexes and the local level of the layers. We have also provided a summary of the key findings from our study to highlight the similarities and differences between the three models, as well as their limitations, both when compared with one another and with the reference networks.
We hope that our work can pave the way for further development of generative models for ATNs. Among the several directions that could be drawn, we raise the opportunity of incorporating side-information-in the form of attribute objects at node and/or edge level-as well as time-aware variables (e.g., flight departure and arrival times): indeed, while enriching the representation of an ATN and enabling the evolution over time of an ensemble of airline layers, these also lead synthetic generative models to deal with new challenges. Another interesting line of research would involve representation learning approaches and the exploitation of their knowledge patterns extracted from real-world ATNs into the design of a synthetic generative model for ATNs.
Finally, from an application perspective, we envisage the usefulness of synthetic ATN generative methods for simulating human mobility scenarios, also in critical situations like epidemic spread phenomena, so to help design appropriate intervention strategies and evaluate enhanced transport policies.