Measuring the complexity of directed graphs: A polynomial-based approach

In this paper, we define novel graph measures for directed networks. The measures are based on graph polynomials utilizing the out- and in-degrees of directed graphs. Based on these polynomials, we define further polynomials and use their positive zeros as graph measures. The measures have meaningful properties that we investigate based on analytical and numerical results. As the measures can be computed in polynomial time, our approach is efficient and can be applied to large networks. We emphasize that our approach complements the literature in this field as, to the best of our knowledge, existing complexity measures for directed graphs have never been applied on a large scale.


Introduction
Graph complexity measures have been studied extensively [1][2][3][4]. Although a large number of complexity measures have been defined, few deal specifically with directed graphs. However, many real-world networks such as transportation networks [5] and biological networks [6] are directed graphs whose edges express critical interactions, flows and so forth. Examples of complexity measures for undirected graphs include treewidth [2], cycle rank [2] and numerous so-called topological indices, see [4,7]. Some of the classical graph complexity indices like the distance-based Wiener index [8] or the graph entropy measure based on vertex orbits due to Mowshowitz [9] can be computed for directed graphs as well. For example, Knor et al. [10] studied the Wiener index on directed graphs. Other classical and distance-based measures like the Szeged index [11] could also be applied to directed graphs. But to the best of our knowledge, there is no body of literature that focuses on comparing structural graph measures for undirected and directed graphs.
Measures for analyzing directed graphs [12] include DAG-width [3], directed treewidth [13] and girth [1]. Treewidth and directed treewidth are both based on game theory applied to special graph decompositions. It might be difficult to apply these measures to large real-world networks. The girth of a directed graph has been defined as the minimum length of a directed cycle [1]; if the graph is acyclic, the girth is infinite [1]. Another technique is due to Bertz et al. [14]; they investigate the complexity of digraphs by identifying all possible subgraphs with a certain number of vertices representing patterns such as trees, paths, rings etc. Degree sequences are then used to quantify the complexity or diversity of the digraphs [14].
Hunter and Kreutzer [15] investigate the meaning of several methods for determining the complexity of directed graphs and point out differences between undirected and directed graphs. Berwanger and Grädel [16] use special tree-decompositions to define the graph measure entanglement and study its relationship to treewidth. Estrada and Hatano [17] define the measures reciprocity and returnability based on eigenvalues of special graph-theoretical matrices. Heterogeneity measures, interpreted as irregularity based on differences between in-degrees and out-degrees, have been developed by Ye et al. [18]. In this paper, we focus on complexity measures for analyzing complex networks. Another important branch of Quantitative Graph Theory [19] concerns measuring the similarity between networks. See [20] for an up-to-date survey of this area.
In this paper, we propose an approach that departs from the contributions sketched above. Based on the occurrences of out- and in-degrees of directed graphs, we define certain graph polynomials. We show that every directed graph can be characterized by an out- and in-degree polynomial. In order to obtain positive zeros, we define modified graph polynomials and show that they possess a unique, positive zero in the interval $(0, 1)$, depending on certain parameters. We then analyze properties of these polynomials and prove interrelations between their zeros. Based on these zeros, we define graph complexity measures and investigate issues such as the correlation between the measures and the homogeneity of the zeros associated with a graph.

New complexity measures for directed graphs
In this section, we introduce some preliminaries. The directed graphs [21] considered here are without loops and multiple edges.
Now, we define two special graph polynomials with real coefficients. Note that the coefficients capture structural information of the given graph.
Obviously, $P_{G,\mathrm{out}}(x)$ and $P_{G,\mathrm{in}}(x)$ have no positive zeros since their sequences of coefficients have no sign changes. This follows from Descartes' Rule of Signs, see [22]. In the following, we establish the conditions under which two associated polynomials have a unique, positive zero in $(0, 1)$.
Theorem 2.1 Let $G = (V, E)$ be a directed graph. Now we define the polynomials
$P^{*}_{G,\mathrm{out}}(x) := \alpha_{\mathrm{out}} - P_{G,\mathrm{out}}(x)$, (22)
$P^{*}_{G,\mathrm{in}}(x) := \alpha_{\mathrm{in}} - P_{G,\mathrm{in}}(x)$. (23)
There exist parameters $\alpha_{\mathrm{out}} \in \mathbb{R}$ and $\alpha_{\mathrm{in}} \in \mathbb{R}$ such that the polynomials $P^{*}_{G,\mathrm{out}}(x)$ and $P^{*}_{G,\mathrm{in}}(x)$ have a unique, positive zero $\delta_{\mathrm{out}}(G)$ and $\delta_{\mathrm{in}}(G)$, respectively. In fact, the values of $\delta_{\mathrm{out}}(G)$ and $\delta_{\mathrm{in}}(G)$ depend on $\alpha_{\mathrm{out}}$ and $\alpha_{\mathrm{in}}$, respectively.
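Proof: In order to simplify the notation, we write the polynomial as $P_G(x) = \sum_{i=0}^{k} a_i x^{i}$. Also, we need to define $P^{*}_{G}(x) := \alpha - P_G(x)$. Observe that $\lim_{x \to \infty} P^{*}_{G}(x) = -\infty$.
Since the sequence of coefficients of $P^{*}_{G}(x)$ has only one sign change, Descartes' Rule of Signs tells us that $P^{*}_{G}(x)$ has a unique positive zero $\delta$. Now $\delta \in (0, 1)$ if $P^{*}_{G}(0) > 0$ and $P^{*}_{G}(1) < 0$, that is, if $\alpha > a_0$ (26) and, finally, $\alpha < \sum_{i=0}^{k} a_i$ (28). Thus the Inequalities (26) and (28) provide a range of $\alpha$ for which $\delta \in (0, 1)$. It is evident that $\delta$ depends on the choice of $\alpha$ needed to satisfy the Inequalities (26) and (28). Finally, the theorem holds for the two different polynomials represented by the Eqs (22) and (23) along with the corresponding parameters $\alpha_{\mathrm{out}}$ and $\alpha_{\mathrm{in}}$.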
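The construction in Theorem (2.1) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation (the paper's computations were done in R). It assumes that the coefficient of $x^{i}$ in the degree polynomial counts the vertices with degree $i$, an assumption that is consistent with the worked values reported in the Examples section. The function names `degree_polynomial`, `evaluate` and `unique_zero` are hypothetical, and plain bisection is used as the root finder.

```python
from collections import Counter


def degree_polynomial(degrees):
    """Return [a_0, ..., a_k], where a_i counts the vertices of degree i.

    Assumption: this counting convention is inferred from the worked
    examples; it is not quoted from the paper's definitions.
    """
    counts = Counter(degrees)
    return [counts.get(i, 0) for i in range(max(degrees) + 1)]


def evaluate(coeffs, x):
    """Evaluate sum(a_i * x**i) with Horner's scheme."""
    value = 0.0
    for a in reversed(coeffs):
        value = value * x + a
    return value


def unique_zero(coeffs, alpha, tol=1e-12):
    """Bisection for the zero of P*(x) = alpha - P(x) in (0, 1).

    Requires P(0) < alpha < P(1), so that P*(0) > 0 > P*(1).
    """
    if not coeffs[0] < alpha < sum(coeffs):
        raise ValueError("alpha must satisfy P(0) < alpha < P(1)")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if alpha - evaluate(coeffs, mid) > 0:
            lo = mid  # P* is still positive, so the zero lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


# Illustrative digraph: a directed path 0 -> 1 -> ... -> 8 on 9 vertices.
edges = [(i, i + 1) for i in range(8)]
out_deg = [sum(1 for u, _ in edges if u == v) for v in range(9)]
p_out = degree_polynomial(out_deg)       # coefficients of 1 + 8x
delta_out = unique_zero(p_out, alpha=2)  # zero of 2 - (1 + 8x)
print(p_out, delta_out)
```

Bisection is sufficient here because $P^{*}$ is strictly decreasing on $[0, 1]$ whenever the polynomial has non-negative coefficients and is not constant.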
In the following, we elaborate briefly on the problem of choosing $\alpha_{\mathrm{out}}$ and $\alpha_{\mathrm{in}}$. Again, consider the Inequalities (26) and (28), whose parameters can be real numbers or positive integers. If we choose real numbers, we get an infinite number of polynomials $P^{*}_{G,\mathrm{out}}(x)$ and $P^{*}_{G,\mathrm{in}}(x)$ whose roots lie in the interval $(0, 1)$. If $\alpha_{\mathrm{out}}$ and $\alpha_{\mathrm{in}}$ are taken to be positive integers, the set of possible polynomials is finite. To determine the effect of the parameters on the roots, we appeal to the continuity theorem for complex and real polynomials, see [22]. This theorem states that the zeros of a polynomial are continuous functions of the coefficients of the polynomial, which means that a small change in the coefficients causes only a small change in the values of the zeros [22]. It is unclear who first proved the continuity theorem. Yet, it appears that a proof was already given by Weber in 1895, see [23]. Several other proofs of this statement have also been given independently, see [22].
Suppose $G$ is a directed graph and we wish to apply Theorem (2.1). To do this, we have to determine the sets $\{\alpha^{[1]}_{\mathrm{out}}, \dots, \alpha^{[p]}_{\mathrm{out}}\}$ and $\{\alpha^{[1]}_{\mathrm{in}}, \dots, \alpha^{[q]}_{\mathrm{in}}\}$ if we choose positive integers. By permuting the indices, the elements of each set can be arranged in increasing order. The sets of roots $\{\delta^{1}_{\mathrm{out}}(G), \dots, \delta^{p}_{\mathrm{out}}(G)\}$ and $\{\delta^{1}_{\mathrm{in}}(G), \dots, \delta^{q}_{\mathrm{in}}(G)\}$ in the interval $(0, 1)$ can also be obtained. Applying the continuity theorem may lead to a simplification. For instance, we always choose the minimum value, namely $\alpha^{[1]}_{\mathrm{out}} \in \mathbb{Z}$ and $\alpha^{[1]}_{\mathrm{in}} \in \mathbb{Z}$ satisfying the Inequalities (26) and (28). Consequently, we could reduce the problem to the zeros $\{\delta^{1}_{\mathrm{out}}(G)\}$ and $\{\delta^{1}_{\mathrm{in}}(G)\}$. In Section (3.5), we investigate numerically the effect of the parameters $\alpha_{\mathrm{out}}$ and $\alpha_{\mathrm{in}}$ on the zeros of $P^{*}_{G,\mathrm{out}}(x)$ and $P^{*}_{G,\mathrm{in}}(x)$.
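The enumeration of integer parameters described above can be illustrated as follows. This is a sketch under stated assumptions: the coefficient list `coeffs` is a hypothetical polynomial $3 + 2x + 3x^{2} + x^{3}$ with integer coefficients (it is not the polynomial of any graph in the paper), the admissible parameters are taken to be the integers strictly between $P(0)$ and $P(1)$, and the helper names are illustrative.

```python
def admissible_integer_alphas(coeffs):
    """Integers alpha with P(0) < alpha < P(1), for integer coefficients."""
    return list(range(coeffs[0] + 1, sum(coeffs)))


def zero_in_unit_interval(coeffs, alpha, iterations=60):
    """Bisection for the zero of alpha - P(x) in (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        p_mid = sum(a * mid ** i for i, a in enumerate(coeffs))
        if alpha - p_mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical coefficient list: P(0) = 3 and P(1) = 9.
coeffs = [3, 2, 3, 1]
alphas = admissible_integer_alphas(coeffs)
zeros = {alpha: zero_in_unit_interval(coeffs, alpha) for alpha in alphas}

for alpha in alphas:
    print(alpha, round(zeros[alpha], 6))

# The smallest admissible parameter yields the smallest zero.
alpha_min = alphas[0]
delta_1 = zeros[alpha_min]
```

Because $P$ is increasing on $(0, 1)$, the zero of $\alpha - P(x)$ increases with $\alpha$, so ordering the parameters also orders the zeros.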

Graph complexity measures
In this section, we define some complexity measures on directed graphs based on the findings of Section (2.1). Recall the two polynomials, defined earlier, based on out- and in-degrees (see the Definitions (2.4) and (2.5)), and the modified polynomials with unique positive zeros in the interval $(0, 1)$. Here, we argue that these zeros can serve as measures of the structural complexity of a directed graph. These measures are similar to those defined as a function of the eigenvalues of certain graph polynomials, see, e.g., [24,25]. The eigenvalue-based measures, represented by Eqs (33)-(36), have been defined with an eye to reducing their degeneracy, see also [24][25][26]. Degeneracy means that a measure is unable to distinguish between non-isomorphic graphs [4,26] and is thus an undesirable property. Taking account of this, we define the following measures.
Definition 2.6 $\delta_{\mathrm{out}}(G)$ and $\delta_{\mathrm{in}}(G)$ are the unique, positive zeros of the respective out- and in-degree polynomials. $I_7$ is the well-known edge density [27].

Examples
In the previous section, we briefly discussed how to find the parameters $\alpha_{\mathrm{out}}$ and $\alpha_{\mathrm{in}}$ by using Theorem (2.1). Now, we calculate the polynomials $P^{*}_{G,\mathrm{out}}(x)$ and $P^{*}_{G,\mathrm{in}}(x)$ as well as their roots in some special cases. Consider the graphs shown in Fig (1). For $G_1$, we obtain the polynomials from the Definitions (2.4) and (2.5). To determine the range of $\alpha_{\mathrm{out}}$, we use the Inequalities (26) and (28) and infer $3 < \alpha_{\mathrm{out}} < 9$. According to Theorem (2.1), $P^{*}_{G_1,\mathrm{out}}(x)$ has a unique positive zero in the interval $(0, 1)$ for these values. If we choose positive integers, we obtain the set $\{4, 5, 6, 7, 8\}$ of valid candidates. In Section (2.1) we explained that, due to the continuity of the zeros, it makes sense to choose the minimum value of this set in order to calculate the zero. Thus, $\alpha_{\mathrm{out}} = 4$ gives $\delta_{\mathrm{out}}(G_1) \doteq 0.683953$. Following the same procedure, we get $\alpha_{\mathrm{in}} > 2$ and $\alpha_{\mathrm{in}} < 9$. This leads to Eq (40), and solving Eq (40) finally gives $\delta_{\mathrm{in}}(G_1) \doteq 0.608309$. Similarly, for $G_2$ in Fig (1), we see that $P_{G_2,\mathrm{out}}(x) = P_{G_2,\mathrm{in}}(x)$. Also, we infer $\alpha_{\mathrm{out}}, \alpha_{\mathrm{in}} \in \{2, 3, 4, 5, 6, 7, 8\}$. Choosing the minimum value again, we obtain $\delta_{\mathrm{out}}(G_2) = \delta_{\mathrm{in}}(G_2) \doteq 0.125$. These findings are summarized in Table (1).

Numerical results

Software and computation
For the work of this paper, we used R [28] to generate the numerical results. To generate the classes of directed graphs, we used igraph and its functions [29], see Section (3.2). Also, we performed tests using igraph to ensure the graphs are pairwise non-isomorphic as well as connected. Moreover, the packages graph and QuACN were used to determine the in-degree and out-degree distribution of the generated graphs [30,31].

Definition of graph classes
In this section, we define the graph classes we used for performing the numerical analysis. Note that we always performed 1000 repetitions when generating the graphs as we deal with random graphs. Note that we have relied on the Erdős-Rényi model [32] to generate these digraphs where $\min(|E|) = 8$ and $\max(|E|) = 36$. The number of edges and their direction were randomly selected.
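A random digraph of this kind can be sketched as follows. This is an illustration under several assumptions and not the igraph-based procedure used for the paper: it assumes 9 vertices (the order of the random graphs discussed in the correlation analysis), draws the number of edges uniformly between 8 and 36, samples that many distinct vertex pairs, keeps the sample only if the underlying undirected graph is connected, and orients each edge by a fair coin flip. The function names are hypothetical, and the pairwise non-isomorphism test mentioned in the text is not included.

```python
import random
from itertools import combinations


def is_connected(n, edges):
    """Check connectivity of the underlying undirected graph by DFS."""
    adjacency = {v: set() for v in range(n)}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def random_connected_digraph(n, m, rng, max_tries=10_000):
    """Sample m distinct vertex pairs, require connectivity, orient at random."""
    pairs = list(combinations(range(n), 2))
    for _ in range(max_tries):
        chosen = rng.sample(pairs, m)
        if is_connected(n, chosen):
            # Each undirected edge receives one of its two orientations.
            return [(u, v) if rng.random() < 0.5 else (v, u) for u, v in chosen]
    raise RuntimeError("no connected sample found within max_tries")


rng = random.Random(2019)   # fixed seed for reproducibility
m = rng.randint(8, 36)      # edge count within the range stated above
edges = random_connected_digraph(9, m, rng)
print(m, edges)
```

Rejection sampling is used for simplicity; for sparse edge counts close to 8 many samples are discarded, which is acceptable for graphs of this size.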
Definition 3.2 $\mathcal{G}^{1}_{2}$ contains 500 directed, connected, hierarchical graphs $G = (V, E)$ which are randomly generated. The vertex and the edge sets are given by [33]. These graphs have four levels. Level 0 is the root level. $|V_i|$, $0 \le i \le 3$, is the number of vertices on level $i$. $E_1$ represents the set of edges which jump exactly one level. $E_2$ is the set of jump edges which over-jump at least one level. Here, all edges move upwards, that is, from level 3 towards the root level. Thus, $5 \le |V| \le 30$.
Note that these graphs are usually called directed universal graphs and have been introduced in [33]. An example of such a graph is depicted in Fig (2). We choose hierarchical (random) graphs for our analysis because they appear in many real-world applications, see [33], and in many disciplines such as biology, management and manufacturing, see [34,35]. Noteworthy are BOM (Bill of Material) structures [34,35]. These graphs have been widely used to analyze production systems and to represent optimization tasks.

Correlation analysis
In this section, we discuss correlations between the graph measures applied to the classes of graphs defined above. The discussion is limited to the results shown in Figs (3) and (4). Other correlations have been found but are not presented explicitly here. We begin with the results shown in Fig (3) for graph class $\mathcal{G}_1$. Observe that there are many degenerate cases, i.e., non-isomorphic graphs having the same measure values. Interestingly, the two zeros of the polynomials $P^{*}_{G,\mathrm{out}}(x)$ and $P^{*}_{G,\mathrm{in}}(x)$, represented by the measures $I_1$ and $I_2$, are rather weakly correlated, see Fig (3a). Thus, these indices capture structural information differently on random graphs with 9 vertices. A plausible explanation for the higher values of the Spearman correlation for the random graphs of $\mathcal{G}_1$ (shown in Fig (3d), (3g) and (3h)) is that $I_3$, $I_4$ and $I_5$ also depend on $I_1$. For some combinations of $I_1$ vs. $I_j$, $2 \le j \le 5$, the correlation is also weak and, hence, the associated measures give quite different values. For instance, this is the case for $I_1$ and $I_2$ on $\mathcal{G}^{1}_{2}$. A possible reason for the weak correlation and for the wide spread shown in the scatter plots is that the underlying hierarchical graphs have a more distinct structure compared to the completely random graphs in $\mathcal{G}_1$. This also implies that the degeneracy is much lower compared with $\mathcal{G}_1$, which also holds for the scatter plots shown in Fig (3) for $\mathcal{G}^{2}_{2}$. All the values of the Spearman correlation are given in Table (2). Similar results are obtained for all other combinations of $I_i$ vs. $I_j$, $2 \le j \le 5$, $2 \le i \le 6$, which is the reason they are not shown explicitly here. Finally, consider Fig (4). These scatter plots, as well as the values in Table (2), show that the edge density $I_7$ has no structural relationship with any of the other measures. The left column of Fig (4) also shows that $I_7$ is highly degenerate. This is not surprising given its definition.
As indicated earlier, the measures applied to graph classes containing hierarchical graphs have fewer degeneracies.
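For readers who wish to reproduce this kind of analysis, the Spearman correlation between two measures can be computed as sketched below. The sketch assumes that tied values receive the average of their ranks, which matters here because degenerate measures produce many ties; whether the original analysis handled ties this way is not stated. The input lists are made-up values, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of entries equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up measure values for five graphs; two graphs tie in the first list.
measure_a = [0.125, 0.125, 0.40, 0.61, 0.68]
measure_b = [0.30, 0.20, 0.45, 0.50, 0.90]
print(spearman(measure_a, measure_b))
```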

Extremal graphs/relations
Now we take a closer look at graphs that attain maximum or minimum values under the graph measures of Definition (2.6). Fig (5) shows two graphs for which $\max(I_1)$ and $\min(I_1)$ are attained. Take graph $G_3$ as an example. To calculate the zero of $P^{*}_{G_3,\mathrm{out}}(x)$, we apply Theorem (2.1) and solve the equation $P^{*}_{G_3,\mathrm{out}}(x) = 0$ (47). According to Theorem (2.1), we know that Eq (47) has a unique, positive zero in the interval $(0, 1)$, which depends on the parameter $\alpha_{\mathrm{out}}$. Solving Eq (47) gives $x = \sqrt[8]{\alpha_{\mathrm{out}} - 8}$.
So, to maximize the graph measure $I_1$, we have to determine $\max_{\alpha_{\mathrm{out}}} \{\sqrt[8]{\alpha_{\mathrm{out}} - 8}\}$. Note that Theorem (2.1) also gives $8 < \alpha_{\mathrm{out}} < 9$. In Table (3), we set $\alpha_{\mathrm{out}} = 8.5$ in order to compute the values of the graph complexity measures. Clearly, this is the maximum value of $I_1$ for the given parameter. The case $\min(I_1)$ for $G_4$ can be shown analogously. Now, we are in a position to begin our analysis of extremal conditions and relations between digraphs.
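As a worked check of this computation, assume that Eq (47) has the form $\alpha_{\mathrm{out}} - 8 - x^{8} = 0$, which is the form implied by the stated solution and by the range $8 < \alpha_{\mathrm{out}} < 9$. Under this assumption:

```latex
\begin{align*}
P^{*}_{G_3,\mathrm{out}}(x) = \alpha_{\mathrm{out}} - 8 - x^{8} = 0
  &\;\Longleftrightarrow\; x = \sqrt[8]{\alpha_{\mathrm{out}} - 8}, \\
\alpha_{\mathrm{out}} = 8.5
  &\;\Longrightarrow\; x = 0.5^{1/8} \approx 0.917 .
\end{align*}
```

Since the eighth root is increasing in $\alpha_{\mathrm{out}}$ on $(8, 9)$, the zero approaches 1 as $\alpha_{\mathrm{out}}$ approaches 9, so the numerical value depends on the parameter chosen.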
Theorem 3.1 Let $G = (V_G, E_G)$ and $H = (V_H, E_H)$ be two digraphs. Define and assume that there exist graphs $G$ and $H$ with the given polynomials. Also, assume that the conditions of Theorem (2.1), namely, are satisfied, and there exists $\alpha$ satisfying the Inequalities (50) and (51). The equation
Table 3. Polynomials, parameters and graph measures for calculating $I_1$ and $I_2$ for $G_3, G_4 \in \mathcal{G}_1$.
Assuming $P_H(\delta_{\alpha}(G)) < 0$, we conclude that Inequality (52) must be satisfied. But as $\alpha > a^{G}_{0}$, Inequality (55) is always satisfied. Finally, we state the following theorem.
assuming that there are graphs $G$ and $H$ with the given polynomials. Now, both $\delta_{\alpha}(G)$ and $\delta_{\alpha}(H)$ lie in $(0, 1)$.
Proof: From the Inequality-System (60), we derive But $P_G(x) > P_H(x)$ implies Inequality (61).
Consider the two graphs $G_4$ (see Fig (5)) and $G$ (see Fig (2)). We use these examples to demonstrate Theorem (3.1). In this demonstration, we consider only the out-degree polynomial and obtain Note that $G_4$ and $G_2$ (see Fig (1) and Table (1)) have the same polynomial, $P_{G_4,\mathrm{out}}(x) = P_{G_2,\mathrm{out}}(x)$, but the two graphs are non-isomorphic. We call such polynomials degenerate. The Inequalities (26) and (28) give $1 < \alpha < 9$ for $G_4$ and $6 < \alpha < 23$ for $G$. So, if we choose, e.g., $\alpha = 7$, these two inequalities are satisfied. So, we need to check whether the Inequality (53) holds.
The two graphs shown in Fig (6A) provide another illustration of Theorem (3.2). Again, we deal only with the out-degree polynomials. From Fig (6), we determine The two graphs have a different number of vertices and edges but their underlying out-degree polynomials have the same degree. Moreover, the Inequality-System (60) is satisfied. Given the Inequalities $4 < \alpha < 8$ for $G_5$ and $6 < \alpha < 13$ for the second graph, we can choose $\alpha = 7$. Calculating the zeros of the polynomials represented by the Eqs (68)

Homogeneity of the zeros: influence of $\alpha_{\mathrm{out}}$ and $\alpha_{\mathrm{in}}$
In this section, we briefly investigate the influence of $\alpha_{\mathrm{out}}$ and $\alpha_{\mathrm{in}}$. To measure the divergences between the resulting zeros, we define a homogeneity measure. For this discussion we restrict attention to out-degrees; analogous results hold for in-degrees. Let $G$ be a directed graph with associated polynomials $P_{G,\mathrm{out}}(x)$ and $P^{*}_{G,\mathrm{out}}(x)$. Suppose the Inequalities (26) and (28) hold, and let $S^{G}_{\mathrm{out}}$ denote the resulting set of zeros. The homogeneity of $S^{G}_{\mathrm{out}}$ is measured by a score $h$; the value for in-degrees is defined similarly. A high $h$-score indicates that the set $S^{G}$ is inhomogeneous, while a small value of $h$ indicates that $S^{G}$ is highly homogeneous. The definition is illustrated in Fig (7).
The homogeneity values are plotted against the number of polynomials in Fig (7). First, observe that the distributions of the homogeneity values for out-degrees and in-degrees look very similar. Also, we observe that the differences between the zeros are quite small, which implies that homogeneity is high. This can be seen from the value range in Fig (7). This result is not surprising in view of Section (2.1), where we explained that the zeros of a polynomial are continuous functions of the coefficients of the polynomial. In fact, if we vary $\alpha_{\mathrm{out}}$ or $\alpha_{\mathrm{in}}$, the resulting polynomials differ only in their constant terms, see the Eqs (22) and (23), so their coefficients are quite similar. Therefore, the small differences between the roots (and the high homogeneity) reflect the continuity theorem for complex and real polynomials, see [22].

Computational complexity
In this section, we briefly sketch some ideas to determine the computational complexity of computing $\delta$. Note that calculating the vertex degrees requires polynomial time, i.e., $O(n^{2})$, where $n$ is the order of the input graph. Assigning the out- and in-degrees to the monomials $x^{i}$ can be achieved in constant time, and adding up those terms requires linear time, i.e., $O(k)$, where $k$ is the degree of the polynomial. Altogether, we see that we are able to construct an efficient algorithm to compute $\delta$.
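The steps above can be made concrete with a short sketch. It assumes the graph is given as an edge list, in which case the degrees are obtained in a single pass over the edges (within the $O(n^{2})$ bound stated above, which corresponds to scanning an adjacency matrix), and it assumes that the coefficient of $x^{i}$ counts the vertices of degree $i$. The function names and the example graph are illustrative.

```python
def degree_sequences(n, edges):
    """Out- and in-degrees from an edge list in one pass over the edges."""
    out_deg = [0] * n
    in_deg = [0] * n
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    return out_deg, in_deg


def coefficients(degrees):
    """Coefficient list [a_0, ..., a_k]; one constant-time update per vertex.

    Assumption: a_i counts the vertices whose degree equals i.
    """
    coeffs = [0] * (max(degrees) + 1)
    for d in degrees:
        coeffs[d] += 1
    return coeffs


# Example: 8 vertices all pointing to vertex 0 (an in-star on 9 vertices).
edges = [(i, 0) for i in range(1, 9)]
out_deg, in_deg = degree_sequences(9, edges)
p_out = coefficients(out_deg)   # 1 + 8x
p_in = coefficients(in_deg)     # 8 + x^8
print(p_out, p_in)
```

Locating the zero of the modified polynomial then requires only a bounded number of polynomial evaluations, each of cost $O(k)$, if a bracketing method such as bisection on $(0, 1)$ is used.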

Summary and conclusion
In this paper, we have introduced new complexity measures for real networks. One reason for developing an alternative to degree-based measures such as the Zagreb indices [4] or entropies based on vertex degrees [36] is their high degeneracy, meaning that many pairwise non-isomorphic graphs have the same measured value. Similar to [37], we developed a polynomial-based approach for measuring the complexity of directed graphs. To the best of our knowledge, there are very few measures for directed graphs, e.g., treewidth and girth, and these are true complexity measures for encoding structural information of a directed graph. Following [37], we developed graph polynomials based on the out- and in-degrees of directed graphs and constructed modified polynomials which possess a unique, positive zero in the interval $(0, 1)$, depending on up to two parameters $\alpha_{\mathrm{out}}$ and $\alpha_{\mathrm{in}}$. These zeros $\delta_{\mathrm{out}}$ and $\delta_{\mathrm{in}}$ can be interpreted as complexity measures. Interestingly, we start with a graph invariant and construct polynomials which are associated with the graph. However, the graph measures defined here are algebraic quantities representing the zeros of polynomials.
We have demonstrated analytical results showing relationships between the graph measures, obtained numerical results that show correlations between the graph measures, and investigated the homogeneity of the zeros (graph measures). We compared our graph measures with the well-known edge density and found that our measures capture structural information differently. Also, some of our measures seem to have useful properties, e.g., they possess a high discrimination power for graphs with a distinct graph topology. Our approach to analyzing the complexity of directed graphs is promising in that its low computational complexity (i.e., vertex degrees of a directed graph can be determined in polynomial time) allows for applying the polynomial-based measures to large networks.
As part of our ongoing research, we plan to continue investigating extremal properties of the measures. Also, we would like to perform a correlation analysis with other measures on a large scale, provided we can find ones that can be computed in polynomial time. Existing measures based on game theory are computationally complex, see, e.g., [13].