Structural Discrimination of Networks by Using Distance, Degree and Eigenvalue-Based Measures

In chemistry and computational biology, structural graph descriptors have been proven essential for characterizing the structure of chemical and biological networks. It has also been demonstrated that they are useful to derive empirical models for structure-oriented drug design. However, from a more general (complex network-oriented) point of view, investigating mathematical properties of structural descriptors, such as their uniqueness and structural interpretation, is also important for an in-depth understanding of the underlying methods. In this paper, we emphasize the evaluation of the uniqueness of distance, degree and eigenvalue-based measures. Among these are measures that have been recently investigated extensively. We report numerical results using chemical and exhaustively generated graphs and also investigate correlations between the measures.


Introduction
Structural analysis of graphs has been an outstanding problem in graph theory for several decades [1][2][3][4]. A challenging problem in this theory is to investigate structural features of the graphs and their characterization. Another important task is to quantify the structural features of graphs, as well as their complexity [2,3,5,6]. The former relates to developing measures such as the clustering coefficient or the average distance of a graph [7]. The latter relates to deriving complexity indices for graphs, which are often called structural descriptors/measures or topological indices [8][9][10][11].
In this paper, we evaluate the uniqueness (also called discrimination power or degeneracy) of graph measures that characterize graphs holistically, in contrast to local graph measures [12]. A descriptor is called degenerate if it possesses the same value for more than one graph. In view of the large body of literature on structural graph measures [2,3,5,13], the degeneracy problem has been somewhat overlooked in graph theory. By contrast, the uniqueness of structural descriptors has been investigated in mathematical chemistry and related disciplines for discriminating isomeric structures and other chemical networks [14][15][16]. A detailed survey on the uniqueness of topological indices using isomers and hexagonal graphs has been given by Konstantinova [16]. For more related work, see also [17].
To date, no complete graph invariant, i.e., a measure that is fully unique on general graphs, has been found. Some measures did turn out to be complete on special sets of graphs [15,17,18]. In a more general context, i.e., on graphs without structural constraints, any topological graph measure exhibits a certain degree of degeneracy, which also depends on the mathematical method used to define the measure, see [19,20]. A highly discriminating graph measure is desirable for analyzing graphs; hence, measuring the degree of its degeneracy is important for understanding its properties, limits and quality.
The main contribution of this paper is to investigate to what extent known degree, distance and eigenvalue-based measures are degenerate. Among the measures we examine (see Table 1) are the recently developed geometric-arithmetic indices [21,22], the atom-bond connectivity index [23] and the Estrada index [24], which is based on the eigenvalues of a special graph-theoretical matrix [25], here the adjacency and Laplacian matrices. It turns out that some of the measures based on distances and eigenvalues are highly unique on exhaustively generated graphs (e.g., see Table 2). Using these graphs is a greater challenge than using only isomeric structures, as exhaustively generated graphs do not possess any structural constraints. However, it is clear that other distance or eigenvalue-based measures exist that possess only low discrimination power [26], implying that the uniqueness of a measure crucially depends on its mathematical composition and the graph class under consideration.

Uniqueness of Topological Descriptors
In this section, we present numerical results on the uniqueness of certain topological descriptors. A summary of the topological indices used in this paper can be found in Table 1. As mentioned, the discrimination power of these measures has not yet been evaluated extensively on a large scale. Therefore, the results might be useful for gaining deeper insights into these measures and for drawing conclusions when designing novel topological descriptors. As usual, we evaluate the uniqueness of an index $I$ by the measure that Konstantinova [15] called the sensitivity,

$$S(I) = \frac{|G| - ndv}{|G|}. \qquad (1)$$

Clearly, $S(I)$ depends on a graph class $G$; $ndv$ is the number of graphs whose values cannot be distinguished by $I$, and $|G|$ is the size of the graph set. We start interpreting the results by considering Table 2, in which the descriptors are arranged into four groups. The values in Table 2 have been calculated by using the graph classes $N_i$, $i = 8, 9, 10$. These are the classes of exhaustively generated non-isomorphic, unweighted and connected graphs with $i$ vertices each. The cardinalities $|N_i|$ are also given in Table 2.
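The sensitivity can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for Table 2: it assumes that the sensitivity has the form S(I) = (|G| - ndv) / |G| and that ndv counts the graphs whose index value coincides with that of at least one other graph. The function name `sensitivity`, the `decimals` rounding parameter and the toy values are hypothetical.

```python
from collections import Counter


def sensitivity(values, decimals=8):
    """Return (S, ndv) for a list holding one index value per graph.

    Assumption: two values count as identical if they agree after
    rounding to `decimals` decimal places.
    """
    if not values:
        raise ValueError("the graph set must not be empty")
    rounded = [round(v, decimals) for v in values]
    counts = Counter(rounded)
    # ndv: graphs whose (rounded) value is shared with another graph
    ndv = sum(c for c in counts.values() if c > 1)
    total = len(rounded)
    return (total - ndv) / total, ndv


# Hypothetical index values for a toy set of five graphs
toy_values = [3.0, 4.5, 4.5, 6.25, 7.0]
s, ndv = sensitivity(toy_values)
print(s, ndv)
```

The rounding step makes explicit that, for real-valued indices, the reported uniqueness depends on the numerical precision at which two values are considered equal.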
For the degree-based indices, it is not surprising that these measures have only little discrimination power, as many graphs can be realized by identical degree sequences. This effect is even stronger if the cardinality of the underlying graph set increases, see Table 2. The ABC index has the highest discrimination power among the indices of this class. This is in accordance with the well-known fact that the degeneracy of topological descriptors decreases in the following order: first generation (e.g., NK) ≥ second generation (e.g., ABC) ≥ third generation, see [27]. Recall that first-generation indices are integer measures derived from integer local vertex invariants such as vertex degrees or distance sums [28]. Second-generation indices are real numbers derived from integer local vertex invariants [28]. Third-generation indices are real numbers derived from real local vertex invariants [28].
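The following sketch illustrates this on two non-isomorphic trees with the same degree sequence. It is only an illustration under stated assumptions: the first Zagreb index (the sum of squared vertex degrees) is used here as a stand-in for an integer-valued degree-based measure and is not claimed to be one of the indices in Table 2, and the ABC index is computed with its usual definition, the sum of sqrt((d_u + d_v - 2) / (d_u d_v)) over all edges uv. The two example trees and all function names are hypothetical.

```python
import math
from collections import defaultdict


def degrees(edges):
    """Map each vertex to its degree."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def first_zagreb(edges):
    """Integer-valued index: sum of squared vertex degrees."""
    return sum(d * d for d in degrees(edges).values())


def abc_index(edges):
    """Atom-bond connectivity index (usual edge-sum definition)."""
    deg = degrees(edges)
    return sum(
        math.sqrt((deg[u] + deg[v] - 2) / (deg[u] * deg[v]))
        for u, v in edges
    )


# Two non-isomorphic trees on 6 vertices, both with degree
# sequence (3, 2, 2, 1, 1, 1); vertex 0 is the branching vertex.
tree_a = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5)]  # branches of length 1, 1, 3
tree_b = [(0, 1), (0, 2), (2, 3), (0, 4), (4, 5)]  # branches of length 1, 2, 2

print(first_zagreb(tree_a), first_zagreb(tree_b))  # identical values
print(abc_index(tree_a), abc_index(tree_b))        # different values
```

Any index that depends on the degree sequence alone cannot separate these two trees, whereas the ABC index, which depends on the degrees at both ends of each edge, does.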
Most of the information-theoretic measures (e.g., $I_C$, OdC) we have evaluated in this study are based on grouping elements (e.g., vertices, degrees, etc.) in equivalence classes [6,8] to determine probability values. We observe that the uniqueness of these measures is also low. In contrast, the degree-degree association index $I^{\lambda}_{f^{\Delta}_{\exp}}$ [29] is highly discriminating for all three graph classes [30]. A likely reason for this is that this measure is non-partition-based, as probability values are assigned to each vertex in the graph by using the special information functional $f^{\Delta}$, see [29]. Note that $N_{10}$ contains almost 12 million graphs. Calculating the discrimination power of the distance-based measures, such as the second or third geometric-arithmetic indices [22,31], leads to a somewhat surprising result: the uniqueness for $N_8$, $N_9$ and $N_{10}$ is very high, although these indices belong to the class of so-called second-generation indices [27]. Again, we see that the composition of the graph invariant (here, distances) used to define the measure is crucial.
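To illustrate how a distance-based measure of this kind can be computed, the sketch below evaluates the second geometric-arithmetic index with breadth-first search. It assumes the usual definition from the literature, namely the sum over all edges uv of 2 sqrt(n_u n_v) / (n_u + n_v), where n_u is the number of vertices closer to u than to v; the definition in Table 1 may be stated differently. The graph is assumed to be connected, and the function names are hypothetical.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def ga2_index(edges):
    """Second geometric-arithmetic index of a connected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {v: bfs_distances(adj, v) for v in adj}
    total = 0.0
    for u, v in edges:
        # n_u: vertices strictly closer to u than to v (and vice versa)
        n_u = sum(1 for x in adj if dist[u][x] < dist[v][x])
        n_v = sum(1 for x in adj if dist[v][x] < dist[u][x])
        total += 2 * math.sqrt(n_u * n_v) / (n_u + n_v)
    return total


path3 = [(0, 1), (1, 2)]
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(ga2_index(path3), ga2_index(cycle4))
```

Each edge contributes a real number that depends on how the whole vertex set is split by distance, which gives some intuition for why such indices take many more distinct values than purely degree-based ones.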
If we compare the sensitivity values (using Equation 1) of some second-generation indices, e.g., the geometric-arithmetic indices, with some of the third-generation indices (information-theoretic and eigenvalue-based measures), we observe that the uniqueness of, e.g., $GA_2$ and $GA_3$ is unexpectedly high. In particular, the high uniqueness of $GA_3$ for graphs in $N_i$, $i = 8, 9, 10$, is probably caused by the fact that its calculation is based on distances between edges. As the number of edges lies in the interval $[n-1, n(n-1)/2]$, the range of the third geometric-arithmetic index is 0 to $n(n-1)/2$ [32], and the probability that two graphs have different index values is certainly larger than when the number of edges is fixed. This hypothesis can be supported by comparing the values of the sensitivity index (using Equation 1) of the $GA_3$ index shown in Tables 2 and 4. The sensitivity index of $GA_3$ shown in Table 2 is greater than 0.94 (≥ 94%), while, if the number of edges is fixed, see Table 4, the corresponding sensitivity index is less than 0.02 (≤ 2%). The same idea explains why the sensitivity index of $GA_3$ (see Table 2) does not decrease with the number of vertices.

Let us turn to the uniqueness of some eigenvalue-based measures such as the graph energy $E$, the Estrada index $EE$ and the Laplacian Estrada index $LEE$. As expected, it is high because these measures belong to the class of third-generation indices (as do, e.g., information-theoretic measures). We point out that the sensitivity index of the graph energy $E$ and the Laplacian energy $LE$ could be affected by rounding errors. The reason is that the difference between the values of $E$ and $LE$ for some graphs is less than $10^{-8}$ [33]. However, since the number of such graphs is very small, see [33], this does not strongly affect the computation of the uniqueness of $E$ and $LE$ measured by $S$ and $ndv$.
In particular, the Estrada and Laplacian Estrada indices possess high uniqueness for all three graph classes $N_i$. To give some arguments for this, recall their definitions, namely

$$EE = \sum_{i=1}^{n} e^{\lambda_i}, \qquad LEE = \sum_{i=1}^{n} e^{\mu_i},$$

where $\lambda_i$ and $\mu_i$ are the eigenvalues of the adjacency and Laplacian matrices, respectively. Knowing that $e$ is irrational and transcendental, it can be presumed that any power and the sum thereof is also irrational and transcendental. Hence, it is plausible that graphs with the same Estrada (Laplacian Estrada) index are isospectral. In addition, the uniqueness of these measures is quite stable, and the same holds for $I^{\lambda}_{f^{\Delta}_{\exp}}$. This means that there is only very little dependency between their uniqueness and the cardinality of the underlying graph set. Clearly, this result demonstrates that certain measures/functions based on the eigenvalues of graphs possess a high discrimination power. This contradicts the widely assumed hypothesis that graph spectra are not suitable for discriminating graphs properly because of the existence of isospectral graphs, see [34,35]. Another positive example can be found in [36], where Dehmer et al. presented spectrum-based measures with low degeneracy that are based on a probability distribution of structural values.

In Table 3 and Table 4, we have also evaluated the discrimination power of the measures using isomers and chemical trees. In particular, we use the isomeric classes $C^{iso}_{11}$ and $C^{iso}_{12}$ containing all isomers with 11 and 12 vertices, see Table 3. The numerical results are quite similar to those in Table 2. However, when evaluating the indices by using the classes of chemical trees $C_{20}$, $C_{21}$ and $C_{22}$, we see that the discrimination power of $I^{\lambda}_{f^{\Delta}_{\exp}}$ deteriorates significantly. To better understand this, note that the information functional $f^{\Delta}_{\exp}(v_i)$ relies on determining the shortest paths for all $v_i \in V$ and then the degree-degree associations thereof, resulting in $f^{\Delta}_{\exp}(v_i)$, see [29].
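The Estrada and Laplacian Estrada indices are sums of e raised to the eigenvalues of the adjacency and Laplacian matrices, respectively. The sketch below evaluates them without an eigenvalue solver, using the identity that the sum of exp(eigenvalue) over all eigenvalues of a symmetric matrix M equals trace(exp(M)), approximated here by a truncated Taylor series. This is a swapped-in technique chosen to keep the example free of third-party libraries, not the procedure used in the paper; the function names and the number of series terms are assumptions, and the truncation is only adequate for small graphs.

```python
import math


def adjacency_matrix(n, edges):
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = 1
        a[v][u] = 1
    return a


def laplacian_matrix(n, edges):
    """Laplacian L = D - A (degree matrix minus adjacency matrix)."""
    a = adjacency_matrix(n, edges)
    return [
        [sum(a[i]) if i == j else -a[i][j] for j in range(n)]
        for i in range(n)
    ]


def mat_mul(a, b):
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def trace_exp(m, terms=60):
    """Approximate trace(exp(m)) by sum_{k < terms} trace(m^k) / k!."""
    n = len(m)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    total = 0.0
    for k in range(terms):
        total += sum(power[i][i] for i in range(n)) / math.factorial(k)
        power = mat_mul(power, m)
    return total


def estrada_index(n, edges):
    return trace_exp(adjacency_matrix(n, edges))


def laplacian_estrada_index(n, edges):
    return trace_exp(laplacian_matrix(n, edges))


# Classic pair with identical adjacency spectra: the star with four
# leaves, and a 4-cycle plus an isolated vertex (disconnected, used
# here only for illustration).
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
cycle_plus_vertex = [(0, 1), (1, 2), (2, 3), (3, 0)]

print(estrada_index(5, star), estrada_index(5, cycle_plus_vertex))
print(laplacian_estrada_index(5, star),
      laplacian_estrada_index(5, cycle_plus_vertex))
```

The two graphs share the same Estrada index because their adjacency spectra coincide, while their Laplacian Estrada indices differ, which shows that the degeneracy of a spectral measure depends on the matrix from which the eigenvalues are taken.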
Finally, when applying this measure to trees, the deterioration of its uniqueness could be explained by the occurrence of a large number of paths of similar length, which results in very similar probability values and entropies. Interestingly, the eigenvalue-based measures $LE$ and $LEE$ possess high uniqueness, and their sensitivity values are almost independent of the cardinality of the graph sets. Thus, these measures turned out to be well suited to discriminating chemical trees uniquely.

Value Distributions
In order to tackle the question of what kind of degeneracy the measures possess, we plot their characteristic value distributions. The y-axis shows the absolute frequency of the graphs with a certain index value, which is depicted on the x-axis. As the graph class, we use the class of exhaustively generated non-isomorphic, connected and unweighted graphs denoted by $N_9$. We start with Figures 1 and 2 and observe the vertical strips, indicating that a large number of graphs have quite similar index values discretely distributed on a certain interval. In addition, the hull of these value distributions looks like a Gaussian curve. This means that by using $GA_1$ and ABC, there exist many degenerate graphs possessing quite similar index values, where the hull of the distributions forms a Gaussian curve.
As we can see from Figures 3 to 6, the value distribution (and in fact the distribution of degenerate graphs) is significantly different for the information-theoretic measures. We start with $I_C$ and see that the value distribution is quite scattered, i.e., there are no regions in which the graphs are closely clustered. In contrast, the values of OdC are rather clustered. The same holds for $MA_R$, and we observe that all three measures ($I_C$, OdC and $MA_R$) are highly degenerate on $N_9$. The degree-degree association index $I^{\lambda}_{f^{\Delta}_{\exp}}$, however, possesses a high discrimination power (see Figure 6). In particular, we see that there exist only very few degenerate graphs, whose index values span the entire domain.
The results of plotting the value distributions for the eigenvalue-based measures graph energy $E$ and Estrada index $EE$ are depicted in Figures 7 and 8. We see that they possess a high uniqueness. In Figure 7, the horizontal strip for $y = 1$ indicates the low degeneracy of this measure. The same holds for $EE$, shown in Figure 8.

Correlations Between Indices
In order to investigate how the topological indices correlate with each other, we calculate the linear correlation between them and depict the results as correlation networks. More precisely, the linear correlation between the descriptor values of two data vectors has been computed according to the method of Pearson [37]. In the plots of the correlation networks, the calculated Pearson product-moment correlation coefficients have then been used as edge weights, labeling the edges that connect the vertices representing the compared descriptor pairs. The correlation networks are shown in Figures 9 to 14. We use the graph classes $C_{21}$ and $N_9$, and choose different thresholds for the correlation coefficient, resulting in different networks.
Definition 1. Let $\{I_1, \ldots, I_k\}$ be a set of topological indices defined on a graph class $G$ and let $\theta \leq 1$. The vertex and edge set of the correlation network $G^{\geq\theta} := (V, E)$ inferred from $G$ is defined by $V := \{I_1, \ldots, I_k\}$ and $E := \{\{I_i, I_j\} : |r(I_i, I_j)| \geq \theta\}$, where $r \in [-1, 1]$ is the correlation coefficient.

Definition 2. Let $\{I_1, \ldots, I_k\}$ be a set of topological indices defined on a graph class $G$ and let $\theta \leq 1$. The vertex and edge set of the correlation network $G^{\leq\theta} := (V, E)$ inferred from $G$ is defined by $V := \{I_1, \ldots, I_k\}$ and $E := \{\{I_i, I_j\} : |r(I_i, I_j)| \leq \theta\}$, where $r \in [-1, 1]$ is the correlation coefficient.
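The construction of such correlation networks can be sketched as follows. The sketch assumes that an edge joins two indices whenever the absolute value of their Pearson correlation coefficient is at least the threshold (Definition 1) or at most the threshold (Definition 2). The function names, the `mode` parameter and the toy index values are hypothetical, and constant data vectors (zero variance) are not handled.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson product-moment correlation of two equally long vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_network(index_values, theta, mode="ge"):
    """Return {(index_a, index_b): r} for all index pairs kept as edges.

    mode="ge" keeps pairs with |r| >= theta, mode="le" keeps pairs
    with |r| <= theta. `index_values` maps an index name to its
    values on the same ordered list of graphs.
    """
    edges = {}
    for a, b in combinations(sorted(index_values), 2):
        r = pearson(index_values[a], index_values[b])
        keep = abs(r) >= theta if mode == "ge" else abs(r) <= theta
        if keep:
            edges[(a, b)] = r  # r serves as the edge weight
    return edges


# Hypothetical values of three indices on four graphs
toy = {
    "I1": [1.0, 2.0, 3.0, 4.0],
    "I2": [2.0, 4.0, 6.0, 8.0],
    "I3": [1.0, -1.0, 1.0, -1.0],
}
print(correlation_network(toy, 0.9, mode="ge"))
print(correlation_network(toy, 0.5, mode="le"))
```

In this toy example, I1 and I2 are joined in the network of highly correlated indices, while I3 is joined to both of them in the network of weakly correlated indices; the degree of a vertex then identifies hubs.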
We start interpreting the results by considering the left-hand side of Figure 9. The vertices of the graph $G_1^{\geq 0.9}$ represent indices that are highly correlated (here, $|r| \geq 0.9$) by using the graph class $C_{21}$. In all correlation graphs, hub vertices, i.e., those with a high degree, are colored in gray. In particular, the grayer the color of a vertex is, the higher its degree.
In $G_1^{\geq 0.9}$, the first geometric-arithmetic index ($GA_1$) and other measures are highly correlated with other indices that belong to different groups, e.g., degree-based and eigenvalue-based, etc. In addition, graph energy ($E$) and Estrada index ($EE$) are highly correlated with other measures such as the Modified Zagreb index (degree-based). By using the graph class $N_9$, we obtain the same type of correlation network, denoted by $G_2^{\geq 0.9}$. Observe that the connectedness of this network is similarly high as in $G_1^{\geq 0.9}$; however, there exist new hubs. For instance, the Balaban $J$ index and the augmented Zagreb index (AZI) represent such vertices, i.e., they are highly correlated with other indices from different paradigms such as degree-based and eigenvalue-based measures. Interestingly, the uniqueness (measured by $ndv$ and $S$) of, e.g., AZI and $\lambda_{\max}$ by using $N_9$ is higher than by taking $C_{21}$ into account. Nevertheless, these indices (and others) possess larger neighborhoods compared to $C_{21}$. That is, more highly correlated vertices are adjacent to AZI and $\lambda_{\max}$ in the network for $N_9$ than in the network for $C_{21}$. One would have expected the reverse, as the isomers ($C_{21}$) are structurally more similar to each other than the graphs contained in $N_9$. It is likely that the reasons for this are different structural characteristics captured by the underlying graphs of $N_9$ and $C_{21}$.
For studying indices that are only slightly correlated, first consider $G_1^{\leq 0.2}$ in Figure 10. We see that the degree-degree association index ($I^{\lambda}_{f^{\Delta}_{\exp}}$) is a hub vertex, i.e., it is only weakly correlated with many other indices. That means $I^{\lambda}_{f^{\Delta}_{\exp}}$ (by using $C_{21}$) captures structural information significantly different from almost all other measures (representing vertices) in this network. If we consider $N_9$ as a graph set, we observe that $G_2^{\leq 0.2}$ has more hubs than $G_1^{\leq 0.2}$. For instance, $I_D$ and $I_a$ represent hubs and therefore possess only a small correlation with other measures from different paradigms. This also implies that the structural characteristics of the graphs in $N_9$ are different from those in $C_{21}$. Also, the hubs in $G_2^{\leq 0.2}$ could serve as potential candidates to be tested for solving QSAR/QSPR problems [38], as they capture structural characteristics differently (compared to classical indices) and some (e.g., efficiency complexity and offdiagonal complexity) have not yet been used in mathematical chemistry and drug design. In addition, it would be interesting to examine their ability to classify graphs optimally by using supervised learning techniques, e.g., see [39].
To finalize this section, we consider Figures 11 to 14. We have also plotted the evolution of the correlation networks for $\theta = 0.01, 0.05$, and have obtained the networks $G^{\leq 0.01}$ and $G^{\leq 0.05}$ for both $N_9$ and $C_{21}$, respectively. From Figure 11, we see that by using $N_9$, the measures $C_e$ and $E$ are highly uncorrelated ($\theta = 0.01$). In addition, the degree-degree association index $I^{\lambda}_{f^{\Delta}_{\exp}}$ and $GA_1$ are highly uncorrelated by using $C_{21}$ ($\theta = 0.01$). If we now choose $\theta = 0.05$ for $N_9$ and $C_{21}$, the resulting networks (see Figures 13 and 14) also show highly uncorrelated indices. Starting with $N_9$ (see Figure 13), far more indices are highly uncorrelated ($\theta = 0.05$) compared with Figure 11. These indices belong to different paradigms (degree-based, information-theoretic, etc.). But when considering the graph class $C_{21}$ (see Figure 14), only the degree-degree association index $I^{\lambda}_{f^{\Delta}_{\exp}}$ is highly uncorrelated ($\theta = 0.05$) with many other indices. The differences between these correlation networks are clearly induced by the structural differences between the graph classes (factors such as cyclicity and connectedness, which contribute to the complexity of the graphs). Note that we obtained a similar result by comparing $N_9$ and $N_{10}$ (instead of $N_9$ and $C_{21}$). Figure 14 shows that by using trees, $I^{\lambda}_{f^{\Delta}_{\exp}}$ captures structural information significantly different from many other non-information-theoretic indices such as $E$, $EE$, etc. We hypothesize that this result also holds for other tree classes. As mentioned above, the index $I^{\lambda}_{f^{\Delta}_{\exp}}$ could be used to characterize graphs for problems in structural chemistry or QSAR, with the aim that it solves a particular problem (e.g., QSAR/QSPR) better than existing indices that have already been used.

Summary and Conclusion
In this paper, we have explored to what extent degree, distance and eigenvalue-based measures are degenerate. To tackle this problem, we used exhaustively generated undirected, connected and non-isomorphic graphs as well as chemical graphs. Interestingly, we found that some recently developed distance-based measures, e.g., $GA_2$ and $GA_3$, have a much better uniqueness than measures that are known to be highly unique for chemical graphs, e.g., the Balaban $J$ index. Note that the results for the Balaban $J$ index by using the classes $N_i$, $i = 8, 9, 10$, have been reported in an earlier paper [30]. Equally, some of the eigenvalue-based measures such as $E$, $LE$ and $LEE$ possess high discrimination power for all graph classes that we examined in this paper. This shows that such measures can be feasible for discriminating graphs structurally, despite the existence of isospectral graphs. A strong point of all measures used in this study (except the topological information content for large graphs, as it relies on determining their automorphism groups) is their polynomial time complexity. Hence, they could also be applied to large complex networks. First studies examining the uniqueness of structural measures by using gene networks inferred from high-throughput data are under development. We will also examine the relationship between the uniqueness of a measure and the ability to classify graphs meaningfully.

Author Contributions
Analyzed the data: MG. Wrote the paper: MD BF MG.