Interrelations of Graph Distance Measures Based on Topological Indices

In this paper, we derive interrelations of graph distance measures by means of inequalities. For this investigation we are using graph distance measures based on topological indices that have not been studied in this context. Specifically, we are using the well-known Wiener index, Randić index, eigenvalue-based quantities and graph entropies. In addition to this analysis, we present results from numerical studies exploring various properties of the measures and aspects of their quality. Our results could find application in chemoinformatics and computational biology where the structural investigation of chemical components and gene networks is currently of great interest.


Introduction
Methods to determine the structural similarity or distance between graphs have been applied in many areas of science, for example in mathematics [1,2,3], biology [4,5,6], chemistry [7,8] and chemoinformatics [9]. Other application-oriented areas where graph comparison techniques have been employed can be found in [10,11,12]. Note that the terms 'graph similarity' and 'graph distance' are not uniquely defined and depend strongly on the underlying concept. The two main concepts that have been explored extensively are exact and inexact graph matching, see [13,3]. Exact graph matching [2,3] refers to matching graphs based on isomorphic relations. An important example is the so-called Zelinka distance [3], which requires computing the maximum common subgraphs of two graphs with the same number of vertices. However, this technique is computationally demanding, as the subgraph isomorphism problem is NP-complete [14]. In contrast, inexact or approximate techniques for comparing graphs match graphs in an error-tolerant way, see [13]. A highlight of this development has been the well-known graph edit distance (GED) due to Bunke [15]. String-based techniques also fit into the scheme of approximate graph comparison techniques [1,16]. This approach aims to derive string representations that capture structural information of the underlying networks. By using string alignment techniques, one can compute similarity scores of the derived strings instead of matching the graphs by classical techniques. Concrete examples can be found in [1,16].
As mentioned, numerous graph similarity and distance measures have been explored. However, a mathematical framework for exploring interrelations of these measures is still lacking. Let $d_1: \mathcal{G} \times \mathcal{G} \to \mathbb{R}_+$ and $d_2: \mathcal{G} \times \mathcal{G} \to \mathbb{R}_+$ be two comparative graph measures (i.e., graph similarity or distance measures) defined on the graph class $\mathcal{G}$. A typical problem in this context is to prove interrelations of the measures by means of inequalities such as $d_1 < (>)\ d_2$. For instance, inequalities involving graph complexity measures have been inferred by Dehmer et al. [17,18].
The main contribution of this paper is to infer interrelations of graph distance measures. To the best of our knowledge, this problem has not been tackled so far for graph distance measures. However, interrelations of topological indices interpreted as complexity measures have been studied, see [7,19,20,17,18]. For instance, Bonchev and his co-workers investigated interrelations of branching measures by means of inequalities [7,19,20]. Dehmer [17] examined relations between information-theoretic measures based on information functionals, and between classical and parametric graph entropies [18]. Here we put the emphasis on graph distance measures that are based on so-called topological indices. These measures themselves have not yet been studied. Note that we only consider distance measures (without loss of generality), as they can easily be transformed into graph similarity measures [21]. In order to define these measures concretely, we employ an existing distance measure (see Eq. (6)) together with the well-known Randić index [22], the Wiener index [23], eigenvalue-based measures [24], and graph entropies [17,25]. We also discuss quality aspects of the measures and state conjectures supported by numerical results.

Topological Indices and Preliminaries
In this section, we introduce the topological indices which are used in the paper. A topological index [23] is a graph invariant, defined by $I: \mathcal{G} \to \mathbb{R}_+$. Simple invariants are, for instance, the number of vertices, the number of edges, vertex degrees, degree sequences, the matching number, the chromatic number and so forth, see [26].
We emphasize that topological indices are graph invariants which characterize the topology of a graph. They have been used extensively for examining quantitative structure-activity relationships (QSARs), in which the biological activity or other properties of molecules are correlated with their chemical structures [27]. Topological graph measures have also been applied in ecology [28], biology [29] and network physics [30,31]. Note that various properties of topological graph measures, such as their uniqueness and correlation ability, have been examined too [32,33].
Suppose $G = (V,E)$ is a connected graph. The distance between the vertices $u$ and $v$ of $G$ is denoted by $d(u,v)$. The Wiener index of $G$ is denoted by $W(G)$ and defined by $W(G) = \sum_{\{u,v\} \subseteq V} d(u,v)$. The name Wiener index or Wiener number for this quantity is common in the chemical literature, since Wiener [34] seems to have been the first to consider it, in 1947. For more results on the Wiener index of trees, we refer to [35].
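As an illustration of the definition, the following minimal sketch computes the Wiener index by running a breadth-first search from every vertex. The adjacency-dictionary representation and the helper names `wiener_index`, `path_graph` and `star_graph` are assumptions made for this example and do not come from the paper.

```python
from collections import deque

def wiener_index(adj):
    """Sum of shortest-path distances over all unordered vertex pairs.

    adj: dict mapping each vertex to an iterable of its neighbours
    (assumed to describe a connected, undirected graph).
    """
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # every pair was counted once from each endpoint

def path_graph(n):
    """Adjacency dict of the path P_n on vertices 0..n-1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def star_graph(n):
    """Adjacency dict of the star S_n with centre 0."""
    adj = {0: list(range(1, n))}
    adj.update({i: [0] for i in range(1, n)})
    return adj
```

For trees with 8 vertices this gives 49 for the star and 84 for the path, in line with the known closed forms $(n-1)^2$ and $n(n^2-1)/6$.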
In 1975, Randić [36] proposed the topological index $R$ ($R_{-1}$ and $R_{-1/2}$) under the name branching index or connectivity index, suitable for measuring the extent of branching of the carbon-atom skeleton of saturated hydrocarbons. Nowadays this index is also called the Randić index. In 1998, Bollobás and Erdős [37] generalized this index by replacing $-\frac{1}{2}$ by any real number $\alpha$, which yields the general Randić index. In fact, the Randić index and the general Randić index became the most popular and most frequently employed structure descriptors in structural chemistry [38]. For a graph $G = (V,E)$, the Randić index $R(G)$ of $G$ is defined as the sum of $(d(u)d(v))^{-1/2}$ over all edges $uv$ of $G$, i.e., $R(G) = \sum_{uv \in E} (d(u)d(v))^{-1/2}$, where $d(u)$ is the degree of a vertex $u$ of $G$. The zeroth-order Randić index due to Kier and Hall [6] is ${}^0R(G) = \sum_{u \in V} d(u)^{-1/2}$. For more results on the Randić index and the zeroth-order Randić index, we refer to [39,22,38].
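The two degree-based indices can be evaluated directly from an edge list. The sketch below is only an illustration; the function names and the edge-list input are assumptions of this example, and vertices without any incident edge are not represented (the graph is assumed to have no isolated vertices).

```python
def degrees(edges):
    """Vertex degrees of a simple graph given as a list of edges (u, v)."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def randic_index(edges, alpha=-0.5):
    """General Randic index: sum of (d(u) * d(v)) ** alpha over all edges.

    alpha = -1/2 gives the classical Randic index R(G).
    """
    deg = degrees(edges)
    return sum((deg[u] * deg[v]) ** alpha for u, v in edges)

def zeroth_order_randic(edges):
    """Zeroth-order Randic index: sum of d(u) ** (-1/2) over all vertices."""
    return sum(d ** -0.5 for d in degrees(edges).values())
```

For the path on four vertices, `randic_index([(0, 1), (1, 2), (2, 3)])` evaluates $1/\sqrt{2} + 1/2 + 1/\sqrt{2}$, while the star on four vertices gives $3/\sqrt{3} = \sqrt{3}$.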
For a given graph $G$ with $n$ vertices, let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of $G$. The energy of a graph $G$, denoted by $E(G)$, was defined by Gutman in 1977 [40] as $E(G) = \sum_{i=1}^{n} |\lambda_i|$. For more results on the graph energy, we refer to [41,24,42].
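For the two trees that play a central role later, the adjacency spectra are known in closed form: the star $S_n$ has eigenvalues $\pm\sqrt{n-1}$ and $n-2$ zeros, and the path $P_n$ has eigenvalues $2\cos(k\pi/(n+1))$ for $k = 1, \ldots, n$. The sketch below uses these standard facts (they are not taken from the paper) to evaluate the energy without an eigenvalue solver; the function names are assumptions of this example, and $n \ge 2$ is assumed.

```python
import math

def star_spectrum(n):
    """Adjacency eigenvalues of the star S_n (n >= 2): +-sqrt(n-1) and zeros."""
    r = math.sqrt(n - 1)
    return [r, -r] + [0.0] * (n - 2)

def path_spectrum(n):
    """Adjacency eigenvalues of the path P_n: 2*cos(k*pi/(n+1)), k = 1..n."""
    return [2 * math.cos(k * math.pi / (n + 1)) for k in range(1, n + 1)]

def energy(eigenvalues):
    """Graph energy: sum of the absolute values of the eigenvalues."""
    return sum(abs(x) for x in eigenvalues)
```

For general graphs the eigenvalues would have to be computed numerically from the adjacency matrix; that step is deliberately left out here.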

Novel Graph Distance Measures
Now we define the distance measure [21]
$d(x,y) = 1 - e^{-\left(\frac{x-y}{\sigma}\right)^2}$,
with a parameter $\sigma$. For a topological index $I$, the graph distance measure based on $I$ is $d_I(G,H) = d(I(G), I(H)) = 1 - e^{-\left(\frac{I(G)-I(H)}{\sigma}\right)^2}$.
Observation 1. [...]
Proof. Let $x = |I(G) - I(H)|$; then $d_I$ is a monotonically increasing function of $x$. Therefore, the maximum value of $d_I$ is attained if and only if the maximum value of $|I(G) - I(H)|$ is attained. □
From Observation 1 and some existing extremal results on topological indices, we obtain sharp upper bounds of $d_I$ for some classes of graphs. As an example, we list some of those results for trees.
Theorem 1. Let $T$ and $T'$ be two trees with $n$ vertices. Denote by $S_n$ and $P_n$ the star graph and the path graph with $n$ vertices, respectively.
(i). The maximum value of $d_W(T,T')$ is attained when $T$ and $T'$ are $S_n$ and $P_n$, respectively.
(ii). The maximum value of $d_R(T,T')$ is attained when $T$ and $T'$ are $S_n$ and $P_n$, respectively.
(iii). The maximum value of $d_{{}^0R}(T,T')$ is attained when $T$ and $T'$ are $S_n$ and $P_n$, respectively.
(iv). The maximum value of $d_E(T,T')$ is attained when $T$ and $T'$ are $S_n$ and $P_n$, respectively.
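Statement (i) can be checked by brute force for small $n$. The sketch below assumes the measure has the form $d(x,y) = 1 - e^{-((x-y)/\sigma)^2}$, enumerates all labelled trees on $n$ vertices through their Prüfer sequences (a choice made for this example, not the paper's procedure), and records the smallest and largest Wiener index. All function names are hypothetical.

```python
import math
from collections import deque
from itertools import product

def distance(x, y, sigma=1.0):
    """Assumed form of the measure: d(x, y) = 1 - exp(-((x - y) / sigma)^2)."""
    return 1.0 - math.exp(-((x - y) / sigma) ** 2)

def tree_from_pruefer(seq):
    """Decode a Pruefer sequence into the adjacency lists of a labelled tree."""
    n = len(seq) + 2
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    adj = {v: [] for v in range(n)}
    for v in seq:
        leaf = min(u for u in range(n) if degree[u] == 1)
        adj[leaf].append(v)
        adj[v].append(leaf)
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [x for x in range(n) if degree[x] == 1]  # the two remaining vertices
    adj[u].append(w)
    adj[w].append(u)
    return adj

def wiener_index(adj):
    """Sum of distances over unordered vertex pairs, via BFS from each vertex."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2

def wiener_range_over_trees(n):
    """(min, max) of the Wiener index over all labelled trees on n vertices."""
    values = {wiener_index(tree_from_pruefer(s))
              for s in product(range(n), repeat=n - 2)}
    return min(values), max(values)
```

For $n = 6$ the range is $(25, 35)$, which are exactly $W(S_6) = (n-1)^2$ and $W(P_6) = n(n^2-1)/6$. Since $d$ is increasing in $|x - y|$, the pair $(S_6, P_6)$ therefore maximizes $d_W$ among trees on six vertices.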

Interrelations of Graph Distance Measures
Observe that $0 \le d_I(G,H) < 1$, which implies that [...]. However, $d_I$ is not a metric graph distance measure, since the triangle inequality $d_I(G,H) + d_I(H,K) \ge d_I(G,K)$ for $G,H,K \in \mathcal{G}$ does not hold in general. Instead, we obtain a modified version of the triangle inequality.
Therefore, we have the following inequality, i.e., $d_I(G,H) + d_I(H,K) \ge s_I(G,K)$. □ We emphasize that if the Inequalities (11) are satisfied, the modified triangle inequality holds. In practice, the triangle inequality may not be strictly necessary (e.g., for clustering and classification problems), but it is often required to prove properties of the measures.
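A small numerical example shows why the ordinary triangle inequality can fail. The sketch assumes the measure has the form $d(x,y) = 1 - e^{-((x-y)/\sigma)^2}$ and uses three hypothetical index values rather than indices of concrete graphs; the function names are illustrative only.

```python
import math

def distance(x, y, sigma=1.0):
    """Assumed form of the measure: d(x, y) = 1 - exp(-((x - y) / sigma)^2)."""
    return 1.0 - math.exp(-((x - y) / sigma) ** 2)

def triangle_gap(i_g, i_h, i_k, sigma=1.0):
    """d(G,H) + d(H,K) - d(G,K) for index values I(G), I(H), I(K).

    A negative value means the triangle inequality is violated.
    """
    return (distance(i_g, i_h, sigma) + distance(i_h, i_k, sigma)
            - distance(i_g, i_k, sigma))
```

With index values 0, 0.5 and 1 (and $\sigma = 1$) the gap is negative, because two small steps are each penalized only quadratically while the single large step is not. With widely separated values such as 0, 5 and 10 all three distances are close to 1 and the inequality holds.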
Theorem 3. Let $I_1$ and $I_2$ be two topological indices. Let $\mathcal{G}$ be a class of graphs and $G,H \in \mathcal{G}$. If [...], then [...], where $\alpha > 0$ is a constant.
Proof. Since [...], we obtain
$(I_1(G) - I_1(H))^2 \ge \alpha^2 (I_2(G) - I_2(H))^2$, (18)
i.e., [...]. Thus, [...]. The proof is complete. □
Suppose $I_3$ is also a topological index. Then if [...], we derive similarly [...], where $\beta > 0$ is a constant. Therefore, we obtain the following theorem.
Theorem 4. Let $I_1$, $I_2$ and $I_3$ be three topological indices. Let $\mathcal{G}$ be a class of graphs and $G,H \in \mathcal{G}$. If [...], then we infer [...], where $\alpha, \beta > 0$ are constants.
Theorem 5. Let $I_1$ and $I_2$ be two topological indices. Let $\mathcal{G}$ be a class of graphs and $G,H \in \mathcal{G}$. If [...], then we get [...], where $\alpha > 0$ is a constant.
Proof. Since [...], we infer [...]. And therefore, [...]. Hence, [...]. From the definition of $d_{I_2}$, i.e., [...], we obtain that [...]. Finally, by substituting (35) into (33), we get the desired result. □
Suppose $I_3$ is also a topological index. Then if [...], we have [...], where $\beta > 0$ is a constant. Therefore, we obtain the following theorem.
Theorem 6. Let $I_1$, $I_2$ and $I_3$ be three topological indices. Let $\mathcal{G}$ be a class of graphs and $G,H \in \mathcal{G}$. If [...], then we have [...], where $\alpha, \beta > 0$ are constants.
Theorem 7. Let $I_1$, $I_2$ and $I_3$ be three topological indices. Let $\mathcal{G}$ be a class of graphs and $G,H \in \mathcal{G}$. If [...], then we infer
$\ln(1 - d_{I_1}(G,H)) \le -\sigma^2 \ln(1 - d_{I_2}(G,H)) \cdot \ln(1 - d_{I_3}(G,H))$. (42)
Proof. Since [...], we derive [...]. And therefore, [...]. Hence we obtain [...], which implies that [...]. By substituting (35) into (47), we easily obtain the assertion of the theorem. □
By a proof similar to that of Theorem 7, we obtain a more general result.
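The inequality of Theorem 7, $\ln(1 - d_{I_1}) \le -\sigma^2 \ln(1 - d_{I_2}) \cdot \ln(1 - d_{I_3})$, can be explored numerically. The sketch below rests on two assumptions that are not confirmed by the text above: that the measure has the form $d_I(G,H) = 1 - e^{-((I(G)-I(H))/\sigma)^2}$, so that $\ln(1 - d_I) = -((I(G)-I(H))/\sigma)^2$, and that the hypothesis of the theorem is $|I_1(G) - I_1(H)| \ge |I_2(G) - I_2(H)| \cdot |I_3(G) - I_3(H)|$. The index differences are random numbers, not values of real graphs.

```python
import math
import random

def log_one_minus_d(delta, sigma):
    """ln(1 - d) for d = 1 - exp(-(delta / sigma)^2), via expm1/log1p."""
    d = -math.expm1(-(delta / sigma) ** 2)
    return math.log1p(-d)

def check_product_bound(trials=1000, seed=0):
    """Return True if the inequality held in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        sigma = rng.uniform(1.0, 3.0)
        delta2 = rng.uniform(0.0, 1.5)  # |I_2(G) - I_2(H)|
        delta3 = rng.uniform(0.0, 1.5)  # |I_3(G) - I_3(H)|
        # Assumed hypothesis: delta1 >= delta2 * delta3.
        delta1 = delta2 * delta3 + rng.uniform(0.0, 1.0)
        lhs = log_one_minus_d(delta1, sigma)
        rhs = (-sigma ** 2 * log_one_minus_d(delta2, sigma)
               * log_one_minus_d(delta3, sigma))
        if lhs > rhs + 1e-9:
            return False
    return True
```

Under these assumptions both sides reduce to $-\delta_1^2/\sigma^2$ and $-(\delta_2\delta_3)^2/\sigma^2$, so the check succeeds whenever $\delta_1 \ge \delta_2\delta_3$. The differences are kept small so that $1 - d$ stays representable in floating point.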
Theorem 8. Let $I, I_1, I_2, I_3, \ldots, I_k$ be topological indices. Let $\mathcal{G}$ be a class of graphs and $G,H \in \mathcal{G}$. If [...], we infer [...].
Theorem 9. Let $I_1$, $I_2$ and $I_3$ be three topological indices. Let $\mathcal{G}$ be a class of graphs and $G,H \in \mathcal{G}$. If [...], where $c_1, c_2 > 0$, then we get [...].
Proof. Since [...], we derive [...]. Therefore, [...], which implies [...]. By applying the substitutions [...] and [...] into (56), we obtain the final result. □
By a proof similar to that of Theorem 9, we again obtain a more general result.
Theorem 10. Let $I, I_1, I_2, I_3, \ldots, I_k$ be topological indices. Let $\mathcal{G}$ be a class of graphs and $G,H \in \mathcal{G}$. If [...], where $c_j > 0$ for $1 \le j \le k$, then we infer [...].

Graph Distance Measures Based on Randić Index
In this section, we consider the values of the graph distance measure based on the Randić index and other topological indices for some classes of graphs. Denote by W and R the Wiener index and Randić index, respectively.
Theorem 11. Let $\mathcal{G}$ be a class of regular graphs with $n$ vertices and let $I$ be an arbitrary topological index. For two graphs $G,H \in \mathcal{G}$, we infer $d_I(G,H) \ge d_R(G,H)$.
Proof. Let $G$ and $H$ be two regular graphs of order $n$. By the definition of the Randić index, we obtain $R(G) = R(H) = \frac{n}{2}$, which implies $R(G) - R(H) = 0$. Therefore, $d_R(G,H) = 0$. Since $d_I(G,H) \ge 0$ for any topological index, we obtain the desired inequality.
By using the definition of the zeroth-order Randić index for two graphs with the same degree sequences, we obtain ${}^0R(G) = {}^0R(H)$. Therefore, we get the following theorem.
Theorem 12. Let $\mathcal{G}$ be a class of graphs with the same degree sequences and let $I$ be an arbitrary topological index. Then for two graphs $G,H \in \mathcal{G}$, we infer $d_I(G,H) \ge d_{{}^0R}(G,H)$.
For a given graph $G$ of order $n$, we have $\sqrt{n-1} \le R(G) \le \frac{n}{2}$ (see [39]). Thus, [...]. From (63), we infer an upper bound for $d_R(G,H)$.
Theorem 13. Let $G$ and $H$ be two connected graphs of order $n$. Then we get [...]. Equality holds if and only if $G$ and $H$ are $S_n$ and a regular graph, respectively.
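The range $\sqrt{n-1} \le R(G) \le n/2$ can be illustrated with the star (lower end) and with regular graphs (upper end). The sketch below assumes the measure has the form $d(x,y) = 1 - e^{-((x-y)/\sigma)^2}$ and computes the largest value of $d_R$ that this range allows; this is the bound implied by the range and by the monotonicity of $d$ in $|x - y|$, and is not quoted from Theorem 13. The helper names are hypothetical.

```python
import math

def randic_index(edges):
    """Randic index: sum of (d(u) * d(v)) ** (-1/2) over all edges."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sum((deg[u] * deg[v]) ** -0.5 for u, v in edges)

def cycle_edges(n):
    """Edges of the cycle C_n (2-regular), n >= 3."""
    return [(i, (i + 1) % n) for i in range(n)]

def complete_edges(n):
    """Edges of the complete graph K_n ((n-1)-regular)."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

def star_edges(n):
    """Edges of the star S_n with centre 0."""
    return [(0, i) for i in range(1, n)]

def randic_distance_bound(n, sigma=1.0):
    """Largest d_R compatible with sqrt(n-1) <= R <= n/2 (assumed form of d)."""
    return 1.0 - math.exp(-((n / 2 - math.sqrt(n - 1)) / sigma) ** 2)
```

For $n = 8$, both the cycle and the complete graph have Randić index $4 = n/2$, so their mutual distance $d_R$ is zero, while the star has $R = \sqrt{7}$ and attains the bound against either regular graph.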
Suppose $u$ and $v$ are two pendent vertices, and $u'$ is the unique neighbor of $u$. We define an operation as follows: delete the edge $uu'$ and add the edge $uv$. We call this operation "transfer $u$ to $v$".
Theorem 14. Let $G = (V,E)$ be a graph with $n$ vertices. Denote by $P_1$ and $P_2$ the two pendent paths attached to the same vertex such that $|P_1| \ge |P_2| \ge 1$. Denote by $H$ the graph obtained by transferring the pendent vertex of $P_2$ to the pendent vertex of $P_1$. Then we have [...].
Proof. Let $G = (V,E)$ be a graph with $n$ vertices. Suppose $P_1 = u u_1 u_2 \ldots u_a$ and $P_2 = u v_1 v_2 \ldots v_b$ with $a \ge b \ge 1$. Since $P_1$ and $P_2$ are two pendent paths attached to the same vertex, we get [...]. By the definition of $H$, we infer $H = G - v_{b-1}v_b + u_a v_b$. By the definition of $d_I$, we only need to show [...]. Observe that $V(G) = V(H) = V$. We discuss the difference of the distances between two vertices in $G$ and $H$. Let $x$ and $y$ be two vertices of $G$. If $x, y \in V \setminus \{v_b\}$, then $d_H(x,y) = d_G(x,y)$. Now suppose $x = v_b$. If $y \notin V(P_1) \cup V(P_2)$, then [...]. Observe that [...]. Therefore, we have [...], i.e., [...]. For $b \ge 3$, it is easy to verify that $R(G) = R(H)$. Therefore [...]. For $b = 2$, from (66), we have $2 \le a \le n-4$ and $|W(G) - W(H)| = (n-a-3)(a-1)$. By performing some elementary calculations, we get [...], i.e., [...] for $2 \le a \le n-4$ and each value of $n$. Therefore, from (63) [...], i.e., [...] for $1 \le a \le n-3$ and each value of $n$. Therefore, from (63) [...].
Observe that for every $T \in \mathcal{T}_n \setminus \mathcal{T}'_n$, there must be a tree $T' \in \mathcal{T}'_n$ such that $T$ can be obtained from $T'$ by repeatedly transferring pendent vertices. Therefore, we obtain the following corollary.
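The transfer operation and its effect on the Wiener index can be tried out on a small family of trees. In the sketch below, the tree consists of a vertex $u$ (labelled 0) with $k$ extra leaves and two pendent paths of lengths $a$ and $b$; this particular family, the set-based adjacency representation and all function names are assumptions of the example.

```python
from collections import deque

def wiener_index(adj):
    """Sum of distances over unordered vertex pairs, via BFS from each vertex."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(dist.values())
    return total // 2

def tree_with_two_pendent_paths(k, a, b):
    """Vertex 0 with k extra leaves and pendent paths P1, P2 of lengths a, b.

    Returns (adjacency dict of sets, end vertex of P1, end vertex of P2).
    """
    adj = {0: set()}

    def add_edge(x, y):
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)

    label = 1
    for _ in range(k):
        add_edge(0, label)
        label += 1
    ends = []
    for length in (a, b):
        prev = 0
        for _ in range(length):
            add_edge(prev, label)
            prev = label
            label += 1
        ends.append(prev)
    return adj, ends[0], ends[1]

def transfer(adj, u, v):
    """Delete the edge between pendent vertex u and its neighbour, add edge uv."""
    new = {x: set(nb) for x, nb in adj.items()}
    (u_prime,) = new[u]  # u is pendent, so it has exactly one neighbour
    new[u].discard(u_prime)
    new[u_prime].discard(u)
    new[u].add(v)
    new[v].add(u)
    return new
```

With $k = 3$, $a = 4$, $b = 2$ the tree has $n = 10$ vertices and the Wiener index changes by $(n-a-3)(a-1) = 9$, the value stated in the proof for $b = 2$. With $b = 1$ and the same $k$ and $a$ the change is $a(n-a-2) = 12$.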
Actually, numerical experiments show that for any two trees $T,T' \in \mathcal{T}_n$, the inequality $d_W(T,T') > d_R(T,T')$ holds. We state the result as a conjecture.
Conjecture 1. Let $T$ and $T'$ be any two trees with $n$ vertices. Then $d_W(T,T') \ge d_R(T,T')$ holds.
As an example, we consider all 23 trees with 8 vertices and calculate all possible values of $d_W(T,T')$ (blue) and $d_R(T,T')$ (red), as shown in Figure 1. From Figure 1, we observe that $d_W(T,T') \ge d_R(T,T')$ holds for each pair of trees $T$ and $T'$.

Graph Distance Measures Based on Graph Entropy
In this section, we consider graph distance measures which are based on graph entropy and other topological indices for some classes of graphs.
To start, we reproduce the definition of Shannon's entropy [43]. Let $p = (p_1, p_2, \ldots, p_n)$ be a probability vector, namely, $0 \le p_i \le 1$ and $\sum_{i=1}^{n} p_i = 1$. Shannon's entropy of $p$ is defined by $I(p) = -\sum_{i=1}^{n} p_i \log p_i$. We denote by $d_{I_p}$ the graph distance measure based on $I(p)$.
In the following, we infer an upper bound for $d_{I_p}(G,H)$.
Theorem 15. Let $G$ and $H$ be two graphs with the same vertex set. Denote by $p = (p_1, p_2, \ldots, p_n)$ and $p' = (p'_1, p'_2, \ldots, p'_n)$ the probability vectors of $G$ and $H$, respectively. If $p_i \le p'_i$ for each $i$, then we infer [...], where $A = \sum_{i=1}^{n} p'_i \log\left(1 + \frac{1}{p'_i}\right)$.
Proof. Since $p_i \le p'_i$ for each $i$, we obtain $p_i < p'_i + 1$ and $\log p_i < \log(p'_i + 1)$. Then we have [...]. Therefore, we get the inequality $I(p') - I(p) < A$. Hence, [...]. The desired inequality holds. □
In [25], Dehmer and Mowshowitz generalized the definition of graph entropy by using information functionals. Let $G = (V,E)$ be a connected graph. For a vertex $v_i \in V$, we define $p(v_i) = \frac{f(v_i)}{\sum_{j=1}^{|V|} f(v_j)}$, where $f$ represents an arbitrary information functional. By substituting $p(v_i)$ into (78), we have $I_f(G) = -\sum_{i=1}^{|V|} \frac{f(v_i)}{\sum_{j=1}^{|V|} f(v_j)} \log \frac{f(v_i)}{\sum_{j=1}^{|V|} f(v_j)}$. We denote by $d_{I_f}$ the graph distance measure based on $I_f$.
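The entropy-based construction can be sketched in a few lines. The code assumes that the vertex probabilities are obtained by normalizing the values of the information functional, $p(v_i) = f(v_i) / \sum_j f(v_j)$, and that the distance measure has the form $1 - e^{-((x-y)/\sigma)^2}$. The base of the logarithm is not fixed by the text above, so it is exposed as a parameter with base 2 as an arbitrary default. All function names are hypothetical.

```python
import math

def shannon_entropy(p, base=2.0):
    """Shannon entropy -sum p_i log p_i, using the convention 0 * log 0 = 0."""
    return -sum(x * math.log(x, base) for x in p if x > 0)

def functional_entropy(f_values, base=2.0):
    """Graph entropy I_f from the values f(v_1), ..., f(v_n) of a functional."""
    total = sum(f_values)
    return shannon_entropy([f / total for f in f_values], base)

def entropy_distance(f_g, f_h, sigma=1.0, base=2.0):
    """d_{I_f}(G, H) for functional values f_g on G and f_h on H."""
    delta = functional_entropy(f_g, base) - functional_entropy(f_h, base)
    return 1.0 - math.exp(-(delta / sigma) ** 2)
```

Taking the vertex degrees as functional values, the star on four vertices has values `[3, 1, 1, 1]` and the path has `[1, 2, 2, 1]`; the path has the larger entropy. Two graphs with the same degree sequence receive distance 0 under this choice of $f$.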

Relations between $d_E(G,H)$ and $d_{I_g}(G,H)$
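Denote by $\lambda_1, \lambda_2, \ldots, \lambda_n$ the eigenvalues of a graph $G$. By setting $f(v_i) = |\lambda_i|$ in (87), we obtain a new expression of the graph entropy, namely $I_g(G) = -\sum_{i=1}^{n} \frac{|\lambda_i|}{\sum_{j=1}^{n} |\lambda_j|} \log \frac{|\lambda_i|}{\sum_{j=1}^{n} |\lambda_j|}$. Recall that the energy of $G$ is defined as $E(G) = \sum_{i=1}^{n} |\lambda_i|$.

Then we infer $I_g(G) = \log E(G) - \frac{1}{E(G)} \sum_{i=1}^{n} |\lambda_i| \log |\lambda_i|$.
From the definition of $I_g(G)$, it is interesting to investigate the relation between the graph distance measures $d_E$ and $d_{I_g}$.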
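The link between the eigenvalue-based entropy and the energy can be made explicit: normalizing $|\lambda_i|$ by $E(G)$ and expanding the logarithm gives $\log E - \frac{1}{E}\sum_i |\lambda_i| \log |\lambda_i|$. The sketch below evaluates both forms and compares them. It assumes natural logarithms and the convention $0 \cdot \log 0 = 0$, and the closed-form spectra of the star and the path are standard facts rather than results of this paper. The function names are hypothetical.

```python
import math

def eigen_entropy(eigenvalues):
    """Entropy of the distribution |lambda_i| / E (zero eigenvalues skipped)."""
    energy = sum(abs(x) for x in eigenvalues)
    return -sum((abs(x) / energy) * math.log(abs(x) / energy)
                for x in eigenvalues if x != 0)

def eigen_entropy_via_energy(eigenvalues):
    """Same quantity written as log E - (1/E) * sum |lambda_i| log |lambda_i|."""
    energy = sum(abs(x) for x in eigenvalues)
    weighted = sum(abs(x) * math.log(abs(x)) for x in eigenvalues if x != 0)
    return math.log(energy) - weighted / energy

def star_spectrum(n):
    """Adjacency eigenvalues of the star S_n (n >= 2)."""
    r = math.sqrt(n - 1)
    return [r, -r] + [0.0] * (n - 2)

def path_spectrum(n):
    """Adjacency eigenvalues of the path P_n."""
    return [2 * math.cos(k * math.pi / (n + 1)) for k in range(1, n + 1)]
```

For every star the two nonzero eigenvalues carry probability $1/2$ each, so the entropy equals $\log 2$ regardless of $n$, although the energy $2\sqrt{n-1}$ grows with $n$.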
Theorem 16. Let $G$ and $H$ be two graphs of order $n$ with $E(G) > E(H)$. Denote by $\lambda_1, \lambda_2, \ldots, \lambda_n$ and $\lambda'_1, \lambda'_2, \ldots, \lambda'_n$ the eigenvalues of $G$ and $H$, respectively. Let $\lambda' = \max_{1 \le i \le n} \{|\lambda'_i|\}$ and $\lambda = \min_{1 \le i \le n} \{|\lambda_i|\}$. Then we get [...], where $\xi \in (E(H), E(G))$ is a constant.
Proof. Let $G$ and $H$ be two graphs of order $n$. Let $E = E(G)$ and $E' = E(H)$ with $E > E'$. Then we get [...], where $\xi \in (E', E)$. Thus, [...], i.e., [...]. Taking the logarithm of both sides of the above inequality, we have [...]. The required inequality holds. □
Actually, numerical experiments show that for any two distinct trees $T,T' \in \mathcal{T}_n$, $d_E(T,T') \ge d_{I_g}(T,T')$ holds. See Figure 2 for an example, in which we consider all 23 trees with 8 vertices and calculate all possible values of $d_E(T,T')$ (red) and $d_{I_g}(T,T')$ (blue). We state this observation as a conjecture.
Conjecture 2. Let $T$ and $T'$ be any two distinct trees with $n$ vertices. Then $d_E(T,T') \ge d_{I_g}(T,T')$ holds.
Using a proof method similar to that of Theorem 16, we can obtain a generalization for the distance measure based on $I_f$ (see Eq. (87)). Let $f$ be an arbitrary information functional and let $f(G) = \sum_{i=1}^{n} f(v_i)$ be a topological index.
Theorem 17. Let $G$ and $H$ be two graphs of order $n$ with $f(G) > f(H)$. Let [...], where $\eta \in (f(H), f(G))$ is a constant.
Dehmer and Mowshowitz [44] introduced a new class of measures (called here generalized measures) that derive from functions such as those defined by Rényi's entropy and Daróczy's entropy. Let $G$ be a graph of order $n$. Then [...]. If we let $f(v_i) = |\lambda_i|$, then we obtain the new generalized entropy based on eigenvalues. We denote the entropy by [...]. For a given graph $G = (V,E)$ with $n$ vertices, denote by $\lambda_1, \lambda_2, \ldots, \lambda_n$ the eigenvalues of $G$. By substituting $E = \sum_{i=1}^{n} |\lambda_i|$ into equality (104), we have [...]. The last equality holds since $\sum_{i=1}^{n} \lambda_i^2 = 2m$. In the following theorem, we study the relation between $d_E$ and $d_{I_g^1}$.
Theorem 18. Let $\mathcal{G}$ be a class of graphs with $n$ vertices and $m$ edges. For two graphs $G,H \in \mathcal{G}$, let $E = E(G)$ and $E' = E(H)$. Then we get [...] and [...], where $\alpha \in \left(1 - \frac{16m^2}{E^3 E'^3}, 1\right)$ is a constant.
Proof. Let $G$ and $H$ be two graphs with $n$ vertices and $m$ edges. Without loss of generality, we suppose $E \ge E'$.
To show the first inequality, it suffices to prove [...]. Then from (107), we derive [...]. If we want to prove [...], we only need to show [...]. From a well-known bound on the energy, $E' \ge 2\sqrt{m}$, we have $E'^2 > 2m$ and $E^2 > (E + E')$. Therefore, $d_E(G,H) > d_{I_g^1}(G,H)$ holds. Now we show the second inequality. From (111), we have [...]. Therefore, we have [...]. From the definition of the distance measure, by some elementary calculations, we finally infer [...], where $\alpha \in \left(1 - \frac{16m^2}{E^3 E'^3}, 1\right)$ is a constant. The proof is complete. □

Relations between $d_I(G,H)$ and $d_{I_f}(G,H)$
Let $G = (V,E)$ be a connected graph with $n$ vertices, $m$ edges and degree sequence $(d_1, d_2, \ldots, d_n)$. By setting $f(v_i) = d_i^k$ in (87), we can obtain the new entropy based on degree powers, denoted by $I_f^k(G)$: [...]. For $k = -1/2$, the expression $\sum_{i=1}^{n} d_i^k$ is the zeroth-order Randić index ${}^0R(G)$. Then by using Theorem 17, we obtain the following result.
Theorem 19. Let $G$ and $H$ be two graphs of order $n$ with ${}^0R(G) > {}^0R(H)$. Let [...]. Then we have [...], where $\eta \in (f(H), f(G))$ is a constant.
For $k = 1$, we get [...]. Furthermore, by the definition of $I_f^k(G)$, for two graphs with the same degree sequences, we obtain $I_f^k(G) = I_f^k(H)$. Therefore, we get the following result.
Theorem 20. Let $\mathcal{G}$ be a class of graphs with the same degree sequences and let $I$ be an arbitrary topological index. Then for two graphs $G,H \in \mathcal{G}$, we infer $d_I(G,H) \ge d_{I_f^k}(G,H)$.
By using a proof method similar to that applied in Theorem 14, we obtain a weaker result.
Theorem 21. Let $T = (V,E)$ be a tree with $n$ vertices. Denote by $P_1$ and $P_2$ two pendent paths attached to the same vertex such that $|P_1| \ge |P_2| \ge 1$. Denote by $T'$ the tree obtained by transferring the pendent vertex of $P_2$ to the pendent vertex of $P_1$. Then we have [...].
For a tree $T$ with $n$ vertices, we get $I_f^1(S_n) \le I_f^1(T) \le I_f^1(P_n)$. By performing elementary calculations, we get [...]. Observe that $V(T) = V(T') = V$. We first discuss the difference of the distances between two vertices in $T$ and $T'$. Let $x$ and $y$ be two vertices of $T$. If $x, y \in V \setminus \{v_b\}$, then we have $d_{T'}(x,y) = d_T(x,y)$. Now suppose $x = v_b$. If $y \notin V(P_1) \cup V(P_2)$, then $d_{T'}(x,y) - d_T(x,y) = (a+1) - b = a - b + 1$. Observe that [...]. In the following, we suppose $b = 1$.
We obtain $1 \le a \le n-3$ and $|W(T) - W(T')| = a(n-a-2)$. By performing elementary calculations, we get [...] for $1 \le a \le n-3$ and each value of $n$. Therefore, [...]. To prove the other inequality, we need a more detailed discussion. By using the definition of graph entropy, we get [...] for each $n$, i.e., $|R(T) - R(T')| > |I_f^1(T) - I_f^1(T')|$. □
From Theorems 14 and 21, we obtain the following corollary.
Corollary 2. Let $T = (V,E)$ be a tree with $n$ vertices. Denote by $P_1$ and $P_2$ the two pendent paths attached to the same vertex such that $|P_1| \ge |P_2| \ge 1$. Denote by $T'$ the tree obtained by transferring the pendent vertex of $P_2$ to the pendent vertex of $P_1$. Then we have [...].
Therefore, we obtain a similar result for comparing the values of distance measures of trees.
Actually, our numerical results (see section 'Numerical Results') suggest that for any two trees $T,T' \in \mathcal{T}_n$, the following inequality may hold.
Conjecture 3. Let $T$ and $T'$ be any two trees with $n$ vertices. Then [...] holds.
By way of example, we consider all 23 trees with 8 vertices and calculate all possible values of $d_{I_f^1}(T,T')$ (blue) and $d_R(T,T')$ (red), as shown in Figure 3. From Figure 3, we observe that [...] holds for each pair of trees $T$ and $T'$.

Numerical Results
In this section, we interpret the numerical results. First, we consider all trees with 8 vertices. The number of trees is 23 and the number of pairs is 253 (see [45]). From the curves shown in Figure 1, we see that the measures $d_W(T,T')$ (blue) and $d_R(T,T')$ (red) satisfy the inequality Eq. (77). From the curves shown in Figure 2, we observe that the measures $d_E(T,T')$ (red) and $d_{I_g}(T,T')$ (blue) satisfy the inequality Eq. (101). From the curves shown in Figure 3, we also learn that the measures $d_{I_f^1}(T,T')$ (blue) and $d_R(T,T')$ (red) fulfill the inequality Eq. (143). By using this method, several other inequalities could be generated and verified graphically. Figures 4 and 5 show the numerical results obtained with the graph distance measures based on the graph energy $E$, the Wiener index $W$ and the Randić index $R$. Here we consider all trees with 11 vertices. The number of trees is 235 and the number of pairs is 27495 (see [45]). Figure 4 depicts the distributions of the ranked distance values, that is, $d_E$ (red), $d_W$ (blue), and $d_R$ (yellow). First and foremost, we see that the values of all three measures cover the entire interval $[0,1]$. This indicates that the measures are generally useful as they are well defined. Considering $d_W$, we observe that only a relatively small number of pairs have a value $\le 0.8$, whereas a large number of pairs possess distance values $\ge 0.8$. For $d_R$, the situation is reversed. The distance values of $d_E$ seem to increase slightly, with some upturns and downturns. However, Figure 4 gives no information about the ability of the graph distance measures to classify graphs efficiently. This needs to be examined in the future and would go far beyond the scope of this paper.
Furthermore, we have computed the cumulative distributions for the measures $d_E$ (red), $d_W$ (blue) and $d_R$ (yellow), as shown in Figure 5. In general, the computation of the cumulative distribution may serve as a preprocessing step when analyzing graphs structurally. In fact, we see what percentage of the 235 graphs have a distance value less than or equal to $d$. Figure 5 also shows that the value distributions are quite different: the curve for $d_W$ differs strongly from those for $d_E$ and $d_R$. For $d_R$, we observe that about 80% of the 235 trees have a distance value of approximately $\le 0.5$. That means most of the trees are quite similar according to $d_R$. For $d_W$, the situation is reversed. Here 80% of the trees have a distance value of approximately $\le 0.98$. Finally, evaluating the graph distance measure $d_E$ on these trees reveals that about 80% of the trees possess a distance value of approximately $\le 0.85$. In summary, we conclude from Figure 5 that all three measures capture the distance between the graphs quite differently. Nevertheless, this does not imply that the quality of one measure is worse than that of another. Again, an important quality criterion is fulfilled, as the measures turned out to be well defined, see Figure 4. Another crucial issue is the evaluation of the classification ability, which is left for future work.
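The kind of evaluation described here, ranked pairwise distances and their cumulative distribution, can be sketched as follows. The code assumes the measure has the form $d(x,y) = 1 - e^{-((x-y)/\sigma)^2}$ and takes a plain list of index values as input; it does not reproduce the data or figures of this study, and the function names are hypothetical.

```python
import math
from itertools import combinations

def pairwise_distances(index_values, sigma=1.0):
    """Ranked distances d(I(G), I(H)) over all unordered pairs of graphs."""
    return sorted(1.0 - math.exp(-((x - y) / sigma) ** 2)
                  for x, y in combinations(index_values, 2))

def cumulative_fraction(distances, threshold):
    """Fraction of pairs whose distance is less than or equal to `threshold`."""
    return sum(1 for d in distances if d <= threshold) / len(distances)
```

A list of 23 index values yields 253 pairs and a list of 235 values yields 27495 pairs, matching the counts given above. The choice of $\sigma$ strongly affects where the distances concentrate, so it should be reported alongside any such curve.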

Summary and Conclusion
In this paper, we have studied interrelations of graph distance measures which are based on distinct topological indices. To do so, we employed the Wiener index, the Randić index, the zeroth-order Randić index, the graph energy, and certain graph entropies [25]. In particular, we have obtained inequalities involving the novel graph distance measures. Supported by a numerical analysis, we also stated three conjectures dealing with relations between the distance measures on trees.
From Theorem 1, we see that the star graph and the path graph maximize $d_I$ among all trees with a given number of vertices, for any topological index we considered here. Actually, this also holds for some other topological indices, such as the Hosoya index [46,47], the Merrifield-Simmons index [48,49,47], the Estrada index [50,51,52], and the Szeged index [53,54]. All other theorems we have proved in this paper shed light on the problem of proving interrelations of the measures. We believe that such statements help to understand the measures more thoroughly and, finally, that they are useful for establishing new applications employing quantitative graph theory [55]. We emphasize that the star graph and the path graph are apparently the two most dissimilar trees among all trees. Similar observations can also be obtained for unicyclic graphs or bicyclic graphs. Therefore, in the future, we would like to explore which classes of graphs have this property, i.e., to identify graphs (such as the path graph and the star graph) which maximize or minimize $d_I$.
Another direction for future work is to compare the values of $d_I(G,G')$ where $G,G'$ are general graphs. For example, we could assume that $G$ and $G'$ are obtained from each other by only one graph edit operation, i.e., $GED(G,G') = 1$, see [15]. Then all the graphs which fulfill this equation are (by definition) similar. This construction could help to study the sensitivity of the measures thoroughly. Note that similar properties of topological indices have already been investigated, see [56]. As a concluding remark, we mention that dynamic models on spatial graphs have been studied by Perc and Wang and other researchers, see [57,58]. It would be interesting to study the distance measures in this mathematical framework as well.

Supporting Information
Supporting Information S1 CSV file containing descriptor values of 235 trees by using the Randić index.