2 Aug 2013: Dehmer M, Mowshowitz A (2013) Correction: The Discrimination Power of Structural SuperIndices. PLOS ONE 8(8): 10.1371/annotation/792f2983-0889-432a-b047-e1c9577dcc7e. https://doi.org/10.1371/annotation/792f2983-0889-432a-b047-e1c9577dcc7e
In this paper, we evaluate the discrimination power of structural superindices. Superindices for graphs are measures composed of other structural indices. In particular, we compare the discrimination power of the superindices with that of individual graph descriptors. In addition, we perform a statistical analysis to generalize our findings to large graphs.
Citation: Dehmer M, Mowshowitz A (2013) The Discrimination Power of Structural SuperIndices. PLoS ONE 8(7): e70551. https://doi.org/10.1371/journal.pone.0070551
Editor: Bülent Yener, Rensselaer Polytechnic Institute, United States of America
Received: January 6, 2013; Accepted: June 25, 2013; Published: July 25, 2013
Copyright: © 2013 Dehmer, Mowshowitz. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: MD thanks the Austrian Science Funds for supporting this work (project P22029-N13). MD also gratefully acknowledges funding from the Standortagentur Tirol (formerly Tiroler Zukunftsstiftung). Also, research was sponsored by the United States Army Research Laboratory and the United Kingdom Ministry of Defence and was accomplished under Agreement Number W911NF-06-3-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Army Research Laboratory, the United States Government, the United Kingdom Ministry of Defence or the United Kingdom Government. The United States and United Kingdom Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The absence of a polynomial time algorithm for determining if two arbitrary graphs are isomorphic has stimulated efforts to develop efficient heuristics that work in almost all cases. In particular, research on structural network measures has been undertaken in recent decades, see, e.g., –. Several different types of network measures have been developed. Some of them have been used to characterize the structure of graphs locally or globally –. Others have been used to characterize graphs quantitatively, and these have been applied to problems in areas such as structural chemistry, structural drug design, ecology, and computational physics , –. Bonchev  and Balaban et al.  developed structural indices to detect branching in molecular graphs. In addition to research directed at measuring structural features of a given network, work has been carried out on comparative network measures –. Examples include such work as graph similarity and graph distance measures which have been applied to graph clustering and other problems, see –.
Properties of structural measures have also been examined in some detail. Research in this area encompasses investigations of the mathematical interrelations between network measures , , correlations between measures , , and their respective discrimination powers (also called uniqueness) –. Discrimination power (or the uniqueness property) is the central concern of this paper. In addition to earlier work on the uniqueness of structural graph measures –, , , Dehmer et al. ,  recently performed large scale analyses of the uniqueness of information-theoretic, degree-based and eigenvalue-based network measures. Here we focus on single indices defined relative to graph decompositions such as those induced by symmetry structure, distances, vertices, chromatic features, etc. Such an index is a mapping and can be interpreted as a graph complexity measure , , . Single indices interpreted as graph invariants  have been studied in areas such as structural chemistry ,  and computer science . Also, we emphasize that approaches employing single indices for finding complete graph invariants have failed so far , , . A complete graph invariant is an index that distinguishes between non-isomorphic graphs in a given collection. The reason for their failure is that every known single index has a certain degree of degeneracy , , that is, the measure cannot distinguish some non-isomorphic graphs by its values. Hence, single structural indices are not suitable for determining graph isomorphism, see .
In this paper, we explore the uniqueness of so-called superindices , ,  for graphs (see section ‘SuperIndices’). Such superindices have been studied in structural chemistry and other disciplines , , . A superindex is a composition of several structural index components, and is designed to obtain a measure which captures structural information more meaningfully than the individual components by themselves. To the best of our knowledge, the uniqueness of superindices  has not yet been explored to any great extent. To this end we use exhaustively generated general graphs  rather than any special graph classes such as chemical graphs , , . The reason for using exhaustively generated general graphs (i.e., graphs without any structural constraints ) is to study the uniqueness of the superindices applied to arbitrary graphs. In short, the problem we address is the use of structural superindices that appear useful in determining graph isomorphism. Superindices are not restricted to any particular class of graphs - they can be applied to arbitrary graphs. Furthermore, a graph index is a measure that maps a single graph to the reals. In contrast, a graph metric , ,  is a comparative measure designed to determine the structural similarity between graphs. Those metrics will not be used in this paper. Other graph measures such as the clustering coefficient or degree-based measures do not quantify structural features of graphs meaningfully as they exhibit a high degree of degeneracy .
Superindices , ,  are combinations of existing indices, where “combination” means algebraic or transcendental operations on the component indices. The term superindex was coined by Bonchev et al.  who devised superindices to achieve better discrimination between isomers than was possible using individual graph measures. Dehmer et al.  applied information-theoretic superindices to the Ames benchmark dataset of Hansen et al.  using supervised machine learning. In addition, Pogliani  derived certain superindices and demonstrated their power to predict melting points.
Let a graph class and a topological index (or descriptor) defined on it be given. For two such indices we define the following superindices. They were chosen because they are the simplest and most obvious linear combinations of two indices and because they turn out to have high discrimination power, which is, after all, the acid test of the utility of the indices. It is of course possible that other combination methods, based for example on rank reduction techniques such as Singular Value Decomposition, would produce indices with even greater discrimination power. However, that is something to be explored in future papers. The superindices are defined by Equations (1)–(9).
Balaban et al.  proposed similar superindices in QSAR/QSPR , . That selection proved quite useful and has influenced our choice of superindices for the current study of uniqueness. In the following sections, we analyze the discrimination power of these superindices numerically and statistically. In particular, we demonstrate that some superindices far outperform the underlying single descriptors.
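The effect of composing two indices can be illustrated on a toy collection. The following is a minimal sketch, not the computation performed in this study: it uses the six connected graphs on four vertices, the Wiener index and the maximum vertex degree as two illustrative component indices, and a sum and a product as two hypothetical combinations (they are not claimed to coincide with Equations 1–9). The helper `ndv` is assumed to count the graphs whose index value is shared with at least one other graph.

```python
from collections import Counter, deque

def adjacency(n, edges):
    """Build an adjacency map for an undirected graph on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def wiener_index(n, edges):
    """Sum of shortest-path distances over all unordered vertex pairs."""
    adj = adjacency(n, edges)
    total = 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2  # every pair was counted twice

def max_degree(n, edges):
    """Largest vertex degree (an illustrative, highly degenerate index)."""
    return max(len(nbrs) for nbrs in adjacency(n, edges).values())

def ndv(values):
    """Number of graphs whose value is not unique (assumed reading of ndv)."""
    counts = Counter(values)
    return sum(c for c in counts.values() if c > 1)

# The six non-isomorphic connected graphs on four vertices.
GRAPHS = {
    "path": [(0, 1), (1, 2), (2, 3)],
    "star": [(0, 1), (0, 2), (0, 3)],
    "cycle": [(0, 1), (1, 2), (2, 3), (3, 0)],
    "paw": [(0, 1), (1, 2), (2, 0), (0, 3)],
    "diamond": [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)],
    "complete": [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)],
}

w = [wiener_index(4, e) for e in GRAPHS.values()]
d = [max_degree(4, e) for e in GRAPHS.values()]
total = [a + b for a, b in zip(w, d)]    # hypothetical additive combination
product = [a * b for a, b in zip(w, d)]  # hypothetical multiplicative combination

for name, values in [("wiener", w), ("max_degree", d),
                     ("sum", total), ("product", product)]:
    print(name, values, "ndv =", ndv(values))
```

On this toy collection the product separates all six graphs although both components are degenerate, whereas the sum does not. This shows that the choice of combination matters. With real-valued descriptors, equality of values would have to be tested up to a numerical tolerance, which this integer example sidesteps.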
Data and Computation
The uniqueness of the superindices listed above has been analyzed on a collection of exhaustively generated graphs . This collection consists of all 261080 non-isomorphic connected graphs on 9 vertices. As in , the graphs in this collection were generated by the program geng from the Nauty package . The individual indices as well as the superindices were calculated with the aid of the R package QuACN , .
The random graph construction model was selected because it yields the most general class of graphs, and seems appropriate for an initial study of the discrimination power of superindices. Other construction methods, e.g.,  are also of interest, especially because they model many real world graphs known to exhibit a power law distribution. However, application of the superindices to graphs produced by other construction methods is beyond the scope of the current paper.
Table 1 presents the QuACN-descriptors  with their input options (parameter) and their abbreviations. Superindices with components drawn from the descriptors in Table 1 have been calculated. The results of these computations (discussed below) are shown in Tables 5–12. Table 4 shows the uniqueness of the QuACN-descriptors in terms of their ndv-values, i.e., the number of non-distinguishable values (graphs) for a particular index, and the sensitivity $S = (|\mathcal{G}| - \mathit{ndv}) / |\mathcal{G}|$ (10), where $|\mathcal{G}|$ denotes the number of graphs in the collection; see , . The tables show that only a few of the QuACN-descriptors possess high uniqueness. Examples of such highly discriminating indices are infotheolin2, infotheoquad2, infotheoexp2, infotheoexp3, laplacianEstrada, minBalabanID, eigenvalaugement, eigenvalextadj, eigenvalvertconnect, eigenvalrandomwalk, eigenvalweightedlin, eigenvalweightedexp. High discrimination power has already been observed (see , ) for some of the indices, namely, information-theoretic measures (e.g., infotheolin2, infotheoquad2, infotheoexp2 etc.) and the entropic eigenvalue-based measures (eigenvalaugement, eigenvalextadj, eigenvalvertconnect etc.) due to Dehmer , . Note that the uniqueness of the minBalabanID  is less than the uniqueness of some of the above mentioned measures due to Dehmer , . Most of the so-called molecular ID numbers (such as minBalabanID) appear to be highly discriminating but have never been evaluated on general graph classes such as exhaustively generated general graphs. It has also been observed that the uniqueness of structural graph indices depends on the graph class under consideration, see , , .
Tables 5, 6, 7, 8 present the uniqueness results for certain combinations of descriptors involving the superindices. Each pair of tables shows the results for two subsets of such indices. The first subset consists of Equations 1–5 (e.g., Table 5) and the second subset consists of Equations 6–9 (e.g., Table 6). For instance, Table 5 shows that most of the superindices now discriminate the graphs perfectly (ndv = 0) even when indices with very low uniqueness (such as augmentedZagreb, bertz, wiener etc.) are involved. When the descriptors radialCentric and eigenvalaugement are inserted into the equations representing the superindices, some of the resulting measures are much less discriminating (ndv = 79676, which corresponds to a sensitivity of about 0.69482). This is because radialCentric has little discrimination power (it discriminates only two graphs out of 261080). A similar effect can be seen in Tables 9, 10, 11, 12. For instance, Table 9 shows that the composition (based on the superindices) of a descriptor with little discrimination power (e.g., narumiKatayama; ndv = 260925, sensitivity 0.00059, see Table 4) with another descriptor having high discrimination power (e.g., eigenvalvertconnect; ndv = 1089, sensitivity 0.99583, see Table 4) leads again to a highly unique measure. In this particular case and by using the superindex , we find its discrimination power to be ndv = 535, i.e., a sensitivity of about 0.99795. The uniqueness of the new measure is thus better than the uniqueness of either component measure, see Table 9. More extreme cases can be found in Table 12 for the composition of the two descriptors topologicalinfocontent and eigenvalvertconnect using the superindex . In short, Tables 5, 6, 7, 8, 9, 10, 11, 12 demonstrate that most of the superindices possess high uniqueness even when one of the constituent graph measures has little discrimination power.
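The relation between ndv and sensitivity quoted above can be checked numerically. The sketch below assumes that the sensitivity is the fraction of distinguishable graphs, (N - ndv) / N, with N = 261080 graphs in the collection. This assumption reproduces the two sensitivity values reported for narumiKatayama and eigenvalvertconnect. The function name `sensitivity` is introduced here for illustration only.

```python
def sensitivity(ndv, n_graphs):
    """Fraction of graphs that an index distinguishes (assumed definition)."""
    if n_graphs <= 0 or not 0 <= ndv <= n_graphs:
        raise ValueError("expected 0 <= ndv <= n_graphs and n_graphs > 0")
    return (n_graphs - ndv) / n_graphs

N_GRAPHS = 261080  # connected graphs on 9 vertices, as stated in the text

# ndv values quoted in the text for single descriptors and for superindices
cases = {
    "narumiKatayama": 260925,
    "eigenvalvertconnect": 1089,
    "superindex of the two (Table 9)": 535,
    "superindex with radialCentric (Table 5)": 79676,
}

for label, ndv in cases.items():
    print(f"{label}: ndv = {ndv}, sensitivity = {sensitivity(ndv, N_GRAPHS):.5f}")
```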
To better understand the behavior of these indices it would be desirable to explore the structural interpretation of these measures. Many of the constituent measures have a structural interpretation as a branching index ,  (e.g., the Wiener index (wiener)) or as a cyclicity index  (e.g., the Balaban index (balabanJ)). A correlation analysis might be used to determine classes of superindices having a distinctive interpretation, e.g., branching, cyclicity, irregularity etc. Such an analysis would involve finding the correlations between and , and , and , etc. However, this is beyond the scope of the present paper.
To determine the scalability of our findings on discrimination power of superindices applied to the graphs in , we have performed a statistical analysis. The aim of this analysis is to determine whether or not the results for determining uniqueness are statistically stable for graphs with larger numbers of vertices. Central to this analysis is a method for generating random graphs. We used Bootstrapping ,  to estimate the underlying sampling distribution.
For the statistical analysis see Figures 1 and 2. Samples of random (Erdős–Rényi) graphs on 50, 75, 100, and 150 vertices have been generated using the R-library igraph . More precisely, we have generated 50 random graphs for each of the edge sizes . The parameter denotes the bound on the size of the random sample dictated by the computational algorithm. The procedure we used is detailed in the following algorithm.
- Generate a connected random graph possessing n vertices and n-1 edges (that is, a random tree).
- Add edges randomly between non-adjacent vertices to obtain edge sizes .
- Check each generated random graph for isomorphism with previously generated graphs. If the newly generated graph is not isomorphic to any of the previously generated graphs, we add this graph to the list, and return to step 1.
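The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the igraph-based procedure used for the study: the tree is grown by random attachment (which is not uniform over all trees), the isomorphism check uses a brute-force canonical form that is only feasible for very small graphs, and all function names and parameters (`random_tree`, `add_random_edges`, `canonical_form`, `sample_nonisomorphic`, `target`, `max_tries`) are hypothetical.

```python
import random
from itertools import combinations, permutations

def random_tree(n, rng):
    """Step 1: connected graph with n vertices and n-1 edges.
    Each new vertex attaches to a randomly chosen earlier vertex."""
    return {frozenset((v, rng.randrange(v))) for v in range(1, n)}

def add_random_edges(n, edges, m, rng):
    """Step 2: add edges between non-adjacent vertices until m edges exist."""
    missing = [frozenset(p) for p in combinations(range(n), 2)
               if frozenset(p) not in edges]
    return edges | set(rng.sample(missing, m - len(edges)))

def canonical_form(n, edges):
    """Smallest sorted edge list over all relabellings (brute force, tiny n only).
    Two graphs are isomorphic exactly when their canonical forms agree."""
    best = None
    for perm in permutations(range(n)):
        relabelled = tuple(sorted(
            tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        if best is None or relabelled < best:
            best = relabelled
    return best

def sample_nonisomorphic(n, m, target, rng, max_tries=10_000):
    """Step 3: keep a new graph only if it is not isomorphic to a stored one."""
    seen = {}
    for _ in range(max_tries):
        if len(seen) == target:
            break
        graph = add_random_edges(n, random_tree(n, rng), m, rng)
        seen.setdefault(canonical_form(n, graph), graph)
    return list(seen.values())

rng = random.Random(0)
for m in (4, 5, 6):  # a few edge sizes for graphs on 5 vertices
    sample = sample_nonisomorphic(5, m, target=3, rng=rng)
    print(f"m = {m}: collected {len(sample)} pairwise non-isomorphic graphs")
```

For graphs of the sizes considered in this paper, a dedicated isomorphism or canonical-labelling routine would be required instead of the permutation search. The bound `max_tries` plays the role of a safeguard for edge sizes at which fewer than `target` non-isomorphic graphs exist.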
Performing the computation in Algorithm 1, we obtain complete random samples for . For the sake of completeness, we also give the sizes of the random samples generated:
- and . By choosing , we generated random graphs with . Hence, we obtain 58500 random graphs in total.
- and . By choosing , we generated random graphs with . Hence, we obtain 134650 random graphs in total.
- and . By choosing , we generated random graphs with . Hence, we obtain 242150 random graphs in total.
- and . By choosing , we generated random graphs with . Hence, we obtain 550650 random graphs in total.
In order to calculate the superindices, we computed all possible (pairwise) combinations of the descriptors given in Table 2. To calculate the mean sensitivity for each descriptor combination, we bootstrapped the samples -times without replacement. Finally, the mean values of all sensitivity values for superindices and together with their variances are shown in Figures 1 and 2. The mean values are quite stable. Thus, there is little dependency between the mean sensitivity and the number of vertices of the generated random graphs. In particular, we see that the mean value deteriorates slightly for . In short, Figure 1 strongly supports the hypothesis that the computed superindices have high discrimination power for graphs of increasing size and that the values are quite stable. Indeed, stability could be defined here by the degree of the dependency between the mean sensitivity values and the number of vertices. Note that the analysis whose results are shown in Figure 1 was computationally demanding due to the combinatorial explosion of cases. Hence, repeating the analysis for much larger (i.e., ) may not be feasible.
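The resampling step can be sketched as below. The sketch assumes that each repetition draws a subsample of index values without replacement (as stated above), computes its sensitivity as the fraction of graphs with a unique value, and that the mean and variance are then taken over all repetitions. The subsample size and the number of repetitions are not given in this text, so `subsample_size` and `repetitions` are hypothetical parameters, as are the function names.

```python
import random
from collections import Counter
from statistics import mean, pvariance

def sensitivity(values):
    """Fraction of graphs whose index value is unique within `values`."""
    counts = Counter(values)
    ndv = sum(c for c in counts.values() if c > 1)
    return (len(values) - ndv) / len(values)

def resampled_sensitivity(values, subsample_size, repetitions, rng):
    """Mean and variance of the sensitivity over repeated subsamples,
    each drawn without replacement from the list of index values."""
    scores = [sensitivity(rng.sample(values, subsample_size))
              for _ in range(repetitions)]
    return mean(scores), pvariance(scores)

rng = random.Random(42)
# Toy index values: 900 distinct values plus 100 graphs sharing 10 values.
toy_values = list(range(900)) + [1000 + i % 10 for i in range(100)]
avg, var = resampled_sensitivity(toy_values, subsample_size=200,
                                 repetitions=100, rng=rng)
print(f"mean sensitivity = {avg:.4f}, variance = {var:.6f}")
```

Values are compared exactly here. For real-valued descriptors a rounding or tolerance rule would have to be fixed beforehand, since it directly affects ndv.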
In contrast to the superindices, the results in Figure 2 show that the discrimination power of the individual descriptors listed in Table 2 is worse for larger graphs. This is indicated by the mean sensitivity values which are much lower than the ones shown in Figure 1. This demonstrates that superindices and have a much better discrimination power on the generated random graphs. A reason for this is that the superindices seem to capture structural information more meaningfully than the individual ones. This seems to be clear (for the used graph class) as multiple descriptors capture several different aspects of structural information which may complement each other and, thus, provide a (super) index with improved discrimination power.
The results in Figures 1 and 2 summarize the uniqueness of some superindices as a function of the size of randomly generated graphs. We next consider the relationship between uniqueness (measured by the sensitivity) and graph size. The results are shown in Figures 3, 4, and 5. Earlier work by Dehmer et al.  on superindices restricted the component individual indices to information-theoretic measures. In the present study, we aim to examine the dependency between the uniqueness of the superindex using certain descriptor categories applied to generated random graphs of fixed size. The categories included eigenvalue-based, information-theoretic, distance-based and degree-based descriptors. The descriptors in the categories are listed in Table 3. In order to calculate the mean sensitivity using the descriptors of the above mentioned categories, we bootstrapped the descriptor values times without replacement for each combination to determine the sensitivity of randomly generated graphs. The sample sizes are 100, 1000, 10000, 100000, 900000.
Figure 3. To calculate the superindex, we used all combinations of eigenvalue-based descriptors (Left) and eigenvalue-based and information-theoretic descriptors (Right), see Table 3.
Figure 4. To calculate the superindex, we used all combinations of eigenvalue-based and distance-based descriptors (Left) and eigenvalue-based and degree-based descriptors (Right), see Table 3.
Figure 5. To calculate the superindex, we used all combinations of distance-based descriptors (Left) and distance-based and degree-based descriptors (Right), see Table 3.
Figures 3, 4, 5 show the impact of the underlying category on the above mentioned dependency. From Figure 3 we see that there is nearly no dependency between the sensitivity and the sample size. A plausible reason for this is the high uniqueness of the underlying individual descriptors of the categories employed, namely, (left) eigenvalue-based descriptors and (right) eigenvalue-based and information-theoretic descriptors (see Table 4). Figure 4 shows a similar result but there is a slight deterioration of uniqueness for the degree-based descriptors used to calculate the superindex. This seems plausible as many degree-based measures possess little discrimination power, e.g., see . The left hand side of Figure 5 shows the dependency plot for the (pure) category of distance-based measures (see Table 3). In particular, the variances are very high and the mean sensitivity values deteriorate substantially as the sample size increases. Again, this can be understood by the low uniqueness of various distance-based graph measures (see Table 4). The right hand side of Figure 5 shows that this effect is eased for a (mixed) category of descriptors, distance-based and degree-based descriptors in the present case. In summary, we see that the uniqueness of the superindex does not depend much on the sample size when the component descriptors are relatively unique. In our study, this applies to the eigenvalue-based and information-theoretic descriptors. It is not surprising that we obtained very similar results by using the superindex .
Summary and Conclusion
In the foregoing we examined the discrimination power of structural superindices composed of two or more individual measures (or descriptors) defined on graphs. Our results show that superindices generally have greater discrimination power than individual descriptors. The initial analysis of the superindices was performed on the collection of graphs on nine vertices. In addition, we examined the relative performance of superindices on randomly generated connected graphs on 50, 75, 100, and 150 vertices, respectively. The findings show that the superindices perform consistently over these different sized graphs, whereas individual descriptors exhibit declining performance. We conjecture that this superior performance of superindices is attributable to their taking account of multiple structural features of a graph, rather than the single feature captured by individual descriptors. Further research is needed to account for the differences in performance between different superindices, and between superindices and individual descriptors.
- 1. Dehmer M, Kraus V (2012) On extremal properties of graph entropies. MATCH Commun Math Comput Chem 68: 889–912.
- 2. Devillers J, Balaban AT (1999) Topological Indices and Related Descriptors in QSAR and QSPR. Gordon and Breach Science Publishers. Amsterdam, The Netherlands.
- 3. Diudea MV, Gutman I, Jäntschi L (2001) Molecular Topology. Nova Publishing. New York, NY, USA.
- 4. Halin R (1989) Graphentheorie. Akademie Verlag. Berlin, Germany.
- 5. Mowshowitz A (1968) Entropy and the complexity of the graphs I: An index of the relative complexity of a graph. Bull Math Biophys 30: 175–204.
- 6. Todeschini R, Consonni V, Mannhold R (2002) Handbook of Molecular Descriptors. Wiley-VCH. Weinheim, Germany.
- 7. Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40: 35–41.
- 8. Antiqueira L, da F Costa L (2009) Characterization of subgraph relationships and distribution in complex networks. New Journal of Physics 11.
- 9. Kier LB, Hall LH (1976) Molecular Connectivity in Chemistry and Drug Research. Academic Press. New York, USA.
- 10. Ulanowicz RE (2004) Quantitative methods for ecological network analysis. Computational Biology and Chemistry 28: 321–339.
- 11. Bonchev D (1995) Topological order in molecules 1. Molecular branching revisited. Journal of Molecular Structure: THEOCHEM 336: 137–156.
- 12. Balaban AT, Mills D, Kodali V, Basak SC (2006) Complexity of chemical graphs in terms of size, branching and cyclicity. SAR and QSAR in Environmental Research 17: 429–450.
- 13. Bunke H (1983) What is the distance between graphs ? Bulletin of the EATCS 20: 35–39.
- 14. Sobik F (1982) Graphmetriken und Klassifikation strukturierter Objekte. ZKI-Informationen, Akad Wiss DDR 2: 63–122.
- 15. Sobik F (1986) Modellierung von Vergleichsprozessen auf der Grundlage von Ähnlichkeitsmaßen für Graphen. ZKI-Informationen, Akad Wiss DDR 4: 104–144.
- 16. Zelinka B (1975) On a certain distance between isomorphism classes of graphs. Časopis pro pest Mathematiky 100: 371–373.
- 17. Emmert-Streib F, Dehmer M, Kilian J (2006) Classification of large graphs by a local tree decomposition. In: et al HRA, editor, Proceedings of DMIN'05, International Conference on Data Mining, Las Vegas, USA. 200–207.
- 18. Pržulj N (2007) Network comparison using graphlet degree distribution. Bioinformatics 23: e177–e183.
- 19. Raymond JW, Blankley CJ, Willet P (2003) Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures. J Mol Graph Model : 421–433.
- 20. Dehmer M, Mowshowitz A, Emmert-Streib F (2011) Connections between classical and parametric network entropies. PLoS ONE 6: e15733.
- 21. Zhou B (2008) Bounds on the balaban index. Croatica Chemica Acta 81: 319–323.
- 22. Bonchev D (1983) Information Theoretic Indices for Characterization of Chemical Structures. Research Studies Press, Chichester.
- 23. Basak SC, Balaban AT, Grunwald GD, Gute BD (2000) Topological indices: Their nature and mutual relatedness. J Chem Inf Comput Sci 40: 891–898.
- 24. Aigner M, Triesch E (1994) Realizability and uniqueness in graphs. Discrete Mathematics 136: 3–20.
- 25. Bonchev D, Mekenyan O, Trinajstić N (1981) Isomer discrimination by topological information approach. J Comp Chem 2: 127–148.
- 26. Konstantinova EV (1996) The discrimination ability of some topological and information distance indices for graphs of unbranched hexagonal systems. J Chem Inf Comput Sci 36: 54–57.
- 27. Diudea MV, Ilić A, Varmuza K, Dehmer M (2011) Network analysis using a novel highly discriminating topological index. Complexity 16: 32–39.
- 28. Dehmer M, Grabner M, Varmuza K (2012) Information indices with high discriminative power for graphs. PLoS ONE 7: e31214.
- 29. Randić M (1986) On molecular identification numbers. J Chem Inf Comput Sci 24: 164–175.
- 30. Raychaudhury C, Ray SK, Ghosh JJ, Roy AB, Basak SC (1984) Discrimination of isomeric structures using information theoretic topological indices. Journal of Computational Chemistry 5: 581–588.
- 31. Dehmer M, Grabner M, Furtula B (2012) Structural discrimination of networks by using distance, degree and eigenvalue-based measures. PLoS ONE 7: e38564.
- 32. Liu X, Klein DJ (1991) The graph isomorphism problem. Journal of Computational Chemistry 12: 1243–1251.
- 33. McKay BD (1981) Graph isomorphisms. Congressus Numerantium 730: 45–87.
- 34. Borgwardt M (2007) Graph Kernels. Ph.D. thesis, Ludwig-Maximilians-Universität München, Fakultät für Mathematik, Informatik und Statistik.
- 35. Dehmer M, Grabner M, Mowshowitz A, Emmert-Streib F (2012) An efficient heuristic approach to detecting graph isomorphism based on combinations of highly discriminating invariants. Advances in Computational Mathematics.
- 36. Dehmer M, Barbarini N, Varmuza K, Graber A (2010) Novel topological descriptors for analyzing biological networks. BMC Structural Biology 10.
- 37. Pogliani L (2011) Applications of chemical graph theory to organic molecules. In: Putz MV, editor, Carbon Bonding and Structures, Springer Netherlands, volume 5 of Carbon Materials: Chemistry and Physics. 117–157.
- 38. Kaden F (1990) Graph similarity and distances. In: Bodendiek, Henn R, editors, Topics in Combinatorics and Graph Theory, Physica-Verlag. 397–404.
- 39. Klein DJ (1997) Graph geometry, graph metrics and wiener. MATCH Communications in Mathematical and in Computer Chemistry 35: 7–27.
- 40. Dorogovtsev SN, Mendes JFF (2003) Evolution of Networks. From Biological Networks to the Internet and WWW. Oxford University Press.
- 41. Hansen K, Mika S, Schroeter T, Sutter A, Laak AT, et al. (2009) A benchmark data set for in silico prediction of ames mutagenicity. J Chem Inf Model 49: 2077–2081.
- 42. Balaban AT, Ivanciuc O (1999) Historical development of topological indices. In: Devillers J, Balaban AT, editors, Topological Indices and Related Descriptors in QSAR and QSPR, Gordon and Breach Science Publishers. 21–57. Amsterdam, The Netherlands.
- 43. McKay BD (2010). Nauty. http://cs.anu.edu.au/~bdm/nauty/.
- 44. Müller LAJ, Kugler KG, Dander A, Graber A, Dehmer M (2011) QuACN - an R package for analyzing complex biological networks quantitatively. Bioinformatics 27: 140–141.
- 45. Müller L, Schutte M, Kugler KG, Dehmer M (2012) QuACN: Quantitative Analysis of Complex Networks. URL http://cran.r-project.org/web/packages/QuACN/index.html. R Package Version 1.6.
- 46. Bollobás B, Riordan O (2004) The diameter of a scale free random graph. Combinatorica 24: 5–34.
- 47. Dehmer M (2008) Information processing in complex networks: Graph entropy and information functionals. Appl Math Comput 201: 82–94.
- 48. Dehmer M, Varmuza K, Borgert S, Emmert-Streib F (2009) On entropy-based molecular descriptors: Statistical analysis of real and synthetic chemical structures. J Chem Inf Model 49: 1655–1663.
- 49. Ivanciuc O, Balaban A (1996) Design of Topological Indices. Part 3. New Identification Numbers for Chemical Structures: MINID and MINSID. Croatica Chemica Acta 69: 9–16.
- 50. Dehmer M, Barbarini N, Varmuza K, Graber A (2009) A large scale analysis of information-theoretic network complexity measures using chemical structures. PLoS ONE 4: e8057.
- 51. Efron B (1979) Bootstrap methods: Another look at the jackknife. Annals of Statistics 7: 1–26.
- 52. Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. Chapman & Hall, London, United Kingdom.
- 53. Csardi G, Nepusz T (2009). Package ‘igraph’; network analysis and visualization. http://igraph.sourceforge.net.