Graphettes: Constant-time determination of graphlet and orbit identity including (possibly disconnected) graphlets up to size 8

Adib Hasan; Po-Chien Chung; Wayne Hayes

doi:10.1371/journal.pone.0181570

Abstract

Graphlets are small connected induced subgraphs of a larger graph G. Graphlets are now commonly used to quantify local and global topology of networks in the field. Methods exist to exhaustively enumerate all graphlets (and their orbits) in large networks as efficiently as possible using orbit counting equations. However, the number of graphlets in G is exponential in both the number of nodes and edges in G. Enumerating them all is already unacceptably expensive on existing large networks, and the problem will only get worse as networks continue to grow in size and density. Here we introduce an efficient method designed to aid statistical sampling of graphlets up to size k = 8 from a large network. We define graphettes as the generalization of graphlets allowing for disconnected graphlets. Given a particular (undirected) graphette g, we introduce the idea of the canonical graphette as a representative member of the isomorphism group Iso(g) of g. We compute the mapping, in the form of a lookup table, from all 2^{k(k − 1)/2} undirected graphettes g of size k ≤ 8 to their canonical representatives, as well as the permutation that transforms g to its canonical representative. We also compute all automorphism orbits for each canonical graphette. Thus, given any k ≤ 8 nodes in a graph G, we can in constant time infer which graphette it is, as well as which orbit each of the k nodes belongs to. Sampling a large number N of such k-sets of nodes provides an approximation of both the distribution of graphlets and orbits across G, and the orbit degree vector at each node.

Citation: Hasan A, Chung P-C, Hayes W (2017) Graphettes: Constant-time determination of graphlet and orbit identity including (possibly disconnected) graphlets up to size 8. PLoS ONE 12(8): e0181570. https://doi.org/10.1371/journal.pone.0181570

Editor: Yongtang Shi, Nankai University, CHINA

Received: March 1, 2017; Accepted: June 23, 2017; Published: August 23, 2017

Copyright: © 2017 Hasan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All the code and the data files (the permutation maps, automorphism orbit lists) are available in www.github.com/Neehan/Faye.

Funding: The authors received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Network comparison is a growing area of research. In general the problem of complete comparison of large networks is intractable, being an NP-complete problem [1]. Thus, approximate heuristics are needed. Networks have been compared for statistical similarity at a high level using simple, easy-to-calculate measures such as the degree distribution, clustering coefficients, and network centrality, among many others [2, 3]. While more sophisticated methods such as spectral analysis [4, 5] and topological indices [6] have been useful, the study of small subnetworks such as motifs [7] and graphlets [8, 9] has become popular. They have been used extensively to globally classify highly disparate types of networks [10] as well as to aid in local measures used to align networks [11–14].

A graphlet is a small, connected, induced subgraph g of a larger graph G. Given a particular graphlet g, the automorphism orbits of g are the sets of nodes that are topologically identical to each other inside g. Graphlets and their automorphism orbits with up to k = 5 nodes were first introduced in 2004 [8], and are depicted in Fig 1. Recently, automated methods have been created that can enumerate, in a larger graph, all graphlets and their automorphism orbits up to graphlet size k = 5 [15] and subsequently to any k [16], although the latter authors only applied it up to k = 6. Unfortunately, we have found that these methods take a very long time (hours to days) even just to count graphlets up to size k = 5 on some large biological networks, such as those in BioGRID [17]. It is not clear that such methods, especially for even larger k, will be applicable to the coming age of ever bigger networks, since the total number of graphlets appearing in a large network tends to increase exponentially with both k (the graphlet size) and n (the number of nodes in the large network). Eventually, an exhaustive enumeration of all graphlets appearing in a large network may become infeasible simply due to the number of graphlets that need to be enumerated, even under the optimization of using orbit counting equations. On the other hand, graphlets are too useful to abandon as a method of quantifying the topological structure of graphs. An achievable alternative for a large network G is to statistically sample its graphlets rather than exhaustively enumerate them. Additionally, such sampling could be useful with the recent advent of comprehensive biological network databases [18]: each sampled graphlet would act as a seed for local matching between larger networks, similar to how k-mers (short sequences of length k) are used for seed-and-extend sequence matching in BLAST [19].


Fig 1. All (connected) graphlets of sizes k = 3, 4, 5 nodes, and their automorphism orbits; within each graphlet, nodes of equal shading are in the same orbit.

The numbering of these graphlets and orbits was created by hand [8] and does not correspond to the automatically generated numbering used in this paper. The figure is taken verbatim from [16].

https://doi.org/10.1371/journal.pone.0181570.g001

To efficiently create a statistical sample of graphlets in a large network G, one must be able to take an arbitrary set of k nodes from G, and efficiently (preferably in constant time) determine both which graphlet is represented and the automorphism orbit of each of the k nodes. Here, we solve this problem by enumerating all graphlets (and their disconnected counterparts, which we term graphettes) and their automorphism orbits up to size k = 8. We present a method that creates a lookup table that can quickly determine the graphette identity of any k nodes, as well as their automorphism orbits. Since the lookup table required significant time to pre-compute for k = 7 (a few hours on a single core) and k = 8 (hundreds of CPU weeks on a cluster), we provide the actual lookup tables for these values of k online at http://github.com/Neehan/Faye.

Materials and methods

Definitions and notations

Given a graph G on n nodes, a k-graphette is a (not necessarily connected) induced subgraph g on any set of k nodes of G. There are many ways one could choose the k nodes, for example (i) choosing k nodes uniformly at random from G, or (ii) performing a local search around some node u. We expect the former to be useful only in dense networks, while the latter is probably more useful in sparse networks because most random sets of k nodes in a sparse graph will be highly disconnected and thus not very informative. One could also (iii) perform edge-based selection (with local expansion) to ensure dense regions are sampled more frequently than sparse regions [20]; still other methods have been suggested [21].

Given a set of k nodes, we wish to quickly ascertain which graphette is represented, and which automorphism orbit each of the k nodes belongs to. To do that we need a canonical list of graphettes and their orbits, and a fast way to determine which canonical graphette is represented by any permutation of k nodes. Here we demonstrate how, if k is fixed and relatively small (k ≤ 8 in our case), this can be accomplished in constant time by pre-computing and storing a lookup table indexed by a bit vector representation of the lower triangle of the (undirected) adjacency matrix of the induced subgraph. Given such an index, the value associated with that index identifies the canonical graphette (a canonical ordering of the nodes for that graphette). We also pre-compute the automorphism orbits of all the canonical graphettes. Thus, by reversing the lookup table we can, in constant time, infer the orbit identity of each of the k nodes in that k-graphette. As a corollary, we can also update the (statistically sampled) graphette orbit degree vector of each of the k nodes, similar to the graphlet degree vector [9].

We use the following abbreviations and notations throughout:


https://doi.org/10.1371/journal.pone.0181570.t001

Canonization of graphettes

If graphs G and H are isomorphic, it essentially means they are exactly the same graph, but drawn differently. For example, Fig 2 shows three different drawings of the Petersen graph. Technically, an isomorphism between networks G and H is a permutation (bijection) π from the nodes of G to the nodes of H so that (u, v) is an edge of G ⟺ (π(u), π(v)) is an edge of H.
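The edge-preserving condition can be checked directly for a candidate node mapping. The following is a minimal sketch, assuming both graphs have the same number of nodes and are given as lists of undirected edges; the function name `is_isomorphism` and the example graphs are hypothetical and are not taken from the paper's code.

```python
def is_isomorphism(pi, edges_g, edges_h):
    """Return True if the node mapping `pi` (a dict) sends the edge set of G
    exactly onto the edge set of H. Assumes G and H have equally many nodes."""
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    injective = len(set(pi.values())) == len(pi)
    # Image of every edge of G under pi.
    mapped = {frozenset(pi[x] for x in e) for e in eg}
    return injective and mapped == eh

# G is the path a - b - c; H is the path 1 - 3 - 2 (a different drawing).
edges_g = [("a", "b"), ("b", "c")]
edges_h = [(1, 3), (3, 2)]
good = is_isomorphism({"a": 1, "b": 3, "c": 2}, edges_g, edges_h)
bad = is_isomorphism({"a": 1, "b": 2, "c": 3}, edges_g, edges_h)
```

Because the mapping is injective and the two edge sets have to coincide, equality of the mapped edge set with the edge set of H covers both directions of the ⟺ condition.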


Fig 2. Three isomorphic representations of the Petersen graph.

https://doi.org/10.1371/journal.pone.0181570.g002

Consider a 3-graphette with nodes w, x and y. There are only 4 possible such graphettes, depicted in Fig 3. However, by permuting the order of the nodes, each of these graphettes can be represented by several isomorphic variants. In order to determine if two graphettes are isomorphic, we represent each (undirected) graph with the lower triangle of its adjacency matrix. We place this lower-triangular matrix into a bit vector, resulting in a representation similar to existing ones for orbit identification [16].
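As an illustration of the bit vector representation, the sketch below packs the lower triangle of the adjacency matrix of k chosen nodes into an integer. The bit ordering used here (pair (i, j) with i > j stored at bit i(i − 1)/2 + j, least significant bit first) is an assumption made for this example; the ordering in the figures and in the published lookup tables may differ. The names `bit_index`, `encode` and `decode` are hypothetical.

```python
def bit_index(i, j):
    """Bit position of the node pair (i, j) in a row-major walk of the lower triangle."""
    if i < j:
        i, j = j, i
    return i * (i - 1) // 2 + j

def encode(nodes, adjacent):
    """Pack the subgraph induced on the ordered sequence `nodes` into an integer.
    `adjacent(a, b)` must report whether a and b share an edge in the large graph."""
    code = 0
    for i in range(len(nodes)):
        for j in range(i):
            if adjacent(nodes[i], nodes[j]):
                code |= 1 << bit_index(i, j)
    return code

def decode(code, k):
    """Recover the set of edges (i, j), i > j, stored in `code`."""
    return {(i, j) for i in range(k) for j in range(i)
            if (code >> bit_index(i, j)) & 1}

# Example: the path w - x - y, with the nodes taken in the order (w, x, y).
edges = {frozenset(("w", "x")), frozenset(("x", "y"))}
code = encode(["w", "x", "y"], lambda a, b: frozenset((a, b)) in edges)
```

With this ordering a k-graphette occupies k(k − 1)/2 bits, so the highest bit used for k = 8 is bit 27.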


Fig 3. All the possible 3-graphettes.

https://doi.org/10.1371/journal.pone.0181570.g003

We now describe the idea of a canonical representative of each isomorph. To provide an explicit example, consider Fig 4, depicting the three isomorphic configurations of the 3-graphette that has exactly one edge. In order to determine that these graphettes are all isomorphic, we take the bit vector representation depicted, and define the lowest-numbered bitvector among all the isomorphs as the canonical representative. All the other isomorphs in the lookup table point to it. In this way, every graph on 3 nodes can be efficiently mapped to its canonical 3-isomorph.


Fig 4. All 3-graphettes with exactly one edge; the canonical one is the one with lowest integer representation (the middle one in this case).

Each of them is placed in a lookup table indexed by the bit vector representation of its adjacency matrix, pointing at the canonical one. In this way we can determine that it is the one-edge 3-graphette in constant time.

https://doi.org/10.1371/journal.pone.0181570.g004
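The choice of the lowest-numbered bit vector as canonical representative can be sketched by brute force for small k. This example assumes a particular bit ordering (pair (i, j), i > j, at bit i(i − 1)/2 + j), so the integer values it produces need not match those shown in Fig 4; `permute` and `canonical` are hypothetical helper names.

```python
from itertools import permutations

def bit_index(i, j):
    return i * (i - 1) // 2 + j  # assumes i > j

def permute(code, perm):
    """Relabel node i as perm[i] and return the bit vector of the relabeled graph."""
    out = 0
    k = len(perm)
    for i in range(k):
        for j in range(i):
            if (code >> bit_index(i, j)) & 1:
                a, b = perm[i], perm[j]
                out |= 1 << bit_index(max(a, b), min(a, b))
    return out

def canonical(code, k):
    """Lowest-numbered bit vector among all isomorphs of `code`."""
    return min(permute(code, p) for p in permutations(range(k)))

# Under the assumed ordering, the three one-edge 3-graphettes are 1, 2 and 4.
one_edge_canonicals = {canonical(c, 3) for c in (1, 2, 4)}
all_canonicals = sorted({canonical(c, 3) for c in range(8)})
```

All three one-edge isomorphs map to the same representative, and the eight possible bit vectors on 3 nodes collapse to 4 canonical graphettes, in agreement with Fig 3.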

We also automatically determine the number of automorphism orbits (see below) for each canonical isomorph. Table 1 presents, for various values of k, the number of bits b(k) required to store the lower-triangular matrix of all graphettes on k nodes (i.e., the length of the bit vector used to store this matrix); the resulting total number of possible representations of k nodes (which is simply 2^b(k)); the number of canonical isomorphs NC(k); and the number of canonical automorphism orbits. Note that, to map each possible set of k nodes to its canonical isomorph, the lookup table has 2^b(k) entries, and each entry has a value between 0 and NC(k) − 1. For k up to 8, each entry can be stored in 32 bits, so the maximum space required is 32 bits × 2²⁸ entries = 1 GB. This is as far as we go, for now. Moore’s Law suggests that we may be able to go to k = 9 within a few years, and to k = 10 in perhaps a decade or two.


Table 1. For each value of k: the number of bits b(k) = k(k − 1)/2 required to store the lower triangle of the adjacency matrix for an undirected k-graphette; the number of such k-graphettes counting all isomorphs, which is just 2^b(k); the number of canonical k-graphettes (this will be the number of unique entries in the above lookup table [22], and up to k = 8, 14 bits is sufficient); and the total number of unique automorphism orbits (up to k = 8, 17 bits is sufficient) [27].

Note that up to k = 8, together the lookup table for canonical graphettes and their canonical orbits fits into 31 bits, allowing storage as a single 4-byte integer, with 1 bit to store whether the graphette is connected (i.e., also a graphlet). The suffixes K, M, G, T, P, and E represent exactly 2¹⁰, 2²⁰, 2³⁰, 2⁴⁰, 2⁵⁰ and 2⁶⁰, respectively.
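The storage arithmetic can be reproduced with a few lines of code. The sketch below computes b(k) and the table size, and shows one possible way to pack a canonical index (14 bits), an orbit index (17 bits) and a connectedness flag (1 bit) into a 32-bit word. The field layout and the names `pack` and `unpack` are assumptions for illustration only; the layout of the published data files is not specified here.

```python
def b(k):
    """Bits in the lower triangle of a k x k adjacency matrix."""
    return k * (k - 1) // 2

def table_entries(k):
    """Number of bit vectors, i.e. lookup table entries, for k nodes."""
    return 2 ** b(k)

CANON_BITS, ORBIT_BITS = 14, 17  # widths quoted in the Table 1 caption for k <= 8

def pack(canon_id, orbit_id, connected):
    """Hypothetical layout: bit 31 = connected flag, bits 14..30 = orbit, bits 0..13 = canonical."""
    assert 0 <= canon_id < 2 ** CANON_BITS and 0 <= orbit_id < 2 ** ORBIT_BITS
    return (int(connected) << 31) | (orbit_id << CANON_BITS) | canon_id

def unpack(word):
    canon_id = word & (2 ** CANON_BITS - 1)
    orbit_id = (word >> CANON_BITS) & (2 ** ORBIT_BITS - 1)
    return canon_id, orbit_id, bool(word >> 31)

for k in range(1, 9):
    print(k, b(k), table_entries(k))

bytes_k8 = 4 * table_entries(8)  # one 4-byte entry per bit vector
```

For k = 8 this gives 2²⁸ entries of 4 bytes each, which is the 1 GB figure (in the binary sense, 2³⁰ bytes) quoted in the text.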

https://doi.org/10.1371/journal.pone.0181570.t002

We note that the most expensive part of our algorithm is creating the lookup table between an arbitrary set of k nodes, to the canonical graphette represented by those k nodes; in the absence of a requirement for this lookup table, one could use orbit counting equations [16] to generate automorphism orbits up to k = 12.

Generating the lookup table from non-canonical to canonical graphettes

Assume the large graph G has n nodes labeled 0 through n − 1, and pick an arbitrary set of k nodes U = {u₀, u₁, …, u_{k − 1}}. Create the subgraph g induced on the nodes in U, and let its bit vector representation B be of the lower-triangular form described in Fig 4. We now describe how to create the lookup table that maps any such B to its canonical representative.

We iterate through all 2^b(k) bit vectors in order; for each value B, we check to see if it is isomorphic to any of the previously found canonical graphettes; if so, the lookup table value is set to the previously found canonical graphette; otherwise we have a new, previously unseen canonical graphette and the lookup table value is set to itself (B).

When checking for isomorphism between B and all previously found canonical graphettes, we use a relatively simple brute force approach. If the degree distributions of the two graphettes are different, we can immediately discard the pair as non-isomorphic; otherwise we resort to cycling through every permutation of the nodes, checking each pair for graph equality, which has a worst-case running time of k²k!. The total run time to compute the lookup table for a particular value k is thus bounded above by k²k! ⋅ NC(k) ⋅ 2^b(k), where k! is the maximum number of permutations we need to check to see if a non-canonical graphette matches an existing canonical one, k² is the worst-case running time to check if 2 specific permutations of k-graphettes are identical, there are at most NC(k) canonicals to check against [22], and 2^b(k) = 2^{k(k − 1)/2} is the total number of undirected graphs on k nodes. More sophisticated approaches exist [23], which may more easily allow higher values of k.
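The construction described above can be sketched for small k as follows. This is an illustrative brute-force version under an assumed bit ordering (pair (i, j), i > j, at bit i(i − 1)/2 + j), not the authors' implementation; the helper names are hypothetical. Bit vectors are visited in increasing order, so the first member seen of each isomorphism class is automatically the lowest-numbered one.

```python
from itertools import permutations

def bit_index(i, j):
    return i * (i - 1) // 2 + j  # assumes i > j

def permute(code, perm):
    """Bit vector obtained by relabeling node i as perm[i]."""
    out = 0
    for i in range(len(perm)):
        for j in range(i):
            if (code >> bit_index(i, j)) & 1:
                a, b = perm[i], perm[j]
                out |= 1 << bit_index(max(a, b), min(a, b))
    return out

def degree_sequence(code, k):
    deg = [0] * k
    for i in range(k):
        for j in range(i):
            if (code >> bit_index(i, j)) & 1:
                deg[i] += 1
                deg[j] += 1
    return tuple(sorted(deg))

def build_lookup(k):
    perms = list(permutations(range(k)))
    canonicals = []  # (code, degree sequence) of each canonical found so far
    lookup = [None] * 2 ** (k * (k - 1) // 2)
    for code in range(len(lookup)):
        seq = degree_sequence(code, k)
        for c, c_seq in canonicals:
            if c_seq != seq:
                continue  # cheap rejection before trying all k! permutations
            if any(permute(code, p) == c for p in perms):
                lookup[code] = c
                break
        else:
            canonicals.append((code, seq))  # previously unseen graphette
            lookup[code] = code
    return lookup, [c for c, _ in canonicals]

lookup4, canon4 = build_lookup(4)
```

Once built, identifying the graphette for any bit vector is a single array access, `lookup4[code]`. The sketch is only practical for small k; the cost grows as discussed in the text.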

This process can also be parallelized, which is what we did for k = 8. Essentially, we can split the 2^b(k) non-canonical graphettes into m sets of about 2^b(k)/m graphettes each, and then spread the computation across m machines. For each of the m sets S_i, we loop through all graphettes in that set and mark out which are isomorphic to each other. For each set S_i, we will find a set T_i of lowest-numbered “temporary” canonical graphettes in S_i, along with the map TC: S_i → T_i of which graphettes in S_i map to each temporary canonical in T_i. That is, for each graphette g ∈ S_i, ∃h ∈ T_i for which the temporary canonical TC(g) = h. Finally, once all the m sets have been evaluated in this way, a second stage passes through all the T_i, i = 0, …, m − 1, merging the temporary canonicals together into a final, global list of canonical graphettes, while also propagating these globally lowest-numbered canonicals back up through the m temporary canonical maps, so each graphette g globally maps to the globally lowest-numbered canonical; we call this process sifting for canonicals, and it may require several iterations to globally find the final list of canonicals. In this way we ran k = 8 in about a week across 600 cores, for a total of 600 CPU-weeks. This process could probably be made more efficient with smarter isomorphism checking [23, 24].
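A simplified, single-process sketch of the two-stage idea follows: each chunk independently finds its own temporary canonicals, and a merge stage then maps temporary canonicals to global ones. It assumes contiguous chunks of increasing bit vectors and performs one merge pass, whereas the text notes that the actual sifting may need several iterations; the function names and the bit ordering are assumptions for illustration.

```python
from itertools import permutations

def bit_index(i, j):
    return i * (i - 1) // 2 + j  # assumes i > j

def permute(code, perm):
    out = 0
    for i in range(len(perm)):
        for j in range(i):
            if (code >> bit_index(i, j)) & 1:
                a, b = perm[i], perm[j]
                out |= 1 << bit_index(max(a, b), min(a, b))
    return out

def isomorphic(a, b, perms):
    return any(permute(a, p) == b for p in perms)

def local_stage(chunk, perms):
    """Stage 1 (one call per machine): temporary canonicals T_i and the map TC on S_i."""
    temp, tc = [], {}
    for code in chunk:  # increasing order, so the first seen is the lowest in the chunk
        for t in temp:
            if isomorphic(code, t, perms):
                tc[code] = t
                break
        else:
            temp.append(code)
            tc[code] = code
    return temp, tc

def sift(k, m):
    perms = list(permutations(range(k)))
    total = 2 ** (k * (k - 1) // 2)
    size = -(-total // m)  # ceiling division
    chunks = [range(s, min(s + size, total)) for s in range(0, total, size)]
    stage1 = [local_stage(ch, perms) for ch in chunks]  # independent; could run in parallel
    # Stage 2: merge temporary canonicals, lowest-numbered chunks first.
    global_canon, to_global = [], {}
    for temp, _ in stage1:
        for t in temp:
            for g in global_canon:
                if isomorphic(t, g, perms):
                    to_global[t] = g
                    break
            else:
                global_canon.append(t)
                to_global[t] = t
    lookup = [None] * total
    for _, tc in stage1:
        for code, t in tc.items():
            lookup[code] = to_global[t]  # propagate the global canonical back
    return lookup, global_canon

lookup_split, canon_split = sift(4, 5)
lookup_single, canon_single = sift(4, 1)
```

Splitting into several chunks yields the same table as a single chunk, since only the temporary canonicals need to be compared across chunks in the merge stage.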

Graph automorphism and orbits

An isomorphism (from a graph g to itself) is called an automorphism.

While an isomorphism is just a permutation of the nodes, it is called an automorphism if it results in exactly the same labeling of the nodes in the same order—in other words exactly the same adjacency matrix. The set of all automorphisms of g will be called Aut(g).

An automorphism orbit, or just orbit, of g is a minimally sized collection of nodes of g that remains invariant under every automorphism of g [25]. There can be more than one automorphism orbit, and each orbit can have anywhere from 1 to k member nodes; refer again to Fig 1 for some examples. More formally, a set of nodes ω constitutes an orbit of g iff:

1. For any node u ∈ ω and any automorphism π of g, u ∈ ω ⟺ π(u) ∈ ω.
2. If nodes u, v ∈ ω, then there exists an automorphism π of g and a γ > 0 so that π^γ(u) = v.

Now, we shall prove a few relevant results that will be useful later for automatically enumerating the orbits.

Proposition 1. For each node u of g and each automorphism π ∈ Aut(g), there exists an integer λ > 0 such that π^λ(u) = u.

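Proof. Because π is an automorphism, π^i(u) is a node of g for every integer i ≥ 0. Since the node set of g is finite, there must exist i < j with π^i(u) = π^j(u), and since π is bijective, applying its inverse i times gives π^{j − i}(u) = u. The conclusion follows with λ = j − i.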

We shall call the set of nodes {u, π(u), π²(u), …, π^{λ − 1}(u)} the cycle of u under automorphism π, where λ is the smallest positive integer such that π^λ(u) = u.

Note that the λ of Proposition 1 is not unique, since π^λ(u) = π^{2λ}(u) = ⋯ = u; the cycle is defined using the smallest such λ. Also, π, u, and λ are tied together into triples such that knowing any two determines the third.

Corollary 1.1. π maps every node in the cycle of u under π to a node (possibly the same one) in that same cycle.

Corollary 1.2. In any automorphism π of g, every node appears in exactly one cycle.

In other words, the cycles π creates are disjoint. (However, the cycles from different automorphisms might not be so.) Hence, it makes sense to speak of splitting an automorphism into its cycles. For example, consider the permutation π = (201354) of (012345). Since π(0) = 2, π(2) = 1, π(1) = 0, the nodes (012) form a cycle. Now start with the next node, 3. Since π(3) = 3, (3) is another cycle. Finally, π(4) = 5, π(5) = 4, so (45) forms another cycle. Hence, the permutation (201354) is split into three cycles, namely (012), (3), (45).
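The worked example can be reproduced with a short routine. This is a minimal sketch in which the permutation is given in one-line form as a list, so that `perm[u]` plays the role of π(u); the function name `split_into_cycles` is hypothetical.

```python
def split_into_cycles(perm):
    """Split a permutation (perm[u] is the image of node u) into disjoint cycles."""
    visited = [False] * len(perm)
    cycles = []
    for u in range(len(perm)):
        if visited[u]:
            continue
        cycle = [u]
        visited[u] = True
        v = perm[u]
        while v != u:  # follow u -> pi(u) -> pi(pi(u)) ... until we return to u
            cycle.append(v)
            visited[v] = True
            v = perm[v]
        cycles.append(cycle)
    return cycles

cycles = split_into_cycles([2, 0, 1, 3, 5, 4])
print(cycles)  # [[0, 2, 1], [3], [4, 5]]
```

The nodes within each cycle are listed in the order in which π visits them, so the first cycle appears as 0, 2, 1; as a set it is the cycle (012) of the text.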

Proposition 2. The orbits are disjoint. (In other words, each node appears in exactly one orbit.)

Proof. Assume the contrary, i.e., a node u appears in two different orbits ω₁ and ω₂. According to the second condition, for any other node v ∈ ω₁, there exists an automorphism π of g and a γ so that π^γ(u) = v. However, from the first condition, u ∈ ω₂ implies π^γ(u) = v ∈ ω₂. Therefore, every node v ∈ ω₁ also belongs to ω₂. Hence, ω₁ ⊆ ω₂.

Following the same logic, ω₂ ⊆ ω₁, implying ω₁ = ω₂. ⇒⇐

Corollary 2.1. Each cycle appears in exactly one orbit, which completely contains that cycle.

Proof. If an orbit ω partially contains a cycle of some automorphism π, then ω is not invariant under π, as π will map some node in ω (and in the cycle) to another node outside ω (but still in the cycle) according to Corollary 1.1, contradicting our definition of orbits. Since two orbits are disjoint, the cycle must appear only in ω, and in none of the other orbits.

These statements are enough to be able to find all orbits of each graphette, as we now demonstrate.

Automatically enumerating all orbits of a graph

From the propositions in the previous section, an algorithm to enumerate the orbits can be constructed like this:

Generate all automorphisms of g.
Split each automorphism into its cycles.
Merge the cycles from different automorphisms to form orbits.

Generating all automorphisms of g.

Referring to Algorithm 1, the function generateAutomorphisms() applies every possible permutation of the nodes of g over Adj(g). Each permutation creates an isomorph of Adj(g). If Adj(g) is unchanged under some permutation π, then by definition, π is an automorphism of g. Hence it is saved into Aut(g).

Two optimization strategies are employed:

No node is mapped to another node with unequal degree.
An automorphism of graph g is also an automorphism of its complement graph g′.

In practice, this algorithm generates all automorphisms of all the canonical graphettes up to size 8 in a matter of seconds. Nevertheless, for additional speed up in higher sizes, modern sophisticated automorphism detection algorithms [23, 24] may be used.
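The degree-based pruning and the complement property can be illustrated as follows. This is a sketch under the assumption that a graph is given as a node count and a list of undirected edges; `automorphisms` and `complement` are hypothetical names, and the code is a plain brute-force search rather than the authors' implementation.

```python
from itertools import permutations

def automorphisms(k, edges):
    """All permutations of range(k) that map the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    deg = [sum(1 for e in edge_set if u in e) for u in range(k)]
    found = []
    for perm in permutations(range(k)):
        # Optimization 1: never map a node to a node of unequal degree.
        if any(deg[u] != deg[perm[u]] for u in range(k)):
            continue
        # A bijection that sends every edge to an edge preserves the whole edge set.
        if all(frozenset(perm[x] for x in e) in edge_set for e in edge_set):
            found.append(perm)
    return found

def complement(k, edges):
    edge_set = {frozenset(e) for e in edges}
    return [(i, j) for i in range(k) for j in range(i)
            if frozenset((i, j)) not in edge_set]

# Example: the 4-cycle 0 - 1 - 2 - 3 - 0 and its complement.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
aut_square = automorphisms(4, square)
aut_complement = automorphisms(4, complement(4, square))
```

Since a graph and its complement share the same automorphisms, the result for one can be reused for the other, which is the second optimization listed above.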

Splitting automorphisms into cycles.

An automorphism π of g is basically a permutation of the nodes of g. Hence, to split π into cycles, we can repeatedly apply π to every node u of g and remember the nodes u transforms into. This forms the cycle containing node u, i.e. {u, π(u), π²(u), …}, which is saved in the set of cycles. After its first visit, each node is marked visited to prevent further visits.

Merging cycles to enumerate orbits.

Suppose we have collected the set of all cycles resulting from all the automorphisms of g.

To enumerate orbits from it, first each node u is colored with a unique color ω(u) = u. Then ω(u) is continuously updated to reflect the current color of u, as the nodes belonging to same orbits are gradually colored by identical color.

For the nodes of each cycle c in this set, we save their minimum color in ω_min, and then color all of them with ω_min. After coloring all the cycles in this way, nodes belonging to the same orbit get the same color, and hence the orbits are enumerated.

Algorithm 1 Automatically enumerating automorphism orbits of a graph

function generateAutomorphisms (Graph g)
    Aut(g) = {} // Find the automorphisms of g
    for each permutation π of the nodes of g do
        apply π over Adj(g)
        if Adj(g) == π(Adj(g)) then
            put π in Aut(g)
        end if
    end for
end function

function generateCycles (automorphism π)
    for node u in π do
        if u is not visited then
            mark u visited
            new cycle c = {u}
            node v = π(u)
            while v != u do
                put v in c
                mark v visited
                v = π(v)
            end while
            put c in the set of cycles
        end if
    end for
end function

function enumerateOrbits ()
    for each node u of g do
        ω(u) = u
    end for
    for each cycle c in the set of cycles do
        let ω_min = ∞
        for node u ∈ c do
            ω_min = min(ω_min, ω(u))
        end for
        for node u ∈ c do
            ω(u) = ω_min
        end for
    end for
end function
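The three functions of Algorithm 1 can be sketched in runnable form as follows. This is an illustrative brute-force transcription, assuming a graph is given as a node count k and a list of undirected edges on nodes 0, …, k − 1; the function names and the final grouping helper `orbits` are assumptions of this example, not the published code.

```python
from itertools import permutations

def generate_automorphisms(k, edges):
    """Every permutation of the nodes that leaves the edge set unchanged."""
    edge_set = {frozenset(e) for e in edges}
    return [p for p in permutations(range(k))
            if {frozenset(p[x] for x in e) for e in edge_set} == edge_set]

def generate_cycles(perm):
    """Split one automorphism into its disjoint cycles."""
    visited, cycles = set(), []
    for u in range(len(perm)):
        if u in visited:
            continue
        cycle, v = [u], perm[u]
        visited.add(u)
        while v != u:
            cycle.append(v)
            visited.add(v)
            v = perm[v]
        cycles.append(cycle)
    return cycles

def enumerate_orbits(k, cycles):
    """Color each node by the minimum color found in any cycle containing it."""
    color = list(range(k))  # omega(u) = u initially
    for c in cycles:
        c_min = min(color[u] for u in c)
        for u in c:
            color[u] = c_min
    return color

def orbits(k, edges):
    cycles = [c for p in generate_automorphisms(k, edges) for c in generate_cycles(p)]
    groups = {}
    for u, col in enumerate(enumerate_orbits(k, cycles)):
        groups.setdefault(col, []).append(u)
    return list(groups.values())

path_orbits = orbits(3, [(0, 1), (1, 2)])          # path 0 - 1 - 2
star_orbits = orbits(4, [(0, 1), (0, 2), (0, 3)])  # star centered at node 0
```

For the path, the two end nodes share an orbit and the middle node is alone; for the star, the three leaves share an orbit and the center is alone.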

Proof of correctness of Algorithm 1

Here we prove that Algorithm 1 determines every orbit of g.

Suppose a set ω is among the final sets generated by Algorithm 1. We shall prove ω is an orbit of g by showing that it follows the two properties of orbits:

1. Let a node u ∈ ω form a cycle under automorphism π. The generateCycles function will apply π repeatedly until it finds a λ so that π^λ(u) = u and will therefore determine that cycle. Since the enumerateOrbits function assigned u to ω, it had also assigned all nodes in that cycle to ω. Hence u ∈ ω ⟺ π(u) ∈ ω.
2. Suppose nodes u, v ∈ ω. Then, either they belonged to a cycle from which they were assigned to a mutual set ω in the enumerateOrbits function, or there is a third node w so that w shares separate cycles with u and v under different automorphisms π₁ and π₂. In the first case, u and v already belong to a common cycle. In the second case, assume π₁^{γ₁}(u) = w and π₂^{γ₂}(w) = v. Consider the permutation ϕ = π₂^{γ₂} ∘ π₁^{γ₁}. Since the composition of two automorphisms is an automorphism [26], ϕ is also an automorphism. And notice that ϕ(u) = π₂^{γ₂}(π₁^{γ₁}(u)) = π₂^{γ₂}(w) = v, implying u and v belong to a common cycle under ϕ.

Therefore, ω is indeed an orbit of g. Since each node was given a unique orbit color in the beginning of enumerateOrbits, every orbit of g will be eventually found by Algorithm 1.

Results and discussion

Using the algorithms described herein, we have enumerated all possible graphlets, including the generalization of disconnected counterparts called graphettes, up to size k = 8. The code and data can be found at http://github.com/Neehan/Faye. (Note that the github code uses the upper triangle matrix, though we intend to convert it to use the lower triangle as that representation has already been established [16].) We have also enumerated all orbits up to size k = 8. More importantly for the statistical sampling technique described in the Introduction, we have used a bit-vector representation of all possible adjacency matrices of all possible sets of up to k = 8 nodes and created a lookup table from the 2^{k(k − 1)/2} k-sets to their canonical graphette representatives. This allows us to determine, in constant time, the graphette represented by these k nodes, as well as the automorphism orbit of each node. This allows efficient estimation of both the global distribution of graphlets and orbits, as well as an estimation of the graphlet (or orbit) degree vector for each node in a large graph G.
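A small sketch of how such a lookup table could support sampling is given below. It draws k-sets uniformly at random (one of several possible selection schemes) and tallies the canonical graphette of each sample. The bit ordering, the helper names and the toy graphs are assumptions for illustration, and the table is rebuilt by brute force here rather than loaded from the published files.

```python
import random
from collections import Counter
from itertools import permutations

def bit_index(i, j):
    return i * (i - 1) // 2 + j  # assumes i > j

def permute(code, perm):
    out = 0
    for i in range(len(perm)):
        for j in range(i):
            if (code >> bit_index(i, j)) & 1:
                a, b = perm[i], perm[j]
                out |= 1 << bit_index(max(a, b), min(a, b))
    return out

def build_lookup(k):
    """Map every bit vector on k nodes to its lowest-numbered isomorph."""
    perms = list(permutations(range(k)))
    return [min(permute(code, p) for p in perms)
            for code in range(2 ** (k * (k - 1) // 2))]

def sample_graphettes(adj, k, n_samples, lookup, rng):
    """Tally canonical graphettes over uniformly sampled k-sets of nodes.
    `adj` maps each node to the set of its neighbors."""
    nodes = list(adj)
    counts = Counter()
    for _ in range(n_samples):
        s = rng.sample(nodes, k)
        code = 0
        for i in range(k):
            for j in range(i):
                if s[j] in adj[s[i]]:
                    code |= 1 << bit_index(i, j)
        counts[lookup[code]] += 1  # constant-time identification
    return counts

lookup3 = build_lookup(3)
complete6 = {u: {v for v in range(6) if v != u} for u in range(6)}
ring6 = {u: {(u - 1) % 6, (u + 1) % 6} for u in range(6)}
counts_complete = sample_graphettes(complete6, 3, 100, lookup3, random.Random(0))
counts_ring = sample_graphettes(ring6, 3, 100, lookup3, random.Random(0))
```

In the complete graph every sample is a triangle, while in the 6-node ring no sample can be a triangle; dividing the tallies by the number of samples gives an estimate of the graphette distribution.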

Although the lookup tables for k > 8 are at present too big to compute or store, we could also use NAUTY or SAUCY to enumerate all the canonical graphettes up to size k = 12, and use our orbit generation code (Algorithm 1) to determine all the orbits in all graphettes up to size k = 12. We have verified that previous results are consistent with ours in terms of the number of distinct graphettes [22] and orbits [27] determined, as displayed in Table 1.

In future work we will study which statistical sampling techniques most efficiently produce a good estimate of the complete graphlet and local (per-node) degree vectors. We also intend to study how this method may aid in cataloging of graphlets for database network queries, or in non-alignment network comparison [10]. Finally, there may be ways to combine our method with those of orbit counting equations [15, 16] to more efficiently produce samples of orbit counts.

Acknowledgments

We thank Sridevi Maharaj, Dillon Kanne, and the anonymous referees for several helpful suggestions on presentation.

References

1. Cook SA. The Complexity of Theorem-proving Procedures. In: Proceedings of the Third Annual ACM Symposium on Theory of Computing. STOC’71. New York, NY, USA: ACM; 1971. p. 151–158. Available from: http://doi.acm.org/10.1145/800157.805047.
2. Newman M. Networks: an introduction. 2010. United States: Oxford University Press Inc, New York. 2010; p. 1–2.
3. Emmert-Streib F, Dehmer M, Shi Y. Fifty years of graph matching, network alignment and network comparison. Information Sciences. 2016;346:180–197.
4. Wilson RC, Zhu P. A study of graph spectra for comparing graphs and trees. Pattern Recognition. 2008;41(9):2833–2841.
5. Thorne T, Stumpf MP. Graph spectral analysis of protein interaction network evolution. Journal of The Royal Society Interface. 2012; p. rsif20120220.
6. Dehmer M, Emmert-Streib F, Shi Y. Interrelations of graph distance measures based on topological indices. PloS one. 2014;9(4):e94985. pmid:24759679
7. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. Network motifs: simple building blocks of complex networks. Science. 2002;298(5594):824–827. pmid:12399590
8. Pržulj N, Corneil DG, Jurisica I. Modeling interactome: scale-free or geometric? Bioinformatics. 2004;20(18):3508–3515. pmid:15284103
9. Pržulj N. Biological network comparison using graphlet degree distribution. Bioinformatics. 2007;23(2):e177–e183. pmid:17237089
10. Yaveroğlu ÖN, Malod-Dognin N, Davis D, Levnajic Z, Janjic V, Karapandza R, et al. Revealing the hidden language of complex networks. Scientific reports. 2014;4:4547. pmid:24686408
11. Kuchaiev O, Milenković T, Memišević V, Hayes W, Pržulj N. Topological network alignment uncovers biological function and phylogeny. Journal of The Royal Society Interface. 2010;7(50):1341–1354.
12. Malod-Dognin N, Pržulj N. L-GRAAL: Lagrangian Graphlet-based Network Aligner. Bioinformatics. 2015;
13. Saraph V, Milenković T. MAGNA: maximizing accuracy in global network alignment. Bioinformatics. 2014;30(20):2931–2940. pmid:25015987
14. Mamano N, Hayes W. SANA: Simulated Annealing far outperforms many other search algorithms for biological network alignment. Bioinformatics. 2017;0(0):8.
15. Hočevar T, Demšar J. A combinatorial approach to graphlet counting. Bioinformatics. 2014;30(4):559–565. pmid:24336411
16. Melckenbeeck I, Audenaert P, Michoel T, Colle D, Pickavet M. An Algorithm to Automatically Generate the Combinatorial Orbit Counting Equations. PLoS ONE. 2016;11(1). pmid:26797021
17. Chatr-aryamontri A, Breitkreutz BJ, Heinicke S, Boucher L, Winter A, Stark C, et al. The BioGRID interaction database: 2013 update. Nucleic Acids Research. 2013;41(D1):D816–D823. pmid:23203989
18. Pillich RT, Chen J, Rynkov V, Welker D, Pratt D. NDEx: A Community Resource for Sharing and Publishing of Biological Networks. Protein Bioinformatics: From Protein Modifications and Networks to Proteomics. 2017; p. 271–301.
19. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos JS, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. pmid:20003500
20. Rahman M, Bhuiyan MA, Al Hasan M. Graft: An efficient graphlet counting method for large graph analysis. IEEE Transactions on Knowledge and Data Engineering. 2014;26(10):2466–2478.
21. Pržulj N, Corneil DG, Jurisica I. Efficient estimation of graphlet frequency distributions in protein—protein interaction networks. Bioinformatics. 2006;22(8):974–980. pmid:16452112
22. Sloane N. Online Encyclopedia of Integer Sequences (OEIS). Available from: http://oeis.org/A000088.
23. Mckay BD. Nauty; 2010. Available from: http://users.cecs.anu.edu.au/~bdm/nauty.
24. Codenotti P, Katebi H, Sakallah KA, Markov IL. Conflict Analysis and Branching Heuristics in the Search for Graph Automorphisms. In: Tools with Artificial Intelligence (ICTAI). IEEE; 2013.
25. Gross JL. Graph Theory—Lecture 2: Structure and Representation—Part A. Available from: http://www.cs.columbia.edu/~cs4203/files/GT-Lec2.pdf.
26. Automorphism of a group. Available from: https://groupprops.subwiki.org/wiki/Automorphism_of_a_group.
27. Sloane N. Online Encyclopedia of Integer Sequences (OEIS). Available from: http://oeis.org/A000666.

