Graphlet-orbit Transitions (GoT): A fingerprint for temporal network comparison

Given a set of temporal networks, from different domains and with different sizes, how can we compare them? Can we identify evolutionary patterns that are both (i) characteristic and (ii) meaningful? We address these challenges by introducing a novel temporal and topological network fingerprint named Graphlet-orbit Transitions (GoT). We demonstrate that GoT provides very rich and interpretable network characterizations. Our work puts forward an extension of graphlets and uses the notion of orbits to encapsulate the roles of nodes in each subgraph. We build a transition matrix that keeps track of the temporal trajectory of nodes in terms of their orbits, therefore describing their evolution. We also introduce a metric (OTA) to compare two networks when considering these matrices. Our experiments show that networks representing similar systems have characteristic orbit transitions. GoT correctly groups synthetic networks pertaining to well-known graph models more accurately than competing static and dynamic state-of-the-art approaches by over 30%. Furthermore, our tests on real-world networks show that GoT produces highly interpretable results, which we use to provide insight into characteristic orbit transitions.


Introduction
Networks are widely used to model real-world systems as a way to uncover their topological features [1]. Most of these systems are not static; they exhibit a dynamic nature that can only be captured and truly understood by taking into account the network's temporal evolution [2]. Consider for instance a co-authorship network, where nodes are authors and edges represent joint publications. By narrowing our focus to static network snapshots we cannot answer relevant questions such as: how stable are connections over time? How is collaboration emerging and dissolving? How did we get to the current state of the network? Can we predict how the network will look like in the future?
One very powerful technique to uncover the underlying structure of a network is to decompose it into smaller components, namely subgraphs. Local network metrics such as network motifs [3] and graphlets [4] incorporate subgraph information to create rich topological metrics that have been successfully applied in many domains. For instance, motif analysis has identified the feed-forward loop as a recurring and crucial functional subgraph pattern in many real biological networks, such as gene regulation and metabolic networks [5,6], and it was able to identify and separate different families of networks [7]. Another example is the graphlet-degree-agreement, which incorporates the notion of orbits (the position of nodes inside each subgraph) and has been used for both network comparison and model fitting, showing that protein-protein interaction networks are more akin to geometric graphs than to traditional scale-free models [4]. These subgraph-based metrics were initially proposed for static networks, thus disregarding temporal information. A temporal extension for graphlets was put forward by Faisal et al. [8]; however, their approach summarizes each temporal snapshot, without offering a real inter-snapshot relation. Another work by Hulovatyy et al. [9] provides a clear inter-snapshot evolution, but they only allow for a single event (i.e., temporal edge) at each snapshot (i.e. just one edge addition between two nodes), therefore limiting the scope of possible graphlet transitions. Our method differs from these because our transition matrix establishes direct relations between snapshots, and we allow for any number of edge additions or removals in each snapshot, aiming for a broader and fully general set of possible transitions between two consecutive snapshots.
In this work we propose graphlet-orbit transitions (GOT) as a framework for characterizing and comparing evolving networks. Our method incorporates the rich topological information provided by subgraphs and extends it to the temporal domain. Orbit-transition matrix encapsulate not only how subgraphs are changing but also how the roles of the nodes themselves are evolving, leading to a more detailed fingerprint of the network. We also introduce the orbittransition-agreement metric (OTA) as a suitable way of comparing transition matrices of heterogeneous networks.
Next we underline our main contributions: • Effectiveness: GOT achieves over 30% higher precision (AUPR) on a set of well-known network models than other subgraph-based methods. On real data it produces groupings that match pre-determined categories better than competing approaches.
• Interpretability: Results produced by GOT are very easy to visualize (i.e. analyze specific transition frequencies between orbits). Therefore, GOT can be used as an interpretable temporal network fingerprint.
• Generability: Our method is used to compare heterogeneous networks from different domains and of different sizes. Furthermore, GOT is general and easily extensible to directed and multilayered networks, but these extensions are demanding in terms of storage and execution.
The remainder of this paper is organized as follows. First, we present related work on network comparison with a focus on subgraph-based metrics, both for static and temporal networks. Next, we introduce necessary graph terminology. Our proposed methodology for temporal network comparison is then presented. Finally, various metrics are used to compare and group (i) a set of synthetic data generated using different random-graph models and (ii) a set of real-world data pertaining to different classes, respectively.

Related work
For a general overview of temporal networks we refer the reader to the survey by Holme and Saramäki [2], and for an overview of existing temporal network metrics we refer to the survey by Nicosia et al. [10].
In this work, our focus is on network comparison, which is a crucial and very useful task. For example, if the properties of a given network are well known, it allows for knowledge transfer to similar networks [11]. Global metrics such as the degree distribution, characteristic path length and clustering coefficient give an idea of the structure of the networks and can be used to compare them. For instance, social networks tend to have a higher clustering coefficient and a smaller characteristic path length than spatial networks. However, these simple metrics are often not expressive enough, and subgraph-based metrics offer a much richer topological characterization.
While our focus here is temporal network comparison using subgraph-based metrics, we should note that pattern discovery on temporal networks is a much broader field. Shah et al. [12] propose an algorithm that concisely summarizes temporal networks by their characteristic temporal subnetworks. Similarly to their work, we also aim for interpretability, but we do graph comparison instead of graph summarization and our method does not require a null model to assess how a certain interesting pattern deviates from randomness. Yu et al. [13] put forward a matrix factorization method that characterizes the correlations of network's edges as a function of time. Their representation builds a dynamic profile of the network that can be used to predict future states. Here we do not specifically target link prediction; our graphletorbit transitions could possibly be used for the task but that is out of the scope of this work. Another task related with both network comparison and network visualization is network condensation [14]; its aim is to reduce the size of the temporal network significantly without much loss of information. Here we aim for interpretability but we do not address the problem of network condensation directly.
On the remainder of this section we give an overview of subgraph-based network metrics, discuss their usefulness and drawbacks, and pinpoint the advantages of our proposed extension when compared to them.

Static subgraphs
Network motif fingerprints [3] and graphlet-based metrics [4] have been widely used for network comparison. Motifs are overrepresented subgraphs that appear in larger numbers than expected, while graphlet degree distributions can be regarded as an extension of the node degree concept. Both approaches need to compute subgraph frequencies, which is a computationally very expensive task. Even just knowing if one subgraph appears or not on another network is already an NP-Complete problem [15]. Because of this, typically one uses only small subgraphs, but their frequencies already provide very rich characterizations. For instance, Milo et al. [7] compared network motifs with three and four nodes of four superfamilies: sensory networks, hyperlink networks, social networks and linguistic networks. By comparing motif significances they were able to correctly cluster all four superfamilies. Similar studies have been carried out to classify metabolic networks [6], co-authorship networks [16] or articles [17]. Another possibility is to, instead of directly comparing two real networks, compare a network with graph models. Przulj [4] showed that protein-protein interaction networks were more accurately described as random geometric graphs rather than as purely random or scalefree networks. Therefore, motifs and graphlets have been successfully used to compare static networks. However, metrics such as these disregard temporal information which can be crucial for a better understanding of network topology and function.

Temporal subgraphs
There are several approaches that incorporate the temporal evolution of subgraphs to study and characterize networks. Given the computationally demanding nature of the involved computations, very small or very specific classes of subgraphs are typically used. One example of this are triangles (cliques with three nodes, that is, fully connected sets of three nodes), which are meaningful for many applications since they are the simplest communities. Buriol et al. [18] and Pavan et al. [19] put forward a method to extract approximate and exact counts of all triangles in graph streaming environments. Finocchi et al. [20] proposed an algorithm to count cliques for sizes slightly larger than 3. Instead of triangles, Aliakbarpour et al. [21] focused on star shaped graphs. Our approach differs from these because we support any type of isomorphic subgraphs.
Kovanen et al. [22] presented an extension of network motifs for temporal networks and studied them on a phone call network. Their proposed temporal motifs have at most three events and a varying number of nodes. A similar approach for graphlets was put forward by Hulovatyy et al. [9], without a set limit on either the size of the graphlets nor the total number of events, but which only allows for one event at a time (i.e. their graphlets do not capture instances where two edges, or more, are added in the same snapshot). As a consequence, their method does not directly capture situations where several events occur at the same time, like when a loosely connected subgraph immediately becomes a clique or near-clique. By contrast, our approach supports any number of edge removals or additions, allowing for the analysis of networks that intrinsically have multiple events occurring at the same time.
Martin et al. [23] proposed a metric to evaluate network similarity based on how their triplets are evolving over time. Their metric is based on the loss or gain of edges from one state to the next. Our method differs from theirs since they do not differentiate by pair-wise graphlet transitions but only by increase or decrease of total edges between states (i.e. different pairwise transitions are not differentiated as long as they affect the same number of edges). The approach by Doroud et al. [24] is more similar to our own since they enumerate all transitions between 3-node directed subgraphs in network snapshots. That information is used in order to estimate the probability of a given transition in a social network and predict network changes. Kim et al. [25] also count all 3-node directed subgraphs to assess which motifs are present in different states of developing gene networks in different regions. These approaches are however limited to 3-node subgraphs and do not consider the roles of the individual nodes, that is, the orbits.
Faisal and Milenkovic [8] integrate graphlet frequency distributions on the analysis of temporal biological networks, but they only look at the global distribution in each snapshot, without offering the possibility to observe how each individual connected set of nodes is evolving. By contrast, we provide a direct transition matrix.
Another approach is followed by Jin et al. [26]; they introduce the notion of trend motifs geared towards weighted networks, trying to capture increasing or decreasing trends in the weights of specific sets of nodes inducing certain subgraphs. Therefore, their approach is only applicable to weighted networks, not identifying edge appearance and disappearance as in our case.

Graph terminology
A graph (or network) G is comprised of a set V(G) of vertices (or nodes) and a set E(G) of edges. A k-graph is a graph with k nodes. Nodes represent entities and edges correspond to relationships between them. Edges are represented as pairs of vertices of the form (u, v), where u, v 2 V(N). In directed graphs, edges (u, v) are ordered pairs (translated to "u goes to v") whereas in undirected graphs there is no order since the nodes are always reciprocally connected.
. Two subgraph occurrences are said to be isomorphic if it is possible to obtain one from the other just by changing the node labels without affecting their topology. The general task of evaluating if two graphs are isomorphic is called the graph isomorphism problem [27]. If two subgraphs occurrences S k and S 0 k are isomorphic, they are occurrences of the same subgraph G k (i.e., canonical class [28]). The frequency Fr(G k , G) is the number of occurrences of G k in G.

Subgraph census
Graphlets and network motifs have at their core the task of computing subgraph frequencies. This is known as the general subgraph census problem [29]: Definition 1 (Subgraph census) Given a set G k of non-isomorphic k-subgraphs and a graph G, determine the frequency of all induced occurrences of the subgraphs G k 2 G k in G. Two occurrences are considered different if they have at least one node or edge that they do not share. Other nodes and edges can overlap.

Static network motifs
Network motif analysis has two steps: first, subgraph census is performed on graph G, and second, motif significance is assessed on a null model [3]. Numerous null models can be used, such as the one proposed in [3] which generates a set RðGÞ of randomized networks that keep G's degree sequence. A subgraph census is then performed on each R 2 RðGÞ. The average frequency of a subgraph G k on the randomized networks is represented by < FrðG k ; RðGÞÞ >, and G k is considered a network motif if it appears with a significantly higher frequency in G than in RðGÞ. Motif scores, represented by δ k,G , are computed for each subgraph G k (Eq 1). As was proposed in [7], motif scores are normalized, represented by Δ k,G (Eq 2).
The motif-fingerprint of G is a vector containing all Δ k,G .

Static graphlets
Graphlets [4] are small induced non-isomorphic subgraphs that differentiate nodes according to their subgraph positions-or their orbits. In a graphlet G, the set of isomorphisms of G into itself comprises its set of automorphisms. Two vertices u and v are said to be equivalent (meaning "in the same orbit") when there exists some automorphism that maps u into v. Fig 1 presents all undirected graphlet-orbits with 4 nodes. For instance, the single node at the center of a star is topologically different from a leaf-node, whereas leaf-nodes are structurally equivalent. Therefore, a 4-star has only two orbits: a center-orbit O 1 which a single node inhabits, and a leaf-orbit O 2 where the remaining 3 nodes are at. Graphlets can be either undirected [4] or directed subgraphs [30]. Notation uG k is adopted for the set of all undirected graphlets with k nodes, and dG k for directed ones. The set of all orbits of uG k is expressed as uO k , and dO k is used for directed graphlets. Prefixes d and u are suppressed whenever concepts are applicable to both directed and undirected graphlets.
The graphlet-degree distribution is an extension of the node degree-distribution, and both can be used for network comparison. The degree distribution of a given graph G is obtained by counting 8u 2 V(G) how many direct connections u has. This task produces a vector of size n = |V(G)| containing the degrees of each u 2 V(G) which is transformed into a node-degree-distribution vector (or NDD G , for short) of size m, where m is the maximum degree, and NDD G,p is the number of nodes that have degree p. Notice from Fig 2 that the node-degree is essentially equivalent to orbit a (a two node subgraph): the node-degrees of each node n correspond to the first column of Fr G and that NDD G is simply the first line of GDD G .
Graphlet-degree-distributions (GDD G ) generalize the concept of NDD G for subgraphs bigger than the degree (i.e., subgraphs with more than two nodes). To compute the graphlet degree distribution it is necessary to count 8u 2 V(G) how many times u appears in some orbit j 2 O and repeat this process for the total jOj orbits, resulting in a graphlet degree vector GDV(u). A matrix F r (G) of n × m positions is obtained by joining the GDVs of all n nodes where each row of Fr(G) is GDV(u), u 2 V(G) and each position fr u, j is the number of times that node u appears in orbit j. Matrix GDD G is obtained directly from Fr G , where GDD j;p G is the number of nodes that appear p times in orbit j. This task is more formally defined in Definition 2 and an example of this process is given in Fig 2. For instance, node v has degree 2 (orbit a), appears once in a triangle (orbit b), and appears in 2 chains always in its periphery (orbit d).
Definition 2 (Graphlet-orbit frequency computation) Given a set G s of non-isomorphic subgraphs of size s and a graph G, determine the number of times fr i,j that each node i 2 V(G) appears in all the orbits j 2 O s . All occurrences are induced. Two occurrences are considered different if they have at least one node or edge that they do not share. Other nodes and edges can overlap.
As suggested by Pržulj [4], a GDD matrix is normalized with respect to its total area (i.e., the sum of all GDD j;p G ), before being used for comparison. The normalized values are represented below as n j;p G . Two networks G and H are compared by computing the differences between their respective normalized GDD matrices. One possibility to compare the two  matrices is to use the arithmetic mean GDD-agreement (GDA) introduced in [4]: m6) the GDA is obtained individually for each orbit j (Eq 3) and then the mean GDA is computed (Eq 4), m7) ranging from 0 to 1. High GDA(G, H) means that G and H are topologically similar. Note in Eq 3 that, in practice, p is never infinite because, in real graphs, the number of nodes that appear in a given orbit is always finite.
ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi X þ1 GDAðG; HÞ ¼ 1 m Parameters ρ and s depend on the network; for instance, in scientific co-authorship networks one or two years are the more suitable value for ρ, while in conference interaction networks ρ is a few hours or a couple of days. The number of snapshots jS N j depends on the amount of available data. Networks can gain (or lose) new edges (or new nodes) from S i to S i+1 .

Computing temporal network similarity using Graphlet-orbit Transitions
In this section we describe our method and specify how it is used to measure temporal network similarity. Temporal network similarity assumes that there is a set of temporal patterns used as features; in our case those features are graphlet-orbit transitions (GOT). A metric is also necessary to compare the networks' feature space; for this purpose we developed orbit-transition agreement (OTA).

Graphlet-orbit Transitions (GOT)
In essence, our method performs graphlet-orbit frequency computation (as stated in Definition 2) for each snapshot of a given temporal network. Our aim is to analyze how the roles of nodes are evolving between snapshots. Only connected graphlets are taken into account because our focus is to study how groups evolve and, when a group becomes disconnected, that set of nodes is no longer a group. We should point out that disconnected graphlets would be very useful to analyze group formation, but computing their frequencies would require considering all possible n k À � subsets of nodes, effectively making it only feasible for small networks and very small k-graphlets.
Possible algorithms for graphlet-orbit frequency enumeration include [31,32]; here we use g-tries due to their general applicability and efficiency [33]. G-Tries can be used to store subgraphs of any given size, as long as they fit into main memory, with directed or undirected edges. Their efficiency comes from compressing the search space by taking advantage of common subtopolgies in the input subgraphs. For more details on how g-tries are created and how they are used for graphlet-orbit frequency enumeration we refer the reader to [30,33].
G-Tries allow the user to customize which subgraphs are enumerated (i.e. only stars, only cliques, etc.). To showcase the general scope of our method we search for all subgraphs of size k. Consider the two possible 3-node undirected graphlets, uG 3 , and their respective orbits from Fig 3. The chain-graph has two possible positions-the node can be either at its center or in one of its leaves-while all nodes in a triangle-graph are topologically equivalent. Given those three possible orbits, GoT counts how many times a node x changes from one to the other. There are 3 × 3 = 9 possible orbit transitions for uO 3 . A node can remain in its previous orbit, be it a (a) chain-center, (d) chain-periphery or (i) triangle-node; it can go from the chaincenter to the chain-periphery (b) or to a triangle-node (c), etc. All possibilities are shown in Algorithm 1 gives an overview of the enumeration process that builds the transition matrices. In order to obtain the frequencies of these transitions, for each snapshot t our method  enumerates all k-node occurrences {Node 1 , Node 2 , . . ., Node k } (lines 3-5) as well as the orbits of each node on that subgraph {Orbit 1 , Orbit 2 , . . ., Orbit k } (line 6). The occurrences discovered are pushed into a vector of the form {Node 1 , Node 2 , . . ., Node k , t, Orbit 1 , Orbit 2 , . . ., Orbit k } (line 7). When all occurrences have been enumerated one can simply sort the vector (line 9) and check if two consecutive vector positions contain the same subgraph (but were found in consecutive snapshots) (lines 10-11). As an example, occurrences {5, 8, 10, 12, t = 1, x, x, y, y} and {5, 8, 10, 12, t = 2, x, y, x, y} increment graphlet-orbit transitions T x;x ðNÞ, T x;y ðNÞ, T y;x ðNÞ and T y;y ðNÞ all by 1 (regardless of what orbits x and y represent). These transitions are used to build the network's transition matrix (lines [12][13][14]; they offer rich topological information that can be used for network summarization, Data Mining (e.g. they can be used as features for prediction tasks), network comparison and model fitting. In this work we compare different networks according to their transition matrices. Next we describe our metric for network comparison based on orbit-transition matrices.
Algorithm 1 Enumerate graphlet-orbit transitions of orbits O k on temporal network N.

Orbit temporal agreement (OTA)
After enumerating all graphlet-orbit transitions, and having constructed T x;y ðNÞ matrices for each network N of set N , our method computes their topological similarity. For experimental purposes, all orbits of size k are enumerated for each network, therefore each T x;y ðNÞ matrix consist of jOj � jOj transitions. Our approach is based on the arithmetic mean of orbit-transition differences. Matrices T x;y ðNÞ are normalized before computing orbit-transitions differences in order to reduce bias induced by different network sizes. Normalization is performed by row, as shown in Eq 5. This choice gives the same importance to common and rare orbits. Instead, one could normalize the matrix both by row and column if the scale of the original values is important. We feel that choosing the latter option would disregard differences in rare orbits, which arguably can differentiate networks better than common ones.
The similarity of two networks N 1 and N 2 is given by the average similarity of their graphlet-transition frequency for each graphlet-transition ntr i,j . Eq 6 presents this metric, which we name orbit-transition agreement (OTA).
Eq 6 produces an absolute value of agreement, i.e., OTA (N 1 , N 2 ) is always the same regardless of N . However, for our purposes a relative value of agreement is more suitable since we want to compare networks within a specific set. Consider maxðOTA N Þ and minðOTA N Þ as the highest and lowest OTA between two networks in set N , we normalize the OTA matrix to values between 0 and 1 (Eq 7). Using the normalized nOTA, the two most similar networks on the set N have nOTA = 1, and the two most different have nOTA = 0, while the other pairs have a normalized nOTA between 0 and 1.

Classifying synthetic data
We assess our method's grouping efficiency on a set of well-known graph models, and compare it against other subgraph-based methods. Our assumption is that an efficient method should report networks from the same model as more topologically alike than networks from different models due to their inherent structure (a similar approach was followed in [9]). All of the following experiments were conducted on an Intel i7-6700 CPU with 4 cores at 3.40 GHz; nevertheless, all programs were executed using a single-thread. Our code was written in C++11 and compiled with gcc 6.3.1 with O3 optimizations, while dynamic graphlets [9] were computed using the executable available at http://www3.nd.edu/~cone/DG/. Network motifs, graphlets and static-temporal graphlet vectors were obtained using our own code, available at http://www.dcc.fc.up.pt/~daparicio/software. The source code for graphlet-orbit transitions computation, as well as the data used for experimental purposes, can be found at http://www. dcc.fc.up.pt/got-wave/ and dx.doi.org/10.17504/protocols.io.tcqeivw.

Synthetic networks
In order to assess our method's clustering capabilities we tested it on dynamic versions created by us of three of the most well-known and studied random-graph models: Erdos-Rényi [34], Baràbasi-Albert [35] and Watts-Strogratz [36]. All synthetic networks have 5 snapshots and start with 250 nodes; these values were chosen in order to obtain results from every method in a reasonable time. New nodes and edges arrive in the networks [37], while the network's density remains stable throughout all snapshots (this behavior was observed in online social networks [38], for instance). New edges are created according to the model's criteria: either randomly [34], by preferential attachment [35] or through rewiring of past edges [36]. Noise is also injected in some of the networks by having edges randomly deleted: if P(e − ) = 0.5, half of the edges from S N,i are removed in S N,i+1 , whereas if P(e − ) = 0, all edges are permanent. Strogratz models control how much rewiring is performed; we use either no rewiring (β = 0) to build regular ring-networks or some rewiring (β = 0.2) to create small-world networks. From these variables we obtain a total of 6 different graph models (see Table 1), and build 25 instances/networks of each.

Methods
In our experiments we assess GOT's grouping capabilities and compare it to other subgraphbased methods. All methods rely on subgraph census, as defined in Definition 1, therefore an appropriate set of k-subgraphs needs to be chosen. Hulovatyy et al. [9] reported that dynamic graphlets with 4-nodes and 6-events achieved the best results for node classification tasks, and that increasing their size did not significantly improve results and greatly increased computational time. Therefore, 4-node and 6-event dynamic graphlets are enumerated and, for results to be directly comparable, 4-node subgraphs are enumerated for every other method. For generability, all possible 4-node subgraphs are enumerated instead of a specific set.
• Static motifs (SM) -4-node subgraphs (network motifs) are enumerated on a single aggregate network, and their motif score is evaluated on a set of 100 similar randomized networks (see [7]). Output: a vector of motif scores for each network.
• Graphlet-orbit transitions (GoT) -4-node graphlet-orbit transitions are enumerated. However, for the methods to be more easily comparable, the OTA is not computed. Output: a matrix of graphlet-orbit transitions.

Accuracy and speed comparison
For each node, we compute its SM, SG, STG, DG and GoT vectors and use them as the node's features. For instance, when considering GoT, each node is represented by its graphlet-orbit  transitions. The feature vectors over all nodes in a network form a #Nodes × #Features matrix. For two networks being compared, this results in two corresponding matrices with the same number of columns, whose rows are then joined together. Due to high dimensionality and sparsity of the joined matrix, we perform dimensionality reduction on the matrix using principal component analysis, keeping 99% of its variance. Then, we compute the topological similarity between every two nodes from different networks as the Euclidean similarity between the nodes' PCA-reduced feature vectors. Precision-recall curves (PRCs) were calculated and are presented in Fig 5. In order to compute the PRCs, � is initially set as 0 (meaning that the networks are exactly the same according to the metric) and is incremented by s = 0.001 at each step until s = 1 (the networks are totally distinct). Precision is the fraction of correctly grouped pairs while recall is the fraction of the correctly grouped pairs over all correct ones. The Area Under the Precision-Recall curve (AUPR) evaluates how well the metric groups the networks, and its value is approximated as shown in Eq 8. Pr(k) is the precision at step k, ΔRec(k) is the change in recall from steps k − 1 to k and N is the number of steps.
Our method (GOT) achieves the highest AUPR and has a gain of � 30% when compared to the second best (DG). STG obtained a higher AUPR than SG, but only by a small fraction, while DG performed significantly better than both, corroborating the results from [9]. Table 2 compares the execution times of the two approaches that achieve highest AUPR: GOT and DG

Fig 5. Obtained precision-recall curves on synthetic data (AUPR inside parentheses).
https://doi.org/10.1371/journal.pone.0205497.g005 Table 2. Time comparison of our method (GOT) and dynamic graphlets (DG). We show the speed gain of GoT over DG inside parentheses (e.g., 2x means 2 times faster).   [9]. For Baràbasi networks times are comparable; this is due to the high density of Baràbasi networks that induce a larger number of GOT transitions. In our experiments it is clear that our method is both faster and more accurate than competing approaches on a set of simple and well-known evolving random graph models. For the Strogratz model with no rewiring (P(β) = 0.0), in particular, our method is over 400 times faster than DG [9]. This high efficiency comes from the data-structure and algorithm that we use, based on g-tries [33].

Discussion on storage limitations and execution times
Typically subgraph census is only feasible for relatively small networks and small subgraphs. This is due to the number of occurrences, and therefore the resulting execution time, growing exponentially (i) on larger (or denser) networks and (ii) with the size of the subgraphs [33]. Consider Table 3, the number of possible orbits grows exponentially with the number of nodes; this effect is even more pronounced in directed subgraphs. GOT stores all possible graphlet-orbit transitions (or #Orbits 2 ), further increasing the strain on storage space. Assuming that the frequency of each transition is stored in a 4-byte integer, computing GOT requires 4 × #Orbits 2 bytes of memory. Therefore, � 2GB of RAM are needed when enumerating all 5-node directed GOT, which is feasible in modern PCs. However, enumerating all 6-node directed GOT is impractical. A possible solution to reduce the memory footprint is to remove orbit redundancies [4]. Another option is to avoid generating all possible orbits before enumeration and instead only build their representation during the enumeration phase as they occur in the network [39] since it is reasonable to expect that only a fraction of all possible orbits actually appear in a given network.
Another problem comes from the exponential increase in the number of occurrences as k grows. Table 4 shows the average number of occurrences for all 25 networks of three models from Table 1: Erdos and Barabasi with P(E − ) = 0 and Strogratz with P(β) = 0.2. It can also be noted that, despite having the same (a) number of nodes, (b) number of edges and (c) density, the number of occurrences varies greatly. Barabasi networks induce many more subgraph occurrences since they have a much higher cluestering coefficient than both Erdos and Strogratz networks. This quick growth in the number of occurrences makes subgraph census generally only feasible for small networks and small subgraphs. Previous work has extended g- tries to employ both sampling [40] and parallelism [41] to greatly reduce enumeration time, making larger subgraph sizes attainable (k> 6). We should note that our method requires a subgraph enumeration algorithm, since it needs not only the subgraph frequencies but also their occurrences, thus very efficient subgraph counting methods such as [32,42] can not be used. While subgraph enumeration is a very challenging problem, the field is very active and currently it is possible to scale to networks with millions of edges and subgraphs with more than 6 nodes by combining efficient algorithms, parallelism and sampling.

Grouping and analyzing real data
In this section we show the effectiveness of our proposed method in (a) grouping a set of realworld temporal networks by predetermined categories and (b) visualizing their characteristics. Therefore, our goals are to assess grouping capabilities but also interpretability. The set of realworld networks N comprises (i) co-authorship, (ii) crime, (iii) e-mail communication, (iv) physical interaction, (v) bipartite, (vi) soccer transfers and (vii) social media friendship networks, as shown on Table 5. Our hypothesis is that networks of the same category have similar topological structure [7,9], and this is verified by our method. We start by analyzing how networks are evolving over time (growing vs.shrinking, becoming more-connected vs.less-connected) as well as some of their global metrics, namely the averagedegree, the clustering-coefficient and the characteristic path-length. These metrics are easy to analyze visually and give some temporal information about the networks, but they are not posts on the other's wall 877k 3 months 16 [53] successful when grouping the networks due to their limitations. Static network motif (SM) and graphlet (SG) analyses are also conducted since they capture richer topological information than aforementioned global metrics. We compare the networks'motif-fingerprints and graphlet-degree distributions for 4-node subgraphs and assess how well the networks are being grouped using these metrics. We assess the clustering capabilities of static graphlet-orbits by computing the graphlet-degree-agreement (GDA) for each pair of networks and clustering set N accordingly: networks with high agreement are grouped together. We proceed in a similar fashion for our own graphlet-orbit transitons by computing the orbit-transition-agreement (OTA) for each pair of networks. Finally, we show that graphlet-orbit transition matrices offer highly interpretable information which displays both (a) clear differences between networks of different categories and (b) characteristic transitions in networks of the same category.
Here we do not show results for static temporal graphlets (STG) [8] because they did not show significant improvement in our synthetic data (Table 1) and they are harder to visualize than static graphlets. Dynamic graphlets (DG) [9] with 4 nodes and 5 or 6 events were computed in our set of networks N but, for some networks, the method did not output graphlet counts in a manageable time, making it impossible to compare with our method. Table 6 shows a comparison of the execution times between our method (GoT) and dynamic graphlets (DG). All possible 4-node graphlets were enumerated by both methods. Dynamic graphlets have the number of events as an additional parameter; thus, dynamic graphlets with 5 events (DG-5) or 6 events (DG-6) were separately computed. For some of the largest networks from Table 5 neither DG-5 nor DG-6 produced an output in a reasonable time (we allowed it to run for over a week). For the networks that both GoT and DG finished their computation it is clear that DG is much more computationally heavy. Furthermore, growing the number of events from 5 to 6 greatly increased computational time. For these reasons, dynamic graphlets were not included in our discussion of real-world networks.

Network overview
A set of 14 temporal networks N was collected from various sources in order to evaluate our method's efficiency (Table 5). N is comprised of active-edge networks, meaning that edges are only present in the snapshot S N,i in which they appear at and need to be re-activated in subsequent snapshots. The number of snapshots jS N j depends on the amount of available data of N. Long-term networks, such as co-authorship networks, have a bigger time-interval ρ when compared with short-term networks, such as physical interactions in social events. Fig 6 shows how the networks are evolving size-wise. Most of them are growing as time goes by. The fastest growing networks are arXiv hep-ph, Twitter, Facebook and Enron, which start at only �10% of their largest state, but Enron begins shrinking at t = 11 Table 6. Execution times of GOT with 4-nodes and DG with 4-nodes and 5 (DG-5) or 6 (DG-6) events. An asterisk ( � ) means that the method did not finish in the maximum running time of 1 week. and almost disappears by t = 16. Authenticus, Escorts and Transfers are also growing networks but they grow at a slower rate and become almost stagnant at the end, where they might have reached their full potential in terms of growth. Crime, physical interaction networks and Emails stay relatively stable in size. category with characteristic-path-length evolution, growth or average degree is observed from Figs 6, 7 and 8, respectively. Clustering coefficients were also computed for each network snapshot and it was found that they do not change with t. Co-authorship networks have the highest clustering coefficient at 0.5 while crime, bipartite, Facebook and Tranfers networks have near-zero clustering coefficient. The clustering coefficient is capable of grouping co-authorship networks together despite only considering 3-node subgraphs (triangles and 3-node chains). However, it does not distinguish between crime and bipartite networks, for instance. In these cases, one option to differentiate between networks with similar 3-node subgraphs is to analyze their 4-node network motifs and graphlets.

Network motifs
In our experiments we performed subgraph census with k = 4 and k = 5. Results are presented only for the smaller subgraphs since no significant differences were observed. Subgraph enumeration and necessary motif statistical significance tests were performed using GT-Scanner by [30], available at http://www.dcc.fc.up.pt/~daparicio/software. Network motifs were enumerated in the final aggregate state of each network from Table 5 and motif scores Δ s,N were computed for each subgraph G s (Eq 1) and normalized (Eq 2). Motif fingerprints between two networks are compared by computing their Euclidean distance. Fig 9 shows the obtained motif-fingerprints for all 4-node undirected subgraphs (uG 4 ), evaluated against 100 randomized networks. Co-authorship networks have a similar motif-profile where cliques and near-cliques are the most unexpectedly prevalent groups. This comes from the fact that scientific collaboration communities tend to be tightly connected [16]. The two crime networks have a similar network profile, with cliques and near-cliques being underrepresented while squares (G 3 ) are very overrepresented. This result was expected since our crime networks are geographical graphs with near-zero clustering coefficient and cities have a gridlike structure. Motif-profiles of the email networks are also relatively alike. Similar to coauthorship networks, cliques and near-cliques are the most overrepresented subgraphs. However, that is much more obvious in Enron than in Emails. This is probably because Emails is too small for the over-representation to become obvious since the small random networks are also capable of generating cliques and near-cliques. Physical interaction networks have a similar motif-fingerprint but it seems indistinguishable from co-authorship networks. Both types of networks have cliques and near-cliques as the most overrepresented subgraphs but those groups have different meanings. In co-authorship networks they might indicate communities but in the short-term networks they seem to simply indicate that everyone communicates with everyone by the end of the time-frame. Analyzing just the final aggregate network ignores relevant information, it is often more insightful to study how networks evolve. Bipartite networks have similar motif-fingerprints but they are also identical to those of crime networks. It should be pointed out that these networks are not pure bipartite networks but only nearly bipartite, otherwise subgraphs with cycles would never occur (G 3 , G 4 , G 5 and G 6 ). The Transfer network's motif fingerprint is also similar to the ones of crime and bipartite networks. Finally, Facebook's motif-profile is alike co-authorship network except G 3 is also overrepresented. Since Facebook's density is so low ( N E 2 � 183000 64000 2 � 0:004%) randomized networks have almost exclusively stars (G 1 ) and chains (G 2 ). Finally, by observing Fig 10(a) it is clear that motifs can only separate the networks into two big groups.

Static graphlets
Graphlets are subgraphs that take into account the position that nodes occupy in them. Fig 1  shows set uO 4 , representing all orbits of uG 4 . As stated in Problem 2, graphlet-agreement computation requires graphlet-orbits to be counted for all nodes in N. After obtaining GDD matrices for all N 2 N we compute the GDA for all network pairs. This results in a GDA i,j matrix where GDA(N i , N j ) � 0 means that networks N i and N j are completely different and GDA(N i , N j ) � 1 translates to N i and N j being very similar. Fig 10(b) shows the obtained GDA i,j matrix where each cell is colored according to the GDA value and similar networks have a darker cell. Graphlets group bipartite networks and most of the physical interactions networks correctly. By comparison, motif-fingerprints were only capable of finding two large groups, as discussed in the previous section. Neither motifs nor graphlets were able to cluster the set of networks correctly, which might indicate that temporal information is relevant to understand these networks.

Graphlet-orbit transitions
All possible transitions between graphlet-orbits from the set uO 4 (Fig 1) were considered in our experiments. Enumerating larger subgraphs was unnecessary since our method already achieves an adequate grouping for k = 4. Furthermore, larger subgraphs would be harder to visualize in paper format. Previous studies analyzed graphlet transitions [24,25], but graphletorbit transitions give more information since they account for changes of position in the same graphlet, for instance. A full orbit enumeration was performed for each snapshot S i in order to build graphlet-orbit transitions matrices uT 4 for each network from Table 5. Fig 11 shows the transition matrices of Authenticus, a collaboration network, and Conference, a physical interaction network. To simplify visualization, OTA values were discretized into three intervals, indicating rare ( 0; 1 3 � � ), common (� 1 3 ; 2 3 �) and frequent transitions (� 2 3 ; 1�). The main diagonal of the matrix suggests that all orbits are relatively stable in Authenticus except for the square-orbit O 5 . This is expected from collaboration networks since groups forming a square-graph are only loosely connected, therefore these groups tend to either become tighter (transition from O 5 to orbits 6-11) or nearly break apart (transition from O 5 to orbits 1-4). On the other hand, orbits in Conference are very unstable, i.e. they almost always change to another orbit. This is explained by the fact that, in short-term physical interaction groups, connections are mostly temporary and not a strong indicator of community. In this example, people meet in a conference and they might meet people that their "group" already met, but they are mostly interested in meeting more people than establishing strong groups. As another example, O 1 shows the effect of hubs in collaboration networks: it is more likely that a hub-like group will gain a new edge between previously unconnected authors (transition from O 1 to O 6 ) than for to remain unconnected. It is also common that not only one but two new edges appear (transition from O 1 to O 9 ). However, stars (O 1 /O 2 ) becoming cliques (O 11 ) is rare in Authenticus. Interestingly, Fig 12 shows that star-to-clique transitions are common in the other collaboration network, arXiv hep-ph. This might come from the fact that, while Authenticus data covers multiple areas, arXiv hep-ph only has publications pertaining to physicists; therefore, the observed differences may hint that physicists form tighter connections sooner than the average. It also seems that transitions are relatively slow in collaboration networks since it is rare for a loosely connected subgraph to become a densely connected subgraph in just a single jump. The same cannot be said about Conference, where behavior is almost chaotic. These are only some of the possible observations about transition matrices that highlight their interpretive power. Fig 10(c) clearly shows that graphlet-orbit transitions are able to correctly group our set of temporal networks while motifs and static graphlet-orbits could not (Fig 10(a) and 10(b)).
For completeness, Fig 12 presents orbit transitions for collaboration, physical interaction, crime and bipartite network. Matrices are discriminated by starting orbit (each matrix) and by network (each matrix-row) for an easier comparison. For instance, the first matrix from Fig 12 shows, for each network, the transitions of O 1 to all O k 2 uO 4 , the second one of O 2 to all O k 2 uO 4 , and so forth. To help visualization we inserted red lines that separate networks of different categories. It is clear that, while networks of the same category have some differences in their orbit-transition profile, they are more alike than networks from different categories. As an example: the transitions of O 1 clearly distinguish co-authorship from physical interaction networks, and also co-authorship from crime and bipartite networks. However, O 1 transitions are very similar for crime and bipartite networks. Distinguishing these two types of networks can be achieved by instead looking at O 5 , for instance. Orbit-transition fingerprints are a visual way of interpreting how a network evolves and present very detailed topological and temporal information.

Conclusion
In this paper we put forward a new extension of graphlets for temporal networks (GOT), as well as a novel metric (OTA) to compare them. The effectiveness of our proposed method was assessed on (a) synthetic networks pertaining to well-studied graph models and (b) a set of temporal networks with predetermined categories. Our method was shown to be more accurate than competing approaches on synthetic data. For real networks, we began by analyzing how global metrics evolved over time, namely the average-degree, clustering-coefficient and the characteristic path-length. While these metrics give insight into the topological structure of the networks, we could not visualize that networks of different categories are distinguishable using them. Static network motif and graphlet analyses were also conducted since they capture richer topological information than aforementioned global metrics. However, since they do not take temporal information into account, they are not adequate for temporal network comparison. Our method correctly clustered the set of networks by category, showcasing both the importance of temporal information in these networks and our method's clustering capabilities. Furthermore, our method produces highly interpretable results, leading to a better understanding of network evolution and differences between transitions of distinct networks.