Automatic Network Fingerprinting through Single-Node Motifs

Complex networks have been characterised by their specific connectivity patterns (network motifs), but their building blocks can also be identified and described by node-motifs, each a combination of local network features. One technique to identify single node-motifs has been presented by Costa et al. (L. D. F. Costa, F. A. Rodrigues, C. C. Hilgetag, and M. Kaiser, Europhys. Lett., 87, 1, 2009). Here, we first suggest improvements to the method, including how its parameters can be determined automatically. Such automatic routines make high-throughput studies of many networks feasible. Second, the new routines are validated on different series of networks. Third, we provide an example of how the method can be used to analyse network time-series. In conclusion, we provide a robust method for systematically discovering and classifying characteristic nodes of a network. In contrast to classical motif analysis, our approach can identify individual components (here: nodes) that are specific to a network. Such special nodes, like hubs before them, might be found to play critical roles in real-world networks.

Adjacency matrix and corresponding network during the rewiring process described by Watts and Strogatz [14] (number of steps in upper right corner; inset: network representation with nodes arranged on a circle). Beginning with a perfectly regular ring lattice (200 nodes) in which each node is linked to its 6 closest neighbours (upper left), nodes are visited successively (one per step) and connections are randomly rewired with a probability of 40%. On the k-th visit to a node, the link to its k-th neighbour on the right is the one that is potentially rewired. After 600 steps (lower right) every node has been visited three times and on average 40% of all links have changed.

Our implementation of the workflow, including the automatic parameter determination, is publicly available (http://www.biological-networks.org/). Below we briefly describe the workflow alternatives offered by the software.

Kernel-Bandwidth
By default, the kernel-bandwidth is scaled according to the standard deviation along each principal component (PC) axis. Variability-based re-shaping of the kernel function improves the overall fit of the PDF to the points (Fig. S4). The kernel can also be made symmetric by deactivating the tick-box below the panel shown in Fig. S3a.
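The difference between a variability-scaled and a symmetric kernel can be illustrated with a minimal sketch. It assumes that the nodes have already been projected onto their principal components, uses a product of Gaussian kernels, and sets the bandwidth along each axis to a hypothetical factor `scale` times the standard deviation along that axis. The function names and the value of `scale` are illustrative assumptions, not the settings of the published software.

```python
import math
import statistics


def axis_bandwidths(points, scale=0.5):
    """One bandwidth per PC axis: `scale` (assumed factor) times the std. dev. on that axis."""
    dims = len(points[0])
    return [scale * statistics.pstdev([p[d] for p in points]) for d in range(dims)]


def density(x, points, bandwidths):
    """Product-Gaussian kernel density estimate at x (bandwidths must be non-zero)."""
    total = 0.0
    for p in points:
        kernel = 1.0
        for xd, pd, h in zip(x, p, bandwidths):
            u = (xd - pd) / h
            kernel *= math.exp(-0.5 * u * u) / (h * math.sqrt(2.0 * math.pi))
        total += kernel
    return total / len(points)


# Toy data, elongated along the first PC axis (made-up values).
points = [(-4.0, 0.1), (-2.0, -0.1), (0.0, 0.0), (2.0, 0.1), (4.0, -0.1)]
scaled = axis_bandwidths(points)            # narrower kernel along the second axis
symmetric = [max(scaled)] * len(scaled)     # same bandwidth on every axis
d_scaled = density((0.0, 0.0), points, scaled)
d_symmetric = density((0.0, 0.0), points, symmetric)
print(scaled, d_scaled, d_symmetric)
```

In this toy example the scaled kernel concentrates probability mass close to the points along the low-variance axis, whereas the symmetric kernel spreads it equally in both directions.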

Number of Singular Nodes w
By default, our implementation of the workflow chooses the number of singular nodes w according to equation (1). This setting can be overridden by the user, who is provided with a plot of all nodes' probabilities together with their relative differences (Fig. S3b). Manually chosen values for w can thereby be easily related to the default setting.
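The kind of values shown in such a plot can be tabulated with a short sketch. Equation (1) is not reproduced here. The definition of the relative difference used below (the gap between consecutive sorted probabilities divided by the larger of the two) and the function name are assumptions made purely for illustration.

```python
def probability_gaps(probabilities):
    """Sort node probabilities and return them with assumed relative differences."""
    ordered = sorted(probabilities)
    gaps = []
    for lower, upper in zip(ordered, ordered[1:]):
        # Assumed definition: gap relative to the larger of the two values.
        gaps.append((upper - lower) / upper if upper > 0 else 0.0)
    return ordered, gaps


# Made-up probabilities: two nodes are far less probable than the rest.
ordered, gaps = probability_gaps([0.20, 0.001, 0.18, 0.002, 0.22])
largest_gap_index = gaps.index(max(gaps))
print(ordered, gaps, largest_gap_index)
```

A manually chosen w can then be compared against where the large gaps occur in the sorted sequence.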

Number of Motif Groups k
Our implementation of the workflow provides three alternatives to determine k. By default, motif groups are determined deterministically through cliques of overlapping ellipses, as illustrated in Fig. 4. The user can also choose to determine the number of clusters using the ellipses but perform the clustering itself with k-means++. As the last alternative, k-means++ can be applied with a user-defined number of motif groups.
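For the last alternative, where the number of motif groups is supplied by the user, the clustering step can be sketched as below. This is a plain-Python illustration of standard k-means++ seeding followed by Lloyd iterations, not the code of the published software. The function names, the fixed iteration count, and the seed are assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp(points, k, iterations=20, seed=0):
    """k-means with k-means++ seeding; assumes at least k distinct points."""
    rng = random.Random(seed)
    # Seeding: first centre uniformly at random, each further centre with
    # probability proportional to its squared distance to the nearest centre.
    centres = [rng.choice(points)]
    while len(centres) < k:
        weights = [min(sq_dist(p, c) for c in centres) for p in points]
        centres.append(rng.choices(points, weights=weights, k=1)[0])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres


# Made-up singular nodes in a two-dimensional projection, forming two groups.
nodes = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centres = kmeans_pp(nodes, k=2)
print(labels)
```

Unlike the clique-based default, this procedure depends on a random seed, so repeated runs with different seeds may assign group labels in a different order.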

Run-Time Complexity
The bulk of the run-time of the BtA-workflow is spent on step 1, where all selected local network measures are computed. Run-time complexity here depends on the measures that are chosen to characterise each node. We estimated how computational costs scale for six common local measures [4]. Like Costa et al. [5], we selected the normalised average degree r, the coefficient of variation of the degrees of the immediate neighbours of a node cv, the clustering coefficient cc [7,14], the locality index loc, the hierarchical clustering coefficient of level two cc2 [3], and the normalised node degree K. These measures were applied to random networks generated according to the Erdős-Rényi (ER) [6], Watts-Strogatz (WS) [14], and Barabási-Albert (BA) [1] models. A polynomial function was fitted (root mean square error) to the average run-times to determine their dependence on network size. In addition to the size of the network, its edge density might also affect run-time, which is why we repeated the process while varying sparseness. The results show relatively stable growth rates, irrespective of network model or connection density: our naïve implementations of the six measures show run-time complexities ranging from linear to less than cubic (Fig. S1). Costs are thus comparatively low when set against methods that identify specific connectivity patterns by counting occurrences of particular subgraphs (e.g. [2,8-11,13]); such motif counts also scale at least linearly in network size, but their costs grow exponentially as the size of the motif pattern increases [8]. In practice this often means that counts cannot be determined for patterns involving 10 nodes or more [12], which renders some domains computationally intractable for this approach. In these cases the BtA-methodology might still be applicable: local network measures that scale only polynomially are comparatively fast to calculate, and exceptional network characteristics can therefore be identified even in very large networks.
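To make the cost argument concrete, two of the measures listed above can be computed per node in a few lines. The sketch below assumes an undirected simple graph stored as a dictionary of neighbour sets and uses textbook definitions of the clustering coefficient and of the coefficient of variation of neighbour degrees. It is a naive illustration, not the implementation that was timed, and the function names are hypothetical.

```python
import statistics
from itertools import combinations


def clustering_coefficient(adj, node):
    """Fraction of neighbour pairs of `node` that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def neighbour_degree_cv(adj, node):
    """Coefficient of variation (std. dev. / mean) of the neighbours' degrees."""
    degrees = [len(adj[n]) for n in adj[node]]
    if not degrees:
        return 0.0
    mean = statistics.mean(degrees)
    return statistics.pstdev(degrees) / mean if mean > 0 else 0.0


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
cc = {n: clustering_coefficient(adj, n) for n in adj}
cv = {n: neighbour_degree_cv(adj, n) for n in adj}
print(cc, cv)
```

Each call to `clustering_coefficient` inspects every pair of neighbours, so its cost grows quadratically with the degree of the node, which is one way a local measure ends up with polynomial rather than linear scaling.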
In total we identified 5 singular node motifs, which differ in frequency and time of emergence (Fig. 3c). Motifs 2, 3 and 5 appear right from the beginning of the rewiring process; motifs 2 and 5 gradually become more common over time, whereas motif 3 levels out after a transient peak. The remaining motifs, 1 and especially 4, only become apparent at later stages, during which both become more frequent. The motifs' time-dependent frequencies can be understood by looking at their individual characteristics (Fig. 3d):
1. A node matching motif 1 has relatively few connections in contrast to its well-connected neighbourhood. Other nodes that were initially linked to it have rewired their links elsewhere, and because connections change in only 40% of cases, motif 1 is rarely observed in early stages.
2. This contrasts with the early appearance of motif 2, whose nodes are characterised by many connections to a rather sparsely connected neighbourhood. Starting from a ring lattice, such a configuration arises because re-linking one of the initial regular connections destroys the local neighbourhood structure; if multiple nodes rewire to the same target, its degree grows, which makes that node a candidate for motif 2.
3. Motif 3 nodes have relatively few connections, and the nodes in their neighbourhood are similar in their number of links and in the targets of those links. This characterisation fits nodes linked to others that have been disconnected only from their direct neighbours. This is likely to be observed during the first 200 steps of the rewiring process, when links to the closest neighbour are replaced, in agreement with motif 3's early peak in frequency. Later, when connections to more distant neighbours are lost, the locality index decreases and fewer nodes fulfil the profile of motif 3.
4. Motif 4 mostly starts to appear when nodes are visited for the third time and some of the longest initial connections are replaced. At these late stages the ring lattice has undergone substantial perturbation, such that nodes differ widely in their degree and interconnectivity. Motif 4 describes sparsely connected nodes whose neighbours have diverse numbers of connections; instead of being linked to each other, the neighbours share other common targets.
5. The final motif, motif 5, is best characterised by its relation to the rest of the network, which shows a higher degree of connectivity than any node involved in the motif. Neighbours of the motif node also vary in their number of connections and do not link to each other. This motif emerges early on, but its frequency rises more quickly during the last rewiring pass. During that time the last initial links are broken up and motif 5 emerges as more parts of the network finally become sparse enough.
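The rewiring schedule behind these observations (200 nodes, 6 neighbours per node, one node visited per step, 40% rewiring probability) can be sketched as follows. Choosing the new target uniformly at random while avoiding self-loops and duplicate links follows the usual Watts-Strogatz convention and is an assumption here, as are the function names and the seed.

```python
import random


def ring_lattice(n=200, half_degree=3):
    """Regular ring: each node linked to its `half_degree` neighbours on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for k in range(1, half_degree + 1):
            j = (i + k) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire_step(adj, step, p=0.4, rng=random):
    """Visit one node; on its k-th visit, rewire the link to its k-th right neighbour."""
    n = len(adj)
    node = step % n          # nodes are visited successively, one per step
    k = step // n + 1        # number of the current pass over the ring
    old = (node + k) % n
    if old not in adj[node] or rng.random() >= p:
        return False
    # Assumed target choice: uniform over nodes not yet linked to `node`.
    candidates = [t for t in range(n) if t != node and t not in adj[node]]
    if not candidates:
        return False
    new = rng.choice(candidates)
    adj[node].discard(old)
    adj[old].discard(node)
    adj[node].add(new)
    adj[new].add(node)
    return True


rng = random.Random(1)
adj = ring_lattice()
rewired = sum(rewire_step(adj, step, rng=rng) for step in range(600))
edges = sum(len(neighbours) for neighbours in adj.values()) // 2
print(rewired, edges)
```

Taking a snapshot of `adj` after each step would give a network time-series of the kind to which the workflow is applied; the total number of links stays constant throughout because every rewiring removes one link and adds one.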