
Conceived and designed the experiments: CE LdFC FAR MK. Performed the experiments: CE. Analyzed the data: CE. Wrote the paper: CE MK.

The authors have declared that no competing interests exist.

Complex networks have been characterised by their specific connectivity patterns (network motifs), but their building blocks can also be identified and described by node-motifs—a combination of local network features. One technique to identify single node-motifs has been presented by Costa et al. (L. D. F. Costa, F. A. Rodrigues, C. C. Hilgetag, and M. Kaiser, Europhys. Lett.,

Networks appear in a variety of real-world systems ranging from biology to engineering

Mapping complex systems to networks revealed that some nodes are remarkably different from other nodes of the same network. For instance, hubs, characterized by a high number of connections (a high node degree), often play a fundamental role in protein-protein interaction networks and their removal can be lethal for an organism

Networks can describe complex systems whose interactivity between dynamical components changes over time. Altered connections between the elements (represented by nodes) may in turn feed back on the dynamics, such that the dynamical process and the network topology evolve in an adaptive fashion

Here, we describe a novel workflow for detecting characteristic single-node motifs and for using fingerprints for network comparison. Improvements compared to the previous approach include (a) automatic parameter determination, which facilitates high throughput analysis without user interaction, and (b) replacing the k-means clustering algorithm with a deterministic method to simplify the workflow and to improve robustness of results. In addition, we provide (c) a validation of our method and (d) an application to networks where the topology changes over time (addition or deletion of nodes or edges).

The application of single measures to complex networks has revealed important insights in many cases. However, as Newman and Leicht recognised

To solve this problem, two complementary approaches have been suggested. The first approach by Newman and Leicht groups nodes based on their connectivity without any further prior information

Analyses with focus on only one particular aspect of a network at a time might fail to detect irregularities or similarities in structure. The second approach is to avoid single measures and to use a combination of multiple ones

Each of these two approaches to identify patterns in complex networks has its drawbacks and advantages. The Newman and Leicht algorithm (NLA) does not depend on one or a few network measures; instead, it works on network links directly. Networks are not restricted to undirected ones; directed and even weighted links can also be considered. The NLA requires the number of node-groups to be specified; this is also true for the approach by Costa et al. [Beyond the Average (BtA)], where the number of motif regions needs to be chosen

In the next section we suggest several improvements to the BtA-workflow (

Step 1: Choose set of local measures to characterise network nodes

In this paper we propose how to choose all relevant parameters of the BtA-workflow automatically (see

The first validation is on a network that is small enough to confirm BtA-results by eye: We use a family-tree from

With these reassuring results from a single network, we proceed by testing BtA on whole series: We generate structures with both regular components and exceptional ones, which BtA has to identify. In our first series we compose networks of two components: a regular ring lattice and a smaller Erdős-Rényi (ER)

Finally, we reverse the nature of the networks: The major component is set to a random network [ER, Barabási and Albert (BA)

In conclusion, the automatic parameter determination gave satisfying results in all validation tests, which gives us confidence in BtA's ability to identify outliers in complex networks autonomously.

Large complex networks are challenging to analyse; time series of such networks are even more so. We approach this challenge by first condensing networks to a compact representation: mapping a series of changing structures to a uniform representation helps to identify trends and changes. To this end, all networks have to be characterised, which we do using single node-motifs. These are identified with BtA using six common local measures: (1) the normalised average degree

Similar to random graphs, small-world networks have a small characteristic path length, but at the same time they exhibit a high degree of clustering, as do regular ring lattices, for example. It was discovered early that the combination of short paths and grouping is inherent to social networks; a phenomenon that became known as six degrees of separation

In total we identified 5 single node-motifs, which differ in characteristics, frequency, and time of emergence (

Vertical axes in subfigures a–c correspond to number of outlier nodes

Overall, results are very satisfying and we are confident that BtA could be successfully applied to real networks using the automatic parameter determination.

In this paper we presented a method to detect single node-motifs automatically. The main parameters of the previous routine

Despite our improvements to BtA, certain issues and room for further advancements remain. For example, reducing feature vectors in dimension inevitably leads to a loss of information, which has to be kept within reasonable bounds. In other words, although 6-dimensional feature vectors were suitably represented in the 2-dimensional plane so far

In cases where feature vectors cannot be suitably represented in 2 dimensions, their display becomes more complicated and verifying a good fit of the estimated probability density function (PDF) is challenging. However, a good PDF estimate is needed in the BtA workflow to determine outlier nodes. Problems that might arise in these situations could possibly be circumvented by a major change to the workflow: The use of PCA to compact information offers the possibility to replace both the PDF estimation and the subsequent outlier selection with a more direct and non-parametric standard technique, which is

Considering the BtA workflow as presented in this paper, the technique can be easily adapted by including different local network measures in the analysis. Measures that take spatial aspects of the network into account, for instance, or those including link-weights can increase the quality of the analysis. Finally, interest might lie not only in motifs formed by outlier nodes, but in all single node-motifs occurring in the network. In this case regular and singular nodes are not distinguished, and all of them have to be included in the network fingerprint.

BtA-fingerprinting of many networks has so far been prevented by the need to choose parameters manually during the analysis. With the improvements presented in this paper, however, it is now possible to process large numbers of networks without supervision. Identified outliers are characteristic nodes that can provide a fingerprint of a network; fingerprinting networks from numerous domains allows easy characterisation and comparisons. As already demonstrated

To encourage the use of the BtA methodology by other researchers, we provide our implementation of the workflow including the automatic parameter determination for download (

In conclusion, we provide a robust method for systematically discovering and classifying characteristic nodes of a network. The distribution of node-classes results in a fingerprint, which in turn can give a classification of whole networks, as for network motifs of multiple nodes

Network nodes were characterised with six common local measures, whose definitions are given in the following. To this end, let

In the following sections we describe how appropriate settings for the parameters of the BtA-workflow can be found automatically. Kernel-bandwidth, the number of singular nodes

In step 3 of the workflow (

After assigning probabilities to all nodes (Step 3), nodes with an exceptionally low probability come into focus: These outliers correspond to points in the PCA-plane that are spatially separated from larger clusters; and this separation corresponds to abnormalities of measured features. Due to their uncommon characteristics, these nodes are considered singular. For humans it is usually straightforward to identify these non-regular nodes, if interactive visual aids are provided; we therefore implemented a graphical user interface for the whole workflow (Fig. S3). In the following, however, we discuss how the number of singular nodes

To determine singular nodes, automated methods can query the PDF that has been estimated earlier (Step 3). For example, for a fixed number
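As a minimal sketch of how an estimated density might be queried, the following Python code evaluates a two-dimensional Gaussian kernel density estimate at each node position in the PCA plane and returns the nodes with the lowest density. The function names, the isotropic kernel, the fixed `bandwidth`, and the parameter `n_singular` are assumptions made for illustration; they are not taken from the BtA implementation, which determines its parameters automatically.

```python
import math


def gaussian_kde_at_points(points, bandwidth):
    """Evaluate an isotropic 2-D Gaussian KDE at every input point.

    points    : list of (x, y) node coordinates in the PCA plane
    bandwidth : kernel width (hypothetical fixed value for this sketch)
    """
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)
    densities = []
    for xi, yi in points:
        total = 0.0
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(norm * total)
    return densities


def lowest_density_nodes(points, bandwidth, n_singular):
    """Return indices of the n_singular nodes with the lowest density."""
    densities = gaussian_kde_at_points(points, bandwidth)
    order = sorted(range(len(points)), key=lambda i: densities[i])
    return order[:n_singular]


if __name__ == "__main__":
    # Five nodes near the origin and one isolated node.
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (-0.1, 0.0), (0.0, -0.1), (10.0, 10.0)]
    print(lowest_density_nodes(pts, bandwidth=1.0, n_singular=1))
```

In this toy example the isolated node receives the lowest density, because only its own kernel contributes noticeably at its position.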

A necessary condition for a node being considered singular is a sufficiently low probability compared to other nodes. Additionally, it is desirable that singular nodes appear somewhat separated from the regular ones, which renders their classification non-arbitrary. We therefore suggest setting the borderline between regular and singular nodes where the steepest increase in probability among the low probability nodes appears. Nodes with a probability below mean
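The idea of placing the borderline at the steepest increase in probability can be sketched as follows. This is an illustrative reading only: the function name and the user-supplied `low_cutoff`, which defines what counts as a low probability node, are assumptions of this sketch and do not reproduce the exact criterion used in the BtA workflow.

```python
def split_at_steepest_increase(probabilities, low_cutoff):
    """Return indices of nodes lying below the largest probability jump.

    probabilities : per-node probability (density) values
    low_cutoff    : hypothetical bound; only nodes below it are candidates
    """
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i])
    n_low = sum(1 for p in probabilities if p < low_cutoff)
    if n_low == 0:
        return []
    # Include the first node at or above the cutoff, so that the jump from
    # the last low-probability node to the regular nodes is also considered.
    candidates = order[:min(n_low + 1, len(order))]
    if len(candidates) < 2:
        return []
    gaps = [probabilities[candidates[k + 1]] - probabilities[candidates[k]]
            for k in range(len(candidates) - 1)]
    k_best = max(range(len(gaps)), key=lambda k: gaps[k])
    # Everything up to and including position k_best is treated as singular.
    return sorted(candidates[:k_best + 1])


if __name__ == "__main__":
    probs = [0.30, 0.02, 0.28, 0.03, 0.31, 0.19, 0.29]
    print(split_at_steepest_increase(probs, low_cutoff=0.2))
```

Here the largest jump among the candidates lies between 0.03 and 0.19, so only the two nodes below that jump are returned, even though a third node also falls below the cutoff.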

Once nodes are classified as either regular or singular (Step 4), clusters of singular nodes (

Optimal groupings of singular nodes consider well separated nodes to be in different clusters, whereas relatively close ones are grouped together. The standard deviations along each PC-axis can serve as a threshold for

Ellipses are centred on each point with dimensions corresponding to standard deviations

1. Similar to an adjacency matrix, create a binary

where

2. Determine a corresponding

3. Colour all cliques differently, which finally yields the motif-groups.
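The three steps above can be sketched in Python under some loudly stated assumptions. In this sketch, two singular nodes are linked in the binary matrix when one lies inside the ellipse centred on the other, with semi-axes `sx` and `sy` supplied by the caller, and groups are taken to be the connected components of the resulting graph. The overlap criterion, the use of connected components as the grouping rule, and all names are illustrative choices, not the authors' implementation.

```python
def motif_groups(points, sx, sy):
    """Group points whose ellipses (semi-axes sx, sy) contain each other.

    points : list of (x, y) coordinates of singular nodes in the PCA plane
    sx, sy : ellipse semi-axes along the two PC-axes (assumed inputs)
    Returns a list with one group label per point.
    """
    n = len(points)
    # Step 1: binary matrix, analogous to an adjacency matrix.
    binary = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dx = (points[i][0] - points[j][0]) / sx
            dy = (points[i][1] - points[j][1]) / sy
            if dx * dx + dy * dy <= 1.0:
                binary[i][j] = 1
    # Steps 2 and 3: treat the matrix as a graph and label each
    # connected component with its own group number ("colour").
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if binary[i][j] and labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (10.0, 0.0)]
    print(motif_groups(pts, sx=1.0, sy=1.0))
```

The number of groups is not passed in; it follows from how many components the binary matrix contains.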

Note that this procedure has no parameter controlling the number of motif-groups; instead, the groups are identified automatically. Rather than using this method to actually group nodes, it might also serve as a pre-processing step in order to determine the number of clusters

The prevalence of small-world networks has raised questions about their generating mechanisms, and different explanatory models have been proposed

Supplementary figures, notes on software implementation, notes on run-time complexity, and detailed discussion of the small-world network results.

(PDF)