Abstract
Spectral graph theory and its applications constitute an important step forward in modern network theory. Its growing acceptance over the last decades has fostered the development of innovative tools, allowing network theory to model a variety of different scenarios while answering questions of increasing complexity. Nevertheless, a comprehensive understanding of spectral graph theory's principles requires a solid technical background which, in many cases, prevents its diffusion through the scientific community. To overcome this issue, we developed and released an open-source MATLAB toolbox - the SPectral graph theory And Random walK (SPARK) toolbox - that combines spectral graph theory and random walk concepts to provide both a static and a dynamic characterization of digraphs. After describing the theoretical principles grounding the toolbox, we present SPARK's structure and the list of available indices and measures. SPARK was then tested in a variety of scenarios including: two toy examples on synthetic networks, an example using public datasets in which SPARK was used as an unsupervised binary classifier, and a real-data scenario relying on functional brain networks extracted from the EEG data recorded from two stroke patients in resting state condition. Results from both synthetic and real data showed that the indices extracted using the SPARK toolbox correctly characterize the topology of a bi-compartmental network. Furthermore, they can also be used to find the "optimal" vertex set partition (i.e., the one that minimizes the number of between-cluster links) for the underlying network and to compare it to a given a priori partition. Finally, the application to real EEG-based networks provides a practical case study in which the SPARK toolbox was used to describe network alterations in stroke patients and relate them to the patients' motor impairment.
Citation: Ranieri A, Pichiorri F, Colamarino E, Cincotti F, Mattia D, Toppi J (2025) SPectral graph theory And Random walK (SPARK) toolbox for static and dynamic characterization of (di)graphs: A tutorial. PLoS One 20(6): e0319031. https://doi.org/10.1371/journal.pone.0319031
Editor: Longxiu Huang, Michigan State University, UNITED STATES OF AMERICA
Received: October 11, 2024; Accepted: January 25, 2025; Published: June 5, 2025
Copyright: © 2025 Ranieri et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data and scripts used in the paper are available from github: https://github.com/AndreaRani/SPARK.
Funding: This project is partially funded by the Italian National Ministry of Health (grants # RF-2018-12365210, RF-2019-12369396, GR2019-12369207) and by Sapienza University of Rome (LEAF, RM123188F229EC72). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Network theory and its applications are currently being studied across many scientific fields, from gene and protein networks to the World Wide Web [1,2]. The ubiquity of complex networks in science and technology has boosted the development of new powerful tools and applications that brought graph theory up to another level. Network theory and its applications thus gained increasing acceptance over the years, leading to a growing demand for tools for the analysis of complex systems. Especially in the last decades, an increasing number of open-source frameworks have been developed to make those tools accessible to a larger cohort of interested scientists. As a noteworthy example within modern neuroscience, the Brain Connectivity Toolbox [3] represents groundbreaking work in this sense. The toolbox is completely open source and offers a comprehensive list of topological measures, as well as generative network models and visualization functions for brain networks. Beyond neuroscience, a plethora of different scientific fields have been permeated by network theory, thus contributing to its increasing popularity. In the emerging field of graph signal processing, for example, the Graph Signal Processing Toolbox [4] is an open-source MATLAB toolbox that can be used to tackle graph-related problems with a signal processing approach. Similarly, Gasper [5] provides a suitable framework for graph signal processing and graph visualization in R. In such a stimulating context, network science has benefited from a huge methodological contribution from various disciplines, such as physics and theoretical computer science. This allowed modern scientists to investigate questions of increasing complexity concerning, for example, the dynamical behaviour of the underlying system or its propensity to organize into interacting communities.
In this scenario, spectral graph theory stems from the application of the spectral theorem to network problems. While classic approaches rely on single-node features or global descriptors, spectral graph theory provides insights into the cluster-to-cluster communication within the network, enabling the study of phenomena like community detection, diffusion processes and synchronization. This shift in perspective facilitates a deeper understanding of complex systems by linking algebraic properties to networks' dynamic features. Furthermore, its intimate relationship with random walk processes makes spectral graph theory a powerful tool for network analysis at both the topological and the dynamic level [6].
Nowadays, spectral graph theory and its applications are widely used in different scientific fields, from resource allocation strategies and operations research problems [7,8] to geometric deep learning [9–11] and modern biomedicine. The field of neuroengineering, for example, largely benefits from network theory to characterize both physiological and pathological brain networks [12]. In this framework, given the propensity of the human brain to naturally organize into interacting communities [13], spectral graph theory can be employed to analyze the structural properties of brain networks through their eigenvalues and eigenvectors. In addition, a random walk perspective complements the analysis by modelling the information flow across the network, providing a mathematical framework for understanding how quickly a system converges to a steady state, how efficiently information spreads between clusters and how dynamic properties vary according to the topology of the network. Modern studies on both synthetic and real networks agree in characterizing brain injuries as network diseases, as the effects of a lateralized traumatic event have been shown to spread throughout the network [12,14]. As a noteworthy example, stroke embodies one of the most representative scenarios in which the effects of a lateralized traumatic event spread all over the network [15–17]. Rather than focusing on single-node features or global descriptors, a cluster-level characterization of the underlying network is thus desirable. However, to the best of our knowledge only two groups have pioneered the application of spectral graph theory to characterize the "disconnection syndrome" typical of Alzheimer's disease. Specifically, spectral indices pointed out changes in connectedness in MEG-derived resting-state functional networks of Alzheimer's patients, combined with a less efficient network configuration characterizing dynamic processes [18].

Furthermore, brain tractography connectivity networks exhibit a higher number of disconnected components and lower spectral energy in Alzheimer's patients when compared to healthy controls [19]. However, the sophisticated mathematical background, combined with the lack of user-friendly toolboxes, has discouraged the diffusion of spectral graph theory in clinical neuroscience. To the best of our knowledge, spectral graph theory has never been applied to describe functional alterations in brain pathologies other than Alzheimer's disease. To fill this gap, this work introduces the SPectral graph theory And Random walK (SPARK) toolbox for (di)graphs, a new open-source MATLAB toolbox for the analysis of graphs. The toolbox is written in MATLAB 2023b and the experiments were conducted on a machine running Windows 11, equipped with an Intel Core i7 processor at 2.8 GHz and 16 GB of RAM. Compatibility with the Windows operating system requires Windows 10 (version 21H2 or higher), Windows 11, Windows Server 2019 or Windows Server 2022. On a Windows machine, MATLAB 2023b requires any Intel or AMD x86-64 processor with two or more cores and a minimum of 8 GB of RAM. Compatibility with other operating systems can be checked directly on the MathWorks website. The leading idea behind SPARK is to combine spectral graph theory and random walk concepts to characterize digraphs from both a static and a dynamic perspective. To this end, an introduction to spectral graph theory and random walk fundamentals is first provided to familiarize the reader with the key concepts and notions used in this paper. Then, in the second part of the paper, SPARK is tested on both surrogate and real data across different application fields to assess its versatility and adaptability to different scenarios.
2. Materials and methods
2.1. Spectral graph theory: background and basic facts
2.1.1. Laplacian matrix and its properties.
Let $G = (V, E)$ be an undirected graph with vertex set $V = \{v_1, \dots, v_N\}$ and edge set $E \subseteq V \times V$: if $|V| = N$, $G$ can be efficiently described by its adjacency matrix $A \in \{0,1\}^{N \times N}$. The binary adjacency matrix of a graph is a square matrix with elements:

$$a_{ij} = \begin{cases} 1 & \text{if } (v_i, v_j) \in E \\ 0 & \text{otherwise.} \end{cases}$$
Given a generic node $v_i$, the total number of its neighbors can be obtained by summing the direct edges that link $v_i$ to any other node in the graph: this number represents the degree of node $v_i$:

$$d_i = \sum_{j=1}^{N} a_{ij}.$$

The degrees of all the nodes in $G$ can be collected in a diagonal matrix $D = \mathrm{diag}(d_1, \dots, d_N)$, usually called the degree matrix.
The core of spectral graph theory relies on the properties of the spectrum of the Laplacian matrix associated with $G$, which is defined as

$$L = D - A \quad \text{(Eq. 4)}$$

or, elementwise,

$$L_{ij} = \begin{cases} d_i & \text{if } i = j \\ -a_{ij} & \text{otherwise.} \end{cases}$$
Since $G$ is supposed to be undirected, its adjacency matrix is symmetric (i.e., $A = A^T$) and $L$ is also a symmetric matrix. Furthermore, $L$ has real eigenvalues in the range $[0, 2d_{max}]$, where $d_{max}$ is the maximum degree of the nodes in the graph, and the corresponding eigenvectors are real and form an orthogonal basis for $\mathbb{R}^N$ [20]. By construction, since $L$'s rows sum to zero it holds that

$$L\,\mathbb{1} = 0,$$

meaning that 0 is always an eigenvalue of $L$ and the corresponding eigenvector is the vector of all ones $\mathbb{1} = [1, \dots, 1]^T$.
$L$ is also a positive-semidefinite matrix [21] and thus $\lambda = 0$ is the smallest eigenvalue of $L$. Since $L$'s eigenvalues are real, they can be ordered in nondecreasing order:

$$0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_N.$$

The study of $L$'s spectrum has led to important considerations on the second smallest eigenvalue $\lambda_2$ and its corresponding eigenvector (respectively known as the "algebraic connectivity" of $G$ and the "Fiedler vector" [22]). Specifically, the geometric multiplicity of the $\lambda = 0$ eigenvalue (i.e., the number of linearly independent eigenvectors associated to $\lambda = 0$) equals the number of connected components of $G$, i.e., the maximal groups of mutually reachable nodes.
Claim. Let $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_N$ be the eigenvalues of $L$. Then $G$ is connected if and only if $\lambda_2 > 0$, and the multiplicity of the zero eigenvalue is equal to the number of connected components of $G$.
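The relation between the Laplacian spectrum and graph connectivity is easy to verify numerically. SPARK itself is written in MATLAB; the following Python/NumPy sketch (illustrative only, on a hypothetical toy graph) builds $L = D - A$ for two disjoint triangles and counts the zero eigenvalues:

```python
import numpy as np

# Two disjoint triangles: a toy graph with two connected components
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0        # undirected graph: symmetric adjacency

D = np.diag(A.sum(axis=1))         # degree matrix
L = D - A                          # combinatorial Laplacian (Eq. 4)

eigvals = np.linalg.eigvalsh(L)    # real eigenvalues, ascending order
n_components = int(np.sum(np.isclose(eigvals, 0.0)))

print(eigvals)         # [0, 0, 3, 3, 3, 3] up to numerical precision
print(n_components)    # 2: one zero eigenvalue per connected component
```

Since $\lambda_2 = 0$ here, the claim above correctly flags the graph as disconnected; adding any edge between the two triangles would make $\lambda_2$ strictly positive.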
As to deal with normalized quantities, it is useful to define the normalized version of the Laplacian matrix

$$\mathcal{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2} \quad \text{(Eq. 7)}$$

with entries

$$\mathcal{L}_{ij} = \begin{cases} 1 & \text{if } i = j \text{ and } d_i \neq 0 \\ -\dfrac{a_{ij}}{\sqrt{d_i d_j}} & \text{if } (v_i, v_j) \in E \\ 0 & \text{otherwise.} \end{cases}$$

Since $L$ is symmetric, its normalized version $\mathcal{L}$ is still a symmetric matrix with real eigenvalues, and its eigenvectors form an orthogonal basis for $\mathbb{R}^N$. As for the unnormalized Laplacian matrix, $\mathcal{L}$ is also positive semidefinite, and its eigenvalues lie in the $[0, 2]$ interval. Furthermore, 0 is still an eigenvalue of $\mathcal{L}$, with the scaled vector $D^{1/2}\mathbb{1}$ as the corresponding eigenvector.
2.1.2. Minimum cut partition and algebraic connectivity of a graph.
The spectrum of the Laplacian matrix associated with $G$ has a crucial role in the minimum-cut cluster problem [23,24]. Specifically, given a vertex set partition $\{V_1, V_2\}$ such that $V_1 \cup V_2 = V$ and $V_1 \cap V_2 = \emptyset$, the normalized cut [24] induced by the partition is given by

$$Ncut(V_1, V_2) = \frac{cut(V_1, V_2)}{vol(V_1)} + \frac{cut(V_1, V_2)}{vol(V_2)},$$

where $cut(V_1, V_2) = \sum_{v_i \in V_1,\, v_j \in V_2} a_{ij}$ represents the total number of crossing edges between the subsets of nodes $V_1$ and $V_2$, and $vol(V_1) = \sum_{v_i \in V_1} d_i$ is the total number of connections between the nodes in $V_1$ and the whole vertex set $V$ (analogously for $V_2$). As to point out the role of the Laplacian spectrum, consider an affiliation vector $x$ for the partition $\{V_1, V_2\}$ with entries

$$x_i = \begin{cases} \sqrt{vol(V_2)/vol(V_1)} & \text{if } v_i \in V_1 \\ -\sqrt{vol(V_1)/vol(V_2)} & \text{if } v_i \in V_2. \end{cases} \quad \text{(Eq. 10)}$$
It is possible to write the normalized cut as follows:

$$Ncut(V_1, V_2) = \frac{x^T L x}{x^T D x} = R(y), \qquad y = D^{1/2} x, \quad \text{(Eq. 11)}$$

where $R(y) = \dfrac{y^T \mathcal{L}\, y}{y^T y}$ is the Rayleigh quotient for $\mathcal{L}$ and $y$. Minimizing the cost function in Eq. 11 leads to a partition of the vertex set such that the between-cluster connections are the minimum possible. However, the problem in Eq. 11 is known to be NP-hard, since it minimizes the cost function over the set of every possible cut in $G$ [24]. As to deal with NP-hardness, we drop the restriction for $x$ to be in the form specified in Eq. 10 and let it take arbitrary real entries. The Rayleigh quotient of a symmetric matrix has the nice property of being bounded by $\lambda_{min}$ (lower bound) and $\lambda_{max}$ (upper bound): being $\mathcal{L}$ symmetric, this writes

$$\lambda_1 \le R(y) \le \lambda_N.$$

The vector such that $R(y) = \lambda_1 = 0$ is $y = D^{1/2}\mathbb{1}$, i.e., the affiliation vector $x = \mathbb{1}$. From a formal point of view it minimizes $R$, identifying a single cluster made of the whole network itself: the normalized cut is null in that sense, but it is useless in a practical way. The original problem can thus be slightly modified to neglect the trivial solution. Specifically, minimizing the Rayleigh quotient over the set of vectors orthogonal to $D^{1/2}\mathbb{1}$ [25] leads to

$$\min_{y \,\perp\, D^{1/2}\mathbb{1}} R(y). \quad \text{(Eq. 13)}$$

The solution to the minimization problem in Eq. 13 is given by the eigenvector associated to the second smallest eigenvalue of $\mathcal{L}$, being the eigenvectors of $\mathcal{L}$ orthogonal to each other. Specifically, being $v_2$ the eigenvector associated to the second smallest eigenvalue of $\mathcal{L}$, Eq. 13 leads to

$$\min_{y \,\perp\, D^{1/2}\mathbb{1}} R(y) = R(v_2) = \lambda_2,$$

where $\lambda_2$ (i.e., the second smallest eigenvalue) is the so-called "algebraic connectivity" of $G$ and its corresponding eigenvector (known as the "Fiedler vector") identifies the partition of the vertex set that minimizes the cost function in Eq. 13. The way the algebraic connectivity affects the topology of a given network is represented in Fig 1, where the topological representation of three different networks (together with their corresponding adjacency matrices) is shown to vary according to different values of $\lambda_2$.
Fig 1. a)–c) Three networks with different values of the algebraic connectivity $\lambda_2$. Each network is represented through its binary adjacency matrix and the corresponding graph form. The adjacency matrix (lower part of each panel) is represented as an $N \times N$ grid of pixels, where $N$ is the number of nodes and each link connects a node (row index) to another one (column index) in the underlying network. Pixels are colored in yellow if the connection exists and in blue otherwise. The graph representation in the upper part of each panel codes nodes as blue solid dots and the existing connections as solid lines linking two different nodes.
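As an illustration of how the sign pattern of the Fiedler vector recovers a minimum-cut bipartition, consider the following Python/NumPy sketch (not part of SPARK; the two-clique toy graph is a hypothetical example):

```python
import numpy as np

# Two 4-node cliques joined by a single bridge edge (nodes 3 and 4)
N = 8
A = np.zeros((N, N))
A[:4, :4] = 1 - np.eye(4)      # clique on nodes 0..3
A[4:, 4:] = 1 - np.eye(4)      # clique on nodes 4..7
A[3, 4] = A[4, 3] = 1          # the only between-cluster connection

D = np.diag(A.sum(axis=1))
L = D - A
eigvals, eigvecs = np.linalg.eigh(L)   # ascending eigenvalues

lambda2 = eigvals[1]                   # algebraic connectivity
fiedler = eigvecs[:, 1]                # Fiedler vector

# The sign pattern of the Fiedler vector recovers the two communities
partition = fiedler > 0
print(lambda2)     # small but positive: connected, weakly coupled clusters
print(partition)   # nodes 0..3 on one side, nodes 4..7 on the other
```

The single bridge edge keeps $\lambda_2$ close to zero, and thresholding the Fiedler vector at zero splits the vertex set exactly along that bottleneck.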
2.2. Markov chain and random walk: Background and basic facts
2.2.1. Markov chain and random walk on graphs.
Markov processes are an elementary family of stochastic models describing the temporal evolution of a sequence of random variables $\{X_t\}_{t \in T}$ on a certain state space $S$, where $T$ is a time set [26]. Markovian processes are governed by the so-called Markov property, according to which the value of the random variable $X_t$ at time $t$ only depends on its value at time $t-1$:

$$\Pr(X_t = s_j \mid X_{t-1} = s_i, \dots, X_0) = \Pr(X_t = s_j \mid X_{t-1} = s_i).$$
When $T$ is a discrete set, the sequence $\{X_t\}$ is usually referred to as a Markov chain, and the matrix $P$ collects the probabilities of moving from state $s_i$ to state $s_j$. According to this view, given a graph $G$ one can think of the vertex set $V$ as a state space having $N$ different states, while the adjacency matrix $A$ specifies how different pairs of states relate to each other. This new perspective shifts the emphasis toward a dynamic framework according to which $G$ describes the topology of a Markov chain with $V$ as its state space. Interestingly, the topological structure of $G$ has been proved to influence the evolution of dynamic phenomena running on the graph itself (e.g., diffusion, synchronization, consensus and so on) [6]. In this framework, random walk processes are of particular interest due to their intimate relationship with spectral graph theory.
As the name itself suggests, a random walk on $G$ describes the imaginary walk of an agent over the vertex set $V$. Specifically, a random walk on $G$ is fully described by a transition probability matrix $P$ governing the behavior of the walker on each node of the network [27]. The transition probability matrix $P$ provides a probabilistic characterization of $G$ which is fundamental in the description of Markovian processes. For a graph with $N$ nodes, $P$ is an $N \times N$ stochastic matrix with entries

$$p_{ij} = \Pr(X_{t+1} = v_j \mid X_t = v_i)$$

describing the probability of a random walker to jump from one node to another in the underlying graph. $P$ is intimately related to the topology of $G$, since

$$P = D^{-1} A \quad \text{(Eq. 16)}$$

or elementwise

$$p_{ij} = \frac{a_{ij}}{d_i},$$
being $D$ the diagonal matrix collecting the degree of each node. The knowledge of $P$ allows to track the evolution of the chain over time, since the Markov property guarantees that the configuration of the chain at time $t+1$ only depends on its configuration at time $t$. Specifically, let the row vector $\pi(0)$ contain the configuration of the chain (i.e., the probability for a walker to be in each state) at $t = 0$. The configuration at the next step (i.e., $\pi(1)$) depends on the probability of being in each state $v_i$ at the current time (encoded in $\pi(0)$) and the probability of making a transition from $v_i$ to $v_j$ (i.e., encoded in the $(i,j)$-th element of $P$). Therefore, $\pi(1)$ can be derived from $\pi(0)$ as

$$\pi(1) = \pi(0)\, P. \quad \text{(Eq. 18)}$$
For a generic timestep, Eq. 18 becomes

$$\pi(t+1) = \pi(t)\, P, \quad \text{(Eq. 19)}$$

where $\pi(t)$ and $\pi(t+1)$ represent the configurations of the chain at times $t$ and $t+1$ respectively, and $P$ is the transition probability matrix governing the random walk on $G$. As to know the configuration of the chain after $n$ steps from a generic time $t$, it is sufficient to iteratively apply Eq. 19 $n$ times:

$$\pi(t+n) = \pi(t)\, P^n.$$
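The update rule of Eq. 19 can be iterated directly. The short Python/NumPy sketch below (illustrative only; the 5-node ring is a hypothetical example) starts a walker on a single node and shows the configuration approaching the stationary distribution $d_i / \sum_j d_j$:

```python
import numpy as np

# Random walk on a 5-node ring (odd cycle: the chain is aperiodic)
N = 5
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1

d = A.sum(axis=1)
P = A / d[:, None]              # P = D^{-1} A  (Eq. 16)

pi = np.zeros(N)
pi[0] = 1.0                     # walker starts on node 0 with certainty
for _ in range(200):
    pi = pi @ P                 # pi(t+1) = pi(t) P  (Eq. 19)

print(pi)   # approaches the stationary distribution d / d.sum() = [0.2]*5
```

For an undirected graph the stationary distribution is proportional to the node degrees, which on the ring is simply the uniform distribution.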
2.2.2. Spectral graph theory meets random walk.
The intimate relationship between random walk and spectral graph theory relies on the fact that $P$ is strongly influenced by the structural properties of $G$. Specifically, pre- and post-multiplying each side of Eq. 4 by $D^{-1/2}$ leads to

$$D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}. \quad \text{(Eq. 21)}$$

Recalling that $D$ is a diagonal matrix (and so is $D^{-1/2}$), combining Eq. 21 with Eqs. 7 and 16 leads to

$$\mathcal{L} = I - D^{1/2} P D^{-1/2}. \quad \text{(Eq. 22)}$$

In a dynamic perspective, the configuration update for the random walk process (Eq. 19) can be expressed as:

$$\pi(t+1) = \pi(t)\, D^{-1/2} (I - \mathcal{L})\, D^{1/2}. \quad \text{(Eq. 23)}$$

The topology of the network thus strongly influences the nature of the dynamic phenomena running on the underlying graph [6]. Specifically, since the eigenvalues of $P$ can be easily derived from those of $\mathcal{L}$ (and vice versa), it can be appreciated that a random walk process converges faster in strongly organized networks (further details about the eigen-decomposition of $P$ can be found in the Supporting Information). For an ergodic chain (i.e., a chain that is guaranteed to converge to a unique stationary distribution), the speed of convergence to the stationary distribution can be estimated by means of the relaxation time [27], which is defined as

$$t_{rel} = \frac{1}{\gamma}, \quad \text{(Eq. 24)}$$

where the spectral gap $\gamma = 1 - \lambda_2(P)$ associated to a Markov chain relates to the second largest eigenvalue of $P$.
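The link between topology and convergence speed can be appreciated numerically. In the Python/NumPy sketch below (an illustration, not SPARK code; both toy graphs are hypothetical examples), the relaxation time of Eq. 24 is computed for a well-mixed complete graph and for a two-cluster graph joined by a single bridge:

```python
import numpy as np

def relaxation_time(A):
    """Relaxation time 1/(1 - lambda_2(P)) for the walk P = D^{-1} A (Eq. 24)."""
    d = A.sum(axis=1)
    # D^{-1/2} A D^{-1/2} is symmetric and similar to P, so it shares
    # P's (real) eigenvalues
    S = A / np.sqrt(np.outer(d, d))
    mu = np.sort(np.linalg.eigvalsh(S))[::-1]   # mu[0] = 1 >= mu[1] >= ...
    return 1.0 / (1.0 - mu[1])

K = 1 - np.eye(6)                 # complete graph: strongly mixed
B = np.zeros((6, 6))              # two triangles joined by a single bridge
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    B[i, j] = B[j, i] = 1

print(relaxation_time(K))   # fast convergence (small relaxation time)
print(relaxation_time(B))   # the bottleneck slows the walk down
```

The bottleneck in the clustered graph pushes $\lambda_2(P)$ toward 1, shrinking the spectral gap and inflating the relaxation time with respect to the complete graph.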
2.3. Spectral graph theory for digraphs
2.3.1. The PageRank random walk.
A directed graph (digraph) is a graph in which each edge has a specific direction (i.e., the edge $(v_i, v_j)$ spreads from node $v_i$ and sinks into $v_j$, while for $(v_j, v_i)$ sink and source are switched). The directionality of each edge allows to distinguish between inward and outward connections for each node. With the convention that $a_{ij} = 1$ denotes an edge from $v_i$ to $v_j$, the total number of outward links of a generic node $v_i$ (its out-degree) can be found by summing over the $i$-th row of the adjacency matrix:

$$d_i^{out} = \sum_{j=1}^{N} a_{ij}.$$

Similarly, the in-degree of $v_i$ represents the total number of incoming edges and can be found by summing over the $i$-th column of the adjacency matrix:

$$d_i^{in} = \sum_{j=1}^{N} a_{ji}.$$

The total degree of a given node in a digraph is simply the sum of its in- and out-degree. Starting from Eq. 16, the transition matrix describing a random walk on a digraph simply becomes

$$P = D_{out}^{+} A$$

or elementwise

$$p_{ij} = \frac{a_{ij}}{d_i^{out}} \qquad (d_i^{out} > 0),$$

where $D_{out}$ is a diagonal matrix containing the out-degree of each node on its main diagonal and $D_{out}^{+}$ denotes its Moore–Penrose pseudoinverse.
However, in the classic random walk neither uniqueness nor convergence of the process is guaranteed. As to deal with an ergodic chain, a common solution is to refer to a modified version of the classic problem known as the "PageRank random walk" (also known as the random surfer model) [28]. The transition matrix governing the behavior of a random surfer is given by

$$P_{pr} = \alpha \left( D_{out}^{+} A + \frac{1}{N}\, u\, \mathbb{1}^T \right) + \frac{1 - \alpha}{N}\, \mathbb{1}\mathbb{1}^T, \quad \text{(Eq. 31)}$$

where $D_{out}^{+}$ denotes the Moore–Penrose pseudoinverse of $D_{out}$, $u$ is a vector with all entries equal to zero but $u_i = 1$ when $d_i^{out} = 0$, and $\alpha \in (0, 1)$ is a parameter that manages the escape probability from absorbing states. The matrix $P_{pr}$ defined in Eq. 31 is stochastic and describes an ergodic chain [29]; thus its long-term behavior is guaranteed to converge to a unique stationary distribution. The teleportation terms assign a transition probability of $(1-\alpha)/N$ toward every node from those nodes having $d^{out} > 0$, while a probability of $1/N$ is uniformly assigned to the transitions from nodes without outward links. The higher the value of $\alpha$, the more accurately the topology of the original chain will be preserved.
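A minimal Python/NumPy sketch of Eq. 31 (an illustration under the notation above, not SPARK's MATLAB implementation; the toy digraph is a hypothetical example) shows how the teleportation terms restore stochasticity in the presence of a dangling node:

```python
import numpy as np

def pagerank_matrix(A, alpha=0.85):
    """Sketch of the PageRank (random surfer) transition matrix of Eq. 31."""
    N = A.shape[0]
    d_out = A.sum(axis=1)
    u = (d_out == 0).astype(float)   # indicator of dangling (absorbing) nodes
    inv_d = np.divide(1.0, d_out, out=np.zeros(N), where=d_out > 0)
    P = inv_d[:, None] * A           # D_out^+ A: dangling rows stay zero
    teleport = np.ones((N, N)) / N
    return alpha * (P + np.outer(u, np.ones(N)) / N) + (1 - alpha) * teleport

# Toy digraph in which node 2 has no outgoing edges
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [0, 0, 0]], dtype=float)

Ppr = pagerank_matrix(A)
print(Ppr.sum(axis=1))   # every row sums to 1: the chain is stochastic
print(Ppr[2])            # dangling node: uniform 1/N transition probabilities
```

Rows of nodes with outgoing edges keep most of the original topology (scaled by $\alpha$) plus the uniform $(1-\alpha)/N$ term, while the dangling node is given uniform $1/N$ transitions exactly as described above.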
2.3.2. Symmetrized Laplacian matrix for digraphs.
The information about the direction of each edge reflects in a non-symmetric adjacency matrix and hence in a non-symmetric Laplacian matrix. Thus, the eigenvalues of $L$ are not guaranteed to be real, and the results obtained for the undirected case cannot be directly applied to digraphs. As to overcome such a limitation, Chung proposed a symmetrized version of the combinatorial Laplacian [30]

$$\tilde{L} = \Phi - \frac{\Phi P + P^{*} \Phi}{2},$$

together with its normalized version

$$\tilde{\mathcal{L}} = I - \frac{\Phi^{1/2} P\, \Phi^{-1/2} + \Phi^{-1/2} P^{*}\, \Phi^{1/2}}{2}, \quad \text{(Eq. 33)}$$

where $P$ is the probability transition matrix of the Markov chain governing the random walk on $G$, $P^{*}$ denotes its conjugate transpose and $\Phi = \mathrm{diag}(\pi)$ is a diagonal matrix with entries equal to the stationary distribution of the chain.
Clearly $\tilde{L}$ (and its normalized version $\tilde{\mathcal{L}}$) is a symmetric matrix, thus its eigenvectors form an orthogonal basis for $\mathbb{R}^N$. Furthermore, 0 is always an eigenvalue and the corresponding eigenvector is the vector of all ones $\mathbb{1}$ (for $\tilde{\mathcal{L}}$, $\mathbb{1}$ should be substituted with its scaled version $\Phi^{1/2}\mathbb{1}$). Although Chung's symmetrization allows to extend the above considerations to digraphs, it should be pointed out that $\tilde{\mathcal{L}}$ only provides a partial description of the original digraph, since different digraphs can have the same $\tilde{\mathcal{L}}$. To overcome such a limitation, Li and Zhang defined the normalized Laplacian matrix for digraphs (i.e., the Diplacian) $\Gamma$ [31] as

$$\Gamma = \Phi^{1/2} (I - P)\, \Phi^{-1/2},$$

or elementwise

$$\Gamma_{ij} = \pi_i^{1/2} \left( \delta_{ij} - p_{ij} \right) \pi_j^{-1/2},$$

where $\delta_{ij}$ is the Kronecker delta.
The Diplacian matrix can be decomposed as the sum of a symmetric and a skew-symmetric part, respectively indicated as $\tilde{\mathcal{L}}$ and $\Delta$:

$$\Gamma = \tilde{\mathcal{L}} + \Delta,$$

where $\tilde{\mathcal{L}}$ is the symmetrized Laplacian for directed graphs defined by Chung [30] (already introduced in Eq. 33) and

$$\Delta = \frac{\Phi^{-1/2} P^{*}\, \Phi^{1/2} - \Phi^{1/2} P\, \Phi^{-1/2}}{2}$$

captures the differences between $P$ and its transpose. Clearly, when $G$ is undirected the chain is reversible and $\Phi^{1/2} P\, \Phi^{-1/2}$ is symmetric, hence $\Delta = 0$ and thus $\Gamma = \tilde{\mathcal{L}}$.
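The decomposition above can be verified numerically. The following Python/NumPy sketch (illustrative only; the random ergodic chain is a hypothetical example) builds $\Gamma$, Chung's $\tilde{\mathcal{L}}$ and $\Delta$, and checks their symmetry properties together with the zero eigenvalue of $\tilde{\mathcal{L}}$ at $\Phi^{1/2}\mathbb{1}$:

```python
import numpy as np

# Random ergodic chain on 4 states (strictly positive transition matrix)
N = 4
rng = np.random.default_rng(0)
P = rng.random((N, N)) + 0.1
P /= P.sum(axis=1, keepdims=True)

# Stationary distribution: left eigenvector of P for eigenvalue 1
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

Phi_sqrt = np.diag(np.sqrt(pi))
Phi_isqrt = np.diag(1.0 / np.sqrt(pi))

Gamma = Phi_sqrt @ (np.eye(N) - P) @ Phi_isqrt                 # Diplacian
Lsym = np.eye(N) - (Phi_sqrt @ P @ Phi_isqrt
                    + Phi_isqrt @ P.T @ Phi_sqrt) / 2          # Eq. 33
Delta = Gamma - Lsym                                           # skew part

print(np.allclose(Lsym, Lsym.T))            # True: symmetric part
print(np.allclose(Delta, -Delta.T))         # True: skew-symmetric part
print(np.allclose(Lsym @ np.sqrt(pi), 0))   # True: Phi^{1/2} 1 in the kernel
```

For this strictly positive (hence ergodic and, in general, non-reversible) chain, $\Delta \neq 0$: the skew-symmetric part is exactly the directional information that Chung's symmetrization alone discards.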
2.4. The SPARK toolbox
SPARK is a general-purpose framework that can fit a wide range of scenarios. The toolbox is open source and can be easily downloaded from the following link: https://github.com/AndreaRani/SPARK. While existing toolboxes provide a low-level characterization of the underlying system, the SPARK toolbox is able to answer questions of increased complexity (concerning, for example, the dynamical behavior of the underlying network or its propensity to organize into interacting communities). More in detail, SPARK combines spectral graph theory and random walk concepts to provide both a static and a dynamic characterization of digraphs. As illustrated in the following paragraph, SPARK can fit different scenarios to answer a variety of questions. For example, let $G$ be a network topologically described by a binary adjacency matrix $A$. When an a priori vertex set partition is given, SPARK can be used to characterize the partition itself relying on the measures in Table 1, as well as to investigate to which extent the given partition overlaps the minimum-cut one. On the other hand, when no partition is given, SPARK can be used to characterize the propensity of a given network to organize into interacting communities. If the underlying graph is described by a weighted adjacency matrix $W$, the expressions in Table 1 can be easily turned into their weighted versions by replacing $A$ with $W$.
From a practical point of view, the structure of Table 1 naturally reflects in a twofold organization according to which the main folder SPARKtoolbox is split into two subfolders, as shown in Fig 2: SPARKtoolbox/SpectralGT and SPARKtoolbox/RandomWalk. Both SPARKtoolbox/SpectralGT and SPARKtoolbox/RandomWalk are further organized into four subfolders containing the MATLAB scripts for weighted directed (/weighted_directed), weighted undirected (/weighted_undirected), unweighted directed (/unweighted_directed) and unweighted undirected (/unweighted_undirected) graphs.
The root folder SPARKtoolbox contains five main subfolders: SPARKtoolbox/RandomWalk, SPARKtoolbox/SpectralGT, SPARKtoolbox/Results, SPARKtoolbox/Examples and SPARKtoolbox/Dependencies. The SPARKtoolbox/RandomWalk and the SPARKtoolbox/SpectralGT subfolders are further subdivided into four leaf-folders with the MATLAB functions for weighted directed, weighted undirected, unweighted directed and unweighted undirected graphs. The SPARKtoolbox/Examples and SPARKtoolbox/Results subfolders contain the MATLAB code and the results for the examples on synthetic data. The MATLAB scripts to import and analyze real data from the UCI repository have also been provided in SPARKtoolbox/Examples. Finally, SPARKtoolbox/Dependencies contains a subset of auxiliary functions used for the analysis of synthetic data.
SPARK functions for the computation of spectral graph theory indices simply require as input the adjacency matrix of the underlying graph and the affiliation vectors for the two clusters. On the other hand, functions computing random walk indices also require a value for $\alpha$ (Eq. 31), necessary for the PageRank random walk. As already introduced in Section 2.3.1, the parameter $\alpha$ manages the teleporting probability of the walker by uniformly assigning an escape probability to absorbing states. Since the teleportation term modifies the topology of the underlying network, $\alpha$ should be tuned carefully, preferring small escape probabilities (i.e., values of $\alpha$ close to 1) in order to preserve the topology of the underlying network as much as possible.
2.5. Testing SPARK in different scenarios
The second part of this paper illustrates a possible set of applications for the SPARK toolbox through two toy examples on synthetic data and two applications to real data. More in detail, in the first toy example SPARK is used to compare the features of a set of random networks against a population of networks with a clear clustered topology. The focus of this first example is to use SPARK to extract some descriptive features, both at the static and at the dynamic level, relying on the minimum-cut partition of a given network. Differently from the previous one, the second toy example investigates the effects that different partitions produce on the same network. Specifically, given a synthetic network with a predefined set of topological features, SPARK is used to investigate how the choice of the vertex set partition affects the cluster-to-cluster characterization of the underlying network. In the last two examples SPARK was tested on real-data scenarios, respectively dealing with a binary classification task on two public datasets and with the analysis of functional brain networks extracted from the EEG signals of two post-stroke patients during an eyes-open resting state condition. In the former example, SPARK was tested on two publicly available datasets as an unsupervised binary classifier exploiting the properties of the Fiedler vector. The obtained results included both classification performances and computational times, which were then used to assess SPARK's performance on medium-large datasets with different numbers of instances. In the latter example, SPARK was used to extract graph spectral indices from real data while assessing whether they could help in the analysis of stroke-induced functional alterations and their link with the residual motor ability of the subject.
2.5.1. Surrogate ground-truth generation.
The SPARK toolbox was tested on different scenarios simulating the interaction between two clusters within the same network. Synthetic data have been generated considering that, given a suitable permutation matrix $\Pi$, the adjacency matrix of a graph with $k$ interacting communities has a typical block structure. Specifically, for $k = 2$ the permuted adjacency matrix $\Pi A \Pi^T$ has the structure depicted in Fig 3, where the blocks on the main diagonal relate to within-cluster connections, while the off-diagonal blocks refer to between-cluster connections.
Any adjacency matrix can be represented as a block matrix through a vertex permutation that groups nodes depending on the cluster to which they belong (i.e., first those belonging to cluster 1, $C_1$, and then those related to cluster 2, $C_2$, or vice versa). Specifically, blocks on the main diagonal refer to within-cluster connections, while off-diagonal blocks contain between-cluster connections. In particular, the $(i, j)$ block refers to between-cluster connections from cluster $C_i$ to cluster $C_j$. The legend of colors links the number of nonzero elements (i.e., the number of exiting links) in each block to the corresponding expression according to Eqs. 38–44.
Data generation is demanded to the script Generate_simdata.m. Each surrogate network has been modelled as a bi-clustered system where the behavior of the two communities, namely $C_1$ and $C_2$, is governed by the following set of equations:

$$L_{tot} = L_w + L_b \quad \text{(Eq. 38)}$$
$$L_w = \rho\, L_{tot} \quad \text{(Eq. 39)}$$
$$L_b = (1 - \rho)\, L_{tot} \quad \text{(Eq. 40)}$$
$$L_b^{12} = \beta\, L_b \quad \text{(Eq. 41)}$$
$$L_b^{21} = (1 - \beta)\, L_b \quad \text{(Eq. 42)}$$
$$L_w^{1} = \gamma\, L_w \quad \text{(Eq. 43)}$$
$$L_w^{2} = (1 - \gamma)\, L_w \quad \text{(Eq. 44)}$$

The set of Eqs. 38–44 makes the underlying structure of a given network dependent on the set of generating parameters $(\delta, \rho, \beta, \gamma)$. Specifically, Eq. 38 simply equates the total number of existing connections in each network, $L_{tot} = \delta N(N-1)$, to the sum of within- and between-cluster connections (respectively given by $L_w$ and $L_b$). The parameter $\delta$ represents the network's density and modulates the topology of the network regardless of its modular structure. Equations 39 and 40 describe how the parameter $\rho$ manages the proportion of within- and between-cluster edges for a given network. Specifically, as $\rho$ increases the modular structure of the network becomes more pronounced, with a few edges connecting two sets of densely connected nodes. The distribution of within- and between-cluster connections over the network is tuned by $\beta$ and $\gamma$, as described in Eqs. 41–42 and Eqs. 43–44. The term-by-term summation of Eqs. 41 and 42 (respectively, of Eqs. 43 and 44) gives the total number of between-cluster (respectively, within-cluster) links. According to Eqs. 41–42, $\beta$ manages the flow imbalance in between-cluster connections. More in detail, $\beta = 0.5$ corresponds to a balanced scenario, where the number of existing links from $C_1$ to $C_2$ equals the number of connections from $C_2$ to $C_1$. Similarly, $\gamma$ tunes the imbalance in within-cluster connections according to Eqs. 43 and 44, being $\gamma = 0.5$ related to a balanced scenario in which the two clusters are equally densely populated. Any variation from $\gamma = 0.5$ (respectively, $\beta = 0.5$) reflects into an imbalance in within-cluster (respectively, between-cluster) connections.
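SPARK's actual generator is implemented in Generate_simdata.m; the following Python sketch (a simplified re-implementation under the notation of Eqs. 38–44, with all function and variable names hypothetical) illustrates how the four block budgets can be sampled:

```python
import numpy as np

def generate_biclustered(N=20, delta=0.1, rho=0.8, beta=0.5, gamma=0.5, seed=0):
    """Directed bi-clustered network following the budgets of Eqs. 38-44."""
    rng = np.random.default_rng(seed)
    L_tot = int(round(delta * N * (N - 1)))    # density delta (Eq. 38)
    L_w = int(round(rho * L_tot))              # within-cluster budget (Eq. 39)
    L_b = L_tot - L_w                          # between-cluster budget (Eq. 40)
    budgets = {
        (0, 1): int(round(beta * L_b)),        # C1 -> C2 (Eq. 41)
        (1, 0): L_b - int(round(beta * L_b)),  # C2 -> C1 (Eq. 42)
        (0, 0): int(round(gamma * L_w)),       # within C1 (Eq. 43)
        (1, 1): L_w - int(round(gamma * L_w)), # within C2 (Eq. 44)
    }
    half = N // 2
    nodes = [np.arange(half), np.arange(half, N)]
    A = np.zeros((N, N))
    for (ci, cj), n_links in budgets.items():
        # all admissible ordered pairs in this block, excluding self-loops
        pairs = [(i, j) for i in nodes[ci] for j in nodes[cj] if i != j]
        for k in rng.choice(len(pairs), size=n_links, replace=False):
            A[pairs[k]] = 1
    return A

A = generate_biclustered()
half = A.shape[0] // 2
within = A[:half, :half].sum() + A[half:, half:].sum()
print(A.sum(), within, A.sum() - within)   # total, within- and between-cluster links
```

Each block of the permuted adjacency matrix of Fig 3 is filled independently by sampling, without replacement, the number of links prescribed by its budget.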
2.5.2. Constraints on generating parameters.

The set of Eqs. 38–44 points out that, due to the intimate relationship among $L_{tot}$, $L_w$ and $L_b$, the parameters $\delta$, $\rho$, $\beta$ and $\gamma$ are not free to vary arbitrarily. It is thus necessary to put some constraints on the generating parameters as to make the surrogate networks compatible with real-world scenarios.
Since many biological networks are known to be sparse [32,33], $\delta$ should vary between 0 and 0.5, with those values corresponding to an empty and a half-full network respectively. As to simulate networks with different sparsity levels, a small set of suitable values within this range may be chosen. Further issues concern the definition of a cluster of nodes. Classic approaches define a cluster as a set of tightly connected nodes, with a few connections existing between nodes of different clusters. According to this view, $\rho$ should reasonably vary between 0.5 and 1, being $\rho = 0.5$ associated to poorly pronounced clusters (i.e., the number of between-cluster connections equals the number of within-cluster ones), while the opposite extreme ($\rho = 0$, all the edges crossing the two clusters) would represent a bipartite network. According to Eqs. 43 and 44, within-cluster connections are split as a convex combination with respect to $\gamma$: for any $L_w^{1}$ the corresponding $L_w^{2}$ is automatically assigned, thus naturally limiting $\gamma$ within $[0, 1]$. Similarly, according to Eqs. 41 and 42, $\beta$ is constrained within the same range as $\gamma$.
. Further topological constraints come from practical considerations related to network’s topology. For equally sized clusters without self-loops, the maximum number allowed for within-cluster connections is
being the number of nodes in the underlying network. Given that
is the number of existing connections in a
-density network, for the most populated cluster the following should hold:
thus leading to
For real-world networks it is not hard to meet : the constraint in Eq. 47 thus becomes
Setting the analysis of the worst-case scenario (i.e.,
) leads to
Similarly, an analogous condition should hold for between-cluster connections, which in the worst case restricts to Eq. 51. The conditions in Eqs. 49 and 51 should be simultaneously verified, thus constituting a system of two inequalities with three unknowns, which is known to have a parametric solution with respect to one of them. To deal with this issue, we empirically derived an upper boundary for the clustering parameter and then set the remaining parameters accordingly.
accordingly. More in detail, given an a priori partition
that splits the vertex set into equally sized clusters, we fixed
and run a simulation in which
synthetic networks have been generated for each value of
between
and
with a fixed step of
. For each network,
was compared with the partition provided by the Fiedler vector by means of cosine similarity measure.
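The comparison step can be sketched as follows. This is an illustrative Python sketch (the toolbox itself is MATLAB, and the helper names are ours): it extracts the Fiedler vector of a small two-clique graph, converts it to a sign partition, and compares it to an a priori partition via cosine similarity.

```python
import numpy as np

def fiedler_signs(A):
    """Sign pattern of the Fiedler vector (eigenvector of the second
    smallest Laplacian eigenvalue) of a symmetric adjacency matrix."""
    L = np.diag(A.sum(axis=1)) - A
    _, V = np.linalg.eigh(L)        # eigh returns eigenvalues in ascending order
    return np.where(V[:, 1] >= 0, 1, -1)

def cosine_similarity(u, v):
    # absolute value makes the measure invariant to a global sign flip
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# two 5-node cliques joined by a single bridge edge
n = 10
A = np.zeros((n, n))
A[:5, :5] = 1
A[5:, 5:] = 1
np.fill_diagonal(A, 0)
A[4, 5] = A[5, 4] = 1
a_priori = np.array([1] * 5 + [-1] * 5)
print(cosine_similarity(fiedler_signs(A), a_priori))  # close to 1: partitions agree
```

For a network with a pronounced modular structure, as here, the two partitions coincide and the similarity approaches 1; for a random network the measure drops.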
As appreciable from Fig 4, up to a critical value of the clustering parameter the two partitions are almost identical for each density level, suggesting that this critical value represents a suitable candidate as an upper boundary for the parameter. On the other hand, the opposite extreme identifies a scenario in which the a priori partition and the minimum-cut one are (almost) orthogonal. An intermediate situation can be found between these extremes, where the cosine similarity takes moderate values and the two partitions partially overlap. While the lower bound imposes no further constraints on the imbalance parameters, plugging the upper boundary into Eq. 49 leads to an upper boundary condition for them.
The plot presents the values of the cosine similarity (mean ± standard deviation) obtained by comparing the minimum-cut and an a priori vertex set partition over a set of synthetic networks. Surrogate networks were generated for each value of the clustering parameter, swept up to 1 with a fixed step. Different colors correspond to different density values, as indicated in the legend.
Concerning the between-cluster links, plugging the lower bound into Eq. 51 imposes no limits on the corresponding imbalance parameter. On the other hand, substituting the critical value into Eq. 51 leads to the same upper boundary already obtained in Eq. 52. A suitable choice for the two imbalance parameters should allow the simulation of both a balanced and an unbalanced scenario. As discussed in Section 2.5.1, their balanced values reflect a scenario in which the two clusters are equally densely populated and the between-cluster flow is balanced. As for the imbalance in within-cluster links, a reasonable choice is a value that mimics an imbalance in within-cluster connections while respecting the constraint in Eq. 52; the same holds for the between-cluster parameter, whose chosen value allows the simulation of a strong imbalance in between-cluster flow. Generating parameters, together with their definitions, ranges and the values used in the following examples, are summarized in Table 2.
2.5.3. Toy example #1.
In the first toy example (implemented in the SPARK_ex1.m script in the main folder) the SPARK toolbox is used to compare different kinds of networks. Specifically, a set of random networks was compared with three clustered populations, each characterized by a different value of the clustering parameter and no imbalance in within- or between-cluster connections. More in detail, the clustered populations consist of binary matrices generated using the Generate_simdata.m script introduced in Section 2.5.1. Each clustered population comprises networks with a fixed density and a different value of the clustering parameter, so that the topology of the underlying network exhibits a modular structure that depends solely on that value. Random networks, on the other hand, were generated maintaining the same density as the clustered networks, but without any superimposed modular structure. For clustered networks, as the clustering parameter increases, the a priori vertex set partition is expected to overlap the minimum-cut one, thus properly describing the behavior of the underlying group of networks. In contrast, the same partition is not expected to properly fit the random population, since those networks are not guaranteed to have a modular structure. To compare the topological properties of the two populations, the SPARK toolbox was used to extract a subset of relevant indices (from those in Table 1) from each network (regardless of its nature) and for each density level. Specifically, both global (algebraic connectivity and relaxation time) and partition-dependent indices (normalized cut, normalized association and edge measure) were calculated so as to compare the two populations from different perspectives.
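The partition-dependent indices above can be illustrated with a minimal Python sketch. It follows the standard Shi–Malik definitions of normalized cut and normalized association; SPARK's own conventions (and its handling of directed and weighted networks) may differ in normalization details.

```python
import numpy as np

def partition_measures(A, labels):
    """Normalized cut and normalized association of a two-set vertex
    partition: Ncut = sum_S cut(S, S_c)/vol(S), Nassoc = sum_S assoc(S)/vol(S)."""
    ncut, nassoc = 0.0, 0.0
    for s in np.unique(labels):
        inside = labels == s
        vol = A[inside, :].sum()               # total degree of the set
        ncut += A[np.ix_(inside, ~inside)].sum() / vol
        nassoc += A[np.ix_(inside, inside)].sum() / vol
    return ncut, nassoc

# clustered network: two 4-node complete blocks joined by one symmetric link
A = np.ones((8, 8))
np.fill_diagonal(A, 0)
A[:4, 4:] = 0
A[4:, :4] = 0
A[0, 4] = A[4, 0] = 1
labels = np.array([0] * 4 + [1] * 4)
ncut, nassoc = partition_measures(A, labels)
print(ncut < nassoc, np.isclose(ncut + nassoc, 2.0))  # clustered topology; Ncut + Nassoc = 2
```

For a clustered network fitted with the matching partition, the normalized cut is small and the normalized association is large, which is exactly the behaviour expected for the clustered populations in this example.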
2.5.4. Toy example #2.
In the second example (implemented in the SPARK_ex2.m script in the main folder) an a priori vertex set partition was compared with the minimum-cut one for different network configurations. As reported in Section 2.5.1, the combination of the generating parameters determines the topological structure of the underlying network. More in detail, one parameter modulates the emergence of the two clusters, while the other two regulate the imbalance in between- and within-cluster connections, respectively. For the whole set of possible parameter combinations, the minimum-cut partition is compared with an a priori partition describing the structure of the underlying network (representing the ground truth). As already described in Section 2.5.1, in the balanced scenario the topology of the network depends only on the clustering parameter: the greater it is, the more pronounced the presence of the clusters will be, and the more likely the Fiedler vector is to overlap the a priori partition. When the underlying network does not have a pronounced modular structure (i.e., for low values of the clustering parameter), any shift in the imbalance parameters is expected to disrupt the equilibrium in between- and/or within-cluster connections, and the Fiedler vector is not guaranteed to correctly identify the two clusters. This happens because the a priori partition does not account for the imbalance introduced by the generating parameters which, instead, influences the minimum-cut partition. To compare the effects induced by different partitions on the same index distribution, consider a generic partition-dependent index computed under both partitions. The ratio between the value obtained with the ground-truth partition and that obtained with the minimum-cut one represents a suitable way to emphasize the effects of different partitions on the same graph. When the two values coincide, the a priori vertex set partition and the minimum-cut one agree; otherwise, the two partitions behave differently. More in detail, a ratio above unity means the current index assumes larger values for the ground truth than for the minimum-cut partition, the reverse being true for a ratio below unity. For each parameter combination, a set of synthetic networks was generated using the Generate_simdata.m script. Partition-dependent indices were then calculated on each network using both the ground-truth and the minimum-cut partition. For each density value, a one-way ANOVA was used to assess the differences in the ratio distributions introduced by a modulation of the generating parameters.
2.5.5. Real data scenario: publicly available datasets.
In the third example SPARK was tested on two public datasets from the UCI machine learning repository (https://archive.ics.uci.edu/), respectively implemented in the SPARK_ex3a.m and SPARK_ex3b.m scripts in the main folder. More in detail, the toolbox was tested on two popular datasets: the Breast Cancer Wisconsin dataset (https://archive.ics.uci.edu/dataset/15/breast+cancer+wisconsin+original) and the rice dataset (https://archive.ics.uci.edu/dataset/545/rice+cammeo+and+osmancik). The Breast Cancer Wisconsin dataset has been extensively used in machine learning and biomedical research. It consists of breast cancer samples described by 9 features extracted from images of fine needle aspirate breast mass biopsies. Each instance is labelled as either malignant or benign, making it a suitable resource for developing classification algorithms. On the other hand, the rice dataset consists of two rice varieties commonly used in agricultural and genetic research [34]. In this dataset, instances of the Cammeo and Osmancik rice varieties are described through 7 different features, including various phenotypic traits, yield-related measurements and morphological characteristics that help to distinguish the two varieties. These datasets were chosen to test SPARK's adaptability to different scenarios, given that the number of instances varies significantly between the two.
To deal with normalized quantities, data were first z-scored, as is usual in machine learning preprocessing. Each dataset was then embedded into a network-wise representation using a diffusion kernel with the inverse of the Euclidean distance between each pair of points at the exponent, as described in Fig 5. The thresholded version of the weighted adjacency matrix was obtained by setting to zero those edges with weights smaller than the median of the strengths' distribution in the full version of the network. Finally, the minimum-cut partition was extracted from the topology of the unweighted adjacency matrix using SPARK. Labels extracted from the Fiedler vector were then compared with the ground truth of each dataset and classification performances were computed using the corresponding confusion matrices.
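The steps above can be reproduced end-to-end on toy data. The following Python sketch (the toolbox itself is MATLAB; the synthetic two-class data and all names are ours, not SPARK's or the UCI datasets') builds the diffusion-kernel network, thresholds it at the median strength, and labels instances by the sign of the Fiedler vector.

```python
import numpy as np

rng = np.random.default_rng(1)
# two synthetic classes in a 2-D feature space (stand-ins for z-scored instances)
X = np.vstack([rng.normal(-1.5, 1.0, (30, 2)), rng.normal(1.5, 1.0, (30, 2))])
y = np.array([0] * 30 + [1] * 30)

# network embedding: diffusion kernel on pairwise Euclidean distances
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
W = np.exp(-D)
np.fill_diagonal(W, 0)

# binarize, keeping edges at or above the median of the nonzero weights
A = (W >= np.median(W[W > 0])).astype(float)

# minimum-cut labels from the sign of the Fiedler vector of the Laplacian
L = np.diag(A.sum(axis=1)) - A
_, V = np.linalg.eigh(L)
labels = (V[:, 1] >= 0).astype(int)

# accuracy up to an arbitrary label swap
acc = max(np.mean(labels == y), np.mean(labels != y))
print(f"accuracy: {acc:.2f}")
```

The label swap in the last step mirrors the confusion-matrix comparison in the text: the Fiedler partition carries no intrinsic class names, so either sign may correspond to either class.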
Steps are sequential from left to right: data representation in the feature space, network embedding through a diffusion kernel based on the Euclidean distance, extraction of the Fiedler vector with the corresponding minimum-cut partition and classification performances comparing the ground truth with the label of the Fiedler vector.
Since the datasets are usually unbalanced, the largest class in each dataset was split into equally sized parts, each of which was tested against the smaller class by running a k-fold validation. Classification performances and the required computational times were then extracted by averaging the performances over 100 iterations, each comprising a k-fold validation on the whole dataset.
2.5.6. Real data scenario: a case study.
In the last example SPARK was tested on real data to compare functional connectivity matrices (FCMs) extracted from the EEG signals of two post-stroke patients. The two patients belong to a population of post-stroke subjects enrolled in a longitudinal study within the inpatient service of Fondazione Santa Lucia IRCCS in Rome for purposes other than those of this work. The study was approved by the local ethics board at Fondazione Santa Lucia IRCCS (CE PROG.752/2019) and the participants signed an informed consent. The two patients were chosen to be matched in aetiology (both experienced a haemorrhagic stroke), while differing in their residual motor ability as assessed by the Upper Extremity Fugl-Meyer Assessment (UEFMA) score [35]. Ranging from 0 to 66 points, the UEFMA clinical scale can be used to assess different levels of motor impairment for the upper limb (0–22 severe, 23–44 moderate, 45–66 mild motor impairment) [36]. According to this view, one subject suffers from a severe upper limb impairment, while the other has a moderate impairment, as indicated by their respective UEFMA scores. According to numerous reports in the literature about differences in FCMs related to upper limb motor impairment [37,38], we expect the difference in the residual motor ability of the two patients to be reflected in a different topological organization of the corresponding FCMs [15,17,39,40]. Since spectral graph theory and random walks proved to be useful tools for the analysis of topological and dynamic properties of FCMs [41], in this hands-on example SPARK was used to investigate the topological alterations characterizing FCMs in patients with different levels of motor impairment. The EEG signals were recorded for 2 minutes using a 64-electrode cap (reference on digitally linked earlobes, ground on left mastoid) with a sampling frequency of 256 Hz during a resting state condition with eyes open (OE) using a commercial EEG system (g.HIAMP; g.tec medical engineering GmbH, Austria). Raw signals were band-pass filtered in [1,45] Hz and ocular artifacts were removed by means of Independent Component Analysis (ICA) (Vision Analyzer 1.05 software, Brain Products GmbH, Germany). Power-line interference was removed using a 50 Hz notch filter and the EEG time series were then chunked into 1-s epochs. A semiautomatic procedure was then applied to reject trials exceeding a voltage threshold of ±100 μV. To reduce crosstalk phenomena between adjacent electrodes and avoid the identification of spurious connectivity flows, brain connectivity was extracted from a subset of 24 electrodes equally distributed over the scalp (AF7, AF8, F5, F1, F2, F6, FT7, FC3, FC4, FT8, C5, C1, C2, C6, TP7, CP3, CP4, TP8, P5, P1, P2, P6, PO7, PO8). Functional connectivity was then estimated using Partial Directed Coherence (PDC), a spectral estimator derived from the multivariate autoregressive (MVAR) model of the EEG time series [42]. PDC values were then averaged within four frequency bands (theta, alpha, beta and gamma). To discard spurious connections, PDC values were statistically assessed against chance level by applying the asymptotic method in [43]. From an operative point of view, for each connectivity matrix a set of random networks was generated to test FCMs against the corresponding null-case scenario. Specifically, random networks were generated with the only constraint of having the same density as their real counterpart, without any superimposed topological structure. An a priori vertex set partition was then superimposed on each network, dividing the whole vertex set into affected and unaffected hemispheres according to the stroke side. Finally, for each network both global and partition-dependent indices were calculated using the SPARK toolbox.
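The null-case construction (random networks matched only in density) can be sketched as follows. This is an illustrative Python sketch with hypothetical names, not the toolbox's own MATLAB routine.

```python
import numpy as np

def density_matched_random(A, rng=None):
    """Directed random network with the same size and density as A,
    no self-loops, and no superimposed topological structure."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    off = ~np.eye(n, dtype=bool)               # admissible (off-diagonal) positions
    m = int(A[off].sum())                      # number of edges to replicate
    R = np.zeros(n * n, dtype=int)
    R[rng.choice(np.flatnonzero(off.ravel()), size=m, replace=False)] = 1
    return R.reshape(n, n)

# a small clustered, FCM-like binary network
A = np.zeros((6, 6), dtype=int)
A[:3, :3] = 1
A[3:, 3:] = 1
np.fill_diagonal(A, 0)
R = density_matched_random(A, rng=0)
print(R.sum() == A.sum(), np.trace(R) == 0)  # same density, no self-loops
```

Comparing an index distribution on the real FCM against the same index on such surrogates isolates effects of topology from effects of density alone.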
3. Results
3.1. Toy example #1
This section shows the results of the toy example described in Section 2.5.3. More in detail, both global (algebraic connectivity and relaxation time) and partition-dependent indices (normalized cut, normalized association and edge measure) were calculated to compare clustered and random populations of networks. As appreciable from Fig 6, higher values of both the algebraic connectivity and the relaxation time characterize random networks, reflecting a less organized structure when compared to their clustered counterparts. Partition-dependent indices confirm this characterization, since both the normalized cut and the edge measure are larger in random than in clustered networks, while the normalized association follows the opposite trend. From Fig 6 it is also possible to appreciate that, as the clustering parameter increases, partition-dependent indices efficiently describe the topology of clustered networks. Although the results in Fig 6 refer to a single density scenario, consistent results were obtained for each density value (see the Supporting Information).
Radar chart representing the within-population average value of five different parameters (normalized association, edge measure, relaxation time, normalized cut and algebraic connectivity) extracted using the SPARK toolbox on clustered networks of fixed density and their random counterparts. The red line identifies the random population, while the remaining lines refer to the three clustered populations with increasing values of the clustering parameter (orange, cyan and blue, respectively).
3.2. Toy example #2
This paragraph shows the results of the toy example described in Section 2.5.4: the results in Fig 7 refer to a single density scenario, but similar results for the remaining density values can be found in the Supporting Information.
Index distribution for the four combinations of the imbalance parameters at a fixed density. Each panel shows the boxplots describing the distribution as a function of the within/between cluster ratio. The symbol * indicates a statistically significant result for the post hoc Tukey HSD test.
The boxplot representation in Fig 7 allows one to appreciate that the distributions for poorly clustered networks clearly differ from their counterparts calculated on networks with an underlying modular structure, for all the combinations of the imbalance parameters. Comparing the boxplots in Fig 7 with the ANOVA results in Table 3, it can also be appreciated that different values of the clustering parameter reflect into different distributions for each combination of the imbalance parameters.
Fig 8 shows the corresponding distribution for each combination of the imbalance parameters. An opposite trend characterizes this index when compared to the distributions in Fig 7, since it approaches zero from below. As expected, the normalized cut is smaller for the Fiedler partition (i.e., the index assumes negative values), since the Fiedler partition is the one that minimizes the cut between the two clusters.
Index distribution for the four combinations of the imbalance parameters at a fixed density. Each panel shows the boxplots describing the distribution as a function of the within/between cluster ratio. The symbol * indicates a statistically significant result for the post hoc Tukey HSD test.
Comparing the ANOVA results in Table 4 with the boxplot representation in Fig 8, it can also be appreciated how different values of the clustering parameter reflect into different distributions for each combination of the imbalance parameters. As in Fig 7, the distributions for poorly clustered networks are significantly different from their counterparts computed on networks with a clear modular structure.
3.3. Real data scenario: publicly available datasets
This paragraph shows the results from Section 2.5.5, where SPARK was tested on two publicly available datasets as an unsupervised binary classifier exploiting the properties of the Fiedler vector. Classification performances for the Wisconsin Breast Cancer dataset are reported in Table 5.
Once a graph representation for the data in the feature space is achieved, the computational time required to extract the Laplacian matrix and calculate the Fiedler vector for the underlying graph is limited. Despite the simplicity of the classification criterion, the performance in Table 5 reports accuracy, precision and F1 score above 90%, together with the corresponding recall. The same indices were then used to evaluate the performances on the medium-large graph extracted from the rice dataset, as reported in Table 6.
The computational time required for the extraction of the Laplacian matrix and the corresponding Fiedler vector is, as expected, larger than the previous one, given the presence of more instances and, thus, of a larger matrix to solve for in the eigendecomposition. However, classification performances still show accuracy, precision and F1 score above 90%, together with the corresponding recall.
3.4. Real data scenario: a case study
This paragraph shows the results from Section 2.5.6, providing a graphical representation of the comparison between the FCMs of the two patients and their random counterparts. The measures in Fig 9 relate to the alpha band, but similar results have been found for the remaining frequency bands: interested readers will find them in the Supporting Information.
The radar chart represents five features (normalized association, edge measure, normalized cut, directed cut from AH to UH and directed cut from UH to AH) extracted using the SPARK toolbox. The orange line refers to the FCM extracted from one patient in the alpha band, while the red one refers to its random counterpart. Similarly, the cyan line refers to the FCM extracted from the other patient in the alpha band and the blue one to its random counterpart.
As evident from Fig 9, the FCMs of the two patients have different topological features. Firstly, it should be noted that in both patients the normalized association is higher when compared to their random counterparts, while an opposite trend exists for the remaining indices. When comparing the features of the two patients, it can be noted that the normalized association is higher for the less impaired subject, who is also characterized by lower values for those measures relying on between-cluster connections, both at the static and the dynamic level. Taken together, these two facts can be summarized as a tendency for the FCMs to assume a more organized structure in the patient who better preserved residual motor ability.
4. Discussion
In this work we presented SPARK, an open-source MATLAB toolbox for the analysis of digraphs that combines spectral graph theory and random walks. With respect to other existing MATLAB frameworks for network analysis, SPARK deliberately focuses on spectral graph theory and random walk concepts, thus finding its own identity in the landscape of toolboxes for network analysis. In this context, the Brain Connectivity Toolbox [3] was pioneering in making the basics of graph theory accessible to a large audience, especially in the neuroscientific field. Its ease of use and simplicity made it extensively used, contributing to its large diffusion in modern neuroscience. Similarly, the Graph Signal Processing Toolbox [4] was tailored for researchers working on graph signal processing, providing various tools for implementing graph signal processing techniques on undirected graphs. In this perspective, SPARK finds its own identity by focusing on spectral graph theory and random walk analysis for both directed and undirected networks. More specifically, SPARK provides the MATLAB code that implements the indices in Table 1, adapting them to the case of directed, undirected, binary and weighted networks. In Section 2.5.1 we also proposed a practical way to model the behaviour of a network made of two interacting communities. The MATLAB script Generate_simdata.m implements the set of Eqs. 38–44 for the case of two equally sized clusters, but it can be easily generalized to communities of different sizes. The “mid-level” characterization approach proposed by SPARK was then tested on two toy examples using synthetic data and two hands-on scenarios with real data.
Results in Section 3.1 (referring to the first toy example) show that, as the clustering parameter increases, synthetic networks are characterized by a pronounced normalized association and a low normalized cut. Consistently with the hypotheses in Section 2.5.3, those two features reflect the presence of more within- than between-cluster links, properly describing the topology of the underlying network. On the other hand, when fitting the a priori partition to a random network, the majority of links fall into between-cluster communication, thus increasing the normalized cut while leading to low values of the normalized association. The obtained findings confirmed the hypotheses in Section 2.5.3, thus encouraging the use of spectral graph theory and random walk tools for the analysis of cluster-to-cluster interactions in networks. Beyond clustering [6,24], dimensionality reduction [44] and data representation problems [45], spectral graph theory together with random walk analysis proved to be a reliable tool for the cluster-level characterization of complex networks.
Concerning the second toy example, the results in Section 3.2 show that a pronounced clustered topology for the underlying network causes the distributions to approach zero, regardless of the within- and/or between-cluster imbalance. This can be justified by combining Eq. 39 with the definition of association, as expressed in Eq. 54. The equality in Eq. 54 says that the clustering parameter has a direct effect on the within-cluster association once the density and the imbalance parameters are fixed. In fact, as the clustering parameter increases, the number of within-cluster connections increases too, regardless of the imbalance parameters, thus justifying the results in the first part of Section 3.2. As confirmed by the result of the post hoc test in Fig 7, the distribution for poorly clustered networks clearly differs from the more clustered scenarios. Specifically, in the first case the two clusters are not so pronounced, thus leading to a different choice of vertices characterizing the a priori partition and the minimum-cut one. On the other hand, when the clusters are easier to detect, the two partitions tend to overlap and the two distributions get closer to each other. A similar reasoning applies to the normalized-cut distributions. Being the cut directly related to the association (see Eq. 40), an increase in the latter is accompanied by a decrease in the former, thus justifying the trend in Fig 8. Also in this case, when the underlying network does not show a pronounced modular structure, the a priori vertex set partition and the minimum-cut one identify different subsets of vertices; the two partitions instead tend to overlap for more clustered networks, and the two distributions get closer to each other. This also allows one to appreciate that, when the network exhibits an organized topology, SPARK correctly identifies the two interacting clusters regardless of any imbalance in within- and/or between-cluster links.
Results in Section 3.3 refer to the application of SPARK to publicly available datasets. As reported in Tables 5 and 6, despite the simplicity of the model, classification performances are encouraging given that, for both datasets, the spectral clustering achieves accuracy, F1 score and precision above 90%. As expected, computational times increase as the dataset becomes larger, since an increased number of instances leads to a larger network and, thus, to a more demanding computational cost for eigenvector extraction. This may represent a potential bottleneck affecting not only SPARK's performance but, in general, any method facing the demanding computational cost of eigenvector estimation for large matrices. Further studies may investigate the performance of different techniques for the eigendecomposition of large matrices. However, despite the achieved performances, it should also be noted that spectral clustering may not be suitable for all types of datasets, as it assumes that the two classes in the original dataset share only a few edges (while most links lie in within-cluster communications). This is due to the fact that the minimum-cut partition implicitly assumes that the two classes can be efficiently identified through the projection of the data along the direction of the Fiedler vector: this condition is not always met, especially when the number of between-cluster links is high, as shown in the toy examples on synthetic data. Different kinds of embeddings can be explored to extract a network representation from a set of data points (for example, using a KNN algorithm to retain the nearest neighbors of each node) before spectral clustering, but this is outside the scope of this paper and can be further investigated in future works.
Results in Section 3.4 are in line with the scientific literature, confirming that different impairment conditions reflect in a different topological organization of FCMs [16,17,39,40]. A first result from the plot in Fig 9 is that real networks differ from their random counterparts: in other words, functional brain networks are more prone to organize into communities. Specifically, the a priori vertex set partition (which divides the whole vertex set into affected and unaffected hemispheres) reflects the presence of more within- than between-cluster links, thus indicating the presence of a superimposed topological organization characterizing brain networks. This kind of organization is more evident for the less impaired patient, since the topological features of the more impaired one make his network closer to a random network. These results are in line with previous studies, which observed that functional changes in post-stroke networks are characterized by an increase in integration and a loss in segregation properties that push the underlying network away from an optimal small-world configuration [12,16]. The experimental findings in Section 3.4 thus support the hypothesis that a difference in the residual motor ability of the two subjects reflects into a different organization of their FCMs [41].
Results on both synthetic and real data encourage the use of SPARK to characterize a given network in terms of its underlying communities, as well as to measure its propensity to organize into interacting clusters. Although the examples presented in the paper focus on network characterization, SPARK can also be used to approach dynamic phenomena that can be modelled as a random walk. A classic example is the gambler’s ruin problem [27], where a random walk model is used to predict the probability of a gambler to be either richer or broke at the end of a gambling session. Other applicative scenarios may include the prediction of financial movements and the representation of fluid particles in turbulent flows [46].
In conclusion, there are some weak points that should be mentioned. Even though different definitions exist for the Laplacian matrix of directed graphs, SPARK deliberately focuses on the symmetrized version proposed by Chung [30], neglecting the others. Although Chung's definition is one of the most largely adopted, other definitions should be considered so as to investigate to what extent the choice of a different Laplacian matrix influences the community detection and the related indices. Similarly, the random walk analysis implements the PageRank model [28], thus focusing on a diffusive process responding to a precise set of equations. Different kinds of dynamical phenomena (such as synchronization) should be considered so as to model the dynamics that best fits the underlying network. However, even considering that the random surfer model in Eq. 31 guarantees the ergodicity of the underlying chain, particular attention should be paid to the choice of the damping parameter of the PageRank algorithm. As pointed out in Section 2.3.1, in fact, the higher its value, the more accurately the topology of the network will be preserved. In this spirit, throughout the paper and the examples, a fixed value of the damping parameter was used to ensure ergodicity while guaranteeing adherence to the original topology of the underlying network.
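The role of the damping parameter can be made concrete with a minimal Python sketch of the random-surfer chain (illustrative only; the `alpha` value and all names here are ours, not the paper's).

```python
import numpy as np

def pagerank_stationary(A, alpha=0.9, tol=1e-12):
    """Stationary distribution of the random-surfer chain: with
    probability alpha the walker follows an out-link of A, with
    probability 1 - alpha it teleports uniformly at random.  A higher
    alpha keeps the dynamics closer to the original topology."""
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True).astype(float)
    # row-stochastic transition matrix; dangling rows become uniform
    P = np.where(out > 0, A / np.where(out == 0, 1, out), 1.0 / n)
    G = alpha * P + (1 - alpha) / n            # ergodic "Google" matrix
    pi = np.full(n, 1.0 / n)
    while True:                                # power iteration
        nxt = pi @ G
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt

A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
pi = pagerank_stationary(A, alpha=0.95)
print(np.isclose(pi.sum(), 1.0), (pi > 0).all())  # a proper, strictly positive distribution
```

The uniform teleportation term is what guarantees ergodicity (and hence a unique stationary distribution) even when the original digraph is not strongly connected.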
Although we mainly focused on biomedical applications, SPARK is a general-purpose framework, developed to provide a useful instrument for researchers interested in network science, with particular attention to spectral graph theory and random walk applications. Its applications are not circumscribed to the biomedical field, since SPARK could fit a variety of different scenarios, ranging from sensor networks to social media networks, protein interaction networks and so on. Its general-purpose nature is a distinguishing feature that makes SPARK flexible enough to be improved according to user feedback and suggestions, providing a user-friendly toolbox for a wide range of applications.
Supporting information
S1 Text. Existence and uniqueness of the stationary distribution for the probability transition matrix P.
https://doi.org/10.1371/journal.pone.0319031.s001
(DOCX)
S1 Fig. Radar chart summarizing SPARK test on the toy example #1 for a further density value.
Radar chart representing the within-population average value of five different parameters (normalized association, edge measure, relaxation time, normalized cut and algebraic connectivity) extracted using the SPARK toolbox on clustered networks of fixed density and their random counterparts. The red line identifies the random population, while the remaining lines refer to the three clustered populations (orange, cyan and blue).
https://doi.org/10.1371/journal.pone.0319031.s002
(DOCX)
S2 Fig. Radar chart summarizing SPARK test on the toy example #1 for a further density value.
Radar chart representing the within-population average value of five different parameters (normalized association, edge measure, relaxation time, normalized cut and algebraic connectivity) extracted using the SPARK toolbox on clustered networks of fixed density and their random counterparts. The red line identifies the random population, while the remaining lines refer to the three clustered populations (orange, cyan and blue).
https://doi.org/10.1371/journal.pone.0319031.s003
(DOCX)
S3 Fig. Boxplot representation for the
index distribution when: a)
, b)
, c)
and d)
for
.
Each panel shows the boxplots describing the distribution as a function of the within/between cluster ratio
. The symbol * indicates a statistically significant result (i.e.,
) for the post hoc Tukey HSD test.
https://doi.org/10.1371/journal.pone.0319031.s004
(DOCX)
S4 Fig. Boxplot representation for the
index distribution when: a)
, b)
, c)
and d)
for
.
Each panel shows the boxplots describing the distribution as a function of the within/between cluster ratio
. The symbol * indicates a statistically significant result (i.e.,
) for the post hoc Tukey HSD test.
https://doi.org/10.1371/journal.pone.0319031.s005
(DOCX)
S5 Fig. Boxplot representation for the
index distribution when: a)
, b)
, c)
and d)
for
.
Each panel shows the boxplots describing the distribution as a function of the within/between cluster ratio
. The symbol * indicates a statistically significant result (i.e.,
) for the post hoc Tukey HSD test.
https://doi.org/10.1371/journal.pone.0319031.s006
(DOCX)
S6 Fig. Boxplot representation for the
index distribution when: a)
, b)
, c)
and d)
for
.
Each panel shows the boxplots describing the distribution as a function of the within/between cluster ratio
. The symbol * indicates a statistically significant result (i.e.,
) for the post hoc Tukey HSD test.
https://doi.org/10.1371/journal.pone.0319031.s007
(DOCX)
S7 Fig. Radar chart summarizing SPARK test on the FCM comparison introduced in Section e.5.
Radar chart represents five features (normalized association, edge measure, normalized cut, directed cut from AH to UH and directed cut from UH to AH) extracted using SPARK toolbox. The orange line refers to the FCM extracted from in beta band while the red one refers to its random counterpart. Similarly, the cyan line refers to the FCM extracted from
in beta band and the blue one to its random counterpart.
https://doi.org/10.1371/journal.pone.0319031.s008
(DOCX)
S8 Fig. Radar chart summarizing SPARK test on the FCM comparison introduced in Section e.5.
Radar chart represents five features (normalized association, edge measure, normalized cut, directed cut from AH to UH and directed cut from UH to AH) extracted using SPARK toolbox. The orange line refers to the FCM extracted from in gamma band while the red one refers to its random counterpart. Similarly, the cyan line refers to the FCM extracted from
in gamma band and the blue one to its random counterpart.
https://doi.org/10.1371/journal.pone.0319031.s009
(DOCX)
S9 Fig. Radar chart summarizing SPARK test on the FCM comparison introduced in Section e.5.
Radar chart represents five features (normalized association, edge measure, normalized cut, directed cut from AH to UH and directed cut from UH to AH) extracted using SPARK toolbox. The orange line refers to the FCM extracted from in theta band while the red one refers to its random counterpart. Similarly, the cyan line refers to the FCM extracted from
in theta band and the blue one to its random counterpart.
https://doi.org/10.1371/journal.pone.0319031.s010
(DOCX)
S1 Table. ANOVA results for the
distributions in S3 Fig. Four different one-way ANOVAs were run, one for each combination of
and
in the toy example #2.
The corresponding p and F values are shown in this table.
https://doi.org/10.1371/journal.pone.0319031.s011
(DOCX)
S2 Table. ANOVA results for the
distributions in S4 Fig. Four different one-way ANOVAs were run, one for each combination of
and
in the toy example #2.
The corresponding p and F values are shown in this table.
https://doi.org/10.1371/journal.pone.0319031.s012
(DOCX)
S3 Table. ANOVA results for the
distributions in S5 Fig. Four different one-way ANOVAs were run, one for each combination of
and
in the toy example #2.
The corresponding p and F values are shown in this table.
https://doi.org/10.1371/journal.pone.0319031.s013
(DOCX)
S4 Table. ANOVA results for the
distributions in S6 Fig. Four different one-way ANOVAs were run, one for each combination of
and
in the toy example #2.
The corresponding p and F values are shown in this table.
https://doi.org/10.1371/journal.pone.0319031.s014
(DOCX)
References
- 1. Wang XF, Chen G. Complex networks: small-world, scale-free and beyond. IEEE Circuits Syst Mag. 2003;3(1):6–20.
- 2. Motter AE, Matías MA, Kurths J, Ott E. Dynamics on complex networks and applications. Physica D: Nonlinear Phenomena. 2006;224(1–2):vii–viii.
- 3. Rubinov M, Sporns O. Complex network measures of brain connectivity: uses and interpretations. Neuroimage. 2010;52(3):1059–69. pmid:19819337
- 4. Perraudin N, Paratte J, Shuman D, Martin L, Kalofolias V, Vandergheynst P. GSPBOX: a toolbox for signal processing on graphs. arXiv. 2014.
- 5. de Loynes B, Navarro F, Olivier B. Gasper: GrAph Signal ProcEssing in R. 2020 [cited 5 Apr 2024].
- 6. Lambiotte R, Delvenne J-C, Barahona M. Random walks, Markov processes and the multiscale modular organization of complex networks. IEEE Trans Netw Sci Eng. 2014;1(2):76–90.
- 7. Doostmohammadian M, Gabidullina ZR, Rabiee HR. Nonlinear perturbation-based non-convex optimization over time-varying networks. IEEE Trans Netw Sci Eng. 2024;11(6):6461–9.
- 8. Doostmohammadian M, Aghasi A, Rikos AI, Grammenos A, Kalyvianaki E, Hadjicostis CN, et al. Distributed anytime-feasible resource allocation subject to heterogeneous time-varying delays. IEEE Open J Control Syst. 2022;1:255–67.
- 9. Li M, Micheli A, Wang YG, Pan S, Lió P, Gnecco GS, et al. Guest editorial: deep neural networks for graphs: theory, models, algorithms, and applications. IEEE Trans Neural Netw Learning Syst. 2024;35(4):4367–72.
- 10. Li M, Ma Z, Wang YG, Zhuang X. Fast Haar transforms for graph neural networks. Neural Netw. 2020;128:188–98. pmid:32447263
- 11. Li J, Zheng R, Feng H, Li M, Zhuang X. Permutation equivariant graph framelets for heterophilous graph learning. IEEE Trans Neural Netw Learn Syst. 2024;35(9):11634–48. pmid:38466605
- 12. Aerts H, Fias W, Caeyenberghs K, Marinazzo D. Brain networks under attack: robustness properties and the impact of lesions. Brain. 2016;139(Pt 12):3063–83. pmid:27497487
- 13. Watts DJ, Strogatz SH. Collective dynamics of “small-world” networks. Nature. 1998;393(6684):440–2. pmid:9623998
- 14. Gratton C, Nomura EM, Pérez F, D’Esposito M. Focal brain lesions to critical locations cause widespread disruption of the modular organization of the brain. J Cogn Neurosci. 2012;24(6):1275–85. pmid:22401285
- 15. Pichiorri F, Morone G, Petti M, Toppi J, Pisotta I, Molinari M, et al. Brain-computer interface boosts motor imagery practice during stroke recovery. Ann Neurol. 2015;77(5):851–65. pmid:25712802
- 16. Siegel JS, Seitzman BA, Ramsey LE, Ortega M, Gordon EM, Dosenbach NUF, et al. Re-emergence of modular brain networks in stroke recovery. Cortex. 2018;101:44–59. pmid:29414460
- 17. Pirovano I, Mastropietro A, Antonacci Y, Barà C, Guanziroli E, Molteni F, et al. Resting state EEG directed functional connectivity unveils changes in motor network organization in subacute stroke patients after rehabilitation. Front Physiol. 2022;13:862207. pmid:35450158
- 18. de Haan W, van der Flier WM, Wang H, Van Mieghem PFA, Scheltens P, Stam CJ. Disruption of functional brain networks in Alzheimer’s disease: what can we learn from graph spectral analysis of resting-state magnetoencephalography?. Brain Connect. 2012;2(2):45–55. pmid:22480296
- 19. Daianu M, Mezher A, Jahanshad N, Hibar DP, Nir TM, Jack CR Jr, et al. Spectral graph theory and graph energy metrics show evidence for the Alzheimer's disease disconnection syndrome in APOE-4 risk gene carriers. Proc IEEE Int Symp Biomed Imaging. 2015;2015:458–61. pmid:26413205
- 20. Malliaros FD, Vazirgiannis M. Clustering and community detection in directed networks: a survey. Physics Reports. 2013;533(4):95–142.
- 21. Spielman DA. Algorithms, graph theory, and linear equations in Laplacian matrices. Proceedings of the International Congress of Mathematicians 2010 (ICM 2010). Hyderabad, India: Hindustan Book Agency; 2011. p. 2698–722.
- 22. Fiedler M. Laplacian of graphs and algebraic connectivity. Banach Center Publ. 1989;25(1):57–70.
- 23. Gleich D. Hierarchical Directed Spectral Graph Partitioning. Stanford University; 2006.
- 24. Shi J, Malik J. Normalized cuts and image segmentation. IEEE Trans Pattern Anal Machine Intell. 2000;22(8):888–905.
- 25. Hagen L, Kahng AB. New spectral methods for ratio cut partitioning and clustering. IEEE Trans Comput-Aided Des Integr Circuits Syst. 1992;11(9):1074–85.
- 26. Seabrook E, Wiskott L. A tutorial on the spectral theory of markov chains. Neural Comput. 2023;35(11):1713–96. pmid:37725706
- 27.
Levin DA, Peres Y. Markov chains and mixing times. 2nd ed. Providence, Rhode Island: American Mathematical Society; 2017.
- 28. Lai D, Lu H, Nardini C. Finding communities in directed networks by PageRank random walk induced network embedding. Phys A Stat Mechanics Appl. 2010;389(12):2443–54.
- 29. Langville A, Meyer C. Deeper inside PageRank. Internet Math. 2004;1(3):335–80.
- 30. Chung F. Laplacians and the Cheeger inequality for directed graphs. Ann Comb. 2005;9(1):1–19.
- 31. Li Y, Zhang Z-L. Digraph Laplacian and the degree of asymmetry. Internet Mathematics. 2012;8(4):381–401.
- 32. Leclerc RD. Survival of the sparsest: robust gene networks are parsimonious. Mol Syst Biol. 2008;4:213. pmid:18682703
- 33. Pavlopoulos GA, Secrier M, Moschopoulos CN, Soldatos TG, Kossida S, Aerts J, et al. Using graph theory to analyze biological networks. BioData Min. 2011;4:10. pmid:21527005
- 34. Cinar I, Koklu M. Classification of rice varieties using artificial intelligence methods. ijisae. 2019;7(3):188–94.
- 35. Fugl-Meyer AR, Jääskö L, Leyman I, Olsson S, Steglind S. The post-stroke hemiplegic patient. 1. A method for evaluation of physical performance. Scand J Rehabil Med. 1975;7(1):13–31. pmid:1135616
- 36. Hernández ED, Galeano CP, Barbosa NE, Forero SM, Nordin Å, Sunnerhagen KS, et al. Intra- and inter-rater reliability of Fugl-Meyer Assessment of Upper Extremity in stroke. J Rehabil Med. 2019;51(9):652–9. pmid:31448807
- 37. Milani G, Antonioni A, Baroni A, Malerba P, Straudi S. Relation between EEG measures and upper limb motor recovery in stroke patients: a scoping review. Brain Topogr. 2022;35(5–6):651–66. pmid:36136166
- 38. Westlake KP, Nagarajan SS. Functional connectivity in relation to motor performance and recovery after stroke. Front Syst Neurosci. 2011;5:8. pmid:21441991
- 39. Grefkes C, Fink GR. Connectivity-based approaches in stroke and recovery of function. Lancet Neurol. 2014;13(2):206–16. pmid:24457190
- 40. Silasi G, Murphy TH. Stroke and the connectome: how connectivity guides therapeutic intervention. Neuron. 2014;83(6):1354–68. pmid:25233317
- 41. Ranieri A, Pichiorri F, Mongiardini E, Colamarino E, Cincotti F, Mattia D, et al. Spectral graph theory to investigate topological and dynamic properties of EEG-based brain networks: an application to post-stroke patients. 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Orlando, FL, USA: IEEE; 2024. pp. 1–4.
- 42. Baccalá LA, Sameshima K. Partial directed coherence: a new concept in neural structure determination. Biol Cybern. 2001;84(6):463–74. pmid:11417058
- 43. Toppi J, Mattia D, Risetti M, Formisano R, Babiloni F, Astolfi L. Testing the significance of connectivity networks: comparison of different assessing procedures. IEEE Trans Biomed Eng. 2016;63(12):2461–73. pmid:27810793
- 44. Belkin M, Niyogi P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation. 2003;15(6):1373–96.
- 45. Dhillon IS, Guan Y, Kulis B. Kernel k-means: spectral clustering and normalized cuts. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. Seattle, WA, USA: ACM; 2004. p. 551–6.
- 46. Chanson H. Turbulent dispersion and mixing: 1. Vertical and transverse mixing. In: Environmental Hydraulics of Open Channel Flows. Elsevier; 2004. p. 81–98.