Abstract
Spectral graph theory and its applications constitute an important step forward in modern network theory. Its growing acceptance over the last decades has fostered the development of innovative tools, allowing network theory to model a variety of different scenarios while answering questions of increasing complexity. Nevertheless, a comprehensive understanding of spectral graph theory's principles requires a solid technical background which, in many cases, prevents its diffusion through the scientific community. To overcome this issue, we developed and released an open-source MATLAB toolbox - the SPectral graph theory And Random walK (SPARK) toolbox - that combines spectral graph theory and random walk concepts to provide both a static and a dynamic characterization of digraphs. After describing the theoretical principles grounding the toolbox, we present SPARK's structure and the list of available indices and measures. SPARK was then tested in a variety of scenarios including: two toy examples on synthetic networks, an example using public datasets in which SPARK was used as an unsupervised binary classifier, and a real-data scenario relying on functional brain networks extracted from the EEG data recorded from two stroke patients in resting state condition. Results from both synthetic and real data showed that the indices extracted using the SPARK toolbox correctly characterize the topology of a bi-compartmental network. Furthermore, they can also be used to find the "optimal" vertex set partition (i.e., the one that minimizes the number of between-cluster links) for the underlying network and to compare it to a given a priori partition. Finally, the application to real EEG-based networks provides a practical case study in which the SPARK toolbox was used to describe network alterations in stroke patients and relate them to the patients' motor impairment.
Citation: Ranieri A, Pichiorri F, Colamarino E, Cincotti F, Mattia D, Toppi J (2025) SPectral graph theory And Random walK (SPARK) toolbox for static and dynamic characterization of (di)graphs: A tutorial. PLoS One 20(6): e0319031. https://doi.org/10.1371/journal.pone.0319031
Editor: Longxiu Huang, Michigan State University, UNITED STATES OF AMERICA
Received: October 11, 2024; Accepted: January 25, 2025; Published: June 5, 2025
Copyright: © 2025 Ranieri et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data and scripts used in the paper are available from github: https://github.com/AndreaRani/SPARK.
Funding: This project is partially funded by the Italian National Ministry of Health (grants # RF-2018-12365210, RF-2019-12369396, GR2019-12369207) and by Sapienza University of Rome (LEAF, RM123188F229EC72). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Network theory and its applications are currently being studied across many scientific fields, from gene and protein networks to the World Wide Web [1,2]. The ubiquity of complex networks in science and technology has boosted the development of new powerful tools and applications that brought graph theory up to another level. Network theory and its applications thus gained increasing acceptance over the years, leading to a growing demand for tools for the analysis of complex systems. Especially in the last decades, an increasing number of open-source frameworks have been developed to make those tools accessible to a larger cohort of interested scientists. As a noteworthy example within modern neuroscience, the Brain Connectivity Toolbox [3] represents groundbreaking work in this sense. The toolbox is completely open source and offers a comprehensive list of topological measures, as well as generative network models and visualization functions for brain networks. Beyond neuroscience, a plethora of different scientific fields have been permeated by network theory, thus contributing to its increasing popularity. In the emerging field of graph signal processing, for example, the Graph Signal Processing Toolbox [4] is an open-source MATLAB toolbox that can be used to tackle graph-related problems with a signal processing approach. Similarly, Gasper [5] provides a suitable framework for graph signal processing and graph visualization in R. In such a stimulating context, network science has benefited from a huge methodological contribution from various disciplines, such as physics and theoretical computer science. This allowed modern scientists to investigate questions of increasing complexity concerning, for example, the dynamical behaviour of the underlying system or its propensity to organize into interacting communities.
In this scenario, spectral graph theory stems from the application of the spectral theorem to network problems. While classic approaches rely on single-node features or global descriptors, spectral graph theory provides insights into the cluster-to-cluster communication within the network, enabling the study of phenomena like community detection, diffusion processes and synchronization. This shift in perspective facilitates a deeper understanding of complex systems by linking algebraic properties to networks' dynamic features. Furthermore, its intimate relationship with random walk processes makes spectral graph theory a powerful tool for network analysis at both the topological and the dynamic level [6].
Nowadays, spectral graph theory and its applications are widely used in different scientific fields, from resource allocation strategies and operations research problems [7,8] to geometric deep learning [9–11] and modern biomedicine. The field of neuroengineering, for example, largely benefits from network theory to characterize both physiological and pathological brain networks [12]. In this framework, given the propensity of the human brain to naturally organize into interacting communities [13], spectral graph theory can be employed to analyze the structural properties of brain networks through their eigenvalues and eigenvectors. In addition, a random walk perspective complements the analysis by modelling the information flow across the network, providing a mathematical framework for understanding how quickly a system converges to a steady state, how efficiently information spreads between clusters and how dynamic properties vary according to the topology of the network. Modern studies on both synthetic and real networks agree in characterizing brain injuries as network diseases, as the effects of a lateralized traumatic event have been shown to spread throughout the network [12,14]. As a noteworthy example, stroke embodies one of the most representative scenarios in which the effects of a lateralized traumatic event spread all over the network [15–17]. Rather than focusing on single-node features or global descriptors, a cluster-level characterization of the underlying network is thus desirable. However, to the best of our knowledge only two groups have pioneered the application of spectral graph theory to characterize the "disconnection syndrome" typical of Alzheimer's disease. Specifically, spectral indices pointed out changes in connectedness in MEG-derived resting-state functional networks of Alzheimer's patients, combined with a less efficient network configuration characterizing dynamic processes [18].

Furthermore, brain tractography connectivity networks exhibit a higher number of disconnected components and lower spectral energy in Alzheimer's patients when compared to healthy controls [19]. However, the sophisticated mathematical background, combined with the lack of user-friendly toolboxes, has discouraged the diffusion of spectral graph theory in clinical neuroscience. To the best of our knowledge, spectral graph theory has never been applied to describe functional alterations in brain pathologies other than Alzheimer's disease. To fill this gap, this work introduces the SPectral graph theory And Random walK (SPARK) toolbox for (di)graphs, a new open-source MATLAB toolbox for the analysis of graphs. The toolbox is written in MATLAB 2023b and the experiments were conducted on a machine running Windows 11, equipped with an Intel Core i7 processor at 2.8 GHz and 16 GB of RAM. Compatibility with the Windows operating system requires Windows 10 (version 21H2 or higher), Windows 11, Windows Server 2019 or Windows Server 2022. On a Windows machine, MATLAB 2023b requires any Intel or AMD x86-64 processor with two or more cores and a minimum of 8 GB of RAM. Compatibility with other operating systems can be checked directly on the MathWorks website. The leading idea behind SPARK is to combine spectral graph theory and random walk concepts to characterize digraphs from both a static and a dynamic perspective. To this end, an introduction to spectral graph theory and random walk fundamentals is first provided to familiarize the reader with the key concepts and notions used in this paper. Then, in the second part of the paper, SPARK is tested on both surrogate and real data across different application fields to assess its versatility and adaptability to different scenarios.
2. Materials and methods
2.1. Spectral graph theory: background and basic facts
2.1.1. Laplacian matrix and its properties.
Let $G = (V, E)$ be an undirected graph with vertex set $V = \{v_1, \dots, v_N\}$ and edge set $E \subseteq V \times V$: if $|V| = N$, $G$ can be efficiently described by its adjacency matrix $A \in \{0,1\}^{N \times N}$. The binary adjacency matrix of a graph is a square matrix with elements:

$$a_{ij} = \begin{cases} 1 & \text{if } (v_i, v_j) \in E \\ 0 & \text{otherwise.} \end{cases}$$
Given a generic node $v_i$, the total number of its neighbors can be obtained by summing the direct edges that link $v_i$ to any other node in the graph: this number represents the degree of node $v_i$:

$$d_i = \sum_{j=1}^{N} a_{ij}.$$

The degrees of all the nodes in $G$ can be collected in a diagonal matrix $D = \mathrm{diag}(d_1, \dots, d_N)$, usually called the degree matrix.
The core of spectral graph theory relies on the properties of the spectrum of the Laplacian matrix associated with $G$, which is defined as

$$L = D - A \quad \text{(Eq. 4)}$$

or, elementwise,

$$L_{ij} = \begin{cases} d_i & \text{if } i = j \\ -a_{ij} & \text{otherwise.} \end{cases}$$
Since $G$ is supposed to be undirected, its adjacency matrix is symmetric (i.e., $A = A^T$) and $L$ is also a symmetric matrix. Furthermore, $L$ has real eigenvalues in the range $[0, 2d_{max}]$, where $d_{max}$ is the maximum degree of the nodes in the graph, and the corresponding eigenvectors are real and form an orthogonal basis for $\mathbb{R}^N$ [20]. By construction, since $L$'s rows sum to zero it holds that

$$L\,\mathbb{1} = 0,$$

meaning that 0 is always an eigenvalue of $L$ and the corresponding eigenvector is the vector of all ones $\mathbb{1} = [1, \dots, 1]^T$.
$L$ is also a positive-semidefinite matrix [21] and thus $\lambda = 0$ is the smallest eigenvalue of $L$. Since $L$'s eigenvalues are real, they can be ordered in nondecreasing order:

$$0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_N.$$

The study of $L$'s spectrum has led to important considerations on the second smallest eigenvalue $\lambda_2$ and its corresponding eigenvector (respectively known as the "algebraic connectivity" of $G$ and the "Fiedler vector" [22]). Specifically, the geometric multiplicity of the $\lambda = 0$ eigenvalue (i.e., the number of linearly independent eigenvectors associated to $\lambda = 0$) equals the number of connected components of $G$, i.e., the maximal groups of mutually reachable nodes.
Claim. Let $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_N$ be the eigenvalues of $L$. Then $G$ is connected if and only if $\lambda_2 > 0$, and the multiplicity of the zero eigenvalue is equal to the number of connected components of $G$.
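The relation between the Laplacian spectrum and graph connectivity is easy to verify numerically. SPARK itself is written in MATLAB; the following Python/NumPy sketch (illustrative only, on a hypothetical toy graph) builds $L = D - A$ for two disjoint triangles and counts the zero eigenvalues:

```python
import numpy as np

# Two disjoint triangles: a toy graph with two connected components
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0        # undirected graph: symmetric adjacency

D = np.diag(A.sum(axis=1))         # degree matrix
L = D - A                          # combinatorial Laplacian (Eq. 4)

eigvals = np.linalg.eigvalsh(L)    # real eigenvalues, ascending order
n_components = int(np.sum(np.isclose(eigvals, 0.0)))

print(eigvals)         # [0, 0, 3, 3, 3, 3] up to numerical precision
print(n_components)    # 2: one zero eigenvalue per connected component
```

Since $\lambda_2 = 0$ here, the claim above correctly flags the graph as disconnected; adding any edge between the two triangles would make $\lambda_2$ strictly positive.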
As to deal with normalized quantities, it is useful to define the normalized version of the Laplacian matrix

$$\mathcal{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2} \quad \text{(Eq. 7)}$$

with entries

$$\mathcal{L}_{ij} = \begin{cases} 1 & \text{if } i = j \text{ and } d_i \neq 0 \\ -\dfrac{a_{ij}}{\sqrt{d_i d_j}} & \text{if } (v_i, v_j) \in E \\ 0 & \text{otherwise.} \end{cases}$$

Since $L$ is symmetric, its normalized version $\mathcal{L}$ is still a symmetric matrix with real eigenvalues, and its eigenvectors form an orthogonal basis for $\mathbb{R}^N$. As for the unnormalized Laplacian matrix, $\mathcal{L}$ is also positive semidefinite, and its eigenvalues lie in the $[0, 2]$ interval. Furthermore, 0 is still an eigenvalue of $\mathcal{L}$, with the scaled vector $D^{1/2}\mathbb{1}$ as the corresponding eigenvector.
2.1.2. Minimum cut partition and algebraic connectivity of a graph.
The spectrum of the Laplacian matrix associated with $G$ has a crucial role in the minimum-cut cluster problem [23,24]. Specifically, given a vertex set partition $\{V_1, V_2\}$ such that $V_1 \cup V_2 = V$ and $V_1 \cap V_2 = \emptyset$, the normalized cut [24] induced by the partition is given by

$$Ncut(V_1, V_2) = \frac{cut(V_1, V_2)}{vol(V_1)} + \frac{cut(V_1, V_2)}{vol(V_2)},$$

where $cut(V_1, V_2) = \sum_{v_i \in V_1,\, v_j \in V_2} a_{ij}$ represents the total number of crossing edges between the subsets of nodes $V_1$ and $V_2$, and $vol(V_1) = \sum_{v_i \in V_1} d_i$ is the total number of connections between the nodes in $V_1$ and the whole vertex set $V$ (analogously for $V_2$). As to point out the role of the Laplacian spectrum, consider an affiliation vector $x$ for the partition $\{V_1, V_2\}$ with entries

$$x_i = \begin{cases} \sqrt{vol(V_2)/vol(V_1)} & \text{if } v_i \in V_1 \\ -\sqrt{vol(V_1)/vol(V_2)} & \text{if } v_i \in V_2. \end{cases} \quad \text{(Eq. 10)}$$
It is possible to write the normalized cut as follows:

$$Ncut(V_1, V_2) = \frac{x^T L x}{x^T D x} = R(y), \qquad y = D^{1/2} x, \quad \text{(Eq. 11)}$$

where $R(y) = \dfrac{y^T \mathcal{L}\, y}{y^T y}$ is the Rayleigh quotient for $\mathcal{L}$ and $y$. Minimizing the cost function in Eq. 11 leads to a partition of the vertex set such that the between-cluster connections are the minimum possible. However, the problem in Eq. 11 is known to be NP-hard, since it minimizes the cost function over the set of every possible cut in $G$ [24]. As to deal with NP-hardness, we drop the restriction for $x$ to be in the form specified in Eq. 10 and let it take arbitrary real entries. The Rayleigh quotient of a symmetric matrix has the nice property of being bounded by $\lambda_{min}$ (lower bound) and $\lambda_{max}$ (upper bound): being $\mathcal{L}$ symmetric, this writes

$$\lambda_1 \le R(y) \le \lambda_N.$$

The vector such that $R(y) = \lambda_1 = 0$ is $y = D^{1/2}\mathbb{1}$, i.e., the affiliation vector $x = \mathbb{1}$. From a formal point of view it minimizes $R$, identifying a single cluster made of the whole network itself: the normalized cut is null in that sense, but it is useless in a practical way. The original problem can thus be slightly modified to neglect the trivial solution. Specifically, minimizing the Rayleigh quotient over the set of vectors orthogonal to $D^{1/2}\mathbb{1}$ [25] leads to

$$\min_{y \,\perp\, D^{1/2}\mathbb{1}} R(y). \quad \text{(Eq. 13)}$$

The solution to the minimization problem in Eq. 13 is given by the eigenvector associated to the second smallest eigenvalue of $\mathcal{L}$, being the eigenvectors of $\mathcal{L}$ orthogonal to each other. Specifically, being $v_2$ the eigenvector associated to the second smallest eigenvalue of $\mathcal{L}$, Eq. 13 leads to

$$\min_{y \,\perp\, D^{1/2}\mathbb{1}} R(y) = R(v_2) = \lambda_2,$$

where $\lambda_2$ (i.e., the second smallest eigenvalue) is the so-called "algebraic connectivity" of $G$ and its corresponding eigenvector (known as the "Fiedler vector") identifies the partition of the vertex set that minimizes the cost function in Eq. 13. The way the algebraic connectivity affects the topology of a given network is represented in Fig 1, where the topological representation of three different networks (together with their corresponding adjacency matrices) is shown to vary according to different values of $\lambda_2$.
Fig 1. a)–c) Three networks with different values of the algebraic connectivity $\lambda_2$. Each network is represented through its binary adjacency matrix and the corresponding graph form. The adjacency matrix (lower part of each panel) is represented as an $N \times N$ grid of pixels, where $N$ is the number of nodes and each link connects a node (row index) to another one (column index) in the underlying network. Pixels are colored in yellow if the connection exists and in blue otherwise. The graph representation in the upper part of each panel codes nodes as blue solid dots and the existing connections as solid lines linking two different nodes.
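As an illustration of how the sign pattern of the Fiedler vector recovers a minimum-cut bipartition, consider the following Python/NumPy sketch (not part of SPARK; the two-clique toy graph is a hypothetical example):

```python
import numpy as np

# Two 4-node cliques joined by a single bridge edge (nodes 3 and 4)
N = 8
A = np.zeros((N, N))
A[:4, :4] = 1 - np.eye(4)      # clique on nodes 0..3
A[4:, 4:] = 1 - np.eye(4)      # clique on nodes 4..7
A[3, 4] = A[4, 3] = 1          # the only between-cluster connection

D = np.diag(A.sum(axis=1))
L = D - A
eigvals, eigvecs = np.linalg.eigh(L)   # ascending eigenvalues

lambda2 = eigvals[1]                   # algebraic connectivity
fiedler = eigvecs[:, 1]                # Fiedler vector

# The sign pattern of the Fiedler vector recovers the two communities
partition = fiedler > 0
print(lambda2)     # small but positive: connected, weakly coupled clusters
print(partition)   # nodes 0..3 on one side, nodes 4..7 on the other
```

The single bridge edge keeps $\lambda_2$ close to zero, and thresholding the Fiedler vector at zero splits the vertex set exactly along that bottleneck.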
2.2. Markov chain and random walk: Background and basic facts
2.2.1. Markov chain and random walk on graphs.
Markov processes are an elementary family of stochastic models describing the temporal evolution of a sequence of random variables $\{X_t\}_{t \in T}$ on a certain state space $S$, where $T$ is a time set [26]. Markovian processes are governed by the so-called Markov property, according to which the value of the random variable $X_t$ at time $t$ only depends on its value at time $t-1$:

$$\Pr(X_t = s_j \mid X_{t-1} = s_i, \dots, X_0) = \Pr(X_t = s_j \mid X_{t-1} = s_i).$$
When $T$ is a discrete set, the sequence $\{X_t\}$ is usually referred to as a Markov chain, and the matrix $P$ collects the probabilities of moving from state $s_i$ to state $s_j$. According to this view, given a graph $G$ one can think of the vertex set $V$ as a state space having $N$ different states, while the adjacency matrix $A$ specifies how different pairs of states relate to each other. This new perspective shifts the emphasis toward a dynamic framework according to which $G$ describes the topology of a Markov chain with $V$ as its state space. Interestingly, the topological structure of $G$ has been proved to influence the evolution of dynamic phenomena running on the graph itself (e.g., diffusion, synchronization, consensus and so on) [6]. In this framework, random walk processes are of particular interest due to their intimate relationship with spectral graph theory.
As the name itself suggests, a random walk on $G$ describes the imaginary walk of an agent over the vertex set $V$. Specifically, a random walk on $G$ is fully described by a transition probability matrix $P$ governing the behavior of the walker on each node of the network [27]. The transition probability matrix $P$ provides a probabilistic characterization of $G$ which is fundamental in the description of Markovian processes. For a graph with $N$ nodes, $P$ is an $N \times N$ stochastic matrix with entries

$$p_{ij} = \Pr(X_{t+1} = v_j \mid X_t = v_i)$$

describing the probability of a random walker to jump from one node to another in the underlying graph. $P$ is intimately related to the topology of $G$, since

$$P = D^{-1} A \quad \text{(Eq. 16)}$$

or elementwise

$$p_{ij} = \frac{a_{ij}}{d_i},$$
being $D$ the diagonal matrix collecting the degree of each node. The knowledge of $P$ allows to track the evolution of the chain over time, since the Markov property guarantees that the configuration of the chain at time $t+1$ only depends on its configuration at time $t$. Specifically, let the row vector $\pi(0)$ contain the configuration of the chain (i.e., the probability for a walker to be in each state) at $t = 0$. The configuration at the next step (i.e., $\pi(1)$) depends on the probability of being in each state $v_i$ at the current time (encoded in $\pi(0)$) and the probability of making a transition from $v_i$ to $v_j$ (i.e., encoded in the $(i,j)$-th element of $P$). Therefore, $\pi(1)$ can be derived from $\pi(0)$ as

$$\pi(1) = \pi(0)\, P. \quad \text{(Eq. 18)}$$
For a generic timestep, Eq. 18 becomes

$$\pi(t+1) = \pi(t)\, P, \quad \text{(Eq. 19)}$$

where $\pi(t)$ and $\pi(t+1)$ represent the configurations of the chain at times $t$ and $t+1$ respectively, and $P$ is the transition probability matrix governing the random walk on $G$. As to know the configuration of the chain after $n$ steps from a generic time $t$, it is sufficient to iteratively apply Eq. 19 $n$ times:

$$\pi(t+n) = \pi(t)\, P^n.$$
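The update rule of Eq. 19 can be iterated directly. The short Python/NumPy sketch below (illustrative only; the 5-node ring is a hypothetical example) starts a walker on a single node and shows the configuration approaching the stationary distribution $d_i / \sum_j d_j$:

```python
import numpy as np

# Random walk on a 5-node ring (odd cycle: the chain is aperiodic)
N = 5
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1

d = A.sum(axis=1)
P = A / d[:, None]              # P = D^{-1} A  (Eq. 16)

pi = np.zeros(N)
pi[0] = 1.0                     # walker starts on node 0 with certainty
for _ in range(200):
    pi = pi @ P                 # pi(t+1) = pi(t) P  (Eq. 19)

print(pi)   # approaches the stationary distribution d / d.sum() = [0.2]*5
```

For an undirected graph the stationary distribution is proportional to the node degrees, which on the ring is simply the uniform distribution.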
2.2.2. Spectral graph theory meets random walk.
The intimate relationship between random walk and spectral graph theory relies on the fact that $P$ is strongly influenced by the structural properties of $G$. Specifically, pre- and post-multiplying each side of Eq. 4 by $D^{-1/2}$ leads to

$$D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}. \quad \text{(Eq. 21)}$$

Recalling that $D$ is a diagonal matrix (and so is $D^{-1/2}$), combining Eq. 21 with Eqs. 7 and 16 leads to

$$\mathcal{L} = I - D^{1/2} P D^{-1/2}. \quad \text{(Eq. 22)}$$

In a dynamic perspective, the configuration update for the random walk process (Eq. 19) can be expressed as:

$$\pi(t+1) = \pi(t)\, D^{-1/2} (I - \mathcal{L})\, D^{1/2}. \quad \text{(Eq. 23)}$$

The topology of the network thus strongly influences the nature of the dynamic phenomena running on the underlying graph [6]. Specifically, since the eigenvalues of $P$ can be easily derived from those of $\mathcal{L}$ (and vice versa), it can be appreciated that a random walk process converges faster in strongly organized networks (further details about the eigen-decomposition of $P$ can be found in the Supporting Information). For an ergodic chain (i.e., a chain that is guaranteed to converge to a unique stationary distribution), the speed of convergence to the stationary distribution can be estimated by means of the relaxation time [27], which is defined as

$$t_{rel} = \frac{1}{\gamma}, \quad \text{(Eq. 24)}$$

where the spectral gap $\gamma = 1 - \lambda_2(P)$ associated to a Markov chain relates to the second largest eigenvalue of $P$.
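The link between topology and convergence speed can be appreciated numerically. In the Python/NumPy sketch below (an illustration, not SPARK code; both toy graphs are hypothetical examples), the relaxation time of Eq. 24 is computed for a well-mixed complete graph and for a two-cluster graph joined by a single bridge:

```python
import numpy as np

def relaxation_time(A):
    """Relaxation time 1/(1 - lambda_2(P)) for the walk P = D^{-1} A (Eq. 24)."""
    d = A.sum(axis=1)
    # D^{-1/2} A D^{-1/2} is symmetric and similar to P, so it shares
    # P's (real) eigenvalues
    S = A / np.sqrt(np.outer(d, d))
    mu = np.sort(np.linalg.eigvalsh(S))[::-1]   # mu[0] = 1 >= mu[1] >= ...
    return 1.0 / (1.0 - mu[1])

K = 1 - np.eye(6)                 # complete graph: strongly mixed
B = np.zeros((6, 6))              # two triangles joined by a single bridge
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    B[i, j] = B[j, i] = 1

print(relaxation_time(K))   # fast convergence (small relaxation time)
print(relaxation_time(B))   # the bottleneck slows the walk down
```

The bottleneck in the clustered graph pushes $\lambda_2(P)$ toward 1, shrinking the spectral gap and inflating the relaxation time with respect to the complete graph.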
2.3. Spectral graph theory for digraphs
2.3.1. The PageRank random walk.
A directed graph (digraph) is a graph in which each edge has a specific direction (i.e., the edge $(v_i, v_j)$ spreads from node $v_i$ and sinks into $v_j$, while for $(v_j, v_i)$ sink and source are switched). The directionality of each edge allows to distinguish between inward and outward connections for each node. With the convention that $a_{ij} = 1$ denotes an edge from $v_i$ to $v_j$, the total number of outward links of a generic node $v_i$ (its out-degree) can be found by summing over the $i$-th row of the adjacency matrix:

$$d_i^{out} = \sum_{j=1}^{N} a_{ij}.$$

Similarly, the in-degree of $v_i$ represents the total number of incoming edges and can be found by summing over the $i$-th column of the adjacency matrix:

$$d_i^{in} = \sum_{j=1}^{N} a_{ji}.$$

The total degree of a given node in a digraph is simply the sum of its in- and out-degree. Starting from Eq. 16, the transition matrix describing a random walk on a digraph simply becomes

$$P = D_{out}^{+} A$$

or elementwise

$$p_{ij} = \frac{a_{ij}}{d_i^{out}} \qquad (d_i^{out} > 0),$$

where $D_{out}$ is a diagonal matrix containing the out-degree of each node on its main diagonal and $D_{out}^{+}$ denotes its Moore–Penrose pseudoinverse.
However, in the classic random walk neither uniqueness nor convergence of the process is guaranteed. As to deal with an ergodic chain, a common solution is to refer to a modified version of the classic problem known as the "PageRank random walk" (also known as the random surfer model) [28]. The transition matrix governing the behavior of a random surfer is given by

$$P_{pr} = \alpha \left( D_{out}^{+} A + \frac{1}{N}\, u\, \mathbb{1}^T \right) + \frac{1 - \alpha}{N}\, \mathbb{1}\mathbb{1}^T, \quad \text{(Eq. 31)}$$

where $D_{out}^{+}$ denotes the Moore–Penrose pseudoinverse of $D_{out}$, $u$ is a vector with all entries equal to zero but $u_i = 1$ when $d_i^{out} = 0$, and $\alpha \in (0, 1)$ is a parameter that manages the escape probability from absorbing states. The matrix $P_{pr}$ defined in Eq. 31 is stochastic and describes an ergodic chain [29]; thus its long-term behavior is guaranteed to converge to a unique stationary distribution. The teleportation terms assign a transition probability of $(1-\alpha)/N$ toward every node from those nodes having $d^{out} > 0$, while a probability of $1/N$ is uniformly assigned to the transitions from nodes without outward links. The higher the value of $\alpha$, the more accurately the topology of the original chain will be preserved.
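A minimal Python/NumPy sketch of Eq. 31 (an illustration under the notation above, not SPARK's MATLAB implementation; the toy digraph is a hypothetical example) shows how the teleportation terms restore stochasticity in the presence of a dangling node:

```python
import numpy as np

def pagerank_matrix(A, alpha=0.85):
    """Sketch of the PageRank (random surfer) transition matrix of Eq. 31."""
    N = A.shape[0]
    d_out = A.sum(axis=1)
    u = (d_out == 0).astype(float)   # indicator of dangling (absorbing) nodes
    inv_d = np.divide(1.0, d_out, out=np.zeros(N), where=d_out > 0)
    P = inv_d[:, None] * A           # D_out^+ A: dangling rows stay zero
    teleport = np.ones((N, N)) / N
    return alpha * (P + np.outer(u, np.ones(N)) / N) + (1 - alpha) * teleport

# Toy digraph in which node 2 has no outgoing edges
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [0, 0, 0]], dtype=float)

Ppr = pagerank_matrix(A)
print(Ppr.sum(axis=1))   # every row sums to 1: the chain is stochastic
print(Ppr[2])            # dangling node: uniform 1/N transition probabilities
```

Rows of nodes with outgoing edges keep most of the original topology (scaled by $\alpha$) plus the uniform $(1-\alpha)/N$ term, while the dangling node is given uniform $1/N$ transitions exactly as described above.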
2.3.2. Symmetrized Laplacian matrix for digraphs.
The information about the direction of each edge reflects in a non-symmetric adjacency matrix and hence in a non-symmetric Laplacian matrix. Thus, the eigenvalues of $L$ are not guaranteed to be real, and the results obtained for the undirected case cannot be directly applied to digraphs. As to overcome such a limitation, Chung proposed a symmetrized version of the combinatorial Laplacian [30]

$$\tilde{L} = \Phi - \frac{\Phi P + P^{*} \Phi}{2},$$

together with its normalized version

$$\tilde{\mathcal{L}} = I - \frac{\Phi^{1/2} P\, \Phi^{-1/2} + \Phi^{-1/2} P^{*}\, \Phi^{1/2}}{2}, \quad \text{(Eq. 33)}$$

where $P$ is the probability transition matrix of the Markov chain governing the random walk on $G$, $P^{*}$ denotes its conjugate transpose and $\Phi = \mathrm{diag}(\pi)$ is a diagonal matrix with entries equal to the stationary distribution of the chain.
Clearly $\tilde{L}$ (and its normalized version $\tilde{\mathcal{L}}$) is a symmetric matrix, thus its eigenvectors form an orthogonal basis for $\mathbb{R}^N$. Furthermore, 0 is always an eigenvalue and the corresponding eigenvector is the vector of all ones $\mathbb{1}$ (for $\tilde{\mathcal{L}}$, $\mathbb{1}$ should be substituted with its scaled version $\Phi^{1/2}\mathbb{1}$). Although Chung's symmetrization allows to extend the above considerations to digraphs, it should be pointed out that $\tilde{\mathcal{L}}$ only provides a partial description of the original digraph, since different digraphs can have the same $\tilde{\mathcal{L}}$. To overcome such a limitation, Li and Zhang defined the normalized Laplacian matrix for digraphs (i.e., the Diplacian) $\Gamma$ [31] as

$$\Gamma = \Phi^{1/2} (I - P)\, \Phi^{-1/2},$$

or elementwise

$$\Gamma_{ij} = \pi_i^{1/2} \left( \delta_{ij} - p_{ij} \right) \pi_j^{-1/2},$$

where $\delta_{ij}$ is the Kronecker delta.
The Diplacian matrix can be decomposed as the sum of a symmetric and a skew-symmetric part, respectively indicated as $\tilde{\mathcal{L}}$ and $\Delta$:

$$\Gamma = \tilde{\mathcal{L}} + \Delta,$$

where $\tilde{\mathcal{L}}$ is the symmetrized Laplacian for directed graphs defined by Chung [30] (already introduced in Eq. 33) and

$$\Delta = \frac{\Phi^{-1/2} P^{*}\, \Phi^{1/2} - \Phi^{1/2} P\, \Phi^{-1/2}}{2}$$

captures the differences between $P$ and its transpose. Clearly, when $G$ is undirected the chain is reversible and $\Phi^{1/2} P\, \Phi^{-1/2}$ is symmetric, hence $\Delta = 0$ and thus $\Gamma = \tilde{\mathcal{L}}$.
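The decomposition above can be verified numerically. The following Python/NumPy sketch (illustrative only; the random ergodic chain is a hypothetical example) builds $\Gamma$, Chung's $\tilde{\mathcal{L}}$ and $\Delta$, and checks their symmetry properties together with the zero eigenvalue of $\tilde{\mathcal{L}}$ at $\Phi^{1/2}\mathbb{1}$:

```python
import numpy as np

# Random ergodic chain on 4 states (strictly positive transition matrix)
N = 4
rng = np.random.default_rng(0)
P = rng.random((N, N)) + 0.1
P /= P.sum(axis=1, keepdims=True)

# Stationary distribution: left eigenvector of P for eigenvalue 1
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

Phi_sqrt = np.diag(np.sqrt(pi))
Phi_isqrt = np.diag(1.0 / np.sqrt(pi))

Gamma = Phi_sqrt @ (np.eye(N) - P) @ Phi_isqrt                 # Diplacian
Lsym = np.eye(N) - (Phi_sqrt @ P @ Phi_isqrt
                    + Phi_isqrt @ P.T @ Phi_sqrt) / 2          # Eq. 33
Delta = Gamma - Lsym                                           # skew part

print(np.allclose(Lsym, Lsym.T))            # True: symmetric part
print(np.allclose(Delta, -Delta.T))         # True: skew-symmetric part
print(np.allclose(Lsym @ np.sqrt(pi), 0))   # True: Phi^{1/2} 1 in the kernel
```

For this strictly positive (hence ergodic and, in general, non-reversible) chain, $\Delta \neq 0$: the skew-symmetric part is exactly the directional information that Chung's symmetrization alone discards.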
2.4. The SPARK toolbox
SPARK is a general-purpose framework that can fit a wide range of scenarios. The toolbox is open source and can be easily downloaded from the following link: https://github.com/AndreaRani/SPARK. While existing toolboxes provide a low-level characterization of the underlying system, the SPARK toolbox is able to answer questions of increased complexity (concerning, for example, the dynamical behavior of the underlying network or its propensity to organize into interacting communities). More in detail, SPARK combines spectral graph theory and random walk concepts to provide both a static and a dynamic characterization of digraphs. As illustrated in the following paragraph, SPARK can fit different scenarios to answer a variety of questions. For example, let $G$ be a network topologically described by a binary adjacency matrix $A$. When an a priori vertex set partition is given, SPARK can be used to characterize the partition itself relying on the measures in Table 1, as well as to investigate to which extent the given partition overlaps the minimum-cut one. On the other hand, when no partition is given, SPARK can be used to characterize the propensity of a given network to organize into interacting communities. If the underlying graph is described by a weighted adjacency matrix $W$, the expressions in Table 1 can be easily turned into their weighted versions by replacing $A$ with $W$.
From a practical point of view, the structure of Table 1 naturally reflects in a twofold organization according to which the main folder SPARKtoolbox is split into two subfolders, as shown in Fig 2: SPARKtoolbox/SpectralGT and SPARKtoolbox/RandomWalk. Both SPARKtoolbox/SpectralGT and SPARKtoolbox/RandomWalk are further organized into four subfolders containing the MATLAB scripts for weighted directed (/weighted_directed), weighted undirected (/weighted_undirected), unweighted directed (/unweighted_directed) and unweighted undirected (/unweighted_undirected) graphs.
The root folder SPARKtoolbox contains five main subfolders: SPARKtoolbox/RandomWalk, SPARKtoolbox/SpectralGT, SPARKtoolbox/Results, SPARKtoolbox/Examples and SPARKtoolbox/Dependencies. The SPARKtoolbox/RandomWalk and the SPARKtoolbox/SpectralGT subfolders are further subdivided into four leaf-folders with the MATLAB functions for weighted directed, weighted undirected, unweighted directed and unweighted undirected graphs. The SPARKtoolbox/Examples and SPARKtoolbox/Results subfolders contain the MATLAB code and the results for the examples on synthetic data. The MATLAB scripts to import and analyze real data from the UCI repository have also been provided in SPARKtoolbox/Examples. Finally, SPARKtoolbox/Dependencies contains a subset of auxiliary functions used for the analysis of synthetic data.
SPARK functions for the computation of spectral graph theory indices simply require as input the adjacency matrix of the underlying graph and the affiliation vectors for the two clusters. On the other hand, functions computing random walk indices also require a value for $\alpha$ (Eq. 31), necessary for the PageRank random walk. As already introduced in Section 2.3.1, the parameter $\alpha$ manages the teleporting probability of the walker by uniformly assigning an escape probability to absorbing states. Since the teleportation term modifies the topology of the underlying network, $\alpha$ should be tuned carefully, preferring small escape probabilities (i.e., values of $\alpha$ close to 1) in order to preserve the topology of the underlying network as much as possible.
2.5. Testing SPARK in different scenarios
The second part of this paper illustrates a possible set of applications for the SPARK toolbox through two toy examples on synthetic data and two applications to real data. More in detail, in the first toy example SPARK is used to compare the features of a set of random networks against a population of networks with a clear clustered topology. The focus of this first example is to use SPARK to extract some descriptive features, both at the static and at the dynamic level, relying on the minimum-cut partition of a given network. Differently from the previous one, the second toy example investigates the effects that different partitions produce on the same network. Specifically, given a synthetic network with a predefined set of topological features, SPARK is used to investigate how the choice of the vertex set partition affects the cluster-to-cluster characterization of the underlying network. In the last two examples SPARK was tested on real-data scenarios, respectively dealing with a binary classification task on two public datasets and with the analysis of functional brain networks extracted from the EEG signals of two post-stroke patients during an eyes-open resting state condition. In the former example, SPARK was tested on two publicly available datasets as an unsupervised binary classifier exploiting the properties of the Fiedler vector. The obtained results included both classification performances and computational times, which were then used to assess SPARK's performance on medium-large datasets with different numbers of instances. In the latter example, SPARK was used to extract graph spectral indices from real data while assessing whether they could help in the analysis of stroke-induced functional alterations and their link with the residual motor ability of the subject.
2.5.1. Surrogate ground-truth generation.
The SPARK toolbox was tested on different scenarios simulating the interaction between two clusters within the same network. Synthetic data have been generated considering that, given a suitable permutation matrix $\Pi$, the adjacency matrix of a graph with $k$ interacting communities has a typical block structure. Specifically, for $k = 2$ the permuted adjacency matrix $\Pi A \Pi^T$ has the structure depicted in Fig 3, where the blocks on the main diagonal relate to within-cluster connections, while the off-diagonal blocks refer to between-cluster connections.
Any adjacency matrix can be represented as a block matrix through a vertex permutation that groups nodes depending on the cluster to which they belong (i.e., first those belonging to cluster 1, $C_1$, and then those related to cluster 2, $C_2$, or vice versa). Specifically, blocks on the main diagonal refer to within-cluster connections, while off-diagonal blocks contain between-cluster connections. In particular, the $(i, j)$ block refers to between-cluster connections from cluster $C_i$ to cluster $C_j$. The legend of colors links the number of nonzero elements (i.e., the number of exiting links) in each block to the corresponding expression according to Eqs. 38–44.
Data generation is demanded to the script Generate_simdata.m. Each surrogate network has been modelled as a bi-clustered system where the behavior of the two communities, namely $C_1$ and $C_2$, is governed by the following set of equations:

$$L_{tot} = L_w + L_b \quad \text{(Eq. 38)}$$
$$L_w = \rho\, L_{tot} \quad \text{(Eq. 39)}$$
$$L_b = (1 - \rho)\, L_{tot} \quad \text{(Eq. 40)}$$
$$L_b^{12} = \beta\, L_b \quad \text{(Eq. 41)}$$
$$L_b^{21} = (1 - \beta)\, L_b \quad \text{(Eq. 42)}$$
$$L_w^{1} = \gamma\, L_w \quad \text{(Eq. 43)}$$
$$L_w^{2} = (1 - \gamma)\, L_w \quad \text{(Eq. 44)}$$

The set of Eqs. 38–44 makes the underlying structure of a given network dependent on the set of generating parameters $(\delta, \rho, \beta, \gamma)$. Specifically, Eq. 38 simply equates the total number of existing connections in each network, $L_{tot} = \delta N(N-1)$, to the sum of within- and between-cluster connections (respectively given by $L_w$ and $L_b$). The parameter $\delta$ represents the network's density and modulates the topology of the network regardless of its modular structure. Equations 39 and 40 describe how the parameter $\rho$ manages the proportion of within- and between-cluster edges for a given network. Specifically, as $\rho$ increases the modular structure of the network becomes more pronounced, with a few edges connecting two sets of densely connected nodes. The distribution of within- and between-cluster connections over the network is tuned by $\beta$ and $\gamma$, as described in Eqs. 41–42 and Eqs. 43–44. The term-by-term summation of Eqs. 41 and 42 (respectively, of Eqs. 43 and 44) gives the total number of between-cluster (respectively, within-cluster) links. According to Eqs. 41–42, $\beta$ manages the flow imbalance in between-cluster connections. More in detail, $\beta = 0.5$ corresponds to a balanced scenario, where the number of existing links from $C_1$ to $C_2$ equals the number of connections from $C_2$ to $C_1$. Similarly, $\gamma$ tunes the imbalance in within-cluster connections according to Eqs. 43 and 44, being $\gamma = 0.5$ related to a balanced scenario in which the two clusters are equally densely populated. Any variation from $\gamma = 0.5$ (respectively, $\beta = 0.5$) reflects into an imbalance in within-cluster (respectively, between-cluster) connections.
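SPARK's actual generator is implemented in Generate_simdata.m; the following Python sketch (a simplified re-implementation under the notation of Eqs. 38–44, with all function and variable names hypothetical) illustrates how the four block budgets can be sampled:

```python
import numpy as np

def generate_biclustered(N=20, delta=0.1, rho=0.8, beta=0.5, gamma=0.5, seed=0):
    """Directed bi-clustered network following the budgets of Eqs. 38-44."""
    rng = np.random.default_rng(seed)
    L_tot = int(round(delta * N * (N - 1)))    # density delta (Eq. 38)
    L_w = int(round(rho * L_tot))              # within-cluster budget (Eq. 39)
    L_b = L_tot - L_w                          # between-cluster budget (Eq. 40)
    budgets = {
        (0, 1): int(round(beta * L_b)),        # C1 -> C2 (Eq. 41)
        (1, 0): L_b - int(round(beta * L_b)),  # C2 -> C1 (Eq. 42)
        (0, 0): int(round(gamma * L_w)),       # within C1 (Eq. 43)
        (1, 1): L_w - int(round(gamma * L_w)), # within C2 (Eq. 44)
    }
    half = N // 2
    nodes = [np.arange(half), np.arange(half, N)]
    A = np.zeros((N, N))
    for (ci, cj), n_links in budgets.items():
        # all admissible ordered pairs in this block, excluding self-loops
        pairs = [(i, j) for i in nodes[ci] for j in nodes[cj] if i != j]
        for k in rng.choice(len(pairs), size=n_links, replace=False):
            A[pairs[k]] = 1
    return A

A = generate_biclustered()
half = A.shape[0] // 2
within = A[:half, :half].sum() + A[half:, half:].sum()
print(A.sum(), within, A.sum() - within)   # total, within- and between-cluster links
```

Each block of the permuted adjacency matrix of Fig 3 is filled independently by sampling, without replacement, the number of links prescribed by its budget.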
2.5.2. Constraints on generating parameters.

The set of Eqs. 38–44 points out that, due to the intimate relationship among $L_{tot}$, $L_w$ and $L_b$, the parameters $\delta$, $\rho$, $\beta$ and $\gamma$ are not free to vary arbitrarily. It is thus necessary to put some constraints on the generating parameters as to make the surrogate networks compatible with real-world scenarios.
Since many biological networks are known to be sparse [32,33], $\delta$ should vary between 0 and 0.5, with those values corresponding to an empty and a half-full network respectively. As to simulate networks with different sparsity levels, a small set of suitable values within this range may be chosen. Further issues concern the definition of a cluster of nodes. Classic approaches define a cluster as a set of tightly connected nodes, with a few connections existing between nodes of different clusters. According to this view, $\rho$ should reasonably vary between 0.5 and 1, being $\rho = 0.5$ associated to poorly pronounced clusters (i.e., the number of between-cluster connections equals the number of within-cluster ones), while the opposite extreme ($\rho = 0$, all the edges crossing the two clusters) would represent a bipartite network. According to Eqs. 43 and 44, within-cluster connections are split as a convex combination with respect to $\gamma$: for any $L_w^{1}$ the corresponding $L_w^{2}$ is automatically assigned, thus naturally limiting $\gamma$ within $[0, 1]$. Similarly, according to Eqs. 41 and 42, $\beta$ is constrained within the same range as $\gamma$.
. Further topological constraints come from practical considerations related to network’s topology. For equally sized clusters without self-loops, the maximum number allowed for within-cluster connections is
being the number of nodes in the underlying network. Given that
is the number of existing connections in a
-density network, for the most populated cluster the following should hold:
thus leading to
For real-world networks it is not hard to meet : the constraint in Eq. 47 thus becomes
Setting the analysis of the worst-case scenario (i.e.,
) leads to
Similarly, an analogous condition should hold for between-cluster connections, which in the worst case restricts to Eq. 51. The conditions in Eqs. 49 and 51 should be simultaneously verified, thus constituting a system of two inequalities with three unknowns, which is known to have a parametric solution with respect to one of them. To deal with this issue, we empirically derived an upper boundary for the clustering parameter and then set the remaining parameters accordingly.
accordingly. More in detail, given an a priori partition
that splits the vertex set into equally sized clusters, we fixed
and run a simulation in which
synthetic networks have been generated for each value of
between
and
with a fixed step of
. For each network,
was compared with the partition provided by the Fiedler vector by means of cosine similarity measure.
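The comparison step can be sketched as follows. This is an illustrative Python sketch (the toolbox itself is MATLAB, and the helper names are ours): it extracts the Fiedler vector of a small two-clique graph, converts it to a sign partition, and compares it to an a priori partition via cosine similarity.

```python
import numpy as np

def fiedler_signs(A):
    """Sign pattern of the Fiedler vector (eigenvector of the second
    smallest Laplacian eigenvalue) of a symmetric adjacency matrix."""
    L = np.diag(A.sum(axis=1)) - A
    _, V = np.linalg.eigh(L)        # eigh returns eigenvalues in ascending order
    return np.where(V[:, 1] >= 0, 1, -1)

def cosine_similarity(u, v):
    # absolute value makes the measure invariant to a global sign flip
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# two 5-node cliques joined by a single bridge edge
n = 10
A = np.zeros((n, n))
A[:5, :5] = 1
A[5:, 5:] = 1
np.fill_diagonal(A, 0)
A[4, 5] = A[5, 4] = 1
a_priori = np.array([1] * 5 + [-1] * 5)
print(cosine_similarity(fiedler_signs(A), a_priori))  # close to 1: partitions agree
```

For a network with a pronounced modular structure, as here, the two partitions coincide and the similarity approaches 1; for a random network the measure drops.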
As appreciable from Fig 4, up to a critical value of the clustering parameter the two partitions are almost identical for each density level, suggesting that this critical value represents a suitable candidate as an upper boundary for the parameter. On the other hand, the opposite extreme identifies a scenario in which the a priori partition and the minimum-cut one are (almost) orthogonal. An intermediate situation can be found between these extremes, where the cosine similarity takes moderate values and the two partitions partially overlap. While the lower bound imposes no further constraints on the imbalance parameters, plugging the upper boundary into Eq. 49 leads to an upper boundary condition for them.
The plot presents the values of the cosine similarity (mean ± standard deviation) obtained by comparing the minimum-cut and an a priori vertex set partition over a set of synthetic networks. Surrogate networks were generated for each value of the clustering parameter, swept up to 1 with a fixed step. Different colors correspond to different density values, as indicated in the legend.
Concerning the between-cluster links, plugging the lower bound into Eq. 51 imposes no limits on the corresponding imbalance parameter. On the other hand, substituting the critical value into Eq. 51 leads to the same upper boundary already obtained in Eq. 52. A suitable choice for the two imbalance parameters should allow the simulation of both a balanced and an unbalanced scenario. As discussed in Section 2.5.1, their balanced values reflect a scenario in which the two clusters are equally densely populated and the between-cluster flow is balanced. As for the imbalance in within-cluster links, a reasonable choice is a value that mimics an imbalance in within-cluster connections while respecting the constraint in Eq. 52; the same holds for the between-cluster parameter, whose chosen value allows the simulation of a strong imbalance in between-cluster flow. Generating parameters, together with their definitions, ranges and the values used in the following examples, are summarized in Table 2.
2.5.3. Toy example #1.
In the first toy example (implemented in the SPARK_ex1.m script in the main folder) the SPARK toolbox is used to compare different kinds of networks. Specifically, a set of random networks was compared with three clustered populations, each characterized by a different value of the clustering parameter and no imbalance in within- or between-cluster connections. More in detail, the clustered populations consist of binary matrices generated using the Generate_simdata.m script introduced in Section 2.5.1. Each clustered population comprises networks with a fixed density and a different value of the clustering parameter, so that the topology of the underlying network exhibits a modular structure that depends solely on that value. Random networks, on the other hand, were generated maintaining the same density as the clustered networks, but without any superimposed modular structure. For clustered networks, as the clustering parameter increases, the a priori vertex set partition is expected to overlap the minimum-cut one, thus properly describing the behavior of the underlying group of networks. In contrast, the same partition is not expected to properly fit the random population, since those networks are not guaranteed to have a modular structure. To compare the topological properties of the two populations, the SPARK toolbox was used to extract a subset of relevant indices (from those in Table 1) from each network (regardless of its nature) and for each density level. Specifically, both global (algebraic connectivity and relaxation time) and partition-dependent indices (normalized cut, normalized association and edge measure) were calculated so as to compare the two populations from different perspectives.
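The partition-dependent indices above can be illustrated with a minimal Python sketch. It follows the standard Shi–Malik definitions of normalized cut and normalized association; SPARK's own conventions (and its handling of directed and weighted networks) may differ in normalization details.

```python
import numpy as np

def partition_measures(A, labels):
    """Normalized cut and normalized association of a two-set vertex
    partition: Ncut = sum_S cut(S, S_c)/vol(S), Nassoc = sum_S assoc(S)/vol(S)."""
    ncut, nassoc = 0.0, 0.0
    for s in np.unique(labels):
        inside = labels == s
        vol = A[inside, :].sum()               # total degree of the set
        ncut += A[np.ix_(inside, ~inside)].sum() / vol
        nassoc += A[np.ix_(inside, inside)].sum() / vol
    return ncut, nassoc

# clustered network: two 4-node complete blocks joined by one symmetric link
A = np.ones((8, 8))
np.fill_diagonal(A, 0)
A[:4, 4:] = 0
A[4:, :4] = 0
A[0, 4] = A[4, 0] = 1
labels = np.array([0] * 4 + [1] * 4)
ncut, nassoc = partition_measures(A, labels)
print(ncut < nassoc, np.isclose(ncut + nassoc, 2.0))  # clustered topology; Ncut + Nassoc = 2
```

For a clustered network fitted with the matching partition, the normalized cut is small and the normalized association is large, which is exactly the behaviour expected for the clustered populations in this example.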
2.5.4. Toy example #2.
In the second example (implemented in the SPARK_ex2.m script in the main folder) an a priori vertex set partition was compared with the minimum-cut one for different network configurations. As reported in Section 2.5.1, the combination of the generating parameters determines the topological structure of the underlying network. More in detail, one parameter modulates the emergence of the two clusters, while the other two regulate the imbalance in between- and within-cluster connections, respectively. For the whole set of possible parameter combinations, the minimum-cut partition is compared with an a priori partition describing the structure of the underlying network (representing the ground truth). As already described in Section 2.5.1, in the balanced scenario the topology of the network depends only on the clustering parameter: the greater it is, the more pronounced the presence of the clusters will be, and the more likely the Fiedler vector is to overlap the a priori partition. When the underlying network does not have a pronounced modular structure (i.e., for low values of the clustering parameter), any shift in the imbalance parameters is expected to disrupt the equilibrium in between- and/or within-cluster connections, and the Fiedler vector is not guaranteed to correctly identify the two clusters. This happens because the a priori partition does not account for the imbalance introduced by the generating parameters which, instead, influences the minimum-cut partition. To compare the effects induced by different partitions on the same index distribution, consider a generic partition-dependent index computed under both partitions. The ratio between the value obtained with the ground-truth partition and that obtained with the minimum-cut one represents a suitable way to emphasize the effects of different partitions on the same graph. When the two values coincide, the a priori vertex set partition and the minimum-cut one agree; otherwise, the two partitions behave differently. More in detail, a ratio above unity means the current index assumes larger values for the ground truth than for the minimum-cut partition, the reverse being true for a ratio below unity. For each parameter combination, a set of synthetic networks was generated using the Generate_simdata.m script. Partition-dependent indices were then calculated on each network using both the ground-truth and the minimum-cut partition. For each density value, a one-way ANOVA was used to assess the differences in the ratio distributions introduced by a modulation of the generating parameters.
2.5.5. Real data scenario: publicly available datasets.
In the third example SPARK was tested on two public datasets from the UCI machine learning repository (https://archive.ics.uci.edu/), respectively implemented in the SPARK_ex3a.m and SPARK_ex3b.m scripts in the main folder. More in detail, the toolbox was tested on two popular datasets: the Breast Cancer Wisconsin dataset (https://archive.ics.uci.edu/dataset/15/breast+cancer+wisconsin+original) and the rice dataset (https://archive.ics.uci.edu/dataset/545/rice+cammeo+and+osmancik). The Breast Cancer Wisconsin dataset has been extensively used in machine learning and biomedical research. It consists of breast cancer samples described by 9 features extracted from images of fine needle aspirate breast mass biopsies. Each instance is labelled as either malignant or benign, making it a suitable resource for developing classification algorithms. On the other hand, the rice dataset consists of two rice varieties commonly used in agricultural and genetic research [34]. In this dataset, instances of the Cammeo and Osmancik rice varieties are described through 7 different features, including various phenotypic traits, yield-related measurements and morphological characteristics that help to distinguish the two varieties. These datasets were chosen to test SPARK's adaptability to different scenarios, given that the number of instances varies significantly between the two.
To deal with normalized quantities, data were first z-scored, as is usual in machine learning preprocessing. Each dataset was then embedded into a network-wise representation using a diffusion kernel with the inverse of the Euclidean distance between each pair of points at the exponent, as described in Fig 5. The thresholded version of the weighted adjacency matrix was obtained by setting to zero those edges with weights smaller than the median of the strengths' distribution in the full version of the network. Finally, the minimum-cut partition was extracted from the topology of the unweighted adjacency matrix using SPARK. Labels extracted from the Fiedler vector were then compared with the ground truth of each dataset and classification performances were computed using the corresponding confusion matrices.
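The steps above can be reproduced end-to-end on toy data. The following Python sketch (the toolbox itself is MATLAB; the synthetic two-class data and all names are ours, not SPARK's or the UCI datasets') builds the diffusion-kernel network, thresholds it at the median strength, and labels instances by the sign of the Fiedler vector.

```python
import numpy as np

rng = np.random.default_rng(1)
# two synthetic classes in a 2-D feature space (stand-ins for z-scored instances)
X = np.vstack([rng.normal(-1.5, 1.0, (30, 2)), rng.normal(1.5, 1.0, (30, 2))])
y = np.array([0] * 30 + [1] * 30)

# network embedding: diffusion kernel on pairwise Euclidean distances
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
W = np.exp(-D)
np.fill_diagonal(W, 0)

# binarize, keeping edges at or above the median of the nonzero weights
A = (W >= np.median(W[W > 0])).astype(float)

# minimum-cut labels from the sign of the Fiedler vector of the Laplacian
L = np.diag(A.sum(axis=1)) - A
_, V = np.linalg.eigh(L)
labels = (V[:, 1] >= 0).astype(int)

# accuracy up to an arbitrary label swap
acc = max(np.mean(labels == y), np.mean(labels != y))
print(f"accuracy: {acc:.2f}")
```

The label swap in the last step mirrors the confusion-matrix comparison in the text: the Fiedler partition carries no intrinsic class names, so either sign may correspond to either class.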
Steps are sequential from left to right: data representation in the feature space, network embedding through a diffusion kernel based on the Euclidean distance, extraction of the Fiedler vector with the corresponding minimum-cut partition and classification performances comparing the ground truth with the label of the Fiedler vector.
Since the datasets are usually unbalanced, the largest class in each dataset was split into equally sized parts, each of which was tested against the smaller class by running a k-fold validation. Classification performances and the required computational times were then extracted by averaging the performances over 100 iterations, each comprising a k-fold validation on the whole dataset.
2.5.6. Real data scenario: a case study.
In the last example SPARK was tested on real data to compare functional connectivity matrices (FCMs) extracted from the EEG signals of two post-stroke patients. The two patients belong to a population of post-stroke subjects enrolled in a longitudinal study within the inpatient service of Fondazione Santa Lucia IRCCS in Rome for purposes other than those of this work. The study was approved by the local ethics board at Fondazione Santa Lucia IRCCS (CE PROG.752/2019) and the participants signed an informed consent. The two patients were chosen to be matched in aetiology (both experienced a haemorrhagic stroke), while differing in their residual motor ability as assessed by the Upper Extremity Fugl-Meyer Assessment (UEFMA) score [35]. Ranging from 0 to 66 points, the UEFMA clinical scale can be used to assess different levels of motor impairment for the upper limb (0–22 severe, 23–44 moderate, 45–66 mild motor impairment) [36]. According to this view, one subject suffers from a severe upper limb impairment, while the other has a moderate impairment, as indicated by their respective UEFMA scores. According to numerous reports in the literature about differences in FCMs related to upper limb motor impairment [37,38], we expect the difference in the residual motor ability of the two patients to be reflected in a different topological organization of the corresponding FCMs [15,17,39,40]. Since spectral graph theory and random walks proved to be useful tools for the analysis of topological and dynamic properties of FCMs [41], in this hands-on example SPARK was used to investigate the topological alterations characterizing FCMs in patients with different levels of motor impairment. The EEG signals were recorded for 2 minutes using a 64-electrode cap (reference on digitally linked earlobes, ground on left mastoid) with a sampling frequency of 256 Hz during a resting state condition with eyes open (OE) using a commercial EEG system (g.HIAMP; g.tec medical engineering GmbH, Austria). Raw signals were band-pass filtered in [1,45] Hz and ocular artifacts were removed by means of Independent Component Analysis (ICA) (Vision Analyzer 1.05 software, Brain Products GmbH, Germany). Power-line interference was removed using a 50 Hz notch filter and the EEG time series were then chunked into 1-s epochs. A semiautomatic procedure was then applied to reject trials exceeding a voltage threshold of ±100 μV. To reduce crosstalk phenomena between adjacent electrodes and avoid the identification of spurious connectivity flows, brain connectivity was extracted from a subset of 24 electrodes equally distributed over the scalp (AF7, AF8, F5, F1, F2, F6, FT7, FC3, FC4, FT8, C5, C1, C2, C6, TP7, CP3, CP4, TP8, P5, P1, P2, P6, PO7, PO8). Functional connectivity was then estimated using Partial Directed Coherence (PDC), a spectral estimator derived from the multivariate autoregressive (MVAR) model of the EEG time series [42]. PDC values were then averaged within four frequency bands (theta, alpha, beta and gamma). To discard spurious connections, PDC values were statistically assessed against chance level by applying the asymptotic method in [43]. From an operative point of view, for each connectivity matrix a set of random networks was generated to test FCMs against the corresponding null-case scenario. Specifically, random networks were generated with the only constraint of having the same density as their real counterpart, without any superimposed topological structure. An a priori vertex set partition was then superimposed on each network, dividing the whole vertex set into affected and unaffected hemispheres according to the stroke side. Finally, for each network both global and partition-dependent indices were calculated using the SPARK toolbox.
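The null-case construction (random networks matched only in density) can be sketched as follows. This is an illustrative Python sketch with hypothetical names, not the toolbox's own MATLAB routine.

```python
import numpy as np

def density_matched_random(A, rng=None):
    """Directed random network with the same size and density as A,
    no self-loops, and no superimposed topological structure."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    off = ~np.eye(n, dtype=bool)               # admissible (off-diagonal) positions
    m = int(A[off].sum())                      # number of edges to replicate
    R = np.zeros(n * n, dtype=int)
    R[rng.choice(np.flatnonzero(off.ravel()), size=m, replace=False)] = 1
    return R.reshape(n, n)

# a small clustered, FCM-like binary network
A = np.zeros((6, 6), dtype=int)
A[:3, :3] = 1
A[3:, 3:] = 1
np.fill_diagonal(A, 0)
R = density_matched_random(A, rng=0)
print(R.sum() == A.sum(), np.trace(R) == 0)  # same density, no self-loops
```

Comparing an index distribution on the real FCM against the same index on such surrogates isolates effects of topology from effects of density alone.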
3. Results
3.1. Toy example #1
This section shows the results of the toy example described in Section 2.5.3. More in detail, both global (algebraic connectivity and relaxation time) and partition-dependent indices (normalized cut, normalized association and edge measure) were calculated to compare clustered and random populations of networks. As appreciable from Fig 6, higher values of both the algebraic connectivity and the relaxation time characterize random networks, reflecting a less organized structure when compared to their clustered counterparts. Partition-dependent indices confirm this characterization, since both the normalized cut and the edge measure are larger in random than in clustered networks, while the normalized association follows the opposite trend. From Fig 6 it is also possible to appreciate that, as the clustering parameter increases, partition-dependent indices efficiently describe the topology of clustered networks. Although the results in Fig 6 refer to a single density scenario, consistent results were obtained for each density value (see the Supporting Information).
Radar chart representing the within-population average value of five different parameters (normalized association, edge measure, relaxation time, normalized cut and algebraic connectivity) extracted using the SPARK toolbox on clustered networks of fixed density and their random counterparts. The red line identifies the random population, while the remaining lines refer to the three clustered populations with increasing values of the clustering parameter (orange, cyan and blue, respectively).
3.2. Toy example #2
This paragraph shows the results of the toy example described in Section 2.5.4: the results in Fig 7 refer to a single density scenario, but similar results for the remaining density values can be found in the Supporting Information.
Index distribution for the four combinations of the imbalance parameters at a fixed density. Each panel shows the boxplots describing the distribution as a function of the within/between cluster ratio. The symbol * indicates a statistically significant result for the post hoc Tukey HSD test.
The boxplot representation in Fig 7 allows one to appreciate that the distributions for poorly clustered networks clearly differ from their counterparts calculated on networks with an underlying modular structure, for all the combinations of the imbalance parameters. Comparing the boxplots in Fig 7 with the ANOVA results in Table 3, it can also be appreciated that different values of the clustering parameter reflect into different distributions for each combination of the imbalance parameters.
Fig 8 shows the corresponding distribution for each combination of the imbalance parameters. An opposite trend characterizes this index when compared to the distributions in Fig 7, since it approaches zero from below. As expected, the normalized cut is smaller for the Fiedler partition (i.e., the index assumes negative values), since the Fiedler partition is the one that minimizes the cut between the two clusters.
Index distribution for the four combinations of the imbalance parameters at a fixed density. Each panel shows the boxplots describing the distribution as a function of the within/between cluster ratio. The symbol * indicates a statistically significant result for the post hoc Tukey HSD test.
Comparing the ANOVA results in Table 4 with the boxplot representation in Fig 8, it can also be appreciated how different values of the clustering parameter reflect into different distributions for each combination of the imbalance parameters. As in Fig 7, the distributions for poorly clustered networks are significantly different from their counterparts computed on networks with a clear modular structure.
3.3. Real data scenario: publicly available datasets
This paragraph shows the results from Section 2.5.5, where SPARK was tested on two publicly available datasets as an unsupervised binary classifier exploiting the properties of the Fiedler vector. Classification performances for the Wisconsin Breast Cancer dataset are reported in Table 5.
Once a graph representation for the data in the feature space is achieved, the computational time required to extract the Laplacian matrix and calculate the Fiedler vector for the underlying graph is limited. Despite the simplicity of the classification criterion, the performance in Table 5 reports accuracy, precision and F1 score above 90%, together with the corresponding recall. The same indices were then used to evaluate the performances on the medium-large graph extracted from the rice dataset, as reported in Table 6.
The computational time required for the extraction of the Laplacian matrix and the corresponding Fiedler vector is, as expected, larger than the previous one, given the presence of more instances and, thus, of a larger matrix to solve for in the eigendecomposition. However, classification performances still show accuracy, precision and F1 score above 90%, together with the corresponding recall.
3.4. Real data scenario: a case study
This paragraph shows the results from Section 2.5.6, providing a graphical representation of the comparison between the FCMs of the two patients and their random counterparts. The measures in Fig 9 relate to the alpha band, but similar results have been found for the remaining frequency bands: interested readers will find them in the Supporting Information.
The radar chart represents five features (normalized association, edge measure, normalized cut, directed cut from AH to UH and directed cut from UH to AH) extracted using the SPARK toolbox. The orange line refers to the FCM extracted from one patient in the alpha band, while the red one refers to its random counterpart. Similarly, the cyan line refers to the FCM extracted from the other patient in the alpha band and the blue one to its random counterpart.
As evident from Fig 9, the FCMs of the two patients have different topological features. Firstly, it should be noted that in both patients the normalized association is higher when compared to their random counterparts, while an opposite trend exists for the remaining indices. When comparing the features of the two patients, it can be noted that the normalized association is higher for the less impaired subject, who is also characterized by lower values for those measures relying on between-cluster connections, both at the static and the dynamic level. Taken together, these two facts can be summarized as a tendency for the FCMs to assume a more organized structure in the patient who better preserved residual motor ability.
4. Discussion
In this work we presented SPARK, an open-source MATLAB toolbox for the analysis of digraphs that combines spectral graph theory and random walks. With respect to other existing MATLAB frameworks for network analysis, SPARK deliberately focuses on spectral graph theory and random walk concepts, thus finding its own identity in the landscape of toolboxes for network analysis. In this context, the Brain Connectivity Toolbox [3] was pioneering in making the basics of graph theory accessible to a large audience, especially in the neuroscientific field. Its ease of use and simplicity made it extensively used, contributing to its large diffusion in modern neuroscience. Similarly, the Graph Signal Processing Toolbox [4] was tailored for researchers working on graph signal processing, providing various tools for implementing graph signal processing techniques on undirected graphs. In this perspective, SPARK finds its own identity by focusing on spectral graph theory and random walk analysis for both directed and undirected networks. More specifically, SPARK provides the MATLAB code that implements the indices in Table 1, adapting them to the case of directed, undirected, binary and weighted networks. In Section 2.5.1 we also proposed a practical way to model the behaviour of a network made of two interacting communities. The MATLAB script Generate_simdata.m implements the set of Eqs. 38–44 for the case of two equally sized clusters, but it can be easily generalized to communities of different sizes. The “mid-level” characterization approach proposed by SPARK was then tested on two toy examples using synthetic data and two hands-on scenarios with real data.
Results in Section 3.1 (referring to the first toy example) show that, as the clustering parameter increases, synthetic networks are characterized by a pronounced normalized association and a low normalized cut. Consistently with the hypotheses in Section 2.5.3, those two features reflect the presence of more within- than between-cluster links, properly describing the topology of the underlying network. On the other hand, when fitting the a priori partition to a random network, the majority of links fall into between-cluster communication, thus increasing the normalized cut while leading to low values of the normalized association. The obtained findings confirmed the hypotheses in Section 2.5.3, thus encouraging the use of spectral graph theory and random walk tools for the analysis of cluster-to-cluster interactions in networks. Beyond clustering [6,24], dimensionality reduction [44] and data representation problems [45], spectral graph theory together with random walk analysis proved to be a reliable tool for the cluster-level characterization of complex networks.
Concerning the second toy example, the results in Section 3.2 show that a pronounced clustered topology for the underlying network causes the distributions to approach zero, regardless of the within- and/or between-cluster imbalance. This can be justified by combining Eq. 39 with the definition of association, as expressed in Eq. 54. The equality in Eq. 54 says that the clustering parameter has a direct effect on the within-cluster association once the density and the imbalance parameters are fixed. In fact, as the clustering parameter increases, the number of within-cluster connections increases too, regardless of the imbalance parameters, thus justifying the results in the first part of Section 3.2. As confirmed by the result of the post hoc test in Fig 7, the distribution for poorly clustered networks clearly differs from the more clustered scenarios. Specifically, in the first case the two clusters are not so pronounced, thus leading to a different choice of vertices characterizing the a priori partition and the minimum-cut one. On the other hand, when the clusters are easier to detect, the two partitions tend to overlap and the two distributions get closer to each other. A similar reasoning applies to the normalized-cut distributions. Being the cut directly related to the association (see Eq. 40), an increase in the latter is accompanied by a decrease in the former, thus justifying the trend in Fig 8. Also in this case, when the underlying network does not show a pronounced modular structure, the a priori vertex set partition and the minimum-cut one identify different subsets of vertices; the two partitions instead tend to overlap for more clustered networks, and the two distributions get closer to each other. This also allows one to appreciate that, when the network exhibits an organized topology, SPARK correctly identifies the two interacting clusters regardless of any imbalance in within- and/or between-cluster links.
Results in Section 3.3 refer to the application of SPARK to publicly available datasets. As reported in Tables 5 and 6, despite the simplicity of the model, classification performances are encouraging given that, for both datasets, the spectral clustering achieves accuracy, F1 score and precision above 90%. As expected, computational times increase as the dataset becomes larger, since an increased number of instances leads to a larger network and, thus, to a more demanding computational cost for eigenvector extraction. This may represent a potential bottleneck affecting not only SPARK's performance but, in general, any method facing the demanding computational cost of eigenvector estimation for large matrices. Further studies may investigate the performance of different techniques for the eigendecomposition of large matrices. However, despite the achieved performances, it should also be noted that spectral clustering may not be suitable for all types of datasets, as it assumes that the two classes in the original dataset share only a few edges (while most links lie in within-cluster communications). This is due to the fact that the minimum-cut partition implicitly assumes that the two classes can be efficiently identified through the projection of the data along the direction of the Fiedler vector: this condition is not always met, especially when the number of between-cluster links is high, as shown in the toy examples on synthetic data. Different kinds of embeddings can be explored to extract a network representation from a set of data points (for example, using a KNN algorithm to retain the nearest neighbors of each node) before spectral clustering, but this is outside the scope of this paper and can be further investigated in future works.
Results in Section 3.4 are in line with the scientific literature, confirming that different impairment conditions reflect in a different topological organization of FCMs [16,17,39,40]. A first result from the plot in Fig 9 is that real networks differ from their random counterparts: in other words, functional brain networks are more prone to organize into communities. Specifically, the a priori vertex set partition (which divides the whole vertex set into affected and unaffected hemispheres) reflects the presence of more within- than between-cluster links, thus indicating the presence of a superimposed topological organization characterizing brain networks. This kind of organization is more evident for the less impaired patient, since the topological features of the more impaired one make his network closer to a random network. These results are in line with previous studies, which observed that functional changes in post-stroke networks are characterized by an increase in integration and a loss in segregation properties that push the underlying network away from an optimal small-world configuration [12,16]. The experimental findings in Section 3.4 thus support the hypothesis that a difference in the residual motor ability of the two subjects reflects into a different organization of their FCMs [41].
Results on both synthetic and real data encourage the use of SPARK to characterize a given network in terms of its underlying communities, as well as to measure its propensity to organize into interacting clusters. Although the examples presented in the paper focus on network characterization, SPARK can also be used to approach dynamic phenomena that can be modelled as a random walk. A classic example is the gambler’s ruin problem [27], where a random walk model is used to predict the probability of a gambler to be either richer or broke at the end of a gambling session. Other applicative scenarios may include the prediction of financial movements and the representation of fluid particles in turbulent flows [46].
In conclusion, there are some weak points that should be mentioned. Even though different definitions exist for the Laplacian matrix of directed graphs, SPARK deliberately focuses on the symmetrized version proposed by Chung [30], neglecting the others. Although Chung's definition is one of the most largely adopted, other definitions should be considered so as to investigate to what extent the choice of a different Laplacian matrix influences the community detection and the related indices. Similarly, the random walk analysis implements the PageRank model [28], thus focusing on a diffusive process responding to a precise set of equations. Different kinds of dynamical phenomena (such as synchronization) should be considered so as to model the dynamics that best fits the underlying network. However, even considering that the random surfer model in Eq. 31 guarantees the ergodicity of the underlying chain, particular attention should be paid to the choice of the damping parameter of the PageRank algorithm. As pointed out in Section 2.3.1, in fact, the higher its value, the more accurately the topology of the network will be preserved. In this spirit, throughout the paper and the examples, a fixed value of the damping parameter was used to ensure ergodicity while guaranteeing adherence to the original topology of the underlying network.
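The role of the damping parameter can be made concrete with a minimal Python sketch of the random-surfer chain (illustrative only; the `alpha` value and all names here are ours, not the paper's).

```python
import numpy as np

def pagerank_stationary(A, alpha=0.9, tol=1e-12):
    """Stationary distribution of the random-surfer chain: with
    probability alpha the walker follows an out-link of A, with
    probability 1 - alpha it teleports uniformly at random.  A higher
    alpha keeps the dynamics closer to the original topology."""
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True).astype(float)
    # row-stochastic transition matrix; dangling rows become uniform
    P = np.where(out > 0, A / np.where(out == 0, 1, out), 1.0 / n)
    G = alpha * P + (1 - alpha) / n            # ergodic "Google" matrix
    pi = np.full(n, 1.0 / n)
    while True:                                # power iteration
        nxt = pi @ G
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt

A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
pi = pagerank_stationary(A, alpha=0.95)
print(np.isclose(pi.sum(), 1.0), (pi > 0).all())  # a proper, strictly positive distribution
```

The uniform teleportation term is what guarantees ergodicity (and hence a unique stationary distribution) even when the original digraph is not strongly connected.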
Although we mainly focused on biomedical applications, SPARK is a general-purpose framework, developed to provide a useful instrument for researchers interested in network science, with particular attention to spectral graph theory and random walk applications. Its applications are not circumscribed to the biomedical field, since SPARK could fit a variety of different scenarios, ranging from sensor networks to social media networks, protein interaction networks and so on. Its general-purpose nature is a distinguishing feature that makes SPARK flexible enough to be improved according to user feedback and suggestions, providing a user-friendly toolbox for a wide range of applications.
Supporting information
S1 Text. Existence and uniqueness of the stationary distribution for the probability transition matrix P.
https://doi.org/10.1371/journal.pone.0319031.s001
(DOCX)
S1 Fig. Radar chart summarizing SPARK test on the toy example #1 for a further density value.
Radar chart representing the within-population average value of five different parameters (normalized association, edge measure, relaxation time, normalized cut and algebraic connectivity) extracted using the SPARK toolbox on clustered networks of fixed density and their random counterparts. The red line identifies the random population, while the remaining lines refer to the three clustered populations (orange, cyan and blue).
https://doi.org/10.1371/journal.pone.0319031.s002
(DOCX)
S2 Fig. Radar chart summarizing SPARK test on the toy example #1 for a further density value.
Radar chart representing the within-population average value of five different parameters (normalized association, edge measure, relaxation time, normalized cut and algebraic connectivity) extracted using the SPARK toolbox on clustered networks of fixed density and their random counterparts. The red line identifies the random population, while the remaining lines refer to the three clustered populations (orange, cyan and blue).
https://doi.org/10.1371/journal.pone.0319031.s003
(DOCX)
S3 Fig. Boxplot representation for the
index distribution when: a)
, b)
, c)
and d)
for
.
Each panel shows the boxplots describing the distribution as a function of the within/between cluster ratio
. The symbol * indicates a statistically significant result (i.e.,
) for the post hoc Tukey HSD test.
https://doi.org/10.1371/journal.pone.0319031.s004
(DOCX)
S4 Fig. Boxplot representation for the
index distribution when: a)
, b)
, c)
and d)
for
.
Each panel shows the boxplots describing the distribution as a function of the within/between cluster ratio
. The symbol * indicates a statistically significant result (i.e.,
) for the post hoc Tukey HSD test.
https://doi.org/10.1371/journal.pone.0319031.s005
(DOCX)
S5 Fig. Boxplot representation for the
index distribution when: a)
, b)
, c)
and d)
for
.
Each panel shows the boxplots describing the distribution as a function of the within/between cluster ratio
. The symbol * indicates a statistically significant result (i.e.,
) for the post hoc Tukey HSD test.
https://doi.org/10.1371/journal.pone.0319031.s006
(DOCX)
S6 Fig. Boxplot representation for the
index distribution when: a)
, b)
, c)
and d)
for
.
Each panel shows the boxplots describing the distribution as a function of the within/between cluster ratio
. The symbol * indicates a statistically significant result (i.e.,
) for the post hoc Tukey HSD test.
https://doi.org/10.1371/journal.pone.0319031.s007
(DOCX)
S7 Fig. Radar chart summarizing SPARK test on the FCM comparison introduced in Section e.5.
Radar chart represents five features (normalized association, edge measure, normalized cut, directed cut from AH to UH and directed cut from UH to AH) extracted using SPARK toolbox. The orange line refers to the FCM extracted from in beta band while the red one refers to its random counterpart. Similarly, the cyan line refers to the FCM extracted from
in beta band and the blue one to its random counterpart.
https://doi.org/10.1371/journal.pone.0319031.s008
(DOCX)
S8 Fig. Radar chart summarizing SPARK test on the FCM comparison introduced in Section e.5.
Radar chart represents five features (normalized association, edge measure, normalized cut, directed cut from AH to UH and directed cut from UH to AH) extracted using SPARK toolbox. The orange line refers to the FCM extracted from in gamma band while the red one refers to its random counterpart. Similarly, the cyan line refers to the FCM extracted from
in gamma band and the blue one to its random counterpart.
https://doi.org/10.1371/journal.pone.0319031.s009
(DOCX)
S9 Fig. Radar chart summarizing SPARK test on the FCM comparison introduced in Section e.5.
Radar chart represents five features (normalized association, edge measure, normalized cut, directed cut from AH to UH and directed cut from UH to AH) extracted using SPARK toolbox. The orange line refers to the FCM extracted from in theta band while the red one refers to its random counterpart. Similarly, the cyan line refers to the FCM extracted from
in theta band and the blue one to its random counterpart.
https://doi.org/10.1371/journal.pone.0319031.s010
(DOCX)
S1 Table. ANOVA results for the
distributions in S3 Fig. Four different one-way ANOVAs were run, one for each combination of
and
in the toy example #2.
The corresponding p and F values are shown in this table.
https://doi.org/10.1371/journal.pone.0319031.s011
(DOCX)
S2 Table. ANOVA results for the
distributions in S4 Fig. Four different one-way ANOVAs were run, one for each combination of
and
in the toy example #2.
The corresponding p and F values are shown in this table.
https://doi.org/10.1371/journal.pone.0319031.s012
(DOCX)
S3 Table. ANOVA results for the
distributions in S5 Fig. Four different one-way ANOVAs were run, one for each combination of
and
in the toy example #2.
The corresponding p and F values are shown in this table.
https://doi.org/10.1371/journal.pone.0319031.s013
(DOCX)
S4 Table. ANOVA results for the
distributions in S6 Fig. Four different one-way ANOVAs were run, one for each combination of
and
in the toy example #2.
The corresponding p and F values are shown in this table.
https://doi.org/10.1371/journal.pone.0319031.s014
(DOCX)
References
- 1. Wang XF, Chen G. Complex networks: small-world, scale-free and beyond. IEEE Circuits Syst Mag. 2003;3(1):6–20.
- 2. Motter AE, Matías MA, Kurths J, Ott E. Dynamics on complex networks and applications. Physica D: Nonlinear Phenomena. 2006;224(1–2):vii–viii.
- 3. Rubinov M, Sporns O. Complex network measures of brain connectivity: uses and interpretations. Neuroimage. 2010;52(3):1059–69. pmid:19819337
- 4. Perraudin N, Paratte J, Shuman D, Martin L, Kalofolias V, Vandergheynst P. GSPBOX: a toolbox for signal processing on graphs. arXiv. 2014.
- 5. de Loynes B, Navarro F, Olivier B. Gasper: GrAph Signal ProcEssing in R. 2020 [cited 5 Apr 2024].
- 6. Lambiotte R, Delvenne J-C, Barahona M. Random walks, Markov processes and the multiscale modular organization of complex networks. IEEE Trans Netw Sci Eng. 2014;1(2):76–90.
- 7. Doostmohammadian M, Gabidullina ZR, Rabiee HR. Nonlinear perturbation-based non-convex optimization over time-varying networks. IEEE Trans Netw Sci Eng. 2024;11(6):6461–9.
- 8. Doostmohammadian M, Aghasi A, Rikos AI, Grammenos A, Kalyvianaki E, Hadjicostis CN, et al. Distributed anytime-feasible resource allocation subject to heterogeneous time-varying delays. IEEE Open J Control Syst. 2022;1:255–67.
- 9. Li M, Micheli A, Wang YG, Pan S, Lió P, Gnecco GS, et al. Guest editorial: deep neural networks for graphs: theory, models, algorithms, and applications. IEEE Trans Neural Netw Learning Syst. 2024;35(4):4367–72.
- 10. Li M, Ma Z, Wang YG, Zhuang X. Fast Haar transforms for graph neural networks. Neural Netw. 2020;128:188–98. pmid:32447263
- 11. Li J, Zheng R, Feng H, Li M, Zhuang X. Permutation equivariant graph framelets for heterophilous graph learning. IEEE Trans Neural Netw Learn Syst. 2024;35(9):11634–48. pmid:38466605
- 12. Aerts H, Fias W, Caeyenberghs K, Marinazzo D. Brain networks under attack: robustness properties and the impact of lesions. Brain. 2016;139(Pt 12):3063–83. pmid:27497487
- 13. Watts DJ, Strogatz SH. Collective dynamics of “small-world” networks. Nature. 1998;393(6684):440–2. pmid:9623998
- 14. Gratton C, Nomura EM, Pérez F, D’Esposito M. Focal brain lesions to critical locations cause widespread disruption of the modular organization of the brain. J Cogn Neurosci. 2012;24(6):1275–85. pmid:22401285
- 15. Pichiorri F, Morone G, Petti M, Toppi J, Pisotta I, Molinari M, et al. Brain-computer interface boosts motor imagery practice during stroke recovery. Ann Neurol. 2015;77(5):851–65. pmid:25712802
- 16. Siegel JS, Seitzman BA, Ramsey LE, Ortega M, Gordon EM, Dosenbach NUF, et al. Re-emergence of modular brain networks in stroke recovery. Cortex. 2018;101:44–59. pmid:29414460
- 17. Pirovano I, Mastropietro A, Antonacci Y, Barà C, Guanziroli E, Molteni F, et al. Resting state EEG directed functional connectivity unveils changes in motor network organization in subacute stroke patients after rehabilitation. Front Physiol. 2022;13:862207. pmid:35450158
- 18. de Haan W, van der Flier WM, Wang H, Van Mieghem PFA, Scheltens P, Stam CJ. Disruption of functional brain networks in Alzheimer’s disease: what can we learn from graph spectral analysis of resting-state magnetoencephalography?. Brain Connect. 2012;2(2):45–55. pmid:22480296
- 19. Daianu M, Mezher A, Jahanshad N, Hibar DP, Nir TM, Jack CR Jr, et al. Spectral graph theory and graph energy metrics show evidence for the Alzheimer's disease disconnection syndrome in APOE-4 risk gene carriers. Proc IEEE Int Symp Biomed Imaging. 2015;2015:458–61. pmid:26413205
- 20. Malliaros FD, Vazirgiannis M. Clustering and community detection in directed networks: a survey. Physics Reports. 2013;533(4):95–142.
- 21. Spielman DA. Algorithms, graph theory, and linear equations in Laplacian matrices. Proceedings of the International Congress of Mathematicians 2010 (ICM 2010). Hyderabad, India: Hindustan Book Agency; 2011. p. 2698–722.
- 22. Fiedler M. Laplacian of graphs and algebraic connectivity. Banach Center Publ. 1989;25(1):57–70.
- 23. Gleich D. Hierarchical Directed Spectral Graph Partitioning. Stanford University; 2006.
- 24. Shi J, Malik J. Normalized cuts and image segmentation. IEEE Trans Pattern Anal Machine Intell. 2000;22(8):888–905.
- 25. Hagen L, Kahng AB. New spectral methods for ratio cut partitioning and clustering. IEEE Trans Comput-Aided Des Integr Circuits Syst. 1992;11(9):1074–85.
- 26. Seabrook E, Wiskott L. A tutorial on the spectral theory of markov chains. Neural Comput. 2023;35(11):1713–96. pmid:37725706
- 27.
Levin DA, Peres Y. Markov chains and mixing times. 2nd ed. Providence, Rhode Island: American Mathematical Society; 2017.
- 28. Lai D, Lu H, Nardini C. Finding communities in directed networks by PageRank random walk induced network embedding. Phys A Stat Mechanics Appl. 2010;389(12):2443–54.
- 29. Langville A, Meyer C. Deeper inside PageRank. Internet Math. 2004;1(3):335–80.
- 30. Chung F. Laplacians and the Cheeger inequality for directed graphs. Ann Comb. 2005;9(1):1–19.
- 31. Li Y, Zhang Z-L. Digraph Laplacian and the degree of asymmetry. Internet Mathematics. 2012;8(4):381–401.
- 32. Leclerc RD. Survival of the sparsest: robust gene networks are parsimonious. Mol Syst Biol. 2008;4:213. pmid:18682703
- 33. Pavlopoulos GA, Secrier M, Moschopoulos CN, Soldatos TG, Kossida S, Aerts J, et al. Using graph theory to analyze biological networks. BioData Min. 2011;4:10. pmid:21527005
- 34. Cinar I, Koklu M. Classification of rice varieties using artificial intelligence methods. ijisae. 2019;7(3):188–94.
- 35. Fugl-Meyer AR, Jääskö L, Leyman I, Olsson S, Steglind S. The post-stroke hemiplegic patient. 1. A method for evaluation of physical performance. Scand J Rehabil Med. 1975;7(1):13–31. pmid:1135616
- 36. Hernández ED, Galeano CP, Barbosa NE, Forero SM, Nordin Å, Sunnerhagen KS, et al. Intra- and inter-rater reliability of Fugl-Meyer Assessment of Upper Extremity in stroke. J Rehabil Med. 2019;51(9):652–9. pmid:31448807
- 37. Milani G, Antonioni A, Baroni A, Malerba P, Straudi S. Relation between EEG measures and upper limb motor recovery in stroke patients: a scoping review. Brain Topogr. 2022;35(5–6):651–66. pmid:36136166
- 38. Westlake KP, Nagarajan SS. Functional connectivity in relation to motor performance and recovery after stroke. Front Syst Neurosci. 2011;5:8. pmid:21441991
- 39. Grefkes C, Fink GR. Connectivity-based approaches in stroke and recovery of function. Lancet Neurol. 2014;13(2):206–16. pmid:24457190
- 40. Silasi G, Murphy TH. Stroke and the connectome: how connectivity guides therapeutic intervention. Neuron. 2014;83(6):1354–68. pmid:25233317
- 41. Ranieri A, Pichiorri F, Mongiardini E, Colamarino E, Cincotti F, Mattia D, et al. Spectral graph theory to investigate topological and dynamic properties of EEG-based brain networks: an application to post-stroke patients. 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Orlando, FL, USA: IEEE; 2024. pp. 1–4.
- 42. Baccalá LA, Sameshima K. Partial directed coherence: a new concept in neural structure determination. Biol Cybern. 2001;84(6):463–74. pmid:11417058
- 43. Toppi J, Mattia D, Risetti M, Formisano R, Babiloni F, Astolfi L. Testing the significance of connectivity networks: comparison of different assessing procedures. IEEE Trans Biomed Eng. 2016;63(12):2461–73. pmid:27810793
- 44. Belkin M, Niyogi P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation. 2003;15(6):1373–96.
- 45. Dhillon IS, Guan Y, Kulis B. Kernel k-means: spectral clustering and normalized cuts. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. Seattle, WA, USA: ACM; 2004. p. 551–6.
- 46. Chanson H. Turbulent dispersion and mixing: 1. Vertical and transverse mixing. In: Environmental Hydraulics of Open Channel Flows. Elsevier; 2004. p. 81–98.