Topological state-space estimation of functional human brain networks

We introduce an innovative, data-driven topological data analysis (TDA) technique for estimating the state spaces of dynamically changing functional human brain networks at rest. Our method utilizes the Wasserstein distance to measure topological differences, enabling the clustering of brain networks into distinct topological states. This technique outperforms the commonly used k-means clustering in identifying brain network state spaces by effectively incorporating the temporal dynamics of the data without the need for explicit model specification. We further investigate the genetic underpinnings of these topological features using a twin study design, examining the heritability of such state changes. Our findings suggest that the topology of brain networks, particularly in their dynamic state changes, may hold significant hidden genetic information.


Author Summary
The paper introduces a new data-driven topological data analysis (TDA) method for studying dynamically changing human functional brain networks obtained from resting-state functional magnetic resonance imaging (rs-fMRI). Leveraging persistent homology, a multiscale topological approach, we present a framework that incorporates the temporal dimension of brain network data. This allows for a more robust estimation of the topological features of dynamic brain networks.
The method employs the Wasserstein distance to measure the topological differences between networks and demonstrates greater efficiency and performance than the commonly used k-means clustering in defining the state spaces of dynamic brain networks. Our method maintains robust performance across different scales and is especially suited for dynamic brain networks.
In addition to the methodological advancement, the paper applies the proposed technique to analyze the heritability of overall brain network topology using a twin study design. The study investigates whether the dynamic pattern of brain networks is a genetically influenced trait, an area previously underexplored. By examining the state change patterns in twin brain networks, we make significant strides in understanding the genetic factors underlying dynamic brain network features. Furthermore, the paper makes its method accessible by providing MATLAB code, contributing to reproducibility and broader application.

Introduction
In standard graph theory-based network analysis, network features such as node degrees and clustering coefficients are obtained from adjacency matrices after thresholding weighted edges [5,93,104,23]. The final statistical analysis results can vary depending on the choice of threshold or parameter [22,62]. This variability underscores the need for a multiscale network analysis framework that provides consistent results and interpretation, regardless of the choice of parameter. Persistent homology, a branch of algebraic topology, presents a novel solution to this challenge of multiscale analysis [36]. Unlike traditional graph theory approaches that analyze networks at a single fixed scale, persistent homology examines networks across multiple scales. It identifies topological features that remain persistent and are robust against different scales and noise perturbations [73,86,87,98].
Recent studies have illustrated the versatility of persistent homology in analyzing complex networks, including brain networks. Sizemore et al. [87] and Xing et al. [105] highlighted the application of persistent homology in evaluating temporal changes in topological network features. Aktas et al. [2] used persistent homology to detect and track the evolution of network cliques. Billings et al. [7] discussed the use of simplicial complexes encoded by persistent homology for brain networks. Sizemore et al. [86] applied persistent homology to investigate the spatial distributions of cliques and cycles in brain networks. Khalid et al. [56] and Caputi et al. [15] showed how persistent homology could be used in the analysis of functional brain connectivity using EEG. Chung et al. [30] utilized persistent homology to analyze brain networks for studying abnormal white matter in maltreated children. These studies collectively emphasize the potential of persistent homology in providing a robust framework for multiscale network analysis. The approach's ability to capture topological features across different scales and under varying conditions makes it particularly suitable for studying complex brain networks.
Persistent homological network approaches have been shown to be more robust than, and to outperform, many existing graph theory measures and methods. In Lee et al. [61,62], persistent homology was shown to outperform eight existing graph theory features, such as the clustering coefficient, small-worldness, and modularity. Kuang et al. [58] showed that persistent homology-based measures can provide more significant group differences and better classification performance compared to standard graph-based measures that characterize small-world organization and modular structure. In Chung et al. [24,25], persistent homology was shown to outperform various matrix norm-based network distances. In Wang et al. [103], persistent homology was shown to outperform the power spectral density and local variance methods. In Wang et al. [102], persistent homology was shown to outperform topographic power maps. In [110], center persistency was shown to outperform the network-based statistic and element-wise multiple corrections. In Chung et al. [29], persistent homology-based clustering was shown to outperform k-means clustering and hierarchical clustering. However, the method has been mainly used on static networks or a static summary of time-varying networks. The dynamic pattern of persistent homology for time-varying brain networks was rarely investigated, with a few exceptions [109,82,90,41,86,29].
While Euclidean loss remains the dominant cost function in deep learning, topological losses based on persistent homology are emerging as superior in tasks requiring topological understanding [16,49,43,64]. These topological losses incorporate penalties based on the topological features of the data, distinguishing them from the Euclidean loss, which primarily focuses on differences at the node or edge level. By encoding the intrinsic topological structure of the network, topological losses facilitate the creation of more informative feature maps, potentially enhancing overall model performance [48]. Gupta et al. [43] demonstrated that image segmentation based on topological loss outperforms other deep learning architectures for similar tasks. Lin et al. [64] introduced a new architecture that excels in segmenting curvilinear structures by learning topological similarities over existing methods.
In this paper, we propose a novel dynamic persistent homology framework for time-varying network data. Our coherent, scalable computational framework is based on the Wasserstein distance between persistence diagrams, which summarize the topological profile of the data as 2D scatter plots. We directly establish the relationship between the Wasserstein distance and edge weights in networks, making the method far more accessible and adaptable. We achieve $\mathcal{O}(q \log q)$ run time, where $q$ is the number of edges, in most graph manipulation tasks such as matching and averaging. Such scalable computation enables us to perform computationally demanding tasks such as topological clustering with ease. The method is applied to the determination of the state space of dynamically changing functional brain networks obtained from resting-state functional magnetic resonance imaging (rs-fMRI). We will show that the proposed method based on the Wasserstein distance can capture the topological patterns that are consistently observed across different time points.
The Wasserstein distance, or Kantorovich-Rubinstein metric, as originally defined between probability distributions, can be used to measure topological differences [99,14,6]. Due to its connection to optimal mass transport, which enjoys various optimality properties, the Wasserstein distance has been applied in various imaging applications. Nonetheless, its application in network data analysis remains relatively limited [67,30]. Mi et al. [69] used the Wasserstein distance in resampling brain surface meshes. Shi et al. [85] and Su et al. [94] used the Wasserstein distance in classifying brain cortical surface shapes. Hartmann et al. [46] used the Wasserstein distance in building generative adversarial networks. Sabbagh et al. [80] used the Wasserstein distance for manifold regression problems in the space of positive definite matrices for the source localization problem in EEG. Xu et al. [107] used the Wasserstein distance in predicting Alzheimer's disease progression in magnetoencephalography (MEG) brain networks. Fu et al. [39] enhanced images by regularizing with the Wasserstein distance. In all of these applications, however, the Wasserstein distance is geometric in nature.
We applied the method to dynamically changing twin brain networks obtained from resting-state functional magnetic resonance imaging (rs-fMRI). For the first time, we investigated whether the state change pattern in time-varying brain networks is genetically heritable; this has not yet been reported in the existing literature. Monozygotic (MZ) twins share 100% of their genes, while dizygotic (DZ) twins share 50% of their genes [38]. MZ-twins are more similar, or concordant, than DZ-twins for cognitive aging and dysfunction [78]. The difference between MZ- and DZ-twins directly quantifies the extent to which imaging phenotypes, behaviors, and cognitions are influenced by genetic factors [112]. If MZ-twins show more similarity on a given trait compared to DZ-twins, this provides evidence that genes significantly influence that trait. Even twin studies on normal subjects are useful for understanding the extent to which psychological and medical disorders, as well as behaviors and traits, are influenced by genetic factors. This information can be used to develop better ways to prevent and treat disorders and maladaptive behaviors. Some of the most effective treatments for medical disorders have been identified as a result of twin studies [81].
Even though there are numerous twin imaging studies, almost all previous studies used static univariate imaging phenotypes such as cortical thickness [68], fractional anisotropy [18], and functional activation [9,42,88] in determining heritability in brain networks. There have been a limited number of studies investigating the heritability of the dynamics of brain networks [9,100]. It is not even clear whether the dynamic pattern itself is a heritable trait. We propose to tackle this challenge. Measures of network dynamics are worth investigating as potential phenotypes that indicate the genetic risk for neuropsychiatric disorders [10]. Determining the extent of heritability of the dynamic pattern is the first necessary prerequisite for identifying dynamic network phenotypes.
One of the earliest papers on functional brain activation in twins is based on resting-state EEG [66], where high twin correlation in MZ-twins was observed on EEG spectra. Glahn et al. [42] reported a heritability of 0.42 for the default-mode network (DMN) in an extended pedigree study without twins. Xu et al. [106] reported a heritability of 0.54 for the DMN using 24 pairs of MZ-twins and 22 pairs of DZ-twins. Korgaonkar et al. [57] studied 79 MZ and 46 DZ twin pairs, reporting heritability in only one specific connection: they found a statistically significant heritability of 0.41 for the connection between the precuneus and the right inferior parietal/temporal cortex, using a structural equation model. We report far stronger results with much higher heritability in a larger twin study.

Fig. 1 Proposed topological clustering pipeline used in estimating the state space. Given two weighted graphs $\mathcal{X}_1, \mathcal{X}_2$, we first perform the birth-death decomposition and partition the edges into sorted birth and death sets (section 3.3). The 0D topological distance $D_0$ between birth values quantifies discrepancies in connected components (section 3.4). The 1D topological distance $D_1$ between death values quantifies discrepancies in cycles (section 3.4). The combined distance $\mathcal{D} = D_0^2 + D_1^2$ is used in computing the within-cluster distance $l_W$ between graphs. Topological clustering is performed by minimizing $l_W$ over all possible cluster labels $C_1, \cdots, C_k$ (section 3.6).

Graphs as simplicial complexes
The proposed method for estimating the topological state space is based on topological clustering of a collection of graphs (Figure 1). The initial step involves a birth-death decomposition of a weighted graph, leading to the generation of sorted birth and death sets (section 3.3). The second step entails calculating the topological distance: between birth sets to obtain the 0D topological distance $D_0$, and between death sets to obtain the 1D topological distance $D_1$ (section 3.4). The third step involves computing the within-cluster distance $l_W$ among the collection of graphs (section 3.6). Subsequently, we demonstrate the equivalence of topological clustering with k-means clustering in a high-dimensional convex set, employing k-means clustering routines for optimization. To increase reproducibility, MATLAB code for performing the methods is provided at https://github.com/laplcebeltrami/PH-STAT.
A high-dimensional object such as a brain network can be modeled as a weighted graph $\mathcal{X} = (V, w)$ consisting of a node set $V$ indexed as $V = \{1, 2, \cdots, p\}$ and edge weights $w = (w_{ij})$ between nodes $i$ and $j$. If we order the edge weights in increasing order, we have the sorted edge weights
$$w_{(1)} \le w_{(2)} \le \cdots \le w_{(q)},$$
where $q \le (p^2 - p)/2$. The subscript $(\,\cdot\,)$ denotes the order statistic. In terms of the sorted edge weight set $W = \{w_{(1)}, \cdots, w_{(q)}\}$, we may also write the graph as $\mathcal{X} = (V, W)$.
If we connect nodes following some criterion on the edge weights, they will form a simplicial complex that follows the topological structure of the underlying weighted graph [36,114]. Note that a $k$-simplex is the convex hull of $k+1$ affinely independent points. A simplicial complex is a finite collection of simplices such as points (0-simplices), lines (1-simplices), triangles (2-simplices), and their higher dimensional counterparts. The Rips complex $\mathcal{X}_\epsilon$ is a simplicial complex whose $k$-simplices are formed by $(k+1)$ nodes which are pairwise within distance $\epsilon$ [40]. While a graph has at most 1-simplices, the Rips complex can have at most $(p-1)$-simplices. The Rips complex induces a hierarchical nesting structure called the Rips filtration
$$\mathcal{X}_{\epsilon_0} \subset \mathcal{X}_{\epsilon_1} \subset \mathcal{X}_{\epsilon_2} \subset \cdots$$
for $\epsilon_0 \le \epsilon_1 \le \epsilon_2 \le \cdots$, where the sequence of $\epsilon$-values are called the filtration values. The filtration is quantified through a topological basis called $k$-cycles. 0-cycles are the connected components, 1-cycles are 1D closed paths or loops, while 2-cycles are 3-simplices (tetrahedra) without interior. Any $k$-cycle can be represented as a linear combination of basis $k$-cycles. The Betti number $\beta_k$ counts the number of independent $k$-cycles. During the Rips filtration, the $i$-th $k$-cycle is born at filtration value $b_i$ and dies at $d_i$. The collection of all the paired filtration values $(b_i, d_i)$ displayed as 1D intervals is called the barcode; displayed as a 2D scatter plot, it is called the persistent diagram. Since $b_i < d_i$, the points in the persistent diagram are displayed above the line $y = x$ by taking births on the $x$-axis and deaths on the $y$-axis.
For a dynamically changing brain network $\mathcal{X}(t) = (V, w(t))$, we assume the node set is fixed while the edge weights change over time $t$. If we build persistent homology at each fixed time point, the resulting barcode is also time dependent: the births and deaths become functions $b_i(t)$ and $d_i(t)$ of time.

Graph filtrations
As the number of nodes $p$ increases, the resulting Rips complex becomes increasingly dense. Additionally, as the filtration values rise, the number of edges connecting pairs of nodes also increases, leading to a more interconnected structure. At higher filtration values, the Rips filtration becomes an ineffective representation of networks. To remedy this problem, the graph filtration was introduced [61,62]. Given a weighted graph $\mathcal{X} = (V, w)$ with edge weights $w = (w_{ij})$, the binary network $\mathcal{X}_\epsilon = (V, w_\epsilon)$ is a graph consisting of the node set $V$ and the binary edge weights $w_\epsilon = (w_{\epsilon,ij})$ given by
$$w_{\epsilon,ij} = \begin{cases} 1 & \text{if } w_{ij} > \epsilon, \\ 0 & \text{otherwise.} \end{cases}$$
Note that $w_\epsilon$ is the adjacency matrix of $\mathcal{X}_\epsilon$, which is a simplicial complex consisting of 0-simplices (nodes) and 1-simplices (edges) [40]. While the binary network $\mathcal{X}_\epsilon$ has at most 1-simplices, the Rips complex can have at most $(p-1)$-simplices. By choosing threshold values at the sorted edge weights, we obtain the sequence of nested graphs [22]:
$$\mathcal{X}_{w_{(1)}} \supset \mathcal{X}_{w_{(2)}} \supset \cdots \supset \mathcal{X}_{w_{(q)}}.$$
The sequence of such nested multiscale graphs is called the graph filtration [61,62]. Note that $\mathcal{X}_{w_{(1)} - \epsilon}$ is the complete weighted graph for any $\epsilon > 0$, while $\mathcal{X}_{w_{(q)}}$ is the node set $V$. By increasing the threshold value, we are thresholding at higher connectivity; thus more edges are removed.
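To make the construction concrete, here is a minimal Python sketch (illustrative only, not the paper's MATLAB implementation) that builds the binary networks $\mathcal{X}_\epsilon$ of a toy 4-node weighted graph at each sorted edge weight and counts connected components. The resulting Betti-0 curve is non-decreasing over the filtration, as discussed in the next section.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def betti0_curve(W):
    """Betti-0 (number of connected components) of the binary network X_eps
    at each threshold eps taken from the sorted edge weights."""
    p = W.shape[0]
    thresholds = np.sort(W[np.triu_indices(p, k=1)])
    curve = []
    for eps in thresholds:
        A = (W > eps).astype(int)          # keep edges with weight > eps
        np.fill_diagonal(A, 0)
        n_comp, _ = connected_components(csr_matrix(A), directed=False)
        curve.append(n_comp)
    return thresholds, np.array(curve)

# Toy 4-node weighted graph (symmetric, zero diagonal)
W = np.array([[0.0, 0.9, 0.2, 0.1],
              [0.9, 0.0, 0.8, 0.3],
              [0.2, 0.8, 0.0, 0.7],
              [0.1, 0.3, 0.7, 0.0]])
eps, b0 = betti0_curve(W)
print(b0.tolist())   # non-decreasing: components only split as edges vanish
```

Raising the threshold removes edges one at a time, so connected components can only split, never merge; the printed curve increases monotonically.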
For dynamically changing brain networks (Figure 2), we can similarly build time-varying graph filtrations $\{\mathcal{X}_\epsilon(t)\}$ at each time point $t$.

Birth-death decomposition
Unlike the Rips complex, there are no higher dimensional topological features beyond the 0D and 1D topology in the graph filtration. The 0D and 1D persistent diagrams tabulate the lifetimes $(b_i, d_i)$ of 0-cycles (connected components) and 1-cycles (loops) that are born at filtration value $b_i$ and die at value $d_i$, respectively. The 0th Betti number $\beta_0(\mathcal{X}_{w_{(i)}})$ counts the number of 0-cycles at filtration value $w_{(i)}$ and can be shown to be non-decreasing over the filtration (Figure 3) [25]:
$$\beta_0(\mathcal{X}_{w_{(1)}}) \le \beta_0(\mathcal{X}_{w_{(2)}}) \le \cdots \le \beta_0(\mathcal{X}_{w_{(q)}}).$$
On the other hand, the 1st Betti number $\beta_1(\mathcal{X}_{w_{(i)}})$ counts the number of independent loops and can be shown to be non-increasing over the filtration [25]:
$$\beta_1(\mathcal{X}_{w_{(1)}}) \ge \beta_1(\mathcal{X}_{w_{(2)}}) \ge \cdots \ge \beta_1(\mathcal{X}_{w_{(q)}}).$$
In a complete graph with $p$ nodes, there are $q = p(p-1)/2$ unique edge weights. There are $q_0 = p - 1$ edges that produce 0-cycles; this is equivalent to the number of edges in the maximum spanning tree (MST) of the graph. Thus, the remaining $q_1 = q - q_0$ edges destroy loops. This partitions the edge weight set $W$ into the birth set $W_b$ of $q_0$ birth values, which completely characterizes the 0D persistent diagram, and the death set $W_d$ of $q_1$ death values, which completely characterizes the 1D persistent diagram. We can show that the birth set is the MST (Figure 3) [91].

Fig. 3 The birth-death decomposition partitions the edge set into the birth and death edge sets. The birth set forms the maximum spanning tree (MST) and contains edges that create connected components (0D topology). The death set contains edges that do not belong to the MST and destroy loops (1D topology).
The identification of $W_b$ is based on a modification of Kruskal's or Prim's algorithm that identifies the MST [62,91]. Then $W_d$ is identified as $W \setminus W_b = W \cap W_b^c$. Figure 4 displays how the birth and death sets change over time for a single subject used in the study. Given the edge weight matrix $w$ as input, the MATLAB function WS_decompose.m outputs the birth set $W_b$ and the death set $W_d$.
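The decomposition can be sketched in a few lines of Python (illustrative, not the WS_decompose.m implementation): the maximum spanning tree is obtained by running a minimum spanning tree on the negated weights, and the death set is the multiset complement of the birth set.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def birth_death_decompose(W):
    """Partition the edge weights of a weighted graph into the birth set
    (edges of the maximum spanning tree) and the death set (the remaining
    edges). The MST is found by negating weights and taking the minimum
    spanning tree."""
    p = W.shape[0]
    mst = minimum_spanning_tree(csr_matrix(-W)).toarray()
    births = np.sort(-mst[mst != 0])                    # p - 1 birth values
    all_weights = np.sort(W[np.triu_indices(p, k=1)]).tolist()
    for b in births:                                    # multiset difference
        all_weights.remove(b)
    return births, np.array(all_weights)

W = np.array([[0.0, 0.9, 0.2, 0.1],
              [0.9, 0.0, 0.8, 0.3],
              [0.2, 0.8, 0.0, 0.7],
              [0.1, 0.3, 0.7, 0.0]])
Wb, Wd = birth_death_decompose(W)
print(Wb.tolist(), Wd.tolist())
```

For this toy graph the maximum spanning tree consists of the three heaviest edges, so the birth set is {0.7, 0.8, 0.9} and the death set is {0.1, 0.2, 0.3}.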

Topological distances
Like the majority of clustering methods, such as k-means and hierarchical clustering, that use geometric distances [55,45,62], we propose to develop a topological clustering method using topological distances (Figure 5). For this purpose we use the Wasserstein distance.
Given two probability distributions $X \sim f_1$ and $Y \sim f_2$, the $r$-Wasserstein distance $\mathcal{D}_r$, the probabilistic version of optimal transport, is defined as
$$\mathcal{D}_r(f_1, f_2) = \Big( \inf \mathbb{E}\, \| X - Y \|^r \Big)^{1/r},$$
where the infimum is taken over every possible joint distribution of $X$ and $Y$. The Wasserstein distance is the optimal expected cost of transporting points generated from $f_1$ to those generated from $f_2$ [14]. There are numerous distances and similarity measures defined between probability distributions, such as the Kullback-Leibler (KL) divergence and mutual information [59]. While the Wasserstein distance is a metric satisfying positive definiteness, symmetry, and the triangle inequality, the KL-divergence and mutual information are not metrics. Although they are easy to compute, the biggest limitation of the KL-divergence and mutual information is that the two probability distributions must be defined on the same sample space. If the two distributions do not have the same support, it may be difficult to even define the distance between them; if $f_1$ is discrete while $f_2$ is continuous, defining them is difficult. On the other hand, the Wasserstein distance can be computed for arbitrary distributions that may not share a common sample space, making it extremely versatile.
Consider persistent diagrams $P_1$ and $P_2$ given by scatter points $x_1, \cdots, x_q \in P_1$ and $y_1, \cdots, y_q \in P_2$. Their empirical distributions are given in terms of Dirac delta functions
$$f_1(x) = \frac{1}{q} \sum_{i=1}^{q} \delta(x - x_i), \qquad f_2(y) = \frac{1}{q} \sum_{i=1}^{q} \delta(y - y_i).$$
Then we can show that the $r$-Wasserstein distance on persistent diagrams is given by
$$\mathcal{D}_r(P_1, P_2) = \Big( \inf_{\tau} \sum_{i=1}^{q} \| x_i - \tau(x_i) \|^r \Big)^{1/r} \tag{5}$$
over every possible bijection $\tau$, which is a permutation, between $P_1$ and $P_2$ [99,14,6]. Optimization (5) is the standard assignment problem, which is usually solved by the Hungarian algorithm in $\mathcal{O}(q^3)$ [37]. However, for the graph filtration, the distance can be computed exactly in $\mathcal{O}(q \log q)$ by simply matching the order statistics of the birth or death values [76,91,92]. The $r$-Wasserstein distance between the 0D persistent diagrams of the graph filtration is given by
$$\mathcal{D}_{r0}(P_1, P_2) = \Big( \sum_{i=1}^{q_0} \big| b_{1(i)} - b_{2(i)} \big|^r \Big)^{1/r},$$
where $b_{j(i)}$ is the $i$-th smallest birth value in persistent diagram $P_j$. The $r$-Wasserstein distance between the 1D persistent diagrams of the graph filtration is given by
$$\mathcal{D}_{r1}(P_1, P_2) = \Big( \sum_{i=1}^{q_1} \big| d_{1(i)} - d_{2(i)} \big|^r \Big)^{1/r},$$
where $d_{j(i)}$ is the $i$-th smallest death value in persistent diagram $P_j$. The proof is provided in [28]. We can show that the 2-Wasserstein distance is equivalent to the Euclidean distance within a certain convex set. Let $\mathbf{b}_j = (b_{j(1)}, b_{j(2)}, \cdots, b_{j(q_0)})^\top$ be the vector of sorted birth values of persistent diagram $P_j$. Then $\mathbf{b}_j$ is a point in the $(q_0 - 1)$-simplex $T_0$ given by
$$T_0 = \{ (b_1, \cdots, b_{q_0}) : b_1 \le b_2 \le \cdots \le b_{q_0} \},$$
where $b_1$ and $b_{q_0}$ are bounded below and above, respectively. If birth and death values are from correlation matrices, $-1 \le b_1$ and $b_{q_0} \le 1$. Hence, the 0D Wasserstein distance is equivalent to the Euclidean distance in the $q_0$-dimensional convex set $T_0$. Similarly, the vector of sorted death values $\mathbf{d}_j = (d_{j(1)}, \cdots, d_{j(q_1)})^\top$ is a point in the convex set
$$T_1 = \{ (d_1, \cdots, d_{q_1}) : d_1 \le d_2 \le \cdots \le d_{q_1} \},$$
where $d_1$ and $d_{q_1}$ are bounded below and above, respectively. Hence, the 1D Wasserstein distance is equivalent to the Euclidean distance in the $q_1$-dimensional convex set $T_1$.
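The order-statistic matching reduces the distance computation to a sort. A minimal sketch (the birth values below are made-up illustrative numbers):

```python
import numpy as np

def wasserstein_graph(vals1, vals2, r=2):
    """r-Wasserstein distance between persistent diagrams of graph
    filtrations. Since 0D diagrams are characterized by births and 1D
    diagrams by deaths, the optimal matching pairs the i-th smallest value
    of one diagram with the i-th smallest of the other."""
    v1, v2 = np.sort(vals1), np.sort(vals2)
    assert len(v1) == len(v2), "diagrams must have the same number of points"
    return float(np.sum(np.abs(v1 - v2) ** r) ** (1.0 / r))

# 0D distance between two birth sets (hypothetical values)
births1 = [0.2, 0.5, 0.9]
births2 = [0.1, 0.5, 0.7]
d0 = wasserstein_graph(births1, births2, r=2)
print(d0)
```

Sorting dominates the cost, giving the $\mathcal{O}(q \log q)$ run time instead of the cubic Hungarian algorithm.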

Topological mean and variance
Given a collection of graphs $\mathcal{X}_1 = (V, w^1), \cdots, \mathcal{X}_n = (V, w^n)$ with edge weights $w^k = (w^k_{ij})$, the usual approach for obtaining the average network $\overline{\mathcal{X}}$ is simply averaging the edge weight matrices element-wise:
$$\overline{\mathcal{X}} = \Big( V, \frac{1}{n} \sum_{k=1}^{n} w^k \Big).$$
However, such an average is an average of connectivity strength, and the approach is usually sensitive to topological outliers [25]. We address the problem through the Wasserstein distance. A similar concept was proposed in the persistent homology literature through the Wasserstein barycenter [1,33], which is motivated by the Fréchet mean [60,96,111,35]. However, the method has not seen many applications in modeling graphs and networks.
To account for both 0D and 1D topological differences in networks, we use the sum of the squared 0D and 1D Wasserstein distances between networks $\mathcal{X}_1$ and $\mathcal{X}_2$ as the topological distance
$$\mathcal{D}(\mathcal{X}_1, \mathcal{X}_2) = D_0^2(\mathcal{X}_1, \mathcal{X}_2) + D_1^2(\mathcal{X}_1, \mathcal{X}_2), \tag{9}$$
where $D_0$ and $D_1$ are the 2-Wasserstein distances between the 0D and 1D persistent diagrams. The equal weights in (9) were used for the following reasons. Through the birth-death decomposition, a weighted graph can be topologically characterized by 0D and 1D features, with no higher-dimensional features present. However, it is unclear which feature contributes the most. Equal weighting of the 0D and 1D features ensures a balanced representation without bias towards either type of feature.
Let $\emptyset$ denote the graph with zero edge weights. Then, due to the birth-death decomposition, we have
$$\mathcal{D}(\mathcal{X}, \emptyset) = \sum_{i=1}^{q_0} b_{(i)}^2 + \sum_{i=1}^{q_1} d_{(i)}^2 = \sum_{i > j} w_{ij}^2,$$
where the squared sum of all the edge weights makes the interpretation straightforward. If unequal weighting is used, these relationships do not hold. Further, the Wasserstein distances $D_0$ and $D_1$ are equivalent to Euclidean distances in a convex set. Therefore, the squared distance is a more natural choice that satisfies the triangle inequality, thus qualifying as a metric. The sum (9) does not uniquely define networks. Like the toy example in Figure 5, we can have many topologically equivalent brain networks that give an identical distance. Thus, the average of two graphs is also not uniquely defined. The situation is analogous to the Fréchet mean, which frequently does not result in a unique mean [60,96,111,35]. We introduce the concept of the topological mean for networks, defined as the minimizer according to the Wasserstein distance, mirroring how the sample mean minimizes the Euclidean distance. The squared Wasserstein distance is translation invariant: for any constant $c$ added to all edge weights,
$$\mathcal{D}(\mathcal{X}_1 + c, \mathcal{X}_2 + c) = \mathcal{D}(\mathcal{X}_1, \mathcal{X}_2).$$
If we scale the connectivity matrices by $c$, we have
$$\mathcal{D}(c\,\mathcal{X}_1, c\,\mathcal{X}_2) = c^2\, \mathcal{D}(\mathcal{X}_1, \mathcal{X}_2).$$
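The translation and scaling properties can be verified numerically on toy birth sets (illustrative values only):

```python
import numpy as np

def topo_distance_sq(b1, b2):
    """Squared 2-Wasserstein distance between sorted birth (or death) sets."""
    return float(np.sum((np.sort(b1) - np.sort(b2)) ** 2))

b1 = np.array([0.2, 0.5, 0.9])
b2 = np.array([0.1, 0.4, 0.7])
d = topo_distance_sq(b1, b2)

# Translation invariance: a constant added to all edge weights shifts the
# birth/death values equally and leaves the distance unchanged.
print(np.isclose(topo_distance_sq(b1 + 3.0, b2 + 3.0), d))   # True

# Scaling: multiplying edge weights by c scales the squared distance by c^2.
print(np.isclose(topo_distance_sq(2.0 * b1, 2.0 * b2), 4.0 * d))   # True
```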

Definition 1
The topological mean $\mathbb{E}\mathcal{X}$ of networks $\mathcal{X}_1, \cdots, \mathcal{X}_n$ is the graph given by
$$\mathbb{E}\mathcal{X} = \arg\min_{\mathcal{X}} \sum_{k=1}^{n} \mathcal{D}(\mathcal{X}, \mathcal{X}_k).$$
Unlike the sample mean, many different networks with identical topology can attain the minimum. Similarly, we can define the topological variance $\mathbb{V}\mathcal{X}$ as follows.

Definition 2
The topological variance $\mathbb{V}\mathcal{X}$ of networks $\mathcal{X}_1, \cdots, \mathcal{X}_n$ is given by
$$\mathbb{V}\mathcal{X} = \frac{1}{n} \sum_{k=1}^{n} \mathcal{D}(\mathbb{E}\mathcal{X}, \mathcal{X}_k).$$
The topological variance can be interpreted as the variability of the graphs around the topological mean $\mathbb{E}\mathcal{X}$. To compute the topological mean and variance, we only need to identify a network with topology identical to that of the topological mean or the topological variance.
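Because the 2-Wasserstein distance equals the Euclidean distance between sorted vectors, a representative of the topological mean can be computed by averaging order statistics. A minimal sketch with made-up birth sets:

```python
import numpy as np

def topological_mean_sorted(value_sets):
    """Sorted birth (or death) values of a representative of the topological
    mean: the element-wise average of the order statistics minimizes the
    summed squared 2-Wasserstein distances."""
    V = np.sort(np.asarray(value_sets, dtype=float), axis=1)
    return V.mean(axis=0)

def topological_variance_sorted(value_sets):
    """Average squared 2-Wasserstein distance to the topological mean."""
    V = np.sort(np.asarray(value_sets, dtype=float), axis=1)
    mu = V.mean(axis=0)
    return float(np.mean(np.sum((V - mu) ** 2, axis=1)))

births = [[0.1, 0.5], [0.3, 0.9]]   # birth sets of two toy networks
mu = topological_mean_sorted(births)        # element-wise average: 0.2, 0.7
var = topological_variance_sorted(births)
print(mu, var)
```

Any network whose maximum spanning tree has these birth values realizes the topological mean, reflecting the non-uniqueness noted above.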

Topological clustering
There are few studies that have used the Wasserstein distance for clustering [69,108]. The existing methods are mainly applied to geometric data without topological consideration or persistence. It is not obvious how to apply such geometric methods to cluster graph or network data. We propose to use the Wasserstein distance to cluster a collection of graphs $\mathcal{X}_1, \cdots, \mathcal{X}_n$. Let $C = (C_1, \cdots, C_k)$ be the collection of clusters. Let $\mu_j$ be the topological cluster mean within $C_j$, given by
$$\mu_j = \arg\min_{\mathcal{X}} \sum_{\mathcal{X}_i \in C_j} \mathcal{D}(\mathcal{X}, \mathcal{X}_i).$$
The cluster mean is computed through Theorem 3. Just like the Fréchet mean, the cluster mean is not unique in a geometric sense but only unique in a topological sense [96,60,111,35]. Let $\mu = (\mu_1, \cdots, \mu_k)$ be the cluster mean vector. The within-cluster distance from the cluster centers is given by
$$l_W(C; \mu) = \sum_{j=1}^{k} \sum_{\mathcal{X}_i \in C_j} \mathcal{D}(\mu_j, \mathcal{X}_i). \tag{13}$$
If we let $|C_j|$ be the number of networks within cluster $C_j$, (13) can be written as
$$l_W(C; \mu) = \sum_{j=1}^{k} |C_j|\, \mathbb{V}\mathcal{X}_{C_j},$$
with topological cluster variance $\mathbb{V}\mathcal{X}_{C_j}$ within cluster $C_j$. The optimal clustering is found by minimizing the within-cluster distance $l_W(C; \mu)$ in (13) over every possible partition of $C$.
If $\mu$ is given and fixed, the identification of clusters $C$ can be done easily by assigning each network to the closest mean. Thus the topological clustering algorithm can be written as a two-step optimization similar to the expectation-maximization (EM) algorithm often used in variational inference and likelihood methods [8]. The first step computes the cluster means. The second step minimizes the within-cluster distance. Just like k-means clustering, the two-step optimization is iterated until convergence. This process converges locally.

Theorem 4 The topological clustering converges locally.
The direct algebraic proof is fairly involved and is given in Chung et al. [30]. Here we provide a more intuitive explanation. Note that $D_0$ and $D_1$ are Euclidean distances in the convex sets $T_0$ and $T_1$. Subsequently, the combined distance $\sqrt{D_0^2 + D_1^2}$ is the Euclidean distance in the Cartesian product $T_0 \otimes T_1$. Thus, our topological clustering is equivalent to k-means clustering restricted to the convex set $T_0 \otimes T_1$.
The convergence of topological clustering is then a direct consequence of the convergence of k-means clustering, which always converges in such a convex space. Numerically, we minimize (13) by replacing the Wasserstein distance with the 2-norm between the sorted vectors of birth and death values in k-means clustering. Like the k-means clustering algorithm, which only converges to a local minimum, there is no guarantee that topological clustering converges to the global minimum [52]; different choices of initial cluster centers may lead to different results. This is remedied by repeating the algorithm multiple times with different random initializations of the centers and picking the clustering result with the smallest within-cluster distance $l_W$. The method is implemented in the MATLAB function WS_cluster.m, which takes a collection of networks as input and outputs the cluster labels and clustering accuracy.
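The numerical scheme can be sketched as follows (an illustrative Python analogue of WS_cluster.m, assuming SciPy ≥ 1.7 for the `seed` argument of `kmeans2`): each network is represented by its concatenated sorted birth/death vector, so squared Euclidean distance between feature vectors equals the combined topological distance, and standard k-means applies.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def topological_kmeans(features, k, n_restarts=20, seed=0):
    """k-means on concatenated sorted birth/death vectors. The best of
    several random restarts (smallest within-cluster distance) is kept,
    mirroring the repeated-initialization strategy described above."""
    rng = np.random.default_rng(seed)
    best_cost, best_labels = np.inf, None
    for _ in range(n_restarts):
        centers, labels = kmeans2(features, k, minit='++',
                                  seed=int(rng.integers(1 << 31)))
        cost = sum(np.sum((features[labels == j] - centers[j]) ** 2)
                   for j in range(k))
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_labels

# Toy feature vectors: two well-separated topological profiles
feats = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9]])
labels = topological_kmeans(feats, k=2)
print(labels)
```

With well-separated profiles, the two pairs of networks receive distinct labels regardless of initialization.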

Validation
We validated topological clustering in a simulation with ground truth against k-means and hierarchical clustering [61]. We generated 4 circular patterns of identical topology (Figure 6, top) and of different topology (Figure 6, bottom). Along the circles, we uniformly sampled 60 nodes and added Gaussian noise $N(0, 0.3^2)$ to the coordinates. We generated 5 random networks per group. The Euclidean distance ($L_2$-norm) between the randomly generated points was used to build the connectivity matrices for k-means and hierarchical clustering. Figure 6 shows the superposition of nodes from the 20 networks. For k-means and Wasserstein graph clustering, the average result over 100 random seeds is reported.
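The simulation setup can be sketched as follows (parameter names are ours; in the false-positive experiment, the topologically identical groups are generated from Group 1 by varying `rotation`):

```python
import numpy as np

def circle_network(p=60, noise=0.3, rotation=0.0, seed=0):
    """Sample p nodes uniformly along the unit circle, add Gaussian noise
    N(0, noise^2) to the coordinates, optionally rotate the pattern, and
    return the pairwise Euclidean distance matrix used as connectivity."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, p) + rotation
    pts = np.column_stack([np.cos(theta), np.sin(theta)])
    pts += rng.normal(0.0, noise, size=(p, 2))
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# 5 random networks for one group
group1 = [circle_network(seed=s) for s in range(5)]
print(group1[0].shape)
```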

Testing for false positives
In the experiment depicted in Figure 6, we evaluated the occurrence of false positives in scenarios devoid of topological differences. All groups, derived from Group 1 through rotations, are topologically identical. Hence, any detected differences are false positives. While k-means clustering exhibited an accuracy of 0.90 ± 0.15 and hierarchical clustering achieved perfect accuracy (1.00), these methods reported significant false positives, erroneously categorizing the groups as distinct clusters. The absence of inherent topological differences between the groups implies that higher clustering accuracy is indicative of false positive results. Conversely, topological clustering, with a lower accuracy of 0.53 ± 0.08, demonstrated a reduced tendency to report false positives in the absence of topological differences.

Testing for false negatives
Figure 6 presents our test for false negatives, featuring groups with varying numbers of cycles and distinct topologies. In this scenario, topological differences should be detectable. Here, k-means clustering recorded an accuracy of 0.83 ± 0.16, and hierarchical clustering again reported perfect accuracy. Notably, topological clustering attained a high accuracy of 0.98 ± 0.09. Separating topological from geometric signals is challenging; the presence of topological differences often coincides with geometric variations, which can influence the performance of all tested methods.
In summary, while traditional clustering methods based on geometric distances are prone to a significant number of false positives, making them less suitable for topological learning tasks, the proposed Wasserstein distance-based approach demonstrates superior performance.This method excels in minimizing both false positives and false negatives, as evidenced by our tests.Its effectiveness is particularly noteworthy in topological learning tasks, where discerning topological rather than geometric distinctions is crucial.

Weighted Fourier series representation
The predominant method for computing time-varying correlation in time series data, particularly in neuroimaging studies, involves sliding windows (SW). This technique entails computing correlations between brain regions across various time windows [3,53,84,70,52]. However, the use of discrete windows in SW can lead to artificially high-frequency fluctuations in dynamic correlations [72]. While tapering methods can occasionally mitigate these effects [3], the correlation computations within these windows remain susceptible to the influence of outliers [34].
To circumvent these limitations, we employed the weighted Fourier series (WFS) representation [20,21]. This approach extends the traditional cosine Fourier transform by incorporating an additional exponential weight. This modification effectively smooths out high-frequency noise and diminishes the Gibbs phenomenon [20,50]. Crucially, WFS eliminates the need for sliding windows when computing time-correlated data. Given the necessity of robust signal denoising to ensure the efficacy of the persistent homology method across various subjects and time points, such an approach is needed. Consider an arbitrary noisy signal $f(t)$, $t \in [0, 1]$, which will undergo denoising through the diffusion process.
The algebraic derivation is given in [20]. Note that the cosine basis $\psi_l$ is orthonormal:
$$\int_0^1 \psi_l(t)\, \psi_m(t)\, dt = \delta_{lm},$$
where $\delta_{lm}$ is the Kronecker delta taking value 1 if $l = m$ and 0 otherwise. We can rewrite (16) in a more convenient convolution form
$$\widehat{f}(t) = \int_0^1 K_\sigma(t, t')\, f(t')\, dt',$$
where the heat kernel $K_\sigma(t, t')$ is given by
$$K_\sigma(t, t') = \sum_{l=0}^{\infty} e^{-l^2 \pi^2 \sigma}\, \psi_l(t)\, \psi_l(t').$$
The diffusion time $\sigma$ is usually referred to as the kernel bandwidth and controls the amount of smoothing. The heat kernel satisfies $\int_0^1 K_\sigma(t, t')\, dt = 1$ for any $t'$ and $\sigma$. To reduce unwanted boundary effects near the data boundary $t = 0$ and $t = 1$ [50,52], we project the data onto the circle $\mathcal{C}$ with circumference 2 by mirror reflection.
The cosine series coefficients c_l are estimated using the least squares method by setting up a matrix equation [20]. We set the expansion degree equal to the number of time points, which is 295. A window size of 20 TRs has been used in most sliding window methods [3,65,52]. We numerically matched the full width at half maximum (FWHM) of the heat kernel to this window size. We used the fact that the diffusion time σ of the heat kernel approximately matches the bandwidth s of the Gaussian kernel e^{−t²/2s²} as σ = s²/2 (page 144 in [19]). In terms of FWHM, 20 TRs is approximately equivalent to a heat kernel bandwidth of about 4.144 × 10⁻⁴. Figure 7 displays the WFS representation of rs-fMRI with different kernel bandwidths.
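The bandwidth matching described above is simple arithmetic; a minimal sketch, with the numbers taken from the text and variable names ours:

```python
import math

# Match the heat kernel diffusion time sigma to a 20-TR sliding window
# over 295 time points rescaled to the unit interval [0, 1].
T, window_trs = 295, 20
fwhm = window_trs / T                          # window length on [0, 1]
s = fwhm / (2 * math.sqrt(2 * math.log(2)))    # Gaussian bandwidth from FWHM
sigma = s ** 2 / 2                             # diffusion time, sigma = s^2/2
print(f"sigma = {sigma:.3e}")                  # ~4.144e-04, as in the text
```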

Dynamic correlation on weighted Fourier series
The weighted Fourier series representation provides a way to compute correlations dynamically without using sliding windows. Consider time series x(t) and y(t) with heat kernel K_σ(t, t′). The mean and variance of the signals with respect to the heat kernel are given by the kernel-weighted moments μ_x(t) = ∫₀¹ K_σ(t, t′) x(t′) dt′ and σ_x²(t) = ∫₀¹ K_σ(t, t′) x²(t′) dt′ − μ_x²(t). Subsequently, the correlation ρ(t) of x(t) and y(t) is given by ρ(t) = [∫₀¹ K_σ(t, t′) x(t′) y(t′) dt′ − μ_x(t) μ_y(t)] / [σ_x(t) σ_y(t)]. When the kernel is shaped as a sliding window, the correlation ρ(t) exactly matches the correlation computed over the sliding window. The kernelized correlation generalizes the concept of integral correlations with the additional weighting term [51].
As σ → ∞, ρ(t) converges to the Pearson correlation computed over all time points. Thus, the kernel bandwidth behaves like the length of the sliding window.
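To make the kernelized correlation concrete, here is an illustrative Python sketch (not the authors' MATLAB code; the function name, grid size, and truncation degree are our choices) that evaluates the heat kernel through its truncated cosine spectral expansion and computes the weighted correlation on a discrete time grid:

```python
import numpy as np

def heat_kernel_corr(x, y, sigma, n_terms=50):
    """Dynamic correlation of two signals on [0, 1] weighted by the heat
    kernel K_sigma(t, t'), evaluated via its cosine spectral expansion."""
    T = len(x)
    t = np.linspace(0, 1, T)
    l = np.arange(1, n_terms + 1)
    psi = np.sqrt(2) * np.cos(np.pi * np.outer(t, l))   # cosine basis at t
    K = 1 + (psi * np.exp(-(np.pi * l) ** 2 * sigma)) @ psi.T
    K = np.clip(K, 0, None)                 # truncation can dip below zero
    K /= K.sum(axis=1, keepdims=True)       # each row integrates to one
    mx, my = K @ x, K @ y                   # kernel-weighted means
    vx = K @ (x ** 2) - mx ** 2             # kernel-weighted variances
    vy = K @ (y ** 2) - my ** 2
    cov = K @ (x * y) - mx * my
    return cov / np.sqrt(vx * vy)
```

As sigma grows, the kernel flattens toward the uniform weight and the output approaches the global Pearson correlation at every time point, mirroring the limit stated above.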

Theorem 7
The correlation ρ(t) of time series x(t) and y(t) with respect to the heat kernel K_σ(t, t′) is given by ρ(t) = [Σ_l e^{−l²π²σ} c_l^{xy} ψ_l(t) − μ_x(t) μ_y(t)] / [σ_x(t) σ_y(t)], where c_l^x and c_l^y are the cosine series coefficients of x(t) and y(t). Similarly, we expand x(t)y(t), x²(t) and y²(t) using the cosine basis and obtain coefficients c_l^{xy}, c_l^{x²} and c_l^{y²}.
The derivation follows by simply replacing all the terms with the WFS representation. Correlation (20) is the formula we used to compute the dynamic correlation in this study. Figure 7 displays the WFS-based dynamic correlation for different bandwidths.
A similar weighted correlation was proposed in Pozzi et al. [75], where time-varying exponential weights proportional to e^{t/θ} with exponential decay factor θ were used. However, our exponential weight term is related to the spectral decomposition of the heat kernel.

Data
The proposed method is applied to the accurate estimation of state spaces in dynamically changing functional brain networks. The resting-state functional magnetic resonance images (rs-fMRI) of the 479 subjects used in this paper were collected on a 3T MRI scanner (Discovery MR750, General Electric Medical Systems, Milwaukee, WI, USA) with a 32-channel RF head coil array. The 479 healthy subjects consist of 231 males and 248 females ranging in age from 13 to 25 years. The sample contains 132 monozygotic (MZ) twin pairs and 93 same-sex dizygotic (DZ) twin pairs. The image preprocessing includes motion correction and image alignment to the MNI template and follows [11,54]. The resulting rs-fMRI consist of 91 × 109 × 91 isotropic voxels at 295 time points. We further parcellated the brain volume into 116 non-overlapping brain regions using the Automated Anatomical Labeling (AAL) atlas [97]. The fMRI data were averaged across voxels within each brain region, resulting in 116 average fMRI signals with 295 time points for each subject. The rs-fMRI signals were then scaled to fit the unit interval [0, 1] and treated as functional data on [0, 1].

Topological state space estimation
For p brain regions, we estimated the p × p dynamically changing correlation matrices C_i(t) of the i-th subject using WFS. Let c_{ij} denote the vectorization of the upper triangle of the p × p matrix C_i(t_j) at time point t_j into a p(p − 1)/2 × 1 vector. For each fixed i, the collection of c_{ij} over the j = 1, ⋯, 295 time points is then fed into the topological clustering to identify recurring brain connectivity at the subject level. We cluster individual brain networks without putting any constraint on group or twin. We compared the proposed Wasserstein clustering against k-means clustering, which has often been used as the baseline method in state space modeling [3,50,52]. After clustering, each correlation matrix C_i(t_j) is assigned an integer between 1 and k. These discrete states serve as the basis for investigating the dynamic pattern of brain connectivity [95]. For the convergence of both topological clustering and k-means clustering, the clusterings were repeated 10 times with different initial centroids and the best result (smallest within-cluster distance) is reported. Figure 8-left displays the result of the topological clustering against k-means for three subjects. The 295 time points are rescaled to fit into the unit interval [0, 1].
The optimal number of clusters k was determined by the elbow method [3,77,95,52]. For each value of k, we computed the ratio of the within-cluster to between-cluster distances. The ratio shows the goodness-of-fit of the cluster model. The elbow method gives the largest slope change in the ratio at k = 3 for both methods (Figure 8-right). At k = 3, the ratio is 0.034 ± 0.012 across the 479 subjects for Wasserstein clustering, while it is 0.202 ± 0.047 for k-means. The six-times-smaller ratio for the topological clustering demonstrates its superior model fit over k-means. Figure 9 shows the clustering results for both methods. The k-means clustering result is based on averaging the correlations of every time point and subject within each state. The resulting states in the k-means clustering are somewhat random, without any biologically interpretable pattern. The topological clustering computes the topological mean of every time point and subject within each state.
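For readers who want to reproduce the baseline pipeline, a minimal Python sketch of the vectorization and the elbow-method ratio follows. The precise definition of the between-cluster distance varies across the cited works, so the centroid-pair sum used here is one common choice, and both function names are ours:

```python
import numpy as np

def vectorize_upper(C):
    """Vectorize the upper triangle of a p x p correlation matrix."""
    return C[np.triu_indices(C.shape[0], k=1)]

def within_between_ratio(X, labels, centroids):
    """Ratio of within-cluster to between-cluster distances (elbow method).
    X: n x d feature matrix; labels: n cluster assignments;
    centroids: k x d cluster centers."""
    within = sum(np.linalg.norm(X[labels == k] - c, axis=1).sum()
                 for k, c in enumerate(centroids))
    between = sum(np.linalg.norm(centroids[i] - centroids[j])
                  for i in range(len(centroids))
                  for j in range(i + 1, len(centroids)))
    return within / between
```

Sweeping k and plotting this ratio reproduces the elbow plot in Figure 8-right; the elbow is the k with the largest slope change.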

Twin correlations over transpositions
Using the additional twin information in the data, we further investigated whether the state change pattern itself is genetically heritable. As far as we are aware, there is no study on the heritability of the state change pattern itself. This requires computing twin correlations. We assume there are m MZ-twin pairs and n DZ-twin pairs. For a given feature, let x_i = (x_{i1}, x_{i2})^⊤ be the i-th twin pair in the MZ-twins and y_i = (y_{i1}, y_{i2})^⊤ be the i-th twin pair in the DZ-twins. They are represented as the vectors x_1 = (x_{11}, ⋯, x_{m1})^⊤ and x_2 = (x_{12}, ⋯, x_{m2})^⊤ of first and second twins, and similarly for y_1 and y_2.
However, there is no preference for the order of twins within a pair, and we can transpose the i-th twin pair in the MZ-twins by swapping x_{i1} and x_{i2} through the transposition τ_i and obtain another twin correlation ρ_MZ(τ_i(x_1), τ_i(x_2)) [17,27]. Ignoring symmetry, there are 2^m possible combinations in ordering the twins, which form a permutation group. The size of the permutation group grows exponentially as the sample size increases. Computing correlations over all permutations is not computationally feasible for m beyond 100. Figure 10 illustrates the many possible transpositions within twins. Thus, we propose a new fast online computational strategy for computing twin correlations. Over a transposition τ_i, the correlation changes incrementally. We will determine the exact increment over the transposition. The sample correlation between x_1 and x_2 involves the following functions.
The sample correlation depends on x_1 and x_2 only through the sums f(x_1) = Σ_i x_{i1} and f(x_2) = Σ_i x_{i2}, the sums of squares g(x_1) = Σ_i x_{i1}² and g(x_2) = Σ_i x_{i2}², and the cross-product sum Σ_i x_{i1} x_{i2}. The functions f and g are updated over the transposition τ_i by exchanging the contributions of x_{i1} and x_{i2}, while the cross-product sum is invariant. The MZ-twin correlation after the transposition is then updated from these statistics. The time complexity of the correlation computation is 33 operations per transposition, which is substantially lower than the computational complexity of directly recomputing the correlation for each permutation. In the numerical implementation, we sequentially apply random transpositions τ_{i_1}, τ_{i_2}, ⋯, τ_{i_J}. This results in J different twin correlations, which are averaged. Let ρ_j be the twin correlation after the j-th transposition. The average correlation ρ̄_MZ over all J transpositions is given by ρ̄_MZ = (1/J) Σ_{j=1}^{J} ρ_j. In each sequential update, the average correlation can be updated iteratively as ρ̄^{(j)} = [(j − 1) ρ̄^{(j−1)} + ρ_j] / j. If we use enough transpositions, the average correlation ρ̄_MZ converges to the true underlying twin correlation ρ_MZ for sufficiently large J. The DZ-twin correlation ρ_DZ is estimated similarly.
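The incremental update can be sketched as follows; this illustrative Python class (names ours, not the authors' MATLAB code) maintains the five sufficient statistics and updates them in constant time per transposition:

```python
import numpy as np

class TwinCorrelation:
    """Online sample correlation between first and second twins of m pairs,
    updated in O(1) per within-pair transposition."""
    def __init__(self, pairs):
        self.x = np.asarray(pairs, dtype=float).copy()        # m x 2 array
        self.m = len(self.x)
        a, b = self.x[:, 0], self.x[:, 1]
        self.S1, self.S2 = a.sum(), b.sum()                   # f(x1), f(x2)
        self.S11, self.S22 = (a ** 2).sum(), (b ** 2).sum()   # g(x1), g(x2)
        self.S12 = (a * b).sum()                              # cross-product

    def corr(self):
        num = self.m * self.S12 - self.S1 * self.S2
        den = np.sqrt((self.m * self.S11 - self.S1 ** 2) *
                      (self.m * self.S22 - self.S2 ** 2))
        return num / den

    def transpose(self, i):
        a, b = self.x[i]
        self.S1 += b - a               # exchange the i-th contribution of f
        self.S2 += a - b
        self.S11 += b ** 2 - a ** 2    # and of g
        self.S22 += a ** 2 - b ** 2
        # S12 is invariant under swapping the i-th pair
        self.x[i] = b, a
```

Averaging corr() over a long stream of random transpositions then yields the running estimate described above.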
In the widely used ACE genetic model, the heritability index (HI) h, which determines the amount of variation due to genetic differences in a population, is estimated using Falconer's formula [38,26,4]. MZ-twins share 100% of their genes while same-sex DZ-twins share 50% of their genes on average. Thus, the additive genetic factor a and the common environmental factor c are related to the twin correlations as ρ_MZ = a + c and ρ_DZ = a/2 + c. The HI h, which measures the contribution of a, is given by h(x, y) = 2(ρ_MZ − ρ_DZ).
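Falconer's formula is a one-liner; the small sketch below (function name ours) also recovers the shared environmental component c implied by the ACE decomposition:

```python
def falconer(r_mz, r_dz):
    """ACE model via Falconer's formula: with r_MZ = a + c and
    r_DZ = a/2 + c, heritability h = 2 * (r_MZ - r_DZ) and the shared
    environmental contribution c = 2 * r_DZ - r_MZ."""
    h = 2 * (r_mz - r_dz)
    c = 2 * r_dz - r_mz
    return h, c
```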
In the numerical implementation, 100 million transpositions can easily be done in 100 seconds on a desktop. The DZ-correlation is updated over transpositions in the same way.

Heritability of the state space
The heritability estimation of the state space is not a trivial task since the estimated states do not synchronize across twins. Figure 11 displays the state visits of 3 randomly selected MZ-twin pairs and 3 DZ-twin pairs. The time series of state changes do not synchronize even within twins. This is likely one reason for the lack of reported heritability of the state space in the literature. For each subject, we computed the average correlation of each state, where the average is taken over all time points within the state. The correlation matrices are then used as the input to the transposition-based twin correlations [26]. Subsequently, we computed the MZ- and DZ-twin correlations within each state (Figure 12). The MZ-twin correlations (Figure 12-top) are densely observed across many connections, while no DZ-twin correlation (Figure 12-middle) is observed above 0.3. We then computed the heritability index (HI) of each state (Figure 12-bottom). The heritability of the first state is characterized by strongly lateralized connections between the hemispheres. The heritability of the second state is characterized by front-to-back connections. We believe the topological approach provides far more accurate and stable heritability index maps for dynamically changing states, and these maps are biologically interpretable.
We report the 10 connections with the highest HI values in each of the three states in Tables 1, 2 and 3. Although numerous studies report high heritability for anatomical features such as gray matter density, there are few studies reporting the heritability of rs-fMRI [42,57]. Most of these studies report low HI compared to our high HI. [42] reported HI of 0.104 in the left cerebellum, 0.304 in the right cerebellum, 0.331 in the left temporal-parietal region, and 0.420 in the right temporal-parietal region. [57] reported HI of 0.41 in the connection between the posterior cingulate cortex and the right inferior parietal cortex in the default mode network, involving 79 MZ- and 46 same-sex DZ-twin pairs. All other connections reported very low HI below 0.24. We believe our topological method clusters topologically similar functional network patterns and significantly boosts genetic signals.

Null test on twin study design
Because we are reporting significantly higher, more diffuse heritability compared to the existing literature [42,106,57], we performed a null test to further check the validity of our analysis pipeline. We generated the null MZ-twin data by randomly pairing each MZ individual with another individual, excluding their own twin. Such a permutation is generated by a derangement, which is a permutation of the elements of a set such that no element appears in its original position [47]. In other words, if you have a set of distinct items and rearrange them, a derangement means none of the items is in the spot it started in. The null DZ-twin data is constructed similarly. Such null data should not show any genetic relation beyond random chance. On the null data, we recomputed the twin correlations and the heritability index, following the same pipeline as before. Figure 13 shows an example of one possible derangement out of exponentially many such permutations. For m MZ-twin pairs, there are m! Σ_{k=0}^{m} (−1)^k / k! derangements. For the null test, we generated 1000 derangements and then followed the proposed pipeline in computing the average MZ- and DZ-correlations in each state. We used the Wasserstein distance to measure the topological discrepancy. Figure 14 displays the normalized histogram of the Wasserstein distance between the average MZ- and DZ-twin correlations within each state over the 1000 derangements. Because the generated null data has no genetic signal, we are essentially computing the Wasserstein distance between two random connectivity matrices. In comparison, the observed Wasserstein distance (red line) between the average MZ- and DZ-twin correlations shows huge topological differences. None of the derangements shows the large, widespread signal of our observation. We conclude that what we observe is a genetic signal that cannot plausibly be produced by random chance.
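The derangement count and the null pairing are easy to reproduce; a small Python sketch follows (rejection sampling is adequate at the sample sizes used here, and both function names are ours):

```python
import math
import random

def count_derangements(m):
    """Number of derangements D_m = m! * sum_{k=0}^{m} (-1)^k / k!."""
    return round(math.factorial(m) * sum((-1) ** k / math.factorial(k)
                                         for k in range(m + 1)))

def random_derangement(m):
    """Sample a random derangement of 0..m-1 by rejection; the acceptance
    probability approaches 1/e, so only a few retries are needed."""
    while True:
        perm = list(range(m))
        random.shuffle(perm)
        if all(perm[i] != i for i in range(m)):
            return perm
```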

Discussion
In this study, we proposed a topological clustering method for the estimation and quantification of dynamic state changes in time-varying brain networks. A coherent statistical theory, grounded in persistent homology, was developed, and we demonstrated the application of this method to resting-state fMRI data. Resting-state brain networks tend to persist in a single state for extended periods before transitioning to another state [3,84,13,79]. The average brain network in each state appears to diverge from the patterns reported in previous studies (Figure 10) [12,44,52]. Further research is required for independent validation.
In contrast to previous studies that reported relatively low heritability in functional brain networks [42,106,57,101], our findings indicate significantly higher heritability across various regions of the brain network. This discovery not only challenges the prevailing understanding but also opens new avenues for exploring genetic influences on brain network dynamics. Our observations align with the early findings of Lykken et al. [66], which documented high heritability in EEG spectra. Heritability in brain networks may be more nuanced than previously understood. In our framework, rather than directly using connectivity strength, we decomposed networks into discrete topological states and computed heritability for each state. This granular analysis provides a more accurate estimation of heritability across different functional states of the brain. The resting-state measures employed in studies such as those by Glahn et al. [42], Xu et al. [106], and Korgaonkar et al. [57] rely directly on static connectivity matrices. These matrices, while informative, often do not capture the dynamic and configural nature of brain networks. Such methods may overlook hidden configural patterns that hold significant heritable information. Our topological method represents a significant advancement in this regard. By focusing on the topological aspects of dynamic brain networks, our method is adept at identifying and extracting hidden patterns of high heritability that might be missed by traditional approaches. This capability could be crucial for understanding the genetic basis of various neuropsychiatric and neurodevelopmental disorders, where altered brain network configurations play a critical role.
Intraclass correlation (ICC) has long been recognized as a vital reliability and reproducibility metric, especially for gauging similarity in paired data when the order of pairing is not preserved [17,83,89]. In brain imaging, it serves as a popular baseline for test-retest (TRT) reliability assessments, often in conjunction with the Dice coefficient [63,31,32,74,113]. The widespread use of ICC in these contexts underscores its perceived utility in evaluating consistency across imaging sessions or different imaging modalities. The conventional computation of ICC is typically through an ANOVA statistical model, which can be fairly limited and inflexible. Recent years have seen a shift towards mixed-effects models, which offer greater flexibility and accuracy in estimating ICC, especially in datasets with nested or hierarchical structures [17,71]. In light of these advancements, our proposed transposition-based approach for computing correlation over paired data presents a novel way to compute ICC, potentially offering a faster and more efficient alternative. The full potential and utility of the transposition-based method for ICC computation, however, remain to be explored in future research.

Fig. 14
Fig. 14 The normalized histogram of the Wasserstein distance between average MZ- and DZ-twin correlations within each state over 1000 derangements. Since the generated null data has no genetic signal, we are essentially computing the Wasserstein distance between two connectivity matrices with random noise. In comparison, the observed Wasserstein distance (dotted line) between the average MZ- and DZ-twin correlations shows huge topological differences.

Fig. 2
Fig. 2 Dynamically changing correlation matrices computed from rs-fMRI using a sliding window of size 60 for a subject. The constructed correlation matrices are superimposed on top of the white matter fibers in the MNI space and color coded by correlation value.

During the graph filtration, the Betti numbers change monotonically: β₀(G^{(i)}) ≤ β₀(G^{(i+1)}) and β₁(G^{(i)}) ≥ β₁(G^{(i+1)}). When new connected components are born, they never die. Thus, 0D persistence diagrams are completely characterized by the birth values b_i only. Loops are viewed as already born at −∞. Thus, 1D persistence diagrams are completely characterized by the death values d_i only. We can show that the edge weight set W can be partitioned into 0D birth values and 1D death values [92]:

Theorem 1 (Birth-death decomposition) The edge weight set W = {w_(1), ⋯, w_(q)} has the unique decomposition W = W_b ∪ W_d with W_b ∩ W_d = ∅, where the birth set W_b consists of the p − 1 edge weights at which connected components are born and the death set W_d consists of the remaining q − p + 1 edge weights at which loops die.
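The decomposition can be computed with a single pass of Kruskal's algorithm. In the sketch below (assuming distinct positive edge weights; function name ours), edges that merge two components supply the 0D birth values and edges that close a loop supply the 1D death values:

```python
import numpy as np

def birth_death_decomposition(W):
    """Split the edge weights of a weighted graph (symmetric p x p matrix W)
    into the 0D birth set (maximum spanning tree edges, via Kruskal's
    algorithm) and the 1D death set (all remaining edges)."""
    p = W.shape[0]
    edges = sorted(((W[i, j], i, j) for i in range(p)
                    for j in range(i + 1, p)), reverse=True)  # heaviest first
    parent = list(range(p))

    def find(u):                      # union-find with path halving
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    births, deaths = [], []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                  # merges two components: 0D birth value
            parent[ri] = rj
            births.append(w)
        else:                         # closes a loop: 1D death value
            deaths.append(w)
    return sorted(births), sorted(deaths)
```

The birth set always has p − 1 elements, matching the cardinalities in Theorem 1.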

Fig. 4
Fig. 4 The corresponding birth and death sets of the dynamically changing correlation matrix shown in Figure 2. The horizontal axis is the time point. Columns are the sorted birth and death edge values at each time point.

Fig. 5
Fig. 5 Comparison between the geometric distance and the topological distance. We used the shortest Euclidean distance between objects as the geometric distance. The left (two circles) and middle (circle and arc) objects are topologically different, while the left and right (square and circle) objects are topologically equivalent. The geometric distance cannot discriminate the topologically different objects (left and middle) and produces false negatives. The geometric distance incorrectly discriminates the topologically equivalent objects (left and right) and produces false positives.

Fig. 6
Fig. 6 Top: simulation study on topological equivalence. The correct clustering method should not be able to cluster them because they are all topologically equivalent. The pairwise Euclidean distance (L2-norm) is used in k-means and hierarchical clustering. The Wasserstein distance is used in topological clustering. Bottom: simulation study on topological difference. The correct clustering method should be able to cluster them because they are all topologically different.

Fig. 7
Fig. 7 Left: The original and smoothed fMRI time series using WFS with degree k = 295 and different heat kernel bandwidths σ. The bandwidth 4.141 × 10⁻⁴ used in this study approximately matches the 20 TRs often used in sliding window methods. Right: The dotted gray lines are correlations computed over sliding windows. The solid black lines are correlations computed using WFS.

Fig. 8
Fig. 8 Left: The time series of estimated state spaces using the topological clustering and k-means clustering for 3 subjects. The time is normalized into the unit interval [0, 1]. Right: The ratio of within-cluster to between-cluster distances. The smaller the ratio, the better the clustering fit.

Fig. 9
Fig. 9 The estimated state spaces of dynamically changing brain networks. The correlations are averaged over every time point and subject within each state for k-means clustering (top) and Wasserstein distance based topological clustering (bottom). In k-means clustering, the connectivity pattern of each state is somewhat random. In topological clustering, the connectivity pattern is highly symmetric even though we did not put any symmetry constraint in the clustering method.

Fig. 10
Fig. 10 The schematic of transpositions on 3 MZ- and 2 DZ-twins. a) One possible pairing. b) Transposition within an MZ-twin. c) Transposition within a DZ-twin. d) Transpositions in both MZ- and DZ-twins simultaneously. Any transposition will affect the heritability estimate, so it is necessary to account for as many transpositions as possible.

Fig. 11
Fig. 11 State visits for 3 MZ-twins (left) and 3 DZ-twins (right) obtained from the baseline k-means clustering. We are interested in determining the heritability of such state changes. Unfortunately, even within a twin pair, the time series of state changes do not synchronize, making the task extremely challenging.

Fig. 12
Fig. 12 MZ-correlation (top) and DZ-correlation (middle) in each state obtained through the topological clustering in Figure 9. There is no DZ-correlation above 0.3, so none are displayed. The heritability index (HI) is determined by twice the difference in twin correlations. The HI of each state shows the extensive genetic contribution of dynamically changing states. The first state is characterized by prominent bilateral connections between the left and right hemispheres, whereas the second state primarily features front-back connections.

Fig. 13
Fig. 13 MZ-correlation (top) and DZ-correlation (middle) in each state obtained through the topological clustering in Figure 9. There is no MZ-correlation above 0.3, so none are displayed. The heritability index (HI) is determined by twice the difference in twin correlations. The HI of each state shows the extensive genetic contribution of dynamically changing states.

Table 1
10 connections with the highest heritability index for state 1. Connections are sorted with respect to HI values.

Table 2
10 connections with the highest heritability index for state 2. Connections are sorted with respect to HI values.

Table 3
10 connections with the highest heritability index for state 3. Connections are sorted with respect to HI values.