Population-wise labeling of sulcal graphs using multi-graph matching

Population-wise matching of the cortical folds is necessary to compute statistics, a required step for e.g. identifying biomarkers of neurological or psychiatric disorders. The difficulty arises from the massive inter-individual variations in the morphology and spatial organization of the folds. The task is challenging both methodologically and conceptually. In the widely used registration-based techniques, these variations are considered as noise and the matching of folds is only implicit. Alternative approaches are based on the extraction and explicit identification of the cortical folds. In particular, representing cortical folding patterns as graphs of sulcal basins—termed sulcal graphs—enables to formalize the task as a graph-matching problem. In this paper, we propose to address the problem of sulcal graph matching directly at the population level using multi-graph matching techniques. First, we motivate the relevance of the multi-graph matching framework in this context. We then present a procedure for generating populations of artificial sulcal graphs, which allows us to benchmark several state-of-the-art multi-graph matching methods. Our results on both artificial and real data demonstrate the effectiveness of multi-graph matching techniques in obtaining a population-wise consistent labeling of cortical folds at the sulcal basin level.


Quantitative comparison across brains is a crucial but open question
Comparing features extracted from brain MRI across individuals is necessary for estimating population statistics and ultimately discover markers of diseases.However, this task presents several challenges at both the methodological and conceptual levels.Indeed, the features extracted from two different individual brains are defined in two different mathematical spaces.Comparing such features thus requires to address the methodological problem of transferring them into a common space.The task of transferring information from one brain to another or to a common space consists in defining spatial correspondences i arXiv:2301.13532v1[stat.ML] 31 Jan 2023 across these objects by compensating for their variations in their respective geometry.The challenge lies in the massive inter-individual variations of the morphology of the brain and in particular the geometry of the cortical surface, which make the identification of such spatial correspondences an ill-posed problem.As a consequence, any solution to this problem inevitably requires to introduce additional constraints based on assumptions on the biological validity of the resulting spatial correspondences, which constitutes a challenge at the conceptual level.Indeed, the assumptions and constraints introduced in the definition of the spatial correspondences actually influence the derived statistics measured on the population of interest, and could thus be considered as a source of bias in the analysis Van Essen, Glasser, Dierker, Harwell, and Coalson (2012).
One widely used approach to tackle this problem -termed here as the registrationbased approach -consists in defining a mapping between each individual brain and an atlas serving as the common space by estimating a spatial transformation.As pointed above, the process of building the atlas and defining the associated projection operator which minimizes the error induced by the transformation remains an open research question.As a consequence, several registration techniques and atlases co-exist in the field, and tools to enable comparison across atlases are then required (Devlin & Poldrack, 2007;Van Essen & Dierker, 2007).The variety of atlases, projection mechanisms and descriptors illustrate the ongoing exploration of putative biologically relevant features used to define these correspondences across individuals.One of the most widely used registration-based approach (Fischl, Sereno, Tootell, & Dale, 1999) defines a mapping between cortical surfaces by imposing the alignment of a combination of curvature and convexity features estimated from a 2D mesh representing the geometry of the cortex.The cortical surface of a given subject is projected onto the atlas by matching its curvature and convexity, under the assumption that aligning these features induces biologically relevant anatomo-functional correspondences.In this process, as in any registration-based approach, variations across individuals are considered as noise or confounding perturbations to be minimized, including variations in the topology and number of folds (sulci).More generally, the registrationbased approach might be seen as an over simplification of the problem since potentially relevant geometrical information is not taken into account.
Alternative approaches consist in characterizing the geometry and organization of the cortical folds in each individual and then compare these features across the population.

Characterizing cortical folding patterns using graphs
Several approaches have been proposed to characterize cortical folding patterns, such as gyrification index, fractal dimension and curvature (Armstrong, Schleicher, Omran, Curtis, & Zilles, 1995;Cachia et al., 2008;Im et al., 2006).Although these measures capture relevant morphological features, they do not explicitly reflect the topology, i.e the spatial relationships between sulci.Mangin et al. (2004) introduced an analysis framework based on the automatic extraction and labeling of the sulci allowing to characterize their shape, size and pattern in terms of e.g.sulcus area, depth and length.This representation of the cortical geometry has been used for instance to characterize populations of healthy subjects (Duchesnay et al., 2007), to quantify potential deviations from normal populations in various conditions such as schizophrenia (Cachia et al., 2008) and autism spectrum disorder (Auzias et al., 2014), or to estimate the heritability of the folding patterns (Pizzagalli et al., 2020).Pursuing on this line of research, the sulcal pits were introduced as a concept allowing to decompose the sulci into smaller pieces and thus access finer scale geometrical information.As described in details in Auzias, Brun, Deruelle, and Coulon (2015); Im et al. (2010), each fold is divided into sulcal basins that are defined as concavities in the white matter surface bounded by convex ridges, and the deepest point in each basin defines the associated sulcal pit.More recently, Im et al. (2011); Takerkart, Auzias, Brun, and Coulon (2017) represented the geometrical relationships between sulcal basins as a sulcal graph.A sulcal graph is constructed by considering each sulcal basin (or associated pit) as a node, while the edges connect only adjacent basins and thus represent their spatial organization.
Various geometrical features of a sulcal basin can then be attributed to graph nodes (such as the depth of the pit, its 3d coordinates...), while the spatial organization of the basins is encoded in the topology of the graph.Figure 1 illustrates this decomposition of the cortical folds into sulcal basins allowing to represent this complex geometry as a sulcal graph.

Problem statement and contributions
In the present work, we focus on the task of matching together a set of sulcal graphs in order to define biologically relevant correspondences across a population of subjects, under the specific constraint of explicitly taking into account the variations in folding patterns.Before moving to the formalization, we more precisely situate this problem with respect to the conceptual question of defining correspondences across individuals, and with respect to the methodological problem of graph matching.

Unsupervised comparison and matching of sulcal graphs
Comparing brains using sulcal graphs is highly relevant because all the geometrical information about the macroscopic cortical folding can be encoded into such graphs.However, several challenges need to be addressed in this context: 1) the large inter-individual variations in brain anatomy induce complex variations across sulcal graphs, including in their topology; 2) sulcal graphs can be contaminated by noise resulting from the imperfect segmentation of the individual cortical surface and corresponding sulcal basins; 3) there is no consensus on a nomenclature or atlas at the scale of sulcal basins covering the whole brain, that is a prerequisite to tackle the matching problem as a supervised learning task.
Indeed, few studies investigated the matching of cortical folds across individuals as a supervised task (Behnke et al., 2003;Borne, Rivière, Mancip, & Mangin, 2020;Rivière et al., 2002).All these works focused at the scale of sulci, i.e. considering large folds consisting of several of our sulcal basins.To our knowledge, only Lyu et al. (2021) attempted to tackle this problem at finer scale, probably because of the massive amount of efforts needed to gather sufficient amount of manually labeled data Voorhies, Miller, Yao, Bunge, and Weiner (2021).Indeed, ambiguities due to variations across individuals in the folding patterns become overwhelming at finer scale than sulci.This is illustrated by the tedious works advancing the definition of a fined-grained nomenclature of folds Sprung-Much and Petrides (2020) and their relationship with underlying function Willbrand et al. (2022).
The lack of widely accepted fined-grained nomenclature is also blatant in the related field of brain parcellation: more than 20 different fine-grained atlases co-exist (Eickhoff, Yeo, & Genon, 2018), and even the most advanced multi-modal atlas (Glasser et al., 2016) was validated only on a small portion of the cortex.
Matching sulcal graphs across individuals is thus a very challenging problem.Instead of relying on the few existing labeled data-sets that clearly deserve further validation, we decided to approach this question as an unsupervised learning task.
We now describe the few studies that have attempted to tackle the question of unsupervised labeling of sulcal graphs.The first approach was proposed by Im et al. (2010) and consisted in computing a map of the spatial density of sulcal pits across a population of subjects.This density map was computed by accumulating the pits from the different individuals in each vertex of an average surface after aligning the folds using a registration technique.A watershed algorithm was then applied to this density map in order to separate the main clusters of sulcal pits, empirically defined as the regions of high density.
An arbitrary label was then associated to each cluster, hereby defining an ad-hoc labeling of the pits across individuals, depending on the cluster to which they contributed in the density map.This procedure implicitly defines a matching of sulcal pits and corresponding basins across individuals.Exemplar applications of this method can be found in e.g.Auzias et al. (2015); Im et al. (2010);Le Guen et al. (2017), with illustrations of density maps and induced labeling for various populations.We refer in the following to this category of methods as Auzias et al. since we used the open source implementation from this paper.
The main limitation of this approach is that the labeling is driven only by the coordinates of the sulcal pit.Kaltenmark et al. (2020) introduced an alternative procedure for labeling the sulcal basins, hereby considering the geometry of the basin surrounding each sulcal pit in addition to its spatial location.We refer to this method as Kaltenmark et al. in the following.
The authors of (Kaltenmark et al., 2020) also raised the question of the consistency of the labeling, a notion that we will develop further below.In this method, an explicit constraint is imposed to restrict the labeling to only one node per subject for each label.In addition, the nodes for which the labeling is ambiguous -i.e. for which several labels are equally plausible -remain unlabelled, which is often denoted as partial matching in the literature on graph processing.Importantly, the spatial relationships between adjacent sulcal basins and pits are never taken into account in any of these methods, since the different pits/basins from each subject are considered independently.In contrast, in the present work our aim is to exploit the spatial organization of the adjacent basins stored in the sulcal graph representation.
Few publications investigated the potential of graph matching in the context of sulcal graphs.In Im et al. (2011), the spectral graph matching technique (Leordeanu & Hebert, 2005) was applied to a set of 48 monozygotic twins, comparing a pair at a time.This study showed that the similarity of the sulcal graphs across pairs of twins are higher than for unrelated pairs, demonstrating the genetic influence on sulcal patterns, and the relevance of graph matching techniques in this context.This approach was used in follow-up papers from the same group, e.g for comparing brain lobes in Morton et al. (2019) or for matching individuals onto an atlas in Im et al. (2017).
In the work by Meng et al. (2018), a population of 677 neonates was analyzed based on a sulcal graph comparison method similar to the one of Im et al. (2011).The authors proposed to use different features of the sulcal pits such as the pit position, the pit depth, the basin area, the basin boundary and the pit local connectivity to construct different similarity matrices, one per feature, and merge them into a single one using a matrix fusion technique (B.Wang et al., 2014).A clustering algorithm was then applied to the fused similarity matrix to identify sub-populations of sulcal graphs, associated to specific folding patterns in the central, cingulate and superior temporal regions.Critically, all these previous studies relied only on pairwise graph matching techniques.
Comparing pairs of graphs independently, in the presence of noise and large inter-individual variations, is clearly sub-optimal.

Multi-graph matching: a relevant framework for population studies
Given the large variations across subjects and imperfect sulcal basins extraction, examining jointly a group of sulcal graphs is key to reveal meaningful information not accessible by v considering only pairs of subjects.This is the translation to sulcal graphs of the basic idea behind general population studies, that allowed researchers to uncover some of the mechanisms underlying the anatomo-functional organization of the brain.We follow this principle by investigating for the first time the potential of multi-graph matching techniques in the context of sulcal graphs.By considering several brains together, the geometrical information that is shared by the majority of individuals should help to regularize the matching problem and allow to identify putative noisy graph nodes in a more robust way than with pairwise matching.The multi-graph matching framework has the potential to uncover population-wise invariant patterns in sulcal graphs without imposing a priori, potentially biasing, assumptions.

Contributions
In our previous work (Buskulic, Dupé, Takerkart, & Auzias, 2021), we introduced a framework to generate a set of synthetic sulcal graphs representative of a population, and used it to benchmark state of the art pairwise matching techniques in the context of sulcal graphs.
In Yadav, Dupé, Takerkart, and Auzias (2022), we provided a proof of concept of the relevance of multi-graph matching techniques in this context.In the present study, we extend these preliminary studies in several directions.
First, we introduce an improved simulation framework to generate populations of artificial sulcal graphs and demonstrate their biological plausibility through a quantitative comparison with real data.Secondly, we benchmark a selection of recently published multi-graph matching techniques against the best pairwise technique for this task (identified in from Buskulic et al. (2021)), and report variations in performances that would clearly impact potential real-world applications, e.g in a clinical context.Finally, we compare qualitatively and quantitatively the different graph matching techniques, as well as the previously published approaches Auzias et al. and Kaltenmark et al, on a real dataset of 137 subjects.In addition, our experiments demonstrate the feasibility of comparing a large population of sulcal graphs based on multi-graph matching techniques, in fully acceptable computing times.All the source code and data will be shared openly upon publication at https://www.github.com/gauzias/sulcalgraphs matching.

Formal problem and state of the art
In this section, we define formally the problem of matching sulcal graphs, as well as the multi-graph framework.We then give an overview of the different methods proposed in the literature and provide a more detailed description of the multi-graph matching methods included in our experiments.

Undirected attributed Sulcal graphs
We consider a population of N sulcal graphs, noted G 1 . . .G N , representing the cortical folding pattern of an hemisphere from N different individuals.The sulcal graph from a vi given subject q is an undirected attributed graph formally defined as a quadruplet G q = (V q , E q , A V q , A E q ), where V q = {v 1 , v 2 , . . ., v nq } are the nodes in the graph and |G q | = n q is the number of nodes.E q ⊆ V q × V q defines the set of e q edges.A V q = {a V v 1 , a V v 2 , . . ., a V vn q } is the set of attributes associated to each node in V q , and A E q = {a E e 1 , a E e 2 , . . ., a E ee q } is the set of attributes associated with each edge in E q .Note that the number of nodes n q and edges e q and corresponding attributes varies across graphs.As illustrated on Fig. 2, the sulcal graph from each subject is then mapped onto the same common spherical domain using the surface inflation and registration tools from freesurfer v.5.1.0(https://surfer.nmr.mgh.harvard.edu/,see Fischl et al. (1999) for details).The matching is computed in this common spherical domain.In this work, we consider as attributes of the nodes the 3D coordinates of the sulcal pits on the sphere.Regarding the attributes of the edges, we compute the length of the edge on the sphere as an approximation of the geodesic distance between neighboring pits.
Figure 2: The sulcal graph from each subject is transferred onto a common sphere using the inflation and spherical registration tools from freesurfer.The sulcal graphs from every subjects can then be mapped onto either the common sphere or onto an average surface for visualization.Note that the spatial dispersion of the nodes of the graphs on the common spaces is heterogeneous, with dense clusters in cortical regions where the variations across individuals are lower.

Generalities and overview of pairwise graph matching methods
Pairwise graph matching refers to the problem of finding correspondences between the nodes of two graphs G 1 and G 2 .This problem is usually divided into two categories: exact and partial matching.Exact matching methods consider graph matching to be a special case of the graph isomorphism problem.It aims at finding the bijection between two graphs, which implies that both the nodes and edges of the different graphs are strictly matched.This requirement is too strict for most real-world tasks and in particular in our context where the number of nodes and edges varies across graphs.Therefore, we focus on the partial matching problem.This problem can be formulated as a Quadratic Assignment Problem (QAP) (Loiola, Silva, & Galati, 2007).Although different forms of QAP exist, the vast majority of the literature has focused on Lawler's QAP (Lawler, 1963).Given two graphs G 1 and G 2 with number of nodes |G 1 | = n 1 and |G 2 | = n 2 respectively, the Lawler's QAP consists in searching for the assignment matrix X 12 ∈ {0, 1} n 1 ×n 2 such that where vec(X 12 ) denotes the column wise vectorization of X 12 ; 1 n 1 and 1 n 2 denote the column vectors of all ones of size n 1 and n 2 ; and Φ 12 ∈ [0, 1] n 1 n 2 ×n 1 n 2 is the affinity matrix that is given as an input.The diagonal entries of Φ 12 encode the similarity across nodes whereas non-diagonal entries encode the similarity across edges between the two graphs.
The computation of the affinity matrix is context-dependent, and we detail the approach used in the present work in section 4.1.
The computation and storage in memory of the very large matrix Φ 12 impedes the scalability of the matching problem based on this formulation.A solution to tackle this limitation is to reformulate the matching as a Koopmans-Beckmann's problem (F.Zhou & De la Torre, 2015) that is a special case of Lawler's QAP: where Ψ 12 ∈ [0, 1] n 1 ×n 2 denotes the affinity matrix across nodes, and A 1 ∈ R n 1 ×n 1 and A 2 ∈ R n 2 ×n 2 are the weighted adjacency matrices of two graphs respectively such that exists with weight w ij and A[i, j] = 0 otherwise.Koopmans-Beckmann's formulation is a special case of Lawler's where the edges can only be weighted by a scalar value (i.e.cannot support a vector of attributes on edges).Under this constraint, we can decompose the large matrix Φ 12 into three smaller matrices Ψ 12 , A 1 and A 2 , which provides better scalability than Lawler's QAP.
These two formulations are combinatorial QAPs and are known to be NP-hard problems.Most methods therefore relax the hard constraints given in Eq.( 1) and ( 2) and provide approximate solutions.Various approaches have been proposed to relax these problems, leading to a variety of graph matching methods.Discussing these methods is beyond the scope of this work but we refer interested readers to the review Yan, Yin, et al. (2016).
Going back to our specific context, we reported in Buskulic et al. (2021) a benchmark of the pairwise methods SMAC (Spectral Matching with Affine Constraints) (Cour, Srinivasan, & Shi, 2007), IPFP (Integer Projected Fixed Point algorithm) (Leordeanu, Hebert, & Sukthankar, 2009), RRWM (Reweighted Random Walks for graph Matching) (Hutchison et al., 2010), and KerGM (Kernelized Graph Matching) (Zhang, Xiang, Wu, Xue, & Nehorai, 2019).We observed that KerGM clearly outperforms the others in our context.Conceptually, KerGM well suits sulcal graphs as it relies on Frank-Wolfe optimization that allows to follow an optimisation path that respects the constraint on each step.This induces a robustness to the presence of noise in graphs that is crucial in our context.In the present work, KerGM is included in our benchmark as a representative of pairwise approaches.It viii is also used to define the initialization of all the multi-graph methods that are introduced in next section.

The multi-graph matching problem
We now focus on the problem of jointly matching a population of N graphs {G 1 , . . ., G N }, starting from pairwise assignment matrices X ij between graphs G i and G j (computed with KerGM in this work).The key concept behind multi-graph matching is the cycle consistency.This concept states that a matching between two graphs G i and G j should be the same if we go through an intermediate graph G k to create a new mapping.Formally, a perfectly consistent, bijective mapping (every node is matched to one and only one other node) would satisfy : for any i, j and k with i = j = k.A common way to estimate consistency at the population level is to compute the full bulk assignment matrix X ∈ {0, 1} m×m with m = N q=1 |G q |, that is obtained by assembling all individual pairwise matrices: Intuitively, enforcing the consistency constraint will induce a reduction of the rank of this bulk matrix.Multi-graph matching techniques can be divided into three categories as follows.
The second category of techniques does not explicitly minimize the rank of the bulk matrix but rely on other types of formalization aiming at increasing the consistency across all graphs (Yan, Cho, Zha, Yang, & Chu, 2016;Yan, Cho, et al., 2016;Yan et al., 2014Yan et al., , 2013;;Yan, Wang, Zha, Yang, & Chu, 2015).

Selection of the methods included in our benchmark
We used the following criteria to select the methods included in our benchmark: (i) Availability of code.We included only methods for which the authors have made their code openly available in order to avoid reimplementation issues and to ensure the full reproducibility of our results.(ii) Methods exploiting graph topology.We selected the methods that take into account the topology of the graph, which is crucial to exploit the spatial adjacency information encoded in the sulcal graphs.(iii) Scalability.Since we are interested in performing population studies over large sets of individuals, we excluded methods that do not provide acceptable scalability.(iv) Unsupervised methods.Finally, as motivated in the introduction, we focus on unsupervised methods in the present study.
The method that satisfy these selection criteria are mALS (X.Zhou et al., 2015), mSync (Pachauri et al., 2013) and CAO (Yan, Cho, et al., 2016).We provide a detailed description of each of these methods below.In our experiments, these multigraph graph-matching techniques will be compared with the pariwise approach KerGM, and with the two methods from the literature specifically designed for labeling sulcal graphs already described in Sec.1.3.1:Auzias et al. (Auzias et al., 2015) and Kaltenmark et al. (Kaltenmark et al., 2020).

Description of the selected multi-graph matching methods
As described in section 2.3, the general objective of multi-graph matching methods is to match the nodes across several graphs together by enforcing consistency.
The authors of CAO (Yan, Cho, et al., 2016) propose to maximize the affinity information and impose consistency at the same time instead of considering them separately.
They assume that enforcing consistency acts as a regularizer in the affinity objective function, particularly when the matching is ambiguous due to noise.The approach is based on the search of an intermediate graph G q that allows to optimize the affinity score while progressively inducing consistency.They introduce the unitary consistency across a set of N pairwise matching solutions X for a graph G q as: where .F is the Frobenius norm.The authors propose several approaches to balance between consistency and affinity, leading to different variants of CAO.In particular, their best algorithm is able to elicit outlier nodes during the optimization, which is highly relevant in our context.However, the use of affinity information along with consistency and outlier elicitation increase the computational complexity of the method to O(N 4 ).As a consequence, only the least resource-demanding algorithm CAO cst did scale with the x memory requirements imposed by the size of our graphs and number of subjects in our populations.We thus refer to that particular version in the rest of this article.This version of CAO enforces consistency through Eq.4, but ignores the affinity information.
The approach mSync (Pachauri et al., 2013) consists in estimating a mapping of each X ij to a common universe of assignment matrices, of size d: with Since solving eq.5 is intractable in most applications, the authors relax the problem into a generalized Rayleigh problem.They further propose to use a reference graph in order to estimate the mapping to the universe.In the implementation provided by the authors, the first graph in the collection G 1 is selected as the reference graph.
In mALS X. Zhou et al. (2015), the authors formalize the multi-graph matching as the following low rank matrix recovery problem: where, ., . is the inner product, α controls the weight on sparsity, and is the set of affinity matrices given as input.The cycle consistency is induced by the nuclear norm X * that controls for the rank of X while 1, X favors bijective matchings across graphs.Importantly, X is treated as a real matrix such that X ∈ [0, 1] The matrix is binarized at the end of the optimization process using a threshold value t that is set by default as to t = 0.5.In, addition, the authors leverage the work by Hastie, Mazumder, Lee, and Zadeh (2015) and Cabral, De la Torre, Costeira, and Bernardino (2013) for decomposing X which allows to solve the problem in a lower dimension space using the ADMM method (Eckstein & Bertsekas, 1992).

Generation of a population of synthetic sulcal graphs
A primary objective of our work is to investigate and evaluate different multi-graph matching techniques in the context of sulcal graphs.However, as mentioned in the introduction, there is no ground truth matching available for such graphs.We tackle this problem by designing a procedure allowing to generate a population of artificial sulcal graphs with correspondences defined by construction.Such populations of artificial graphs will constitute a ground truth against which the different matching methods can then be benchmarked.Generating artificial sulcal graphs for the purpose of a benchmark study induces the two following constraints: 1) The artificial graphs should be biologically plausible, i.e. xi they should respect as much as possible the intrinsic properties of a population of real sulcal graphs.2) The generation of the artificial graphs should be as simple and straightforward as possible in order to facilitate the comparison of the performances obtained in the benchmark study and the interpretation of the differences, i.e. the generation procedure should rely only on a limited number of parameters, and potential biases should be avoided.As detailed below, these two contradictory constraints are balanced in the design of our generation procedure.
The procedure is summarized in Algo. 1 and consists in two main steps.First, we generate a set of points on the common spherical domain, that will serve as reference nodes.
Then, we impose several types of perturbations to this set of reference nodes in order to generate a corresponding population of artificial sulcal graphs, while preserving the correspondences across graphs, i.e. the ground-truth matching.Such procedure provides the ground truth matching across the population, while controlling for the nature and amount of variations across artificial sulcal graphs (corresponding to different subjects in real data).

Generation of a set of reference nodes
The first step consists in generating a set of reference nodes on the spherical domain while controlling for two specific distinct parameters : the number of nodes noted n ref , that is typically set to match the average number of nodes across a real population, and the minimum distance between the nodes.Indeed, the nodes of the real sulcal graphs cannot be closer to each other than a minimum distance since they correspond to depth maxima that are not located in the immediate proximity of the boundary of sulcal basins (see Auzias et al. (2015) for further description of the extraction of sulcal pits and basins).As a consequence the spatial distribution of the nodes on the sphere cannot be fully random.In order to generate this set of n ref points on a sphere with pseudo-random spatial distribution, we adopted a simple brute force approach: we sample a set of n ref points over the surface of the sphere 10000 times; and we select the set that has the largest minimum geodesic disxii tance between neighbouring points.As we will show in sec.4.3.1, 10000 times is sufficient to get a set of reference nodes with a minimum distance between points that is realistic.
Technically, the uniform sampling of points on the sphere is achieved by generating random rotations of the unit vector as described in Blaser, Fryzlewicz, Blaser, and Fryzlewicz (2016); Lefèvre et al. (2018).
At this stage, we have defined on purpose a set of reference nodes that matches a real population in terms of size and of minimal distance between nodes.The next step consists in perturbing the reference nodes in order to generate the population of synthetic sulcal graphs.

Generation of an individual sulcal graphs
We now add perturbations of different natures to this set of reference nodes in order to obtain a population of artificial sulcal graphs, that corresponds to different subjects.
These perturbations aim at mimicking the inter-individual variations that are observed in a healthy population, by affecting the features of the nodes and edges, but also the topology of the graphs.In order to generate a population of N artificial sulcal graphs, these operations are repeated N times independently.

Perturbation of the location of the reference nodes
The first step consists in adding random noise to the coordinates of the reference nodes on the sphere, in order to model the inter-individual variability that exists in the location of the sulcal pits.We used the von Mises-Fisher (vM F ) distribution that is an approximation of Gaussian distribution on a sphere (Von Mises, 1964).The two parameters of the vM F distribution µ and κ can be seen as the equivalent of the mean and of the inverse of the standard deviation (κ ∝ 1/σ) for a Gaussian distribution.Therefore, we iterate across the reference nodes, and for each reference node, we produce a noisy one by sampling from the distribution vM F (µ, κ), where µ is the coordinates of this reference node.We control for the amount of noise on the coordinates of the perturbed nodes through the value of the parameter κ, that is common to all nodes from the reference set.Smaller values for κ will induce larger variations across the artificial sulcal graphs within the population.
Importantly, note that since we perturb each node of the reference set independently, we keep the correspondence between each noisy node and its reference node, which will allow defining our ground truth matching at the population level.

Addition of outliers and suppression of nodes
Next, we simulate the inter-individual variations in the number of nodes across the sulcal graphs, which is of crucial importance for generating biologically plausible artificial populations.The aim is to model both false positive and false negative matchings, i.e. respectively nodes that are present in the reference set but not in a given graph, and nodes that are present in the graph but not in the reference set.This is achieved by randomly adding xiii a certain number n o of nodes on top of the perturbed nodes -hereafter called outlier nodes, and by deleting n s nodes amongst the perturbed nodes -hereafter called suppressed nodes.
In order to randomly draw n o and n s , we use the β-binomial distribution B(ν, α, β), which is a distribution of non-negative integers.The parameter ν denotes the size of the support of the distribution, i.e the maximal value that can be sampled.The parameters α and β can be set so that B(ν, α, β) approximates a Gaussian distribution.We describe the setting of these parameters and precise their link with µ and σ of a Gaussian in Appendix A. Since we want the average number of nodes across the population of perturbed graphs µ simu to match the number of nodes in the reference set n ref , we set µ o = µ s = µ pert and also σ o = σ s = σ pert .This formulation allows us to control the standard deviation of the number of nodes across the population of artificial graphs with the two parameters µ pert and σ pert .

Construction of the edges
The last step consists in constructing each artificial sulcal graph with the sets of perturbed nodes as follows.We first compute the three-dimensional convex hull of each set of perturbed nodes located on the sphere.This yields a triangulation where only neighboring nodes on the sphere are connected, which is a simple way to simulate the region adjacency graph that is constructed from the sulcal basins in the real data.However, the average node degree in such triangulations is higher than for real sulcal graphs.Therefore, we finally delete a small percentage p of the edges in these triangulations, in order to obtain artificial graphs which match the average degree of real sulcal graphs.
Note that since the construction of the edges occurs after the previous perturbation steps (perturbations of the location, addition of outlier nodes and suppression of nodes), the resulting artificial sulcal graphs can show variations in their topology across individuals of a population, as we observe in real data, making them biologically-plausible in that respect.

Computation of the affinity matrices
As described in Sec.2.2, we initialize all the multigraph matching methods using the pairwise results obtain from KerGM, which relies on the formalization of Eq. 2. We thus need to compute the affinity matrices Ψ ij , A i , A j that store the similarity between nodes and edges across every pairs of graphs in the population.
In the present work, we compute these affinity matrices using Gaussian kernels applied to the attributes.For two nodes v ∈ G and v ∈ G the affinity value is computed using the kernel defined as exp (−γ ) and for two edges e ∈ G and e ∈ G the kernel is defined as ).To estimate appropriate values for γ V and γ E we use a heuristic proposed in Takerkart et al. (2017) that consists in using a cross-validation xiv scheme to compute the inverse of the median of the distribution across all possible pairs of nodes/edges, independently for each attribute (3D coordinates on the sphere for the nodes and the geodesic distance for the edges).

Dummy nodes
Most graph matching methods assume a constant number of nodes across the graphs to be matched, which is not the case in our case (both synthetic and real graphs).We use the classical approach from the graph matching literature which consists in adding dummy nodes to smaller graphs so that all the graphs get the same number of nodes as the largest graph in the population.For each of these dummy nodes, we assign to 0 the corresponding values in the node and edge affinity matrices.This makes the optimization problem defined in Eq.2 independent from dummy nodes.

Description of synthetic data sets
We first tuned empirically the parameters to the values µ pert = 12, σ pert = 4 and p = 10% to obtain variations in our synthetic graph populations that are in line with what is observed in real data.The distribution for number of nodes in the real data population is 88.27±4.72 likewise in our simulated population for a randomly chosen κ value the distribution for number of nodes is 88.15 ± 4.45 for the selected value of µ pert and σ pert and is consistent across all κ values across all trials.We further provide in Appendix B additional materials showing the matching distributions between our simulated graphs and real data.
Furthermore, we varied the value of κ ∈ [100,200,400,1000], which controls the amount of variations across synthetic graphs within a population.Note that κ controls the spread of nodes coordinates around the reference nodes, which in turn induces variations in the topology and attributes of synthetic graphs.
For each value of κ, we generate 10 populations of N = 137 synthetic graphs (which corresponds to the number of subjects in our real population; see below) and report the average and standard deviation of the metrics described below.As illustrated on Fig. 3, our populations of synthetic graphs show variations that are qualitatively very close to those observed across real graphs.

Evaluation metrics for synthetic data sets
In order to evaluate the different matching methods on simulated graphs, we use the classical precision, recall and F 1 -score: xv Thus, Precision is a ratio between the True positives(number of correct matches predicted by the algorithms) and all the positives(number of matches by the algorithms).Whereas, Recall is a ratio between True positives and True positives along with False negatives(number of correct matches not predicted by the algorithms).Finally, the F 1 score provides a balance between Precision and Recall.A F 1 -score of 1 reflects the ability of the algorithm to obtain a perfect matching of inlier nodes and accurate identification of outlier nodes.These metrics are relevant in our context to detect matching with outliers alongside the incorrect matches.

Results on synthetic data sets
We report of Fig 4 the mean and standard deviation of Precision, Recall and F 1 -score, computed across the 10 synthetic populations for each value of κ.
First, we find that two multi-graph matching methods, mALS and mSync, vastly and consistently outperform KerGM, which has been identified as the best pairwise matching method for this task in Buskulic et al. (2021).This confirms our main hypothesis: considering the matching problem on the whole population using multi-graph matching allows an important gain in performance compared to only considering pairs of graphs.
Then, we observe a gradual decline in the performances of all methods as the noise increases (decrease of κ), as expected.The performances of the multigraph approaches mALS and mSync resist much more to this increase in variability than the pairwise approach.The performances of mSync are limited more specifically by the lower precision at any level of noise.This suggests that the difference in performances between the two methods are mainly due to the hard constraint on the consistency in mSync that seems too restrictive.On the other hand, the recall indicates that mSync is more robust to increasing noise than mALS, with very close value when κ = 100.However, mALS performs better for lower noise values.Overall, mALS shows the best F 1 -score for every κ values, thanks to a very high precision combined with very good recall.Indeed, the F 1 -score for mALS is above 0.7 even for κ = 200 which corresponds to a configuration where the noise is quite strong.
Finally, the performances of CAO are very low, even lower than the pairwise technique KerGM.Such poor performances are likely a consequence of the optimization that considers only the consistency but ignores the affinity of nodes.As already mentioned in Sec.2.5, the other versions of CAO proposed in Yan, Cho, et al. (2016) could show much higher performances but did not scale with the size of our data.[1000,400,200,100].For each method, we plot the average across the 10 simulated populations as a line and the standard deviation as the shaded region of the same color.

Preprocessing of real data
For the evaluation on real data, we use the sulcal graphs from 137 young healthy adults taken from the publicly available database OASIS (Marcus et al., 2007).The preprocessing xvii of these data (brain tissues segmentation, mesh extraction and sulcal graphs construction) has been detailed in Auzias et al. (2015); Takerkart et al. (2017).Across this population, the number of nodes is 88 ± 4, with a maximum size of 101 nodes/pits.Dummy nodes are thus added to all other graphs to get a constant size of 101, as explained above.

Evaluation metrics used with real data
In absence of ground truth matching for real data, we cannot compute the same scores as for the simulation experiments.We therefore combine a set of quantitative metrics with some qualitative assessments, which we describe below.

Consistency
According to Yan, Cho, et al. (2016), we compute the node consistency as follows: Given G k ∈ {G q } N q=1 and the bulk matrix X, for node where Note that it is different from Eq. 4 which estimates the consistency at the graph level.
This consistency measure is computed for each node of each graph, including dummy nodes.A value of 1 corresponds to the ideal case where each graph only contains nodes that have been matched in a consistent manner.This consistency measure cannot distinguish the matches of real nodes to dummy nodes from valid matches across real nodes.
For methods imposing an explicit constraint on the consistency, a value of 1 is expected (and not informative), but for the other methods this measure is relevant and allows to assess the spatial pattern of the consistency across clusters.

Qualitative and quantitative assessment of the labeling induced by the matching
In terms of potential applications of the graph matching to sulcal graphs, a major outcome is the labeling of graph nodes that is induced.As already mentioned in the introduction, the assessment of the quality of the labeling and thus of the biological relevance of the matching across individuals is an ill-posed problem.The first problem is to retrieve a labeling from the assignment matrix resulting from the matching.In the case of a perfectly consistent matching where each node of each graph would be matched to one and only one node from every other graph in the population, the labeling would be trivial and would consist in simply associating a label to each row or column of the assignment matrix.This situation is however impossible since the number of nodes varies across individuals within our population of interest.Therefore, in the present work we take the largest graph as a reference, and we associate an arbitrary label to each of its nodes and then propagate these labels to every other graphs based on the assignment matrix resulting from each method.
Once the labeling of the nodes is retrieved, the nodes that share the same label across subjects are grouped together into what we will designate as clusters, that are different depending on the matching method.We then compute the coordinates of the centroid of xviii each cluster, which enables to evaluate qualitatively the spatial distribution of the different clusters across the cortical surface.
This qualitative assessment is complemented with a quantitative measure of the compactness of the clusters.For this, we compute the silhouette coefficient of each node from each graph.As proposed in Rousseeuw (1987), the silhouette of a node corresponds to the ratio between the average Euclidean distance to the other nodes in the cluster and its distance to other nearby clusters.Since the distances are computed on the spherical domain, the use of Euclidean distance is sub-optimal but the errors induced are very low and independent from the matching method.The silhouette coefficient of a cluster is then obtained by averaging the silhouette values from corresponding nodes.

Results on real data
We first report in Table 1 the quantitative measures that allow us to compare the different techniques at the whole brain level: the number of clusters (thus of labels) obtained with each method, the silhouette measure averaged across all nodes and graphs, the percentage of nodes remaining unlabeled, the consistency measure averaged across all nodes and graphs, and the computing time.We then illustrate the matching across nodes from the different graphs (subjects), obtained for each method on  Together with Table 1, this figure illustrates the poor performances of pairwise matching xx approach with high spatial dispersion of nodes corresponding to each cluster for KerGM, associated to very low silhouette coefficients.The method mSync results in higher silhouette coefficients for some nodes, but lower value for others (nodes and centroids in blue on Fig. 6), indicating that the matching was enforced also for ambiguous nodes located in highly variable regions.This is a consequence of the hard consistency constraint in mSync imposing a matching that is consistent across all graphs by construction, even in highly variable regions.For Auzias et al., we observe that the clusters are organized around regions of high nodes density, but the nodes located relatively far from the centroids have a lower silhouette value (nodes in green on Fig. 6).These observations are consistent with the algorithm that is based on a watershed applied to the sulcal pits density map as described in Sec.1.we observe a cluster in the superior temporal sulcus that is more consistent than those located anteriorly or posteriorly, which is in line with previous studies describing variations and stabilities across individuals in this region Leroy et al. (2015).
xxi Figure 6: For each method, we show the silhouette coefficient of each node from every graphs, as well as corresponding centroids as larger circles.Each centroid (larger circles) is colored according to the average of the silhouette coefficient of corresponding nodes.
Figure 7: Node consistency computed for each node of each graph with respect to the remaining graphs, and then averaged across graphs.We adapted the colorbar to visualize the differences between the three methods, with the pariwise technique KerGM showing much lower values.

Discussion
In this work, we explored the potential of graph matching methods applied to a population of sulcal graphs to uncover correspondences across individuals driven by the local patterns of folds.Indeed, these graph matching methods take into account the characteristics of individual sulcal basins as well as their topological organization to construct the correspondences.Our results on both simulations and real data support the biological relevance of the correspondences across individual resulting from multi-graph matching techniques. xxii

Relevance of simulated graphs relative to real data for evaluating matching techniques
To overcome the lack of ground truth for real data, we proposed a procedure allowing to generate artificial graphs that approximate the features of real sulcal graphs while controlling the variations across graphs.This simulation procedure enabled to benchmark various pairwise and multi-graph matching techniques.The evaluation of the performances of the different methods and their robustness to controlled variations in the simulated graphs was informative for probing their effectiveness in this context.The performances of the pairwise approach KerGM were limited even when the level of perturbations was minimal.Note that we reported in Buskulic et al. (2021) that alternative pairwise techniques perform even worse on this task.Amongst the different multi-graph matching techniques that were tested, mALS showed better performances than the others in all conditions, and a good robustness to increasing noise levels.These observations were confirmed by our application on real data.Overall, our set of experiments confirmed the intuition that multigraph matching techniques are highly relevant in our context, while pairwise techniques show limited performances and might thus be restricted to initialization purpose.
Of note, our aim was not to push the biological plausibility of our simulated graphs.
Keeping the simulations simple enables straightforward interpretation of the variations in the performances across the different approaches.This trade-off is visible in the procedure in particular when we sample the reference nodes uniformly on the sphere.Indeed, our simulation procedure cannot produce realistic non-uniform spatial distribution of nodes across the population.While this could be achieved by adapting the sampling of the reference points, this would induce variations in the performances of the matching techniques depending on the location on the sphere, which in turn would make the comparison across methods much more difficult.
Beyond the present work, our procedure for simulating sulcal graphs could be instrumental to assess future improvements in graph matching techniques.

Considerations relative to deep-learning approaches and potential methodological improvements
As already mentioned in section 2.3, many other graph matching techniques can be found in the literature but were not included in the present work.More specifically, deep learning approaches outperform traditional approaches in supervised learning task (LeCun, Bengio, & Hinton, 2015).Recent works such as e.g.(Scarselli, Gori, Tsoi, Hagenbuchner, & Monfardini, 2009;Xu, Hu, Leskovec, & Jegelka, 2019) showed that the structural information can be learnt by a Graph Neural Network(GNN), providing that manually labelled ground-truth data is available.
In addition, the rise of semi-supervised learning approaches represents an opportunity in the context of graphs with partial matching ground-truth.Such approaches are worth considering in our context, since we observed marked variations across cortical rexxiii gions in the ambiguity of the matching.The work by Fey, Lenssen, Morris, Masci, and Kriege (2020) considers a semi-supervised framework for handling the matching problem where the ground-truth correspondence are only given for a small subset of nodes.In addition, their approach imposes an explicit inductive bias to find correspondences across graphs, based on neighbourhood consensus that does not allow adjacent nodes from being mapped to different regions in other graphs.This is appealing in the case of sulcal graph matching where we would like to enforce the matching of nodes located in some specific regions more than in others.Such a framework could benefit from the recent work Lyu et al. (2021) on context-aware data augmentation, which could be instrumental to overcome the bottleneck of the lack of ground-truth labeling data.
Another avenue for potential gains in performance consists in improving the definition and integration of the attributes on nodes and edges.Many other geometrical features could be considered to enrich the attributes on nodes, such as e.g.shape index and curvedness Awate, Yushkevich, Song, Licht, and Gee (2010), or the local gyrification index Rabiei, Richard, Coulon, and Lefèvre (2017).On the other hand, the literature on learning edge representations is very scarce Hsu, Shen, and Cremers (2022), and the attributes on edges are most often reduced to a scalar value (i.e. a simple weight).In particular, the methods included in the present work cannot handle vectors of attributes on edges.Some recent deep learning methods such as (R. Wang et al., 2021) can exploit such vectors of attributes, but their scalability is limited by the size of the affinity matrices.We proposed in Dupé, Yadav, Auzias, and Takerkart (2022) to overcome this limitation by leveraging the recent matrix factorization method from Zhang et al. (2019).We will further investigate the potential of these methods in our future studies.

Data-driven nomenclature of sulcal basins
The present work extends previously reported experimental results illustrating the major impact of the labeling strategy on the induced correspondences and data-driven nomenclature.In Kaltenmark et al. (2020), the number of clusters obtained at the group level varied from 90 to 114 for the right hemisphere using either the approach proposed in Kaltenmark et al. (2020) or Auzias et al. (2015) respectively, on the same population of subjects.We

Conclusion
In this study, we explored the potential of several graph matching methods chosen from the literature to define population-wise correspondences across individual cortical geometries.In the absence of a ground-truth labeling for real data, we first proposed a procedure to generate simulated sulcal graphs that follow the intrinsic structure and properties of real sulcal graph.We then compared the approaches on our simulated sulcal graphs with ground-truth correspondences defined by construction.
We also evaluated the methods on real data.We computed the silhouette value of each node of the graph that measures the degree of compactness of each cluster, giving

Figure 1 :
Figure 1: Example of sulcal graphs from three individual brains, superimposed with the underlying decomposition of the cortical surface in sulcal basins.Sulcal basins are shown in different colours, and their corresponding node in the graph are represented as spherical dots in the lower panel.The color of each node in the graph illustrates the value of a given attribute such as for instance the area or depth of corresponding sulcal basin.

Figure 3 :
Figure 3: a) Real sulcal graphs from three randomly chosen individuals, and projected on the average surface.b) Simulated graphs randomly chosen for κ = 1000, showing the ground-truth correspondence across graphs in color.Nodes in black represent the outlier nodes that have no correspondence.c) Illustration of the impact of κ on the spatial dispersion of nodes: the nodes of six simulated graphs are shown on the average surface for κ = 1000 (left) and , κ = 200 (right).The spread across the nodes for each cluster varies according to κ, while outlier nodes in black have random locations.

Figure 4 :
Figure 4: F 1 -score, Precision and recall for κ ∈[1000, 400, 200, 100].For each method, we plot the average across the 10 simulated populations as a line and the standard deviation as the shaded region of the same color.

Fig. 5 .
The number and location of the different centroids (larger circles) is informative of the spatial distribution of the clusters of nodes across the cortical surface, for each method.On the first column (mALS and Kaltenmark et al.) some nodes remain unlabelled and are represented in black.The clusters seem more compact than for the methods of the second column (Auzias et al. and mSync) that do not allow any node to remain unlabeled.On the third column (CAO and kerGM) the matching looks noisy, with clusters overlapping between eachother in almost every cortical location, which illustrates the poor anatomical relevance of the matching.

Figure 5 :
Figure 5: Labeling and corresponding cluster centroids (larger circles) for each method.Dots in black in the first column (for mALS and Kaltenmark et al.) correspond to unmatched nodes.See text for further description.
Across the different methods, we observe that the clusters showing a higher silhouette value relative to other clusters are located systematically in the same regions that are known to be less variable across individuals, such as the central sulcus, and the insula.For these clusters, the silhouette values are close across methods, confirming the lower ambiguity in the matching in these regions.In highly variable regions, the different methods produce different matchings.For instance in the occipital lobe, the clusters produced byKaltenmark et al.show lower silhouette values compared to mALS, but we observe the opposite effect in the anterior frontal lobe.On Fig.7, we show the consistency for every nodes and centroids, for the three methods that do not explicitly enforce a perfect consistency.Clearly, the pairwise technique KerGM results in inconsistent matching for every clusters, including the regions where the variations across individuals are known to be low (no centroid in green, even in the central sulcus and the insula).For mALS and Auzias et al., we can observe the spatial variations of the consistency across cortical regions.Again, higher consistency is obtained in less variable regions (central sulcus, insula) for both techniques, and relatively lower values are visible in the frontal and occipital regions.The consistency is higher for mALS than Auzias et al. for every clusters.Note that the spatial pattern of the consistency measure for mALS is anatomically relevant, with a consistent matching in the insula, the central and pre-central regions, and less consistent in the peri-sylvian regions.At a more local scale, provide a much more detailed comparison.We show on Fig.8the superimposition of the centroids from different methods on the same average surface.This visualization shows that the location of some of the centroids are very consistent across methods (indicated by arrows), corresponding to cortical regions where variations across individuals are known to be low, such as the central sulcus, the insula, the inferior precentral or superior temporal sulcus.Other clusters differ across methods.The clusters indicated by squares are those resulting from Auzias et al. and mSync (resp.) that do not match clusters from Kaltenmark et al. (see also Table 1).These are located either in highly variable regions such as the frontal lobe, or on the top of gyri such as the superior temporal gyrus and the inferior frontal gyrus.While mALS and Kaltenmark et al. result in centroids that are highly similar (crosses and rings are often superimposed on the panel on the left), the two xxiv methods do not result in the same matching in highly variable regions such as the inferior frontal and parietal regions (indicated by diamonds).In conjunction with our results on synthetic (Sec.4.3.3) and real data (Sec.4.4.3), these observations confirm that the conceptual differences between the approaches yield different matchings and thus different correspondences across individuals.Indeed, graph-matching techniques such as mALS are able to take into account the topological information encoded in the graphs, i.e the spatial organization of neighbouring folds, while Kaltenmark et al. relies only on the geometry of sulcal basins, considering different folds separately.

Figure 8 :
Figure 8: Superimposition of the centroids from mALS, Auzias et al., and mSync shown as crosses with those of Kaltenmark et al. shown as green rings.mSync is representative of the methods kerGM and CAO that also result in 101 clusters.The arrows point to centroids that are robust across methods.The squares indicate centroids corresponding to small clusters located on gyri.Diamonds indicate centroids that differ between mALS and Kaltenmark et al..

Figure B. 2 :
Figure B.2: Distribution for geodesic distance in the simulated and real population of 137 graphs.The shaded region corresponds to standard deviations across graphs in the population.

Figure B. 3 :
Figure B.3: Degree distribution in the simulated and real population of 137 graphs.The shaded region corresponds to standard deviations across graphs in the population.
Algorithm 1 Procedure to generate a population of artificial sulcal grahs Require: N, n ref , κ, µ pert , σ pert , p Step1: create reference nodes See Sec.3.1 for j = 1..10000 do Sample n ref points on the sphere Compute the minimum geodesic distance end for Choose the set of points with the largest min distance.

Table 1 :
Quantitative measures computed at the whole brain scale.KerGM enforce the matching of every nodes, and result in a number of clusters equal to the size of the largest graph in the set, i.e. 101.The method Auzias et al. results in more clusters than the size of the largest graph, suggesting that some clusters correspond to highly variable nodes that cannot be matched consistently across individuals.This is confirmed by the consistency measure which is lower than for mALS.The consistency of Auzias et al. is still much higher than the value of 0.30 obtained with the pairwise technique KerGM.Note that the methods mSync and CAO explicitly enforce a perfect consistency, but this is possible only when considering the dummy nodes as pointed in section 4.4.2.Also note that the method Kaltenmark et al. also gets a perfect consistency.This is a consequence of the explicit constraint imposed in this technique by allowing one and only one node per subject to be matched for any given cluster.xix The silhouette measures illustrate that a high consistency can be associated with a low compactness of the clusters as e.g. for CAO and mSync that get values close to the one of the pairwise technique KerGM.The methods Auzias et al. and Kaltenmark et al. get much higher silhouette values which is expected since these techniques enforce the matching of nodes based essentially on their spatial proximity on the surface.The silhouette value of mALS is higher than these two techniques.Overall, mALS results in high silhouette and consistency values, at the cost of a high number of unmatched nodes (28.4%) compared to Kaltenmark et al. and Auzias et al., indicating that this method was much more conservative in the matching, leaving more ambiguous nodes unmatched.
The number of clusters and percentage of unmatched nodes indicate that the two methods that allow partial matching mALS and Kaltenmark et al. result in a lower number of clusters, suggesting that the ambiguous nodes remain unlabeled instead of enforcing their matching into potentially unreliable clusters.The three methods mSync,CAO and 3.1.For both mSync and Auzias et al., we observe some clusters with low silhouette value located close to each other, suggesting that the number of clusters is too high.The techniques mALS and Kaltenmark et al. result in much higher silhouette values,which is expected since they do not force the matching of highly variable nodes that are left unlabeled.The unlabeled nodes have a very low silhouette value (in violet on Fig.6), but since they do not belong to any cluster, this does not reduce the silhouette values of clusters.Note that even for these methods, the clusters get closer with lower silhouette values in highly variable regions such as the anterior frontal and occipital lobes.