What lies underneath: Precise classification of brain states using time-dependent topological structure of dynamics

The self-organising global dynamics underlying brain states emerge from complex recursive nonlinear interactions between interconnected brain regions. Until now, most efforts to capture the causal, mechanistic generating principles have assumed underlying stationarity and were thus unable to describe the non-stationarity of brain dynamics, i.e. their time-dependent changes. Here, we present a novel framework able to characterise brain states with high specificity, precisely by modelling the time-dependent dynamics. By describing a topological structure associated with the brain state at each moment in time (its attractor or ‘information structure’), we are able to classify different brain states using the statistics across time of these structures, hitherto hidden in the neuroimaging dynamics. Demonstrating the strong potential of this framework, we classified resting-state BOLD fMRI signals from two classes of post-comatose patients (minimally conscious state and unresponsive wakefulness syndrome) compared with healthy controls with very high precision.


A.3 Resting-state fMRI preprocessing
Preprocessing of both resting-state fMRI datasets was performed using MELODIC (Multivariate Exploratory Linear Optimized Decomposition into Independent Components) version 3.14 [3], which is part of FMRIB's Software Library (FSL, http://fsl.fmrib.ox.ac.uk/fsl). Preprocessing steps included discarding the first 5 volumes from each scan to allow for signal stabilization, motion correction using MCFLIRT [4], non-brain removal using the Brain Extraction Tool (BET) [5], spatial smoothing with a 5 mm full width at half-maximum Gaussian kernel, rigid-body registration, a high-pass filter with cutoff = 100.0 s, and single-session ICA with automatic dimensionality estimation. Furthermore, we applied FIX (FMRIB's ICA-based X-noiseifier) to remove noise components and lesion-driven artifacts independently for each subject [6,7,8]. Specifically, we used FSLeyes in Melodic mode to manually classify the single-subject Independent Components (ICs) into "good" for signal, "bad" for lesion-driven artifacts, noise, motion, or other nuisance sources, and "unknown" for imprecise components. We classified each component by inspecting its spatial map, temporal power spectrum, and time course. Finally, we applied FIX with the default settings to remove the bad components and obtain a cleaned version of the fMRI data.
We used FSL tools to extract the subject-specific time series of seven resting-state networks described according to the well-known Yeo parcellation atlas: (1) Visual, (2) Somatomotor, (3) Dorsal attention, (4) Ventral attention, (5) Limbic, (6) Frontoparietal, and (7) Default [9]. Specifically, the cleaned functional data were co-registered to the T1-weighted structural image using FLIRT [10], and the T1-weighted image was co-registered to the standard MNI space using FLIRT (12 DOF) and FNIRT [10,11]. The resulting transformations were concatenated, inverted, and applied to warp the Yeo atlas from MNI space to the cleaned fMRI data in native space using a nearest-neighbour interpolation algorithm. The averaged time series for each of the seven resting-state networks were extracted for each subject in their native space using custom-made Matlab scripts that call fslmaths and fslmeants. A second-order Butterworth filter between 0.008 and 0.08 Hz was applied to all datasets, and a piecewise-linear function was applied to the two Paris protocols to avoid systematic differences with the Liège dataset. Finally, to avoid singularities in the Lotka-Volterra Transform (LVT), all values must be positive; therefore, the global minimum of the two datasets (all subjects, all regions) was subtracted from all data points and a small ϵ = 0.01 was added.
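As an illustration, the band-pass filtering and positivity shift described above can be sketched in Python (the actual pipeline used Matlab and FSL tools; the TR value and array shapes here are assumptions):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_and_shift(data, tr, low=0.008, high=0.08, eps=0.01):
    """Band-pass filter network time series (second-order Butterworth,
    0.008-0.08 Hz) and shift them to be strictly positive for the LVT.

    data : (T, n_networks) array of averaged network time series
    tr   : repetition time in seconds (an assumed value, not from the paper)
    """
    nyquist = 0.5 / tr
    b, a = butter(2, [low / nyquist, high / nyquist], btype="band")
    filtered = filtfilt(b, a, data, axis=0)
    # In the paper the minimum is taken over all subjects and regions of
    # both datasets; here it is taken over a single array for simplicity.
    return filtered - filtered.min() + eps

rng = np.random.default_rng(0)
shifted = bandpass_and_shift(rng.standard_normal((200, 7)), tr=2.0)
```

The returned array is strictly positive, as required to avoid singularities in the division by u_i(t) in the LVT.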

A.4 Probabilistic Tractography analysis
The average Structural Connectivity (SC) matrix of all healthy controls in the Liège group was used for all subjects in the study, given the unavailability of the connectivity matrices of many of the DOC patients. Ideally, we would have used each subject's own connectivity matrix, but these were available only for some of the Liège participants. Using a different matrix for each group (the average of the available SC matrices in that group) proved methodologically unsound, because the resulting ISs differed due to differences in the SC matrices rather than in the brain activity of each subject. We therefore opted for the same matrix for all subjects. The main limitation is that the real matrix of a patient with severe brain damage may be very different from the average matrix we used. However, the LV equations multiply the connectivity parameter by the activity in each brain network, so when that activity is low or non-existent, the connectivity value becomes less relevant in determining the IS shape.
We generated the structural whole-brain connectivity matrix for each subject in their native diffusion space using the same Yeo template as used for the resting-state fMRI data. The whole-brain matrices were computed following the two-step procedure specified in previous studies [12,13,14]. First, images in DICOM format were converted to the Neuroimaging Informatics Technology Initiative (NIfTI) format using dcm2nii (www.nitrc.org/projects/dcm2nii). The b0 image was co-registered to the T1-weighted structural image using FLIRT [10], and the T1-weighted structural image was co-registered to the standard space using FLIRT and FNIRT [10,11]. The transformation matrices from these steps were concatenated, inverted, and applied to warp the Yeo atlas from MNI space to the native MRI diffusion space using a nearest-neighbour interpolation method. Second, analysis of diffusion images was performed in FSL using the processing pipeline of FMRIB's Diffusion Toolbox (FDT) (www.fsl.fmrib.ox.ac.uk/fsl/fslwiki/FDT). In brief, non-brain tissues were removed by applying the Brain Extraction Tool (BET) [5], eddy current distortions and head motion were corrected using the eddy_correct tool [15], and the gradient matrix was reoriented to correct for subject motion [16]. Crossing fibres were modelled using the default BEDPOSTX parameters, and the probability of multi-fibre orientations was calculated to improve sensitivity to non-dominant fibre populations [17,18]. Finally, Probabilistic Tractography was performed for each subject in their native MRI diffusion space using the default settings of PROBTRACKX [17,18]. For each of the seven resting-state networks of the Yeo parcellation, the connectivity probability to each of the other six resting-state networks was estimated as the total proportion of sampled fibres in all voxels in network n that reach any voxel in network p.
Given that DTI does not capture fibre directionality, the SC_np matrix was then symmetrized by computing its transpose matrix SC_pn and averaging both matrices.
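This symmetrization step amounts to averaging the matrix with its transpose; a minimal sketch (the values are illustrative, not real tractography data):

```python
import numpy as np

def symmetrize_sc(sc_np):
    """Symmetrize a probabilistic-tractography SC matrix by averaging it
    with its transpose, since DTI does not capture fibre directionality."""
    sc_np = np.asarray(sc_np, dtype=float)
    return 0.5 * (sc_np + sc_np.T)

# Toy 3-network example: entry (i, j) is the connectivity probability
# from network i to network j before symmetrization.
sc = symmetrize_sc([[0.0, 0.2, 0.4],
                    [0.6, 0.0, 0.1],
                    [0.2, 0.3, 0.0]])
```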

B.1 Model Transform
Here a model is defined as a system of n first-order ordinary differential equations with n parameters θ_i:

u'_i(x) = f_i(u_1, ..., u_n; θ_i),  i = 1, ..., n,  (1)

where f_i : R^{n+1} → R and x ∈ R. Given a model (1) and a differentiable function û : R → R^n, let us define the auxiliary functions f̂_i(θ) = f_i(û_1(x), ..., û_n(x); θ). When each f̂_i is invertible, the model transform of û is the family of parameter functions θ̂_i(x) solving û'_i(x) = f̂_i(θ̂_i(x)). If (1) is a model of a dynamical system we use t ∈ R (instead of x ∈ R), and the parameters θ̂_i(t) for any given time point t = t_0 are called instantaneous parameters at t = t_0.
It is easy to see that, given the model (1) and a differentiable function û : R → R^n, if θ̂_i(x) are the model transform of û(x) for that model for i = 1, ..., n, then û(x) is the solution of u'_i(x) = f_i(u_1, ..., u_n; θ̂_i(x)), i = 1, ..., n, for initial conditions u(x_0) = û(x_0) for any x_0 ∈ R.
The most trivial example of a model transform is the slope of a curve at a point. The model is the differential equation u'(x) = m, where m is the parameter of this model. It is obviously a "straight line model" that is trivially invertible, and given a function û : R → R, the transform is m̂(x) = û'(x), which is the well-known slope m̂ of the curve û(x) at each point x.
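This trivial transform can be checked numerically: sampling û(x) = sin(x) on a grid (an arbitrary choice for illustration), central differences recover m̂(x) = û'(x) = cos(x):

```python
import numpy as np

# "Straight line model" u'(x) = m: the model transform of a curve is simply
# its derivative, i.e. the instantaneous slope m_hat(x).
x = np.linspace(0.0, 2.0 * np.pi, 1001)
u_hat = np.sin(x)
m_hat = np.gradient(u_hat, x)  # central differences in the interior
```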
Another trivial example is instantaneous velocity. In this case the model is uniform motion in a straight line, a trivial dynamical system, defined by the three differential equations

x'(t) = v_x,  y'(t) = v_y,  z'(t) = v_z.

Thus, in this case the instantaneous parameters are the components of the instantaneous velocity.

B.2 Lotka-Volterra Transform
The Lotka-Volterra Transform (LVT) is the particular case of the model transform (MT) in which the model is given by:

u̇_i(t) = u_i(t) ( α_i − u_i(t) + g Σ_{j=1}^{n} γ_ij u_j(t) ),  i = 1, ..., n,

where g is a global coupling strength parameter. From these equations we can obtain time-dependent parameters α_i(t) as:

α_i(t) = u̇_i(t)/u_i(t) + u_i(t) − g Σ_{j=1}^{n} γ_ij u_j(t).

This equation defines the LVT. In practice, both u(t) and u̇(t) are empirical values from BOLD signals obtained using fMRI in the form of discrete time series, γ_ij is the value in the SC matrix connecting regions i and j, and u̇(t) is approximated as (u(t + h) − u(t − h))/(2h), where h is the time step of the central difference approximation of the derivative. The LVT transforms u(t) into α(t). A non-stationary attractor landscape emerges as the parameters α_i evolve over time: changes in the α_i produce changes in the attractor landscape (AL), and the number of energy levels (NoEL, see B.3) also changes over time. Thus, in our discrete time series, at each time step k the data u_{i,k} define a column α_{i,k} with n components (i = 1, ..., n), and we obtain a temporal series of T (k = 1, ..., T) different attractor landscapes and the corresponding T different ISs.
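A minimal sketch of the LVT on discrete series, assuming u is a (T, n) array of strictly positive values, gamma is the SC matrix, and h is the sampling step (the toy inputs below are illustrative):

```python
import numpy as np

def lotka_volterra_transform(u, gamma, g, h=1.0):
    """Map positive activity time series u(t) to instantaneous growth
    rates alpha(t) under the cooperative LV model.

    u     : (T, n) array of strictly positive network time series
    gamma : (n, n) structural connectivity matrix
    g     : global coupling strength; h : sampling step
    Returns alpha of shape (T - 2, n): the first and last samples are
    lost to the central-difference approximation of du/dt.
    """
    u = np.asarray(u, dtype=float)
    du = (u[2:] - u[:-2]) / (2.0 * h)   # central difference
    u_mid = u[1:-1]
    # alpha_i(t) = du_i/u_i + u_i - g * sum_j gamma_ij * u_j
    return du / u_mid + u_mid - g * u_mid @ gamma.T

# Constant series: du/dt = 0, so alpha_i = u_i - g * sum_j gamma_ij * u_j.
u = np.tile([1.0, 2.0], (5, 1))
gamma = np.array([[0.0, 0.5], [0.5, 0.0]])
alpha = lotka_volterra_transform(u, gamma, g=1.0)
```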
The global coupling strength parameter g and the parameter h are fitted to balance the information that each network i receives from its own activity, u̇_i(t)/u_i(t) + u_i(t) (h affects the approximation of u̇_i(t)), and the information coming from all the other networks, g Σ_{j=1}^{n} γ_ij u_j(t). When balanced, the three terms of the LVT are of the same order of magnitude. In addition, to obtain a finer fit, we optimise the differences between groups.

B.3 Global attractor and energy levels
In an n-dimensional phase space X, a dynamical system in a state u can be characterized by a family of operators indexed by time, S(t), defined as S(t)u(t_0) = u(t_0 + t) for u ∈ X. A given set A is invariant under S(t) if S(t)A = A for every t. The global attractor is a compact invariant subset of the phase space with the property that S(t)u is arbitrarily close to the global attractor for any u, provided that t is sufficiently large [19].
These concepts can be illustrated using the Lotka-Volterra (LV) system, an example of a dynamical system on a network:

u̇_i = u_i ( α_i − u_i + Σ_{j=1}^{n} γ_ij u_j ),  i = 1, ..., n,

where α_i are the growth rates and γ_ij the network structural connectivity. It is called a cooperative Lotka-Volterra system when γ_ij ≥ 0 for i, j = 1, ..., n. Each stationary solution of a LV system is a unique combination of null and non-null components, which can be binarized and expressed as a combination of ones and zeros (Figure 1). The Information Structure (IS) of the global attractor is defined as a directed graph (see Figure 1) composed of nodes associated with the invariant subsets and links establishing their connections.
Energy levels: The global attractor is usually comprised of invariant subsets (for instance, stationary points or periodic orbits) and trajectories connecting them [19,20]. The highest energy level is defined as the one formed by invariant subsets that receive no solution, i.e. unstable or source sets. Then, each successive lower level is defined as including those invariant subsets that receive solutions only from the previously defined higher levels [21].
Thus, the invariant subsets are partially ordered, i.e. they admit an order relationship in which not all pairs of elements are necessarily comparable. The number of energy levels (NoEL) will be one of the measurements in our analysis of the attractor landscape (AL). Figure 1 displays four different examples of ISs, with NoEL equal to 5 (top) and 6 (bottom). For LV systems each energy level is formed by the stationary points with the same number of non-zero components, and the only stable point is a single stationary point with only incoming solutions in the lowest level, called the globally asymptotically stable solution (GASS). The GASS can be said to be the point the system is attracted to, since any initial point of R^n_+ will converge to the GASS [22].
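Since for LV systems each energy level groups the stationary points with the same number of non-zero components, the NoEL of an IS can be sketched directly from its binarized stationary points (the points below are illustrative, not from Figure 1):

```python
def number_of_energy_levels(stationary_points):
    """NoEL of an IS of a cooperative LV system: the number of distinct
    counts of non-zero components among the binarized stationary points."""
    return len({p.count("1") for p in stationary_points})

# Illustrative IS with levels of 0, 1 and 2 non-zero components.
noel = number_of_energy_levels(["0000000", "0000010", "0000100", "0000110"])
```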
The LV model is relatively simple but, for n = 7, it may include a complex IS with up to 2^7 = 128 stationary points.

B.4 Frondosity
Information structures with q + 1 energy levels (NoEL) can have up to 2^q nodes, but not all of them are always present. Frondosity is defined as the number of nodes in the IS divided by 2^q (Figure 1). This ratio is a measure of the level of integration in the system: in ISs with small frondosity, a stable attractor (GASS) is reached in which several ROIs are active even though the nodes of the IS corresponding to many of the combinations of those ROIs do not appear in the IS. The explanation, therefore, is that the integrative interaction between different brain areas facilitates the existence of that stable attractor.
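The definition above can be sketched as a small function over the binarized stationary points of an IS (the example points are illustrative):

```python
def frondosity(stationary_points):
    """Frondosity of an IS: number of nodes divided by 2**q, where q + 1
    is the number of energy levels (NoEL)."""
    q = len({p.count("1") for p in stationary_points}) - 1
    return len(stationary_points) / 2 ** q

# Illustrative IS with NoEL = 3 (q = 2) and 3 of the 2**2 = 4 possible nodes.
f = frondosity(["0000000", "0000010", "0000110"])
```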
B.5 Criticality
Thus, joining w and z there are, at most, only 7 different non-zero components, which are also positive and will be called r_i. Recall that each GASS of a LV system is a unique combination of null and non-null components [23]. At the points where a transition between GASSs occurs, there are more than 7 null components of r. Thus, one way to measure the proximity of a GASS transition is to calculate the minimum of the r_i. Beforehand, and since both w and z, and therefore also r, depend linearly on α, the vector r is normalized to make the measure independent of the modulus of α. The minimum of the resulting {r_i} is a measure of criticality, as it indicates the proximity of a phase transition.
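A sketch of this criticality measure, taking the vector r as given (its construction from w and z follows the text above; the values here are illustrative):

```python
import numpy as np

def criticality(r):
    """Minimum component of r after normalization: small values indicate
    proximity to a GASS transition (a phase transition). r is normalized
    because it depends linearly on alpha."""
    r = np.asarray(r, dtype=float)
    r = r / np.linalg.norm(r)
    return float(r.min())

c = criticality([3.0, 4.0])  # normalized to [0.6, 0.8], so the minimum is 0.6
```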

B.6 Synchronicity
The GASS divides the nodes of the system into two subsets: those equal to 0 and the rest, with a value greater than 0. But there are two special kinds of IS where all nodes are in the same group: the ISs with extreme values of NoEL (1 or 8 in the ISs for the 7 networks of the Yeo parcellation). Synchronicity is defined as the fraction of ISs with these extreme NoEL values, in which all nodes are equally active or inactive in the GASS. Lower values of synchronicity correspond to a higher differentiation in the behaviour of the nodes of the system. Information Structures of cooperative Lotka-Volterra systems have some important structural properties. If two stationary points, having respectively the sets A and B of nodes with a value greater than 0, belong to the same IS, then that IS also contains a stationary point whose nodes A ∪ B are greater than 0. This follows from the cooperative nature of these systems. So, once a node n_i appears in a certain energy level, it will also appear in all the following levels and in the global stable point. But we can look at the first stationary point(s) where n_i appears. If n_i appears in the first energy level (like node n_6 in the top-left IS of Figure 1, which has a value greater than 0 in the solution represented by 0000010), it is because α_i > 0. But when n_i first appears in a later energy level (like n_3 in the top-left IS of Figure 1, which appears for the first time in the solution represented by 0010010), it is because of an interaction between nodes (in the example, n_6 enables the appearance of n_3). It may happen that n_i is enabled by different nodes (in the example, n_3 is also enabled by n_7; see the node 0010001, the right-most point in the intermediate level of the IS).
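Given the NoEL of the IS at each time step, synchronicity reduces to counting extreme values (the NoEL series below is illustrative):

```python
def synchronicity(noel_series, n=7):
    """Fraction of time points whose IS has an extreme NoEL (1 or n + 1),
    i.e. all nodes equally inactive or equally active in the GASS."""
    extreme = {1, n + 1}
    return sum(k in extreme for k in noel_series) / len(noel_series)

# Illustrative NoEL series over four time points for n = 7 networks:
# two of the four ISs (NoEL 1 and 8) are extreme.
s = synchronicity([1, 8, 5, 3])
```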

B.7 Cooperation
We can focus on the points of an IS which contain some new appearance of a certain node, and call them cooperative points. Figure 2 contains the representation of the cooperative points of the four ISs of Figure 1. The node 0000000 is added so that there is a common root in all cases. The top-left example corresponds to the top-left IS of Figure 1. Trivially, points in the first energy level are included, because each of them contains the first occurrence of a certain node (0000100 contains the first occurrence of n_5). The only nodes in the second energy level are 0010010 and 0010001, which correspond to the appearances of n_3 thanks to the cooperation of n_6 and n_7, respectively. Observe that only a minority of the points in an IS are cooperative points.
Four measures are defined by looking at the cooperative points of an IS. The highest energy level in which cooperation appears is the highest cooperation level. In Figure 2 the IS with the highest cooperation level is the bottom-left one, which has two points, 0111010 and 0110110, at level 4. By comparing Figures 1 and 2, we see that the lower the frondosity, the higher the cooperation measures. The cooperation value A is defined as the sum, over all cooperative points, of their level minus 1. Figure 2 indicates the cooperation values of the four examples. Recall that 0000000 is not a cooperative point. The top-left example has three points at level 1 (each contributing 0) and two points at level 2 (each contributing 1), so its cooperation value A is 2. The cooperation value B is the sum, for each node which appears in the IS thanks to cooperation, of the lowest level in which it appears minus one. Thus, a node appearing due to the cooperation of several different nodes counts only once. This happens with n_3 in the top-left picture of Figure 2: it appears twice, in 0010010 and 0010001, but it only adds 1 once. In the bottom-right IS, n_3 appears three times in the second level (adding only 1) and n_4 appears once (also adding 1). So, while the cooperation value A looks at the points of the IS, the cooperation value B looks at the appearances of the nodes n_i of the system. The cooperation value C works similarly to the cooperation value B, but adding 2^(l−1) for each node appearing for the first time at level l. In the bottom-left example of Figure 2 the cooperation value C is equal to 8 (2^(4−1)) because n_6 first appears at level 4.
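The three cooperation values can be sketched as follows; the example input reproduces the cooperative points of the top-left IS of Figure 2 as described in the text (the level-1 point labels other than 0000100 are assumptions for illustration):

```python
def cooperation_values(coop_points):
    """Cooperation values A, B and C from the cooperative points of an IS.

    coop_points: list of (binary_string, level) pairs. A sums (level - 1)
    over all cooperative points; B and C sum, for each system node first
    appearing at a level l > 1 (i.e. thanks to cooperation), (l - 1) and
    2 ** (l - 1) respectively.
    """
    a = sum(level - 1 for _, level in coop_points)
    first = {}  # first level at which each system node appears
    for bits, level in coop_points:
        for i, b in enumerate(bits):
            if b == "1":
                first[i] = min(first.get(i, level), level)
    b_val = sum(l - 1 for l in first.values() if l > 1)
    c_val = sum(2 ** (l - 1) for l in first.values() if l > 1)
    return a, b_val, c_val

# Top-left IS of Figure 2: three level-1 points and the two level-2 points
# 0010010 and 0010001 where n_3 first appears (0000000 is excluded).
a, b, c = cooperation_values([("0000010", 1), ("0000100", 1), ("0000001", 1),
                              ("0010010", 2), ("0010001", 2)])
```

Here A = 2 and B = 1, matching the worked example in the text; n_3 is counted only once in B and C even though it appears in two cooperative points.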