SozRank: A new approach for localizing the epileptic seizure onset zone

Epilepsy is one of the most common neurological disorders, affecting about 1% of the world population. For patients with focal seizures that cannot be treated with antiepileptic drugs, the common treatment is surgical removal of the seizure onset zone (SOZ). In this work we introduce an algorithm for automatic localization of the SOZ in epileptic patients based on electrocorticography (ECoG) recordings. The proposed algorithm builds upon the hypothesis that the abnormal, excessive (or synchronous) neuronal activity leading to seizures starts in the SOZ and then spreads to other areas of the brain. Thus, when this abnormal activity starts, signals recorded at electrodes close to the SOZ should have a relatively large causal influence on the rest of the recorded signals. The SOZ localization is executed in two steps. First, the algorithm represents the set of electrodes using a directed graph in which nodes correspond to recording electrodes and the edges' weights quantify the pair-wise causal influence between the recorded signals. Then, the algorithm infers the SOZ from the estimated graph using a variant of the PageRank algorithm followed by a novel post-processing phase. Inference results for 19 patients show a close match between the SOZ inferred by the proposed approach and the SOZ estimated by expert neurologists (success in 17 of 19 patients).

Figure 1: A block diagram of the procedure for estimating the causal-influence graph G. V is a 10·F_s × N matrix of the ECoG recordings; X is the matrix at the output of the pre-processing; G is the estimated causal-influence graph (of size N × N).
Let X and Y be arbitrary discrete-time, continuous-amplitude random processes, and let X_1^N ∈ R^N and Y_1^N ∈ R^N be N-length sequences. Using the chain rule for MI [1, Ch. 2.5], the MI between X_1^N and Y_1^N can be written as:

$$I\left(X_1^N; Y_1^N\right) = \sum_{i=1}^{N} I\left(X_1^N; Y_i \,\middle|\, Y_1^{i-1}\right). \tag{3}$$

Differently from MI, the DI from X_1^N to Y_1^N is defined as [2, eq. (1)]:

$$I\left(X_1^N \to Y_1^N\right) \triangleq \sum_{i=1}^{N} I\left(X_1^i; Y_i \,\middle|\, Y_1^{i-1}\right). \tag{4}$$

Thus, in contrast to MI, which quantifies a measure of dependence between two sequences, DI aims at quantifying the causal influence of the sequence X_1^N on the sequence Y_1^N (note that the first argument of the mutual-information terms in (4) is X_1^i, while in (3) it is X_1^N). It can be observed that I(X_1^N → Y_1^N) depends on the sequence length N; thus, it is more insightful to consider the DI rate between the processes X and Y, which is defined as [2, eq. (12)]:

$$I(\mathcal{X} \to \mathcal{Y}) \triangleq \lim_{N \to \infty} \frac{1}{N} I\left(X_1^N \to Y_1^N\right), \tag{5}$$

provided that this limit exists. We now make the following assumptions regarding the processes X and Y.
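To make the contrast between (3) and (4) concrete, the following toy sketch (ours, not part of the algorithm; it uses simple plug-in estimates on binary sequences rather than the k-NN estimator used later) computes both sums for a process in which X is a noisy copy of the past of Y. Here the causal direction is from Y to X, so the MI is large while the DI from X to Y is essentially zero:

```python
import numpy as np
from collections import Counter

def cond_mi(A, B, C):
    """Plug-in estimate, in bits, of I(A; B | C) from aligned samples."""
    n = len(A)
    nabc = Counter(zip(A, B, C))
    nac = Counter(zip(A, C))
    nbc = Counter(zip(B, C))
    nc = Counter(C)
    return sum((k / n) * np.log2(k * nc[c] / (nac[a, c] * nbc[b, c]))
               for (a, b, c), k in nabc.items())

def tup(M):
    """Turn the rows of a 2-D array into hashable tuples."""
    return list(map(tuple, M))

rng = np.random.default_rng(0)
T, N = 100_000, 4                      # T independent trials of length-N sequences
Y = rng.integers(0, 2, size=(T, N))    # Y is a sequence of i.i.d. fair bits
X = np.zeros_like(Y)
X[:, 1:] = Y[:, :-1]                   # X copies the PAST of Y ...
X ^= (rng.random((T, N)) < 0.1).astype(int)  # ... through a 10% bit-flip channel

# Chain rule for MI, eq. (3): condition Y_i on the WHOLE sequence X_1^N.
mi_sum = sum(cond_mi(tup(X), Y[:, i], tup(Y[:, :i])) for i in range(N))
# DI, eq. (4): condition Y_i only on the past-and-present samples X_1^i.
di_sum = sum(cond_mi(tup(X[:, :i + 1]), Y[:, i], tup(Y[:, :i])) for i in range(N))
```

As expected, mi_sum is large (the sequences are strongly dependent), while di_sum is close to zero (X exerts no causal influence on Y, only the reverse).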

A1)
The random processes X and Y are assumed to be stationary, ergodic, and Markovian of order m in the observed sequences. The stationarity assumption implies that the statistics of the considered random processes are constant throughout the observed sequences. From a practical perspective, stationarity is required to ensure that the causal influence does not change over the observed sequences. Thus, the block length is chosen such that the analyzed signals are (approximately) stationary. Ergodicity is assumed to ensure that the observed sequences truly represent the underlying processes. Finally, the Markovity assumption is common in modeling real-life systems with finite memory, in particular in neuroscience [3,4,5]. We formulate the assumption of Markovity of order m in the observed sequences via:

$$p\left(y_i \,\middle|\, y_1^{i-1}, x_1^{i}\right) = p\left(y_i \,\middle|\, y_{i-m}^{i-1}, x_{i-m}^{i-1}\right), \quad i > m.$$

Note that here we use the simplifying assumption that the dependence of y_i on past samples of Y_1^{i-1} and X_1^i is of the same order m. Further note that in the above Markovity assumption we implicitly assume that, given (Y_{i-m}^{i-1}, X_{i-m}^{i-1}), y_i is independent of X_i; this reflects a setting in which X_i and Y_i are measured simultaneously, as is the case for ECoG recordings.
A3) The following holds:

$$h\left(Y_{m+1} \,\middle|\, Y_1^{m}, X_1^{m}\right) > -\infty.$$

In the context of the current work, Assumptions A2) and A3) are required to mathematically ensure that the DI rate exists and equals a simple expression that depends on the finite memory length m. Moreover, Assumptions A2) and A3) rule out the degenerate cases of a deterministic Y_1 or of a deterministic relationship between Y_{m+1} and (Y_1^m, X_1^m). It is reasonable to assume that these conditions hold for ECoG recordings, which always contain some level of randomness. Under Assumptions A1)-A3), [5, Lemmas 3.1 and 3.2] imply that I(X → Y) exists and is equal to:

$$I(\mathcal{X} \to \mathcal{Y}) = I\left(X_1^{m}; Y_{m+1} \,\middle|\, Y_1^{m}\right). \tag{6}$$

In view of (6), I(X → Y) admits the following interpretation: given the past of the sequence Y, how much does the past of the sequence X help in predicting the next sample of Y?
One can observe that this interpretation of the DI rate is identical to the interpretation of GC [6]. Yet, while GC is a parametric measure, DI does not assume any underlying statistical model. Next, we discuss the pre-processing of the ECoG data.

Pre-Processing
The pre-processing consists of four steps:
1. First, the common reference is removed from all the recorded signals [7,8].
2. Then, each column of the matrix V is filtered using a 60 Hz notch filter to remove the line noise [8,9].
3. Next, each column of the matrix V is down-sampled to 100 Hz (recall that the sampling rate in the iEEG portal is between 500 Hz and 5 kHz). Since each block is 10 seconds long, this implies that the size of the matrix X in Figure 1 is 1000 × N.
4. Finally, the mean of each column is removed, and each column is normalized to unit variance.
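The four steps can be sketched as follows (a minimal sketch using SciPy; parameter choices such as the notch quality factor Q = 30 are our illustrative assumptions, not values specified above):

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt, decimate

def preprocess(V, fs):
    """Sketch of the four pre-processing steps; V has one column per electrode."""
    # 1. remove the common (average) reference from every time sample
    X = V - V.mean(axis=1, keepdims=True)
    # 2. 60 Hz notch filter to suppress the line noise (Q = 30 is our choice)
    b, a = iirnotch(w0=60.0, Q=30.0, fs=fs)
    X = filtfilt(b, a, X, axis=0)
    # 3. down-sample each column to 100 Hz (decimate applies anti-aliasing)
    X = decimate(X, q=int(fs // 100), axis=0)
    # 4. remove the mean of each column and normalize to unit variance
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return X
```

For a 10-second block recorded at 500 Hz from N = 8 electrodes, V has shape 5000 × 8 and the output X has shape 1000 × 8, matching the dimensions in Figure 1.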
To understand the reasoning behind the down-sampling, note that accurately estimating the causal influence requires satisfying two conflicting constraints. On one hand, the sequence from which the causal influence is estimated should be approximately stationary (see Assumption A1)). According to [7], ECoG signals are approximately stationary only for a few seconds. Thus, the number of samples that can be used for estimation is limited. On the other hand, as stated in [10, Sec. V.E], the number of samples required for accurate estimation grows exponentially with the Markov order m (this follows since increasing m can be viewed as increasing the state space). Thus, if m is too large, one cannot hope to accurately estimate the causal influence. Now, the Markov order m should reflect the auto-time-dependence of the ECoG signals (the auto-correlation in the case of Gaussian signals); namely, for a fixed auto-time-dependence span τ (in seconds), m should satisfy τ = m/F_s. Hence, for a given τ (which is a property of the considered continuous signal), increasing F_s requires also increasing m. However, the value of m must be controlled due to the limited number of samples. A possible approach to control the value of m, such that it still captures an auto-time-dependence span of τ, is to reduce F_s via down-sampling.
The work [5] roughly estimated the memory of ECoG signals to be on the order of tens of milliseconds. For such values, when F_s = 500 Hz, one must use m > 10. At the same time, the number of ECoG samples observed in a 10-second block (recall that the block cannot be too long due to the stationarity requirement) is far from sufficient for accurate estimation when m > 10. To tackle this challenge we chose to down-sample the ECoG signals to 100 Hz; thus, the required m is significantly smaller, i.e., smaller than 10, which enables accurate estimation. Note that this down-sampling discards all content at frequencies above 50 Hz, the Nyquist frequency of the new sampling rate. Filtering out the high frequencies was also applied in [8]. To conclude this discussion we note that [5] proposed a different method to deal with the challenge of high values of m. Using the terminology in (6), [5] proposed the following approximation:

$$I(\mathcal{X} \to \mathcal{Y}) = I\left(X_{i-m}^{i-1}; Y_i \,\middle|\, Y_{i-m}^{i-1}\right) \tag{7}$$

$$\approx I\left(X_{i-m}, X_{i-m+\Delta}, X_{i-m+2\Delta}, \ldots; Y_i \,\middle|\, Y_{i-m}^{i-1}\right), \tag{8}$$

where 1 ≤ Δ ≤ m is an integer. We emphasize that this step was used only in estimating the causal influence. While this indeed reduces the required number of samples for estimation, it is not clear how Δ should be chosen. Moreover, the impact of the samples omitted when approximating (7) by (8) is not clear. In view of the common approach of analyzing frequencies lower than 100 Hz [8], we believe that down-sampling is more robust and leads to better results, as it accounts for all the information in the low frequencies, whereas the approach of (8) ignores the dependence on the omitted samples X_{i-m+1}, ..., X_{i-m+Δ-1}, X_{i-m+Δ+1}, ..., i.e., on all past samples of X that fall between the retained ones.
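The arithmetic behind this choice can be made explicit. Taking τ = 30 ms as an illustrative value in the "tens of milliseconds" range (the specific value is ours, for illustration only):

```python
def markov_order(tau_ms, fs_hz):
    """Markov order m spanning a memory of tau_ms at sampling rate fs_hz (m = tau * Fs)."""
    return round(tau_ms * 1e-3 * fs_hz)

m_raw = markov_order(30, 500)   # at the original 500 Hz sampling rate -> 15
m_ds = markov_order(30, 100)    # after down-sampling to 100 Hz -> 3
```

Here m_raw = 15 lands in the infeasible regime (m > 10), whereas m_ds = 3 is small enough for accurate estimation from a 10-second block.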

Estimating the Pair-Wise Causal Influences
A block diagram of the procedure for estimating the DI between the signal recorded at the i-th electrode and the signal recorded at the j-th electrode is depicted in Figure 2. From (6) it can be observed that the DI, and therefore its estimation, are functions of the Markov order m. Clearly, this value is not known a priori. Hence, m is first estimated, and then this estimate is used in estimating the DI functional. To estimate m we use a locally weighted prediction calculated from the 8 nearest neighbors (NN) of each sample tuple, thus extending the approach of [11]. We note that, to maintain a reasonable computational complexity, we estimate an m value for each time series, i.e., for each column of X. Then, when estimating the DI, we use the value max{m_i, m_j}; see Figure 6.
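The order-selection step can be illustrated with a toy sketch. The estimator above uses a locally weighted prediction from the 8 nearest neighbors; the simplified sketch below replaces the local weighting with a plain average of the neighbors' targets and selects the m that minimizes the one-step prediction error on a synthetic signal whose true memory is exactly two samples. All function and variable names are ours:

```python
import numpy as np

def knn_prediction_error(x, m, k=8, n_test=300):
    """One-step-ahead error of a k-NN predictor using the last m samples as the state."""
    n = len(x)
    states = np.stack([x[t - m:t] for t in range(m, n)])   # delay embedding, (n - m, m)
    targets = x[m:]
    tr_s, tr_y = states[:-n_test], targets[:-n_test]       # training portion
    err = 0.0
    for s, y in zip(states[-n_test:], targets[-n_test:]):  # held-out portion
        nn = np.argsort(np.linalg.norm(tr_s - s, axis=1))[:k]
        err += abs(tr_y[nn].mean() - y)                    # unweighted neighbor average
    return err / n_test

rng = np.random.default_rng(1)
x = np.zeros(2000)
for t in range(2, 2000):
    x[t] = 0.95 * x[t - 2] + 0.1 * rng.standard_normal()   # memory of two samples

candidates = [1, 2, 3, 4, 5]
errs = [knn_prediction_error(x, m) for m in candidates]
best_m = candidates[int(np.argmin(errs))]
```

Because the signal depends on the sample two steps back, an order-1 state carries almost no predictive information, and the error drops sharply once m ≥ 2.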
Given an estimate of m, the DI is estimated using the k-NN approach with k = 5. We use the estimator presented in [12], which extends the MI estimator derived in [13]. A detailed analysis of the procedure for estimating the pair-wise DI is beyond the scope of this document; we refer the reader to [14,15,16] for a detailed description and discussion of the estimation method and its properties.
Recall that, in addition to estimating the DI, the proposed algorithm also estimates the GC measure. This is done using the Multi-Variate Granger Causality (MVGC) toolbox [17], as discussed in [9].
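For completeness, a minimal sketch of the GC measure itself (not of the MVGC toolbox [17], whose conditional multivariate procedure is considerably more elaborate): the GC from x to y is the log-ratio of the residual variance of an autoregressive model of y driven by its own past alone to that of a model that also includes the past of x. Signals are assumed zero-mean, so no intercept is fitted; all names are ours:

```python
import numpy as np

def lagmat(v, p, n):
    """Columns are v[t-1], ..., v[t-p] for t = p, ..., n-1."""
    return np.stack([v[p - 1 - i: n - 1 - i] for i in range(p)], axis=1)

def granger_causality(x, y, p=2):
    """GC of x on y: log-ratio of restricted vs. full AR residual variances."""
    n = len(y)
    target = y[p:]
    restricted = lagmat(y, p, n)                     # y's own past only
    full = np.hstack([restricted, lagmat(x, p, n)])  # y's past plus x's past

    def resid_var(A, b):
        coef, *_ = np.linalg.lstsq(A, b, rcond=None)
        return np.mean((b - A @ coef) ** 2)

    return np.log(resid_var(restricted, target) / resid_var(full, target))

# Coupled test system: x drives y, but not the reverse.
rng = np.random.default_rng(2)
n = 4000
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + rng.standard_normal()

gc_xy = granger_causality(x, y)   # clearly positive
gc_yx = granger_causality(y, x)   # near zero
```

The asymmetry gc_xy ≫ gc_yx recovers the simulated direction of influence; the MVGC toolbox additionally handles conditional (multivariate) GC and significance testing.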

Generating the Causal Influence Graph
The graph G is constructed by defining each node to correspond to a recording electrode and the weight of each edge to quantify the respective causal influence. More precisely, let [X]_{[:,i]} denote the i-th column of the matrix X. Then, the weight of the edge from the i-th node to the j-th node in G is given by:

$$[G]_{i,j} = \hat{I}\left([X]_{[:,i]} \to [X]_{[:,j]}\right), \quad i \neq j,$$

where Î(X → Y) denotes the estimate of the DI (or the GC). Note that the diagonal elements of G are set to zero, namely, there is no causal influence from a signal to itself. We conclude this supplementary document with the observation that, as [G]_{i,j} is estimated from a finite number of samples, one may want to assess the statistical significance of this estimate. A lack of statistical significance may imply that there is no causal influence between the considered signals and that the estimated values are due to noise or estimation error; in such a case one may choose to set [G]_{i,j} = 0. While there are known methods for quantifying the statistical significance of GC estimates [17, Sec. 2.5], no such method is known for the problem of estimating the DI (for time-series samples taken from a discrete alphabet, [18] derived an asymptotically optimal null distribution of the DI). An alternative method for evaluating statistical significance is a non-parametric bootstrapping procedure in the spirit of [19]. The main drawback of such a procedure is the tremendous increase in computational complexity: applying it amounts to multiplying the computational complexity by a factor of at least 20 (note that estimating the DI values constitutes the main computational load of the algorithm). For this reason, the statistical significance of a given pair-wise estimate (in the case of DI) is not evaluated. Instead, the algorithm uses the post-processing phase, as discussed in the Methods section of the main manuscript.
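The graph-assembly step can be sketched as follows, assuming the pairwise estimator is available as a callable. For the demo we plug in a stand-in lag-1 cross-correlation (our choice, for illustration only; the algorithm itself plugs in the DI or GC estimate Î):

```python
import numpy as np

def build_causal_graph(X, estimate):
    """Assemble G with [G]_{i,j} = estimate(column i -> column j) and a zero diagonal."""
    n_ch = X.shape[1]
    G = np.zeros((n_ch, n_ch))
    for i in range(n_ch):
        for j in range(n_ch):
            if i != j:
                G[i, j] = estimate(X[:, i], X[:, j])
    return G

def lagged_corr(a, b):
    """Stand-in pairwise estimator: absolute lag-1 cross-correlation (demo only)."""
    return abs(np.corrcoef(a[:-1], b[1:])[0, 1])

# Toy data: channel 0 drives channel 1 with a one-sample delay.
rng = np.random.default_rng(3)
x = rng.standard_normal(3000)
y = np.empty(3000)
y[0] = rng.standard_normal()
y[1:] = x[:-1] + 0.3 * rng.standard_normal(2999)

G = build_causal_graph(np.column_stack([x, y]), lagged_corr)
```

The resulting G is asymmetric, with a large weight on the edge 0 → 1 and a near-zero weight on 1 → 0, reflecting the directed nature of the causal-influence graph.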