Extracting Labeled Topological Patterns from Samples of Networks

An advanced graph-theoretical approach is introduced that enables a higher level of functional interpretation of samples of directed networks with identical, fixed, pairwise different vertex labels that are drawn from a particular population. Compared to the analysis of single networks, the investigation of such samples promises to yield more detailed information about the represented system. Often, the patterns of directed edges in the sample element networks are too intricate for direct evaluation and interpretation. The new approach addresses the problem of simplifying topological information and characterizes such a sample of networks by finding its locatable characteristic topological patterns. These patterns, essentially sample-specific network motifs with vertex labeling, might represent the essence of the intricate topological information contained in all sample element networks and also provide a means of differentiating network samples. Central to the accuracy of this approach are the null model and its properties, since the null model is needed to assign significance to topological patterns. As a proof of principle, the proposed approach has been applied to the analysis of networks that represent brain connectivity before and during painful stimulation in patients with major depression and in healthy subjects. The resulting reduction of topological information enables a cautious functional interpretation of the altered neuronal processing of pain in both groups.

electrical stimulus on a scale ranging from 0 to 6 (0: no sensation; 1: just perceived, not painful; 2: clearly perceived, but not painful; 3: low pain; 4: moderate pain; 5: strong pain, but tolerable; 6: unbearable pain) [1,2]. The pain threshold was defined as the intensity yielding a sensation described as a sharp painful pinprick, corresponding to a rating of "3". The EEG was recorded continuously during the electrical stimulation from 60 electrodes, referenced to Cz, using a standard EEG cap (Easy Cap, Falk Minow Services, Germany) based on an extended International 10-20 system. Finally, data from nine selected electrodes F3, Fz, F4, C3, Cz, C4, P3, Pz and P4 (re-referenced to a linked-ears reference) were processed. These electrodes are situated above some of the regions that are important for pain processing, attention and depression (frontal, central, and parietal brain regions). Single-trial somatosensory evoked potentials (SEPs) contaminated by eye-movement or mastication-muscle artifacts were excluded. As a result, three data sets had to be excluded entirely because too few artifact-free trials remained for a reliable connectivity analysis. In order to compare the pre- and post-stimulus conditions, signal sections of 700 ms duration were extracted before stimulus onset (from 700 ms before onset to the onset of the stimulus, i.e. −700 to 0 ms) as well as after stimulus onset (from stimulus onset to 700 ms after stimulus onset, i.e. 0 to 700 ms). These signal sections provided the data basis for the connectivity analysis. To assess the effective connectivity between each ordered pair of the nine electrodes, the generalized partial directed coherence (gPDC) [3] was applied. The frequency range of interest for the SEP analysis was determined to be the delta, theta and alpha bands (1 to 13 Hz), since the signal power is mainly situated in this frequency range.
For a consolidated analysis, the gPDC values of the corresponding frequencies were pooled into one quantity by averaging over the frequency range of interest. Thus, one gPDC value results for each of the 72 possible directed interactions (9 × 8 ordered pairs of distinct electrodes). Finally, the effective connectivity that we are interested in is given by significantly increased gPDC values. The significance test was conducted with the bootstrap procedure introduced in [4] at a significance level of 5%. In the present study, we generally used 1500 bootstrap repetitions. A drawback of this approach is that the gPDC distribution under the null hypothesis 'no interaction' cannot be guaranteed. In such a case, the significance threshold cannot be determined. Across the entire sample, we detected this situation in 3.48% of all possible directed interactions. In these cases, we recorded the connections as missing values. By treating significant gPDC values as existing interactions of the respective subject, we modeled each subject's EEG-recorded neural activity during the period of the corresponding stimulus condition as an effective connectivity network (ECN).
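The pooling and thresholding step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pool_and_threshold`, the array layout `gpdc[f, i, j]` (interaction from channel i to channel j), and the use of NaN to mark undeterminable thresholds are all assumptions made for illustration. The bootstrap procedure of [4] that produces the thresholds is not reproduced here; the thresholds are taken as given.

```python
import numpy as np


def pool_and_threshold(gpdc, freqs, thresholds, band=(1.0, 13.0)):
    """Average gPDC over a frequency band and mark significant interactions.

    gpdc       : array (n_freqs, n, n); gpdc[f, i, j] is assumed to hold the
                 gPDC value of the interaction from channel i to channel j.
    freqs      : array (n_freqs,), frequencies in Hz.
    thresholds : array (n, n) of significance thresholds (hypothetical input);
                 NaN marks pairs whose threshold could not be determined.
    Returns (pooled, edges), where edges holds 1.0 (interaction),
    0.0 (no interaction) or NaN (missing value).
    """
    gpdc = np.asarray(gpdc, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)

    # Pool by averaging over all frequencies inside the band of interest.
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    pooled = gpdc[in_band].mean(axis=0)

    # Compare against the thresholds; undetermined thresholds become missing.
    missing = np.isnan(thresholds)
    safe_thresholds = np.where(missing, np.inf, thresholds)
    edges = (pooled > safe_thresholds).astype(float)
    edges[missing] = np.nan

    # Simple directed graphs have no loops, so the diagonal is always zero.
    np.fill_diagonal(edges, 0.0)
    return pooled, edges
```

Keeping three states (interaction, no interaction, missing) in one matrix leaves the decision on how to treat missing values to a later step.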

Imputations of missing values
Connectivity analysis is complicated by the fact that the significance threshold cannot be determined in all cases. Even though this situation occurs rarely, many networks are partially unobserved. In particular, only 12.5% of the networks are free of missing values. As a consequence, the traditional analysis (listwise deletion of networks with missing values) would discard a large amount of useful information contained in the data. Moreover, excluding networks presupposes that the remaining networks are a random subsample of the original data set (i.e. that the data are missing completely at random (MCAR)). This is clearly a very strong assumption, and it is likely not fulfilled for our data. Estimates and inferences derived from only our fully observed networks would therefore be biased. In the literature, different imputation strategies exist to overcome the problem of missing values [5]. Since no approach is superior for all possible missing data mechanisms, a reasonable imputation strategy has to be chosen according to the specific data set.
Because it seems implausible that the observed variables contain useful information for predicting the missing values (i.e. that the data are missing at random (MAR)), we performed extreme case imputation: in a first data set, all missing directed edges are imputed as no interactions, while in a second data set they are imputed as interactions. Both data sets are analyzed independently, and only subnetworks that are significant in both cases are considered as locatable characteristic topological patterns. This is a conservative way of dealing with missing values, and it keeps the effect of the imputations on topological pattern selection as small as possible.
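The extreme case imputation described above can be sketched as follows. The function names, the NaN encoding of missing edges, and the representation of patterns as hashable objects are assumptions made for illustration; the significance analysis of subnetworks itself is not shown.

```python
import numpy as np


def extreme_case_imputations(edges):
    """Return the two extreme completions of a partially observed network.

    edges : float array (n, n) with 1.0 (interaction), 0.0 (no interaction)
            and NaN (missing value); this encoding is an assumption.
    Returns (lower, upper): in `lower` every missing edge is imputed as
    'no interaction', in `upper` every missing edge is imputed as 'interaction'.
    """
    edges = np.asarray(edges, dtype=float)
    missing = np.isnan(edges)
    lower = np.where(missing, 0.0, edges).astype(int)
    upper = np.where(missing, 1.0, edges).astype(int)
    # Loops are not allowed in simple directed graphs.
    np.fill_diagonal(lower, 0)
    np.fill_diagonal(upper, 0)
    return lower, upper


def robust_patterns(significant_in_lower, significant_in_upper):
    """Keep only patterns that are significant under both imputations.

    Both arguments are iterables of hashable pattern descriptions
    (hypothetical representation, e.g. frozensets of labeled edges).
    """
    return set(significant_in_lower) & set(significant_in_upper)
```

Taking the intersection means that a pattern is reported only if its significance does not depend on how the missing edges were filled in.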

Effective connectivity networks
Effective connectivity networks are directed graphs with identical, pairwise different vertex labels. They consist of a non-empty finite set of vertices (the EEG channels) and a finite set of ordered pairs of distinct vertices called arcs or directed edges, which represent directed interactions between the electrodes of each pair. Every vertex is associated with the position of its recording EEG electrode, which becomes its label. The ordered pair $(v_i, v_j)$ denotes a directed edge that leaves vertex $v_i$ and enters vertex $v_j$. The vertex $v_i$ is called the tail and the vertex $v_j$ the head of the edge. Each directed edge is given by a corresponding significant gPDC value. An ECN can be represented by its adjacency matrix $A$. Since we computed effective connectivity between signals from nine selected electrodes, each ECN has a vertex set of cardinality $n = 9$, and the adjacency matrices have size 9×9, where $a_{ij} = 1$ if and only if the ECN comprises the directed edge $(v_i, v_j)$. Accordingly, a bidirectional edge running between two vertices $v_i$ and $v_j$ is indicated by the two entries $a_{ij} = 1$ and $a_{ji} = 1$ in $A$. Most ECNs have dense connection patterns. The mean number of directed edges in our samples of ECNs is 36.8 out of 72 potential edges. Since ECNs are simple directed graphs, they do not contain loops (edges whose tail and head coincide) or multiple edges (multi-edges, i.e. edges that have the same tail and the same head). With two exceptions, all ECNs are connected networks.
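The graph properties listed above can be checked directly on an adjacency matrix. The following is a minimal sketch with hypothetical helper names. It assumes that the row index is the tail and the column index is the head of an edge, and it interprets "connected" as weakly connected, which the text does not state explicitly.

```python
import numpy as np

# Electrode labels in the order given in the text.
LABELS = ["F3", "Fz", "F4", "C3", "Cz", "C4", "P3", "Pz", "P4"]


def edge_list(adj, labels=LABELS):
    """List directed edges as (tail label, head label) pairs."""
    a = np.asarray(adj, dtype=bool)
    return [(labels[i], labels[j]) for i, j in zip(*np.nonzero(a))]


def is_weakly_connected(adj):
    """Check connectivity of the graph after ignoring edge directions."""
    a = np.asarray(adj, dtype=bool)
    undirected = a | a.T
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in np.flatnonzero(undirected[v]):
            w = int(w)
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == a.shape[0]


def summarize(adj):
    """Return basic descriptors of a simple directed graph."""
    a = np.asarray(adj, dtype=bool)
    n = a.shape[0]
    if a.diagonal().any():
        raise ValueError("simple directed graphs must not contain loops")
    n_edges = int(a.sum())
    return {
        "n_edges": n_edges,
        "density": n_edges / (n * (n - 1)),  # share of potential edges
        "bidirectional_pairs": int(np.triu(a & a.T, k=1).sum()),
        "weakly_connected": is_weakly_connected(a),
    }
```

For a 9×9 ECN the denominator of the density is 9 · 8 = 72, so the reported mean of 36.8 edges corresponds to a density of about 0.51.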
In our study we examined eight samples of ECNs. These eight samples result from all combinations of the group assignment (MD: patients with major depression; HC: healthy control subjects), the time window with respect to the stimulus condition (pre: time window before stimulation; post: time window following stimulation) and the stimulated side (right: right hand; left: left hand). The "MD-post-right" sample comprises fifteen ECNs, whereas all other samples comprise sixteen.

Estimation of an upper bound for the number of networks with prescribed degree sequence
We obtained a lower bound for the number of pairwise different networks with the prescribed degree sequence of the representative ECN by means of simulations. To put this number into perspective, we were also interested in estimating an upper bound. Since the representative ECN has 37 directed edges, there obviously cannot exist more than $2^{72}$ such square matrices of size nine. An only slightly better estimate for the number of such networks is given by $\binom{72}{37}$. An appropriate decomposition of the adjacency matrix of the representative ECN yields 131,712,000 as a much better upper bound for the number of matrices with the same row and column sums as the adjacency matrix of the representative ECN. The adjacency matrix is binary, and its diagonal elements are not considered in the decomposition. Let $r_i$ be the sum of the i-th row and let $c_i$ be the sum of the i-th column. For the first row there are !