Learning about learning: Mining human brain sub-network biomarkers from fMRI data

Modeling the brain as a functional network can reveal the relationship between distributed neurophysiological processes and functional interactions between brain structures. Existing literature on functional brain networks focuses mainly on a battery of network properties in “resting state” employing, for example, modularity, clustering, or path length among regions. In contrast, we seek to uncover functionally connected subnetworks that predict or correlate with cohort differences and are conserved within the subjects within a cohort. We focus on differences in both the rate of learning as well as overall performance in a sensorimotor task across subjects and develop a principled approach for the discovery of discriminative subgraphs of functional connectivity based on imaging acquired during practice. We discover two statistically significant subgraph regions: one involving multiple regions in the visual cortex and another involving the parietal operculum and planum temporale. High functional coherence in the former characterizes sessions in which subjects take longer to perform the task, while high coherence in the latter is associated with high learning rate (performance improvement across trials). Our proposed methodology is general, in that it can be applied to other cognitive tasks, to study learning or to differentiate between healthy patients and patients with neurological disorders, by revealing the salient interactions among brain regions associated with the observed global state. The discovery of such significant discriminative subgraphs promises a better data-driven understanding of the dynamic brain processes associated with high-level cognitive functions.

The network state mining methodology we adopt works with categorical labels of the global states, i.e. it is formalized as an instance of classification as opposed to one of regression. While estimating continuous global scores (regression) might be of interest as well, here we focus on identifying biomarkers that differentiate between sessions in which subjects progressively increase the speed of completion of the visual cues (high learning rate sessions) from those in which no acceleration is observed (low learning rate sessions). We adopt the slope of the movement time over trials practiced as a measure of the learning rate. Movement time is the time between the first and last button press of the 12-note sequence. A larger slope indicates that the subject is learning well, while a smaller slope indicates that the subject is not learning as well [1].
The slopes for the 18 subjects are listed in Table 1. The coloring of scores in the table are based on a threshold of −0.06 (which we discuss in more detail in the next paragraph) used to convert the continuous slopes into the categorical low versus high learning rate states. Slopes smaller than the chosen threshold correspond to high learning rate (blue) and slopes larger than the threshold correspond to low learning rate (yellow). In the first session, 13 of the subjects are learning the motor task as their rates are below the threshold, while 5 of the subjects are not learning as fast. We focus on the fast early changes in the learning process as posited in the theoretical framework of Doyon et al. [2]. Thus the 13 subjects that were labeled as being in a high learning state in the first session were discarded from the second and third session data sets. In the third session, 1 subject transitions from a low to a high learning rate state while the remaining 4 low learning rate subjects remain in the low learning rate state. The confidence intervals of the slope included 0 for subject 16 in Session 2 and subject 1 in Session 3; hence, these sessions were excluded as no significant learning occurred. Note that our estimate of learning rate is independent of how fast the the subject performs at the beginning of the session, a feature which is often largely driven by biomechanics rather than the ability to learn. Instead, we focus on the improvement of individual performance over time, which is attributed to early-stage motor skill learning [2]. Table 1. Slopes of movement time versus trials practiced for each subject and each experimental session. High-learning rate sessions are colored blue and low learning rate sessions are colored yellow. All but five subjects exhibit a high learning rate in the first session. One of those five subjects transitions into a high learning rate in their third session. Slopes for two of the sessions (Subject 1 Session 3 and Subject 16 Session 2) were negative and thus indicated worsening of performance (colored gray). We selected a threshold of −0.06 on the learning rate to delineate high versus low learning rate states. To choose this threshold, we used a clustering approach with perturbations and we estimated the optimal threshold between states based on the  stability of multiple clustering solutions. To begin, we randomly perturbed the measured learning rates from all valid sessions by adding Gaussian noise with a magnitude of 3 standard deviations of the original learning rates to each parameter value. We created 100 perturbed instances of the measured learning rates and clustered them using k-means clustering (at k = 2). We compared the different clustering solutions using the Jaccard index as a measure of similarity. A clustering is considered "stable" if it is approximately maintained across multiple perturbations. We observed that the vast majority of clustering pairs had a Jaccard index of 1 (perfect similarity) and −0.06 is the middle point separating the points in the obtained clusterings. We show the distribution of the original learning rate values in Fig. 1(a) and the distribution of the middle point between two perturbed clusters in Fig. 1(b). Both figures single out −0.06 as an optimal threshold.
Using the selected threshold and discarding sessions preceded by high learning rate sessions of the same subject, we obtained 14 high learning rate session-specific networks and 13 low learning rate session-specific networks (color-coded with blue and yellow respectively in Tab. 1). We applied our discriminative biomarker approach considering all 27 networks simultaneously together with their global learning state labels.
Multiple thresholds, and thus, multiple classes corresponding to levels of learning rate, can be selected similarly using clustering. Such modeling of more than two classes is possible within our SNL learning framework as we discussed earlier. Note that the two meta-networks described earlier, will still be suitable in this case since their topology is determined by network samples having the same or different global state values and not on their specific class labels. The number of dimensions for the subspace in which SNL projects the original data will also have to be adjusted to be one less than the number of classes. Moreover, as there will be a natural order of the classes (e.g., gradients of learning rate), the penalty for rendering different-class instance pairs close should be scaled according to the distance between their classes in that natural order. The limited number of subjects and training sessions in our data, however, drove the decision to work with two classes for this specific evaluation task.

Discriminative subspace mining: parameters and comparative analysis
As a baseline, we compared the testing accuracy of SNL with that of an SVM classifier that works with the same instances, but which is oblivious to the network structure. All 6216 features were given to the SVM while performing cross validation. We also used the linear-SVM as the classifier for SNL but it performed on the subspace spanned by the subnetworks selected by the SNL. The original data set of 27 samples was randomly divided into 9 folds and in the cross validation (CV) model, we used each of the 8 folds as a training data set while the left-out fold was used as an independent test set. This 9-fold cross-validation was further performed 5 times with different random data splitting and we report the averaged prediction accuracy from 45 testing folds. While the conventional SVM technique achieved 76% testing accuracy using all features, SNL achieved 81% using only the top 50 selected features.

Conserved connected discriminative subgraphs
The result of SNL's discriminative subspace learning was a set of 45 edge subsets, one from each training version. Each of those selected edge subsets did not necessarily form a single connected component. There were a total of 456 connected components in the 45 selected edges sets. The maximum number of occurrences for any subgraph of the selected edges was 45 and hence when we mined the space of subgraphs for conserved ones, we needed to define a support threshold, i.e. a number of occurrences out of the 45 possible. In this analysis, we began with a very high threshold and lowered it to obtain a larger number of candidate biomarkers to be further tested for significance. Table 2 lists the number of frequent subgraphs in the set produced by SNL for decreasing frequency. For conservative (high) thresholds, we obtained a small number of candidates but importantly, we could easily miss candidates of lower frequency, whose accuracy may not have been significant. As we lowered the threshold, the number of candidates increased drastically, since the space of subgraphs increased exponentially. In this process of varying thresholds, we observed that the set of subgraphs with significant predictive accuracy (q-value ≤ 0.015) initially grew and for thresholds lower than 15 occurrences out of 45, we do not obtain new significant subgraphs. The reason for this phenomenon is that at lower thresholds we had many subgraphs of high overlap that satisfied the threshold, but whose individual accuracy was not significant.    instances using only the subset of features corresponding to included edges. We used an SVM classifier with polynomial kernel for this analysis. Our goal was to select only connected subgraphs, whose accuracy was significant. Hence, we needed a background model for the accuracy levels we could expect at random for connected subgraphs. Since a closed form solution to this problem is not available, we resorted to sampling-based estimation of the background model. Note that allowing more features naturally results in higher accuracy, hence we defined background models for subgraphs of specific sizes. We sampled 1000 subgraphs of a fixed size k using a random walk based sampling that ensured that every connected subgraph had an equal chance of being included in our sample by following the techniques in [3]. We then computed the accuracy for every subgraph in the sample by building SVM classifiers. We quantified the p-value of our conserved subgraph's accuracies (obtained by frequent subgraph mining) in the background models corresponding to their individual sizes. To correct for false discovery rate (FDR), we then computed q-values based on the obtained p-values, following the protocol from [4]. The final set of significant subgraphs with q-value ≤ 0.015 is described in the following section.

Correlation of biomarker edge coherence and learning rate
In this section we provide additional visualization of the correlation of biomarker edges strengths with learning rate Fig. 2. Thickness of individual edges is proportional to the absolute value of the correlation with learning rate, while colors denote the correlation sign: red for negative and blue for positive. The same information is also quantitatively visualized in Fig. 3.

Full set of discovered biomarkers
The full set of biomarkers falling in the two biomarker regions discussed in the experimental section are presented in Table 3. Their individual accuracy (ACC) varies between 74% and 85% and it is significant: q-value ≤ 0.015. Most biomarkers have no loops, only the last two feature loops involving the right and left occipital poles, and left intracalcarine and supercalcarine cortices. The sizes of the biomarkers in terms of number of edges |E| and number of nodes |V | vary between 1 and 5 and between 1 and 6 respectively.

List of brain region name abbreviations
The list of all brain regions and their abbreviations used in figures throughout the manuscript is presented in Table 4.