Extracting Multiscale Pattern Information of fMRI Based Functional Brain Connectivity with Application on Classification of Autism Spectrum Disorders

We employed a multi-scale clustering methodology known as “data cloud geometry” to extract functional connectivity patterns derived from functional magnetic resonance imaging (fMRI) protocol. The method was applied to correlation matrices of 106 regions of interest (ROIs) in 29 individuals with autism spectrum disorders (ASD), and 29 individuals with typical development (TD) while they completed a cognitive control task. Connectivity clustering geometry was examined at both “fine” and “coarse” scales. At the coarse scale, the connectivity clustering geometry produced 10 valid clusters with a coherent relationship to neural anatomy. A supervised learning algorithm employed fine scale information about clustering motif configurations and prevalence, and coarse scale information about intra- and inter-regional connectivity; the algorithm correctly classified ASD and TD participants with sensitivity of and specificity of . Most of the predictive power of the logistic regression model resided at the level of the fine-scale clustering geometry, suggesting that cellular versus systems level disturbances are more prominent in individuals with ASD. This article provides validation for this multi-scale geometric approach to extracting brain functional connectivity pattern information and for its use in classification of ASD.


Simulation Setting
In this section we present two sets of simulated data to demonstrate the effectiveness of the proposed method. To mimic the brain fMRI data, we simulate two groups, autistic and control, with 29 subjects each. For each subject, a 106 × 106 matrix is simulated to represent the correlation of brain activity between regions.
Anatomically, the 106 regions of interest (ROIs) are classified into 11 clusters based on their physical locations in the brain (in the simulation study, the ROIs of the vermis are grouped together as a separate cluster). A pair of ROIs within the same cluster is more highly correlated than a pair of ROIs in different clusters. Within each cluster, a pair of ROIs with the same function (one in the left hemisphere and the other in the right) is usually even more highly correlated. To capture this characteristic in the simulated data, a multi-scale correlation matrix is needed for each subject. Another important property to include in the simulation is the difference between the ASD group and the TD group. Within each group, the correlation matrices are not identical but vary slightly from subject to subject. However, the differences between the two clinical groups are assumed to dominate this within-group variation.
The first set of correlation matrices is generated as follows. Step 1: Generate two 106 × 106 distance matrices, one representing the distances between ROIs for the control group and the other for the autistic group. To reflect the multi-scale characteristic, the distance between two ROIs in distinct anatomical regions is drawn from a normal distribution N(5, 1). In contrast, the distance between two ROIs within the same anatomical region is drawn from N(2, 0.5) and is therefore expected to be smaller. At the finest scale, the distance between a left-right (L-R) ROI pair is expected to be even shorter and is drawn from N(1, 0.2).
To introduce a difference between the two clinical groups, we break ROI connections with a different probability in each group. When the connection between two ROIs in the same anatomical region is broken, its distance is redrawn from N(5, 1). The breaking probabilities of the two groups are listed in Table 2. All L-R pairs are broken with probability 0.2; if a pair is broken, its distance is drawn from N(2, 0.5).
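The sampling scheme of Step 1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the cluster labels, the list of L-R pairs, and the within-region breaking probability passed in are hypothetical placeholders (the actual probabilities are those of Table 2), the second parameter of N(·, ·) is treated as a standard deviation, and negative draws are clipped to zero.

```python
import random

def simulate_distance_matrix(cluster_of, lr_pairs, p_break_within,
                             p_break_lr=0.2, seed=0):
    """Draw one symmetric ROI distance matrix following Step 1.

    cluster_of     : cluster_of[i] is the anatomical cluster label of ROI i
                     (hypothetical layout supplied by the caller).
    lr_pairs       : set of (i, j) tuples with i < j marking left-right pairs.
    p_break_within : probability of breaking a within-region connection
                     (placeholder; the paper's values are in its Table 2).
    p_break_lr     : probability of breaking an L-R pair (0.2 in the text).
    """
    rng = random.Random(seed)
    n = len(cluster_of)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) in lr_pairs:
                # Finest scale: N(1, 0.2); a broken pair falls back to N(2, 0.5).
                broken = rng.random() < p_break_lr
                mu, sd = (2.0, 0.5) if broken else (1.0, 0.2)
            elif cluster_of[i] == cluster_of[j]:
                # Same region: N(2, 0.5); a broken connection is redrawn from N(5, 1).
                broken = rng.random() < p_break_within
                mu, sd = (5.0, 1.0) if broken else (2.0, 0.5)
            else:
                # Different regions: N(5, 1).
                mu, sd = 5.0, 1.0
            # Assumption: second parameter is a standard deviation; clip at zero.
            value = max(rng.gauss(mu, sd), 0.0)
            d[i][j] = d[j][i] = value
    return d
```

Calling the function twice with different values of `p_break_within` would give one distance matrix per clinical group.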
Step 2: Generate the correlation matrices by applying data cloud geometry. For each distance matrix, 29 ensemble matrices are generated at temperatures chosen uniformly from 0.3 to 0.7. The elements of the ensemble matrices lie between 0 and 1 and represent the correlations. Different temperatures yield correlations at distinct scales, as observed in the real fMRI data.
The second set of data is generated in the same fashion, except that the two distance matrices are closer, with more similar breaking probabilities between the controls and the autistic group. The breaking probabilities are given in Table 3.
8. We use the ratio in item 6 and the ratio of the number of new motifs to the two groups as the predictors in a leave-one-out cross-validated logistic regression.
9. Sensitivity and specificity are then calculated.
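Items 8 and 9 can be illustrated with a small leave-one-out loop. The sketch below is not the authors' implementation: it assumes two numeric predictors per subject (standing in for the two ratios), labels coded as 1 for ASD and 0 for TD, and a plain gradient-descent logistic regression with a small ridge penalty added for numerical stability. All function names are hypothetical.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, n_iter=2000, l2=1e-3):
    """Fit logistic regression by gradient descent; the intercept is stored last."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[p] + sum(wj * xj for wj, xj in zip(w, xi))
            err = _sigmoid(z) - yi
            for j in range(p):
                grad[j] += err * xi[j]
            grad[p] += err
        for j in range(p):
            # Ridge penalty on the slopes only (an assumption for stability).
            w[j] -= lr * (grad[j] / n + l2 * w[j])
        w[p] -= lr * grad[p] / n
    return w

def predict_label(w, x):
    """Return 1 if the fitted probability exceeds 0.5, else 0."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    return 1 if z > 0 else 0

def loo_sensitivity_specificity(X, y):
    """Leave one subject out, refit, predict the held-out subject, and tally."""
    tp = fn = tn = fp = 0
    for i in range(len(X)):
        w = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        pred = predict_label(w, X[i])
        if y[i] == 1:
            tp += pred == 1
            fn += pred == 0
        else:
            tn += pred == 0
            fp += pred == 1
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity is the fraction of held-out ASD subjects predicted as ASD, and specificity is the fraction of held-out TD subjects predicted as TD.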

Classification
We extract the motifs following the procedure above. Sensitivity and specificity are calculated by leave-one-out cross-validated logistic regression. We obtain 100% sensitivity and 100% specificity on all of the simulated data sets, which indicates that the multi-scale geometry approach detects the systematic difference between the groups well.

Clustering as a backup
We define the distance between the hierarchical clustering trees of two subjects A and B (see procedure above, item 3) as
$$d_{AB} = 1 - \frac{\#\{\text{matched motifs in } (A, B)\}}{\max\{\#\text{motifs in } A,\ \#\text{motifs in } B\}}.$$
We then compute the distance matrix of the 58 subjects in the simulated data. In the hierarchical clustering tree built from this matrix, all 29 subjects of the first group are grouped together and the remaining 29 subjects are grouped together, which is consistent with the classification results of 100% sensitivity and 100% specificity.
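The tree distance defined above is simple to compute once each subject's motifs are available. The following sketch assumes each motif is represented as a frozenset of ROI indices and that two motifs "match" when their ROI membership is identical. Both the representation and the matching rule are assumptions made for illustration, and the function names are hypothetical.

```python
def tree_distance(motifs_a, motifs_b):
    """d_AB = 1 - #matched motifs / max(#motifs in A, #motifs in B)."""
    a, b = set(motifs_a), set(motifs_b)
    denom = max(len(a), len(b))
    if denom == 0:
        # Convention chosen here: two empty motif sets are at distance 0.
        return 0.0
    return 1.0 - len(a & b) / denom

def subject_distance_matrix(motif_sets):
    """Pairwise tree distances for a list of per-subject motif collections."""
    n = len(motif_sets)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = tree_distance(motif_sets[i], motif_sets[j])
    return d
```

The resulting matrix could then be passed to any standard agglomerative clustering routine to build a tree over subjects.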