The Frequent Complete Subgraphs in the Human Connectome

While it is still not possible to describe the neural-level connections of the human brain, we can map the human connectome with several hundred vertices by applying diffusion MRI-based techniques. In these graphs, the nodes correspond to anatomically identified gray matter areas of the brain, while the edges correspond to the axonal fibers connecting these areas. In our previous contributions, we have described numerous graph-theoretical phenomena of human connectomes. Here we map the frequent complete subgraphs of human brain networks: in these subgraphs, every pair of vertices is connected by an edge. We also examine sex differences in the results. Mapping the frequent subgraphs yields robust substructures of the graphs: if a subgraph is present in 80% of the graphs, then it is most probably not an artifact of the measurement or of the data processing workflow. We list here the frequent complete subgraphs of the braingraphs of 414 subjects, each with 463 nodes, using a frequency threshold of 80%, and we identify 812 complete subgraphs that are significantly more frequent in male connectomes and 224 complete subgraphs that are significantly more frequent in female connectomes.


Introduction
Diffusion MRI-based macroscopic mapping of the connections of the human brain is a technology developed over the last 15 years [1,2,3,4]. Applying this method, we are able to construct braingraphs, or connectomes, from diffusion MRI images [1,5,6]: the vertices of the graph are anatomically labeled areas of the gray matter (called "Regions of Interest", ROIs), and two ROIs are connected by an edge if a complex workflow, involving either deterministic or probabilistic tractography, finds axonal fibers between them. Therefore, graphs with up to 1015 nodes and several thousand edges can be constructed from the MR image of each subject.
The analysis of these graphs is an important and fast-developing area today: these connections form the "hardware" of all brain functions on a macroscopic level [2,3,4]. Naturally, it would be exciting to map the neuronal-scale human connectome, too: there the nodes would be the individual neurons, and two nodes (neurons), say X and Y, would be connected by a directed edge (X, Y) if the axon of X were connected to a dendrite of Y. Unfortunately, to date, the neuronal-level connectome of only one adult organism has been described: that of the nematode Caenorhabditis elegans, with 302 neurons, in 1986 [7]. In the larval state, two more neuronal-level connectomes have been published: those of the larva of the fruit fly Drosophila melanogaster [8] and of the tadpole larva of Ciona intestinalis [9]. Despite some exciting, very recent developments [10], the complete connectome of the adult Drosophila melanogaster, with about 100,000 neurons, has not yet been determined. Humans have about 80 billion neurons in their brains. Therefore, mapping and analyzing the neuronal-scale human connectome is out of our reach today.
Numerous results have been published on the analysis of diffusion MRI-computed human connectomes, e.g., [1,2,3,4]. Our research group has also contributed several graph-theoretically oriented analytical methods, such as the comparison of deep graph-theoretical parameters of male and female connectomes [11,12,13], the parameterizable human consensus connectome [14,15], the description of individual variability in the connections of the major lobes [16], the discovery of the Consensus Connectome Dynamics [17,18,19,20], the description of the frequent subgraphs of the human brain [21], and the Frequent Neighborhood Mapping of the human hippocampus [22].
The reasons for our graph-theoretical approach are as follows:
(i) Graph theory has a long history of great successes, starting with Leonhard Euler's 1741 paper on the bridges of Königsberg [23].
(ii) "Pure mathematical" graph theory and its applications in computer engineering reached an exceptionally high level of development in the late 20th and early 21st centuries; to mention just three famous examples: the Strong Perfect Graph Theorem [24], Szemerédi's Regularity Lemma [25], and the intricate parallel algorithms for multiprocessor routing in [26].
(iii) Graph-theoretical definitions and notations are well developed and clear, and they usually capture the deep and most relevant properties of the networks examined.

Frequent edges and subgraphs: a robust analysis
The data acquisition and processing workflow that produces the braingraphs, or structural connectomes, has numerous delicate steps. Naturally, errors may occur in MRI recording and processing, as well as in the segmentation, parcellation, tractography and graph computation steps [1,27,28]. When hundreds of high-quality MR images are available, we can analyze the frequently appearing graph edges or subgraphs in order to derive robust, reproducible results that appear in a high fraction of the brains imaged. By analyzing only the frequently appearing structural elements, the great majority of data acquisition and processing errors are filtered out.
Our first effort toward describing frequent edges in the human connectome was the construction of the Budapest Reference Connectome Server [14,15], where the user can select a frequency threshold of k% for the edges, and the resulting consensus connectome contains only those edges that are present in at least k% of the subjects. The generated consensus connectome can be both visualized and downloaded at https://pitgroup.org/connectome/.
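The consensus-connectome construction can be sketched in a few lines of Python. This is a minimal illustration, not the server's implementation: the function name `consensus_connectome` and the input format (one list of undirected vertex pairs per subject) are assumptions made for this example.

```python
from collections import Counter
from typing import Iterable

Edge = tuple[int, int]


def normalize(u: int, v: int) -> Edge:
    """Store an undirected edge with its endpoints in increasing order."""
    return (u, v) if u < v else (v, u)


def consensus_connectome(graphs: list[Iterable[Edge]], k_percent: float) -> set[Edge]:
    """Return the edges present in at least k_percent % of the given graphs."""
    counts = Counter()
    for edges in graphs:
        # Use a set so that each edge is counted at most once per subject.
        counts.update({normalize(u, v) for u, v in edges})
    n = len(graphs)
    # count / n >= k / 100  <=>  100 * count >= k * n  (avoids rounding at the threshold)
    return {e for e, c in counts.items() if 100 * c >= k_percent * n}


# Toy example: three "subjects" on four vertices.
subjects = [[(1, 2), (2, 3)], [(2, 1), (3, 4)], [(1, 2)]]
print(sorted(consensus_connectome(subjects, 60)))  # [(1, 2)]
```

Lowering the threshold admits more edges: with `k_percent=30`, every edge that appears in at least one of the three toy subjects is kept.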
The frequent connected subgraphs with at most 6 edges were mapped in the human connectome in [21]. Their frequencies were compared between female and male connectomes, and strong sex differences were identified: there are connected subgraphs that are significantly more frequent in males than in females, and there is an even larger number of connected subgraphs that are more frequent in females than in males.
The direct connections of important brain areas are of special interest: correlations between present or missing connections and psychological test results or biological parameters may shed light on the fine structure-function relations of the brain. For error-correction reasons, the frequent neighbors of the relevant brain areas are the robust objects of study: small errors in the data processing workflow will most probably have no effect on the frequent connections. In [22], we introduced the method of Frequent Neighborhood Mapping, which describes the frequent neighbor sets of given nodes of the braingraph. We demonstrated the method by mapping the frequent neighborhoods of the human hippocampus, one of the most deeply studied parts of the brain, and found sex differences in the frequent neighbor sets: males have many more frequent neighbor sets of the hippocampus than females; therefore, the neighborhoods of men's hippocampi are more regular, with less variability, than those of women. This observation is in line with the results of [11,12,13], where we have shown that female connectomes are better expander graphs than the braingraphs of men.
In the present contribution, we map the frequent complete subgraphs of the human connectome, based on the large dataset of the Human Connectome Project [29]. Our dataset contains the braingraphs of 414 subjects. A recent work [30] deals with complete subgraphs in the braingraphs of 8 subjects, each with 83 nodes. Our results are derived from 414 braingraphs, each with 463 nodes. Therefore, we are able to find frequent structures, i.e., frequent complete subgraphs, in our dataset of 414 graphs (whereas it is not feasible to derive frequent structures from 8 graphs).

Cliques vs. complete subgraphs
Here we clarify some graph-theoretical terms. A complete graph on v vertices contains (undirected) edges connecting all of its $\binom{v}{2} = v(v-1)/2$ vertex pairs; that is, in a complete graph, each pair of vertices is connected by an edge.
If we have a graph G on n vertices, we can look for the complete subgraphs H of G: all the vertices and the edges of H need to be vertices and edges of G (i.e., H is a subgraph of G), and, moreover, H needs to be a complete graph.
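Whether a vertex set spans a complete subgraph can be tested directly from the definition: all $\binom{v}{2}$ of its vertex pairs must be edges. The helper names below are hypothetical, and edges are assumed to be given as unordered vertex pairs.

```python
from itertools import combinations
from math import comb


def is_complete_subgraph(vertices, edges) -> bool:
    """True if every pair of `vertices` is joined by one of the undirected `edges`."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(pair) in edge_set for pair in combinations(vertices, 2))


def complete_graph_edge_count(v: int) -> int:
    """Number of edges of the complete graph on v vertices: C(v, 2) = v(v-1)/2."""
    return comb(v, 2)


g = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(is_complete_subgraph([1, 2, 3], g))  # True: a triangle
print(is_complete_subgraph([1, 2, 4], g))  # False: (1, 4) and (2, 4) are missing
print(complete_graph_edge_count(7))        # 21 edges in a 7-vertex complete graph
```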
In this paper, a complete subgraph of G with the maximum number of vertices is called a clique (much of the literature calls this a maximum clique and uses "clique" for any complete subgraph). The clique number of G, ω(G), is the number of vertices in the largest complete subgraph of G. Computing the clique number ω(G) is a well-known hard problem: it is NP-hard [31], so it is unlikely that a fast (i.e., polynomial-time) algorithm exists for computing ω(G). Moreover, in general, ω(G) is not only hard to compute exactly, but also very difficult to approximate, even roughly [32]. In special cases, however, when the number of vertices is only several hundred and the graph is not too dense (that is, it does not have too many edges), all the frequently appearing complete subgraphs can be computed relatively quickly by the apriori algorithm [33,34]. The computational details are given in the Materials and Methods section.
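For small graphs, ω(G) can be computed exactly with the classical Bron-Kerbosch algorithm with pivoting, which enumerates the maximal complete subgraphs. This is a swapped-in textbook technique shown only to illustrate the definition: its running time is exponential in the worst case, and it is not the frequency-based method used in this work. The adjacency-dictionary input format and the function names are assumptions.

```python
def clique_number(adj: dict[int, set[int]]) -> int:
    """Exact clique number of an undirected graph via Bron-Kerbosch with pivoting.

    `adj` maps every vertex to the set of its neighbours (no self-loops).
    The running time is exponential in the worst case, as expected for an NP-hard problem.
    """
    best = 0

    def expand(r_size: int, p: set[int], x: set[int]) -> None:
        nonlocal best
        if not p and not x:
            best = max(best, r_size)  # the current set R is a maximal complete subgraph
            return
        # Pivot u: only vertices of P that are not neighbours of u need to start a branch.
        u = max(p | x, key=lambda w: len(p & adj[w]))
        for v in list(p - adj[u]):
            expand(r_size + 1, p & adj[v], x & adj[v])
            p.remove(v)  # v has been fully explored ...
            x.add(v)     # ... so exclude it from later branches

    expand(0, set(adj), set())
    return best


def from_edges(edges):
    """Build the adjacency dictionary of an undirected graph from its edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


print(clique_number(from_edges([(1, 2), (2, 3), (1, 3), (3, 4)])))  # 3
```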
Our goal in the present contribution is to map the frequently appearing complete subgraphs in human connectomes. We stress that our analysis is done on 463-vertex braingraphs. Therefore, finding a complete subgraph does not imply the existence of complete subgraphs on the neuronal level. It implies, however, that the macroscopic ROIs corresponding to the vertices of the discovered complete graphs are densely connected to one another, probably even on the neuronal level.
In the literature, one may find numerous references to the "rich club property" of some networks, related to braingraphs [35,36]. Here we prefer classical graph-theoretical terms to the "rich club property"; consequently, we map those densely connected subgraphs of the human connectomes that form complete graphs and appear in at least 80% of all braingraphs considered.

Discussion and Results
First we review the frequent complete subgraphs of the human braingraph, next we analyze the significant differences in their frequencies in males and in females.

Frequent complete subgraphs of the human connectome
Supporting Table S1 contains the complete subgraphs of the human connectomes appearing in at least 80% of the graphs of the 414 subjects examined. Each row lists the vertices of a complete subgraph, together with its frequency of appearance. Note that the vertices of a complete graph uniquely determine its edges. The list is redundant in the following sense: if a k-vertex complete graph has a frequency of at least 80%, then all of its complete subgraphs are also listed. We find that this redundancy helps in the analysis of the results, as will become clear from what follows.
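Because the list in Table S1 is closed under taking complete subgraphs, the maximal complete subgraphs can be recovered from it by discarding every vertex set that is strictly contained in another listed set. A minimal sketch, assuming each row is given as an iterable of vertex labels; the function name is hypothetical.

```python
def maximal_complete_subgraphs(vertex_sets):
    """Keep the vertex sets that are not strictly contained in another listed set.

    Returns sorted vertex lists, larger subgraphs first.
    """
    unique = {frozenset(s) for s in vertex_sets}
    maximal = [s for s in unique if not any(s < t for t in unique)]  # s < t: proper subset
    return sorted((sorted(s) for s in maximal), key=lambda s: (-len(s), s))


rows = [[1, 2], [1, 3], [2, 3], [1, 2, 3], [3, 4], [3], [4]]
print(maximal_complete_subgraphs(rows))  # [[1, 2, 3], [3, 4]]
```

The pairwise comparison is quadratic in the number of rows, which is acceptable for a list of a few thousand entries.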
We would like to emphasize the following very simple but powerful fact: if a given subgraph U has a frequency of, say, $\alpha\%$, then every subgraph of U has a frequency of at least $\alpha\%$, since every braingraph that contains U also contains all of its subgraphs. This is the central point of the apriori algorithm [33,34], and it was noted and applied in [21,22].
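The monotonicity of frequencies can be checked on a toy example. In this sketch (hypothetical names; each graph is a set of `frozenset` edges over a common vertex set), the frequency of a vertex set is the percentage of graphs in which all of its pairs are edges, and no subset of U turns out to be less frequent than U itself.

```python
from itertools import combinations


def edge_set(edges):
    """Represent an undirected graph by the set of its edges, each a frozenset {u, v}."""
    return {frozenset(e) for e in edges}


def frequency(vertices, graphs):
    """Percentage of `graphs` in which `vertices` span a complete subgraph."""
    pairs = [frozenset(p) for p in combinations(vertices, 2)]
    hits = sum(all(p in g for p in pairs) for g in graphs)
    return 100.0 * hits / len(graphs)


graphs = [
    edge_set([(1, 2), (1, 3), (2, 3)]),  # contains the triangle {1, 2, 3}
    edge_set([(1, 2), (2, 3)]),
    edge_set([(1, 2)]),
]
U = (1, 2, 3)
f_U = frequency(U, graphs)  # only the first graph contains U
# Anti-monotonicity: no subset of U is less frequent than U itself.
for r in range(1, len(U)):
    for W in combinations(U, r):
        assert frequency(W, graphs) >= f_U
print(round(f_U, 1), frequency((1, 2), graphs), round(frequency((2, 3), graphs), 1))
# 33.3 100.0 66.7
```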

Complete subgraphs appearing in each subject
Here we list the maximal complete subgraphs from Supporting Table S1 that are present in all of the braingraphs and contain at least three nodes:
Note that L1 and L2 correspond to R1 and R2, and L3 almost corresponds to R3. Complete graph R4 has no counterpart in the left hemisphere among the complete subgraphs present in every subject, but the superiorfrontal regions of the left hemisphere are also densely connected, as one can easily verify from Table S1.
We believe that the connections between the above-listed areas are very strong in each subject: so strong that they are not affected by measurement errors and individual variability.

The largest frequent complete subgraphs
The largest complete subgraphs that are present in at least 80% of the subjects have seven vertices, and they are located in the left hemisphere. The first one connects the left putamen with six vertices in the left frontal lobe (B1); the second one connects the left caudate and the left putamen ROIs to five left frontal areas (B2):
There are 48 different 6-vertex complete subgraphs that are present in at least 80% of the connectomes. Only 6 of these are situated in the right hemisphere; the other 42 are in the left hemisphere.

Complete subgraphs across the hemispheres
Since the neural fiber tracts connecting the two hemispheres of the brain are very dense in the corpus callosum, their tractography in diffusion MR images is difficult: the fiber crossings cannot always be tracked reliably [40,41].
We have found relatively few frequent complete subgraphs of the human connectome with nodes from both hemispheres. Here we list those that are present in more than 80% of the braingraphs studied; therefore, they are most probably not false positives. Again, for clarity, we list only the maximal complete subgraphs. We note that most ROIs in the list are parts of the striatum: each complete subgraph contains either a caudate nucleus or a nucleus accumbens of the right or the left hemisphere:

Counts of the hippocampus, thalamus, putamen, pallidum and the amygdala in the frequent complete subgraphs
In this section we count the appearances of certain ROIs in the frequent complete subgraphs, with a frequency threshold of 80%. Our results show considerable differences between the hemispheres in these numbers: the right hippocampus and the right amygdala are present in many more complete subgraphs than the left ones, while the left thalamus-proper, the left putamen and the left pallidum are present in many more complete subgraphs than the right ones (Table 1).

Table 1: The number of appearances of the hippocampus, the amygdala, the thalamus-proper, the putamen and the pallidum in the frequent complete subgraphs, with a frequency threshold of 80%, in each hemisphere.
ROI                Left hemisphere   Right hemisphere
Hippocampus              187               247
Amygdala                  66                99
Thalamus-Proper          265               175
Putamen                 1041               673
Pallidum                 149               123
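Counts such as those in Table 1, and the test of whether a complete subgraph has nodes in both hemispheres, can be computed directly from a list of frequent complete subgraphs. The sketch below assumes a hypothetical labeling convention in which every ROI label starts with "Left-" or "Right-" (e.g., "Left-Putamen"); the actual labels of the 463-node parcellation may differ, and the function names are illustrative only.

```python
from collections import Counter


def split_label(label: str) -> tuple[str, str]:
    """Split a label such as "Left-Thalamus-Proper" into ("Left", "Thalamus-Proper").

    Assumes the hypothetical "Left-..." / "Right-..." naming convention.
    """
    hemi, sep, roi = label.partition("-")
    if not sep or hemi not in ("Left", "Right"):
        raise ValueError(f"unrecognized ROI label: {label!r}")
    return hemi, roi


def roi_appearances(subgraphs, roi_names):
    """Count, for each (ROI, hemisphere), the subgraphs that contain that ROI."""
    counts = Counter()
    for vertices in subgraphs:
        for label in set(vertices):
            hemi, roi = split_label(label)
            if roi in roi_names:
                counts[(roi, hemi)] += 1
    return counts


def crosses_hemispheres(vertices) -> bool:
    """True if the complete subgraph has nodes in both hemispheres."""
    return len({split_label(v)[0] for v in vertices}) == 2


frequent = [
    ["Left-Caudate", "Left-Pallidum", "Left-Putamen"],
    ["Left-Putamen", "Right-Caudate"],
    ["Right-Amygdala", "Right-Hippocampus"],
]
print(roi_appearances(frequent, {"Putamen", "Hippocampus"}))
print([crosses_hemispheres(s) for s in frequent])  # [False, True, False]
```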

Sex differences
Mapping sex differences in the human connectome is a hot and fast-developing area of research. In our earlier works, we have shown, first in the literature, that in numerous well-defined graph-theoretical parameters, women have "better connected" braingraphs than men [11,12,13]. In [21], we mapped the frequent subgraphs of the human brain with at most 6 vertices and found sex differences: there are numerous frequent connected subgraphs that are more frequent in men than in women and, similarly, numerous ones that are more frequent in women than in men. In [22], we mapped the neighbor sets of the human hippocampus and also found significant sex differences in these sets.
Here we compare the frequencies of the complete subgraphs in the connectomes of men and women. We have found significant differences in the frequencies of some complete subgraphs in both directions: some are more frequent in men, others in women.
We have found many more complete subgraphs with significantly higher frequency in men than in women. More precisely, Supporting Table S2 lists 224 complete subgraphs with significantly higher frequency in females than in males, while Table S3 lists 812 complete subgraphs whose frequencies in males were significantly higher than in females (at significance level p = 0.01, with an inclusion threshold of at least 80% for the larger of the two frequencies).
In a sense, this observation shows that men's connectomes have less inter-personal variability in complete subgraphs than those of women. It contrasts with our findings in [21], where we have shown that women have many more 6-vertex frequent subgraphs than men; however, in [21] we required connectedness, not completeness.
Table S1 contains all complete subgraphs with a frequency of at least 80%; Table S2 contains the complete subgraphs whose frequency of appearance is significantly higher (p = 0.01) in females than in males; Table S3 contains those whose frequency is significantly higher in males than in females. The supporting tables are available at http://uratim.com/cliques/tables.zip.

The Data Source and the Graph Computation
The data source of the present study is the website of the Human Connectome Project at http://www.humanconnectome.org/documentation/S500 [29]. The dataset contains the HARDI MRI data of healthy young adults between the ages of 22 and 35 years.
Further details of the graph processing workflow are described in [6], where the http://braingraph.org repository is also given. The braingraphs analyzed here can be accessed at https://braingraph.org/cms/download-pit-group-connectomes/ by choosing the "Full set, 413 brains, 1 million streamlines" option.

The Algorithm
In general, finding the size of the largest complete subgraph of a graph G, called the clique number and denoted by ω(G), is a hard problem: it is NP-hard [31]. Naturally, finding a largest complete subgraph itself cannot be easier than finding its size ω(G); therefore, it is also hard.
Finding the largest complete subgraphs in sparse graphs (i.e., graphs with relatively few edges compared to the number of their vertices) is usually not a very difficult task, since such graphs typically do not contain many large complete subgraphs. Restricting the search to the frequently appearing complete subgraphs further simplifies the computation: we can apply an algorithm that resembles the apriori algorithm for finding frequent item sets [34,33] in many respects, and which is very fast in practice.
Now we describe the algorithm. A frequent complete subgraph is characterized by the list of its vertices and the set of its edges. At the beginning, for an (undirected) edge $(v_i, v_j)$ with $i < j$, these two are given as $([v_i, v_j], \{(v_i, v_j)\})$. In general, the vertices of a complete subgraph are listed in increasing order of their indices, and the two endpoints of each edge are also listed in increasing order of their indices; otherwise, the order of the edges is irrelevant, since they form an unordered set and are stored as a set.
Next we describe the generating, "apriori" step. Let $L_1 = ([v_1, v_2, \ldots, v_k], E_1)$ and $L_2 = ([u_1, u_2, \ldots, u_k], E_2)$ be two frequent complete subgraphs on $k$ vertices. If the first $k-1$ vertices of $L_1$ and $L_2$ are the same and the last ones differ, we consider generating a new complete graph on $k+1$ vertices, as follows: if $v_1 = u_1, v_2 = u_2, \ldots, v_{k-1} = u_{k-1}$ and $v_k \neq u_k$, then, with the notation $v_{k+1} = u_k$, we verify that the frequency of the complete graph
$$L = \left([v_1, v_2, \ldots, v_k, v_{k+1}],\; E_1 \cup E_2 \cup \{(v_k, v_{k+1})\}\right)$$
is sufficiently high. It is easy to see that in this edge set only the last edge, $(v_k, v_{k+1})$, is new; all the others are already edges of the frequent subgraphs $L_1$ or $L_2$.
When generating L, one needs to make sure that the vertices in its vertex list are ordered by their indices, and that the frequency of L reaches the inclusion threshold.
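The generating step can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: vertices are assumed to be integer indices and each braingraph an edge list. As an implementation choice of this sketch, the frequency of a candidate L is obtained by intersecting the support sets (indices of the graphs containing a structure) of L1, L2 and the new edge (v_k, v_{k+1}); this intersection is exactly the set of graphs that contain all edges of L.

```python
from collections import defaultdict
from itertools import combinations


def frequent_complete_subgraphs(graphs, min_percent):
    """Apriori-like enumeration of complete subgraphs present in >= min_percent % of graphs.

    `graphs` is a list of edge lists over integer vertex indices.
    Returns {increasing vertex tuple: frequency in percent} for every size >= 2.
    """
    n = len(graphs)

    def is_frequent(support):  # |support| / n >= min_percent / 100, without rounding
        return 100 * len(support) >= min_percent * n

    # Support set of each edge: indices of the graphs that contain it.
    edge_support = defaultdict(set)
    for gi, edges in enumerate(graphs):
        for u, v in edges:
            if u != v:
                edge_support[(min(u, v), max(u, v))].add(gi)

    # Start with the frequent edges, i.e. the 2-vertex complete subgraphs.
    level = {e: s for e, s in edge_support.items() if is_frequent(s)}
    result = {}
    while level:
        result.update({c: 100.0 * len(s) / n for c, s in level.items()})
        # Group the k-vertex subgraphs by their first k - 1 vertices.
        by_prefix = defaultdict(list)
        for c in sorted(level):
            by_prefix[c[:-1]].append(c)
        next_level = {}
        for group in by_prefix.values():
            # The group is sorted, so l1[-1] < l2[-1] and the new vertex list stays ordered.
            for l1, l2 in combinations(group, 2):
                new_edge = (l1[-1], l2[-1])  # the only edge not already in L1 or L2
                support = level[l1] & level[l2] & edge_support.get(new_edge, set())
                if is_frequent(support):
                    next_level[l1 + (l2[-1],)] = support
        level = next_level
    return result


graphs = [
    [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)],  # a complete graph on {1, 2, 3, 4}
    [(1, 2), (1, 3), (2, 3), (3, 4)],
    [(1, 2), (1, 3), (2, 3)],
]
print(sorted(c for c in frequent_complete_subgraphs(graphs, 60) if len(c) >= 3))  # [(1, 2, 3)]
print(max(frequent_complete_subgraphs(graphs, 30), key=len))  # (1, 2, 3, 4)
```

Storing the support sets trades memory for speed: no candidate ever requires a second pass over the input graphs.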

Statistical Notes
The frequent complete subgraphs were chosen as follows. First, the subjects were partitioned into two disjoint sets by the parity of the second digit from the right of their ID numbers. Next, in both sets, the complete subgraphs with a minimum frequency of 80% were identified, as described in the previous section. Only those complete subgraphs were retained that had a minimum frequency of 80% in both sets. Then the frequencies of these subgraphs were recalculated for the whole dataset: these frequencies are given in the supplementary tables.
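The split-half filtering can be sketched as below. Subject IDs are assumed to be non-negative integers (the IDs in the example are made up), each braingraph a set of `frozenset` edges, and the candidates vertex tuples produced by any enumeration, such as the apriori-like one; all function names are hypothetical.

```python
from itertools import combinations


def second_digit_parity(subject_id: int) -> int:
    """Parity (0 = even, 1 = odd) of the second digit from the right of the ID."""
    return (subject_id // 10) % 10 % 2


def frequency(vertices, graphs):
    """Percentage of `graphs` (sets of frozenset edges) in which `vertices` form a complete subgraph."""
    pairs = [frozenset(p) for p in combinations(vertices, 2)]
    return 100.0 * sum(all(p in g for p in pairs) for g in graphs) / len(graphs)


def split_half_frequent(subjects, candidates, min_percent=80.0):
    """Keep the candidates that reach `min_percent` in both halves of the subjects.

    `subjects` maps subject_id -> edge set. Returns {vertex tuple: frequency on all subjects}.
    """
    halves = ([], [])
    for sid, graph in subjects.items():
        halves[second_digit_parity(sid)].append(graph)
    everyone = list(subjects.values())
    kept = {}
    for c in candidates:
        # An empty half (falsy list) can never confirm a candidate.
        if all(half and frequency(c, half) >= min_percent for half in halves):
            kept[tuple(c)] = frequency(c, everyone)  # recomputed on the whole dataset
    return kept


def edge_set(*edges):
    return {frozenset(e) for e in edges}


triangle = edge_set((1, 2), (2, 3), (1, 3))
subjects = {1001: triangle, 1002: triangle, 1011: edge_set((1, 2)), 1012: triangle}
print(split_half_frequent(subjects, [(1, 2, 3), (1, 2)], min_percent=50))
# {(1, 2, 3): 75.0, (1, 2): 100.0}
print(split_half_frequent(subjects, [(1, 2, 3), (1, 2)], min_percent=80))
# {(1, 2): 100.0}
```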
To identify sex differences, we applied χ² tests to the frequencies of the complete subgraphs. Our null hypothesis was that the frequencies are the same in males and females, and we rejected this hypothesis at p = 0.01. Errors due to multiple comparisons were controlled by the Holm-Bonferroni correction [43]. The uncorrected and the corrected p values are listed in supplementary Tables S2 and S3.
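A minimal sketch of the testing step follows. It assumes a Pearson χ² test without continuity correction on the 2×2 table (sex × subgraph present/absent) and assumes that the 80% cut-off on the larger frequency is applied before testing; the text does not specify either detail, and the function names are hypothetical. For one degree of freedom, the χ² tail probability equals erfc(√(χ²/2)), so only the standard library is needed. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so its p values would differ slightly.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) on the table [[a, b], [c, d]].

    Rows: males, females; columns: subgraph present, absent. Returns (statistic, p value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty row or column carries no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The rank-th smallest p value is multiplied by (m - rank); enforce monotonicity.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


def sex_differences(hits, n_male, n_female, alpha=0.01, min_percent=80.0):
    """Split significant subgraphs into (more frequent in males, more frequent in females).

    `hits` maps a subgraph to (number of males, number of females) containing it.
    Assumption: the cut-off on the larger frequency is applied before testing.
    """
    tested = [s for s, (hm, hf) in hits.items()
              if max(100 * hm / n_male, 100 * hf / n_female) >= min_percent]
    p_raw = [chi_square_2x2(hits[s][0], n_male - hits[s][0],
                            hits[s][1], n_female - hits[s][1])[1] for s in tested]
    more_in_males, more_in_females = [], []
    for s, p_adj in zip(tested, holm_adjust(p_raw)):
        if p_adj <= alpha:
            hm, hf = hits[s]
            target = more_in_males if hm / n_male > hf / n_female else more_in_females
            target.append(s)
    return more_in_males, more_in_females


hits = {"A": (95, 60), "B": (50, 90), "C": (85, 84), "D": (40, 45)}
print(sex_differences(hits, n_male=100, n_female=100))  # (['A'], ['B'])
```

In the toy data, "D" is never tested because neither sex reaches 80%, and "C" is frequent in both sexes but its frequencies do not differ significantly.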

Conclusions
Using an apriori-like algorithm, we have mapped the frequent (at least 80%) complete subgraphs of 414 subjects, each with 463 vertices. The largest frequent complete subgraphs have 7 vertices. Most of the largest frequent subgraphs are located in the left hemisphere. We have also identified the frequent complete subgraphs containing vertices from both hemispheres, as well as complete subgraphs with significant frequency differences between the sexes. We have found many more complete subgraphs that are significantly more frequent in men than in women. At first sight, this result contrasts with our earlier finding [11] that women have much better connectivity-related parameters in their connectomes than men; the two fit together in the following sense: while women have better connected braingraphs than men (as precisely described in [11]), the dense subgraphs of men show less inter-individual variability than those of women.
The braingraphs computed by us can be accessed at https://braingraph.org/cms/download-pit-group-connectomes/ by choosing the "Full set, 413 brains, 1 million streamlines" option. Here we have used exclusively the 463-node graphs.
The Supplementary Tables are available online at http://uratim.com/cliques/tables.zip. Table S1 contains the list of all complete subgraphs of the 414 human connectomes with a minimum frequency of 80%. Table S2 contains the complete subgraphs whose frequency of appearance is significantly higher (p = 0.01) in females than in males; Table S3 contains those whose frequency is significantly higher in males than in females. In both Tables S2 and S3, a frequency cut-off of 80% is applied to the larger of the two frequencies: only those significant differences are listed where the larger of the male and female frequencies is at least 80%.