Abstract
This study investigates the potential of graph neural networks (GNNs) for estimating system-level integrated information and major complex in integrated information theory (IIT) 3.0. Owing to the hierarchical complexity of IIT 3.0, calculating the integrated information and identifying the major complex are computationally prohibitive for large systems. To overcome this difficulty, we propose a GNN model with transformer convolutions characterized by multi-head attention mechanisms for estimating the major complex and its integrated information. For evaluation, we begin by obtaining exact solutions for integrated information and major complexes in systems with 5, 6, and 7 nodes, and conduct two experiments: (1) a non-extrapolative setting in which the model is trained and tested on a mixture of systems with 5, 6, and 7 nodes, and (2) an extrapolative setting in which systems with 5 and 6 nodes are used for training and systems with 7 nodes are used for testing. We then examine the scaling behavior for tree-like, fully connected, and loop-containing graph topologies in larger systems. Although accurate estimation is difficult, our approximate estimates for larger systems generally preserve the qualitative patterns of integrated information and major complex size that are observed in small systems. Finally, based on this observation, we qualitatively analyze a split-brain–like system of 100 nodes. The system consists of two weakly coupled subsystems of 50 nodes each, representing a structurally meaningful, brain-inspired configuration. When the connectivity between the subsystems is low, “local integration” emerges, and a single subsystem forms a major complex. As the connectivity increases, local integration rapidly disappears, and the integrated information gradually rises toward “global integration,” in which a large portion of the entire system forms a major complex. Our analysis suggests that the proposed GNN-based framework provides a practical approach to qualitative analysis of integrated information and major complexes in large systems.
Citation: Hosaka T (2025) Graph neural networks for integrated information and major complex estimation. PLoS One 20(11): e0335966. https://doi.org/10.1371/journal.pone.0335966
Editor: Ping Xiang, Central South University, CHINA
Received: January 13, 2025; Accepted: October 17, 2025; Published: November 7, 2025
Copyright: © 2025 Tadaaki Hosaka. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All code files are available from GitHub at https://github.com/hosaka-t/Graph-neural-network-for-IIT, and the dataset used in this study can be accessed from Zenodo at https://zenodo.org/records/14551717.
Funding: TH was supported by JSPS (Japan society for the promotion of science) KAKENHI Grant Number 23K11790. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. There was no additional external funding received for this study.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Integrated information theory (IIT), proposed by Giulio Tononi, has witnessed significant development over the last two decades, evolving through four major versions [1–7] and becoming a central framework for understanding consciousness. IIT asserts that consciousness emerging in a system is equivalent to the integrated information quantified by Φ, which measures the information in the whole exceeding the sum of the information in its parts. This core principle is a fundamental concept shared across all versions of IIT.
IIT 1.0 [1] introduced the foundational notion that consciousness arises from the integrated information within a system, emphasizing the difference between the whole and its parts. In IIT 2.0 [2–4], this idea was rigorously formalized through the introduction of the “minimum information partition,” defined as the division that yields the smallest decrease in information integration relative to the non-partitioned system. However, IIT 2.0 treated the entire system as a single entity when calculating the integrated information, without considering that consciousness might be composed of integrated subsystems or exploring any hierarchical structure within the system. By contrast, IIT 3.0 [5,6] introduced a hierarchical approach that analyzes all possible subsystems to elaborate the nature of consciousness more precisely (S1 Text). For every group of nodes within each subsystem, the mechanism-level integrated information, denoted as φ (small phi), is measured according to the principle of the minimum information partition. From the collection of these φ values and their associated probability distributions, the system-level integrated information for each subsystem is then computed as Φ. Finally, the subsystem with the highest Φ across the entire system is identified as the “major complex,” which represents the location of consciousness within the system. The latest version, IIT 4.0 [7], has advanced the theory by introducing the concept of “relations” between node groups to refine how the value of Φ is derived from the collection of φ values.
This study is based on IIT 3.0. For a system characterized by nodes, edges, node states, and state transition probabilities, the primary objective of IIT 3.0 is to identify the “major complex,” defined as the subset of nodes with the highest integrated information Φ. The most notable advancement from IIT 2.0 to IIT 3.0 is the introduction of the aforementioned hierarchical structure to compute the system-level integrated information, which significantly increases the computational complexity owing to more intricate hierarchical procedures. Deriving the integrated information Φ and identifying the major complex pose substantial computational challenges, not only because finding the minimum information partition requires evaluating all possible partitions of a target system but also because IIT 3.0 involves multiple nested combinatorial optimization problems, as described in S1 Text. Consequently, the exponential growth in computational complexity with the number of nodes restricts rigorous IIT 3.0 calculations to extremely small systems, typically those comprising fewer than 10 nodes, thereby limiting the applicability of IIT 3.0 to large-scale systems.
Several theoretical and numerical studies have explored the specific characteristics and limitations of IIT 3.0 [8–11], but practical approximation methods have not been proposed thus far. Although efficient approximation methods for identifying optimal partitioning patterns have been proposed in the context of IIT 2.0 [12,13], they cannot be applied to IIT 3.0 due to its complex hierarchical structure. An attempt within the IIT 3.0 framework is the “cut-one” approximation [14], which restricts the search space for the system-level partition to those that isolate a single node. However, this method simplifies only one of the multiple combinatorial optimizations required by IIT 3.0, and the overall computational complexity remains exponential with respect to the number of nodes. To date, no practical approximation method capable of fully bypassing the computational difficulties of IIT 3.0 has been proposed.
To extend the applicability of IIT to larger systems, it is crucial to address its computational challenges. Graph neural networks (GNNs) offer a promising solution to this problem. GNNs are a type of deep learning model designed specifically for graph-structured data, where nodes represent entities and edges represent their relationships. GNNs have been effectively employed in various biological applications, including drug discovery [15], predicting protein interfaces [16,17], modeling neural connectivity patterns in the brain [18], identifying relationships between diseases and genes [19], and classifying lung cancer subtypes from whole-slide images [20]. They have also been applied in other fields such as predicting structural dynamic responses in civil engineering [21], state estimation in power systems [22], weather forecasting [23], recommending personalized items based on user behaviors [24], and analyzing train-bridge coupled systems [25,26]. Although GNNs face challenges in terms of interpretability, they often outperform traditional machine learning methods by accurately extracting complex patterns from data. Unlike traditional feature-based machine learning models such as support vector machines or multilayer perceptrons, GNNs can directly leverage the topological structure of the input graph. This capability is essential for tasks that depend on the graph topology and inter-node relationships. Such a property could make GNNs particularly suitable for estimating integrated information and major complex in IIT 3.0, without explicitly modeling the intricate computational processes.
Isomorphism invariance means that the output does not depend on how node labels are assigned but only on the underlying graph structure. This property is essential for GNNs to estimate integrated information Φ and the major complex, since both are determined solely by the structural and probabilistic properties of the system. Most existing GNNs operate as message-passing algorithms [27], where each node iteratively aggregates information from its neighbors to update its feature representation. This process is often referred to as graph convolution, and its expressive power is theoretically equivalent to that of the first-order Weisfeiler–Lehman (1-WL) graph isomorphism test [28,29]. Consequently, graph convolutional networks may fail to distinguish between non-isomorphic graphs.
To overcome the expressiveness limitations of 1-WL-based models, higher-order extensions such as k-WL tests and their corresponding k-GNN architectures have been proposed [30]. These models consider combinations of k nodes rather than individual nodes. With a sufficiently large value of k and appropriate parameters, they can theoretically approximate any continuous graph function [31]. However, their computational and memory demands grow rapidly with the number of nodes, making them impractical for large-scale systems. As a more practical alternative, efforts have been made to enhance the performance of 1-WL-equivalent models. For example, the Graph Isomorphism Network [29] improves expressiveness through careful aggregation design, while transformer-based GNNs [32] incorporate attention mechanisms inspired by advances in natural language processing [33]. These approaches aim to maximize the capability of 1-WL models while maintaining computational feasibility, aligning well with the goals of our study. In particular, we adopt transformer convolution in our framework to leverage its expressive power and scalability.
Our GNN-based framework does not attempt to model explicitly the complex nested optimization procedures of IIT 3.0. Instead, it adopts a data-driven approach that aims to learn the mapping from system structure to the values of Φ and the major complex. This means that the model is designed to imitate only the input–output relationship derived from IIT calculations. We consider this approach a meaningful and practical strategy for extending IIT to large systems where exact computation is infeasible.
This study investigates the potential of GNNs as an approximation method for estimating the major complex and its maximum integrated information Φ. We first evaluate the performance of GNNs on small-scale systems where exact theoretical calculations are feasible. After confirming the persistence of qualitative trends in Φ and major complex size from small to larger systems, we qualitatively investigate large systems that resemble a split-brain scenario [34,35]. This allows us to assess the applicability of our approach to structurally meaningful, brain-inspired configurations. Through these experiments, we show that the proposed framework can capture qualitative patterns of integrated information and major complex formation in large systems. Hereafter, the term “integrated information” and the variable Φ will refer to the maximum value of the system-level integrated information across all subsystems, i.e., the integrated information of the major complex, unless otherwise noted.
While the present study adopts the framework of IIT 3.0, this choice does not imply a limitation of scope. IIT 4.0 can be viewed as an incremental refinement that builds upon the foundation of IIT 3.0, and both versions share the same objective: computing the system-level integrated information Φ and identifying the major complex. Our proposed GNN-based framework predicts these quantities directly from graph features, without relying on intermediate constructs specific to any particular version of the theory. In this sense, our method is compatible with IIT 4.0.
Methods
In this study, we estimate the integrated information Φ and the major complex for randomly generated systems consisting of N = 5, 6, or 7 nodes. This limitation on the number of nodes is necessary because obtaining the exact solutions for IIT 3.0 becomes computationally infeasible for larger systems. For each value of N, numerous random systems are generated and used to train the GNN model.
Each node in the system can be in one of two states, +1 or −1, denoted as S_i (i = 1, …, N), and the states are randomly assigned. The undirected edges between any pair of nodes are assigned with a probability of p = 0.4, and the edge weights J_ij between nodes i and j are randomly determined following a standard normal distribution. The conditional probability P(S_i′ | S) that node i will take on state S_i′ in the next time step is determined on the basis of the current states S = (S_1, …, S_N) using the Boltzmann distribution:

$$P(S_i' \mid \mathbf{S}) = \frac{\exp(S_i' h_i / T)}{\exp(h_i / T) + \exp(-h_i / T)}, \qquad h_i = \sum_{j \in \partial i} J_{ij} S_j, \tag{1}$$

where h_i represents the input to node i, and ∂i denotes the set of nodes connected with node i. The parameter T, controlling the sharpness of the distribution, is uniformly sampled from the range [0.1, 3.0] and is set independently for each generated system. This stochastic model captures the dynamics of the system, allowing probabilistic determination of node states based on neighborhood interactions.
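For illustration, the following is a minimal sketch of how such a random system and its single-node transition probabilities could be generated; the function and variable names (e.g., generate_system, transition_prob) are illustrative and are not taken from the released code.

```python
import numpy as np

def generate_system(n_nodes, p_edge=0.4, rng=np.random.default_rng(0)):
    """Randomly generate a system: node states, symmetric edge weights, and temperature."""
    states = rng.choice([-1, 1], size=n_nodes)            # random node states S_i
    adj = np.triu(rng.random((n_nodes, n_nodes)) < p_edge, k=1)  # undirected edges, p = 0.4
    weights = np.zeros((n_nodes, n_nodes))
    weights[adj] = rng.standard_normal(adj.sum())         # J_ij drawn from N(0, 1)
    weights = weights + weights.T                          # symmetric, no self-loops
    temperature = rng.uniform(0.1, 3.0)                    # T sampled once per system
    return states, weights, temperature

def transition_prob(i, new_state, states, weights, temperature):
    """P(S_i' = new_state | S) under the Boltzmann rule of Eq 1 (as reconstructed above)."""
    h_i = weights[i] @ states                              # input to node i
    return np.exp(new_state * h_i / temperature) / (
        np.exp(h_i / temperature) + np.exp(-h_i / temperature))

states, weights, T = generate_system(5)
print(transition_prob(0, +1, states, weights, T))
```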
Using the PyPhi library published by Mayner et al. [14], the exact solutions for the integrated information Φ and the corresponding major complex are obtained for each of the randomly generated systems. These systems and their solutions serve as the dataset for subsequent analysis within a supervised learning framework, where a GNN is trained on part of the dataset and tested on the remaining data.
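Continuing the sketch above, obtaining the exact targets with PyPhi looks roughly as follows. This assumes a PyPhi 1.x-style interface (pyphi.Network, pyphi.compute.major_complex) and PyPhi's little-endian state-by-node TPM convention; these assumptions should be checked against the installed PyPhi version.

```python
import numpy as np
import pyphi

def build_tpm(weights, temperature):
    """State-by-node TPM from Eq 1; assumes little-endian state ordering (node 0 fastest)."""
    n = weights.shape[0]
    tpm = np.zeros((2 ** n, n))
    for idx in range(2 ** n):
        bits = [(idx >> k) & 1 for k in range(n)]      # decode current state
        s = np.array(bits) * 2 - 1                     # map {0, 1} to {-1, +1}
        h = weights @ s                                # inputs to all nodes
        tpm[idx] = np.exp(h / temperature) / (np.exp(h / temperature)
                                              + np.exp(-h / temperature))
    return tpm

tpm = build_tpm(weights, T)                            # weights, T, states from the sketch above
cm = (weights != 0).astype(int)                        # binary connectivity matrix
network = pyphi.Network(tpm, cm=cm)
state = tuple(int((s + 1) // 2) for s in states)       # current state in {0, 1}
sia = pyphi.compute.major_complex(network, state)      # exhaustive IIT 3.0 search
print(sia.phi, sia.subsystem.node_indices)             # Φ and the nodes of the major complex
```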
Each system is treated as a graph in which nodes and edges have specific features. Node i is assigned a feature vector consisting of the following six dimensions:
- Maximum transition probability: The larger of the probability that node i will be in state +1 in the next time step and the probability that it will be in state −1. Note that the current node states S affect only this feature and have no influence on the other features.
- Parameter T: A parameter that controls the sharpness of the probability distribution. In each graph, all nodes share the same value of T.
- Node degree: The number of edges connected to node i, indicating its connectivity in the system.
- Closeness centrality: A measure of how close node i is to all other nodes in the system, calculated as the inverse of the average of the shortest path distances from node i to all other nodes. It indicates how rapidly information spreads from node i to others.
- Betweenness centrality: A measure of how frequently node i lies on the shortest paths between pairs of other nodes. It indicates the importance of node i in the network communication.
- Clustering coefficient: A measure of how closely the neighbors of node i are connected to each other, indicating the density of local connections around node i. It is calculated as the ratio of actual connections to possible connections among the neighbors of node i.
Meanwhile, the edge features simply consist of a single dimension representing the edge weights, Jij, between the nodes.
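The centrality-based features can be obtained with standard graph routines. The sketch below uses NetworkX and reuses transition_prob from the earlier sketch; the function name node_features is illustrative.

```python
import networkx as nx
import numpy as np

def node_features(states, weights, temperature):
    """Six-dimensional feature vector for every node, as described above."""
    g = nx.from_numpy_array((weights != 0).astype(int))
    closeness = nx.closeness_centrality(g)
    betweenness = nx.betweenness_centrality(g)
    clustering = nx.clustering(g)
    feats = []
    for i in g.nodes:
        p_plus = transition_prob(i, +1, states, weights, temperature)
        feats.append([
            max(p_plus, 1.0 - p_plus),   # larger of P(+1) and P(-1)
            temperature,                 # sharpness parameter T (shared by all nodes)
            g.degree[i],                 # node degree
            closeness[i],                # closeness centrality
            betweenness[i],              # betweenness centrality
            clustering[i],               # clustering coefficient
        ])
    return np.array(feats)

# Edge features are simply the weights J_ij of the existing edges.
edge_index = np.array(np.nonzero(weights))
edge_attr = weights[edge_index[0], edge_index[1]].reshape(-1, 1)
```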
It should be noted that the node state Si is not utilized as a feature. The random systems in our study exhibit invariance under inversion of the state vector. Specifically, the conditional transition probabilities are symmetric, namely P(S′ | S) = P(−S′ | −S), where S′ denotes a specific state to which the system transitions from the current state S. These conditional probabilities are derived as the product of Eq 1 over all the nodes, assuming conditional independence. According to IIT 3.0, for such symmetric systems, both the state vector S and its inverse −S should yield the same integrated information and major complex. However, if the node state Si were included as a node feature in the GNN, the use of ReLU activations and bias terms could violate this invariance, leading to different outputs for S and −S despite their theoretical equivalence. We have designed the six aforementioned features to be identical for both node states S and −S, ensuring that the GNN produces theoretically consistent predictions.
Fig 1 shows our GNN architecture, which is designed to perform multi-task estimation, i.e., predicting the integrated information Φ and identifying the major complex for a target system. The input consists of the six-dimensional feature vector for every node, with each dimension normalized using the entire training dataset. These inputs pass through four consecutive transformer convolutional layers. In each layer, attention weights are dynamically adjusted on the basis of the relationships between nodes [32], following the attention mechanism that has proved highly successful in natural language processing [33]. Batch normalization and dropout follow each of the first three transformer layers to stabilize training and prevent overfitting.
The input node features are processed through four consecutive transformer convolutional layers with multi-head attention. After the fourth layer, the network splits into two branches to realize multi-task learning. One branch performs graph-level pooling followed by a fully connected layer to estimate the system-level Φ, while the other branch compares pooled and node-level features to classify whether each node belongs to the major complex.
Our model employs the multi-head attention mechanism, where each head is expected to independently capture different aspects of the relationships between nodes. Specifically, the d-dimensional feature vector of each node, x_i, is transformed into three components, namely query (q_i), key (k_i), and value (v_i), through learnable weight matrices W_Q, W_K, and W_V. The transformations for each head h = 1, …, H are given by

$$\mathbf{q}_i^{(h)} = \mathbf{W}_Q^{(h)} \mathbf{x}_i, \qquad \mathbf{k}_i^{(h)} = \mathbf{W}_K^{(h)} \mathbf{x}_i, \qquad \mathbf{v}_i^{(h)} = \mathbf{W}_V^{(h)} \mathbf{x}_i, \tag{2}$$

where W_Q^(h), W_K^(h), and W_V^(h) are in ℝ^{C×d} and C represents the number of output channels. The attention weight α_ij^(h) toward node i from its neighbor j is computed using the dot product of the query from node i and the key from node j, with the addition of a term based on the edge feature e_ij (one dimension in our study) and its learnable transformation matrix W_E^(h):

$$\alpha_{ij}^{(h)} = \mathrm{softmax}_j\!\left(\frac{\mathbf{q}_i^{(h)\top}\bigl(\mathbf{k}_j^{(h)} + \mathbf{W}_E^{(h)} e_{ij}\bigr)}{\sqrt{d_k}}\right). \tag{3}$$

Here, α_ij^(h) represents the importance of the feature of node j for updating node i, and d_k is a scaling parameter typically set to C/H. The softmax operation ensures that the incoming attention weights to each node are normalized so that their sum is 1. The updated feature for node i in each head is then computed as a weighted sum of the value vectors from its neighboring nodes:

$$\mathbf{x}_i'^{(h)} = \sum_{j \in \partial i} \alpha_{ij}^{(h)} \bigl(\mathbf{v}_j^{(h)} + \mathbf{W}_E^{(h)} e_{ij}\bigr), \tag{4}$$

where matrix W_E^(h) is in ℝ^{C×1}. The outputs from the multiple heads are concatenated to form the final updated feature for node i:

$$\mathbf{x}_i' = \bigl[\mathbf{x}_i'^{(1)} \,\Vert\, \mathbf{x}_i'^{(2)} \,\Vert\, \cdots \,\Vert\, \mathbf{x}_i'^{(H)}\bigr],$$

which results in a feature vector of dimension CH. This multi-head attention mechanism enables our GNN to attend to different aspects of the input graph simultaneously.
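In PyTorch Geometric, this operation corresponds to the TransformerConv layer. A minimal sketch of the convolutional backbone described above is given below; the channel sizes and dropout rate are illustrative values, not necessarily those used in the released code.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import TransformerConv

class Backbone(nn.Module):
    """Four TransformerConv layers with multi-head attention and one-dimensional edge features."""
    def __init__(self, in_dim=6, channels=32, heads=4, edge_dim=1, dropout=0.2):
        super().__init__()
        self.convs = nn.ModuleList()
        self.norms = nn.ModuleList()
        dims = [in_dim] + [channels * heads] * 3          # input dims of the four layers
        for d in dims:
            self.convs.append(TransformerConv(d, channels, heads=heads, edge_dim=edge_dim))
        for _ in range(3):                                 # batch norm after the first three layers
            self.norms.append(nn.BatchNorm1d(channels * heads))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, edge_index, edge_attr):
        for i, conv in enumerate(self.convs):
            x = conv(x, edge_index, edge_attr)
            if i < 3:
                x = self.dropout(self.norms[i](torch.relu(x)))
        return x                                           # shape: [num_nodes, channels * heads]
```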
Following the fourth transformer layer, the network splits into two separate branches. In the first branch, the graph-level features are obtained using global max pooling, which selects, for each feature dimension, the maximum value across all nodes in the graph. Then, this pooled feature vector is passed through a dropout layer, followed by a fully connected layer that outputs the estimated integrated information Φ for the entire system.
In the second branch, the global max-pooled vector is subtracted from the feature vector of each node. This operation highlights the relative importance of each node compared to the most prominent features across the graph. Subsequently, the resulting differences are passed through a fully connected layer with a softmax activation function to perform a binary classification, determining whether each node is included in the major complex.
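The two branches can be sketched as follows, assuming the Backbone module above; the subtraction of the max-pooled vector in Branch 2 is the step whose importance is discussed next.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import global_max_pool

class TwoBranchHead(nn.Module):
    """Branch 1: graph-level Φ regression. Branch 2: node-wise major-complex classification."""
    def __init__(self, dim=128, dropout=0.2):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.phi_head = nn.Linear(dim, 1)          # Branch 1: estimated Φ for the whole system
        self.node_head = nn.Linear(dim, 2)         # Branch 2: in / out of the major complex

    def forward(self, x, batch):
        pooled = global_max_pool(x, batch)         # graph-level feature via global max pooling
        phi = self.phi_head(self.dropout(pooled)).squeeze(-1)
        diff = x - pooled[batch]                   # node feature minus its graph's pooled vector
        node_logits = self.node_head(diff)         # softmax applied in the loss function
        return phi, node_logits
```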
The subtraction of the global max-pooled vector from the feature vector of each node is crucial for improving the classification performance. Using only individual node features as inputs to the fully connected layer in Branch 2 can cause significant errors when the target system consists of multiple disconnected subsystems. As illustrated in Fig 2(a), when system A or system B alone constitutes the whole system, we assume that the GNN accurately predicts the integrated information and identifies the major complex for each subsystem. Even under this assumption, the prediction fails when subsystems A and B are evaluated together as a single disconnected larger system (Fig 2(b), 2(c)). In this larger system, the graph convolution results for subsystems A and B remain identical to those obtained when the subsystems are evaluated individually. As a result, the GNN misclassifies some nodes from the subsystem with lower integrated information (B in this case) as belonging to the major complex. This misclassification occurs because using only the individual node features without considering the global structure fails to differentiate between the subsystems. Subtracting the global max-pooled vector from each node feature allows the model to capture relative importance. Consequently, nodes from the subsystem with the highest integrated information (A in this case) are more likely to be correctly identified as part of the major complex.
(a) When each system is evaluated individually, the integrated information and the major complex can be correctly estimated using node features alone as input to the fully connected layer, without subtracting the max-pooled vector. Black circles represent nodes included in the major complex, while white circles represent nodes not included. (b) Since no edge exists between the two subsystems, the results of the transformer convolutions are identical to those obtained when each subsystem is evaluated independently. However, this leads to an incorrect estimation. (c) In the ground truth, the nodes in the subsystem with the smaller Φ should not be included in the major complex.
Results
In this section, we present the results of our simulations. We employed the PyTorch Geometric library [36] to implement the GNNs in the following experiments. Computations were mainly performed on a workstation equipped with an Intel Core i9 2.00 GHz processor, 128 GB of RAM, and an NVIDIA RTX A4500 GPU (20 GB memory). To evaluate the performance of our GNN model in estimating integrated information Φ and classifying the inclusion of the nodes in the major complex, we conducted three types of experiments.
The first experiment used a dataset of 3000 graphs in total, consisting of 1000 random systems generated as described in the previous section for each case of N = 5, 6, and 7. The entire dataset was randomly shuffled and then split into two subsets: 90% of the total data was used in the training process, while the remaining 10% was reserved for testing. Then, the performance of the model was evaluated on the test dataset by measuring the accuracy of both the integrated information estimation and the classification of whether each node belongs to the major complex.
The second experiment aimed to assess the ability of the model to generalize to unseen system sizes slightly larger than those used in training, which was regarded as an extrapolative setting. We trained the model using a dataset consisting of 1500 graphs for each case of N = 5 and 6, i.e., a total of 3000 random systems. The test dataset consisted of 1000 random systems for N = 7. The performance of the model was evaluated on this test dataset with the same metrics as in the first experiment.
Finally, using the GNN models trained in the first experiment, we extended the evaluation to larger systems. This included an examination of scaling behavior across representative graph topologies and a qualitative test on a structured large-scale example. In the latter, we designed a test system inspired by the split-brain scenario [34,35]. The system comprises two subsystems, each of which contains 50 nodes. By varying the probability of an edge connecting nodes in different subsystems, we investigated changes in the estimated integrated information and major complex, highlighting the transition from local to global integration.
For the three types of experiments, we applied several common settings and parameters that were empirically determined. We defined two class labels: label in, representing nodes “included in the major complex,” and label out, representing nodes “not included in the major complex.” To address the imbalance between the numbers of these classes, meaning that label in was more frequent in our dataset, we applied a penalty factor of 1.8 to label out in the cross-entropy loss calculation. The total loss function for our multi-task model consisted of the sum of the mean squared error for the integrated information estimation and five times the cross-entropy loss for the major complex classification. This weighting factor of five was empirically introduced to balance the two tasks. The optimizer was set to LION (evolved sign momentum) [37]. This optimizer updates parameters based only on the gradient sign, whereas traditional optimizers such as SGD and Adam [38] also use the gradient magnitude. The LION optimizer provides a simple yet effective optimization strategy, focusing on efficiency and memory-saving features suitable for training deep network models. The learning rate in the algorithm was set to 0.0001 and the mini-batch size was set to 128.
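Under these settings, the combined multi-task loss can be written as in the sketch below. The class weight of 1.8 on label out and the factor of five on the classification term follow the description above; the LION implementation is assumed to come from an external package such as lion-pytorch, which the paper does not name.

```python
import torch
import torch.nn.functional as F

# Class weights: index 0 = "in", index 1 = "out" (penalty factor 1.8 on "out").
class_weights = torch.tensor([1.0, 1.8])

def multitask_loss(phi_pred, phi_true, node_logits, node_labels):
    """Sum of MSE for Φ and five times the weighted cross-entropy for major-complex membership."""
    mse = F.mse_loss(phi_pred, phi_true)
    ce = F.cross_entropy(node_logits, node_labels, weight=class_weights)
    return mse + 5.0 * ce

# Optimizer: LION with learning rate 1e-4 (package name assumed, not specified in the paper).
# from lion_pytorch import Lion
# optimizer = Lion(model.parameters(), lr=1e-4)
```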
Furthermore, to address the imbalance between label in and label out in the training dataset, we adopted a data augmentation strategy to artificially increase the number of label out instances. Specifically, we generated additional systems amounting to 5% of the total data used in the training process. To do this, we randomly selected two graphs from the training dataset and treated them as disconnected subsystems within a larger system. Assuming that the integrated information values of the two selected graphs are Φ_A and Φ_B with Φ_A > Φ_B, as in Fig 2(a), the whole of this artificially created system would have an integrated information value of Φ_A. The true label for major complex inclusion would be retained for nodes from graph A, while all nodes from graph B would be assigned the label out, as in Fig 2(c), thereby increasing the proportion of out labels in the dataset.
As the distribution of Φ values within the training dataset was not uniform, an oversampling technique was adopted to improve the estimation accuracy in the low-frequency regions. We used k-means clustering to divide the data into seven bins based on the value of Φ. Then, we oversampled the smaller bins to match the size of the largest bin, adding noise by applying a multiplicative factor of 5% independently to both the integrated information values Φ and each dimension of the node features.
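A sketch of this Φ-based oversampling step (k-means binning, then duplication of samples in the smaller bins with multiplicative noise) is given below; the variable names are illustrative, and the symmetric ±5% jitter is one plausible reading of the "multiplicative factor of 5%."

```python
import numpy as np
from sklearn.cluster import KMeans

def oversample_indices(phi_values, n_bins=7, rng=np.random.default_rng(0)):
    """Return sample indices (with repetition) so that every Φ bin matches the largest bin."""
    labels = KMeans(n_clusters=n_bins, n_init=10).fit_predict(
        np.asarray(phi_values).reshape(-1, 1))
    counts = np.bincount(labels, minlength=n_bins)
    indices = []
    for b in range(n_bins):
        members = np.where(labels == b)[0]
        extra = rng.choice(members, size=counts.max() - counts[b], replace=True)
        indices.extend(members)
        indices.extend(extra)
    return np.array(indices)

def jitter(value, rng, scale=0.05):
    """Multiplicative noise applied to Φ and to each node-feature dimension of duplicated samples."""
    return value * (1.0 + scale * rng.uniform(-1.0, 1.0, size=np.shape(value)))
```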
Following the application of data augmentation and oversampling techniques, 10% of the expanded training dataset was set aside as validation data to monitor the performance of the model during training. The training process was halted if the validation loss did not improve for 50 consecutive epochs, following the early stopping strategy.
Estimation in non-extrapolative setting
The results of the first experiment are summarized in Table 1. The table includes four main metrics evaluated on the test data: the mean squared error (MSE) and correlation coefficient between the estimated and actual values of the integrated information Φ, the accuracy of the node-wise prediction for inclusion in the major complex, and the graph-wise accuracy, i.e., the proportion of graphs where the major complex is predicted completely correctly. The values shown in the table are averages obtained over 100 repetitions of the experiment, where different combinations of training and testing datasets were used. The table presents the performance achieved with the proposed method, and for comparison, it also shows the following:
- Performance when removing each dimension of the node features.
- Performance when replacing the transformer convolutional layer with more standard convolutional types, specifically the graph convolutional layer (GraphConv class provided in the PyTorch Geometric library) and graph attention network [39] (GAT class in the PyTorch Geometric library). GraphConv simply aggregates features from neighboring nodes using a weighted sum without attention mechanisms. GAT employs simpler attention mechanisms than the transformer convolution.
- Performance when replacing LION with other optimizers, specifically Adam [38], which is widely regarded as the de facto standard, and its derivatives RAdam [40] and AdamW [41].
- Performance for single-task scenarios, where either integrated information Φ or major complex is estimated independently. This corresponds to evaluations on networks with either Branch 1 or Branch 2 removed (see Fig 1).
- Performance under two pooling-related settings:
  - using global average pooling instead of global max pooling in Branches 1 and 2,
  - using node features as inputs to the fully connected layer in Branch 2 without subtracting the max-pooled vectors.
The performance of the proposed method in estimating integrated information shows a relatively strong correlation coefficient of 0.7446 between the estimated and true values, implying a reasonable level of predictive capability. Fig 3 shows a scatter plot comparing the estimated and actual values of Φ in one trial. Most systems had the values of Φ below 1, where the model demonstrated higher accuracy. By contrast, systems with Φ values around 1 or higher were less frequent and tended to be predicted with lower accuracy. This uneven distribution with respect to the values of Φ led to greater variability in the higher-Φ regions. While oversampling could increase the apparent number of samples, it did not sufficiently reproduce the diverse characteristics of graph structures and node features in regions with few samples, which might limit its effectiveness in improving prediction accuracy. This result suggests that having a more balanced set of real samples could help improve the model’s robustness across the entire range of Φ values.
All 300 test data used in one of the 100 trials conducted with the proposed method are shown. In this case, the correlation coefficient was 0.7464, which was close to the average over all the trials. Accurately estimated instances are located on the dotted line with a slope of 1.
When predicting node inclusion in the major complex, the proposed method achieves a node-wise accuracy of 0.8574, indicating reliable performance in this binary classification task. The graph-wise accuracy measures the proportion of graphs in which the major complex is completely identified, and it is 0.5779 in this experiment. Interestingly, this is higher than the expected value obtained by simply raising the node-wise accuracy to the power of the number of nodes (5, 6, or 7). This suggests that our model captures a certain degree of interdependence among nodes forming the major complex, rather than treating each node independently. Specifically, the model seems to learn characteristic patterns in the graph structure associated with major complex, and to make incorrect predictions on graphs that deviate from these learned patterns. Further improvements can be achieved by intensively training on rare graph configurations.
When a single node feature is removed, the decline in accuracy for major complex estimation is relatively small. By contrast, the performance related to Φ estimation shows a more pronounced decrease, indicating that integrated information is more sensitive to changes in node features. Since both the probability of a node being in state +1 or –1 and the parameter T are closely related to the state transition probabilities required for the IIT calculation, it is intuitively expected that they would have a larger impact than the centrality-based features. However, we do not find significant differences to identify the best-performing feature.
Meanwhile, the selection of convolution type has a greater influence on performance compared to node feature reduction. The standard convolutional methods, GraphConv and GAT, resulted in much lower performance compared to our transformer-based approach. This result highlights the effectiveness of the weight determination of the attention mechanism in the transformer, which has also proven successful in natural language processing.
In addition, although the Adam and the RAdam optimizers slightly outperform the proposed method in terms of major complex estimation accuracy, the overall performance of LION, Adam, RAdam, and AdamW shows no significant differences. However, the proposed method using LION demonstrates a clear advantage in efficiency, with an average convergence time of 94.2 epochs, compared to 166.2 epochs for Adam, 185.4 epochs for RAdam, and 160.8 epochs for AdamW. This highlights the effectiveness of the LION optimizer, designed primarily to enhance efficiency rather than accuracy.
Estimating integrated information and major complex with the multi-task approach resulted in limited performance improvements compared to single-task models. This suggests that, although deriving integrated information and identifying major complex are theoretically related within the framework of IIT 3.0, the proposed method does not fully capture their deeper interconnections. Enhancing the ability of the multi-task model to better utilize the intricate relationships between integrated information and major complex is a key challenge for future work.
Using global average pooling fails to achieve the same level of accuracy as global max pooling, especially for major complex estimation, even though average pooling is commonly used for graph-wise prediction. This highlights the importance of global max pooling in the proposed method. Global max pooling emphasizes the most dominant feature signals by taking the element-wise maximum across all nodes. Average pooling, however, tends to dilute such signals, leading to reduced accuracy. Similarly, using node features directly as inputs to the fully connected layer in Branch 2 (see Fig 1), without subtracting the global max-pooled vector, significantly degrades the major complex estimation accuracy. Although this approach slightly improves estimation accuracy of Φ, subtracting the max-pooled vector offers a distinct theoretical advantage for major complex estimation, which outweighs those small gains.
Fig 4 provides an example of attention weights. It shows the weights assigned between neighboring nodes at each transformer layer in a system with a high Φ value, where all nodes form the major complex. This system was included in the training dataset, and the weights were computed using the model after training was completed. Edge thickness and color intensity represent the magnitude of the attention weights, and for all but the last transformer layer, the weights are averaged across the four attention heads. A common trend is that attention distributions tend to be relatively uniform in the early layers, while deeper layers show more focused patterns, highlighting specific nodes or edges as being more influential for the prediction task. It should be noted, however, that these visualizations do not reveal any clear correspondence to information integration as defined in IIT. In this study, no general or consistent relationship was observed between the learned attention patterns and the IIT framework.
This figure shows the weights between neighboring nodes at each transformer layer for a system with N = 7 nodes and a high Φ value. Edge thickness and color intensity indicate the magnitude of the attention weights. Except for the last transformer layer, the weights are averaged across the four attention heads.
Estimation in extrapolative setting
The results of the second experiment, where the model was trained and validated on graphs with N = 5 and 6 and tested on graphs with N = 7, are presented in Table 2. This setup represents an extrapolative scenario, as the model must generalize to unseen system sizes larger than those used during training. The structure of the table and metrics are consistent with those in the first experiment, and the values represent averages over 100 trials.
For integrated information estimation, the performance appears to decline compared to the previous experiment. However, the MSE and the correlation coefficient only for N = 7 in the first experiment were 0.7983 and 0.6730, respectively, which are comparable to those in this experiment. Fig 5 shows a scatter plot of estimated versus true integrated information values for a specific trial, illustrating the tendency of the model to underestimate Φ in the high-value range that is not covered by the training data. As high values of integrated information are associated with systems of N = 7 in most cases, the integrated information is expected to rise further as the graph size increases beyond N = 7; hence, the confidence in integrated information estimates may decrease for such larger graphs.
The plot shows all 1000 test data corresponding to graphs for N = 7 in a specific trial conducted with the proposed method. In this trial, the correlation coefficient was 0.6815, matching the average across all trials.
For the major complex estimation task, both node-wise and graph-wise accuracies also appear to be lower compared to the first experiment. However, considering the test data only for N = 7 in the first experiment, the node-wise accuracy was 0.8302, and the graph-wise accuracy was 0.4893, which are nearly consistent with the results here. While the integrated information Φ behaves like an extensive variable and is sensitive to changes in graph size, the membership of each node in the major complex is inherently non-extensive. This suggests that our model may maintain a certain level of reliability in predicting the major complex for larger graphs and exhibit promising extrapolation performance.
In the comparison experiments, we found trends qualitatively similar to those observed in the previous experiment. The removal of one node feature leads to minor declines in major complex estimation accuracy, while integrated information estimation is more sensitive to these changes. The selection of convolution type also has a significant impact, with traditional methods resulting in lower performance than the transformer-based approach. The optimizer selection does not cause a significant difference in performance. Our multi-task approach does not show a clear benefit over single-task estimation, suggesting potential limitations of the proposed method in fully leveraging the theoretical relationship between integrated information and major complex. In addition, global max pooling outperforms global average pooling especially in major complex estimation, because it more effectively captures the most significant feature signals within the graph.
We describe the computational time required by the proposed method. The training process, which used a total of 3000 graphs with N = 5 and 6, typically completed within 2 to 3 minutes. Inference of Φ and the major complex for 1000 graphs with N = 7 took less than 2 seconds in total. In contrast, computing Φ and the major complex based on the IIT 3.0 framework for a single graph with N = 7 required more than 30 minutes on average. The individual computation times varied considerably depending on the number and configuration of edges. Even when the cut-one approximation [14] was applied, the computation time was reduced by less than 10% on average. These results indicate that our method provides a significant advantage in computational efficiency and is scalable to systems with larger numbers of nodes.
Scaling behavior of Φ and major complex formation
To predict Φ and the major complex for substantially larger systems, it is important to first evaluate the reliability of such estimates. A direct theoretical analysis of error growth with N is intractable, because in IIT 3.0 the value of Φ depends intricately on the interplay between system topology and state-transition probabilities, and the expressive capability of GNNs in this task is not analytically obvious. We therefore investigated whether the trends observed in exact IIT 3.0 calculations for small N are preserved in our GNN-based predictions for larger N.
Three types of system topologies, as shown in Fig 6(a), 6(b), 6(c), were considered for the analysis of small N:
- Tree-like systems with N–1 undirected edges and no self-loops for N ranging from 4 to 10.
- Fully connected systems for N ranging from 4 to 7.
- Loop-containing systems with an intermediate number of edges. For N = 6 and 7, we set 6 and 8 edges, respectively, corresponding to 40% of the edges in the fully connected configuration. For N = 4 and 5, the number of edges was chosen so that the systems were not too close to the tree-like configuration.
These examples show the case of N = 5. (a) Tree-like systems contain exactly N–1 edges and have no closed loops. (b) Fully connected systems contain all possible edges between the N nodes, excluding self-loops. (c) Loop-containing systems have an intermediate number of edges compared with the other two types and include at least one closed loop.
For each N and topology, 10 random instances were generated, and Φ and the major complex were computed according to the IIT 3.0 framework.
Fig 7 shows the resulting Φ values, and Fig 8 shows the ratio of the number of nodes in the major complex to the total number of nodes in the system. Each plotted point corresponds to one system instance, with a slight horizontal offset to prevent overlap. The results reveal distinct topology-dependent trends:
- Tree-like systems have Φ values ranging from near zero to slightly above 1. The major complex frequently consists of only two nodes, leading to a low ratio in most cases.
- Fully connected systems show a rapid increase in Φ as N grows. In many cases, all nodes are included in the major complex.
- Loop-containing systems show intermediate Φ values that increase gradually as N grows, and their major complexes often involve a larger number of nodes.
For each N–topology combination, 10 random systems were generated. Each point corresponds to one instance, plotted with a slight horizontal offset to prevent overlap. For fully connected and loop-containing systems, results are presented only for N up to 7 owing to the computational expense of exact IIT 3.0 evaluation.
Each point corresponds to one system instance, plotted with a slight horizontal offset to prevent overlap. The ratio is defined as the number of nodes included in the major complex divided by the total number of nodes N.
We then applied 100 GNN models, trained only on N = 5, 6, and 7 in the non-extrapolative setting, to larger systems. In these large-N test systems, we again considered the three topologies: tree-like systems with N–1 edges, fully connected systems, and loop-containing systems with 40% of the total number of possible edges. Ten random instances were generated for each N and topology. To reduce variability in the predictions across the 100 models, we used an ensemble approach. For each test system, the value of Φ was obtained by averaging the predictions across all models, and a node was considered to be included in the major complex if at least 60% of the models predicted its inclusion.
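The ensemble rule is straightforward; a sketch is given below, with illustrative array names.

```python
import numpy as np

def aggregate(phi_preds, membership_preds, vote_threshold=0.6):
    """phi_preds: (n_models,) Φ estimates; membership_preds: (n_models, n_nodes) binary votes."""
    phi_est = np.mean(phi_preds)                          # average Φ over the 100 models
    votes = np.mean(membership_preds, axis=0)             # fraction of models voting "in"
    major_complex = np.where(votes >= vote_threshold)[0]  # nodes with at least 60% support
    return phi_est, major_complex
```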
Fig 9 and Fig 10 show the estimated Φ values and major complex ratios, respectively. Each plotted point corresponds to one system instance, with a slight horizontal offset for clarity. Most of the qualitative trends observed in the exact computation for small N are preserved:
- Tree-like systems maintain low Φ and small major complexes.
- Fully connected systems show increasing Φ with N. In many cases, all nodes are included in the major complex.
- Loop-containing systems exhibit intermediate Φ values that increase with N, and major complexes spanning almost the entire system. The major complex size appears to follow the trend implied by exact values for small N.
For each N–topology combination, 10 randomly generated systems were evaluated, and each point corresponds to one system instance. Tree-like systems show Φ values that remain within a narrow range across N. For fully connected and loop-containing systems, Φ tends to increase with N, but the values are considerably smaller than those implied by the exact results for small N (see Fig 7).
Each point represents one system instance, plotted with a slight horizontal offset to prevent overlap. Tree-like systems generally show low ratios, whereas fully connected and loop-containing systems tend to include most or all nodes in the major complex.
However, for both fully connected and loop-containing systems, the estimated Φ values are lower than those implied by the exact results for small N.
These results suggest that, while our proposed method cannot accurately estimate Φ values for large N, it may still be useful for comparing different systems in terms of Φ or for identifying topology-dependent trends. Consequently, our method can be applied to reveal qualitative features of information integration and major complex formation in large-scale systems where exact IIT 3.0 computation is not feasible.
Qualitative analysis of split-brain-like systems
Building on the results of the previous section, we perform a qualitative analysis of a simplified yet structurally meaningful system that mimics a specific brain condition known as the “split brain.” Our system employs only 100 nodes, far fewer than the number of neurons in an actual brain. Nevertheless, this system is designed to capture the structural separation between the two hemispheres. By applying our GNN-based estimation models to this system, we examine how integrated information Φ and the major complex change as inter-hemispheric connectivity varies. This approach enables the investigation of split-brain–like disconnections within a tractable computational setting, and it allows us to explore their potential implications for understanding brain network organization.
Each test system comprised two subsystems: for the first 50 nodes, edges were randomly generated between pairs of nodes with a probability of 0.6, while the remaining 50 nodes were connected with a probability of 0.4. An edge connecting nodes between the two subsystems was set with probability pe ranging from 0 to 0.4. For all edges, their strengths were independently sampled from a standard normal distribution. In the case of pe = 0, the two subsystems are completely separate as in Fig 2, resembling a split-brain scenario. As the value of pe increases, inter-subsystem connections are introduced, resulting in progressively greater integration between the subsystems. Specifically, we were interested in observing how the system transitions from the local integration to the global integration.
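A sketch of generating one such two-subsystem test graph is shown below, again with illustrative names; the within-subsystem edge probabilities of 0.6 and 0.4 and the inter-subsystem probability pe follow the description above.

```python
import numpy as np

def split_brain_weights(pe, n_half=50, rng=np.random.default_rng(0)):
    """Symmetric weight matrix for two 50-node subsystems with inter-subsystem edge prob. pe."""
    n = 2 * n_half
    prob = np.full((n, n), pe)                        # inter-subsystem edges
    prob[:n_half, :n_half] = 0.6                      # first subsystem: denser connectivity
    prob[n_half:, n_half:] = 0.4                      # second subsystem
    mask = np.triu(rng.random((n, n)) < prob, k=1)    # undirected edges, no self-loops
    weights = np.zeros((n, n))
    weights[mask] = rng.standard_normal(mask.sum())   # edge strengths drawn from N(0, 1)
    return weights + weights.T
```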
Fifty large-scale test systems were generated for each value of pe. From the first experiment in the non-extrapolative setting, we obtained 100 estimation models trained on smaller systems with N = 5, 6, and 7. These models were then applied to the large-scale systems to predict the integrated information and identify the major complex. To enhance reliability, considering the variability in individual model predictions especially under extrapolative conditions, we averaged the estimated values of Φ across the 100 models and determined that a node belonged to the major complex if at least 60% of these models supported its inclusion.
Fig 11 shows the percentage of cases in which the subsystem of the first 50 nodes forms the major complex out of the 50 test systems by varying the value of pe. In all trials with pe = 0, the major complex is composed of the first 50 nodes with denser internal connectivity. Identifying a single subsystem as the major complex is facilitated by the data augmentation techniques described in the Methods section and the operation of subtracting the max-pooled vector from the feature of each node. As pe increases, this proportion of “local integration” declines rapidly, and when pe reaches around 0.01 to 0.02, the major complex spans both subsystems with nearly 100% certainty.
At pe = 0, the major complex consists of only the first 50 nodes. As pe increases, this rate rapidly declines.
Fig 12 shows the ratio of nodes included in the major complex out of the total 100 nodes as the value of pe varies. This ratio was averaged over 50 test systems. At pe = 0, the major complex is formed by all nodes in the first subsystem across all test systems, resulting in a ratio of 0.5. As pe increases toward 0.02, the local integration disappears rapidly; by contrast, this ratio progresses more slowly, reaching only around 0.6. As pe increases further, the ratio rises steadily and eventually reaches approximately 0.93. This indicates that a larger fraction of the system contributes to the formation of the major complex, which can be regarded as “global integration.” This behavior may be akin to a functional shift from localized processing to a more distributed and unified form of information integration, as observed in neural circuits of the brain.
The values of ratios were averaged over 50 test systems. The ratio is 0.5 at pe = 0, and it gradually increases with pe, eventually reaching around 0.93.
Fig 13 shows the estimated integrated information Φ averaged over 50 test systems and 100 models as a function of pe. Based on the scaling behavior, the estimated values of Φ are not highly reliable; however, relative changes in Φ can provide insights into significant trends. When pe is sufficiently small such that test systems resemble a split brain, the values of Φ remain approximately constant. As pe increases and the major complex expands to include more nodes from both subsystems, the value of Φ grows correspondingly, indicating that enhanced interconnections and interactions between subsystems correlate with higher integrated information. As with the increase in the major complex size discussed above, such growth in the value of Φ might correspond to a transition from localized to cohesive and unified processing in the brain.
The values were averaged over 50 test systems and 100 estimation models. For small pe, the values of Φ remain nearly constant, as illustrated in the interpolated diagram. Φ begins to increase as the major complex spans both subsystems, reflecting higher integration with increased connectivity.
These results illustrate the capacity of our model to differentiate between local and global integration in the system. For lower pe values, the system exhibits characteristics similar to a split brain, where the major complex is restricted to one of the subsystems. As pe increases and the inter-subsystem connectivity increases, the model begins identifying a unified major complex that spans both subsystems, corresponding to a globally integrated configuration. This highlights the potential of the GNN model to effectively make predictions based on large-scale connectivity patterns and to explore how system integration relates to neural configurations, although further validation across different datasets and scenarios is required.
Discussion
This study explored the applicability of GNNs for estimating integrated information and major complex as defined by IIT 3.0. Our GNN models could capture qualitative trends in both Φ and the major complex through their estimations in large systems. However, the multi-task architecture showed clear limitations, as our model was not able to fully exploit the intricate relationships between the integrated information and the major complex. These limitations highlight the need for refining the model architecture, which we discuss below.
One potential enhancement involves expanding the predictive scope of the model to include additional variables (see S1 Text), such as the “mechanism-level” integrated information φ (small phi) and the integrated information of unidirectionally partitioned subsystems. Although these variables span different computational stages within the IIT 3.0 framework, they can be represented within a heterogeneous graph [42,43]. For example, consider “mechanisms” in IIT 3.0, which are defined as subsets of system nodes and are characterized by their own φ values. One could extend the graph by introducing “mechanism nodes,” distinct from the original system nodes, and by connecting them to their constituent system nodes through edges of a different type. This results in a heterogeneous graph that includes not only the original system structure but also mechanism-level nodes and edges. Training a GNN on this graph allows the model to learn multi-dimensional embeddings for mechanism nodes from their constituent system nodes. Using these embeddings, a subsequent regression layer can then predict the integrated information φ of each mechanism. Since embeddings of mechanism nodes propagate back into their constituent system nodes through message passing, incorporating this mechanism-level information may in turn improve the accuracy of system-level predictions for Φ and the major complex. The total number of mechanisms grows exponentially with N, so not all mechanism nodes can be constructed for large systems. Therefore, in practice, a heterogeneous graph for test systems can be constructed by randomly generating mechanism nodes and their associated edges.
One potential improvement in the training strategy is to incorporate contrastive learning. This technique assigns latent representations (embedding) to input data such that semantically similar data have close representations, while dissimilar data have distinct ones [44–46]. This is achieved by introducing a contrastive term into the loss function to encourage this behavior. In graph-based contrastive learning, similarity is typically defined by node features or graph topology. However, in our problem, graphs with different topologies can yield similar integrated information Φ; thus, defining similarity only by topology is inappropriate. Therefore, we propose to define similarity on the basis of Φ or major complex composition, in analogy to the use of class labels in supervised contrastive learning [46]. In our model, the graph representation is obtained by applying global pooling to the output of the fourth transformer convolution layer. A contrastive term based on this embedding can be added to the overall loss function, where graphs with similar values of integrated information Φ or similar major complex composition are treated as positive pairs, while the others are treated as negative pairs. This contrastive loss may improve the prediction accuracy of our GNN model.
The proposed model faced challenges in accurately estimating integrated information as the number of nodes increased, particularly when extrapolating to larger systems. Underestimation in the region of higher values suggests that current training process might not sufficiently capture the scaling factors relative to the number of nodes. This issue could be addressed by treating the scaling factors as learnable parameters. One promising solution is to adopt an iterative learning approach consisting of two steps: (1) training the network weights on smaller graphs such as N = 5 or N = 6 with the scaling factors fixed, and (2) training only the scaling factors on larger graphs such as N = 7 with the network weights fixed. By repeating these steps, the model may progressively enhance its extrapolative estimation ability.
Despite the aforementioned limitations of current models, the approach developed in this study can be applied to IIT 4.0, the latest version of the theory. This is because the main objective of IIT 4.0 remains the same: estimating integrated information Φ and identifying the associated complex based on nodes, edges, and transition probabilities of a system. Furthermore, the advanced computational process in IIT 4.0 introduces the concept of “relations,” which specify connections between groups of nodes. These relations could also be effectively represented by a heterogeneous graph. In particular, “relation nodes” could be added as a new node type, with edges connecting each relation node to the original system nodes that constitute the relation. This could help GNNs capture higher-order interactions and improve the prediction of Φ and its associated complex.
From the formulation of the transformer convolution (Eqs (2)–(4)), the learnable parameters of the model are contained in the weight matrices and bias vectors, which are shared across all nodes and are not directly influenced by the number of nodes N. By contrast, the attention coefficients $\alpha_{ij}$ (Eq (3)) must be computed for every edge in the graph. Additionally, deriving centrality-related node features becomes computationally expensive as N grows, since such computations often have a time complexity greater than O(N). Taken together, if the number of connections per node remains O(1) and the centrality features are substituted with computationally efficient alternatives, the computational cost can be substantially reduced. Under these conditions, the proposed method should remain feasible, within the limits of available computational resources (particularly memory capacity), even for much larger N than those considered here. This suggests that the proposed method may be applicable to real-world neural systems. For example, connectome-level brain networks obtained from human or animal studies [47,48] could serve as valuable testbeds for examining integrated information and major complexes in large biological systems.
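For illustration, the sketch below contrasts inexpensive local node features with costlier centrality measures; the specific feature choices are assumptions for this example and are not necessarily those used in our experiments.

```python
import networkx as nx
import numpy as np


def node_features(G, cheap=True):
    """Return an [N, 3] feature matrix for a directed networkx graph G.

    cheap=True : purely local quantities, roughly O(N + E) in sparse graphs.
    cheap=False: centrality measures whose cost grows faster than O(N).
    """
    nodes = list(G.nodes())
    if cheap:
        in_deg = dict(G.in_degree())
        out_deg = dict(G.out_degree())
        clust = nx.clustering(G)                   # local clustering coefficient
        feats = [(in_deg[v], out_deg[v], clust[v]) for v in nodes]
    else:
        btw = nx.betweenness_centrality(G)         # O(N·E), prohibitive for large N
        clo = nx.closeness_centrality(G)
        eig = nx.eigenvector_centrality_numpy(G)
        feats = [(btw[v], clo[v], eig[v]) for v in nodes]
    return np.array(feats, dtype=np.float32)
```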
In conclusion, our GNN-based model provides a promising approach for approximating IIT calculations, particularly for large systems. Further research should focus on refining the model architecture, exploring scaling strategies, and enhancing the multi-task framework to improve prediction accuracy. Ultimately, such a tool could deepen our understanding of consciousness and other emergent phenomena, and may offer valuable insights for bridging computational models and neuroscience.
Supporting information
S1 Text. Nested optimization framework in IIT 3.0.
https://doi.org/10.1371/journal.pone.0335966.s001
(PDF)
References
- 1. Tononi G. An information integration theory of consciousness. BMC Neurosci. 2004;5:42. pmid:15522121
- 2. Balduzzi D, Tononi G. Integrated information in discrete dynamical systems: motivation and theoretical framework. PLoS Comput Biol. 2008;4(6):e1000091. pmid:18551165
- 3. Tononi G. Consciousness as integrated information: a provisional manifesto. Biol Bull. 2008;215(3):216–42. pmid:19098144
- 4. Tononi G. Integrated information theory of consciousness: an updated account. Arch Ital Biol. 2012;150(2–3):56–90. pmid:23165867
- 5. Oizumi M, Albantakis L, Tononi G. From the phenomenology to the mechanisms of consciousness: integrated information theory 3.0. PLoS Comput Biol. 2014;10(5):e1003588. pmid:24811198
- 6. Tononi G, Boly M, Massimini M, Koch C. Integrated information theory: from consciousness to its physical substrate. Nat Rev Neurosci. 2016;17(7):450–61. pmid:27225071
- 7. Albantakis L, Barbosa L, Findlay G, Grasso M, Haun AM, Marshall W, et al. Integrated information theory (IIT) 4.0: formulating the properties of phenomenal existence in physical terms. PLoS Comput Biol. 2023;19(10):e1011465. pmid:37847724
- 8. Krohn S, Ostwald D. Computing integrated information. Neurosci Conscious. 2017;2017(1):nix017. pmid:30042849
- 9. Aguilera M, A Di Paolo E. Integrated information in the thermodynamic limit. Neural Netw. 2019;114:136–46. pmid:30903946
- 10. Popiel NJM, Khajehabdollahi S, Abeyasinghe PM, Riganello F, Nichols ES, Owen AM, et al. The emergence of integrated information, complexity, and “Consciousness” at criticality. Entropy (Basel). 2020;22(3):339. pmid:33286113
- 11. Hosaka T. Effects of parity, frustration, and stochastic fluctuations on integrated conceptual information for networks with two small-sized loops. Neural Netw. 2023;162:131–46. pmid:36905823
- 12. Kitazono J, Kanai R, Oizumi M. Efficient algorithms for searching the minimum information partition in integrated information theory. Entropy (Basel). 2018;20(3):173. pmid:33265264
- 13. Toker D, Sommer FT. Information integration in large brain networks. PLoS Comput Biol. 2019;15(2):e1006807. pmid:30730907
- 14. Mayner WGP, Marshall W, Albantakis L, Findlay G, Marchman R, Tononi G. PyPhi: a toolbox for integrated information theory. PLoS Comput Biol. 2018;14(7):e1006343. pmid:30048445
- 15. Besharatifard M, Vafaee F. A review on graph neural networks for predicting synergistic drug combinations. Artif Intell Rev. 2024;57(3).
- 16. Fout A, Byrd J, Shariat B, Ben-Hur A. Protein interface prediction using graph convolutional networks. In: Proceedings of the 31st International Conference on Neural Information Processing Systems; 2017.
- 17. Réau M, Renaud N, Xue LC, Bonvin AMJJ. DeepRank-GNN: a graph neural network framework to learn patterns in protein-protein interfaces. Bioinformatics. 2023;39(1):btac759. pmid:36420989
- 18. Bessadok A, Mahjoub MA, Rekik I. Graph neural networks in network neuroscience. IEEE Trans Pattern Anal Mach Intell. 2023;45(5):5833–48. pmid:36155474
- 19. Li Y, Guo Z, Wang K, Gao X, Wang G. End-to-end interpretable disease-gene association prediction. Brief Bioinform. 2023;24(3):bbad118. pmid:36987781
- 20. Aftab R, Qiang Y, Zhao J, Urrehman Z, Zhao Z. Graph Neural Network for representation learning of lung cancer. BMC Cancer. 2023;23(1):1037. pmid:37884929
- 21. Li Q, Wang Z, Li L, Hao H, Chen W, Shao Y. Machine learning prediction of structural dynamic responses using graph neural networks. Computers & Structures. 2023;289:107188.
- 22. Ngo Q-H, Nguyen BLH, Vu TV, Zhang J, Ngo T. Physics-informed graphical neural network for power system state estimation. Applied Energy. 2024;358:122602.
- 23. Chen M, Zhang J, Dong R, Xu Y, Liang H, Zheng J, et al. An interpretable weather forecasting model with separately-learned dynamics and physics neural networks. Geophysical Research Letters. 2025;52(13).
- 24. Tan G. NAH-GNN: a graph-based framework for multi-behavior and high-hop interaction recommendation. PLoS One. 2025;20(4):e0321419. pmid:40299984
- 25. Zhang P, Zhao H, Shao Z, Xie X, Hu H, Zeng Y, et al. Enhanced multi-scenario running safety assessment of railway bridges based on graph neural networks with self-evolutionary capability. Engineering Structures. 2024;319:118785.
- 26. Shao Z, Peng X, Zhang P, Liu Z, Chen Y, Yang R, et al. An intelligent GNN seismic response prediction and computation framework adhering to meshless principles: a case study for high-speed railway bridges. Engineering Analysis with Boundary Elements. 2025;179:106359.
- 27. Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE. Neural message passing for quantum chemistry. In: Proceedings of the 34th International Conference on Machine Learning. 2017. p. 1263–72.
- 28. Weisfeiler B, Lehman AA. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsia. 1968;2(9):12–6.
- 29. Xu K, Hu W, Leskovec J, Jegelka S. How powerful are graph neural networks? In: Proceedings of the 7th International Conference on Learning Representations; 2019.
- 30. Morris C, Ritzert M, Fey M, Hamilton WL, Lenssen JE, Rattan G, et al. Weisfeiler and leman go neural: higher-order graph neural networks. AAAI. 2019;33(01):4602–9.
- 31. Maron H, Fetaya E, Segol N, Lipman Y. On the universality of invariant networks. In: Proceedings of the 36th International Conference on Machine Learning. 2019. p. 4363–71.
- 32. Shi Y, Huang Z, Feng S, Zhong H, Wang W, Sun Y. Masked label prediction: unified message passing model for semi-supervised classification. In: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence. 2021. p. 1548–54. https://doi.org/10.24963/ijcai.2021/214
- 33. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems; 2017.
- 34. Sperry RW. Cerebral organization and behavior: the split brain behaves in many respects like two separate brains, providing new research possibilities. Science. 1961;133(3466):1749–57. pmid:17829720
- 35. Gazzaniga MS, Bogen JE, Sperry RW. Some functional effects of sectioning the cerebral commissures in man. Proc Natl Acad Sci U S A. 1962;48(10):1765–9. pmid:13946939
- 36. Fey M, Lenssen JE. Fast graph representation learning with PyTorch Geometric. In: Proceedings of ICLR 2019 Workshop on Representation Learning on Graphs and Manifolds; 2019.
- 37. Chen X, Liang C, Huang D, Real E, Wang K, Pham H, et al. Symbolic discovery of optimization algorithms. In: Proceedings of the 37th Conference on Neural Information Processing Systems; 2023.
- 38. Kingma D, Ba J. Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations; 2015.
- 39. Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y. Graph attention networks. In: Proceedings of the 6th International Conference on Learning Representations; 2018.
- 40. Liu L, Jiang H, He P, Chen W, Liu X, Gao J, et al. On the variance of the adaptive learning rate and beyond. In: Proceedings of the 8th International Conference on Learning Representations; 2020.
- 41. Loshchilov I, Hutter F. Decoupled weight decay regularization. In: Proceedings of the 7th International Conference on Learning Representations; 2019.
- 42. Schlichtkrull M, Kipf TN, Bloem P, van den Berg R, Titov I, Welling M. Modeling relational data with graph convolutional networks. In: Proceedings of the Semantic Web: 15th International Conference, ESWC. 2018. p. 593–607.
- 43. Zhang C, Song D, Huang C, Swami A, Chawla NV. Heterogeneous graph neural network. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019. p. 793–803. https://doi.org/10.1145/3292500.3330961
- 44. Chen T, Kornblith S, Norouzi M, Hinton G. A simple framework for contrastive learning of visual representations. In: Proceedings of the 37th International Conference on Machine Learning. 2020. p. 1597–607.
- 45. You Y, Chen T, Sui Y, Chen T, Wang Z, Shen Y. Graph contrastive learning with augmentations. In: Proceedings of the 34th Conference on Neural Information Processing Systems; 2020.
- 46. Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P, et al. Supervised contrastive learning. In: Proceedings of the 34th Conference on Neural Information Processing Systems; 2020.
- 47. Van Essen DC, Smith SM, Barch DM, Behrens TEJ, Yacoub E, Ugurbil K, et al. The WU-Minn Human Connectome Project: an overview. Neuroimage. 2013;80:62–79. pmid:23684880
- 48. White JG, Southgate E, Thomson JN, Brenner S. The structure of the nervous system of the nematode Caenorhabditis elegans. Philos Trans R Soc Lond B Biol Sci. 1986;314(1165):1–340. pmid:22462104