
Upper bounds for integrated information

Abstract

Originally developed as a theory of consciousness, integrated information theory provides a mathematical framework to quantify the causal irreducibility of systems and subsets of units in the system. Specifically, mechanism integrated information quantifies how much of the causal powers of a subset of units in a state, also referred to as a mechanism, cannot be accounted for by its parts. If the causal powers of the mechanism can be fully explained by its parts, it is reducible and its integrated information is zero. Here, we study the upper bound of this measure and how it is achieved. We study mechanisms in isolation, groups of mechanisms, and groups of causal relations among mechanisms. We put forward new theoretical results showing that mechanisms that share parts cannot all achieve their maximum. We also introduce techniques to design systems that can maximize the integrated information of a subset of their mechanisms or relations. Our results can potentially be used to exploit symmetries and constraints to reduce the computations significantly and to compare different connectivity profiles in terms of their maximal achievable integrated information.

Author summary

Integrated Information Theory (IIT) offers a theoretical framework to quantify the causal irreducibility of a system, subsets of the units in a system, and the causal relations among the subsets. For example, mechanism integrated information quantifies how much of the causal powers of a subset of units in a state cannot be accounted for by its parts. Here, we provide theoretical results on the upper bounds for this measure, how it is achieved, and why mechanisms with overlapping parts cannot all be maximally integrated. We also study the upper bounds for integrated information of causal relations among the mechanisms. The ideas introduced here can potentially pave the way to design systems with optimal causal irreducibility and to develop computationally lightweight exact or approximate measures for integrated information.

1 Introduction

Integrated information theory (IIT) has been developed as a comprehensive theory of what it takes for a system to be conscious, how much, and in which way [1, 2]. The theory starts from the existence of experience and characterizes its essential properties—those that are true of every conceivable experience—called phenomenal ‘axioms’. These are as follows: every experience is intrinsic (for the subject), specific (this one), unitary (a whole, irreducible to its parts), definite (this whole, having a border and grain), and structured (being composed of phenomenal distinctions and relations) [1]. The theory then formulates the axioms in operational terms as ‘postulates’ of cause-effect power. Given a system’s causal model (a set of units in its current state, together with its transition probability matrix (TPM)), IIT’s postulates can then be employed to identify a substrate of consciousness or ‘complex’—a maximally irreducible set of units (having maximum system integrated information φs) with its specific intrinsic cause-effect state. Finally, IIT’s postulates are employed to ‘unfold’ the cause-effect structure specified by the complex in its current state—the set of causal distinctions (cause-effects) specified by subsets of units within the complex, as well as their relations (the way their cause-effects overlap). According to IIT, the composition of the cause-effect structure corresponds to the quality of an experience—how the experience feels—and the sum of the integrated information values of its composing distinctions (φd) and relations (φr) corresponds to the quantity of consciousness (Φ)—how much an entity exists intrinsically (for itself).

As shown in other work, the theory has explanatory, predictive, and inferential power [2]. For example, it explains why certain parts of the brain, but not others, can support consciousness, and why consciousness is lost during dreamless sleep and anesthesia [3, 4]. The theory has also been employed to account for the quality of experience, namely the pervasive feeling of spatial extendedness [5] and the feeling of temporal flow [6]. It has led to clinical applications, using crude proxies of the system integrated information φs that nevertheless offer what is currently the most sensitive and specific test for the presence of consciousness in non-responsive patients [3, 7, 8]. Finally, to the extent that the theory continues to be validated empirically in humans, it supports inferences about the presence, quantity, and quality of consciousness in other species as well as in artifacts. For example, it can be shown that computer architectures cannot be conscious in any meaningful sense because they break down into small complexes, each of which has a trivial value of Φ, regardless of their ability to simulate intelligent behaviors and functions [9].

IIT is unique in providing an exact calculus for the quantity and quality of consciousness—from first principles and based on phenomenology. However, unfolding cause-effect structures and determining the associated value of Φ exactly is not feasible for realistic systems, for several reasons. Among them are the difficulty of obtaining a system TPM at the right grain and the nested combinatorial explosions in assessing candidate unit grains, candidate complexes, and the composing distinctions and relations. For these reasons, and not unlike well-known precedents in statistical physics and quantum mechanics, it is essential to develop approximations and heuristics. These can then be used to estimate Φ and related quantities based on simple properties of various substrates, including the density and pattern of connections as well as various symmetries. As a step in this direction, it is important to begin establishing bounds on IIT’s basic quantities, which is the goal of this study. By obtaining such bounds, and progressively tightening them based on various properties of substrates of interest, we should ultimately be able to make well-grounded estimates about the presence and quantity of consciousness in different regions of the human brain, allowing us to more precisely test the theory’s predictions.

A further goal will be to estimate the value of Φ in brains markedly different from ours, as well as in other natural and artificial systems. An important outcome of the search for bounds is the determination of orders of magnitude for Φ. As shown here, sums of integrated information of distinctions and relations can grow hyper-exponentially with the number of units. Therefore, a system with an architecture that allows a large number of units to constitute a maximally irreducible complex should yield hyper-astronomical values of Φ. We have conjectured that this should be the case for a densely connected lattice of units such as that found in posterior-central regions of the cerebral cortex [1, 2]. In contrast, large systems with fault lines, or high levels of indeterminism and/or degeneracy, break down into many small complexes, which will necessarily have very low values of Φ. This should be the case for many other regions of the brain, such as the cerebellum and much of prefrontal cortex, for other parts of the body, and certainly for artificial systems such as computers. The expected hyper-astronomical difference in Φ values between these substrates is essential for providing some principled guidelines about the occurrence of consciousness in nature and for informing the ongoing debate about panpsychism [10].

1.1 Outline

The measures to quantify integrated information of distinctions φd and relations φr that capture the postulates of IIT are presented in [1] and are discussed in detail in Section 2.1 and Table 1. According to IIT, the causal components within the system must satisfy the same properties as the system except composition: they must have cause-effect power within the system (intrinsically), select a specific state (information) in a way that cannot be accounted for by their parts (integration) and over a definite set of units (exclusion) [1]. The formalism of IIT enables us to quantify the causal irreducibility of each subset of the system. A set of units in its current state, also referred to as a mechanism, is irreducible if its causal powers cannot be accounted for by its parts. For that, the potential cause and the potential effect of each mechanism are identified, and the irreducibility is quantified by measuring to what extent such a cause or effect can be accounted for using only the parts of the mechanism. A central quantity in the formalism of IIT is the mechanism integrated information, denoted by φ, which measures the causal irreducibility of a single mechanism [1, 11]. An irreducible mechanism, i.e., a mechanism with nonzero φ, with its corresponding cause and effect is referred to as a distinction.

Table 1. A summary of IIT concepts and the relevant notation used in this manuscript.

https://doi.org/10.1371/journal.pcbi.1012323.t001

IIT also provides us with a framework to quantify how different distinctions causally interact with each other by defining and measuring the strength of the relations among them [1, 5]. For example, distinctions might have overlapping effects, making them related. Finally, the causal powers of the whole system can be accounted for by its cause-effect structure, which is composed of its causal distinctions and causal relations that bind together the distinctions.

In this work we study the upper bounds achievable by these measures and how we can achieve them. Barbosa et al [11, 12] showed that the integrated information of an individual distinction can increase by adding reliable, rather than noisy, units to it. Here, we study the maximum achievable integrated information of a single distinction of fixed size and the trade-offs when considering the system as a whole, with all its overlapping distinctions and their shared parts. In particular, we discuss the following bounds:

  • In Section 2.1.1, we derive an upper bound for how much information a distinction can specify beyond each of its parts.
  • We use this bound to find the maximum integrated information achievable by a distinction, as well as a bound for the sum of integrated information of all the distinctions.
  • Then in Section 2.1.2 and Section 2.1.3, we show why the distinctions of a system cannot all achieve their corresponding maximum φ. We further provide a numerical bound for a special class of systems with grid-like connectivity patterns.
  • The upper bounds for integrated information of relations, as well as the conditions necessary for achieving them, are presented in Section 2.2.

Section 3 provides numerical experiments and discussions on mathematical properties of the bounds and the constructions. Finally, implications of this study, a few open problems, and potential directions for research are discussed in Section 4.

2 Methods

2.1 Upper bounds for mechanism integrated information

Consider a stochastic system S consisting of N random variables {S1, S2, …, SN}. These random variables represent a system with transition probability matrix (TPM) defined as p(St+1 = st+1 ∣ St = st), which denotes the probability that the system is in state st+1 at time t + 1 given the state of the system at time t. The mechanism integrated information, presented in [1], quantifies how much a mechanism M ⊆ S in state mt constrains the state of its potential causes Zt−1 ⊆ S and its potential effects Zt+1 ⊆ S, above and beyond their parts. For that, a difference measure is developed that compares the probability distribution of a cause purview Zt−1 or an effect purview Zt+1, before and after partitioning the mechanism-purview pair into independent parts.
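For concreteness, a TPM of this form can be tabulated explicitly for a small toy network. The sketch below (our own illustrative example, not a system from this study) builds the 2^N × 2^N state-by-state TPM of a hypothetical three-unit deterministic system:

```python
import numpy as np
from itertools import product

# Hypothetical 3-unit deterministic system: each unit computes the
# logical AND of the other two units at the previous time step.
N = 3
states = list(product([0, 1], repeat=N))  # all 2^N binary states

def next_state(s):
    return tuple(int(s[j] and s[k]) for j, k in ((1, 2), (0, 2), (0, 1)))

# tpm[i, j] = p(S_{t+1} = states[j] | S_t = states[i])
tpm = np.zeros((2**N, 2**N))
for i, s in enumerate(states):
    tpm[i, states.index(next_state(s))] = 1.0

assert np.allclose(tpm.sum(axis=1), 1.0)  # every row is a distribution
assert tpm[states.index((1, 1, 1)), states.index((1, 1, 1))] == 1.0
```

For a deterministic system each row contains a single 1; for a stochastic system the rows are arbitrary distributions over the 2^N next states.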

Formally, the effect repertoire πe(Zt+1 ∣ Mt = mt) is defined as the probability distribution over the potential effect purview Zt+1, given that the mechanism is in state mt, and is obtained by causally marginalizing out the random variables outside the mechanism and purview, under the assumption that the variables at t + 1 are conditionally independent given all the variables at t [1, 11]. Having introduced the effect repertoire, the maximal effect state of mt within the purview Zt+1 is defined as: (1) z* = argmax_{z ∈ ΩZ} πe(z ∣ m) log2( πe(z ∣ m) / πe(z; M) ), where πe(z; M) is the unconstrained effect probability of state z ∈ ΩZ, obtained by causally marginalizing out the mechanism units as well.

To avoid cluttering the notation, the time subscripts t + 1 and t are dropped here. Eq (1) gives us the state z* for which the mechanism m makes the most difference compared to chance. Since there is at least one state z ∈ ΩZ such that πe(z ∣ m) > πe(z; M), the maximal effect state will always be a state for which the mechanism increases the probability compared to the unconstrained probability. This is in line with our intuition that for a state to be caused by a mechanism, its probability needs to be increased by that mechanism. Given this maximal effect state, the integrated effect information of m quantifies how much difference the mechanism makes to this state above and beyond its parts, by comparing the effect repertoire for state z*, πe(z* ∣ m), with its partitioned version πe^θ′(z* ∣ m): (2) φe(m, Z) = | πe(z* ∣ m) log2( πe(z* ∣ m) / πe^θ′(z* ∣ m) ) |+. Here, |.|+ represents the positive part operator, which sets negative values to 0. This reflects the intuition that the mechanism as a whole needs to increase the probability of its effect, compared to the mechanism’s parts. In S2 Appendix, we show that our results hold even if we replace the positive part operator with the absolute value operator. πe^θ(z* ∣ m) is the effect repertoire calculated after partitioning the mechanism-purview pair into independent parts using the partition θ ∈ Θ(M, Z), Θ(M, Z) is the set of all the valid partitions, and θ′ is the minimum information partition (MIP). φe(m, Z) quantifies how much the mechanism as a whole increases the probability of the effect state compared to the MIP, which is obtained as: (3) θ′ = argmin_{θ ∈ Θ(M, Z)} φe^θ(m, Z) / X^θ(M, Z). X^θ(M, Z) is the maximum possible distance between πe(z* ∣ m) and πe^θ(z* ∣ m) achievable by partition θ, and its value will be derived shortly in Lemma 2. Normalizing the distance by X^θ(M, Z) makes the comparison between partitions with different numbers of parts fair. This is because partitions that sever fewer causal connections tend to change the probability less. The MIP is the partition θ that makes the least difference to the effect repertoire, normalized by X^θ(M, Z).

If there exists a partition for which the unpartitioned probability is less than or equal to the partitioned probability, the mechanism is reducible to its parts and the integrated effect information over purview Z is 0. We can further find the most irreducible purview Z* = argmax_Z φe(m, Z), which is the subset of units that mechanism m as a whole makes the most difference to. φe(m) = φe(m, Z*) is the difference that m as a whole makes to its most irreducible purview, which is referred to as the mechanism integrated effect information. Similar analysis and procedure can be used to define the cause repertoire πc(Zt−1 ∣ Mt = mt) and the integrated cause information φc(m) [1]. For a detailed description of the definitions for the cause side, see S1 Appendix. Finally, the overall integrated information of a mechanism is defined as φ(m) = min{φe(m), φc(m)}. An irreducible mechanism with its maximal cause and effect purviews is referred to as a distinction.

It is evident that a difference measure of the form π(z ∣ m) log2( π(z ∣ m) / π′(z ∣ m) ) plays a central role in measuring φ. This was derived from the postulates of IIT and was first introduced in [12]. In short, this measure satisfies the following properties: (i) The measure differs from 0 only if the probability of the state is increased. (ii) The measure is not an aggregate over all the states and reflects how much change is made in an individual state. (iii) In a scenario where p(z1z2 ∣ m) = p1(z1 ∣ m)p2(z2 ∣ m), q(z1z2 ∣ m) = q1(z1 ∣ m)q2(z2 ∣ m), and p2(z2 ∣ m) = q2(z2 ∣ m), the measure produces a smaller value for the purview z1z2 than for z1 alone. This scenario represents the case where m makes no difference to the subset z2 of the purview; therefore, including this subset in the maximal purview is discouraged.

We can look at this measure as the product of two terms. The first term, p(z ∣ m), is referred to as selectivity, while the second term, log2( p(z ∣ m) / q(z ∣ m) ), is called informativeness. Adding new units can never increase the selectivity, therefore the measure is only increased if the new units increase the informativeness enough. Variations of this measure have been used to define integrated information of distinctions and systems [1, 13]. However, the question of how large these measures can get remains open. In this section, we first study the maximum integrated information achievable by a single distinction. Then we show why this upper bound cannot be achieved by all the distinctions of a system, by studying a few important special cases.
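The selectivity–informativeness decomposition can be written out directly. The snippet below is a minimal sketch of this product measure (with the positive part applied, as in the φ definitions above), using hypothetical probability values:

```python
import math

def intrinsic_difference(p, q):
    """Selectivity p times informativeness log2(p/q); the positive part
    sets the value to 0 whenever the probability is not increased.

    p: probability of the state under the intact repertoire,
    q: probability under the partitioned (or unconstrained) repertoire.
    """
    if p == 0.0:
        return 0.0
    return p * max(math.log2(p / q), 0.0)

# The measure rewards states whose probability the mechanism raises:
assert intrinsic_difference(1.0, 0.25) == 2.0   # fully selective, 2 bits
assert intrinsic_difference(0.5, 0.5) == 0.0    # no difference made
assert intrinsic_difference(0.25, 0.5) == 0.0   # probability decreased
```

Note how the first case illustrates the text: the bound on the log term (informativeness) is only attained when the selectivity factor equals 1.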

Our working assumption to derive these bounds is that the system is realizable by a TPM that is a product of unit TPMs (conditional independence). This is a minimal assumption, as both the definition of φ and the causal marginalization process make use of such a TPM [1, 11, 13]. The conditional independence reflects the assumption that the state of the units depends only on the previous time step and that there is no instantaneous causation. Furthermore, we consider systems consisting of binary units, but the results are generalizable to non-binary units as well. In S2 Appendix, we show that our results still hold even if we use slightly different difference measures such as point-wise mutual information or Kullback–Leibler divergence.
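A minimal sketch of this conditional-independence assumption: given hypothetical per-unit conditional activation probabilities (our own randomly generated example), the system TPM is assembled as a product of unit TPMs, and each row remains a valid distribution.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
N = 3
states = list(product([0, 1], repeat=N))

# Hypothetical per-unit conditional probabilities:
# unit_on_prob[i, row] = p(unit i is ON at t+1 | system state `row` at t).
unit_on_prob = rng.random((N, 2**N))

# Conditional independence: each TPM entry is a product over units.
tpm = np.zeros((2**N, 2**N))
for row in range(2**N):
    for col, s_next in enumerate(states):
        p = 1.0
        for i in range(N):
            p_on = unit_on_prob[i, row]
            p *= p_on if s_next[i] == 1 else 1.0 - p_on
        tpm[row, col] = p

# Each row is still a valid probability distribution.
assert np.allclose(tpm.sum(axis=1), 1.0)
```

The converse does not hold: an arbitrary 2^N × 2^N stochastic matrix generally cannot be factored into unit TPMs, which is why conditional independence is a genuine restriction.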

2.1.1 Single mechanism.

Before discussing our main results, we first need to present a few helpful lemmas. Lemma 1 states how the process of causal marginalization places a certain limit on the informativeness. Formally, given the set of units outside the mechanism W = S∖M, the effect repertoire of a single unit Zi ∈ Z is defined as: (4) πe(Zi = zi ∣ m) = (1 / 2^|W|) ∑_{w ∈ ΩW} p(Zi = zi ∣ Mt = m, Wt = w), where |W| = N − |M| is the number of units outside the mechanism. Furthermore, the effect repertoire of Z is defined as: (5) πe(Z = z ∣ m) = ∏_{i: Zi ∈ Z} πe(Zi = zi ∣ m).
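Causal marginalization as in (4) can be sketched in a few lines. The toy update rule below is our own example (unit Z0 turns on iff units 1 and 2 were both on); it shows how the effect repertoire of a single unit is obtained by averaging uniformly over the states of the units outside the mechanism:

```python
from itertools import product

N = 3

# Hypothetical unit TPM: p(Z_0 = 1 | full previous state s).
# Here unit 0 turns on iff units 1 and 2 were both on.
def p_unit0_on(s):
    return 1.0 if (s[1] == 1 and s[2] == 1) else 0.0

def effect_repertoire_unit0(mech_idx, m):
    """pi_e(Z_0 = 1 | m): average p(Z_0 | m, w) uniformly over the
    2^|W| states of the units W outside the mechanism."""
    w_idx = [j for j in range(N) if j not in mech_idx]
    total = 0.0
    for w in product([0, 1], repeat=len(w_idx)):
        s = [0] * N
        for j, v in zip(mech_idx, m):
            s[j] = v
        for j, v in zip(w_idx, w):
            s[j] = v
        total += p_unit0_on(tuple(s))
    return total / 2 ** len(w_idx)

assert effect_repertoire_unit0((1, 2), (1, 1)) == 1.0  # fully constrained
assert effect_repertoire_unit0((1,), (1,)) == 0.5      # unit 2 marginalized
```

Note that 0.5 ≥ 2⁻¹ · 1.0: dropping one mechanism unit shrinks the repertoire value by at most a factor of 2, previewing Lemma 1.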

Using this definition, it can be shown that:

Lemma 1. Given two mechanisms M′ and M, such that M′ ⊆ M, and a single unit Zi, we have: πe(Zi = zi ∣ m′) ≥ 2^(−|M∖M′|) πe(Zi = zi ∣ m).

All the proofs are provided in S3 Appendix. |M∖M′| is the number of units that need to be causally marginalized to calculate πe(Zi = zi ∣ m′) from πe(Zi = zi ∣ m), which can also be thought of as the number of causal connections cut from the unit Zi. Throughout this manuscript, cutting a (causal) connection between a unit at t and a unit at t + 1 refers to recalculating the conditional probability distribution of the output by causally marginalizing out the input unit. Furthermore, this bound does not depend on the state of the purview unit zi and it holds for any state, not just the maximal state selected in Eq (1). The result in Lemma 1 can be generalized as:

Lemma 2. Given a mechanism M in state m, a purview Z in state z, and a partition θ, we have: πe^θ(Z = z ∣ m) ≥ 2^(−X^θ(M, Z)) πe(Z = z ∣ m), where X^θ(M, Z) is the total number of connections cut by the partition θ.

This result is general in the sense that it holds for any partitioning that removes an arbitrary subset of connections, even if it is not a valid partition and does not divide the mechanism-purview pair into independent parts. Therefore, it does not depend on the constraints imposed on the partitions. Lemma 1 and Lemma 2 establish a connection between the number of connections severed by a partition and the value of the integrated information. This can help us to develop a more intuitive understanding of mechanism integrated information. In words, the mechanism integrated information, as defined in [1], counts the number of causal connections that need to be severed to disintegrate the mechanism into causally independent parts.
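This connection-counting interpretation can be checked numerically. The sketch below (our own illustration, with a randomly generated conditional probability table) verifies that marginalizing k input units of a single output unit, i.e., cutting k connections, shrinks the repertoire value by at most a factor of 2^k, in the spirit of Lemmas 1 and 2:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
N = 4
states = list(product([0, 1], repeat=N))

# Hypothetical conditional probability table: p(Z_0 = 1 | previous state s).
p_on = rng.random(2**N)

def repertoire(mech_idx, m):
    """pi_e(Z_0 = 1 | m): causally marginalize units outside the mechanism."""
    vals = [p_on[i] for i, s in enumerate(states)
            if all(s[j] == v for j, v in zip(mech_idx, m))]
    return sum(vals) / len(vals)

full = repertoire((0, 1, 2, 3), (1, 0, 1, 1))
for cut in range(1, 4):
    # Marginalize `cut` of the inputs, i.e., cut that many connections.
    part = repertoire((0, 1, 2, 3)[:N - cut], (1, 0, 1, 1)[:N - cut])
    # Cutting `cut` connections shrinks the value by at most a factor 2^cut,
    # so the informativeness log2(full/part) is at most `cut` bits.
    assert part >= full / 2**cut
```

The inequality holds for any conditional probability table, because the marginalized value is an average over 2^cut nonnegative terms that include the unmarginalized one.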

As shown in Eq (3), to find the partition that makes the least difference, we normalize the difference between the unpartitioned and partitioned repertoires by X^θ(M, Z). This makes the comparison between partitions that sever different numbers of connections fair. Using Lemma 2 and the fact that the informativeness term is a bound for the integrated information, we can readily state our first main result. Theorem 1 states that the integrated information of mechanism M over a purview cannot be larger than the total number of potential connections between them. In other words, to disintegrate a maximally integrated mechanism we need to sever all causal connections between the mechanism and the purview. This holds both for the cause and the effect sides.

Theorem 1. For a mechanism M ⊆ S in state m, a candidate cause purview C, and a candidate effect purview E, we have: φc(m, C) ≤ |M||C| and φe(m, E) ≤ |M||E|, where |E| and |C| denote the size of the candidate effect and cause purviews, respectively.

This bound is achievable and the conditions to achieve it are presented in the proof. Theorem 1 provides us with our first upper bound for the sum of integrated information of all the distinctions of a system: (6) ∑_{M⊆S} φ(m) ≤ ∑_{M⊆S} |M|·N = N²·2^(N−1).

And, if we are interested in the upper bound over unique purviews: (7) ∑_{M⊆S} φ(m) ≤ ∑_{M⊆S} |M||Z(M)| ≤(a) ∑_{M⊆S} |M|² = N(N + 1)·2^(N−2).

Inequality (a) follows from the fact that for any two sets S1 and S2, |S1|² + |S2|² ≥ |S1||S2| + |S2||S1|. This means that in a scenario where each purview can be assigned to only one mechanism, matching the sizes of the mechanisms and purviews maximizes the upper bound. This scenario is an important special case, as it was studied in [5] to explain how IIT can be used to account for the quality of certain types of experience, such as spatial experience. Both of these bounds consist of a quadratic term and an exponential term. This is because the total number of connections in a system grows quadratically and the number of subsets/mechanisms grows exponentially with the number of units in the system. This also emphasizes the fact that the notion of integrated information is fundamentally different from the Shannon information encoded in the state of the system: the Shannon information of a system consisting of N binary variables can be at most N bits. In the next section, we will show that, due to the constraints imposed by overlapping distinctions of different sizes, also referred to as mechanisms of different orders, these bounds are not achievable.
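Under the reading that bound (6) sums |M|·N and bound (7) sums |M|² over all nonempty mechanisms of an N-unit system, the quadratic-times-exponential closed forms follow from standard binomial identities; the check below is our own numerical verification:

```python
from math import comb

def bound_all_purviews(N):
    # Bound (6): phi(m) <= |M| * N, summed over all nonempty mechanisms M.
    return sum(comb(N, k) * k * N for k in range(1, N + 1))

def bound_unique_purviews(N):
    # Bound (7): purview sizes matched to mechanism sizes, summing |M|^2.
    return sum(comb(N, k) * k**2 for k in range(1, N + 1))

# Quadratic-times-exponential closed forms (standard binomial identities):
for N in range(2, 13):
    assert bound_all_purviews(N) == N**2 * 2**(N - 1)
    assert bound_unique_purviews(N) == N * (N + 1) * 2**(N - 2)
```

Even for N = 12 these bounds already exceed 10^5, far above the N bits of Shannon information the same units could store.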

It is also worthwhile to mention that the results provided here for a single mechanism are in line with the bounds for the system integrated information defined in [13]. As shown in Theorem 1 in [13], the maximum achievable system integrated information for a given partition is the number of connections cut by that partition. This makes the overall maximum system integrated information the maximum number of connections cut by any valid cut, which is |S|(|S| − 1) for the system partitions considered in [13].

2.1.2 Inter-order constraints.

In this section, we discuss how mechanisms that are either subsets or supersets of each other place certain constraints on the maximum integrated information achievable by them. We refer to this set of constraints as inter-order constraints, as they result from interactions among mechanisms of different order/size. The results presented in this section formalize another property of integrated information: the tension between the parts and the whole. We will show that either the whole or the parts, but not both, can be maximally integrated. As already shown in Lemma 1, the levels of determinism for a single purview unit given two different mechanisms are bounded by each other, if one mechanism is a subset of the other. It can also be shown that if a mechanism M fully specifies its cause purview, i.e., πc(Z = z* ∣ m) = 1, then any superset M̃ ⊇ M also fully specifies the same purview, πc(Z = z* ∣ m̃) = 1. The same holds for the effect side as well.

Lemma 3. A superset of a deterministic mechanism is deterministic. For two mechanisms M ⊆ M̃ and a single-unit purview Zi, if πe(Zi = zi ∣ m) = 1, then πe(Zi = zi ∣ m̃) = 1, and if πe(Zi = zi ∣ m) = 0, then πe(Zi = zi ∣ m̃) = 0.
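Lemma 3 can be illustrated with a toy update rule (our own example, assuming a COPY gate where Z0 at t+1 simply copies unit 1 at t): once a mechanism containing unit 1 fully determines Z0, every superset of it does too.

```python
from itertools import product

N = 3

# Hypothetical update rule (a COPY gate): Z_0 at t+1 copies unit 1 at t,
# so any mechanism containing unit 1 fully determines Z_0.
def p_z0_on(s):
    return 1.0 if s[1] == 1 else 0.0

def repertoire(mech_idx, m):
    """pi_e(Z_0 = 1 | m): average over the states of units outside M."""
    rest = [j for j in range(N) if j not in mech_idx]
    total = 0.0
    for w in product([0, 1], repeat=len(rest)):
        s = [0] * N
        for j, v in zip(mech_idx, m):
            s[j] = v
        for j, v in zip(rest, w):
            s[j] = v
        total += p_z0_on(tuple(s))
    return total / 2 ** len(rest)

# M = {1} in state (1,) is deterministic for Z_0 ...
assert repertoire((1,), (1,)) == 1.0
# ... and so is every superset of it, regardless of the extra units' states.
assert repertoire((0, 1), (0, 1)) == 1.0
assert repertoire((0, 1, 2), (1, 1, 0)) == 1.0
```

Intuitively, causal marginalization can only average a probability of 1 with other values, so a value of exactly 1 (or 0) must already hold for every completion of the superset state.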

Such constraints can be exploited to show that the upper bound can only be achieved by a subset of the mechanisms in the system. Another important fact to keep in mind is that φe(m, Z = z) = |M||Z| is only achievable when πe(Z = z ∣ m) = 1. This is because this bound is derived for the informativeness term and is only achievable when the selectivity term, πe(Z = z ∣ m), is 1:

Lemma 4. If φe(m, Z) = |M||Z| then πe(Z = z ∣ m) = 1. Similarly, if φc(m, Z) = |M||Z| then πc(Z = z ∣ m) = 1.

This is a very strict constraint if we want to construct a system such that φ(m) = |M||Z| for all the mechanisms. In what follows, we show that even having a single mechanism-purview pair that achieves this upper bound makes it impossible for a significant number of other mechanism-purview pairs to achieve their maximum integrated information.

Theorem 2. For a mechanism M̃ ⊆ S in state m̃ with purview Z̃, φe(m̃, Z̃) < |M̃||Z̃|, if φe(m, Z) = |M||Z| and

  • M̃ ⊂ M and Z̃ ∩ Z ≠ ∅, OR
  • M̃ ⊃ M and Z̃ ∩ Z ≠ ∅.

Theorem 2 states that, if φe(m, Z) = |M||Z|, any subset or superset of M cannot share purview units with M and still achieve the maximum. Such mechanisms can only achieve their maximal integrated information over disjoint purviews. This makes it immediately clear why all the distinctions in a system larger than 1 unit cannot be maximally irreducible, as there are fewer disjoint purview sets than mechanisms. As discussed in S1 Appendix, the same results hold for the integrated cause information as well. This makes the bounds in (6) and (7) unachievable.

Corollary 1. All the distinctions in a system cannot be maximally integrated, if the system is composed of more than one unit.

This is in line with our intuitive definition of integrated information. φ captures the difference that a mechanism makes over and beyond its parts. Therefore, if parts (subsets) of a mechanism-purview pair are maximally irreducible, the pair itself cannot be. For example, if there exists a mechanism in the system that achieves maximum φe over the entire system, i.e., |M||Z| = |M|N, none of the subsets or supersets of that mechanism can achieve maximum φe over their corresponding purviews. Another special case is when every single-unit mechanism achieves φe = 1 over itself. Again, this makes it impossible for all the other distinctions to have maximum φe. In fact, in this case, the integrated effect information of the remaining distinctions will be 0, as any mechanism-purview pair is reducible to its parts.

2.1.3 Intra-order constraints.

As outlined in the preceding sections, having a high selectivity is necessary to have a large integrated information. For example, Lemma 4 states that having selectivity of 1 is necessary to achieve the maximal integrated effect and cause information. This motivates us to study a special case that is particularly important. Assume that in a system consisting of N units, all the mechanisms of size K specify themselves with probability 1, i.e., πe(Z = z′ ∣ m) = 1 where Z = M. Since the mechanisms of size K are not subsets or supersets of each other, Theorem 2 does not hold. However, even in this setting, the intra-order constraints prevent the distinctions from achieving the maximum integrated information.

Theorem 3. In a system S consisting of N units, for a given mechanism size 1 < K < N, if πe(Z = z′ ∣ m) = 1, ∀M: |M| = K, and the purview units are the same as the mechanism units, i.e., Z = M, none of the mechanisms with |M| = K can achieve their maximum integrated effect information of |M||Z| = |M|².

The theorem states that if all the mechanisms of size K fully specify themselves, their φ cannot achieve its maximum. The only exceptions are when there is only one such mechanism, K = N, or there is no overlap between the mechanisms, K = 1. In S1 Appendix, we show that this result holds for the integrated cause information as well.

Analyzing this setup helps us to understand how mechanisms that are not parts of each other, but share parts, constrain each other. The proof for Theorem 3, which is provided in S3 Appendix, is a constructive proof. Thus, it also provides us with a procedure to construct a TPM that can achieve the maximum possible integrated information in this setting. The main idea behind the proof, and therefore the construction, is to use the definition of the effect repertoire in (4) and (5) to characterize the TPM that can satisfy the assumptions of the theorem. This both gives us the TPM and limits the maximum attainable integrated effect information.

In S3 Appendix, we also show that the MIP for the system under consideration can be found by evaluating only a linear number of candidate partitions, significantly reducing the computations and making it feasible to calculate the integrated information of such distinctions in larger networks. In Section 3, we show that the integrated effect information in this system is much less than the number of connections by numerically evaluating the candidate partitions. We can employ this linear-time numerical evaluation to calculate the sum of integrated effect information over all the mechanisms in a hypothetical system where all the mechanisms of any size have themselves as the most irreducible purview with selectivity 1 (not just all the mechanisms of size K). This gives us a numerical upper bound for reflexive systems, i.e., systems in which every mechanism has itself as the purview. We will provide experiments to show that this computationally light numerical bound is tight and may in fact hold in general.

2.2 Upper bounds for relation integrated information

Any mechanism m with its maximally irreducible cause and effect purviews defines a causal distinction d(m) = (z*(m), φd(m)), where z*(m) contains the maximally irreducible cause and effect purviews in their maximal states.

Causal relations are defined over subsets of the causal distinctions. If we define the set of all the valid distinctions of a candidate system as D, a subset of distinctions d ⊆ D forms a relation if the cause purview, the effect purview, or both the cause and the effect purviews of each distinction overlap over the same units in the same states. We use bold face to represent sets of distinctions, e.g., d. |d| is the number of distinctions in the relation. This definition includes the special case of a self-relation, where the cause and effect purviews of the same distinction overlap congruently over a set of units. This means a system consisting of N units can potentially contain as many as 2^(2^N − 1) − 1 causal relations: any nonempty subset of units can define a distinction and any nonempty subset of distinctions can potentially define a relation. Here we discuss the bound on the integrated information of a single relation and on the sum of the relations’ integrated information, given a bound on the sum of the distinctions’ integrated information. This makes the analysis in this section mostly independent of the results in the previous sections, as the final result holds for any bound on the sum of the distinctions’ integrated information.
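The hyper-exponential count of potential relations is easy to sanity-check (our own illustration):

```python
def max_num_relations(N):
    """Upper bound on the number of causal relations in an N-unit system:
    any nonempty subset of the up-to 2^N - 1 distinctions may form a
    relation (self-relations included)."""
    num_distinctions = 2**N - 1        # nonempty subsets of units
    return 2**num_distinctions - 1     # nonempty subsets of distinctions

assert max_num_relations(1) == 1
assert max_num_relations(2) == 7      # 3 distinctions -> 2^3 - 1 subsets
assert max_num_relations(3) == 127    # 7 distinctions -> 2^7 - 1 subsets
```

Already at N = 10 the count exceeds 2^1023, which is why sums over relations can grow hyper-exponentially with the number of units.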

In [1], it is shown that we can write the relation integrated information of a set of distinctions with |d| ≥ 2 as: (8) φr(d) = |∩_{d∈d} z*(d)| · min_{d∈d} ( φd(d) / |z*(d)| ).

Both the intersection and the union operators consider both the units in the purview and the states of the units. If there is no congruent overlap among the purviews of the distinctions, the intersection in (8) is empty and φr(d) is zero. For the special case of self-relations, |d| = 1, the relation integrated information of a distinction d is calculated analogously over the congruent overlap between its own cause and effect purviews: (9)

Our first observation is that the relation integrated information of any subset of distinctions d cannot be larger than the smallest integrated information of the distinctions involved: φr(d) ≤ min_{d∈d} φd(d).

This is because the relation overlap is always smaller than the purview size |z*(d)| of every distinction involved. Therefore, to maximize the integrated information of an individual relation, we need to maximize the minimum integrated information of all the distinctions involved. In other words, we need to maximize the integrated information of all the distinctions.
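Assuming the overlap-times-minimum-ratio form discussed for Eq (8) (our paraphrase, not the paper’s exact notation), the bound φr(d) ≤ min φd can be checked on toy numbers:

```python
def phi_relation(distinctions, overlap_size):
    """phi_r as |overlap| * min_d(phi_d / |z*(d)|), the assumed form of
    Eq (8); `distinctions` is a list of (purview_size, phi_d) pairs and
    `overlap_size` is the size of their congruent overlap."""
    return overlap_size * min(phi / size for size, phi in distinctions)

# The overlap can never exceed any purview involved, so phi_r is bounded
# by the smallest distinction phi (hypothetical numbers):
ds = [(4, 3.0), (2, 1.5), (6, 5.0)]
overlap = 2  # cannot exceed the smallest purview, of size 2
phi_r = phi_relation(ds, overlap)
assert phi_r == 1.5
assert phi_r <= min(phi for _, phi in ds)
```

Because overlap ≤ |z*(d)| for every d, the product overlap · φd(d)/|z*(d)| never exceeds φd(d), which is exactly the observation above.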

However, as discussed in Section 2.1, due to inter-order and intra-order constraints, all the distinctions cannot achieve their maximum. Therefore, we study the problem of how we can maximize the sum of the relations’ integrated information, given that the sum of the distinctions’ integrated information is bounded by a certain value. We present the analysis in multiple steps:

  1. First, we rewrite the sum of the relations’ integrated information as a linear combination of distinctions integrated information,
  2. This turns our problem into a linear programming problem, as we are looking to maximize a linear combination of the distinctions' integrated information given that their sum is bounded by some value. We then present the solution to this problem, which gives us the bound on ∑φr for any given set of cause and effect purviews. This provides us with a measure to compare different cause–effect structures. We also briefly discuss the implications of this result for the optimal connectivity profile,
  3. We finally use the bounds derived in the previous section to calculate the growth rate of ∑φr in terms of the number of units.
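Step 2 rests on a standard linear-programming fact: a linear objective ∑ c_i x_i under a budget ∑ x_i ≤ S with x_i ≥ 0 attains its maximum, S · max_i c_i, at a vertex of the feasible set. A toy sketch of this fact, with the coefficients c_i standing in for the per-distinction weights (the actual problem (13) carries additional structure from the sorted ratios in Eq (11), which we omit here):

```python
# Toy version of the linear program underlying Step 2: distribute a
# budget S of total distinction phi over coefficients c so as to
# maximize the weighted sum.

def lp_max(c, S):
    """max sum(c_i * x_i) s.t. sum(x_i) <= S, x_i >= 0  ->  S * max(c)."""
    return S * max(c)

def lp_argmax(c, S):
    """One optimal allocation: the entire budget on a largest coefficient."""
    x = [0.0] * len(c)
    x[c.index(max(c))] = S
    return x
```

For instance, with c = [1, 3, 2] and S = 5 the optimum is 15, attained by the allocation [0, 5, 0]; any budget moved to a smaller coefficient can only lower the objective.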

Step 1: In [1], it is shown that we can write ∑φr as a linear combination of the distinctions' integrated information by defining, for every unit o, the following set: (10)

Each element of this set is a tuple (z, φ) that contains both the purview size and the integrated information of a distinction that contains o in either its cause or effect purview, where o is an arbitrary unit in an arbitrary state in the system.

This definition helps us rewrite ∑φr as [1, S3 Text]: (11)

The total sum is reformulated as the sum of the integrated information of the self-relations plus a linear combination of the integrated information of the distinctions. The parenthesized subscript denotes sorted indexing of the elements in the set, such that (z(1), φ(1)) has the smallest ratio, (z(2), φ(2)) the second smallest, and so on. The first summation in the last term sums over all the units o in all their possible states.

To find an upper bound on ∑φr for any given set of cause and effect purviews, we can simplify the objective function by finding the maximum of each term in the sum separately. The maximization is over the values of the distinctions' integrated information: (12)

Step 2: The first term, the sum of the self-relations' integrated information, is bounded by the sum of the distinctions' integrated information.

This is because, as we just discussed, the integrated information of a self-relation is at most the integrated information of its distinction.

We can find the maximum for the second term in the objective function by solving the following problem for a given o: (13)

Even if the value of S(o) is not known, we can find the optimal distribution of integrated information over all the distinctions, study the optimal connectivity profile, and analyze the trade-off between S(o) and the number of distinctions. In S3 Appendix, we provide the solution to this problem and the conditions to achieve the bound. In short, we show that a system can achieve the maximal value if the distinctions with the same purview size have the same value of integrated information. Systems with random connectivity cannot satisfy this symmetry, and therefore they cannot have very large values of ∑φr. On the other hand, systems that exhibit homogeneous grid-like symmetries are more likely to satisfy it, as many subsets of units with the same purview size have the same connectivity pattern, and therefore the same integrated information. We also show that the maximum achievable value for the objective function of (13) is (14)

Plugging this result back into (12), we get: (15) which describes the trade-off between the number of distinctions that contain a specific purview unit o and the sum of their integrated information, S(o). In general, due to the inter- and intra-order constraints, increasing the number of such distinctions might decrease S(o). However, since the bound increases almost exponentially in the number of distinctions containing o, in most cases we can increase the bound by increasing this number for all o. This means that a system is more likely to achieve large values of ∑φr if its distinctions have bigger purviews. Therefore, densely connected grid-like systems are the candidate systems for high values of ∑φr.

Step 3: To calculate a growth rate and a bound as a function of only the number of units in the system, we can study the following extreme case. We know that the number of distinctions containing a given unit o cannot be larger than the total number of distinctions. To find an upper bound for S(o), we can use one of the bounds derived in Section 2.1.1, i.e., φe ≤ |M||ze|. Since φ ≤ φe for any distinction, S(o) is bounded by the sum of |M||ze| over all the distinctions. Therefore:

Using the above values for the number of distinctions and for S(o), as well as the bound in (6), we get: (16)

Since achieving φe = |M||ze| is not possible for all the distinctions, this bound is not tight, but it gives us the growth rate of ∑φr. Alternatively, we could have arrived at the same rate of growth by noticing that the maximum number of relations is 2^(2^N − 1) − 1 and the maximum value of distinction integrated information is N².

The solution studied in this section translates the problem of finding a bound for ∑φr into the easier problem of finding a bound or a closed-form solution for S(o). In other words, Eq (15) readily gives us the bound on ∑φr for any given bound on S(o). Deriving a tighter general bound for S(o), i.e., for any arbitrary set of distinctions that share a common purview unit, leads to a tighter bound for ∑φr and remains an open problem.

3 Results

In this section, we provide numerical evaluations of the bounds discussed above, as well as experiments illustrating how tight the bounds are. The calculations were performed using the freely available PyPhi toolbox [14]. The code for PyPhi is available at https://github.com/wmayner/pyphi. The code for the experiments presented here is available at https://github.com/zaeemzadeh/IIT-bounds. Since the values of mechanism integrated information and relation integrated information are state-specific, the bounds and the optimal TPM constructions are state-dependent. This means that if a system or mechanism is optimal in one state, it is not necessarily optimal in other states. Here, we show the results for the optimal state, as we are interested in the bounds and the maximal values. Fig 1A shows the distinction integrated information of distinctions in the system discussed in Section 2.1.3. The solid lines represent the integrated effect information, φe, of a mechanism of size K over itself, if all such mechanisms specify themselves with probability 1. In this setting, the mechanism size is the same as the purview size. The dotted lines represent the upper bound if we consider the mechanisms in isolation, which is simply the number of connections, i.e., K². For mechanism sizes K = N and K = 1, where there is no overlap among the mechanisms, this upper bound is achievable. However, when the mechanisms share parts with each other, the integrated effect information cannot get close to the K² bound. For a fixed mechanism size K, if we increase the system size N, the maximum achievable φe decreases. This is because in a larger system there are more mechanisms of size K, making each mechanism compete with combinatorially more mechanisms. This figure also shows that for a fixed system size N, the maximum achievable φe increases with the mechanism/purview size. However, this can be misleading.
For example, although the mechanisms of size K = N can achieve the largest φe value, they are the least numerous mechanisms.

thumbnail
Fig 1. Upper bounds for integrated effect information φe vs the value achieved by the proposed construction, for different mechanism/purview size K = 1, …, 12.

In this setting, the mechanism is the same as the purview. The dotted lines represent the upper bound if we consider each mechanism in isolation, which is the number of connections. The solid lines represent the integrated effect information achieved by the proposed TPM construction. (A) Integrated effect information for a single mechanism of size K and (B) summed over all the mechanisms of size K in a system of size N. This figure shows that the derived bound is achievable when the mechanisms do not share parts. Also, in larger systems, mechanisms of size close to N/2 achieve the maximum sum, as they are the most numerous ones.

https://doi.org/10.1371/journal.pcbi.1012323.g001

To emphasize this point, Fig 1B illustrates the sum of the integrated effect information of all the mechanisms of size K. This figure shows that although larger mechanisms can achieve higher φe, if the goal is to maximize ∑φe, it is better to maximize the integrated information of the more numerous mechanisms. For example, for a system size of N = 12, the maximum is achieved when K = 7, which also illustrates the trade-off between the achievable φe and the number of mechanisms: although mechanisms of size 7 are less numerous than mechanisms of size 6, each of them can achieve larger integrated effect information.
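The count-versus-size trade-off can be eyeballed with a back-of-the-envelope sketch that multiplies the loose isolated-mechanism bound K² by the number C(N, K) of size-K mechanisms. The true achievable values are lower when mechanisms overlap, but even this crude product peaks at an intermediate K rather than at K = N:

```python
from math import comb

def naive_sum_bound(N: int, K: int) -> int:
    """Isolated-mechanism bound K^2 summed over all C(N, K) mechanisms
    of size K (a loose proxy; overlap constraints are ignored)."""
    return comb(N, K) * K * K

N = 12
best_K = max(range(1, N + 1), key=lambda K: naive_sum_bound(N, K))
print(best_K, naive_sum_bound(N, best_K))  # peaks at K = 7 for N = 12
```

For N = 12 the product is maximized at K = 7 (C(12, 7) · 7² = 38808), matching the location of the maximum observed in Fig 1B.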

Fig 1B exhibits the maximum achievable ∑φe(m) for a specific mechanism size, when all such mechanisms fully specify themselves. Fig 2 shows the sum of integrated information over all the mechanisms, if the mechanisms of a certain size fully specify themselves. For each system size N, N different TPMs are constructed, each representing a system in which all the mechanisms of size K have a selectivity of 1, for K = 1, …, N. The dotted lines represent the numerical bound derived in Section 2.1.3. This value represents a hypothetical system in which all the mechanisms have a selectivity of 1 over themselves and can achieve their maximum φe. We can use this value as a numerical upper bound.

thumbnail
Fig 2. Sum of integrated effect information ∑MS φe(m) and sum of integrated information ∑MS φ (m) for a system in which all the mechanisms of size K specify themselves with probability 1 i.e., selectivity is 1, for different system sizes N = 4, …, 8.

In this setting, the mechanism is the same as the purview. The dotted line is the numerical bound derived in Section 2.1.3, the solid line is ∑MS φe(m) achieved by the proposed TPM construction, and the markers denote ∑MS φ(m) achieved by the construction. The bound is a tight upper bound for ∑MS φe(m), and ∑MS φe(m) is the same as ∑MS φ(m). Furthermore, in larger systems, both of the sums are maximized when mechanisms of size close to N/2 have a selectivity of 1.

https://doi.org/10.1371/journal.pcbi.1012323.g002

As shown in S3 Appendix, to evaluate the sum for the proposed construction, we need to compute only a small number of partitions, making the computational complexity of calculating it quadratic in N. The solid lines in the figure are the achieved sum of integrated effect information ∑MS φe(m), and the markers represent the achieved sum of integrated information ∑MS φ(m) = ∑MS min{φe(m), φc(m)}. It is evident that the bound on the sum is tight, as ∑MS φe(m) and ∑MS φ(m) can get very close to it. This figure shows that we can exploit the symmetries and the constraints on the system to find good approximations of ∑MS φ(m) with significantly less computation.

Tables 2 and 3 summarize the bounds discussed in this paper. Bound I and Bound II are the bounds derived in Eqs (6) and (7), respectively. They represent the absolute upper bounds without considering the inter- and intra-order constraints. To calculate their corresponding bound on the sum of the relations' integrated information, the closed-form sum in Eq (11) is used, excluding the self-relation term for a less cluttered presentation. It is worth mentioning that each of the rows in Table 3 can be used to calculate its corresponding bound on Φ, which is the sum of the integrated information of all the distinctions and the relations.

thumbnail
Table 2. The bounds for integrated information of an individual distinction, relation, and system.

https://doi.org/10.1371/journal.pcbi.1012323.t002

thumbnail
Table 3. The bounds for the sum of distinctions integrated information ∑φd and the sum of relations integrated information ∑φr.

https://doi.org/10.1371/journal.pcbi.1012323.t003

Bound III corresponds to the hypothetical system discussed in Section 2.1.3, where all the mechanisms can achieve the maximal integrated effect information over themselves while having a selectivity of 1. As discussed earlier, this value can be used as a tight upper bound for the sum of the distinctions' integrated information in such a system. To numerically calculate its corresponding bound for ∑φr, we further assumed Zc = S for all the distinctions and used Eq (11). This assumption makes the number of relations in this system the same as the number of relations in Bound I, as all the distinctions are related over their cause purviews, and it makes ∑φr in Bounds I and III grow at a similar rate. Next, we will show that Bound III is tight for both ∑φd and ∑φr of the proposed construction and may in fact be a general bound.

Fig 3 illustrates how the upper bounds and the maximum achievable ∑MS φ(m) by different TPM construction methods grow as we increase the system size N. The TPMs are constructed by the following methods: (i) High selectivity refers to the construction discussed in Section 2.1.3 and Theorem 3. As shown in S3 Appendix, the TPM corresponding to this construction contains only 0s and 1s. (ii) Random deterministic construction fills the TPM entries with 0s and 1s at random. (iii) Random nondeterministic construction draws each TPM entry, i.e., the probability of a unit turning on, from a uniform distribution over [0, 1]. (iv) Hamming is the decoding procedure of the well-known (7, 4) Hamming code. In a system of 7 units, the state at time t + 1 is the closest valid codeword to the state at time t with probability 1. This construction has been suggested as a good candidate system for high integrated information [15].
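The Hamming construction can be sketched as follows. The systematic generator matrix below is one common form of the (7, 4) code, chosen for illustration; the paper does not specify which form was used. Because the code is perfect, every 7-bit state has a unique nearest codeword, so the update rule is well defined and deterministic:

```python
from itertools import product

# Systematic generator matrix of the (7, 4) Hamming code (one common form).
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(msg):
    """Codeword = msg . G over GF(2)."""
    return tuple(sum(m * g for m, g in zip(msg, col)) % 2
                 for col in zip(*G))

CODEWORDS = [encode(m) for m in product((0, 1), repeat=4)]

def next_state(state):
    """Deterministic update: jump to the closest valid codeword."""
    return min(CODEWORDS,
               key=lambda c: sum(a != b for a, b in zip(state, c)))
```

Since the code is perfect, the 16 codewords' Hamming balls of radius 1 tile all 128 states, so `next_state` never moves a state by more than one bit flip.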

thumbnail
Fig 3. Growth of different upper bounds for the sum of distinctions’ integrated information w.r.t the system size N, as well as the maximum achievable ∑MS φ(m) by different TPM construction methods.

The derived numerical upper bound grows almost exactly as 2^N. ∑MS φ(m) achievable by the proposed construction grows exponentially with N as well. The deterministic TPMs, i.e., TPMs containing only 0s and 1s, outperform nondeterministic TPMs. The scale of the y-axis is logarithmic.

https://doi.org/10.1371/journal.pcbi.1012323.g003

Fig 3 shows that both the numerical upper bound and the achievable values for the proposed construction scale almost exponentially with the system size N, at a rate of about 2^N. For both of the random constructions, the maximum sum achieved over 100 runs is reported in the figure. This figure shows that deterministic TPMs generally achieve a higher sum of integrated information than nondeterministic TPMs. This is because achieving large φ requires a selectivity close to 1 (see Lemma 4), and a selectivity of 1 requires the TPM entries to be 0 or 1 (see S3 Appendix). This figure also suggests that the numerical upper bound derived in Section 2.1.3 might be a general upper bound for any system, not just in the high-selectivity regime. The generality of this upper bound remains to be investigated.
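The link between deterministic TPM entries and selectivity can be made concrete. Assuming conditionally independent binary units, with each TPM row listing every unit's probability of turning on (as in PyPhi's state-by-node convention), the probability of the most likely next state is a product of max(p, 1 − p) terms, which equals 1 exactly when every entry is 0 or 1:

```python
def selectivity(row):
    """Probability of the most likely next state, given a TPM row that
    lists each unit's probability of being ON (units independent)."""
    s = 1.0
    for p in row:
        s *= max(p, 1.0 - p)
    return s

print(selectivity([1.0, 0.0, 1.0]))  # deterministic row -> 1.0
print(selectivity([0.9, 0.5, 1.0]))  # noisy row -> 0.45
```

Any entry strictly between 0 and 1 contributes a factor below 1, so a nondeterministic row can never reach a selectivity of 1.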

Finally, Fig 4 illustrates the bounds and the achievable values for the sum of the integrated information of the relations in a system. Similar to the distinctions case, the proposed construction can get very close to its bound. Furthermore, this figure shows that a higher sum of distinctions' integrated information does not translate to a higher sum of relations' integrated information. For example, unlike in Fig 3, Bound III is larger than Bound II in Fig 4. This is because having larger purviews increases the number of relations exponentially, making the purview sizes, not the integrated information values, the dominant factor in the sum of the relations' integrated information. Due to the super-exponential growth of the number of relations in terms of the number of units, we used a double-logarithmic scale for the y-axis.

thumbnail
Fig 4. Growth of different upper bounds for the sum of relations’ integrated information w.r.t the system size N, as well as the maximum achievable values by different TPM construction methods.

The derived numerical upper bound and ∑MS φ(m) achievable by the proposed construction grow super exponentially with N. The scale of the y-axis is double-logarithmic.

https://doi.org/10.1371/journal.pcbi.1012323.g004

4 Discussion and future directions

As a theory of consciousness, Integrated Information Theory (IIT) [1, 2] introduces a mathematical framework that quantifies the causal powers of systems and subsets of units in a system. The methods of IIT have inspired activity in neuroscience [16–20], complexity science [21–24], and physics [15, 25, 26]. In short, IIT identifies the essential properties of experience and provides a framework to quantify to what extent a physical system complies with such properties in terms of its causal powers. These properties are intrinsicality, information, integration, exclusion, and composition [1]. This work is mainly focused on the last property, i.e., composition, meaning that we are concerned with how the subsets of units in the system specify causes and effects over subsets of units and how these causes and effects are related. Our results are applicable even if a candidate system does not satisfy all the essential properties.

Since the exact calculation of integrated information in larger biological systems is not feasible, theoretical investigation of different connectivity profiles or approximate calculations are necessary. The results and the proofs presented in this work are a step toward practical application of IIT to real-world systems in several ways. First, we provide the growth rate of several measures as a function of the number of units under certain assumptions. Such results and techniques can be used to derive analytical forms for other connectivity profiles.

Second, we provide techniques to exploit the symmetries and the repeating patterns in connectivity to simplify the computation of the integrated information in certain families of systems. Similar techniques can be used to derive approximate measures for any biological system with repeating patterns of connectivity. For instance, in S3 Appendix, we show that the normalized partitioned integrated information can be simplified to the average information gain over all the severed connections. This can potentially be used in conjunction with exact or approximate subset selection algorithms to find the optimal partition in a more computationally feasible manner. In this work, we used this technique and the symmetries in the system to reduce the number of candidate partitions significantly, from more than exponential to linear in the size of the mechanism.

Finally, our results provide qualitative insights about the connectivity profiles of the biological systems that can achieve higher values of integrated information. For example, in the proof of Theorem 3, we provide a TPM construction method that maximizes the sum of distinction integrated information under certain assumptions. Similar techniques can be used to design optimal TPMs and study biological systems satisfying other assumptions. Our analysis can also be helpful to compare different families of networks, e.g., random vs homogeneous connectivity. This can lead to a better understanding of which areas of the brain can have higher integrated information. For example, in Section 2.2, we derive a bound for the sum of the relations' integrated information for any set of cause and effect purviews. Our results showed that the sum of the integrated information of the relations is maximized when the distinctions' integrated information is proportional to the size of their purviews. This means distinctions with the same purview size should have the same value of integrated information. Such regularity is not attainable in systems with a random connectivity profile, but can potentially be achieved in systems with homogeneous grid-like connectivity.

Theoretical investigation of any measure can help us sharpen our intuition and deepen our understanding of that measure. For example, the optimal coding theorems establish the connection between the Shannon entropy measure and the minimum number of bits assigned to symbols, which makes Shannon information theory more tangible and easier to understand. Similarly, in this paper, we show that the value of mechanism integrated information, system integrated information, and the information loss under any partition are closely related to the number of causal connections. This gives us a more intuitive understanding of their behavior. More importantly, the results discussed here are utilized in the development of IIT. For example, in IIT, to fairly compare partitions that sever different numbers of causal connections, the value of the information loss corresponding to each partition is normalized by the maximum information loss achievable by such a partition. The formal proof for the upper bound achievable by any partition is presented in Section 2.1.1.

More generally, our results have implications outside the field of theoretical neuroscience, such as finding the signatures of complexity in biological and artificial networks. There has been increasing interest in IIT as a formal framework to study complex systems and non-linear dynamics [21–24, 27]. Therefore, our results and analysis can also be extended to the field of complexity science and potentially used to derive approximate or heuristic measures of complexity [28–30]. In S2 Appendix, we show that the results provided here are still relevant even if other distance measures, such as point-wise mutual information or KL divergence, are used for quantifying the integrated information.

There are also many questions left open to be explored. The results provided here are state-dependent, meaning that they are achievable only in one state of the system and/or the mechanism. The problem of finding the conditions under which systems and mechanisms can achieve high integrated information in more than one state is open to investigation. Also, as shown in Section 2.1, we can make the distinction bound tighter by considering the inter- and intra-order constraints. In Section 2.1.3, we developed a linear-time method to evaluate the integrated information in a special class of systems and numerically showed that this bound is much smaller than the bound obtained without considering the constraints. Finding a closed-form solution for such systems remains an open problem. In general, finding a closed-form solution for the bound on the sum of the integrated information of any arbitrary subset of distinctions is an interesting problem and can be used in conjunction with our results in Section 2.2 to derive tighter bounds for the sum of the integrated information of relations.

Furthermore, in our derivation of the bounds we mainly focused on the integrated effect information, as the bound for φe is a bound for φd as well. In S1 Appendix, we show that most of our results are applicable to the integrated cause information as well. However, there are dependencies between the cause and effect sides that can potentially be used to make the overall bounds tighter.

Supporting information

S1 Appendix. Integrated cause information.

Definition of the integrated cause information and description of how the results for integrated effect information can be translated to the integrated cause information.

https://doi.org/10.1371/journal.pcbi.1012323.s001

(PDF)

S2 Appendix. Using other difference measures.

Describes how the results change if other difference measures, such as KL divergence, are used for comparing the distributions.

https://doi.org/10.1371/journal.pcbi.1012323.s002

(PDF)

Acknowledgments

The authors thank Larissa Albantakis, William Marshall, Leonardo Barbosa, and Erick Chastain for helpful discussions and comments on earlier drafts of this article, and Will Mayner for maintaining PyPhi, his advice on implementing the experiments, and his helpful comments.

References

  1. Albantakis L, Barbosa L, Findlay G, Grasso M, Haun AM, Marshall W, et al. Integrated information theory (IIT) 4.0: Formulating the properties of phenomenal existence in physical terms. PLOS Computational Biology. 2023;19(10):e1011465. pmid:37847724
  2. Tononi G, Boly M, Massimini M, Koch C. Integrated information theory: from consciousness to its physical substrate. Nature Reviews Neuroscience. 2016;17(7):450–461.
  3. Massimini M, Ferrarelli F, Huber R, Esser SK, Singh H, Tononi G. Breakdown of cortical effective connectivity during sleep. Science. 2005;309(5744):2228–2232. pmid:16195466
  4. Pigorini A, Sarasso S, Proserpio P, Szymanski C, Arnulfo G, Casarotto S, et al. Bistability breaks-off deterministic responses to intracortical stimulation during non-REM sleep. NeuroImage. 2015;112. pmid:25747918
  5. Haun A, Tononi G. Why Does Space Feel the Way it Does? Towards a Principled Account of Spatial Experience. Entropy. 2019;21(12):1160.
  6. Comolatti R, Grasso M, Tononi G. Why does time feel the way it does?; forthcoming.
  7. Casarotto S, Comanducci A, Rosanova M, Sarasso S, Fecchio M, Napolitani M, et al. Stratification of unresponsive patients by an independently validated index of brain complexity. Annals of Neurology. 2016;80(5):718–729. pmid:27717082
  8. Sarasso S, D’Ambrosio S, Fecchio M, Casarotto S, Viganò A, Landi C, et al. Local sleep-like cortical reactivity in the awake brain after focal injury. Brain. 2020;143(12):3672–3684. pmid:33188680
  9. Findlay G, Marshall W, Albantakis L, Mayner WGP, Koch C, Tononi G. Dissociating Intelligence from Consciousness in Artificial Systems–Implications of Integrated Information Theory. In: Proceedings of the 2019 Towards Conscious AI Systems Symposium, AAAI SSS19; 2019.
  10. Tononi G, Koch C. Consciousness: here, there and everywhere? Philosophical Transactions of the Royal Society B: Biological Sciences. 2015;370(1668). pmid:25823865
  11. Barbosa LS, Marshall W, Albantakis L, Tononi G. Mechanism Integrated Information. Entropy. 2021;23(3):362. pmid:33803765
  12. Barbosa LS, Marshall W, Streipert S, Albantakis L, Tononi G. A measure for intrinsic information. Scientific Reports. 2020;10(1):1–9.
  13. Marshall W, Grasso M, Mayner WGP, Zaeemzadeh A, Barbosa LS, Chastain E, et al. System Integrated Information. Entropy. 2023;25(2):334.
  14. Mayner WGP, Marshall W, Albantakis L, Findlay G, Marchman R, Tononi G. PyPhi: A toolbox for integrated information theory. PLOS Computational Biology. 2018;14(7):e1006343. pmid:30048445
  15. Tegmark M. Consciousness as a state of matter. Chaos, Solitons & Fractals. 2015;76:238–270.
  16. Casali AG, Gosseries O, Rosanova M, Boly M, Sarasso S, Casali KR, et al. A theoretically based index of consciousness independent of sensory processing and behavior. Science Translational Medicine. 2013;5(198). pmid:23946194
  17. Haun AM, Oizumi M, Kovach CK, Kawasaki H, Oya H, Howard MA, et al. Conscious Perception as Integrated Information Patterns in Human Electrocorticography. eNeuro. 2017;4(5). pmid:29085895
  18. Gallimore AR. Restructuring consciousness: The psychedelic state in light of integrated information theory. Frontiers in Human Neuroscience. 2015;9:346. pmid:26124719
  19. Lee U, Kim S, Noh GJ, Choi BM, Mashour GA. Propofol Induction Reduces the Capacity for Neural Information Integration: Implications for the Mechanism of Consciousness and General Anesthesia. Nature Precedings. 2008.
  20. Leung A, Cohen D, Van Swinderen B, Tsuchiya N. Integrated information structure collapses with anesthetic loss of conscious arousal in Drosophila melanogaster. PLOS Computational Biology. 2021;17(2):e1008722. pmid:33635858
  21. Albantakis L, Hintze A, Koch C, Adami C, Tononi G. Evolution of Integrated Causal Structures in Animats Exposed to Environments of Increasing Complexity. PLOS Computational Biology. 2014;10(12):e1003966. pmid:25521484
  22. Albantakis L. Quantifying the Autonomy of Structurally Diverse Automata: A Comparison of Candidate Measures. Entropy. 2021;23(11):1415. pmid:34828113
  23. Mediano PAM, Rosas FE, Farah JC, Shanahan M, Bor D, Barrett AB. Integrated information as a common signature of dynamical and information-processing complexity. Chaos. 2022;32(1):13115.
  24. Tajima S, Kanai R. Integrated information and dimensionality in continuous attractor dynamics. Neuroscience of Consciousness. 2017;2017(1). pmid:30042844
  25. Barrett AB. An integration of integrated information theory with fundamental physics. Frontiers in Psychology. 2014;5:63. pmid:24550877
  26. Albantakis L, Prentner R, Durham I. Computing the Integrated Information of a Quantum Mechanism. Entropy. 2023;25(3):449.
  27. Niizato T, Sakamoto K, Mototake Yi, Murakami H, Tomaru T, Hoshika T, et al. Finding continuity and discontinuity in fish schools via integrated information theory. PLOS ONE. 2020;15(2):e0229573. pmid:32107495
  28. Oizumi M, Tsuchiya N, Amari SI. Unified framework for information integration based on information geometry. Proceedings of the National Academy of Sciences of the United States of America. 2016;113(51):14817–14822. pmid:27930289
  29. Kanwal MS, Grochow JA, Ay N. Comparing Information-Theoretic Measures of Complexity in Boltzmann Machines. Entropy. 2017;19(7):310.
  30. Nilsen AS, Juel BE, Marshall W. Evaluating Approximations and Heuristic Measures of Integrated Information. Entropy. 2019;21(5):525.