Fast and exact search for the partition with minimal information loss

In the analysis of multi-component complex systems, such as neural systems, identifying groups of units that share similar functionality aids understanding of the underlying structure of the system. To find such a grouping, it is useful to evaluate to what extent the units of the system are separable. Separability can be evaluated by quantifying how much information would be lost if the system were partitioned into subsystems and the interactions between the subsystems were hypothetically removed. A system of two independent subsystems is completely separable without any loss of information, while a system of strongly interacting subsystems cannot be separated without a large loss of information. Among all possible partitions of a system, the partition that minimizes the loss of information, called the Minimum Information Partition (MIP), can be considered the optimal partition for characterizing the underlying structure of the system. Although the MIP would reveal novel characteristics of neural systems, an exhaustive search for the MIP is numerically intractable due to the combinatorial explosion of possible partitions. Here, we propose a computationally efficient search that exactly identifies the MIP among all possible partitions when the measure of information loss is submodular. Submodularity is a mathematical property of set functions which is analogous to convexity in continuous functions. Mutual information is one such submodular information loss function, and it is a natural choice for measuring the degree of statistical dependence between paired sets of random variables. Using mutual information as a loss function, we show that the search for the MIP can be performed in a practical amount of computational time for reasonably large systems (N = 100 ∼ 1000).
We also demonstrate that MIP search allows for the detection of underlying global structures in a network of nonlinear oscillators.


Introduction
The brain can be envisaged as a multi-component dynamical system, in which individual components interact with each other. One of the goals of systems neuroscience is to identify groups of neural units (neurons, brain areas, and so on) that share similar functionality [1][2][3][4].
Approaches to identifying such functional groups can be classified as "external" or "internal". In the external approach, responses to external stimuli are measured under the assumption that a group of neurons shares similar functionality if their responses are similar. The vast majority of studies in neuroscience have indeed used the external approach, associating neural function with an external input to identify groups of neurons or brain areas with similar functionality [5].
On the other hand, the internal approach measures internal interactions between neural units under the assumption that a group of neurons with similar functionality is connected with each other. Use of the internal approach has grown rapidly following recent advances in simultaneous recording techniques. It is undoubtedly important to elucidate how neurons or brain areas interact with each other in order to understand various brain computations. In particular, to understand awareness or consciousness, the internal approach is preferable to the external approach, because consciousness cannot, in essence, be associated with any external signals. For example, dreaming is a phenomenon which clearly tells us that consciousness occurs regardless of any external signals [6].
Consistent with this, the Integrated Information Theory (IIT) of consciousness is constructed to focus on the internal interactions in a neural system [7][8][9]. IIT states that the prerequisite for consciousness is the integration of information realized by the internal interactions of neurons in the brain. IIT proposes to quantify the degree of information integration by an information-theoretic measure, "integrated information", and hypothesizes that integrated information should be related to the level of consciousness [10].
Conceptually, integrated information quantifies the degree of interaction between parts or, equivalently, the amount of information loss caused by splitting a system into parts [7,8,11]. By using a measure of information loss such as integrated information, we can find functional groups of neural units using the criterion of "minimal information loss". For example, consider the system of 4 neurons shown in Fig. 1(a). The two neurons on each of the left and right sides are connected with each other but not with those on the opposite side. The natural inclination is to partition the system into the left (orange) and right (blue) subsystems, as shown in Fig. 1(a). This critical partition can be identified by searching for the partition where the information loss, or equivalently integrated information, is minimized. In IIT, the partition with minimal information loss is called the "Minimum Information Partition" (MIP) [7]. In fact, if the example system is partitioned at the MIP, integrated information is 0 because there is no information loss. If the system is partitioned differently from the MIP, as shown in Fig. 1(b), integrated information is non-zero because there are connections between the top (orange) and bottom (blue) subsystems. This is not the optimal grouping of the system from the viewpoint of information loss.
With the MIP approach, we can quantify "global" integration in the whole system rather than "local" integration within subsystems. The system in Fig. 1 is one in which information is locally integrated within each of the left and right subsystems but is not globally integrated between them. Integrated information with the MIP shown in Fig. 1(a) quantifies global integration, which is zero, while integrated information with the partition shown in Fig. 1(b) quantifies local integration, which is non-zero. IIT considers that a system's global integration determines its consciousness as a whole. Aside from its relevance to consciousness, the MIP could also be used to identify functional groups of neural units from the viewpoint of global information loss.
In the general case, in which there is no obvious clear-cut partition (Fig. 1(c)), an exhaustive search for the MIP takes an exceptionally long computational time, which increases exponentially with the number of units. This computational difficulty hinders MIP-based characterization of a system, although the theoretical idea is attractive to research into consciousness and other fields of neuroscience, as well as to network science in general.
In this study, we show that the computational cost of searching for the MIP can be reduced to polynomial order in the system size if the measure of integrated information is a submodular function. We utilize a submodular optimization method, Queyranne's algorithm [12], and show that the exponentially large computational time is drastically reduced to O(N^3), where N is the number of units. The algorithm proposed in this study is an exact search for the MIP, unlike previous studies which found only approximate MIPs [13,14]. This algorithm makes it feasible to find MIP-based functional groups in real neural data such as multi-unit recordings, EEG, ECoG, etc., which typically consist of ∼100 channels.
The paper is organized as follows. In Section 2, we formulate the search for the MIP, show that mutual information is a submodular function, and show that it can be treated as a measure of information loss for a bipartition. In Section 3, we report numerical case studies which demonstrate the computational time of the MIP search and its utility for analyzing system-wise correlations. In Section 4, we discuss the potential use of the submodular search for other measures which are not exactly submodular.

Submodular function
We analyze a system with N ∈ ℕ distinct components. Assume that each of the N components is a random variable, and denote the random variable of the i-th component by x_i for 1 ≤ i ≤ N. Denote the set of indices by N := {1, 2, ..., N} and the set of the N variables by X := {x_1, x_2, ..., x_N}. A set function f defined on the subsets of X is called submodular if, for any subsets A, B ⊆ X,

f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B).    (1)

Equivalently, f is submodular if, for any A ⊆ B ⊆ X and any x ∈ X \ B,

f(A ∪ {x}) − f(A) ≥ f(B ∪ {x}) − f(B).    (2)

(See [15] for a review of submodularity.) If −f is submodular, we call f supermodular. Submodularity in discrete functions can be considered as an analogue of convexity in continuous functions. It has been shown that the minimization of submodular functions can be solved in polynomial order of computational time, circumventing the combinatorial explosion. In this study, we utilize submodular optimization to find the partition with minimal information loss (the Minimum Information Partition, MIP).
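As a quick sanity check, the submodularity inequality f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) can be verified numerically for the Shannon entropy of subsets of variables, which is the submodular function used later in this section. The following is a minimal sketch of ours; the joint distribution and the function names are arbitrary illustrative choices, not from the paper.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()  # an arbitrary joint distribution P(x1, x2, x3)

def subset_entropy(p, subset):
    """Shannon entropy (in bits) of the marginal over the variables in `subset`."""
    axes = tuple(i for i in range(p.ndim) if i not in subset)
    marg = np.atleast_1d(p.sum(axis=axes)).ravel()
    marg = marg[marg > 0]
    return float(-np.sum(marg * np.log2(marg)))

# Check H(A) + H(B) >= H(A ∪ B) + H(A ∩ B) for every pair of subsets.
subsets = [frozenset(s) for r in range(4)
           for s in itertools.combinations(range(3), r)]
violations = sum(
    1 for A in subsets for B in subsets
    if subset_entropy(p, A) + subset_entropy(p, B)
       < subset_entropy(p, A | B) + subset_entropy(p, A & B) - 1e-9)
print(violations)  # → 0
```

The same check fails for non-submodular set functions, which is what makes submodularity a non-trivial structural property to exploit.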

Minimum Information Partition (MIP)
We consider bipartitions of the whole system X. X is divided into two parts, M and M̄, where M is a non-empty proper subset of the whole system X, M ⊂ X, and M̄ is the complement of M, i.e., M̄ = X \ M. Note that the bipartition (M, M̄) is uniquely determined by specifying only one part, M, because the other part is determined as the complement of M. The Minimum Information Partition (MIP), M_MIP, is defined as the non-empty subset M ⊂ X that minimizes the information loss caused by the partition,

M_MIP(X) := arg min_{∅ ≠ M ⊂ X} f(M),    (3)

where f(M) is the information loss caused by the bipartition specified by the subset M.
IIT proposes several measures for information loss, which it terms "integrated information" [7,8,11]. In terms of integrated information, the MIP is defined as the partition where integrated information between the parts is minimized.
The number of possible bipartitions of a system of size N is 2^(N−1) − 1, which grows exponentially as a function of the system size N. Thus, for even a modestly large number of variables (N ∼ 40), exhaustively searching all bipartitions is computationally intractable.
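For small systems, the exhaustive search over all 2^(N−1) − 1 bipartitions is straightforward to implement by fixing one element in M, which avoids counting each bipartition twice. The sketch below is our own illustration, not from the paper; a weighted graph cut function (which, like mutual information, is symmetric submodular) stands in for a generic loss f(M).

```python
import itertools

def cut_loss(M, edges):
    """Total weight of edges crossing the bipartition (M, complement of M)."""
    return sum(w for (u, v, w) in edges if (u in M) != (v in M))

def exhaustive_mip(n, loss):
    """Enumerate every bipartition; fixing node 0 in M avoids duplicates."""
    best, best_M = float("inf"), None
    rest = list(range(1, n))
    for r in range(len(rest) + 1):
        for extra in itertools.combinations(rest, r):
            M = frozenset((0,) + extra)
            if len(M) == n:  # the complement must be non-empty
                continue
            if (val := loss(M)) < best:
                best, best_M = val, M
    return best_M, best

# Two triangles {0,1,2} and {3,4,5} joined by a single weak edge (2, 3).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 0.1)]
M, val = exhaustive_mip(6, lambda M: cut_loss(M, edges))
print(sorted(M), val)  # → [0, 1, 2] 0.1
```

Even in this toy example the loop visits 2^5 − 1 = 31 subsets; at N = 40 it would visit about 5.5 × 10^11, which is why the submodular search below matters.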

Information loss function
In this study, we use the mutual information between the two parts M and M̄ as the information loss function,

f(M) := I(M; M̄),    (4)

I(M; M̄) = H(M) + H(M̄) − H(X),    (5)

where H(X) is the Shannon entropy [16,17] of a random variable X,

H(X) := − Σ_x P(x) log P(x).

PLOS
As we will show in the next section, the mutual information is a submodular function. The mutual information (Eq. 5) can be expressed as the KL divergence between P(X) and the partitioned probability distribution Q(X) = P(M)P(M̄), in which the two parts M and M̄ are forced to be independent,

I(M; M̄) = D_KL( P(X) || P(M)P(M̄) ).    (6)

The Kullback-Leibler divergence measures the difference between two probability distributions and can be interpreted as the information loss incurred when Q(X) is used to approximate P(X) [18]. Thus, the mutual information between M and M̄ (Eq. 6) can be interpreted as the information loss when the probability distribution P(X) is approximated by Q(X) under the assumption that M and M̄ are independent [11].
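The identity between the entropy form of the mutual information and the KL divergence from the product distribution Q(X) = P(M)P(M̄) can be checked numerically. This is a sketch of ours; the four-variable joint distribution below is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.random((2, 2, 2, 2))
p /= p.sum()  # a strictly positive joint distribution P(x1, x2, x3, x4)

M_axes, Mbar_axes = (0, 1), (2, 3)     # bipartition: M = {x1, x2}
p_M = p.sum(axis=Mbar_axes)            # P(M)
p_Mbar = p.sum(axis=M_axes)            # P(M̄)
q = p_M[:, :, None, None] * p_Mbar[None, None, :, :]  # Q(X) = P(M) P(M̄)

def H(dist):
    """Shannon entropy in bits."""
    d = dist.ravel()
    d = d[d > 0]
    return float(-np.sum(d * np.log2(d)))

mi_entropy = H(p_M) + H(p_Mbar) - H(p)          # H(M) + H(M̄) - H(X)
kl = float(np.sum(p * np.log2(p / q)))          # D_KL(P || Q)
print(abs(mi_entropy - kl) < 1e-9)  # → True
```

Because p is strictly positive here, q is as well, so the KL divergence is finite and the two expressions agree up to floating-point error.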
In the first version of IIT (IIT 1.0), the mutual information was used as a measure of integrated information [7]. Although different measures that take account of the dynamical aspects of a system were proposed in later versions of IIT [8,9], we do not consider them in this study (see Discussion).

Submodularity of the loss functions
We will show that the mutual information (Eq. 5) is submodular. To do so, we use the submodularity of entropy. The entropy H is submodular because, for any A ⊆ B ⊆ X and any x ∈ X \ B,

H(A ∪ {x}) − H(A) = H(x | A) ≥ H(x | B) = H(B ∪ {x}) − H(B),    (7)

since conditioning on more variables cannot increase entropy; this satisfies the condition of submodularity (Eq. 2).
By straightforward calculation, we can find that the following identity holds for the loss function f(M) = I(M; M̄):

f(A) + f(B) − f(A ∪ B) − f(A ∩ B)
  = [ H(A) + H(B) − H(A ∪ B) − H(A ∩ B) ]
  + [ H(Ā) + H(B̄) − H(Ā ∪ B̄) − H(Ā ∩ B̄) ],    (8)

where Ā := X \ A, and we used Ā ∩ B̄ = X \ (A ∪ B) and Ā ∪ B̄ = X \ (A ∩ B); the H(X) terms cancel. Thus, from the submodularity of the entropy (Eq. 1), both bracketed terms in Eq. 8 are non-negative, and the following inequality holds:

f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B),    (9)

which shows that f(M) = I(M; M̄) is submodular.
A submodular system (X, f) is said to be symmetric if f(M) = f(X \ M) for any subset M ⊆ X. It is easy to see from Eq. 5 that the mutual information is a symmetric submodular function. When a submodular function is symmetric, its minimization can be solved more efficiently. Applying Queyranne's algorithm [12] (see also the Appendix), we can exactly identify the bipartition with the minimum information loss in computational time O(N^3).
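Queyranne's algorithm is compact enough to sketch here. The code below is our own illustrative implementation (names are ours, not from the paper): it builds an ordering whose last two nodes form a pendent pair, records the candidate bipartition, merges the pair, and repeats, using O(N^3) evaluations of f in total. For brevity it is demonstrated on a weighted graph cut, another symmetric submodular loss.

```python
def queyranne(elements, f):
    """Minimize a symmetric submodular f over bipartitions; returns (subset, value)."""
    V = [frozenset([e]) for e in elements]  # nodes; a merged node holds several elements
    best_set, best_val = None, float("inf")
    while len(V) > 1:
        # Build the ordering: repeatedly append the node u minimizing
        # f(merged ∪ u) - f(u); the last two nodes form a pendent pair (t, u).
        W, rest = [V[0]], V[1:]
        merged = V[0]
        while rest:
            u = min(rest, key=lambda u: f(merged | u) - f(u))
            rest.remove(u)
            W.append(u)
            merged |= u
        t, u = W[-2], W[-1]
        if f(u) < best_val:  # {u} is optimal among cuts separating u from t
            best_set, best_val = u, f(u)
        # Merge the pendent pair and repeat on the reduced system.
        V = [v for v in V if v is not t and v is not u] + [t | u]
    return best_set, best_val

# Toy demonstration: min cut of a weighted path 0-1-2-3-4 whose weakest edge
# (2, 3) has weight 0.2; cut functions are symmetric submodular.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.2), (3, 4, 1.0)]
cut = lambda M: sum(w for (a, b, w) in edges if (a in M) != (b in M))
M, val = queyranne(range(5), cut)
print(sorted(M), val)  # → [3, 4] 0.2
```

For the MIP search in this paper, `cut` would simply be replaced by the mutual information loss f(M) = I(M; M̄); the algorithm itself is unchanged.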

Sequential MIP search
As introduced above, Queyranne's algorithm finds the minimum information bipartition. If one wishes to partition the whole set into more than two subsets, one can apply the bipartition procedure recursively. We implement a sequential MIP search, in which each subset of the MIP found in a given set of variables is further partitioned at each recursive step until it satisfies a certain stopping condition (e.g., a given set-size limit, a maximum number of steps, and so on). In Study 2 (Section 3.2), we demonstrate a sequential MIP search that continues until no further partition is possible.
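The recursion can be sketched as follows. This is our own illustrative code, not from the paper: `min_bipartition` is a tiny brute-force stand-in for any MIP solver (such as Queyranne's algorithm), and the stopping condition is a simple minimum part size.

```python
import itertools

def min_bipartition(elements, f):
    """Brute-force stand-in: best subset M over bipartitions, fixing elements[0] in M."""
    head, rest = elements[0], elements[1:]
    candidates = (frozenset((head,) + c)
                  for r in range(len(rest))
                  for c in itertools.combinations(rest, r))
    return min(candidates, key=f)

def sequential_mip(elements, f, min_size=2):
    """Recursively split until parts reach `min_size`; returns the leaf subsets."""
    if len(elements) <= min_size:
        return [frozenset(elements)]
    M = min_bipartition(list(elements), f)
    Mbar = frozenset(elements) - M
    return sequential_mip(M, f, min_size) + sequential_mip(Mbar, f, min_size)

# Toy loss: a weighted path with strong edges inside pairs (0,1), (2,3), (4,5)
# and weak links between pairs; the cut weight plays the role of f(M).
edges = [(0, 1, 1.0), (2, 3, 1.0), (4, 5, 1.0), (1, 2, 0.1), (3, 4, 0.1)]
f = lambda M: sum(w for (a, b, w) in edges if (a in M) != (b in M))
parts = sequential_mip(range(6), f)
print(sorted(sorted(p) for p in parts))  # → [[0, 1], [2, 3], [4, 5]]
```

Note that the loss is always evaluated on the full system's distribution; only the set being split shrinks at each recursive step.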

Numerical Studies
To demonstrate the search for the bipartition with minimal loss of information, we report here several case studies with artificial datasets. Throughout these case studies, we assume that the data are normally distributed. Under this assumption, we obtain the simple closed form

I(M; M̄) = (1/2) log ( |Σ_M| |Σ_M̄| / |Σ_X| ),    (10)

where Σ_X is the covariance matrix of the data, Σ_M and Σ_M̄ are the covariance matrices of the variables in the subsets M and M̄, and |Σ| denotes the determinant of the matrix Σ. The computation of |Σ_X| can be omitted because |Σ_X| is constant across every step in the search and has no effect on the minimization of I(M; M̄).
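Under the Gaussian assumption, the loss reduces to log-determinants of covariance sub-matrices. The sketch below is our own (function names are assumptions); it uses `numpy.linalg.slogdet` for numerical stability at large N, and checks on synthetic data that a split respecting two independent blocks loses less information than one that mixes them.

```python
import numpy as np

def gaussian_mi_loss(cov, M):
    """Information loss f(M) = I(M; M̄) under the Gaussian assumption (in nats).
    `M` is a sequence of variable indices; `cov` is the full covariance matrix."""
    M = np.asarray(sorted(M))
    Mbar = np.asarray(sorted(set(range(cov.shape[0])) - set(M.tolist())))
    _, ld_M = np.linalg.slogdet(cov[np.ix_(M, M)])
    _, ld_Mbar = np.linalg.slogdet(cov[np.ix_(Mbar, Mbar)])
    _, ld_X = np.linalg.slogdet(cov)
    return 0.5 * (ld_M + ld_Mbar - ld_X)

# Two independent correlated 2-D blocks: the block-respecting split {0,1}
# should lose far less information than the mixed split {0,2}.
rng = np.random.default_rng(0)
block = [[1.0, 0.8], [0.8, 1.0]]
data = np.concatenate([rng.multivariate_normal([0, 0], block, 5000),
                       rng.multivariate_normal([0, 0], block, 5000)], axis=1)
cov = np.cov(data.T)
print(gaussian_mi_loss(cov, [0, 1]) < gaussian_mi_loss(cov, [0, 2]))  # → True
```

Passing a function like `lambda M: gaussian_mi_loss(cov, M)` to a Queyranne-style solver is all that is needed to run the search on Gaussian data.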

Study 1: Computational time
In the first case study, we compare the practical computational time of the submodular search with that of an exhaustive search. We artificially generated datasets consisting of 10,000 points normally distributed over an N-dimensional space, for N = 2, 3, ..., 400. Each dimension is treated as an element of the set. The exhaustive search was performed up to N = 16, but could not run in a reasonable time for N = 17 or larger due to limitations of computational resources. Up to N = 16, we confirmed that the submodular search found the correct MIPs indicated by the exhaustive search. Figure 2(a) shows a semi-logarithmic plot of the computation time of the two searches. The empirical computation time of the exhaustive search closely followed the line log2 T = 0.891 N − 12.304. This indicates that the exhaustive search takes an exponentially large computational time ≈ O(2^N), which fits the number of possible bipartitions. Figure 2(b) shows exactly the same results as a double-logarithmic plot. In this plot, the computation time of Queyranne's search closely followed the line log2 T = 3.210 log2 N − 18.722, which indicates that Queyranne's search takes cubic time ≈ O(N^3), as expected from the theory. With N = 1000, Queyranne's search took 9738 seconds of running time. The computational advantage of Queyranne's search over the exhaustive search is obviously substantial: even with a modest number of elements, say N = 40, the computational time of the exhaustive search is estimated to be 1.07 × 10^7 sec ≈ 123 days, while that of Queyranne's search is only about 1 sec.

Study 2: Sequential MIP search
As a demonstration of the sequential MIP search (Section 2.5), we consider a set of 8 random variables with the correlation matrix shown in Figure 3. There are four pairs with positive correlations, variables 1 and 2, 3 and 4, 5 and 6, and 7 and 8, while all other pairs of variables show nearly zero correlation. This correlation matrix is constructed as follows: for each pair of correlated variables (such as variables 1 and 2), a set of 1000 samples (X, Y) is drawn from an independent bivariate normal distribution with mean (0, 0) and variances (1, 0.5), and each coordinate pair (X, Y) is rotated by π/4 to (X′, Y′). The eight-dimensional dataset analyzed is constructed by concatenating the four pairs of variables, each of which is constructed in this way.
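The dataset construction described above can be sketched as follows (our own code; with variances (1, 0.5) the π/4 rotation yields a theoretical within-pair correlation of 1/3, while across-pair correlations stay near zero).

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

def correlated_pair(n=1000):
    """Independent (X, Y) with variances (1, 0.5), rotated by π/4."""
    xy = rng.normal(0.0, [1.0, np.sqrt(0.5)], size=(n, 2))
    return xy @ R.T  # rotate each sample

# Concatenate four such pairs into an (n, 8) dataset.
data = np.hstack([correlated_pair() for _ in range(4)])
corr = np.corrcoef(data.T)
print(corr[0, 1] > 0.2, abs(corr[0, 2]) < 0.2)  # within-pair vs across-pair
```

Running the sequential search on the covariance of `data` with the Gaussian loss of Study 1 reproduces the block structure of Figure 3.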
By applying the first MIP search, the system is partitioned into the pair of variables 1 and 2, and the rest. As the pair of variables has no further meaningful partition, the remaining set of six variables is partitioned further. At this second step, the remaining six variables are partitioned into the pair of variables 3 and 4 and the rest. By recursively applying this procedure to the subset with the larger number of variables, we obtain the four subsets (1, 2), (3, 4), (5, 6), and (7, 8). As expected from Figure 3, the sequential partitioning successfully rediscovers the four underlying groups of elements in the dataset.

Study 3: Nonlinear dynamical systems
In Study 3, we demonstrate how the MIP changes depending on the underlying network structure. For this purpose, we chose a nonlinear dynamical system in which multiple nonlinear components are chained on a line. Specifically, we construct a series of variants of the Coupled Map Lattice (CML) [19]. Kaneko [19] analyzed the CML in which each component is a logistic map and interacts with its one or two nearest components on a line, and showed the emergence of multiple types of dynamics. In this model, each component is treated as a nonlinear oscillator, and the degree of interaction between oscillators can be manipulated parametrically. By manipulating the degree of interaction, we can continuously change the global structure of the CML from one coherent chain to two separable chains. We apply the MIP search to CMLs with different interaction parameters, and test whether the MIP captures the underlying global structure of the network.

Specifically, the CML is defined as follows. Write the logistic map with a parameter a as f_a(x) := 1 − ax². Let x_{i,t} denote the state of the i-th variable at the t-th time step, for i = 1, 2, ..., N and t = 0, 1, ..., T. For each i = 1, ..., N, the initial state x_{i,0} is set to a random number drawn from the uniform distribution on [0, 1]. For t > 0, the variables evolve with the lateral connection parameter ε ∈ [0, 1] as

x_{i,t} = (1 − ε) f_a(x_{i,t−1}) + (ε/2) [ f_a(x_{i−1,t−1}) + f_a(x_{i+1,t−1}) ]   (1 < i < N),
x_{1,t} = (1 − ε) f_a(x_{1,t−1}) + ε f_a(x_{2,t−1}),
x_{N,t} = (1 − ε) f_a(x_{N,t−1}) + ε f_a(x_{N−1,t−1}).

According to the previous study [19], the defect turbulence pattern is observed in the spatio-temporal evolution of x_{i,t} with the parameters a = 1.8950 and ε = 0.1. In this study, we additionally introduce a "connection" parameter δ between the variables i = 20, 21 among N = 30 variables (Figure 4(a)). Namely, with the connection parameter δ, we redefine the updates of variables 20 and 21 (which couple to variables 19 and 22) as

x_{20,t} := (1 − ε) f_a(x_{20,t−1}) + ε [ (1 − δ) f_a(x_{19,t−1}) + δ f_a(x_{21,t−1}) ],
x_{21,t} := (1 − ε) f_a(x_{21,t−1}) + ε [ δ f_a(x_{20,t−1}) + (1 − δ) f_a(x_{22,t−1}) ].

With the connection parameter δ = 1/2, this model is identical to the original CML above, and with δ = 0 it is equivalent to two independent CMLs over (x_1, ..., x_20) and (x_21, ..., x_30), as there is no interaction between variables 20 and 21. Given a sufficiently small connection parameter 0 ≤ δ < 1/2, we expect the MIP to separate the system into the subsets {1, 2, ..., 20} and {21, 22, ..., 30}, because the degree of interaction between units 20 and 21 is the smallest. On the other hand, if the system is fully connected, which happens when δ = 1/2, we expect the MIP to separate the system into the subsets {1, 2, ..., 15} and {16, 17, ..., 30} (in the middle of the 30 units), due to the symmetry of the connectivity on the line. The correlation matrices for connection parameters δ from 0 to 1/2 are shown in Figure 4. In the figure, the crossed lines show the expected separation between variables 20 and 21. We found block-wise correlation patterns in the matrix with δ = 0, as expected; similar but less clear patterns with δ = 0.25; and no clear block-wise patterns with δ = 1/2.
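The modified CML can be simulated with a short script. The code below is our own sketch under the stated assumptions (0-based array indices, so variables 20 and 21 are indices 19 and 20); with δ = 0 the two halves evolve independently, so the cross-correlation between units 20 and 21 should be near zero.

```python
import numpy as np

def simulate_cml(N=30, T=1000, a=1.8950, eps=0.1, delta=0.5, seed=0):
    """Simulate the chained logistic maps with the 20-21 link reweighted by delta."""
    rng = np.random.default_rng(seed)
    f = lambda x: 1.0 - a * x**2
    x = rng.uniform(0.0, 1.0, N)
    traj = np.empty((T, N))
    for t in range(T):
        y = f(x)
        new = (1 - eps) * y
        # interior units: eps/2 from each neighbour; boundary units: eps from one
        new[1:-1] += 0.5 * eps * (y[:-2] + y[2:])
        new[0] += eps * y[1]
        new[-1] += eps * y[-2]
        # the 20-21 link (0-based indices 19 and 20) is reweighted by delta
        new[19] = (1 - eps) * y[19] + eps * ((1 - delta) * y[18] + delta * y[20])
        new[20] = (1 - eps) * y[20] + eps * (delta * y[19] + (1 - delta) * y[21])
        x = new
        traj[t] = x
    return traj

traj = simulate_cml(delta=0.0, T=5000)
corr = np.corrcoef(traj[100:].T)  # discard a short transient
print(np.isfinite(traj).all(), abs(corr[19, 20]) < 0.2)
```

Feeding `np.cov(traj.T)` into the Gaussian loss of Study 1 and running the bipartition search is then a direct way to reproduce the δ-dependence examined below.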
To summarize, this case study confirmed our theoretical expectation that the MIP captures the block-wise informational components; namely, the probability that the MIP separates the system between units 20 and 21 is a decreasing function of the connection parameter δ. This means that the MIP search detects the weakest underlying connection, between units 20 and 21, and successfully separates the system into the two subsets when that connection is weak.

Discussion
In this paper, we proposed a fast and exact algorithm which finds the weakest link of a network, at which the network can be partitioned with minimal information loss (the MIP). Since the search for the MIP suffers from a combinatorial explosion, we employed Queyranne's algorithm for symmetric submodular functions. We first showed that the mutual information is a symmetric submodular function. Then, we used it as an information loss function to which Queyranne's algorithm can be applied. Our numerical case studies demonstrate the utility of the MIP search for a set of locally interacting nonlinear oscillators. This demonstration opens the general use of the MIP for characterizing the underlying structures of complex systems.


Appendix: Queyranne's algorithm

Queyranne's algorithm repeatedly finds a "pendent pair" (t, u) of the current system, i.e., a pair such that {u} minimizes f among all subsets separating u from t. Starting from an arbitrary element v_1, an ordering v_1, v_2, ..., v_N is constructed by iteratively selecting, as v_{i+1}, the element y minimizing f(W_i ∪ {y}) − f({y}), where W_i := {v_1, ..., v_i}. For the ordering constructed by this procedure on the submodular system (V, f), the following inequality holds for any y ∈ V \ W_i and any X ⊆ W_i containing v_i:

f(W_i) + f({y}) ≤ f(W_i \ X) + f(X ∪ {y}).
See [12] for the proof of this inequality. By putting i = N − 1 in the inequality and using the symmetry of f, we can see that the bipartition ({v_N}, V \ {v_N}) gives the minimum among all subsets separating v_N from v_{N−1}; that is, (v_{N−1}, v_N) is a pendent pair.
By the definition of a pendent pair, one of the following two cases holds for a given pendent pair (t, u).
1. The set {u} is a solution of the minimization problem.
2. Some set U ⊇ {t, u} is a solution of the minimization problem.
In the first case, the algorithm reports {u}. In the second case, the algorithm constructs another submodular system (V′, f), in which a new element u′ = {t, u} is defined by merging the pendent pair, and V′ = (V \ {t, u}) ∪ {u′}. The new system (V′, f) with the merged pair is also submodular, and thus the same argument about pendent pairs can be applied recursively.

Fig 1. (a) The minimum information partition (MIP). (b) Another possible partition, which differs from the MIP. (c) The MIP in a general network where there is no clear-cut partition.

Fig 2. (a) Semi-logarithmic and (b) double-logarithmic plots of the computation time for the two searches.

Fig 3. Correlation matrix for the toy problem with four correlated pairs in a system of eight components. The colored lines show the partitioning at each step.

Fig 4. (a) The Coupled Map Lattice model with the connection parameter δ indicating the connectivity between variables 20 and 21. The correlation matrices with (b) δ = 0 (disconnected), (c) δ = 0.25 (half-connected), and (d) δ = 0.5 (fully connected). The white crossed lines show the expected partition as the ground truth, and the black dotted crossed lines show the MIP typically found for each parameter.

Fig 5. The probability that the smaller subset of the MIP is {21, 22, ..., 30}, plotted as a function of the connection parameter δ ∈ [0, 1/2]. Each circle shows the sample probability over 500 independent simulations, and the solid line shows the moving average of the probability.