THOI: An efficient and accessible library for computing higher-order interactions enhanced by batch-processing

doi:10.1371/journal.pone.0348005

Fig 1.

Efficient computation of HOI using batch processing of covariance matrices.

A set of multivariate time series X is transformed using the Gaussian copula approach, generating covariance matrices for each dataset. The covariance matrices are then sub-sampled using a batch of k-plet indices, defined by a binary mask, yielding the sub-covariance matrices for each k-plet. These sub-covariance matrices are then batched together and their determinants are computed. The determinants are used to calculate the entropies and the associated HOI defined by the DTC, TC, O-information, and S-information metrics, where the entropy is computed from the determinants of single variables, the whole system, and the whole system without a single variable (see Supporting information Generalizations of the mutual information for detailed descriptions). Finally, batches are pre-processed using a custom function (e.g., extracting the minimum), and the results are aggregated to produce the final output. Note that multiple datasets with identical system and sample sizes can be processed simultaneously, and the batch management system allows flexible analysis on the fly.
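The determinant-based computation described in this caption can be sketched as follows. This is a minimal NumPy illustration, not THOI's actual API: the function names `gaussian_entropy` and `hoi_measures` are hypothetical, the covariance matrix is assumed to be already Gaussian-copula transformed, and the standard Gaussian definitions TC = Σ H(X_i) − H(X) and DTC = Σ H(X_{-i}) − (k − 1) H(X) are assumed, with O-information = TC − DTC and S-information = TC + DTC.

```python
import numpy as np

def gaussian_entropy(cov):
    """Entropy in nats of Gaussians with covariance `cov` (batched over leading axes)."""
    n = cov.shape[-1]
    _, logdet = np.linalg.slogdet(cov)  # batched log-determinant
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)

def hoi_measures(cov, idx):
    """TC, DTC, O- and S-information for a batch of k-plets (hypothetical helper).

    cov: (N, N) covariance matrix; idx: (B, k) integer variable indices, k >= 2.
    """
    _, k = idx.shape
    # Sub-covariance of every k-plet in one indexing operation: (B, k, k)
    sub = cov[idx[:, :, None], idx[:, None, :]]
    h_whole = gaussian_entropy(sub)                               # (B,)
    # Entropy of each single variable from its variance: (B, k)
    var = np.diagonal(sub, axis1=-2, axis2=-1)
    h_single = 0.5 * np.log(2 * np.pi * np.e * var)
    # Whole system without one variable: (B, k, k-1, k-1)
    keep = ~np.eye(k, dtype=bool)
    loo_idx = np.stack([idx[:, keep[j]] for j in range(k)], axis=1)
    loo = cov[loo_idx[..., :, None], loo_idx[..., None, :]]
    h_loo = gaussian_entropy(loo)                                 # (B, k)
    tc = h_single.sum(axis=1) - h_whole
    dtc = h_loo.sum(axis=1) - (k - 1) * h_whole
    return tc, dtc, tc - dtc, tc + dtc

# Example: three equally correlated variables, evaluated as a single triplet
cov = np.array([[1.0, 0.5, 0.5],
                [0.5, 1.0, 0.5],
                [0.5, 0.5, 1.0]])
tc, dtc, o_info, s_info = hoi_measures(cov, np.array([[0, 1, 2]]))
```

Batching over the leading axis is what lets a single `slogdet` call process many k-plets at once; the same idea carries over to GPU tensor libraries.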


Fig 2.

Sub-covariance matrices sampled with padding to allow different covariance matrix sizes in a single batch.

1) First, a mask is applied to the full covariance matrices using a masked encoding of the n-plets (each with a different number of masked variables) to obtain each sub-covariance matrix. At this point, the obtained covariance matrices are invalid as the masked rows and columns have zeros on the diagonal, yielding a constant distribution. 2) Then, an identity matrix is masked with the inverted n-plet encodings. 3) Both masked matrices are added to obtain the final covariance matrix where the rows and columns of the n-plet have the values from the full covariance matrix, and the remaining rows and columns have ones on the diagonal and zeros elsewhere, representing an independent standard normal component.
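The three padding steps in this caption can be sketched in a few lines. This is an illustrative NumPy version under stated assumptions, not the library's implementation: the function name `padded_subcov` and the boolean `masks` encoding (True where a variable belongs to the n-plet) are hypothetical.

```python
import numpy as np

def padded_subcov(cov, masks):
    """Build fixed-size sub-covariance matrices for n-plets of different sizes.

    cov: (N, N) covariance matrix.
    masks: (B, N) boolean array, True where the variable belongs to the n-plet.
    Returns a (B, N, N) array.
    """
    m = masks.astype(cov.dtype)
    # Step 1: zero out every row and column that is not in the n-plet
    masked_cov = cov[None] * (m[:, :, None] * m[:, None, :])
    # Step 2: identity matrix masked with the inverted n-plet encoding
    pad = np.eye(cov.shape[0], dtype=cov.dtype)[None] * (1 - m)[:, :, None]
    # Step 3: add both, so masked-out variables become independent unit-variance components
    return masked_cov + pad

# Example: a pair and a quadruplet from a 5-variable system share one batch
rng = np.random.default_rng(0)
a = rng.normal(size=(5, 5))
cov = a @ a.T + 5 * np.eye(5)  # random symmetric positive-definite matrix
masks = np.array([[1, 0, 1, 0, 0],
                  [1, 1, 1, 1, 0]], dtype=bool)
padded = padded_subcov(cov, masks)
```

Because the padded matrix is block-diagonal (up to a permutation) with an identity block, its determinant equals that of the true sub-covariance matrix. Any entropy computed from it should therefore use the actual n-plet size, not N, in the dimension-dependent constant.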


Fig 3.

Efficient computation of HOI using batch processing of covariance matrices.

A) Computational time versus order of interactions for a 30-variable system with 1000 samples. The THOI method successfully computes all possible HOI in less than 6 hours, whereas other libraries are unable to process interactions beyond order 11 within the same time frame. B) Log-log plot of computational time as a function of sample size for a 20-variable system. All libraries exhibit logarithmic scaling, but THOI outperforms the others in terms of computational speed.


Fig 4.

Within-order optimization with greedy and simulated annealing algorithms.

A, B) Maximum (red) and minimum (blue) O-information obtained by the greedy and SA algorithms for a 100-variable system composed of strong/weak R and S systems and an independent system, each with 20 variables. Dashed horizontal lines indicate ground truth for the weak systems, and dotted lines represent the sum of weak and strong systems (red for R, blue for S). Both algorithms successfully identify the systems, but greedy required 100 times more repeats than SA. The yellow vertical dashed lines denote the orders of interaction at which the ground-truth subsystems and their concatenations are detected, i.e., 20, 40, 60, 80 and 100. Note that for 20 and 40 a local and a global maximum (minimum) are detected, the former corresponding to the strong systems and the latter to the concatenation of the strong and weak systems. C, D) Subsets of variables that maximize the O-information at each order of interaction for greedy and SA. Each row corresponds to a single variable and colors denote different subsystems (weak R: light red; strong R: dark red; weak S: light blue; strong S: dark blue; independent: gray). Colored cells indicate that the variable contributed to maximizing the O-information at a given order of interaction, and white cells the opposite. The pie charts are aligned with the yellow vertical dashed lines of panel A to summarize the subsystems that were detected. Both algorithms prioritize strong R, then weak R, followed by a mix of independent and S systems (denoted by weak color intensities in the pie charts). E, F) Subsets of variables that minimize the O-information. Both algorithms prioritize strong S, then weak S, followed by a combination of independent and R systems, with a preference for the former.
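For readers unfamiliar with within-order optimization, the following is a generic simulated annealing sketch over subsets of fixed size. It is not the algorithm as implemented in THOI: the function name `anneal_subset`, the swap move, the geometric cooling schedule, and all parameter values are assumptions chosen for illustration, and the toy `score` stands in for a metric such as the O-information of an n-plet.

```python
import math
import random

def anneal_subset(score, n_vars, order, steps=5000, t0=1.0, cooling=0.995, seed=0):
    """Search for the subset of `order` variables that maximizes `score` (illustrative)."""
    rng = random.Random(seed)
    current = rng.sample(range(n_vars), order)
    cur_val = score(current)
    best, best_val = list(current), cur_val
    temp = t0
    for _ in range(steps):
        outside = [v for v in range(n_vars) if v not in current]
        if not outside:
            break  # the subset already contains every variable
        # Propose a neighbor: swap one member for one non-member (order is preserved)
        cand = list(current)
        cand[rng.randrange(order)] = rng.choice(outside)
        val = score(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability
        if val >= cur_val or rng.random() < math.exp((val - cur_val) / temp):
            current, cur_val = cand, val
            if cur_val > best_val:
                best, best_val = list(current), cur_val
        temp = max(temp * cooling, 1e-12)  # geometric cooling with a floor
    return sorted(best), best_val

# Toy score: number of selected variables that fall in a hidden target group
target = {0, 1, 2, 3, 4}
best, value = anneal_subset(lambda s: sum(v in target for v in s), n_vars=20, order=5)
```

To minimize instead of maximize, pass the negated score. A greedy variant would instead grow the subset one variable at a time, keeping the best extension at each order.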


Fig 5.

A) Estimated maximum and minimum O-information via the GA for the awake (green) and deep anesthesia (purple) states.

Inset shows the reduction of the minimum O-information at lower orders of interaction. B) Average maximum (red) and minimum (blue) effect size obtained from a GA tailored to amplify the difference between the two conditions. Shaded areas denote the range from the minimum to the maximum value for each optimization procedure. C) Distribution of the O-information for the whole brain in the awake (green) and deep anesthesia (purple) states. Each dot is a subject, and lines connect each subject's values in the two conditions. Despite the trend toward reduced redundancy, no significant difference was found (Wilcoxon p > 0.001). D) Distribution of the n-plets that maximize (left) and minimize (right) the effect size obtained by the GA. E) Same as D, but for the n-plets obtained with the SA algorithm. Order is the number of elements in the n-plets, p is the Wilcoxon p-value, and d is Cohen's d.


Fig 6.

A) Spearman correlation matrix of features across datasets.

Colors code different types of features. ‘prop. syn-plets’ is the proportion of synergy-dominated n-plets out of the total number of n-plets. ‘order max’ and ‘order min’ are the orders at which the O-information was maximized and minimized, respectively, normalized by the system size. ‘mean MI’ and ‘std MI’ are the mean and standard deviation of the pairwise mutual information for each dataset. The prefix ‘whole’ indicates that the whole system was considered (i.e., all the system variables). B) Cumulative explained variance associated with each PC after PCA. The first four components capture approximately 95% of the variance. C) Values of the first four PCs on each feature. Colors are the same as in A. PC1 captures the overall interdependencies by grouping together all the ‘max’, ‘mean’, ‘whole’ and ‘MI’ related features. PC2 captures the overall independence by grouping together all the ‘min’ related features. PC3 captures the proportion of synergy-dominated interactions by grouping together ‘prop. syn-plets’ and the orders at which the O-information is maximized and minimized. PC4 captures the behavior of the O-information by grouping together its maximum, minimum, mean and whole-system values.
