Detecting Causality by Combined Use of Multiple Methods: Climate and Brain Examples

Identifying causal relations from time series is the first step to understanding the behavior of complex systems. Although many methods have been proposed, few papers have applied multiple methods together to detect causal relations based on time series generated from coupled nonlinear systems with some unobserved parts. Here we propose the combined use of three methods and a majority vote to infer causality under such circumstances. Two of these methods are proposed here for the first time, and all of the three methods can be applied even if the underlying dynamics is nonlinear and there are hidden common causes. We test our methods with coupled logistic maps, coupled Rössler models, and coupled Lorenz models. In addition, we show from ice core data how the causal relations among the temperature, the CH4 level, and the CO2 level in the atmosphere changed in the last 800,000 years, a conclusion also supported by irregularly sampled data analysis. Moreover, these methods show how three regions of the brain interact with each other during the visually cued, two-choice arm reaching task. Especially, we demonstrate that this is due to bottom up influences at the beginning of the task, while there exist mutual influences between the posterior medial prefrontal cortex and the presupplementary motor area. Based on our results, we conclude that identifying causality with an appropriate ensemble of multiple methods ensures the validity of the obtained results more firmly.


Introduction
Complex networks are ubiquitous in the real world, the brain and the earth's climate being two typical examples that we are going to study in this paper. From the viewpoint of complex networks, the brain is a small world network, i.e., most of the brain areas are connected within a short distance effectively [1], while the climate on the earth is characterized by spatially connected regions with a high betweeness centrality, which implies the existence of large dynamical information flows actually conveyed by ocean surface currents [2]. To obtain a deeper understanding of such complex networks, a correct identification of the interactions among their sub-systems is necessary. Therefore, we refer to such interactions as directional couplings to indicate clearly that one sub-system influences another. Thus, we consider here directed networks, contrary to [1] and [2], where only undirected networks are considered.
The detection of directional couplings from time series data has been investigated extensively in the past. The most common approach is the Granger causality test [3], which evaluates the predictability when some components are hidden. In general, this test assumes a linear stochastic system [3]. Other methods focus particularly on how to deal with the nonlinearity of a system [4][5][6][7][8][9][10][11][12][13][14][15][16][17]. As it turns out, each method has its drawbacks, the major obstacle being typically a hidden common cause that affects the observed sub-systems [14].
In this report, we combine three methods for identifying directional couplings in order to overcome their drawbacks and thus achieve an accurate understanding of the interactions among the sub-systems of a system. For this purpose, we are going to propose first one already known method and then two additional new methods, all of which can be used even in the presence of hidden common causes. Finally, the overall performance of the three methods will be compared with the method by Sugihara et al. [16] (convergent cross mapping), which is also claimed to be applicable under such circumstances.

Methods
Our first method employs the inclusive relations of recurrence plots proposed by Hirata and Aihara [14]. To the best of our knowledge, this was the first method able to identify directional couplings under the existence of unobservable common driving forces for nonlinear dynamics. The fundamental input is the delay coordinates of a forced system [18], which allows reconstruction of the joint state of the forced system and the driving forces. It turns out that neighboring points in the reconstructed driving system (Fig 1C) need not be close to each other in the reconstructed driven system (see Fig 1D), while neighbors in the reconstructed driven system ( Fig 1F) remain close to each other in the reconstructed driving system (see Fig 1E), at least if the dimension is sufficiently large (see Fig 1; this example was generated using the coupled Lorenz models discussed below). Hirata and Aihara [14] implemented this principle by means of recurrence plots [19,20] to test whether hypothetical directional couplings can be refuted. Later, Sugihara et al. [16] used the same principle but adopted a different implementation to identify directional couplings in such a situation. The method of Sugihara et al. [16] is called convergent cross mapping. Therefore, the reconstruction of the state space with delay coordinates is the key for identifying directional couplings.
In our second method we employ again the delay coordinates of a forced system [18] to identify directional couplings, this time via the joint distribution of distances for two sub-systems. This implementation differs, therefore, from the implementations of Hirata and Aihara [14] and Sugihara et al. [16]. If there is a directional coupling from system A to system B, the delay coordinates of system B reconstruct the joint states of system B and system A, whereas the delay coordinates of system A reconstruct only the states of system A [21,22]. Therefore, if two points are close to each other in terms of the delay coordinates of system B, they are also neighbors in terms of the delay coordinates of system A. It follows that the joint distribution of normalized distances for the two systems in terms of their delay coordinates is ideally located only in a triangular region, as shown in Fig 2. We quantify this characteristic to test the directional coupling from system A to system B (see Methods A in S1 File for further details).
Our third method is based on the transfer entropy [6] evaluated with permutations [23,24]. If system A drives system B, there is an information flow from system A to system B. We quantify this information by the mutual information of transcripts [24], which are permutations obtained algebraically from pairs of permutations. To judge whether this information is significantly large, we use symbolic dynamics on permutations constructed empirically from the observed data and generate 100 surrogate series of permutations for system B. If the original information transferred from system A to system B is within the top q r % of the information transferred from system A to the surrogate series for system B, we conclude that there is a directional coupling from system A to system B (see Methods B in S1 File for further details of our implementation).
We combine the three above-mentioned methods as follows. When we apply these three methods and at least two of them detect directional couplings from system A to system B, we say that overall directional coupling is detected from system A to system B. Furthermore, we calculate the p-value using the binomial distribution where the number of independent experiments is the total number of tests and the probability of rejection of each test is q r , except for the case of the monkey experiment discussed later.

Toy examples
Before applying our approach to real data, we tested its performance against the convergent cross mapping [16] with toy models. In all the toy models, we set q r = 0.01. The first toy model consists of two logistic maps [25] coupled mutually (see Methods E in S1 File for the detailed setups). We found that our method tends to identify the directional couplings more accurately than the method proposed by Sugihara et al. [16] (see Fig 3). In particular, increasing the coupling strength of one directional coupling affected the identification of the other directional coupling in the convergent cross mapping [16], but not in our approach. The second toy model consists of two uncoupled logistic maps driven by another logistic map (see Fig 4). Although the convergent cross mapping [16] was affected to a considerable extent by the couplings with the third logistic map, the combined application of our three methods was not affected significantly by this common driver. The third toy model consists of two mutually coupled logistic maps driven by another logistic map (see Methods G in S1 File for further details). The results shown in Fig 5 indicate that the directional couplings were almost accurately identified by our method, whereas the convergent cross mapping [16] did not work well, i.e., false positives were detected when the opposite directional couplings were strong. The fourth toy model is composed of two logistic maps coupled with nonlinear terms (see Methods H in S1 File for further details). Even in this case, the results of the proposed approach were more accurate than those of the convergent cross mapping [16] (Fig 6).
The four toy models described above are examples based on maps. As the fifth example we considered the Rössler models [26] coupled as in the test conducted by Hirata and Aihara [14] (see Methods I in S1 File for further details). The results shown in Tables A and B in S1 File indicate that both our method and the method proposed by Sugihara et al. [16] could detect the directional couplings accurately in this case.
For our sixth toy example, coupled Lorenz models (see Methods J in S1 File for further details), both the proposed method and the convergent cross mapping [16] revealed the correct network structure when the length of time series was sufficiently long (see Tables C and D in S1 File). Tests for the existence of directional coupling between mutually coupled logistic maps, x and y. Panels A and B show the results for the method proposed by Sugihara et al. [16], and panels C and D show the results for our method. Panels A and C refer to the direction from x to y, and panels B and D refer to the direction from y to x. The gray scales show the p-values in the logarithmic scale of base 10. temperature drive each other, the CH 4 concentration and the temperature drive each other too, and the CH 4 concentration drives the CO 2 concentration only. In the second type, the CO 2 concentration drives the temperature. Therefore, in either type, the temperature is driven by the CO 2 concentration in the atmosphere. As shown in Fig 7C, the second type of connectivity held in the period 200k-300k years BP, while the first type did in the remaining two periods.

Brain data
Furthermore, we also applied our framework to elucidate the internal brain process underlying the generation of monkey's reactions in a visually cued, two-choice arm-reaching task [31]. The animals were treated and cared for in accordance with the National Institute of Health guidelines and the institutional guidelines for animal experiments of Tohoku University. Two of us had recorded, in Ref. [31], local field potentials from the three cortical areas, the posterior medial prefrontal cortex (pmPFC), the presupplementary motor area (preSMA), and the supplementary motor area (SMA). Here we analyzed the datasets for characterizing the temporal dynamics by their directional couplings. An example of time series data is shown in Fig 8A. The analysis revealed that there were no causative influences with significance levels of less than 0.0001 between -400 and -20 ms before the hold release. Then, the bottom-up causative influences emerged from SMA to preSMA, from preSMA to pmPFC, and from SMA to pmPFC (beginning 20 ms before and ending 40 ms after the movement onset, the orange rectangle of Fig 8B), followed by enhanced mutual intermittent influences between pmPFC and preSMA (between 260 and 500 ms after the movement onset, the magenta rectangles of Fig  8B). Finally (500ms and later), strong and mutual influences developed among the three regions (pmPFC, preSMA and SMA, the blue rectangle of Fig 8B). In short, several regions are connected with one another when a monkey makes a decision: at the beginning, the bottom up signals elicit the decision-making process (see Fig 8C), and in the middle time period, the communication between pmPFC and preSMA seems to play a central role during the decision-making process (see Fig 8C). These observations could serve as the first crucial step in understanding how a monkey makes a decision (see Text C in S1 File for further discussion of these results).

Discussion
In general, a set of multiple pieces of evidence provides a stronger argument than a single piece, thereby enriching our viewpoints. This way of using multiple pieces of evidence can be regarded as a restatement of the law of large numbers [32]. In recent years, this type of approach has gained popularity in the field of systems biology [33]. In the field of nonlinear time series analysis, to the best of our knowledge this approach was first adopted in [34] to argue that the behavior of a squid giant axon is chaotic. In this paper, we carefully selected our ensemble of methods for identifying directional couplings so that each method can deal with (i) nonlinear dynamics, (ii) mutual couplings, and (iii) hidden common causes (iv) without assuming a specific model. Therefore, although there are many excellent linear methods [35][36][37][38][39][40] that can identify directional couplings, some of which can be applied even when there are some hidden components [37][38][39][40], and there are  Detecting Causality by Using Multiple Methods some methods that can identify which coupling direction is more dominant even when there are some hidden components [41,42], we did not include these methods in our ensemble because none of these methods simultaneously satisfy the above four conditions. Thus, if one develops more qualified methods to identify directional couplings and use them simultaneously, the reliability of the results can improve. S1-S4 Figs show that each of the three methods provided us independent results to some extent, which are important for the proposed framework to work appropriately. In particular, S1 and S3 Figs show that one of the three methods sometimes fails to detect directional couplings, while the failures were covered by the other two. Importantly, we observed a tendency that each of the three methods is likely to produce a false negative but not a false positive.
When we had shorter time series of network activities with too weak or strong coupling strengths, it became difficult for the proposed method to detect a directional coupling (S5-S10 Figs). The main reason is that the method of recurrence plots and the method using transfer entropy could not detect strong couplings. However, since the results are likely to be false negatives, detected couplings might be assumed to exist. This observation means that we might weaken the significance level and set the total significance level to 0.05. If we set the total significance level weaker, the above problems in S5 and S9 Figs can be avoided.
When we added 10% observational noise, the proposed framework still keeps working appropriately to some extent (S11-S16 Figs), although we observed a parameter region in the middle of each of panels C and D of S11 Fig where the p-value is still 5% significant but larger than 1%. Since this problem commonly appeared in the method using recurrence plots, the method of joint distribution of distances, and the convergent cross mapping, the problem might be related to delay coordinates. However, the observations of panels C and D of S11 Fig are related to false negatives but not false positives, hence we may assume that detected couplings are likely to exist, while there might be a chance that we miss some of directional couplings. Thus, these examples show that the chosen ensemble of the three methods is a nice combination of methods.
As we did in the Introduction, we define a directional coupling as an effect or influence that a subsystem has on another; For the ice core data, the present study indicated that the causal relationship between the CO 2 concentration and the temperature remained constant: i.e. the CO 2 concentration consistently drove the temperature over the time period we analyzed (Fig  7). This finding would support the hypothesis that the atmospheric CO 2 concentration is the cause of the current global warming.
On the other hand, our results uncovered the dynamic change in the causal influences between the cortical areas (preSMA and pmPFC) during task performance (Fig 8): i.e. the bottom up influences were dominant in the beginning of the task, while pmPFC and preSMA exerted reciprocal influences at the middle of the task. Such dynamic modulation of network structure would enable the flexible switching of function of the neural network [43,44].
The most important point is that these influences could not be revealed solely by observing time series and/or calculating the correlation coefficient between a pair of their components.
Our results in Fig 7 are consistent with those of [45] because we also found both directional couplings between the temperature and the CO 2 concentration except for the period 200,000-300,000 years before. Our results are also consistent, to some extent, with the ones obtained by convergent cross mapping in [46]. On one hand, we detected as in [46] bi-directional couplings between the temperature and CO 2 level, as well as between the temperature and CH 4 level. On the other hand, we found a uni-directional coupling from CH 4 to CO 2 , while the authors of [46] found a bi-directional coupling between CH 4 and CO 2 .
In Fig 8, we detected directional couplings from SMA to preSMA and from preSMA to pmPFC between 0 and 20ms from the hold release. The latter directional coupling might be through preSMA in comparison with the previous electrophysiological studies [47,48].
Whether directional couplings between pmPFC and SMA should be direct or indirect must be investigated in the future by methods such as those in [49] and [50].
When we analyzed the brain data, we did not use the k-means algorithm because the beginning of the task is a kind of a singular point and cannot be represented well by a set of centroids used in the k-means algorithm.
To conclude, the two key points of our framework can be summarized as follows. First, we use three methods for identifying directional couplings, all of which can be used for potentially mutually coupled nonlinear systems whose families of equations are not known even in the presence of hidden common causes. Second, these methods are applied concurrently in order to gain a better understanding of the network structure. In case that the results are not unanimous, the question of whether or not a directional coupling exists is settled by the majority vote. This enhances the statistical significance of the acceptance or rejection of a causal relation. Thus, as stated in the caption of Table B in S1 File, a significance level of 0.01 for each test in the system composed of five coupled Rössler oscillators is lowered to 3.0 × 10 −4 when two tests resulted in rejection. As shown by the examples of climate and brain data presented above, our framework provides insights into interactions among sub-systems in scientific, technological, and social contexts.