Quantifying ‘Causality’ in Complex Systems: Understanding Transfer Entropy

‘Causal’ direction is of great importance when dealing with complex systems. Often big volumes of data in the form of time series are available and it is important to develop methods that can inform about possible causal connections between the different observables. Here we investigate the ability of the Transfer Entropy measure to identify causal relations embedded in emergent coherent correlations. We do this by firstly applying Transfer Entropy to an amended Ising model. In addition we use a simple Random Transition model to test the reliability of Transfer Entropy as a measure of ‘causal’ direction in the presence of stochastic fluctuations. In particular we systematically study the effect of the finite size of data sets.


Introduction
Many complex systems are able to self-organise into a critical state [2,4]. In the critical state local distortions can propagate throughout the entire system [4,11,23]. This leads to correlations spanning the entire system. We address here how to identify directed stochastic causal connections embedded in a stochastically fluctuating but strongly correlated background.
Most 'causality' and directionality measures have been tested on low-dimensional systems and do not address the behaviour of systems consisting of large numbers of interdependent degrees of freedom, which is a main feature of complex systems. From a complex systems point of view, on the one hand there is the system as a whole (collective behaviour) and on the other there are the individual interactions that lead to the collective behaviour. A measure that can help understand and differentiate these two elements is needed.
We shall first seek to make a clear definition of 'causality' and then relate this definition to complex systems. We outline the different approaches and measures used to quantify this type of 'causality'.
We highlight that, for multiple reasons, Transfer Entropy seems to be a very suitable candidate for a 'causality' measure for complex systems. Consequently we seek to shed some light on the usage of Transfer Entropy on complex systems.
To improve our understanding of Transfer Entropy we study two simplistic models of complex systems which generate correlated time series in a very controllable way. A complex system, whose main characteristic consists in essential cooperative behaviour [12], includes instances when the whole system is interdependent. Therefore, we apply Transfer Entropy to the (amended) Ising model in order to investigate its behaviour at different temperatures, particularly near the critical temperature. Moreover, we are also interested in investigating the different magnitudes of Transfer Entropy in general (which are not fully understood [24]) by looking at the effect of different transition probabilities, or activity levels.
We discuss the interpretation of the different magnitudes of the Transfer Entropy by varying transition rates in a Random Transition model.

Quantifying 'Causality'
The quantification of 'causality' was first envisioned by the mathematician Wiener [31], who propounded the idea that the 'causality' of a variable in relation to another can be measured by how well the variable helps to predict the other. In other words, variable Y 'causes' variable X if the ability to predict X is improved by incorporating information about Y in the prediction of X. The conceptualisation of 'causality' as envisioned by Wiener was formulated by Granger [8], leading to the establishment of the Wiener-Granger framework of 'causality'. This is the definition of 'causality' that we shall adopt in this paper.
In the literature, references to 'causality' take many guises. The terms directionality, information transfer and sometimes even independence can refer to some sort of 'causality' in line with the Wiener-Granger framework. Continuing the assumption that Y causes X, one would expect the relationship between X and Y to be asymmetric and the information to flow from the source Y to the target X. One can assume that this information transfer is the unique information provided by the causal variable to the affected one. When one variable causes another variable, the affected variable (the target) will be dependent (to a certain extent) on the causal variable (the source). There must exist a certain time lag, however small, between the source and the target [3,9,25]; this will henceforth be referred to as the causal lag [8]. One could also say that the Wiener-Granger framework of prediction based 'causality' is equivalent to looking for dependencies between the variables at a certain causal lag.
Roughly, there are two different approaches to establishing 'causality' in a system. One approach is to make a qualified guess of a model that will fit the data, called the confirmatory approach [7]. Models of this nature are typically very field specific and rely on particular insights into the mechanism involved.
A contrasting approach, known as the exploratory approach, infers 'causal' direction from the data. This approach does not rely on any preconceived idea about underlying mechanisms and lets results from data shape the directed model of the system. Most of the measures within the Wiener-Granger framework fall into this category. One can think of the different approaches as being on a spectrum from purely confirmatory to purely exploratory.
The nature of complex systems calls for the exploratory approach. The abundance of data emphasises this even more. In fact 'causality' measures in the Wiener-Granger framework have been increasingly utilised on data sets obtained from complex systems such as the brain [19,29] and financial systems [18].
Unfortunately, most of the basic testing of the effectiveness of these measures is done on dynamical systems [13,22,26] or simple time series, without taking into account the emergence of collective behaviour and criticality. Complex systems are typically stochastic and thus different from deterministic systems where the internal and external influences are distinctly identified. As mentioned above, here we focus on the emergence of collective behaviour in complex systems and in particular on how the intermingling of the collective behaviour with individual (coupled) interactions complicates the identification of 'causal' relationships. Identifying a measure that is able to distinguish between these different interactions will help us to improve our understanding of the dynamics of complex systems.

Transfer Entropy
Within the Wiener-Granger framework, two of the most popular 'causality' measures are Granger Causality (G-causality) and its nonlinear analog Transfer Entropy. G-causality and Transfer Entropy are exploratory, as their measures of causality are based on the distribution of the sampled data. The standard steps of prediction based 'causality' that underlie these measures can be summarized as follows. Say we want to test whether variable Y causes variable X. The first step is to predict the current value of X using the historical values of X. The second step is to make another prediction where the historical values of both Y and X are used to predict the current value of X. The last step is to compare the former to the latter. If the second prediction is judged to be better than the first one, then one can conclude that Y causes X. This being the main idea, we outline why Transfer Entropy is more suitable for complex systems.
Granger causality is the most commonly used 'causality' indicator [3]. However, in the context of the nonlinearities of a complex system (collective behaviour and criticality being the main examples), using G-causality may not be sufficient. Moreover, the autoregressive (AR) framework on which it is based makes G-causality less exploratory than Transfer Entropy. Transfer Entropy was defined [13,26] as a nonlinear measure to infer directionality using the Markov property. The aim was to incorporate the properties of Mutual Information and the dynamics captured by transition probabilities in order to understand the concept and exchange of information.
More recently, the usage of Transfer Entropy to detect causal relationships [10,17,28] and causal lags (the time between cause and effect) has been further examined [24,30]. Thus we are especially interested in Transfer Entropy due to its propounded ability to capture nonlinearities, its exploratory nature, as well as its information theoretic background, which provides an interpretation in terms of information transfer.
Unfortunately, some of the vagueness in terms of interpretation may cause confusion in complex systems.
The rest of the paper is an attempt to discuss these issues in a reasonably self-contained manner.

Mutual Information based measures
Define random variables X, Y and Z with discrete probability distributions $p_X(x)$, $x \in \mathcal{X}$, $p_Y(y)$, $y \in \mathcal{Y}$ and $p_Z(z)$, $z \in \mathcal{Z}$. The entropy of X is defined [6,27] as
$$H(X) = -\sum_{x \in \mathcal{X}} p_X(x) \log p_X(x)$$
where log to the base e and 0 log 0 = 0 is used. The joint entropy of X and Y is defined as
$$H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p_{XY}(x, y) \log p_{XY}(x, y)$$
and the conditional entropy can be written as
$$H(X|Y) = H(X, Y) - H(Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p_{XY}(x, y) \log p_{X|Y}(x|y)$$
where $p_{XY}$ is the joint distribution and $p_{X|Y}$ is the respective conditional distribution. The Mutual Information [6,14] is defined as
$$I(X, Y) = H(X) - H(X|Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p_{XY}(x, y) \log \frac{p_{XY}(x, y)}{p_X(x)\, p_Y(y)}.$$
Taking into account conditional variables, the conditional Mutual Information [6,10] is defined as $I(X, Y|Z) = H(X|Z) - H(X|Y, Z)$.
A variant of conditional Mutual Information, namely the Transfer Entropy, was first defined by Schreiber in [26]. Let $X^{\tau}$ be the variable X shifted by τ, so that $X^{\tau}(n) = X(n - \tau)$, where X(n) is the value of X at time step n, and similarly for Y. We highlight a simple form of Transfer Entropy where conditioning is minimal, such that
$$T^{(\tau)}_{YX} = I(X, Y^{\tau} | X^{1}) = H(X|X^{1}) - H(X|X^{1}, Y^{\tau}). \tag{5}$$
The idea is that, if Y causes X at causal lag $t_Y$, then $T^{(t_Y)}_{YX} \geq T^{(\tau)}_{YX}$ for any lag τ, since $H(X|X^{1}, Y^{t_Y}) \leq H(X|X^{1}, Y^{\tau})$ due to the fact that $Y^{t_Y}$ should provide the most information about the change of $X^{1}$ to X. This simple form allows us to vary the values of time lag τ in ascertaining the actual causal lag. This form of Transfer Entropy was also used in [16,20,22,29,30]. The Transfer Entropy in equation (5) can also be written as
$$T^{(\tau)}_{YX} = \sum p\big(X(n), X(n-1), Y(n-\tau)\big) \log \frac{p\big(X(n) \,|\, X(n-1), Y(n-\tau)\big)}{p\big(X(n) \,|\, X(n-1)\big)}$$
where the sum runs over all states of the three variables. Our choice of this simple definition was motivated by the fact that it directly captures how the state of $Y(n - \tau)$ influences the transition from $X(n-1)$ to $X(n)$. In other words, equation (5) is tailor made to measure whether the state of $Y(n - \tau)$ influences the current changes in X. This coincides with the predictive view of 'causality' in the Wiener-Granger framework, where the current state of one variable (the source) influences the changes in another variable (the target) in the future. The same concept will be applied in order to probe this kind of 'causality' in our models.
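As a minimal sketch of how the simple form $T^{(\tau)}_{YX} = H(X|X^{1}) - H(X|X^{1}, Y^{\tau})$ can be estimated from two discrete time series, the following uses plug-in (relative frequency) probabilities obtained by temporal averaging. The function name `transfer_entropy` and the choice of a plain counting estimator are our assumptions for illustration; other estimators may behave differently on short series.

```python
from collections import Counter
from math import log


def transfer_entropy(x, y, tau):
    """Plug-in estimate (in nats) of T^(tau)_{YX} = I(X_n ; Y_{n-tau} | X_{n-1}).

    x, y : equally long sequences of hashable discrete states.
    tau  : time lag applied to the source y (must be >= 1).
    """
    if tau < 1:
        raise ValueError("tau must be >= 1")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    # Each observation is (X_n, X_{n-1}, Y_{n-tau}).
    triples = [(x[n], x[n - 1], y[n - tau]) for n in range(tau, len(x))]
    total = len(triples)

    c_abc = Counter(triples)                       # (X_n, X_{n-1}, Y_{n-tau})
    c_ab = Counter((a, b) for a, b, _ in triples)  # (X_n, X_{n-1})
    c_bc = Counter((b, c) for _, b, c in triples)  # (X_{n-1}, Y_{n-tau})
    c_b = Counter(b for _, b, _ in triples)        # X_{n-1}

    te = 0.0
    for (a, b, c), k in c_abc.items():
        # p(a | b, c) / p(a | b) = (k / c_bc) / (c_ab / c_b)
        te += (k / total) * log(k * c_b[b] / (c_bc[(b, c)] * c_ab[(a, b)]))
    return te
```

Calling `transfer_entropy(x, x, 1)` returns zero, because conditioning on $X^{1}$ leaves nothing for the identical source $X^{1}$ to add.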

The Ising model
A system is critical when correlations are long ranged. A simple prototype example is the Ising model [4] at the critical temperature $T_c$; away from $T_c$ correlations are short ranged and die off exponentially with separation. We shall apply Transfer Entropy to the Ising model in order to investigate its behaviour at different temperatures, particularly in the vicinity of the critical temperature. One can visualize the 2D Ising model as a two dimensional square lattice of length L, containing $N = L^2$ sites. These sites can only be in two possible states, spin-up ($s_i = 1$) or spin-down ($s_i = -1$). We restrict the interaction of each site to its nearest neighbours (in two dimensions these are the sites to the north, south, east and west). Let the interaction strength between i and j be denoted by
$$J_{ij} = \begin{cases} J, & \text{if } i \text{ and } j \text{ are nearest neighbours} \\ 0, & \text{otherwise} \end{cases}$$
so that the Hamiltonian (energy), H, is given by [4, 5]
$$H = -\sum_{\langle i, j \rangle} J_{ij}\, s_i s_j$$
where the sum runs over pairs of sites. H is used to obtain the Boltzmann (Gibbs) distribution
$$\gamma_B = \frac{\exp(-\beta H)}{\sum \exp(-\beta H)}$$
where the sum in the denominator runs over all configurations, $\beta = \frac{1}{K_B T}$, $K_B$ is the Boltzmann constant and T is the temperature.
We implement the usual Metropolis Monte Carlo (MMC) algorithm [4,15,21], taking one sample after every N single-site update attempts. The logic is that, since the sites to be considered are chosen randomly one at a time, after N attempts each site will on average have been selected for consideration once. The interaction strength is set to J = 1 and the Boltzmann constant is fixed as $K_B = 1$ for all the simulations. We let the system run for up to 2000 samples before sampling at every $N = L^2$ time steps.
Through the MMC algorithm, a Markov chain (process) is formed for every site on the lattice. The state of each site at each sample is taken as a time step n in the Markov chain $(s_X)_n$. Let S be the number of samples (the length of the Markov chains). To get the probability values for each site, we utilise temporal averages. All the numerical probabilities for the Ising model in this paper have been obtained by averaging over simulations with S = 100000 unless stated otherwise.
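The sampling scheme described above can be sketched as follows. This is an illustrative implementation only: the function name `metropolis_ising`, the random initial configuration, the default burn-in length and the standard Metropolis acceptance rule $\min(1, e^{-\Delta E / T})$ are our assumptions, with $J = 1$ and $K_B = 1$ as in the text and periodic boundary conditions.

```python
import math
import random


def metropolis_ising(L, T, n_samples, burn_in=200, J=1.0, seed=0):
    """Return n_samples lattice snapshots of an L x L Ising model.

    One snapshot is recorded after every N = L * L single-site update
    attempts, so consecutive snapshots are one time step n apart.
    """
    rng = random.Random(seed)
    n_sites = L * L
    # Assumed initial condition: independent random spins.
    s = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    samples = []
    for sweep in range(burn_in + n_samples):
        for _ in range(n_sites):
            i, j = rng.randrange(L), rng.randrange(L)
            # Sum over the four nearest neighbours, periodic boundaries.
            nb = (s[(i - 1) % L][j] + s[(i + 1) % L][j]
                  + s[i][(j - 1) % L] + s[i][(j + 1) % L])
            d_energy = 2.0 * J * s[i][j] * nb  # energy change if s[i][j] flips
            if d_energy <= 0 or rng.random() < math.exp(-d_energy / T):
                s[i][j] = -s[i][j]
        if sweep >= burn_in:
            samples.append([row[:] for row in s])  # copy the current lattice
    return samples
```

The time series of a single site, for example `[lat[1][1] for lat in samples]`, is then the Markov chain $(s_X)_n$ on which temporal averages are taken.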

Measures on Ising model
In an infinite two dimensional lattice, the phase transition of the Ising model with J = 1 and $K_B = 1$ is known to occur at the critical temperature $T_c = \frac{2}{\ln(1 + \sqrt{2})} \approx 2.269185$ [4]. In a finite system, due to finite size effects, the critical values will not be quite as exact; we will call the temperature where the transition effectively occurs in the simulation the crossover temperature $T_c$. The susceptibility χ is an observable that is normally used to identify $T_c$ for the Ising model, as seen in Figure (1). The susceptibility [4] is given by
$$\chi = \frac{1}{N T}\left(E[M^2] - E[M]^2\right), \qquad M = \sum_{i} s_i,$$
where E[.] is the expectation in terms of temporal average and T is temperature. The covariance on the Ising model can be defined as
$$\Gamma(X, Y) = E[s_X s_Y] - E[s_X]\,E[s_Y]$$
where X and Y label sites of the lattice.
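For illustration, the two observables can be computed from sampled time series as below. The helper names are hypothetical, and the susceptibility is taken in the common form $\chi = (E[M^2] - E[M]^2)/(NT)$ with $M$ the total magnetisation and $K_B = 1$, which is our assumption about the exact normalisation used.

```python
import math

# Exact critical temperature of the infinite 2D lattice for J = K_B = 1.
T_C_EXACT = 2.0 / math.log(1.0 + math.sqrt(2.0))


def temporal_mean(values):
    """Temporal average E[.] over a list of samples."""
    return sum(values) / len(values)


def susceptibility(magnetisations, n_sites, temperature):
    """chi from samples of the total magnetisation M = sum_i s_i (assumed form)."""
    m1 = temporal_mean(magnetisations)
    m2 = temporal_mean([m * m for m in magnetisations])
    return (m2 - m1 * m1) / (n_sites * temperature)


def covariance(s_x, s_y):
    """Gamma(X, Y) = E[s_X s_Y] - E[s_X] E[s_Y] for two site time series."""
    cross = temporal_mean([a * b for a, b in zip(s_x, s_y)])
    return cross - temporal_mean(s_x) * temporal_mean(s_y)
```

Note that `covariance` is symmetric in its arguments, so on its own it cannot indicate a direction between two sites.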
To display measures applied on individual sites, we fix three lattice sites A, B and G (the coordinates [2, 2] and [3, 3] are among those used). The covariance Γ(A, G) and the Mutual Information I(A, G) are symmetric by definition, and there is no clear difference between $T^{(\tau)}_{GA}$ and $T^{(\tau)}_{AG}$ in the figures; thus no direction of 'causality' can be established between A and G. This is expected due to the symmetry of the lattice. More interestingly, the fact that Transfer Entropy peaks near $T_c$ can be attributed to the fact that at $T_c$ the correlations span the entire lattice. Therefore, one may say that the critical transition and collective behaviour in the Ising model are detected by Transfer Entropy as a type of 'causality' that is symmetric in both directions. It is logical to interpret collective behaviour as a type of 'causality' in all directions, since information is disseminated throughout the whole lattice when it is fully connected. This is an important fact to take into account when estimating Transfer Entropy on complex systems.

Amended Ising model
In the amended Ising model we introduce an explicit directed dependence between the sites A, B and G in order to study how well Transfer Entropy is able to detect this causality. We define the amended Ising model using the algorithm outlined as follows. At each step in the algorithm a site chosen at random will be considered for flipping with a certain probability $\gamma_B$, except when A or B is selected, in which case an extra condition needs to be fulfilled before it can be allowed to change. If $(s_G)_{n-t_G} = 1$, A (or B) can be considered for flipping with probability $\gamma_B$ as usual; however, if $(s_G)_{n-t_G} = -1$, no change is allowed. Thus only one state of G ($s_G = 1$ in this case) allows sites A and B to be considered for flipping. Therefore, although A (and B) have their own dynamics, their changes still depend on G.
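A minimal sketch of this gating rule is given below. The site coordinates, the function name `amended_ising`, the Metropolis acceptance rule, and the choice to leave the gate open during the first $t_G$ samples (before any lagged state of G exists) are all assumptions made for illustration.

```python
import math
import random


def amended_ising(L, T, n_samples, t_G, A=(1, 1), B=(2, 2), G=(3, 3),
                  J=1.0, seed=0):
    """Simulate the gated dynamics and return the time series of A, B and G.

    Index 0 of each series is the initial configuration; index n is the
    state after n sweeps of N = L * L single-site update attempts.
    """
    if t_G < 1:
        raise ValueError("t_G must be >= 1")
    rng = random.Random(seed)
    s = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    sites = (("A", A), ("B", B), ("G", G))
    series = {name: [s[r][c]] for name, (r, c) in sites}
    for n in range(1, n_samples + 1):
        # A and B may only be considered if (s_G)_{n - t_G} = 1.
        # Assumption: the gate is open while no lagged sample exists yet.
        gate_open = n < t_G or series["G"][n - t_G] == 1
        for _ in range(L * L):
            i, j = rng.randrange(L), rng.randrange(L)
            if (i, j) in (A, B) and not gate_open:
                continue  # no change allowed for A or B
            nb = (s[(i - 1) % L][j] + s[(i + 1) % L][j]
                  + s[i][(j - 1) % L] + s[i][(j + 1) % L])
            d_energy = 2.0 * J * s[i][j] * nb
            if d_energy <= 0 or rng.random() < math.exp(-d_energy / T):
                s[i][j] = -s[i][j]
        for name, (r, c) in sites:
            series[name].append(s[r][c])
    return series
```

By construction, whenever `series["G"][n - t_G] == -1` the states of A and B at sample n equal their states at sample n - 1, which is the directed dependence that Transfer Entropy is asked to detect.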
We simulated the amended Ising model with $t_G = 10$ for different lattice lengths L. Figure (7) displays the values of the susceptibility χ on the model, and the peaks clearly show the presence of $T_c$ in our model. Figures (8) and (9) display the values of the covariance Γ(A, G) and the Mutual Information I(A, G). We reiterate that our correlations reach across the system for L ≤ 50 [4,32]. While covariance and Mutual Information give similar results to those of the standard Ising model, a difference is clearly seen in the Transfer Entropy values. Figures (10), (11) and (12) display the contrast between $T^{(t_G)}_{AG}$ and $T^{(t_G)}_{GA}$.
The effect of deviation from the predetermined causal lag $t_G = 10$ can be clearly seen in Figure (16), where the values of $T^{(\tau)}_{GA}$ for $\tau \neq 10$ reduce to 0, but at different rates depending on the deviation of τ from $t_G$. The further τ is from $t_G$, the faster the decrease to 0 (see also Figure (17)).
That temperature is a main factor in influencing the strength of Transfer Entropy values is apparent in all the figures in this section. One can observe that the Transfer Entropy values approach 0 as the temperature moves further away from $T_c$, except when the time lag matches the delay induced by definition between the dynamics of the two spins A and B and the G spin, in which case the Transfer Entropy value stabilizes at a certain fixed value, as seen in Figure (20). In the vicinity of $T_c$, the lattice is highly correlated, which leads to higher values of Transfer Entropy. The increase and stabilization after $T_c$ are due to the fact that, as temperature increases, the probability for all spin flips approaches a uniform distribution. This leads to transfer of information between site G and sites A and B occurring much more frequently at elevated temperature.
$T^{(1)}_{AA}$ is zero in both figures due to the definition in equation (5). Note that this is only for τ = 1; if τ ≠ 1 the Transfer Entropy value will be nonzero and will also peak at $T_c$. More importantly we see that T

Transfer Entropy, directionality and change
In order to understand the dynamics of each site we calculate the effective rate of change (ERC) in relation to the transition probabilities. Let $ERC_X = P(X_n \neq X_{n-1})$ for any site X on the lattice. Figure (13) illustrates how $ERC_A$ and $ERC_B$ are equal, as expected, and significantly different from $ERC_G$. In Figure (10), the corresponding Transfer Entropy values in both directions are displayed. At higher temperatures, it can be clearly seen that $T^{(t_G)}_{GA}$ is larger than $T^{(t_G)}_{AG}$. However, for temperatures near $T_c$ this is not as clear, and therefore to highlight the relative values we calculate
$$\frac{T^{(t_G)}_{GA} - T^{(t_G)}_{AG}}{ERC_A}$$
in Figure (14) and Figure (15). We see that this value gives a clear jump at $T_c$ and remains more or less constant after $T_c$. Therefore, even though Transfer Entropy in neither direction is zero, a clear indication of directionality can be obtained. Interestingly, the division by the ERC brought out the clear phase transition-like behaviour that seems to distinguish the situation below and above $T_c$.
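To make the normalisation concrete, the sketch below estimates the ERC of a site by temporal averaging and forms the difference of the two Transfer Entropy directions divided by the ERC of the target. That the highlighted quantity has exactly this form is our reading of the text rather than something it states explicitly, and both function names are hypothetical.

```python
def effective_rate_of_change(x):
    """Estimate ERC_X = P(X_n != X_{n-1}) as a relative frequency."""
    if len(x) < 2:
        raise ValueError("need at least two time steps")
    changes = sum(1 for prev, cur in zip(x, x[1:]) if cur != prev)
    return changes / (len(x) - 1)


def normalised_te_difference(te_source_to_target, te_target_to_source,
                             erc_target):
    """(T_GA - T_AG) / ERC_A for source G and target A (assumed form)."""
    if erc_target <= 0:
        raise ValueError("the target never changes state, ERC is zero")
    return (te_source_to_target - te_target_to_source) / erc_target
```

A symmetric background contribution that is equal in both directions cancels in the numerator, while the division rescales what remains by how often the target actually changes.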

Referring back to Figure (4) of the unamended Ising model, we can clearly see that the difference $T^{(t_G)}_{GA} - T^{(t_G)}_{AG}$ divided by $ERC_A$ would not indicate any direction in the unamended Ising model. We have demonstrated that this $ERC_A$-normalised difference is able to cancel out the symmetric contribution from the collective behaviour and only captures the imposed directed interdependence.
In his introductory paper [26], Schreiber warns that in certain situations, due to different information content as well as different information rates, the difference in magnitude should not be relied on to imply directionality unless Transfer Entropy in one direction is 0. We have shown that when collective behaviour is present on the Ising model, the value of Transfer Entropy cannot possibly be 0. We suggest that this is due to the fact that collective behaviour is a type of 'causality' (disseminating information in all directions) and thus the Transfer Entropy is correctly indicating 'cause' in all directions. The clear difference in Transfer Entropy magnitude (even at $T_c$) observed when the model is amended indicates that the difference in Transfer Entropy can indeed serve as an indicator of directionality in systems with emergent cooperative behaviour. We have seen that Transfer Entropy is influenced by the nearest neighbour interactions, collective behaviour and the ERC. In the next section we use the Random Transition model to further investigate how the ERC influences the Transfer Entropy.

Random Transition Model
In the amended Ising model we implemented a causal lag as a restriction of one variable on another, in such a way that a value of the source variable affects the possible changes of the target variable. It is this novel concept of implementing 'causality' that we will analyze and expand in the Random Transition model. Let $\mu_X$, $\mu_Y$ and $\mu_Z$ be the independent probabilities for the stochastic swaps of the variables X, Y and Z at every time step, respectively. In addition, a restriction is placed on X and Y such that they are only allowed to do the stochastic swap with probability $\mu_X$ and $\mu_Y$ if the state of $Z_{n-t_Z}$ fulfills a certain condition. This restriction means that X and Y can only change states if Z is in the conditioned state at time step $n - t_Z$, thus creating a 'dependence' on Z, analogous to the dependence of A and B on G in the amended Ising model. However, in this model we allow the number of states $n_s$ to be more than just two. The purpose of this is twofold. On the one hand it contributes towards verifying that the behaviours of Transfer Entropy observed on the amended Ising model do extend to cases where $n_s > 2$.
On the other hand, the model also serves to highlight different properties of Transfer Entropy as well as the very crucial issue of probability estimation, which may lead to misleading results. The processes are initialized randomly and independently. The swapping probabilities are taken to be $\mu_X = \mu_Y = \mu_Z = \frac{1}{n_s}$, thus enabling Transfer Entropy values to be calculated analytically (see appendix for detailed analytic formulations).
The unclear meaning of the magnitude of Transfer Entropy is one of its main criticisms [22,24]. This is partly due to the ERC, which incorporates both external and internal influences, the separation of which is rather unclear. The advantage of investigating Transfer Entropy on the Random Transition model is that the ERC can be defined in terms of internal and external elements, i.e. for any variable X we have that
$$ERC_X = \mu_X\, \Omega$$
where $\mu_X$ is the internal transition probability of X and Ω represents the external influence applied on X. If the condition in our model is that $Z_{n-1} = 1$ for $X_n$ and $Y_n$ to change values, then Ω = P(condition fulfilled) = $P(Z_{n-1} = 1)$, so that $ERC_X = \mu_X P(Z_{n-1} = 1)$ and $ERC_Y = \mu_Y P(Z_{n-1} = 1)$.
However, for the source Z, which has no external influence, Ω = 1 and consequently $ERC_Z = \mu_Z$. When $n_s = 2$, the model essentially replicates the Ising model without the collective behaviour effect, i.e. far above $T_c$, where the Boltzmann distribution approaches a uniform distribution. Consequently, at these temperatures the influence of collective behaviour is close to none. One can see in Figure (21) and Figure (22) that the µ (hence the ERC) values are indeed key in determining the strength of Transfer Entropy. In Figure (21), $\mu_X$ influences $T^{(t_Z)}_{ZX}$ monotonically when every other value is fixed; therefore in this case the Transfer Entropy reflects the internal dynamics $\mu_X$ rather than the external influence Ω.
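The decomposition of the ERC into an internal part µ and an external part Ω can be checked numerically with a small simulation. The sketch below is illustrative only: the function names, the labelling of states as 0, ..., $n_s - 1$, the set `allowed` of source states that fulfil the condition, and the interpretation of a "swap" as a jump to one of the other $n_s - 1$ states chosen uniformly are our assumptions.

```python
import random


def random_transition(n_steps, n_s, mu_x, mu_z, t_z, allowed=(0,), seed=0):
    """Simulate a source Z and a target X gated by Z at lag t_z."""
    rng = random.Random(seed)

    def swap(state):
        # Jump to one of the other n_s - 1 states, chosen uniformly.
        new = rng.randrange(n_s - 1)
        return new if new < state else new + 1

    x = [rng.randrange(n_s)]
    z = [rng.randrange(n_s)]
    for n in range(1, n_steps):
        # The source has no external restriction.
        z.append(swap(z[-1]) if rng.random() < mu_z else z[-1])
        # The target may swap only if Z fulfilled the condition t_z steps ago.
        gate = n - t_z >= 0 and z[n - t_z] in allowed
        x.append(swap(x[-1]) if gate and rng.random() < mu_x else x[-1])
    return x, z


def erc(series):
    """Relative frequency of state changes, an estimate of P(X_n != X_{n-1})."""
    changes = sum(1 for a, b in zip(series, series[1:]) if a != b)
    return changes / (len(series) - 1)
```

With $n_s = 2$, $\mu_X = \mu_Z = \frac{1}{2}$ and one allowed state, Ω is $\frac{1}{2}$, so `erc(z)` should be close to $\mu_Z$ and `erc(x)` close to $\mu_X \Omega$.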
If 'causality' is the aim, surely Ω is the very thing that makes the relationship 'causal' and should be the main focus. This is a factor that needs to be taken into account when comparing the magnitudes of Transfer Entropy. Figure (21) also shows that when $\mu_Z$ is uniform (since $n_s = 2$, hence $\mu_Z = \frac{1}{n_s} = \frac{1}{2}$), one gets that $T^{(\tau)}_{ZX} \neq 0$ only if $\tau = t_Z$, which makes causal lag detection fairly straightforward. However, in Figure (22) the effect of varying $\mu_Z$ can be clearly seen in the nonzero values $T^{(\tau)}_{ZX} \neq 0$ when $\tau \neq t_Z$. Nevertheless, the value at $\tau = t_Z$ seems to be fully determined by $\mu_X$ regardless of the value of $\mu_Z$. The mechanism by which $\mu_Z$ affects $T^{(\tau)}_{ZX}$ is sketched in the appendix. Therefore one can conclude that when Z is the source ('causal' variable) and X is the target (the variable being affected by the 'causal' link), the value of the Transfer Entropy $T^{(\tau)}_{ZX}$ is determined by both $\mu_X$ and $\mu_Z$. We have verified that this is indeed the case even when $n_s > 2$ in this model. This should apply to all variables in the model and much more generally to any kind of source-target 'causal' relationship in this sense. We suspect that this also extends to cases when there is more than one source, and this will be a subject of future research. Thus for causal lag detection purposes, it is clear that theoretically Transfer Entropy will attain its maximum value at the exact causal lag. It is also clear that Transfer Entropy at nearby lags can be nonzero due to this single 'causal' relationship, and on data sets it is strongly recommended to test for relative lag values.
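Testing relative lag values amounts to scanning τ and locating the maximum. The following self-contained sketch simulates a two-state source and target with $\mu_X = \mu_Z = \frac{1}{2}$ and scans the plug-in Transfer Entropy over lags. All names are hypothetical, the two-state flip dynamics is a simplification of the model, and the counting estimator is one possible choice among several.

```python
import random
from collections import Counter
from math import log


def simulate_pair(n_steps, t_z, mu_x=0.5, mu_z=0.5, seed=0):
    """Two-state source z and target x; x may flip only if z was 1 at lag t_z."""
    rng = random.Random(seed)
    x, z = [rng.randrange(2)], [rng.randrange(2)]
    for n in range(1, n_steps):
        z.append(1 - z[-1] if rng.random() < mu_z else z[-1])
        gate = n - t_z >= 0 and z[n - t_z] == 1
        x.append(1 - x[-1] if gate and rng.random() < mu_x else x[-1])
    return x, z


def transfer_entropy(x, y, tau):
    """Plug-in estimate of I(X_n ; Y_{n-tau} | X_{n-1}) in nats, tau >= 1."""
    triples = [(x[n], x[n - 1], y[n - tau]) for n in range(tau, len(x))]
    total = len(triples)
    c_abc = Counter(triples)
    c_ab = Counter((a, b) for a, b, _ in triples)
    c_bc = Counter((b, c) for _, b, c in triples)
    c_b = Counter(b for _, b, _ in triples)
    return sum((k / total) * log(k * c_b[b] / (c_bc[(b, c)] * c_ab[(a, b)]))
               for (a, b, c), k in c_abc.items())


def scan_lags(x, y, max_lag):
    """Return {tau: T^(tau)_{YX}} for tau = 1, ..., max_lag."""
    return {tau: transfer_entropy(x, y, tau) for tau in range(1, max_lag + 1)}


def detect_causal_lag(x, y, max_lag):
    """Lag at which the estimated Transfer Entropy is largest."""
    scan = scan_lags(x, y, max_lag)
    return max(scan, key=scan.get)
```

For example, `detect_causal_lag(x, z, 10)` applied to the output of `simulate_pair(20000, t_z=5)` is expected to recover the imposed lag, since with $\mu_Z = \frac{1}{2}$ the source carries no memory and other lags contribute only estimation noise.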

Transfer Entropy estimations of the Random Transition model
The estimation of Transfer Entropy for a large number of states $n_s$ requires a sufficient sample size. To illustrate this we set Ω to three different values: $\Omega = \frac{1}{n_s}$ for Case 1, $\Omega = \frac{n_s - 1}{n_s}$ for Case 2 and $\Omega = \frac{1}{2}$ for Case 3. We plot the analytical Transfer Entropy $T^{(t_Z)}_{ZX}$, and the estimations of it on simulated values, for all three cases in Figure (23). Even though $n_s$ is known and incorporated in the estimations, the inaccuracies are quite worrying. This situation would be even more exaggerated in situations where $n_s$ is not known (unfortunately, this is more often than not the case). We strongly advise checking the accuracy of the Transfer Entropy estimation and adjusting the $n_s$ value before using it for any type of analysis and drawing any conclusion. One way to do this is by generating a null model (in the case of the Random Transition model this is simply three randomly generated time series) and testing the values of Transfer Entropy as in Figure (24) to ascertain the level of accuracy that is to be expected.
Subtracting the null model from the values on the Random Transition model is equal to subtracting the Transfer Entropy values of both directions, as one direction is theoretically zero. However, this does not quite solve the problem, as the values may still be negative if the sample size is small. There are many other types of corrections [24,29] proposed to address this issue, involving subtraction of the null model in various forms. Nevertheless, as we have seen in Figure (14) of the amended Ising model, only by subtracting the two directions of Transfer Entropy did we obtain the clear direction, as this cancelled out the underlying collective behaviour. We suspect that this will work as well for cancelling out other types of background effects and succeed in revealing directionality.
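The null-model check can be sketched as follows: two independent, uniformly random series have a theoretical Transfer Entropy of zero, so whatever the estimator returns on them indicates the finite-sample error to expect at a given $n_s$ and sample size S. The function names and the use of a plug-in counting estimator are our assumptions.

```python
import random
from collections import Counter
from math import log


def transfer_entropy(x, y, tau=1):
    """Plug-in estimate of I(X_n ; Y_{n-tau} | X_{n-1}) in nats, tau >= 1."""
    triples = [(x[n], x[n - 1], y[n - tau]) for n in range(tau, len(x))]
    total = len(triples)
    c_abc = Counter(triples)
    c_ab = Counter((a, b) for a, b, _ in triples)
    c_bc = Counter((b, c) for _, b, c in triples)
    c_b = Counter(b for _, b, _ in triples)
    return sum((k / total) * log(k * c_b[b] / (c_bc[(b, c)] * c_ab[(a, b)]))
               for (a, b, c), k in c_abc.items())


def null_model_te(n_s, n_samples, tau=1, seed=0):
    """Transfer Entropy estimated between two independent random series.

    The analytical value is 0, so the returned number is pure estimation error.
    """
    rng = random.Random(seed)
    x = [rng.randrange(n_s) for _ in range(n_samples)]
    y = [rng.randrange(n_s) for _ in range(n_samples)]
    return transfer_entropy(x, y, tau)


def null_model_table(n_s, sample_sizes, seed=0):
    """Map each sample size S to the null-model estimate for n_s states."""
    return {S: null_model_te(n_s, S, seed=seed) for S in sample_sizes}
```

Comparing, say, `null_model_table(10, [200, 2000, 20000])` against the values measured on real data gives a rough sense of how much of an estimate could be attributed to the sample size alone.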

Discussion
This paper highlights the question of distinguishing interdependencies induced by collective behaviour from those induced by individual (coupled) interactions, in order to understand the inner workings of complex systems derived from data sets. These data sets are usually in the form of time series that seem to behave essentially as stochastic series. It is hence of great interest to understand measures proposed to be able to probe 'causality' in view of complex systems. Transfer Entropy has been suggested as a good probe on the basis of its ability to capture nonlinearities, its exploratory approach and its interpretation in terms of information transfer.
To investigate the behaviour of Transfer Entropy, we studied two simplistic models. From the results of applying Transfer Entropy to the Ising model, we proposed that collective behaviour is also a type of 'causality' in the Wiener-Granger framework, but highlighted that it should be identified differently from individual interactions by illustrating this issue on an amended Ising model. The collective behaviour that emerges near criticality may overshadow the intrinsic directionality in the system, as it is not detected by measures such as covariance (correlation) and Mutual Information. We showed that by taking into account both directions of Transfer Entropy on the amended Ising model, a clear direction can be identified. In addition, we verified that the Transfer Entropy is indeed maximum at the exact causal lag by utilizing the amended Ising model.
By obtaining the phase transition-like difference measure, we have shown that the Transfer Entropy is highly dependent on the effective rate of change (ERC) and therefore likely to be dependent on the overall activity level given by, say, the temperature in thermal systems, as we demonstrated in the amended Ising model. Using the Random Transition model we have illustrated that the ERC is essentially comprised of internal as well as external influences, and this is why Transfer Entropy depicts both. This also explains why collective behaviour on the Ising model is detected as a type of 'causality'. In complex systems where there are bound to be various interactions on top of the emergent collective behaviour, the situation can

The relationship between Ω and Q
To understand how the values of $\mu_Z$ affect the value of $T^{(\tau)}_{ZX}$ we need a different variable. Let $Q^{(\tau)}$ be the probability that the condition is fulfilled given current knowledge at time τ. Thus $Q^{(\tau)}_{\mathrm{sgn}(\gamma)} = P(\text{condition fulfilled} \mid Z_{n-\tau} = \gamma)$, with sgn(γ) as in equation (11). The relationship between $Q^{(\tau)}_{\mathrm{sgn}(\gamma)}$ and Ω can be defined using the formula for total probability, $P(B) = \sum_{\gamma} P(B \mid Z = \gamma) P(Z = \gamma)$. Let B = {condition fulfilled}; using the fact that $P(Z_{n-\tau} = \gamma) = \frac{1}{n_s}$, we get that
$$\Omega = \frac{1}{n_s} \sum_{\gamma} Q^{(\tau)}_{\mathrm{sgn}(\gamma)}.$$

[Figure legend: I(A, G) for L = 10, 25, 50 and 100.]
[Figure caption: $T^{(10)}_{AG}$ and $T^{(10)}_{GA}$ on the Ising model of length L = 50 obtained using equation (5). Peaks for both directions are at $T_c$.]
[Figure caption: $T^{(10)}_{GA}$ on the Ising model of lengths L = 10, 25, 50, 100 obtained using equation (5). Peaks can be seen at the respective $T_c$.]
[Figure caption: $T^{(10)}_{AG}$ on the Ising model of lengths L = 10, 25, 50, 100 obtained using equation (5). Peaks can be seen at the respective $T_c$.]
[Figure caption: $T^{(10)}_{GA}$ on the Ising model of lengths L = 10, 25, 50, 100 obtained using equation (5). Values continue to increase after $T_c$.]
[Figure caption: $T^{(10)}_{AG}$ on the Ising model of lengths L = 10, 25, 50, 100 obtained using equation (5). Peaks can be seen at the respective $T_c$, similar to the Ising model results.]
The Metropolis Monte Carlo (MMC) algorithm is used for the simulation of the Ising model in two dimensions with periodic boundary conditions. The algorithm, proposed by Metropolis and co-workers in 1953, was designed to sample the Boltzmann distribution $\gamma_B$ by artificially imposing dynamics on the Ising model. The implementation of the MMC algorithm in this paper is outlined as follows. A site is chosen at random to be considered for flipping (change of state) with probability $\gamma_B$. The event of considering the change, and afterwards the actual change (if accepted) of the configuration, shall henceforth be referred to as a flipping consideration. A sample is taken after each N flipping considerations.
The values of the covariance Γ(A, G) and the Mutual Information I(A, G) are displayed in Figure (2) and Figure (3). It can be seen that, for the Ising model, Mutual Information gives no more information than covariance. From this figure, one can see that the values are system size dependent up to system size L = 50, or N = 2500. We conclude from this that, up to this length scale, correlations are detectable across the entire lattice [4]. Thus we shall frequently utilize L = 50 when illustration is required. Using time shifted variables we obtained the Transfer Entropy $T^{(\tau)}_{YX} = T^{(\tau)}_{s_Y s_X}$ in Figure (??). One can see that there is no clear difference between $T^{(\tau)}_{GA}$ and $T^{(\tau)}_{AG}$ on the Ising model, unlike on the amended Ising model, which explicitly indicates the direction of 'causality' G → A.
Figure (17) is simply Figure (16) plotted over different time lags τ to illustrate how Transfer Entropy correctly and distinctly identifies the causal lag $t_G = 10$.

Figures (18) and (19) display Transfer Entropy values for the Ising model and the amended Ising model with $t_G = 1$, respectively. The figures illustrate the mechanism by which Transfer Entropy detects the predefined causal delay. Consider the following question: which site 'causes' site A? Firstly we see that $T^{(1)}_{GA}$ and $T^{(1)}_{BA}$ differ. In Figure (18) the difference is due to distance in space and the nearest neighbour interaction in the model, thus $T^{(1)}_{GA} < T^{(1)}_{BA}$ since G is further away from A than B. But in Figure (19) the opposite is true, and distance in space does not dominate in this interaction. The figure very clearly indicates that G 'causes' A at τ = 1 and B does not. In other words, in the amended Ising model Transfer Entropy identifies G as a source of which one of the targets is A, whereas in the Ising model the expected nearest neighbour dynamics presides. This result is only obtained for measures sensitive to transition probabilities. Measures that depend only on static probabilities, such as covariance, Mutual Information and conditional Mutual Information, will only give values in accordance with the underlying nearest neighbour dynamics in both the Ising model and the amended Ising model [1].
$Q^{(\tau)} = P(\text{condition fulfilled} \mid \text{knowledge at time } \tau)$. The value of $Q^{(\tau)}_{\mathrm{sgn}(\gamma)}$ will depend on γ, and in our model here, particularly on whether or not $Z_{n-t_Z} = \gamma$ satisfies the condition. One can divide the possible states γ of all the processes into two groups such that $G_U = \{\gamma \in A : Z_{n-t_Z} = \gamma \text{ fulfills the condition}\}$ and $G_D = \{\gamma \in A : Z_{n-t_Z} = \gamma \text{ does not fulfill the condition}\}$. Note that $|G_U| = n_s \Omega$ and $|G_D| = n_s (1 - \Omega)$ since Ω = P(condition fulfilled), so that Ω can be interpreted as the proportion of states of Z that fulfill the condition. Due to the equiprobability of spins and the uniform initial distribution, for any τ there are only two possible values of $Q^{(\tau)}_{\mathrm{sgn}(\gamma)}$, one for $\gamma \in G_U$ and one for $\gamma \in G_D$. Therefore define sgn(γ) such that
$$\mathrm{sgn}(\gamma) = \begin{cases} U, & \gamma \in G_U \\ D, & \gamma \in G_D \end{cases} \tag{11}$$

Figure 10. Transfer Entropy $T^{(10)}_{AG}$ and $T^{(10)}_{GA}$ on the amended Ising model of length L = 50 and $t_G = 10$, obtained using equation (5). The direction G → A at time lag 10 is indicated, which is very different from the result on the Ising model in Figure 4.

Figure 13. ERC of sites A, B and G on the amended Ising model with $t_G = 10$ and L = 50.

Figure 16. $T^{(\tau)}_{GA}$ versus T for different time lags τ in the amended Ising model with $t_G = 10$ and L = 50, using equation (5).

$T^{(1)}_{GA}$ due to distance in space.

$T^{(1)}_{GA}$ in Figure 19 up to T = 15. Transfer Entropy in the indicated direction stabilizes at higher temperature.

Figure 21. Analytical Transfer Entropy $T^{(\tau)}_{ZX}$ versus time lag τ of the Random Transition model with $n_s = 2$ (hence $\Omega = \frac{1}{2}$) and $t_Z = 5$ in equation (16), where $\mu_X$ is varied but $\mu_Z = \frac{1}{2}$ is fixed. $T^{(t_Z)}_{ZX}$ is monotonically increasing with respect to $\mu_X$.

Figure 22. Analytical Transfer Entropy $T^{(\tau)}_{ZX}$ versus time lag τ of the Random Transition model with $n_s = 2$ (hence $\Omega = \frac{1}{2}$) and $t_Z = 5$ in equation (16), where $\mu_X = \frac{1}{2}$ is fixed and $\mu_Z$ is varied. $\mu_Z$ does not affect $T^{(t_Z)}_{ZX}$.

Figure 23. Transfer Entropy $T^{(t_Z)}_{ZX}$ versus number of states $n_s$ for Cases 1, 2 and 3. $\mu_X = \mu_Z = \frac{n_s - 1}{n_s}$ are uniformly distributed. Analytical values are obtained by substituting the respective Ω values in equation (17). Simulated values are acquired using equation (5) on simulated data of varying sample size S, where 1K = 1000.

Figure 24. Transfer Entropy using equation (17) on the simulated null model with varying sample size S, where 1K = 1000. Analytical values are all 0.