A study for multiscale information transfer measures based on conditional mutual information

As big data science develops, efficient methods are in demand for many kinds of data analysis. Granger causality provides the prime model for quantifying causal interactions. However, this theoretic model does not meet the requirements of real-world data analysis, because real-world time series are diverse and their models are usually unknown. Therefore, model-free measures such as information transfer measures are strongly desired. Here, we propose multi-scale extensions of conditional mutual information measures using the MORLET wavelet, named the WM and WPM. The proposed measures are computationally efficient and interpret information transfer at multiple scales. We use both synthetic data and real-world examples to demonstrate the efficiency of the new methods. The results of the new methods are robust and reliable. In the simulation studies, we found that the new methods outperform the wavelet extension of transfer entropy (WTE) in both computational efficiency and accuracy. The features and properties of the proposed measures are also discussed.


Introduction
As big data science develops, practical time series methods are needed to study the complexity and dynamics of data. Real-world data are usually time series obtained from experiments or observations; their models are diverse and the data are often nonlinear or non-stationary [1][2], e.g. the EEG time series measured in experiments [2][3][4][5] and financial data observed in real-world markets [6]. Therefore, efficient methods are necessary to study the dynamics of these complex systems.
Various methods have been developed for studying the directed interaction between time series. The most classic causality measure is the Granger causality (GC) [7][8], a prime model for causality measures that uses significance tests to detect the directed dependency of one time series on another [7][8]. However, GC has many limitations: it is a bivariate time domain method suited only to linear models [7][8]. Similar methods have been derived to overcome the limitations of GC [9][10][11][12]. For instance, the conditional Granger causality is a multivariate method that can detect direct interactions between time series [10], the frequency domain GC is derived for frequency domain data analysis [10][11], and the nonlinear GC can be applied to nonlinear data analysis [12]. More advanced measures include the Partial Directed Coherence (PDC) [13][14][15][16][17] and the Directed Transfer Function (DTF) [18][19], which are sophisticated frequency domain measures. However, both PDC and DTF are linear measures whose validity relies on fitting linear autoregressive or moving average models [13][14][15][16][17][18][19].
Since real-world time series are often nonlinear and sometimes non-stationary, and their models are usually varied and unknown, linear causal measures are ill-suited to analyzing them. To analyze the dynamics and interactions of real-world time series as well as theoretic models, information transfer measures are preferred [1,12][20][21][22][23][24][25][26][27][28][29][30]. Information transfer measures detect the directed information transfer between coupled time series, which can be used to study the direction of interactions in complex networks. However, information transfer is a different notion from causal effect [31][32][33][34]. In [33], J. T. Lizier and M. Prokopenko used transfer entropy [27] and information flow to differentiate the concepts of information transfer and causal effect. In [31], N. Ay and D. Polani introduced the notion of causal independence, which allows a measure of the strength of a causal effect to be defined; they call this measure the information flow and compare it with the transfer entropy. A related work is presented by Wibral et al. in [34], in which an extension of transfer entropy is proposed that accounts for delayed source-target interactions while crucially retaining the conditioning on the embedded target state at the immediately previous time step [34]. This extension is proved in [34] to be the only relevant option in keeping with Wiener's principle of causality. For the notion of causality, as distinct from the notion of information transfer, a clear and systematic treatment including theories and causal models is presented by J. Pearl in [32].
Transfer entropy (TE), proposed by T. Schreiber [27], is a fundamental information transfer measure: a directed method that evaluates the bivariate information transfer between coupled time series. Owing to their model-free nature, information transfer measures such as TE are preferred in many studies for analyzing interactions in varied models. To meet the needs of different analyses, many other information transfer methods have been derived from the transfer entropy. For instance, the symbolic transfer entropy (STE) [28] and the Partialized Transfer Entropy (PTE) [35] were derived to improve TE for particular applications. However, these transfer entropy measures use uniform embeddings in their expressions, which leads to high computational complexity and redundancy, because variables that make no significant contribution to the detection of information transfer are also included in the computation [22,29].
To solve this problem, non-uniform state-space embedding methods such as MIME (conditional mutual information from mixed embeddings) [29] and partial MIME (PMIME, a direct version of MIME) [22] were developed to reduce the computational redundancy. Both MIME and PMIME use a progressive scheme with a maximum criterion and a stopping criterion to select the significantly contributing components from the uniform state-space embedding vector and form refined embedding vectors for information transfer detection [22,29]. As a consequence, both methods are computationally efficient and have been widely applied in data analysis [22,26,29][35][36][37][38].
Most of these measures require data stationarity [7][8][13][14][15][16][17][18][19]. Real-world data such as financial and biological time series are not strictly stationary. The wavelet is an ideal tool for non-stationary data analysis: it offers good localization in both time and frequency and outperforms the short-time Fourier transform [24][39][40][41][42][43]. In [24], the MORLET wavelet [40] is introduced to TE to cover non-stationary and discontinuous data analysis. Since TE is primitive and computationally redundant, we are motivated to use the MORLET wavelet to extend MIME and PMIME into computationally efficient multi-scale measures that address the deficiency of WTE. The new extensions are expected to be useful in various data analyses, particularly real-world data analysis.
In this paper, we introduce the MORLET wavelet extension of MIME and PMIME and study their efficiency when applied to both model data and real-world time series. The paper is organized as follows. In the Introduction section, we review the background of this study. In the Materials and methods section, we describe the formulation of the wavelet extensions of MIME and PMIME. In the Results section, four synthetic models (theoretic maps) and two real-world examples (EEG and financial data) are used to demonstrate the efficiency of the proposed extensions, and all simulation studies are compared with the wavelet extension of TE (WTE). In the Discussion section, applications and features of the new methods are discussed. The conclusions of this study are drawn in the Conclusion section.

Materials and methods
In this section, we introduce the details of the wavelet extensions of MIME and PMIME. We refer to the two wavelet extensions as WM and WPM, respectively.

MORLET wavelet extension of MIME (WM)
Conditional mutual information from mixed embeddings (MIME) [29] is a time domain nonlinear information transfer measure whose wavelet extension is described as follows.
Assume X and Y are two arbitrary time series of length N, and let $L_X$ and $L_Y$ denote the maximum time lags for X and Y. To compute the WM for Y->X, the mother function $\psi(\eta) = \pi^{-1/4} e^{i\omega_0\eta} e^{-\eta^2/2}$ [2,24][39][40][41][42][43] is used to convert the X and Y time series into MORLET wavelet coefficients [24]. Here $\omega_0 \in [5,6]$ is the normalized frequency, the time lags $\tau_X$ ($1 \le \tau_X \le L_X$) and $\tau_Y$ ($1 \le \tau_Y \le L_Y$) are the translation parameters used to localize the wavelet, $s_i$ ($1 \le i \le m$, where m is the total number of time scales) is the time scale that determines the width and resolution of the wavelet, and $*$ denotes complex conjugation [24]. This wavelet setting is the same as that of the WTE [24].

The WM is computed for every time scale $s_i$. At each time scale, a future vector $V_F(s_i)$ and a collective set of candidate components $B(s_i)$ are defined. The same progressive scheme as in MIME [29] is used for each time scale $s_i$. The scheme starts with an empty vector $b_0(s_i) = \emptyset$. In the first iterative cycle, WM searches $B(s_i)$ for the element $x'$ that satisfies the maximum criterion [29], i.e. the element $x$ that maximizes $I(x; V_F(s_i))$, where $I(x; V_F(s_i))$ is the mutual information rate between $x$ and $V_F(s_i)$. The element $x'$ is selected to form $b_1(s_i) = (x')$ and is then removed from $B(s_i)$, which leaves $B_1(s_i) = B(s_i) \setminus \{x'\}$ [29]. At the k-th iterative cycle, WM seeks the element $x'$ in the remaining set $B_{k-1}(s_i)$ (obtained from the (k-1)-th iterative cycle) that satisfies the maximum criterion [29], now evaluated as the conditional mutual information between $x$ and $V_F(s_i)$ given $b_{k-1}(s_i)$, and moves that element from $B_{k-1}(s_i)$ into the selected vector to form $b_k(s_i)$. The progressive scheme stops at the (k+1)-th iterative cycle and uses $b_k(s_i)$ as the final selected embedding vector if the stopping criterion [29] is satisfied. This criterion compares the information obtained in consecutive iterative cycles against a significance threshold A (between 0 and 1), which controls the inclusion of embedding components [29]. The stopping criterion ensures that contributing components are included while useless components are prevented from being added: the progressive scheme stops when including a new component provides no significant additional information [29][35][36][37][38].
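The progressive scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `gaussian_mi` and `select_embedding` are hypothetical, the mutual information is estimated under a Gaussian assumption (the cited MIME work uses its own estimator), and the stopping rule is assumed to be the ratio of the information in consecutive cycles compared against A.

```python
import numpy as np


def gaussian_mi(a, b):
    """Mutual information (in nats) between two column blocks.

    Assumption of this sketch: the data are treated as jointly Gaussian,
    so the estimate only needs log-determinants of sample covariances.
    """
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)

    def logdet(m):
        cov = np.atleast_2d(np.cov(m, rowvar=False))
        return np.linalg.slogdet(cov)[1]

    return 0.5 * (logdet(a) + logdet(b) - logdet(np.hstack([a, b])))


def select_embedding(future, candidates, A=0.97):
    """Greedy forward selection of candidate columns (hypothetical helper).

    future:     1-D array, the quantity to be predicted
    candidates: 2-D array, one candidate component per column
    A:          threshold of the assumed ratio-type stopping rule
    """
    candidates = np.asarray(candidates, dtype=float)
    remaining = list(range(candidates.shape[1]))
    selected = []
    info_prev = 0.0
    while remaining:
        # Maximum criterion: maximising I(future; selected + [c]) over c is
        # equivalent, by the chain rule, to maximising the conditional
        # mutual information I(future; c | selected).
        scores = [gaussian_mi(future, candidates[:, selected + [c]])
                  for c in remaining]
        best = int(np.argmax(scores))
        info_new = scores[best]
        # Assumed stopping rule: stop when the previously selected vector
        # already carries more than a fraction A of the new information.
        if info_new <= 0 or info_prev / info_new > A:
            break
        selected.append(remaining.pop(best))
        info_prev = info_new
    return selected
```

With a threshold close to 1, a candidate is rejected only when it adds almost nothing, which matches the description of larger A as the looser setting.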
The WM for time scale $s_i$ is then evaluated from the selected embedding vector $b_k(s_i)$ as in MIME [29]. We note that the WM information transfer between coupled time series is evaluated at the same time scale $s_i$ (i = 1, 2, ..., 64), which means that the WM does not evaluate cross-scale information transfer. This is a limitation of the method. The evaluation of cross-scale information transfer is left for the next stage of our research.

MORLET wavelet extension of PMIME (WPM)
WPM is the multivariate version of WM, which infers only the direct interactions. Without loss of generality, assume X, Y and Z are three arbitrary time series of length N, and $L_X$, $L_Y$, $L_Z$ are their maximum time lags. WPM first converts the X, Y and Z time series into MORLET wavelet coefficients [24], where the mother function and all other wavelet parameters are the same as for WM. WPM values are computed for every time scale. To compute the WPM of Y->X, a future embedding vector [22] of time horizon T is defined. Unlike in WM, the collective set of candidate components is multivariate and is defined by all time series in the system [22]. The initial selected embedding vector is again the empty vector $b_0(s_i) = \emptyset$. WPM follows the same progressive scheme and maximum criterion as WM; the only difference is the collective set of candidate components. WPM selects candidate components from all variables in the system rather than from two, which is what allows the inference of direct information transfer [22].
If the progressive scheme stops at the (k+1)-th iterative cycle, $b_k(s_i)$ is used as the final selected embedding vector. The WPM for Y->X at time scale $s_i$ is then evaluated as in PMIME [22,29] from $b^V_k(s_i)$, $b^W_k(s_i)$ and $b^U_k(s_i)$, which are the X, Y and Z components of $b_k(s_i)$, respectively. We also note that the WPM evaluates information transfer between wavelet coefficients of the same scale.

Bias correction by surrogate data
We use time-shifted surrogates [22,24,29][44][45][46][47][48][49][50] to test the significance of the results. Take WM as an example. Let $\{V_{s_i,\tau_X}\}$ and $\{W_{s_i,\tau_Y}\}$ denote the MORLET wavelet coefficients of two arbitrary time series X and Y, and let $WM_{X \to Y}(s_i)$ denote the WM information transfer from X to Y evaluated at time scale $s_i$. We fix $\{V_{s_i,\tau_X}\}$ and randomly permute the temporal indices of $\{W_{s_i,\tau_Y}\}$ [22,24,29,50] to obtain a surrogate of $\{W_{s_i,\tau_Y}\}$. Next, we apply the WM method to the original series $\{V_{s_i,\tau_X}\}$ and the surrogate series of $\{W_{s_i,\tau_Y}\}$; the results are denoted $WM_{X \to Y}(s_i, q)$, where q is the index of the surrogate. The bias-corrected WM for X->Y is then obtained from $WM_{X \to Y}(s_i)$ and the surrogate values $WM_{X \to Y}(s_i, q)$ following [24]. In what follows, we use $WM_{C,X \to Y}(s_i)$ to denote the bias-corrected WM information transfer from X to Y [22,24,29,50]. The bias-corrected WM for the inverse direction and the bias-corrected WPM and WTE are defined similarly. In all simulations, we use q = 10 surrogates [22,24,29,50].
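A minimal sketch of a surrogate-based correction of this kind is given below. The exact correction formula is not reproduced here, so the sketch assumes that the mean over the surrogates is subtracted from the raw value. The names `bias_corrected` and `lagged_r2` are hypothetical, and `lagged_r2` is only a toy stand-in for a directed measure such as WM.

```python
import numpy as np


def lagged_r2(v, w):
    """Toy directed measure (stand-in for WM): squared lag-1 correlation."""
    return float(np.corrcoef(v[:-1], w[1:])[0, 1] ** 2)


def bias_corrected(measure, v, w, n_surrogates=10, seed=0):
    """Raw measure minus the mean over permutation surrogates.

    v is kept fixed and the temporal indices of w are randomly permuted,
    as in the text. Subtracting the surrogate mean is an assumption of
    this sketch, not a formula taken from the paper.
    """
    rng = np.random.default_rng(seed)
    raw = measure(v, w)
    surrogate_values = [measure(v, rng.permutation(w))
                        for _ in range(n_surrogates)]
    return raw - float(np.mean(surrogate_values))


# Usage: w follows v with a delay of one step, so the corrected value
# stays large, while an unrelated series gives a value close to zero.
rng = np.random.default_rng(0)
v = rng.standard_normal(2000)
w = np.concatenate([[0.0], v[:-1] + 0.3 * rng.standard_normal(1999)])
coupled = bias_corrected(lagged_r2, v, w)
unrelated = bias_corrected(lagged_r2, v, rng.standard_normal(2000))
```

Passing the measure as a callable keeps the correction independent of whether WM, WPM or WTE is being corrected.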

Contrast information transfer
To obtain the dominance of interaction between coupled time series, we compute the contrast information transfer between paired variables [24]. For example, to analyze the dominance of interaction between two time series X and Y, we compute the contrast between the bias-corrected information transfer for X->Y and that for Y->X at each time scale.
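As a small illustration, the contrast can be sketched as a per-scale difference of the two bias-corrected directions. This difference form is an assumption, chosen because the Results read non-negative contrast values as support for X->Y and negative values as the opposite direction. The function names are hypothetical.

```python
import numpy as np


def contrast(transfer_xy, transfer_yx):
    """Per-scale contrast, assumed here to be the difference X->Y minus Y->X."""
    return np.asarray(transfer_xy, dtype=float) - np.asarray(transfer_yx, dtype=float)


def dominant_direction(transfer_xy, transfer_yx):
    """Label each time scale with the dominant direction of interaction."""
    c = contrast(transfer_xy, transfer_yx)
    return np.where(c > 0, "X->Y", np.where(c < 0, "Y->X", "none"))


# Usage with three hypothetical time scales.
labels = dominant_direction([0.3, 0.0, 0.1], [0.1, 0.0, 0.4])
```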

Results
In this section, we use six examples, including both synthetic data and real-world time series, to demonstrate the efficiency of WM and WPM. In these examples, various types of interactions are displayed, and all simulation studies are compared with the method of WTE.

Synthetic data
Synthetic data are model time series generated by equations. Examples include the Henon maps, linear autoregressive models, and Lorenz systems [22,29][50][51][52][53], which are indeterministic systems that are frequently used for time series analysis [22,29,50,53]. Here, we use four synthetic examples to demonstrate the efficiency of the proposed information transfer measures.
3.1.1 Cosine map. The cosine map consists of two unidirectionally coupled first-order autoregressive processes, one of which contains a cosine term [24]. In this model, $n^{(x)}_t$ and $n^{(y)}_t$ are independent zero-mean Gaussian random processes with variance $\sigma^2 = 2$, and $e \in [0,1]$ is the coupling strength that controls the linear interaction from $X_t$ to $Y_t$. The data of this example can be found in S1 Dataset.
The initial data are randomly generated from a normal distribution with zero mean and unit variance. Each data series contains $5 \times 10^4$ data points. For comparison, we use the same MORLET parameters for WM and WPM as recommended for WTE [24,40]: r = 0.125, $\omega_0$ = 6, $s_0$ = 0.5, V = 10, and n = 64.
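To make the wavelet setting concrete, the following sketch evaluates the MORLET mother function and computes coefficients on a grid of 64 scales. Several points are assumptions rather than statements from the paper: r is read as the step of a dyadic scale grid $s_i = s_0 2^{ir}$, the coefficients are normalized by $1/\sqrt{s}$, and the function names are hypothetical. The parameter V is not used here.

```python
import numpy as np


def morlet(eta, omega0=6.0):
    """Mother function pi^(-1/4) * exp(i*omega0*eta) * exp(-eta^2 / 2)."""
    eta = np.asarray(eta, dtype=float)
    return np.pi ** -0.25 * np.exp(1j * omega0 * eta) * np.exp(-0.5 * eta ** 2)


def scale_grid(s0=0.5, r=0.125, n=64):
    """Assumed dyadic grid of time scales: s_i = s0 * 2^(i * r), i = 0..n-1."""
    return s0 * 2.0 ** (r * np.arange(n))


def wavelet_coefficients(x, scales, omega0=6.0):
    """Wavelet coefficients by direct summation (fine for short series).

    Returns a complex array with one row per scale and one column per
    translation. The 1/sqrt(s) normalisation is a common convention and
    an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    out = np.empty((len(scales), len(x)), dtype=complex)
    for i, s in enumerate(scales):
        # kernel[tau, t] = conj(psi((t - tau) / s)) / sqrt(s)
        lag = (t[None, :] - t[:, None]) / s
        kernel = np.conj(morlet(lag, omega0)) / np.sqrt(s)
        out[i] = kernel @ x
    return out


# Usage: a pure oscillation with a period of 16 samples.
signal = np.cos(2.0 * np.pi * np.arange(256) / 16.0)
coefficients = wavelet_coefficients(signal, scale_grid())
peak_scale_index = int(np.argmax(np.abs(coefficients[:, 128]) ** 2))
```

The direct summation costs O(N^2) per scale, so for series as long as those used in the paper an FFT-based transform would be the practical choice.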
To analyze the effect of the stopping criterion, we compute the WM and WPM as functions of the threshold A. The threshold A represents the proportionality of the conditional mutual information between the current and the previous iterative cycles. In general applications [22,29,46], A is a value close to but no greater than 1 [22,29]. Larger values of A represent looser criteria, while smaller values of A imply more rigid criteria. In the MIME and PMIME studies, the usual choices are A = 0.95 for MIME and A = 0.97 for PMIME. These values were obtained from various simulation studies: they allow useful lagged values to be detected while preventing false positives from being included [22,29,46]. To study the impact of A on the evaluation of the information transfer, we vary A over 0.91, 0.93, 0.95, 0.97 and 0.99 and present the WM and WPM results (directional inference) in Table 1. Since the cosine map has only two processes, the multivariate measure WPM gives the same results as the bivariate measure WM. Table 1 shows that when the coupling strength satisfies $0.1 \le e \le 1$, both WM and WPM indicate the correct information transfer from X to Y. When e = 0, the coupling disappears and no flow of information is detected.
An example of the contrast WM results between X and Y (coupling strength e = 0.5) at different scales is shown in Fig 1. In this figure, different values of the stopping criterion A give similar contrast WM results between X and Y. However, the stopping criterion theoretically becomes looser as A increases. In the later Henon map analysis, we will see that A = 0.95 and A = 0.97 are both good choices, but A = 0.95 is slightly more rigid than A = 0.97 in the directional inference of indirect interactions; therefore we use A = 0.97 for both WM and WPM in all simulation studies.
In this example, we use the stopping criterion A = 0.97 and the referenced embedding parameters [22,29,[36][37][38]: T = 1 (time horizon, prediction time step) and Lmax = 5 (the maximum time lag). The WM, WPM and WTE information transfer values are filtered with the surrogate bias correction. Moreover, the contrast information transfer values between X and Y are computed for WM, WPM and WTE which are shown in Figs 2-4, respectively.
In these figures, the contrast information transfer values are plotted against time scale and coupling strength. All three measures infer the correct linear interaction X->Y, as shown by the non-negative surfaces of $O_{WM,X \to Y}$ (Fig 2), $O_{WPM,X \to Y}$ (Fig 3), and $O_{WTE,X \to Y}$ (Fig 4). The surfaces have a 'cosine' shape (a slowly increasing ridge or 'tail' rather than a peak well localized in scale), which nicely reflects the 'cosine' influence on the dynamics of the coupled system [24].
To analyze the influence of the stopping criterion, we take X1 and X3 as an example. Fig 6 shows the contrast WM between X1 and X3 for different stopping criteria and time scales. In this figure, the indirect interaction X1->X3 can only be detected when $A \ge 0.97$, and when A = 0.99, WM presents a false positive for X3->X1. Because WPM is a direct measure, it gives zero information transfer between X1 and X3 for all stopping criteria.
Similarly, we analyze the effect of the criterion on the direct interactions. We take the direct interaction X2->X3 as an example. The WM and WPM information transfer between X2 and X3 for different stopping criteria A is shown in Figs 7 and 8. These figures show that different stopping criteria give similar contrast values for both WM and WPM, and that the trends of the contrast WM and the contrast WPM are similar. This is because the interaction X2->X3 is direct, so WM and WPM can be expected to give similar results. We also note that when A = 0.99 the contrast WM takes negative values, which implies a false direction X3->X2. This may be because A is large and the stopping criterion becomes so loose that a false positive is detected.
For an overall view of the directional inference, the average information transfer values (over time scales) for the Henon maps are shown in Fig 9. In this figure, the average information transfer values are plotted as color-matrices; the correspondence between the colors and the information transfer values is shown in the color-bar. In the color-matrices, the directional inference of each lattice is read from the row channel to the column channel, e.g. the (1,2)-th lattice in the color-matrix represents the average information transfer for X1->X2. We can see that WM indicates all the correct directions of interaction, while WPM infers only the direct interactions. In this study, WTE only indicates a clear interaction X1->X2.

Alternatively, the vanishing information transfer may be due to full correlation between the coupled time series: if the time series are fully correlated, the system becomes deterministic and the information transfer between the wavelet coefficients becomes zero. Signal correlation at certain common frequencies [24][39][40][41][42][43] is also a possible reason for the vanishing causalities.
The bias-corrected information transfer values are computed for each direction. The WM and WPM information transfer values for X1->X2 are similar, and all three measures present the correct interaction X1->X2. We take the direction X1->X3 as an example. The information transfer for both X1->X3 and X3->X1 is presented in Fig 13. In this figure, all three measures detect the correct nonlinear interaction X1->X3. This can also be seen from the plot of the contrast information transfer between X1 and X3 in Fig 14. For the interaction between X2 and X3, the contrast information transfer is shown in Fig 15.

3.1.4 Three coupled Lorenz systems with nonlinear couplings. The three coupled Lorenz systems with nonlinear couplings $X_1 \to X_2$ and $X_2 \to X_3$ are defined as in [50], where k (k = 0, 1, 3, 5) is the coupling strength that regulates the interactions $X_1 \to X_2$ and $X_2 \to X_3$. All time series become completely synchronized when the coupling strength $k \ge 8$.

Real world time series
In this section, we use two real-world data examples to demonstrate the analysis with WM and WPM. The examples include a set of EEG data measured in experiments and a financial data set observed in a real market.

The reading experiment.
The reading experiment involves a reader and a listener whose EEG data are recorded while the reader reads a short story to the listener. This experiment has been reported in [26,38] for information flow tests. The EEG data comprise 10 channels for each participant, recorded from 10 international standard electrodes [26,38] at 100 Hz. The EEG time series are typically nonlinear and non-stationary [2][3][4][5]. The reader and the listener together form a "driver-responder" system. Here, we use WM and WPM to test the information transfer in the EEG data. The data of the reading experiment can be found in S5 Dataset.

[Figure caption residue: By comparing the strength between X1 and X3, only WM identifies the indirect interaction X1->X3; WPM is a direct measure and presents zero information transfer between X1 and X3. WTE presents almost zero information transfer between X1 and X3, with slightly higher values for X3->X1 than for X1->X3.]

To view the instantaneous dynamics of the system, the entire data set is split into equally spaced time windows of 4 seconds [26,38]. The entire data set contains 30 such time windows; we use the 20 consecutive intermediate windows (from the 6th to the 25th) to demonstrate the analysis. WM and WPM are expected to detect the directed interaction from the reader to the listener [26,38]. Fig 25 shows the average values of information transfer (over windows and scales) for the 20 EEG channels of the reader and the listener. The color lattices indicate the magnitudes of the average information transfer values, whose direction is read from the row channel to the column channel. In this figure, the 20x20 color-matrix is divided into two 10x10 diagonal blocks showing the "intra-brain" interactions within each participant and two 10x10 off-diagonal blocks showing the "cross-brain" interactions between the participants.

Here, both WM and WPM indicate information transfer from the reader to the listener, because the upper-right block (Reader->Listener) is slightly brighter than the lower-left block (Listener->Reader). The color-matrix of WTE is nearly symmetric, so a clear direction is hard to detect.
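The windowing and the block comparison described above can be sketched as follows. The sketch assumes that 100 Hz is the sampling rate, that the windows do not overlap, and that the 10 reader channels occupy the first 10 rows and columns of the matrix (consistent with Reader->Listener being the upper-right block). The function names are hypothetical.

```python
import numpy as np


def split_windows(data, fs=100, seconds=4):
    """Split (channels, samples) data into consecutive non-overlapping windows.

    Returns an array of shape (windows, channels, samples_per_window).
    """
    data = np.asarray(data)
    width = fs * seconds
    n_windows = data.shape[1] // width
    trimmed = data[:, :n_windows * width]
    return trimmed.reshape(data.shape[0], n_windows, width).swapaxes(0, 1)


def cross_brain_means(matrix, n_reader=10):
    """Mean of the two off-diagonal blocks of a row->column transfer matrix."""
    m = np.asarray(matrix, dtype=float)
    reader_to_listener = m[:n_reader, n_reader:].mean()  # upper-right block
    listener_to_reader = m[n_reader:, :n_reader].mean()  # lower-left block
    return reader_to_listener, listener_to_reader


# Usage: 20 channels and 30 windows of 400 samples; keep windows 6 to 25.
eeg = np.arange(20 * 12000).reshape(20, 12000)
windows = split_windows(eeg)
middle = windows[5:25]
```

Comparing the two block means gives a single-number counterpart of the visual comparison of block brightness in the color-matrix.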
To analyze the directed interaction between the reader and the listener, we plot the contrast information transfers for Reader->Listener.

To study the dynamics at different time scales, we also compute the contrast information transfer for the three measures, taking the German bond futures as an example. In Fig 32, WM indicates clear interactions GER5->GER2 (black) and GER5->GER10 (red), and a fluctuating interaction between GER10 and GER2 (blue). WPM (Fig 33) also indicates clear interactions GER5->GER2 (black) and GER5->GER10 (red), as well as GER2->GER10 (blue). WTE (Fig 34) indicates GER5->GER10 (red), GER2->GER10 (blue) and GER2->GER5 (black).

We also examine the information transfer in each direction separately. An example for the German bond futures is shown in Fig 35. In this figure, the three graphs separately show the directed information transfer between the three German bond futures. Both WM and WPM give higher information transfer values for GER5->GER10; WTE shows some positive values for GER5->GER10, but they are so small that they almost vanish. For the interaction between GER10 and GER2, both WM and WPM indicate dominant information flow GER2->GER10 at middle time scales, while WTE again shows only subtle information transfer that approximately vanishes. In the third graph, both WM and WPM show dominant information flow GER5->GER2, while WTE fails to identify the interaction.
The overall results for the directed interactions between the bond futures agree with the outcomes found by MIME [55]. The difference is that WM and WPM can interpret the directed interaction at multiple scales. Since the scales are related to frequencies, one can use these multi-scale measures to identify interactions or time series correlations at specific scales or frequencies. The multi-scale interpretation of information transfer may thus help to extract more information from a given dataset.
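The relation between scale and frequency can be made explicit with a short sketch. It uses a widely used conversion for the Morlet wavelet, in which the equivalent Fourier period of scale s is $4\pi s / (\omega_0 + \sqrt{2 + \omega_0^2})$. This relation is not stated in the paper and is offered only as an illustration; the function name and the sampling interval `dt` are assumptions.

```python
import numpy as np


def morlet_scale_to_frequency(scales, omega0=6.0, dt=1.0):
    """Equivalent Fourier frequency for Morlet wavelet scales.

    scales are given in samples and dt is the sampling interval in
    seconds, so the returned frequencies are in Hz.
    """
    scales = np.asarray(scales, dtype=float)
    fourier_period = 4.0 * np.pi * scales * dt / (omega0 + np.sqrt(2.0 + omega0 ** 2))
    return 1.0 / fourier_period


# Usage: a scale of 10 samples for data sampled at 100 Hz (dt = 0.01 s).
frequency_hz = float(morlet_scale_to_frequency(10.0, omega0=6.0, dt=0.01))
```

For $\omega_0 = 6$ the equivalent Fourier period is about 1.03 times the scale, so scale and period are nearly interchangeable in this setting.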

Discussion
In this paper, we propose two multi-scale information transfer measures, the WM and WPM, which extend conditional mutual information measures by means of the MORLET wavelet. A series of simulation studies shows that the two measures are efficient and accurate in directional inference, are computationally efficient, and outperform the wavelet extension of transfer entropy (WTE) in various situations. In particular, the two measures are very useful in real-world data analysis.
The two proposed measures have several advantages. By using the wavelet, they are able to infer directed interactions at multiple scales. This not only helps to discover more information between coupled time series, but also addresses the problem of non-stationary and discontinuous data analysis [24][39][40][41][42][43]. One reason for using the MORLET wavelet in the extensions is that it is believed to be closely related to human perception and has vital applications in medicine [39][40][41][42][43], so the measures may find wide application in medical and biological data analysis, e.g. EEG data analysis. The other reason is that M. Lungarella and A. Pitti have successfully introduced the MORLET wavelet to transfer entropy, which proved feasible for information transfer detection in non-stationary and discontinuous data [24]. Since real-world time series are sometimes non-stationary and discontinuous, a wavelet extension suited to practical data analysis is necessary.
Because they are based on conditional mutual information measures, the proposed multi-scale measures can be applied to a wide range of data models. Information transfer measures are a very useful type of measure. Although information transfer is a different concept from causal effect [31][32][33][34], the efficiency of directional inference and the model-free nature of information transfer measures ensure their wide applicability in various types of data analysis, particularly real-world data analysis [26]. Transfer entropy (TE) is a fundamental information transfer measure and a good prototype for other derived information measures. The work by M. Lungarella and A. Pitti [24] presents the wavelet extension of TE for non-stationary and discontinuous data analysis. However, WTE uses a uniform state-space embedding vector [25][26][27], which is computationally redundant and requires long computation times. This drawback not only affects the accuracy of information flow detection but also prevents WTE from being applied to large datasets.
To avoid the computational redundancy and improve speed and accuracy, we apply the wavelet extension to two conditional mutual information measures based on mixed embeddings [22,29]. The two prototype measures are MIME and PMIME, the latter being the direct version of the former. These two measures have proved accurate and efficient in various data analyses [22,26,29][35][36][37][38]. They use a progressive scheme and a stopping criterion to include only useful embedding components and to prevent false causalities [22,29]. The stopping criterion acts as a threshold that balances the inclusion of useful lagged elements against the exclusion of useless ones [22,26,29,38,46]. By using the selected non-uniform state-space embedding vectors, MIME and PMIME exclude redundant components from the evaluation of information transfer, which both preserves accuracy and removes the computational redundancy.
In the simulation studies, the parameters are taken from earlier studies [22,24,29,38,50]. Other parameter values are plausible but depend on the specific application. In this paper, we particularly studied the influence of the stopping criterion on information transfer detection. The stopping criterion A is the proportionality of the conditional mutual information rate between the current and the previous iterative cycles. Appropriate choices of this criterion help to ensure that true interactions are identified and false positives are prevented. From the simulation studies, we found that A = 0.97 and A = 0.95 are good choices for the stopping criterion. The value of A should be neither too large nor too small: too large a value, for instance A > 0.97, causes false interactions to be detected, while too small a value, for instance A < 0.95, makes the criterion so rigid that it often fails to identify interactions that truly exist. However, the choice of the stopping criterion may also depend on the datasets to be analyzed.
By definition, both WM and WPM evaluate the information transfer between wavelet coefficients of the same scale rather than of different scales. This may be a limitation of the two measures. The detection of information transfer across scales will be the subject of our next study.
In the synthetic data analysis, we note that the information transfer declines to zero at large time scales. Several reasons can explain this phenomenon. One reason is that, as the time scale grows, the frequency and resolution decrease; the details of the time series are smeared out and the directed interaction becomes too weak to be detected. Another reason is the characteristic correlation time of the theoretic maps [51,52,54]. When the time scale exceeds the characteristic correlation time, the wavelet coefficients of the coupled time series become fully correlated and the system becomes deterministic; information transfer measures such as MIME and PMIME vanish for deterministic systems [22,29,50]. A further reason is that the time scale threshold may correspond to a frequency at which the two time series have large mutual information; the wavelet coefficients are correlated at this frequency and the system becomes deterministic. This concept of correlation is one of the key features of the wavelet [24][39][40][41][42][43] that makes WM and WPM special in this case. We venture the hypothesis that WM and WPM may be able to infer the cross-correlation time between time series and to identify the common frequencies at which signals are correlated.

Conclusion
In this paper, we have proposed two multi-scale information transfer measures, the WM and WPM, which are the MORLET wavelet extensions of the measures based on conditional mutual information from mixed embeddings. Both measures are model-free and accurate in detecting information transfer in various datasets. By using non-uniform state-space embeddings, both WM and WPM are computationally efficient and outperform WTE in both accuracy and speed. Owing to the nature of the wavelet, the proposed measures may find wide application, including in non-stationary and discontinuous data analysis.