Inferring Weighted Directed Association Network from Multivariate Time Series with a Synthetic Method of Partial Symbolic Transfer Entropy Spectrum and Granger Causality

Complex network methodology is very useful for exploring complex systems. However, the relationships among the variables of a complex system are usually not known. Inferring association networks among variables from their observed data has therefore become a popular research topic. We propose a synthetic method, named the small-shuffle partial symbolic transfer entropy spectrum (SSPSTES), for inferring an association network from multivariate time series. The method combines surrogate data, partial symbolic transfer entropy (PSTE) and Granger causality. Proper threshold selection is crucial for common correlation identification methods and is not easy for users. The proposed method not only identifies strong correlations without selecting a threshold, but also quantifies correlations and identifies their direction and temporal relation. The method consists of three layers, i.e. a data layer, a model layer and a network layer. In the model layer, the method identifies all possible pair-wise correlations. In the network layer, we introduce a filter algorithm that removes indirect and weak correlations and retains strong correlations. Finally, we build a weighted adjacency matrix, the value of each entry representing the correlation level between a pair of variables, and obtain the weighted directed association network. Two numerical examples, one from a linear system and one from a nonlinear system, illustrate the steps and performance of the proposed approach. The ability of the proposed method is finally demonstrated in an application.


Problem Statement
Association networks are found in many domains, such as networks of citation patterns across scientific articles [1][2][3], networks of actors co-starring in movies [4][5][6], networks of regulatory influence among genes [7,8], and networks of functional connectivity between regions of the brain [9,10]. The rules defining edges in association networks differ across domains. Villaverde et al. [19] reviewed some of the existing information-theoretic methodologies for network inference and clarified their differences.
In addition, approaches rooted in Bayesian Networks (BN) employ probabilistic graphical models to infer causal relationships between variables. Aliferis et al. [20] presented an algorithmic framework for learning local causal structure around target variables of interest, in the form of direct causes/effects and Markov blankets, applicable to very large data sets with relatively small samples; the selected feature sets can be used for causal discovery and classification. Dondelinger et al. [21] introduced a novel information sharing scheme to infer gene regulatory networks from multiple sources of gene expression data, and illustrated and tested the method on a set of synthetic data, using three different measures to quantify network reconstruction accuracy. In a review paper, Lian et al. [22] first discussed the evolution of molecular biology research from reductionism to holism, then gave a brief insight into various computational and statistical methods used in GRN inference before reviewing the current development and applications of DBN-based methods.
Granger causality (GC) is also a very popular tool for association network inference. It can assess the presence of directional association between two time series in a multivariate data set. GC was introduced originally by Wiener [23], and later formalized by Granger [24] in terms of linear vector autoregressive (VAR) modeling of multivariate stochastic processes. Tilghman and Rosenbluth [25] presented Granger causality as a method for inferring communication links among a collection of wireless transmitters from externally measurable features; their link inference method is applicable to the link topology of broad classes of wireless networks, regardless of the nature of the Medium Access Control (MAC) protocol used. Cecchi et al. [9] presented a scalable method, based on Granger causality analysis of multivariate linear models, to compute the structure of causal links over large-scale dynamical systems; it achieves high efficiency in discovering actual functional connections and was shown to handle autoregressive models of more than 10,000 variables. Schiatti et al. [26] compared GC with a novel measure, termed extended GC (eGC), able to capture instantaneous causal relationships. The practical estimation of eGC uses a two-step procedure, first detecting the existence of zero-lag correlations, and then assigning them to one of the two possible causal directions using pairwise measures of non-Gaussianity.
Of course, there are many more methods for association network inference that we have not mentioned above, such as neural networks [27], SparCC [28], the S estimator [29,30], the Maximal Information Coefficient (MIC) [31], Local Similarity Analysis (LSA) [32,33], and so on. They have all shown excellent performance in experiments and observations.
Although each of the abovementioned approaches has advantages demonstrated in different settings, none is suitable for every network inference problem. Because each strategy makes different assumptions, each has different strengths and limitations and highlights complementary aspects of the network. In this paper, we aim at inferring a weighted directed association network from multivariate time series, and the abovementioned methods do not meet this requirement well. For instance, some of these popular tools are non-directional, e.g. correlation or partial correlation, mutual information measures and Bayesian networks, so these measures cannot support directed association network inference [34]. Granger causality is able to detect asymmetry in the interaction; its limitation, however, is that the model must be appropriately matched to the underlying dynamics of the examined system, otherwise model misspecification may lead to spurious causalities [35]. Some of the proposed methods, such as basic correlation, mutual information and Bayesian networks, cannot distinguish direct from indirect relationships. Some mainly deal with linear problems, e.g. Pearson and Spearman correlation, and are not appropriate for nonlinear problems.

Primary Contribution of This Work
To address the issues mentioned above, we propose an approach called the small-shuffle partial symbolic transfer entropy spectrum (SSPSTES). This work faces five challenges: 1. Non-stationary and continuous time series: It is very important that the time series be statistically stationary over the period of interest, which can be a practical problem for transfer entropy calculations [36]. In addition, it is problematic to calculate transfer entropy on continuous-valued time series. We therefore resort to an extension of transfer entropy, i.e. symbolic transfer entropy.
2. Threshold selection: Many current methods, e.g. correlation coefficient, mutual information and transfer entropy, decide whether an edge exists between two time series by threshold selection. If a larger threshold is selected, many real correlations are lost and the resulting network is sparse. By contrast, if a smaller threshold is selected, many spurious relationships appear and the resulting network is dense. Although there is much research on threshold selection, it is still difficult for a user to select a proper threshold when inferring an association network. The proposed method offers a solution to this problem.
3. Strong relationship identification: In general, we are more interested in strong correlations than in weak ones. Because the relationships among the variables are unknown, strong correlations are more convincing, whereas weak correlations have a greater probability of misidentification, which may bring serious consequences. In addition, a strong correlation is usually a direct rather than an indirect relation, which is what we expect in the inference of an association network.
4. Direction and quantity of influence: The direction of an edge is crucial for network prediction and evolution. The proposed method should therefore be able to detect and quantify the directional influence that one variable exerts on another.

5. Temporal relation identification: The proposed method should have some ability to detect the specific temporal relation based on time lags, namely the functional relation in time.
In the next section, we propose a method for inferring an association network from multivariate time series, with emphasis on how to solve the five challenges mentioned above. Section 3 applies the proposed method to two numerical examples whose coupled relationships are known and whose values are time-varying. We summarize the results of this paper and point out some topics for further study in Section 4.

Methods
In this section, we explain the proposed approach in detail. We first present an integrated framework of the approach, and then give a detailed description built around that framework.

Main Principle
The approach designed for association network inference takes both exploration and application into account, so as to minimize human intervention during modelling. The approach therefore starts with inputting data and ends with outputting a network inferred from the multivariate time series; the modelling process is transparent to users. The main principle of the proposed approach is shown in Fig 1. The integrated framework has three layers. The first layer, the Data Layer, is the interface for interaction with users. One task of this layer is to input the original multivariate time series and the modelling parameters; the other is to shuffle the original data several times with a surrogate data method. The most important and complicated layer of the framework is the second one, the Model Layer. In this layer we identify all the possible relationships among the multivariate time series. To achieve this goal, the core tasks are time series symbolization, partial symbolic transfer entropy calculation and spectrum construction. The output of this layer is the set of candidate relationships. The task of the last layer is to construct a weighted directed network. In order to retain only the strong correlations, the candidate relationships are filtered: indirect correlations are removed by the DPI (Data Processing Inequality) [37], and bidirectional correlations are resolved by an empirical criterion. In the inferred association network, the start node of an arrowed edge represents a drive variable and the end node its corresponding response variable. The weight of an edge quantifies the correlation between two nodes, i.e. time series variables.
As shown in Fig 1, there are seven key processing operations, represented by rounded rectangles, that accomplish association network inference; we introduce these seven steps one by one in the rest of this section. A black solid arrowed line in the flow diagram represents a determined sequential process, and a blue dashed arrowed line, annotated with a Boolean condition, represents a potential process that is carried out when the condition evaluates to false. Each rounded rectangle represents a key processing operation using a specific method and each hexagon represents a staged result.

Small-Shuffle Surrogate Data Method
The technique of surrogate data analysis is a randomization test method [38]. Given time series data, surrogate time series are constructed consistent with the original data and some null hypothesis. The random-shuffle surrogate (RSS) method proposed in [38] can test whether data can be fully described by independent and identically distributed random variables. As summarized in [38,39], the limitation of the RSS method is that it destroys any correlation structure in the data: both the short-term relationships and the long-term trend relationships between two variables are destroyed. The RSS method assumes global stationarity and performs a pairwise linear decoupling between channels, but in many typical examples the individual channels are also influenced by other non-stationary variation. We therefore prefer to use the small-shuffle surrogate (SSS) method proposed in [39][40][41].
The SSS method destroys local structures or correlations in irregular fluctuations (short-term variabilities) and preserves global behaviors by shuffling the data index on a small scale. The steps of the SSS method are as follows.
Let the original data be x(t), let i(t) be the index of x(t) (so that x(i(t)) = x(t)), let g(t) be Gaussian random numbers, and let s(t) be the surrogate data.
i. Shuffle the index of x(t): i'(t) = i(t) + A g(t), where A is an amplitude.
ii. Sort i'(t) by rank order and let the index of i'(t) be i^(t).
iii. Obtain the surrogate data: s(t) = x(i^(t)).
Parameter A reflects the extent of data shuffling. A higher value of A results in more difference between the surrogate data and the original data; conversely, the smaller the value of A, the smaller the difference. The parameter A is input at the beginning of the method and its empirical value is 1.0.
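The steps above can be sketched in a few lines of Python (a minimal illustration using plain lists and the standard library; the function name is ours):

```python
import random

def small_shuffle_surrogate(x, A=1.0, seed=None):
    """Small-shuffle surrogate: perturb each index i(t) = t by A * g(t)
    with Gaussian g(t), then read the data out in the rank order of the
    perturbed indices. Short-term structure is destroyed while the
    global trend is preserved."""
    rng = random.Random(seed)
    # step i: i'(t) = i(t) + A * g(t)
    perturbed = [(t + A * rng.gauss(0.0, 1.0), t) for t in range(len(x))]
    # step ii: sort i'(t) by rank order to obtain the shuffled index
    shuffled_index = [t for _, t in sorted(perturbed)]
    # step iii: s(t) = x(shuffled_index(t))
    return [x[t] for t in shuffled_index]
```

Because each index moves only by roughly A positions, the surrogate is a small local reshuffle of the original series rather than a full permutation.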

Time Series Symbolization
The technique of time series symbolization was introduced with the concept of permutation entropy [42,43]. This technique has advanced many other lines of research on time series and produced new techniques, e.g. permutation entropy [42] and symbolic transfer entropy (STE) [43]. It is helpful for dealing with continuous and non-linear time series. The principle of time series symbolization is as follows. For the original multivariate time series, let two time series V1, V2 be {v1,t}, {v2,t} respectively, t = 1,2,...,k. The embedding parameters used to form the reconstructed vectors of the time series V1 are the embedding dimension m1 and the time delay τ1; accordingly, m2 and τ2 are the embedding parameters defined for V2. The reconstructed vector of V1 is defined as v1,t = [v1,t, v1,t+τ1, ..., v1,t+(m1-1)τ1], where t = 1,2,...,k' and k' = k - max((m1-1)τ1, (m2-1)τ2). For each vector v1,t, the ranks of its components assign a rank-point v^1,t = [r1,t, r2,t, ..., rm1,t], where rj,t ∈ {1,2,...,m1} for j = 1,2,...,m1; v^2,t is defined accordingly.
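As an illustration (our own minimal sketch), the rank-point of each embedding vector can be computed as follows:

```python
def rank_vectors(series, m=3, tau=1):
    """Symbolize a scalar series: each embedding vector
    [v_t, v_{t+tau}, ..., v_{t+(m-1)tau}] is replaced by the ranks of
    its components (1 = smallest), giving one rank-point per time index."""
    k_prime = len(series) - (m - 1) * tau
    points = []
    for t in range(k_prime):
        window = [series[t + j * tau] for j in range(m)]
        order = sorted(range(m), key=lambda j: window[j])
        ranks = [0] * m
        for r, j in enumerate(order, start=1):
            ranks[j] = r          # component j has rank r within the window
        points.append(tuple(ranks))
    return points
```

For example, `rank_vectors([2.4, 1.1, 3.0, 0.5], m=3, tau=1)` yields `[(2, 1, 3), (2, 3, 1)]`: the first window [2.4, 1.1, 3.0] has its middle component smallest, hence rank-point (2, 1, 3).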

Partial Symbolic Transfer Entropy Calculation with Different Time Lags
Symbolic transfer entropy means that the transfer entropy calculation is based on the symbolic time series data of section 2.3. Symbolic transfer entropy is defined as follows [43]:
STE(V2→V1)(τ) = Σ p(v^1,t+τ, v^1,t, v^2,t) log [ p(v^1,t+τ | v^1,t, v^2,t) / p(v^1,t+τ | v^1,t) ],
where τ is the time delay, and p(v^1,t+τ, v^1,t, v^2,t), p(v^1,t+τ | v^1,t, v^2,t) and p(v^1,t+τ | v^1,t) are the joint and conditional distributions estimated on the rank vectors as relative frequencies, respectively. Symbolic transfer entropy uses a convenient rank transform to estimate the transfer entropy on continuous data without the need for kernel density estimation. Since slow drifts do not have a direct effect on the ranks, it still works well for non-stationary time series [34].
The partial symbolic transfer entropy (PSTE) [34] is defined by conditioning on the set of the remaining time series z = {v3, v4, ..., vn}:
PSTE(V2→V1)(τ) = Σ p(v^1,t+τ, v^1,t, v^2,t, z^t) log [ p(v^1,t+τ | v^1,t, v^2,t, z^t) / p(v^1,t+τ | v^1,t, z^t) ],
where the rank vector z^t is defined as the concatenation of the rank vectors of the embedding vectors of the time series in z. Partial symbolic transfer entropy is similar to partial correlation: it can eliminate some of the indirect correlation and retain the pure, direct information flow between v2 and v1. Because the time delay is not known in advance, the partial symbolic transfer entropy is calculated once for each candidate time delay for each pair of time series. This process is described by algorithm 1 shown in Box 1. We first use algorithm 1 to obtain a list of symbolic transfer entropy matrices on the original time series. Then we shuffle the original data the number of times specified at the beginning of the method and repeat algorithm 1 on each shuffled data set.
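A plug-in estimate of the bivariate STE (the partial variant additionally conditions on z^t) can be sketched as follows; this is our own illustrative implementation, not the paper's algorithm 1:

```python
from collections import Counter
from math import log2

def symbolic_transfer_entropy(sym_x, sym_y, delay=1):
    """Estimate STE_{Y->X}(delay) from aligned rank-vector sequences:
    the extra information Y's rank vectors provide about X's future rank
    vector beyond X's own past, with all probabilities taken as relative
    frequencies of the observed symbol combinations."""
    triples = [(sym_x[t + delay], sym_x[t], sym_y[t])
               for t in range(len(sym_x) - delay)]
    n = len(triples)
    n_fxy = Counter(triples)                        # (future, x past, y past)
    n_fx = Counter((f, x) for f, x, _ in triples)   # (future, x past)
    n_xy = Counter((x, y) for _, x, y in triples)   # (x past, y past)
    n_x = Counter(x for _, x, _ in triples)         # x past alone
    te = 0.0
    for (f, x, y), c in n_fxy.items():
        # the ratio p(f|x,y) / p(f|x) rewritten with joint counts
        te += (c / n) * log2(c * n_x[x] / (n_fx[(f, x)] * n_xy[(x, y)]))
    return te
```

When Y carries no information about X's future (e.g. a constant symbol sequence), every log term vanishes and the estimate is exactly zero.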

Partial Symbolic Transfer Entropy Spectrum Composition
The Partial Symbolic Transfer Entropy Spectrum (PSTES) is defined as follows: the PSTES between time series Y and X is composed of their partial symbolic transfer entropy curves drawn in a common rectangular coordinate system. The horizontal axis represents the time delay and the vertical axis the transfer entropy. One of the curves results from the original data and the other curves result from the shuffled data.
Let L^o(Y→X) be the transfer entropy curve of the original data and L^s(Y→X) be a transfer entropy curve of shuffled data; the PSTES between Y and X is then the collection of L^o(Y→X) together with all curves L^s(Y→X). In order to compose the transfer entropy spectrum, we must understand the structure of the output of section 2.4, which is a nested list of PSTE matrices. For each data set, original or shuffled, a list of PSTE matrices with different delays is returned by algorithm 1; thus, over all data sets, the result of the last step is a list of PSTE matrix lists. The parameters input at the beginning of the method are the maximum time delay tm and the number of shuffles sm. Let tm = 10 and sm = 99; then the output of the last step is a list of 100 elements and each element is a list of 10 transfer entropy matrices. Moreover, each entry of a transfer entropy matrix reflects the correlation strength of a pair of time series. Thus, according to the definition of the PSTES, we first split the output of section 2.4 into pieces and then compose the partial symbolic transfer entropy spectrum accordingly.
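Given the nested output described above (one list of delay-indexed matrices per data set, the original first), composing the spectrum for one ordered pair is a simple slicing task; a minimal sketch with our own naming:

```python
def compose_spectrum(matrix_lists, i, j):
    """Extract the PSTE-vs-delay curve of the pair (i, j) from every data
    set. matrix_lists[0] is the list of matrices (one per delay) for the
    original data; matrix_lists[1:] are those for the shuffled data.
    Returns (original_curve, surrogate_curves)."""
    curves = [[m[i][j] for m in matrices] for matrices in matrix_lists]
    return curves[0], curves[1:]
```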

Correlation Identification and Filter
Candidate relationship identification. The target of the proposed method is the identification of strong correlations, not of all correlations among the multivariate time series. The intended scenario is that we do not know the relationships within the complex system, and we pay more attention to the precision of correlation identification than to its sensitivity, because the misidentification of relationships among variables may have serious consequences for subsequent data analysis.
Our decision on whether a strong correlation exists between two variables is made from the characteristics of the PSTES. This decision is based on the theory of hypothesis testing, which is often used with surrogate data methods [30,34,38,41]. A discriminating statistic is necessary for a surrogate data hypothesis test: the cross correlation and average mutual information were selected as discriminating statistics in [40,41], and partial symbolic transfer entropy in [34]. In this paper, we take transfer entropy as the discriminating statistic. The surrogate data method also needs a null hypothesis. Applying a statistical hypothesis test can result in two outcomes: the null hypothesis is rejected or it is not. There are two types of error: if the null hypothesis is rejected although it is true, this is a type I error; if we fail to reject the null hypothesis when it is in fact false, this is a type II error. The null hypothesis in our proposed method is that there is no short-term correlation structure between the data, i.e. that the irregular fluctuations are independent. In the symbolic transfer entropy spectrum, if the symbolic transfer entropy curve of the original data falls outside the distribution of the SSS data, i.e. there exists a point on it whose value is greater than the value of any surrogate point, we reject the null hypothesis. In that case we consider that there is a short-term correlation structure between the data and that this correlation is strong. Otherwise, we accept the null hypothesis and consider that there is no strong correlation between the data.
The output of this step is an adjacency matrix whose entry a_ij is defined as
a_ij = 1 if max_t PSTE^o(i→j)(t) > max_{s,t} PSTE^s(i→j)(t), and a_ij = 0 otherwise,
where t ∈ {1,2,...,tm}, s ∈ {1,2,...,sm}, PSTE^o(i→j)(t) is the partial symbolic transfer entropy from variable i to variable j with time delay t computed on the original data, and PSTE^s(i→j)(t) is the partial symbolic transfer entropy from i to j, over all time delays, computed on the s-th shuffled data set.
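The rejection rule then reduces to a comparison of maxima over the spectrum (again our own sketch):

```python
def candidate_edge(original_curve, surrogate_curves):
    """a_ij = 1 iff some point of the original PSTE curve exceeds every
    value attained by all surrogate curves over all delays."""
    surrogate_max = max(v for curve in surrogate_curves for v in curve)
    return 1 if max(original_curve) > surrogate_max else 0
```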
Relationship filter. In order to retain only the strong correlations, the candidate relationships are filtered. Three ideas are synthesized into the filter method to deal with indirect correlations.
The first component of the filter method is the DPI (Data Processing Inequality) [37]. The data processing inequality of information theory states that, given random variables X, Y and Z forming a Markov chain X -> Y -> Z, the mutual information between X and Y is greater than or equal to the mutual information between X and Z; likewise, the mutual information between Y and Z is greater than or equal to that between X and Z. PSTE is an extension of mutual information, so we remove indirect relations as follows:
IF PSTE(X→Y) ≥ PSTE(X→Z) AND PSTE(Y→Z) ≥ PSTE(X→Z), THEN the relationship between X and Z is removed.
Second, bidirectional correlations are resolved by an empirical criterion. Third, although PSTE measures the correlation of variation trends, it does not measure the correlation of values; as a complementary method, we introduce Granger causality, which is based on the residuals of a linear model. After this step, we obtain the final 0-1 adjacency matrix. If a_ij = 1, the relationship between i and j is called a strong relationship.
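A sketch of the DPI filter (our own illustrative code; `pste` maps an ordered pair to its maximum PSTE over delays):

```python
def dpi_filter(pste, edges):
    """Remove likely-indirect edges by the Data Processing Inequality:
    whenever X -> Y -> Z exists with PSTE(X->Y) >= PSTE(X->Z) and
    PSTE(Y->Z) >= PSTE(X->Z), the weakest link X -> Z is dropped."""
    nodes = {a for edge in edges for a in edge}
    kept = set(edges)
    for (x, z) in sorted(edges):
        for y in nodes - {x, z}:
            if ((x, y) in kept and (y, z) in kept
                    and pste[(x, y)] >= pste[(x, z)]
                    and pste[(y, z)] >= pste[(x, z)]):
                kept.discard((x, z))
                break
    return kept
```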

Association Network Inference
The association network inferred from multivariate time series can be denoted as G = (V,E). Here V = {v1,v2,...,vn} is the set of vertices, i.e. the time series variables, and E is the set of edges, i.e. the strong correlations between pairs of vertices in V identified in section 2.6.
From the 0-1 adjacency matrix of the last step, we have determined the directions in the network. In this step, we assign a weight to each edge in E. The selected measure for the weight is the corresponding maximum partial symbolic transfer entropy of the original data calculated in section 2.4, so Eq (6) is transformed into
w_ij = max_t PSTE^o(i→j)(t) if a_ij = 1, and w_ij = 0 otherwise, (7)
where i is the drive variable and j is the response variable. Finally, we can plot the association network based on the weighted adjacency matrix of Eq (7) and carry out further network analysis.
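Combining the 0-1 matrix with the per-pair PSTE curves of the original data, the weighting step amounts to (our own sketch):

```python
def weighted_adjacency(adj, pste_curves):
    """The weight of a kept edge i -> j is the maximum PSTE of the original
    data over all delays; entries of absent edges stay 0.
    pste_curves[i][j] is the delay-indexed PSTE curve for the pair (i, j)."""
    n = len(adj)
    return [[max(pste_curves[i][j]) if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]
```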

Results
In this section, we demonstrate the application of the proposed method to simulated time series data from two types of complex system, i.e. a linear system and a nonlinear system. The relationships among the variables in these two examples are known, and we can therefore assess our method with several measures. In all the following cases, the modelling parameters of the SSPSTES method are shuffling amplitude A = 1.0, symbolic dimension m = 3, maximum time delay tm = 10, maximum shuffling times sm = 99, and time points t = 1,2,...,1000. These parameters are input in the Data Layer shown in Fig 1.

Numerical Example from linear system

First, we apply our method to a linear system with five time series variables, i.e. x1(t), x2(t), x3(t), x4(t), x5(t). The relationships among these variables are modelled by the following expressions [41]:

x2(t) = 20 + 0.6x2(t-1) - 0.4x2(t-6) + r2(t), (10)
x3(t) = 2.2 + 0.2x1(t-2) + 0.5x3(t-1) + 0.3x4(t-9) + r3(t), (11)
x4(t) = 1.5 + 0.7x1(t-2) + 0.3x4(t-1) + r4(t), (12)
x5(t) = 10 + 0.9x4(t-4) + 0.1x5(t-1) + r5(t). (13)

The fluctuations of these series appear irregular and show no obvious trend, but the series are in fact linearly related. If variable y is a linear combination of variables x1,x2,...,xn, we say y is a response variable and x1,x2,...,xn are its drive variables. In the network, we denote the drive-response relationship between y and x1 by an arrowed edge from x1 to y. The corresponding network of the above linear system is shown in Fig 3(A).
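The linear example can be simulated as below. Note that Eq (9) for x1 is not reproduced in the text above, so this sketch substitutes a placeholder AR(1) process for x1; its coefficients are our assumption, not the paper's equation.

```python
import random

def simulate_linear_system(n=1000, seed=0):
    """Simulate Eqs (10)-(13). x1 follows a placeholder AR(1) process
    (an assumption standing in for the unshown Eq (9))."""
    rng = random.Random(seed)
    lag = 10                     # longest lag appearing in the equations
    x = {k: [0.0] * (n + lag) for k in (1, 2, 3, 4, 5)}
    for t in range(lag, n + lag):
        r = {k: rng.gauss(0.0, 1.0) for k in (1, 2, 3, 4, 5)}
        x[1][t] = 1.0 + 0.5 * x[1][t - 1] + r[1]            # placeholder for Eq (9)
        x[2][t] = 20 + 0.6 * x[2][t - 1] - 0.4 * x[2][t - 6] + r[2]        # Eq (10)
        x[3][t] = (2.2 + 0.2 * x[1][t - 2] + 0.5 * x[3][t - 1]
                   + 0.3 * x[4][t - 9] + r[3])                             # Eq (11)
        x[4][t] = 1.5 + 0.7 * x[1][t - 2] + 0.3 * x[4][t - 1] + r[4]       # Eq (12)
        x[5][t] = 10 + 0.9 * x[4][t - 4] + 0.1 * x[5][t - 1] + r[5]        # Eq (13)
    return {k: series[lag:] for k, series in x.items()}
```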
As shown in Fig 3(A), variable x1 is driven by the two variables x2 and x4, variable x3 is driven by x1 and x4, and x4 is driven by x1. In contrast, x2 is not driven by any other variable and serves only as a drive variable of x1, while x5 is driven by x4 and drives no other variable. Note that there are autocorrelations in Eqs (9)-(13), but we do not show them in Fig 3(A); in this paper we focus on the relationships among different variables rather than on autocorrelation.
After generating the simulated data (S1 Dataset) by Eqs (9)-(13) in the Data Layer shown in Fig 1, we carry out the modelling with the proposed SSPSTES method. This process has been described in detail in sections 2.3, 2.4, 2.5 and 2.6; the shuffled data used in the modelling process are generated with the method described in section 2.2. One output of the Model Layer is the set of partial symbolic transfer entropy spectra, in which the horizontal axis represents the time delay and the vertical axis the transfer entropy. In each plot, the red curve results from the original data and the other curves from the shuffled data.
Next, the candidate relationships are filtered by the method described in section 2.6.2. After this step, we obtain all the strong relationships as a 0-1 adjacency matrix, described by Eq (14). From this adjacency matrix, we find that five candidate relationships are removed and the other five retained relationships are considered strong relationships, i.e. x1->x4, x2->x1, x4->x1, x4->x3, x4->x5. These identified strong relationships are all correct, but one real relationship, x1->x3, is filtered out mistakenly.
Finally, we infer a weighted directed association network in the last layer. From Eq (14) we obtain a directed network, and we then quantify the correlation strength of the identified relationships. We therefore introduce a correlation measure into the adjacency matrix C to get a new weighted adjacency matrix C', whose entries are described by Eq (8); the selected measure is the maximum partial symbolic transfer entropy over the different time lags of the original data. The association network corresponding to the matrix C' is shown in Fig 3(B). In Fig 3(B), each time series is mapped to a node, each arrowed edge stands for a drive-response relationship, and each edge carries a weight, i.e. the maximum partial symbolic transfer entropy value, which is mapped to the width of the line. As we can see, the relationship from x4 to x5 is the strongest one. In Fig 3, the original network (A) has six directed edges and the inferred network (B) has five edges; all five edges of the inferred network exist in the original network, so we obtain a high precision.
In order to assess the performance of the proposed method, we use two indicators, precision and sensitivity (also called recall or true positive rate) [44,45]. Precision is defined as Eq (16), precision = TP / (TP + FP), and sensitivity as Eq (17), sensitivity = TP / (TP + FN).
Here, TP is the number of edges in the intersection of the original edge set and the inferred edge set, FP is the number of edges in the inferred edge set but not in the original edge set, and FN is the number of edges in the original edge set but not in the inferred edge set. In order to test whether the model is sensitive to the system noise, we generate ten groups of data by Eqs (9)-(13) and apply the proposed method to each of them. As a result, we obtain ten precision and sensitivity values, whose averages are shown in Table 1. From Table 1, the average precision of our model reaches 0.86 and the average sensitivity reaches 0.80, although the latter is inferior to the precision. Next, we discuss the temporal relation identification of the proposed method; note that the following discussion is based on the edges inferred correctly. The time lag assigned to two correlated variables is the time point at which the PSTE of the original data attains its maximum value. Based on this definition, we define a measure, the precision of time lags (PTL), to assess the temporal relation identification of the proposed method. It is defined as Eq (18), PTL = TPL / (TPL + FPL), where TPL is the number of correct temporal relation identifications among the correctly identified edges and FPL is the number of erroneous ones. The results of PTL are shown in Table 1; we achieve a PTL of 1.00.
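Precision and sensitivity over edge sets can be computed directly; in the single run above, six true edges and five correctly inferred edges give precision 1.00 and sensitivity 5/6 ≈ 0.83 (illustrative sketch):

```python
def precision_sensitivity(true_edges, inferred_edges):
    """precision = TP/(TP+FP), sensitivity = TP/(TP+FN),
    where edges are ordered pairs (drive, response)."""
    true_edges, inferred_edges = set(true_edges), set(inferred_edges)
    tp = len(true_edges & inferred_edges)
    fp = len(inferred_edges - true_edges)
    fn = len(true_edges - inferred_edges)
    return tp / (tp + fp), tp / (tp + fn)
```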
In addition, we discuss how the dimension of the symbolic time series affects the performance of the proposed method; the results are shown in Table 2. With dimension 2, the precision is 0.84 and the sensitivity is 0.70; with dimension 3, the precision is 0.86 and the sensitivity is 0.80.
We also discuss how the length of the data affects the performance of the proposed method; the results are shown in Table 3. We find that the precision becomes higher as the length of the data increases. The sensitivity is less stable but remains at a high level. Although the performance of the proposed method is affected by the data length, we still obtain a good result when the data length is as small as 500.
Since SSPSTES is a synthetic method, we compare it with several other common methods; the results are shown in Table 4. The precision of SSPSTES is the highest, i.e. 0.86. The sensitivity of SSPSTES is higher than that of two of the other methods, STE and PSTE. Although the sensitivity of GC [24,46] is the highest, its precision is too small. Therefore, we conclude that SSPSTES is good at inferring association networks from linear time series. The selected p value for GC is 0.01. The selected threshold for STE and PSTE is the mean value: if the STE or PSTE between two time series variables is larger than the mean value, we say there is a strong relationship between the two variables.

Numerical Example from nonlinear system

In this section, we validate whether the proposed method works well for a nonlinear system. The simulated data are generated by Eqs (19)-(24):

x1(t) = 2.7 + 0.5x1(t-1) + r1(t), (19)
x2(t) = 1.7 + 0.2x2(t-1) + 0.3x1^2(t-1) + r2(t), (20)
x4(t) = 2.1 + 0.25x4(t-1) - 0.7x5(t-2) + 0.6x3(t-4) + r4(t), (22)
x5(t) = 1.5 + 0.35x5(t-1) - 0.5x4(t-3) + r5(t), (23)
x6(t) = 1.3 + 0.2x6(t-1) + 0.4x2(t-1)x3(t-5) + r6(t). (24)

Here, ri(t) (i = 1,2,...,6) are independent and identically distributed Gaussian random noise terms with mean zero and standard deviation 1.0. In this example, all variables except x1 are nonlinear. In Eq (20), the square term x1^2(t-1) makes x2 nonlinear. In Eq (21), the square-root term sqrt(x1(t-3)) makes x3 nonlinear. In Eq (22), the term 0.6x3(t-4) involves the nonlinear variable x3 and thus makes x4 nonlinear; likewise, in Eq (23), the term 0.5x4(t-3) makes x5 nonlinear. In Eq (24), the product 0.4x2(t-1)x3(t-5) makes x6 nonlinear.
In this example, we introduce three kinds of direct nonlinear correlation, i.e. a square correlation, a square-root correlation, and the product of two first-order terms. The time series (S2 Dataset) generated by Eqs (19)-(24) are shown in Fig 5. According to the drive-response relationships among the six time series variables, the corresponding original network of this nonlinear system is shown in Fig 6(A). In this figure, we can see three kinds of nodes: nodes whose in-degree is zero, e.g. x1; nodes whose out-degree is zero; and nodes whose in-degree and out-degree are both nonzero.
We apply the proposed method to this nonlinear system, following the same process as described in section 3.1. The resulting partial symbolic transfer entropy spectrum is shown in Fig 7. In the PSTES, if part of the red curve lies outside the black surrogate curves, we treat the relationship between that pair of variables as a candidate strong relationship. From Fig 7, we obtain nine candidate relationships: x1->x2, x2->x6, x3->x4, x3->x6, x4->x5, x5->x4, x1->x6, x2->x4 and x4->x6, where the variable on the right of each arrow is influenced by the one on the left. The first six of these are correct. The candidate relationships are then filtered by the method described in section 2.6.2, and all the retained strong relationships are denoted as the adjacency matrix of Eq (25)
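The "red curve outside the black curves" decision rule can be sketched as follows. This sketch uses plain symbolic transfer entropy with small-shuffle surrogates; the partialization step of PSTE (conditioning on the remaining variables) and the spectrum over a range of lags are omitted for brevity, so the function names and parameter values here are illustrative assumptions rather than the paper's exact procedure:

```python
import numpy as np
from collections import Counter
from itertools import permutations

def symbolize(x, m=3):
    """Map a series to ordinal-pattern symbols of embedding dimension m."""
    patterns = {p: i for i, p in enumerate(permutations(range(m)))}
    return np.array([patterns[tuple(np.argsort(x[i:i + m]))]
                     for i in range(len(x) - m + 1)])

def sym_transfer_entropy(x, y, m=3, lag=1):
    """Plug-in estimate of symbolic transfer entropy STE(x -> y)."""
    sx, sy = symbolize(x, m), symbolize(y, m)
    n = min(len(sx), len(sy)) - lag
    trip = Counter(zip(sy[lag:lag + n], sy[:n], sx[:n]))  # (y_future, y_past, x_past)
    pair_yy = Counter(zip(sy[lag:lag + n], sy[:n]))
    pair_yx = Counter(zip(sy[:n], sx[:n]))
    marg_y = Counter(sy[:n])
    te = 0.0
    for (yf, yp, xp), c in trip.items():
        # p(yf|yp,xp) / p(yf|yp), all in counts
        te += (c / n) * np.log2(c * marg_y[yp] / (pair_yy[(yf, yp)] * pair_yx[(yp, xp)]))
    return te

def small_shuffle(x, A=1.0, rng=None):
    """Small-shuffle surrogate (Nakamura & Small): perturb indices locally."""
    rng = rng or np.random.default_rng()
    idx = np.argsort(np.arange(len(x)) + A * rng.normal(size=len(x)))
    return x[idx]

def is_candidate(x, y, n_surr=19, m=3, lag=1, rng=None):
    """Flag x -> y as a candidate strong relationship if the original STE
    (the red curve) exceeds every surrogate STE (the black curves)."""
    rng = rng or np.random.default_rng(1)
    te0 = sym_transfer_entropy(x, y, m, lag)
    surr = [sym_transfer_entropy(small_shuffle(x, rng=rng), y, m, lag)
            for _ in range(n_surr)]
    return te0 > max(surr)
```

Because the surrogates destroy only the short-term alignment between the two series, a driving relationship at a small lag pushes the original STE outside the surrogate range, which is exactly the visual criterion applied to Fig 7.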

This is a 0-1 adjacency matrix. Since we aim at a weighted directed network, we assign a weight to each edge following the method described in section 2.7 and obtain the weighted adjacency matrix denoted as Eq (26). From this matrix, we get the association network shown in Fig 6(B). The inferred network has six edges, all of which are contained in the original network shown in Fig 6(A). Therefore, we consider that the proposed method works well for the nonlinear system.
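Once the weighted adjacency matrix is in hand, reading off the directed network is mechanical. The weights below are illustrative placeholders, since the actual values of Eq (26) are not reproduced here; only the six edge positions follow the text:

```python
import numpy as np

def to_edges(W, names):
    """List the directed weighted edges (i -> j) of a weighted adjacency
    matrix W, where W[i, j] > 0 means variable i drives variable j with
    the given correlation level."""
    return [(names[i], names[j], float(W[i, j]))
            for i in range(W.shape[0]) for j in range(W.shape[1])
            if W[i, j] > 0]

names = ["x1", "x2", "x3", "x4", "x5", "x6"]
W = np.zeros((6, 6))
# The six retained relationships; weight values are placeholders.
W[0, 1] = 0.4   # x1 -> x2
W[1, 5] = 0.3   # x2 -> x6
W[2, 3] = 0.5   # x3 -> x4
W[2, 5] = 0.2   # x3 -> x6
W[3, 4] = 0.6   # x4 -> x5
W[4, 3] = 0.4   # x5 -> x4
edges = to_edges(W, names)
```

The resulting edge list corresponds one-to-one with the arrows of the inferred network in Fig 6(B).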
We also assess the performance of the proposed method when it is applied to the nonlinear system, again using precision, sensitivity [44,45] and PTL, as described in section 3.1. The results measured on ten groups of data are shown in Table 5: the average precision of our model is as high as 0.98, the average sensitivity reaches 0.86 and the precision of time-lag identification is 0.98.
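For reference, the two set-based indicators can be computed as below. The reading of PTL is an assumption here (its formal definition is given in section 3.1): the fraction of correctly inferred edges whose identified lag matches the true lag.

```python
def precision_sensitivity(true_edges, inferred_edges):
    """Precision = TP/(TP+FP); sensitivity (recall) = TP/(TP+FN),
    computed over sets of directed edges (source, target)."""
    tp = len(true_edges & inferred_edges)
    prec = tp / len(inferred_edges) if inferred_edges else 0.0
    sens = tp / len(true_edges) if true_edges else 0.0
    return prec, sens

def ptl(true_lags, inferred_lags):
    """Assumed PTL: among correctly inferred edges, the fraction whose
    identified lag equals the true lag. Inputs map edge -> lag."""
    common = set(true_lags) & set(inferred_lags)
    if not common:
        return 0.0
    return sum(true_lags[e] == inferred_lags[e] for e in common) / len(common)

true_edges = {("x1", "x2"), ("x2", "x6"), ("x3", "x4"),
              ("x3", "x6"), ("x4", "x5"), ("x5", "x4")}
# A hypothetical inference with one miss and one false positive:
inferred = {("x1", "x2"), ("x2", "x6"), ("x3", "x4"),
            ("x4", "x5"), ("x5", "x4"), ("x1", "x6")}
prec, sens = precision_sensitivity(true_edges, inferred)
```

With five of six true edges recovered and one spurious edge, both precision and sensitivity come out to 5/6.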
In addition, we discuss how the dimension of the symbolic time series affects the performance of the proposed method on the nonlinear system; the results are shown in Table 6. With dimension 2, the precision is 0.92 and the sensitivity is 0.84; with dimension 3, the precision is 0.98 and the sensitivity is 0.86. The proposed method works well for both parameter settings, especially when the dimension of the symbolic time series is 3.
We also discuss how the length of the data affects the performance of the proposed method on the nonlinear system; the results are shown in Table 7. The precision is always 1. The sensitivity is unstable but remains at a high level, so the proposed method can also be applied to small data sets.

At the end of this section, we compare the proposed method with three other common methods. The results, each averaged over ten experiments, are shown in Table 8. The precision of SSPSTES is the highest, i.e. 0.98. The sensitivity of SSPSTES is 0.87, higher than that of the two other methods, STE and PSTE. The sensitivity of GC is 0.98, the highest of all, but its precision is the lowest. Therefore, we conclude that SSPSTES is a good method for inferring association networks from nonlinear time series. The parameters and experimental process are the same as in section 3.1.

Application
In this section, we apply the proposed method to a real data set, overseas departures from Australia (S3 Dataset), observed from January 1976 to February 2012. The data set has 5 time series and 434 observations. The five time-varying features are permanent, reslong, vislong, resshort and visshort, denoting permanent departures, long-term (more than one year) resident departures, long-term (more than one year) visitor departures, short-term (less than one year) resident departures and short-term (less than one year) visitor departures, respectively. The five time series are shown in Fig 8(A).
Based on the experiments on the simulated numerical examples in sections 3.1 and 3.2, we apply the proposed method to the departures data set. The inferred association network is shown in Fig 8(B).
From Fig 8(B), we observe the following pairwise relationships. Feature vislong is influenced by reslong; both concern long-term departures. As long-term resident departures increase, long-term visitor departures increase as well, which is reasonable because people seek better places for study, work, tourism and so on. It is also natural that permanent departures are influenced by long-term resident departures. In addition, features resshort and visshort belong to the same class: first, both are short-term departures; second, their relationships with feature vislong are both bidirectional. This conclusion is likewise reasonable.

Conclusions
In order to infer a weighted directed association network from multivariate time series, we have proposed a method named small-shuffle partial symbolic transfer entropy spectrum (SSPSTES), which synthesizes Symbolic Transfer Entropy (STE), the Small-Shuffle Surrogate (SSS) method and a filter algorithm. We first proposed the framework of the method, composed of three layers: the Data Layer, the Model Layer and the Network Layer. We then described the seven main steps of SSPSTES, from section 2.2 to section 2.7. Next, we applied the proposed method to a numerically simulated linear system and a nonlinear system, using three indicators, i.e. precision, sensitivity and PTL, to assess it. We discussed how different dimensions of the symbolic time series and different data lengths affect the performance of the proposed method, and we compared SSPSTES with three other relevant methods. As a result, the proposed method performs better on both the linear and the nonlinear system than the other methods. In general, the method can identify strong correlations and also find the time delay between pairwise time series. Finally, we applied the proposed method to a real multivariate time series data set, overseas departures from Australia, and the inferred association network is reasonable.

Although the proposed method has been shown to be good at inferring association networks from multivariate time series, some topics are worth studying in future. First, in this paper we consider that misidentified relationships may bring serious consequences, so we focus on identifying strong correlations and pay less attention to the proportion of identified relationships among all relationships existing in the complex system; as a result, the sensitivity is unstable and sometimes a little low.
Therefore, we will attempt to improve the sensitivity of SSPSTES. Second, the proposed method can be optimized to reduce its complexity. Third, we will apply the method to larger and real complex systems, e.g. gas pipeline monitoring systems and electric power monitoring systems. All these topics are interesting and worth studying in depth. Nevertheless, the proposed method can already serve as a heuristic tool for inferring association networks from multivariate time series, allowing a system to be studied in depth with complex network techniques.