IPCA-CMI: An Algorithm for Inferring Gene Regulatory Networks based on a Combination of PCA-CMI and MIT Score

Inferring gene regulatory networks (GRNs) is a major issue in systems biology, which explicitly characterizes regulatory processes in the cell. The Path Consistency Algorithm based on Conditional Mutual Information (PCA-CMI) is a well-known method in this field. In this study, we introduce a new algorithm (IPCA-CMI) and apply it to a number of gene expression data sets in order to evaluate the accuracy of the algorithm to infer GRNs. The IPCA-CMI can be categorized as a hybrid method, using the PCA-CMI and Hill-Climbing algorithm (based on MIT score). The conditional dependence between variables is determined by the conditional mutual information test which can take into account both linear and nonlinear genes relations. IPCA-CMI uses a score and search method and defines a selected set of variables which is adjacent to one of or Y. This set is used to determine the dependency between X and Y. This method is compared with the method of evaluating dependency by PCA-CMI in which the set of variables adjacent to both X and Y, is selected. The merits of the IPCA-CMI are evaluated by applying this algorithm to the DREAM3 Challenge data sets with n variables and n samples () and to experimental data from Escherichia coil containing 9 variables and 9 samples. Results indicate that applying the IPCA-CMI improves the precision of learning the structure of the GRNs in comparison with that of the PCA-CMI.


Introduction
Bayesian networks (BNs) provide an efficient and effective representation of the joint probability distribution of a set of variables. The identification of the structure of a BN from the data is known to be an NP-hard problem [1]. There are many learning algorithms for automatically building a BN from a data set. These are generally classified into three classes, namely constraint-based methods [2][3][4][5], score and search methods [6][7][8][9][10][11] and hybrid methods [12][13][14].
Gene Regulatory Networks (GRNs) explain how cells control the expression of genes. GRN is a collection of DNA segments in a cell. These segments interact indirectly with each other and with other substances in the cell and thereby governing the rates at which genes in the network are transcribed into Messenger RNA. Modeling the causal interactions between genes is an important and difficult task, and indeed, there are many heuristic methods for inferring GRNs from gene expression data [15,16]. BN is one of the popular methods which have been successfully implemented in learning GRNs [17].
There is a great potential for improvement of current approaches for learning GRNs [18,19]. The purpose of this study is to introduce a new algorithm, ''Improved Path Consistency Algorithm based on Conditional Mutual Information (IPCA-CMI)''. The algorithm is applied to a number of gene expression data sets in order to evaluate the accuracy of it for inferring GRNs.
IPCA-CMI is a combination of the PCA-CMI [5] and the Hill Climbing(HC) algorithm (based on mutual information test (MIT)) [11].
Being based on conditional mutual information (CMI), IPCA-CMI can take into account both linear and nonlinear genes relations. This is an improvement over linear testing methods. IPCA-CMI applies the HC algorithm (based on MIT score) to define weight values for each variable X . Then, a selected set which contains variables with weight values more than a defined threshold, is created. The method of evaluating dependency between two adjacent variables X and Y is represented by CMI test given a subset of genes of the selected set. To evaluate the accuracy of IPCA-CMI, it was employed to a number of gene expression data sets. For this purpose, the Dialogue for Reverse Engineering Assessments and Methods (DREAM) program was first introduced as a new efficient computation methods that help researchers to infer reliable GRNs [18]. The data sets comprised DREAM3 Challenge with n variables and n samples (n~10,50,100) and Escherichia coil gene expression data containing 9 variables and 9 samples.

Preliminaries
Bayesian network. Bayesian networks (BNs) [20,21], also known as belief networks, belong to the family of probabilistic graphical models. Each vertex in the graph represents a random variable and the edges between the vertices represent probabilistic dependencies among the corresponding random variables. A directed edge, Y ?X , describes a parent and child relation in which X is the child and Y is the parent of X . Let ADJ(X ) denotes the set of variables in the graph which are adjacent to X . In addition, each vertex in graph has a conditional probability distribution specifying the probability of possible state of the variable given possible combination of states of its parents. These conditional dependencies in the graph are often estimated by using known statistical and computational methods. Hence, BNs combine principles from Graph Theory, Probability Theory, Computer Science and Statistics. BNs are represented as a directed acyclic graph (DAG) that is popular in Statistics and Machine Learning subjects. We typically denote random variables with capital letters and sets of random variables as bold capital letters. Following the above discussion, a more formal definition of a BN can be given. A Bayesian network, B, is an annotated directed acyclic graph that represents a joint probability distribution over a set of random variables X~fX 1 ,:::,X n g. The network is defined by a pair B~SG,hT, where G is the DAG with vertex set fX 1 ,X 2 ,:::,X n g and the direct dependencies between these variables is represented by directed edges. The graph G encodes independence assumptions, by which each variable X i is independent of its non descendants given its parents in G. Let Pa B (X ) denote the parent set of X . The second component h describes the set of conditional probability distributions. This set contains the parameter h xiDPaB(xi)~PB (x i DPa B (x i )), where x i denotes some value of the X i and Pa B (x i ) indicates some set of values for X i 's parents. If X i has no parent, then P(X i DPa B (X i )) is equal to P(X i ). By using these conditional distributions, the joint distribution over X can be obtained as follows: P(X 1 ,X 2 ,:::,X n )~P Definition 1. If P(X ,Y DZ)~P(X DZ)P(Y DZ), then two variables X and Y are conditionally independent given Z.
Definition 2. A path p from X to Y in G is said to be blocked by a set of variables Z if and only if: 1. p contains a chain X?K?Y or a fork X/K?Y such that K[Z, or 2. p contains a collider X?K/Y such that K and all the descendants of K are not in Z.
Definition 4. A v-structure in G is an ordered triplet (X ,Y ,K) such that G contains the directed edges X?Y and K?Y, so that X and K are not adjacent in G.
For the following discussion, suppose that the set of parents of X i is fX i1 ,:::,X isi g, where s i denotes the number of parents of X i (DPa(X i )D~s i ). The BN deals with: N Discrete variables i.e. the variable X i and its parents take discrete values from a finite set. Then, PfX i DX i1 ,:::,X isi g is represented by a table that specifies the probability of values for X i for each joint assignment to fX i1 ,:::,X isi g.
N Continuous variables i.e. the variable X i and its parents take real values. In this case, there is no way to represent all possible densities. A natural choice for multivariate continuous distributions is the use of Gaussian distributions [15].
N Hybrid networks i.e. the network contains a mixture of discrete and continuous variables.
Information Theory. Gene expression data are typically modeled as continuous variables. The following steps are applied to calculate mutual information (MI) and CMI for continuous variables. MI has been widely used to infer GRNs because it provides a natural generalization of association due to its capability of characterizing nonlinear dependency [22]. Furthermore, MI is able to deal with thousands of genes in the presence of a limited number of samples [23].
Entropy function is a suitable tool for measuring the average uncertainty of a variable X . Let X be a continuous random variable with probability density function f (x), the entropy for X is: The joint entropy for two continuous variables X and Y with joint density function f (x,y) is: The measure of MI indicates the dependency between two continuous variables X and Y , which is defined as: Variables X and Y are independent when MI has zero value. The measure of MI can also be determined in terms of entropy as follows: In the GRN the dependency of two genes needs to be determined. CMI is a suitable tool for detecting the joint conditional linear and nonlinear dependency between genes [5,24]. CMI between two variables X and Y , given the vector of variables Z is: where p is the dimension of vector Z and f (x,y,z) denotes the joint density function for variables and f (xDz) is the conditional density distribution of X given Z. CMI between X and Y given Z can also be expressed by: where H(X ,Y ,Z) denotes the joint entropy between X , Y and Z. Theorem 1 [25]: Let X~(X 1 ,:::,X n ) T be an n-dimensional Gaussian vector with mean m~(m 1 ,:::,m n ) T and covariance matrix C(X)~E(X{ m)(X{ m) T , i.e. X*N( m,C(X)). The entropy of X is: H(X)~log½(2p e) n=2 det(C(X)) 1=2 ~1 2 log½(2p e) n det(C(X)), ð7Þ where det(C(X)) indicates the determinant of C(X). With the widely adopted hypothesis of Gaussian distribution for gene expression data, the measure of MI according to Eqs. 4 and 7 for two continuous variables X and Y can be easily calculated using the following equivalent formula [5,26]: where s 2 X , s 2 Y and s XY indicate the variance of X , the variance of Y and the covariance between X and Y . Similarly, according to Eqs. 6 and 7, CMI for continuous variables X and Y given Z can be determined by [5]: in which C(X ,Y ,Z) denotes the covariance matrix of variables X , Y and Z. When X and Y are conditionally independent given Z, then CMI(X ,Y DZ)~0. In order to test whether a CMI is zero, Z{statistic is calculated in two steps [5,27,28]: In step 1, the MI and CMI, respectively, are normalized as follows:M In step 2, the Z{statistic of MI and CMI, respectively, are calculated by: In order to determine the statistical test of conditional independence, a confidence level a is fixed.
) then, the hypothesis of conditional independence of X and Y given Z is accepted (at the significance level a); otherwise the hypothesis is rejected. Here W( : ) denotes the cumulative distribution function of the standard normal distribution and DZD indicates the dimension of vector Z.

Score and Search Algorithms
Score and search algorithms can be completely described by specifying two components: a scoring function and a search procedure. The score and search algorithms try to identify a network with maximum score.
In this study, we apply the HC algorithm as a search procedure where MIT score is used as a scoring function. Gene expression data are typically continuous variables. The MIT score deals with discrete variables. Therefore, continuous variables have to be discretized. We do this based on the procedure proposed by [29][30][31][32].
Discretization methods. To draw inferences about a GRN based on the set of genes X~fX 1 ,:::,X n g, we start with a data set D~fX 1 (1),:::,X n (1)g,:::,fX 1 (N),:::,X n (N)g, where n indicates the number of genes and N is the number of measurements of these genes. An n by N matrix D is used to denote gene expression data. X s (t) indicates the expression value of gene s at time t. X s ( : ) denotes expression data of gene s at all time. The equal width discretization (EWD) and equal frequency discretization (EFD) methods are applied to discretize continuous gene expression data [30][31][32]. EWD method for s-th gene divides the line between min½X s ( : ) and max½X s ( : ) into k intervals of equal width. Thus the intervals of gene s have width, w~m ax½X s ( : ){min½X s ( : ) k , with cut points at min½X s ( : )zw,min½X s ( : )z2w,:::,min½X s ( : )z(k{1)w. In EWD, k is a positive integer and is a user predefined parameter.
EFD method for s-th gene divides the sorted X s ( : ) into m intervals so that each interval contains approximately the same number of expression values. Similarly, in EFD, m is a positive integer and is a user predefined parameter.
In this study, gene expression data sets related to DREAM3 Challenge lie in the interval [0, 1]. We applied EWD method to discretize DREAM3 data sets. For instance, for each gene, parameter k is considered to be equal to 10. EFD method is applied to discretize SOS repair data. Gene expression data sets related to SOS DNA repair network lie in the interval [20.2730, 26.6330] and the parameter m is considered to be equal to 9.

Scoring Function
There are many scoring functions to measure the degree of fitness of a DAG G to a data set. These are generally classified as Bayesian scoring functions [7,9,33] and information theory-based scores [11,[34][35][36][37]. The chosen score and search algorithm can be more efficient if the scoring function has the decomposability property.
Decomposability property: A scoring function g is decomposable if: where and N X i ,Pa B (X i ) denotes the number of instances in data set D that match with each possible configuration of fX i g S fPa B (X i )g. Another property, which is particularly interesting if the score and search algorithm searches in a space of equivalence classes of DAGs, is called the score equivalence.
Theorem 2 [38]. Two DAGs are equivalent if and only if they have the same skeletons and the same v-structures.
When two Bayesian networks are equivalent, they can represent the same set of probability distributions. The relation of network equivalence imposes a set of equivalence classes over Bayesian network structures [39].
Score equivalence: A scoring function g is score equivalence if the score assigns the same value to equivalent structures.
MIT Score. Mutual information test (MIT) belongs to the family of information theory-based scores which is defined as follows [11]: where N denotes the total number of measurements in D and where N ijk represent the number of measurements in the data set D for which X i~k and Pa B (X i )~j, where j denotes a joint configuration of all parent variables of X i . N ij denotes the number of measurements in D, in which Pa B (X i )~j. Similarly, M ik indicates the number of measurements in D which the variable X i~k and l i,s i (j) is defined by: where s i~½ s i (1),:::,s i (s i ) indicates any permutation of the index set (1,:::,s i ) of the variables in Pa B (X i )~fX i1 ,:::,X is i g. Finally, x a,l i,s i (j) is the value such that P(x 2 (l i,s i (j) )ƒx a,l i,s i (j) )~a (the Chisquared distribution at significance level 1{a with l i,s i (j) degrees of freedom).
The MIT score has decomposability property and dose not satisfy score equivalence, however, it satisfies less demanding property. This property of the MIT score concerns another type of space of equivalent of DAGs, namely restricted acyclic partially directed graphs (RPDAGs) [40]. RPDAGs are partially directed acyclic graphs (PDAGs) which represent sets of equivalent DAGs, although they are not a canonical representation of equivalence classes of DAGs (two different RPDAGs may correspond to the same equivalence class).
Theorem 3 [11]. The MIT score assigns the same value to all DAGs that are represented by the same RPDAG.
MIT score can be applied without any problem to search in both the DAG and the RPDAG spaces [11]. In different studies, the score equivalence could be concluded as a good or bad property. The score equivalence property is appropriate when the data are not applied to distinguish the equivalent structures. In searching and scoring scheme for learning structure of Bayesian networks, equivalent classes should be considered. This means when more than two graphs are equivalent, those graphs have the same dependency; therefore, two structures have identical scores. As an example, two variables A and B may have two different structures as A?B or A/B, however, as equivalent classes, these two structures end up having the same score for any given data. In order to detect causal relationships between genes, score equivalence property does not necessarily impair the search process, because equivalent structures represent different causal relationships. In this study, we are interested to the scoring functions which considered different scores for A?B or A/B. So, MIT score is applied in the HC algorithm to compute the score of DAGs. The non equivalence of the score function does not necessarily impair the search process to learn BNs. The MIT score is implemented within the Elvira system (a JAVA package for learning the structure of BN [11]). The Elvira package can be downloaded from http://leo.ugr.es/elvira/. The MIT score is available at =bayelvira2=elvira=learning=MITMetrics:java: In this study, we rewrite the MIT score program (Red.Pen) which, in comparison to the Elvira system, reduces running time and memory occupied by the algorithm. The source of the program and data sets are available at http://www.bioinf.cs.ipm.ir/ software/IPCA-CMI/.

Search Procedure
Given a scoring function g, the task in this step relates to search between possible networks to find G Ã such that: in which g(G : D) denotes the degree of fitness of candidate G to data set and F (n) indicates all the possible DAGs defined on X.
The challenging part of search procedure is that the size of the space of all structures, f (n), is super-exponential in the number of variables [41], So an exhaustive enumeration of all the structures is not possible. Instead, researchers have considered heuristic search strategies [9,42]. The Hill Climbing algorithm is particularly popular in this field.
The Hill Climbing Algorithm. The Hill Climbing (HC) algorithm is a mathematical optimization technique which belongs to the family of local search. The HC algorithm traverses the search space by starting from an initial DAG then, an iterative procedure is repeated. At each procedure, only local changes such as adding, deleting or reversing an edge are considered and the greatest improvement of g is chosen. The algorithm stops when there is no local change yielding an improvement in g.
Because of this greedy behavior the execution stops when the algorithm is trapped in a solution that is mostly local rather than global maximizer of g. Different methods are introduced to escape from local optima such as restarting the search process with different initial DAGs. It means that after a local optima is found the search is reinitialized with a random structure. This reinitialization is then repeated for a fixed number of iterations, and the best structure is selected [20]. The local search methods can be more efficient if the scoring function has the decomposability property [11]. By considering the decomposability property, by adding, deleting or reversing the edge between two variables, the score values of this variables are updated while the score values of other variables remain unchanged. In order to apply the HC algorithm based on scoring function with the decomposability property, the following differences are calculated to evaluate the improvement obtained by local change in a DAG [43]: Reversal of X j ?X i : First the edge from X j to X i is deleted then, a edge from X i to X j is added.

Method
In this section the details of PCA-CMI and IPCA-CMI are presented to show how the structure of GRN is learned from gene expression data sets.

PC Algorithm based on CMI test (PCA-CMI)
The PCA-CMI is applied to infer the GRNs [5]. The PCA-CMI is computationally feasible and often runs very fast on networks with many variables. This algorithm starts with a complete undirected graph over all variables. The following steps are applied to assign skeleton S i from S i{1 .
Step 2: Set i~iz1. Suppose X and Y are adjacent in S i{1 , then V XY is defined by: Suppose that, there are j number of genes in V XY (DV XY D~j). If iƒj, for each i-subset of V XY such as M~fm 1 ,:::,m i g, the i-order CMI(X ,Y DM) is computed according to Eq. 9. All the i-order CMIs between X and Y given all possible combination of i genes from j genes are computed and the maximum one was selected as CMI max (X ,Y DM). If CMI max (X ,Y DM)vE, the edge between X and Y is removed from S i{1 . So, V XY includes the separator set for X and Y. The algorithm is stopped when iwj. Let S i be the skeleton of the constructed graph in this step and return to step 2. The algorithm is stopped when S i{1~Si for the first i. It is notable that in each step of the PCA-CMI, X [X is selected from 1 to n and Y [ADJ(X ) is selected by the order of genes. Details of PCA-CMI are given in table 1.

The Improvement of PC Algorithm based on CMI test (IPCA-CMI)
The HC algorithm is the well-known approach to search between possible DAGs to determine the best fit of network based on defined scoring function. In addition, Zhang [5] have implemented the PCA-CMI for inferring GRNs from gene expression data, using CMI test in the process of dependency determination between genes. The skeleton of a GRN in each order of IPCA-CMI is determined by CMI test. Therefore, only the local changes related to reversed edges between genes are applied in the HC algorithm (step 3 of the algorithm) in order to direct the edges of the skeleton. When heuristic search algorithms are applied, we are not guaranteed to find a global optima structure. Different methods have been proposed to escape local optima.
In this study, during each iteration in the HC algorithm, a new solution is selected from the neighborhood of the current solution (random change including adding, deleting and reversing). If that new solution has a better quality MIT score, then the new solution becomes the current solution. The algorithm stops if no further improvement are possible. We have to start with some (50 randomly generated) solution and evaluate it based on MIT score. The HC algorithm can only provide locally optima that depends on the starting solution. We have to start the HC algorithm from a large variety of different solutions. The hope is that at least some of these initial locations have a path that leads to the global optima. We choose the initial solutions (50 DAGs) at random.
Details of the IPCA-CMI are presented in two parts. Part 1 is related to the zero order of the IPCA-CMI. In this order, same skeletons are obtained by PCA-CMI and IPCA-CMI, but the HC algorithm is utilized in IPCA-CMI in order to direct the edges of the skeleton. Details of the IPCA-CMI for order i (iw0) are presented in part 2.
Part 1: The details of IPCA-CMI for i~0. First, the IPCA-CMI generates complete graph according to the number of genes. Then, for each adjacent gene pair such as X and Y, the measure of MI is computed according to equation [8]. The measures of MI between X and Y are calculated to be compared with E. If MI(X ,Y )vE, the edge between X and Y is removed from complete graph. Finally, MIT score is applied in the HC algorithm in order to direct the edges of skeleton to obtain the directed acyclic graph G 0 . Details of zero order of IPCA-CMI are shown in table 2.
Part 2: The IPCA-CMI for iw0. Set i~0 and the following process is applied to assign directed acyclic graph G i from G i{1 : Step 1: Set i~iz1. Let Z be an adjacent of X in G i{1 . Then, A qz for 1ƒqƒ4 are defined as follows: The weight value for variable Z is determined by: where DA qZ D for 1ƒqƒ4 denotes the size of A qZ .
Step 2: Let R XY be defined by:  where k denotes the median of weights related to all adjacent variables of X or Y. It can be concluded that variables in set R XY are selected from fADJ(X )|ADJ(Y )g\fX ,Y g in which at least k number of paths started from X or Y are blocked by these variables. Therefore, by considering these variables many paths between X and Y are removed.
Step 3: Let X and Y be adjacent in G i{1 , we have done the following process: Suppose that, there are t genes in R XY (DR XY D~t). If iƒt, for each i-subset of R XY such as H~fh 1 ,:::,h i g, the i-order CMI(X ,Y DH) is computed according to equation [9]. All the iorder CMIs between X and Y given all possible combination of i genes from t genes are computed and the maximum result (CMI max (X ,Y DH)) is compared with E. If CMI max (X ,Y DH)vE, the edge between X and Y is removed from G i{1 . The algorithm is stopped when iwt. Let S i be the skeleton of the constructed graph in this step.
Step 4: MIT score is applied in the HC algorithm in order to direct the edges of S i to obtain the directed acyclic graph G i , return to step 1.
The algorithm is stopped when G i{1~Gi for the first i. Table 3 is related to the details of i order (i.0) of IPCA-CMI.
It is notable that in each step of the IPCA-CMI, X [X is selected from 1 to n and Y [ADJ(X ) is selected by the order of genes.
The rational behind Weight X (Z) is in definitions 2 and 3. Weight X (Z) indicates the number of paths started from X and blocked by Z.
In fact the main difference between the IPCA-CMI and the PCA-CMI is in choosing a selected set of variables which includes the separator set. IPCA-CMI uses the HC algorithm and define a selected set of variables which are adjacent to one of X or Y, with weight values more than a defined threshold.

Software
Software in the form of MATLAB and JAVA codes. The source of data sets and codes are available at http://www.bioinf.cs.ipm. ir/software/IPCA-CMI/.

Results
In order to validate our algorithm, the true positive (TP), false positive (FP), true negative (TN) and false negative (FN) values for proposed algorithms are computed. Where TP is the number of edges that are correctly identified, FP is the number of edges that are incorrectly identified, TN is the number of edges that are correctly unidentified and FN is the number of edges that are incorrectly unidentified. In addition, some famous measures such as the accuracy (ACC), false positive rate (FPR), false discovery rate (FDR), positive predictive value (PPV), F-score measure, Matthews correlation coefficient (MCC) and true positive rate (TPR) are considered to compare algorithms, more precisely. These measures are defined by: , MCC is a convenient quantity for comparing predicted and actual networks. MCC quantity for each algorithm indicates which method is more efficient in predicting networks. The algorithm which has higher values for measures TP, TN, ACC, PPV, F, MCC and TPR is more efficient for predicting the skeleton of networks.
The DREAM3 Challenge consists of 4 data sets that were produced from in-silico networks. The goal of the in-silico Challenge is the reverse engineering of gene networks from time series data. The gold standard for DREAM3 Challenge were determined from source networks of real species. In this study, we tested the performance of IPCA-CMI on the DREAM3 data sets with n variables and n samples (n~10,50,100) and to experimental data from Escherichia coil containing 9 variables and 9 samples. Data sets contain the expression values of genes, in which rows are genes and columns indicate the samples. In order to compare results of PCA-CMI and IPCA-CMI, in each algorithm, we used the same threshold for CMI tests previously applied by Zhang [5]. IPCA-CMI, is a combination of a constraint-based method named PCA-CMI with a score and search method named the HC algorithm. Since the HC algorithm includes the process of randomly selecting initial graphs, IPCA-CMI is supposed to run hundred times and then we take the average as the final result. It can be concluded that outcomes of Tables 1 to 5 are related to the average of results which obtained from IPCA-CMI in hundred times. Fig. 1(A) shows the structure of true network for DREAM3 which contains 10 genes and 10 edges. The result obtained by Zhang [5] illustrated in Fig. 1(B), and Fig. 1(C) is related to the result of IPCA-CMI. In Fig. 1(B), edges that are correctly found by PCA-CMI are shown in Black color and the edge that wrongly inferred by this algorithm (edge G2-G4) is shown in red color. The true edges, which found by IPCA-CMI, are indicated by Black color and edge G4-G9 is a false negative. Fig. 1(C) is related to the best result of IPCA-CMI in running it hundred times. Table 4 indicates the result of PCA-CMI and IPCA-CMI with zero-order CMI test for DREAM3 and SOS real gene expression data. In zero-order two algorithms returned the same results, since both algorithms contain the same procedure. Table 5 indicates the result of PCA-CMI and IPCA-CMI for DREAM3 data set in size of 10 genes with 10 edges. We set the threshold value 0.05 of MI and CMI tests for dependency determination. As shown by Table 5, TP, ACC, PPV, F, MCC and TPR under PCA-CMI are less than those of IPCA-CMI. So, it can be concluded that the IPCA-CMI is more suitable for structure learning.
Results of applying PCA-CMI and IPCA-CMI for DREAM3 Challenge with 50 genes and 77 edges are collected in Table 6. We chose 0.1 as the threshold value of MI and CMI tests to determine the dependency between genes. IPCA-CMI can detect the true network in 2 steps, FP value is reduced as a result of applying algorithm step by step. According to Table 6 the FP value is reduced from 21 to 11.78, as a result of using IPCA-CMI. The TP, ACC, PPV, F, MCC and TPR measures receive higher values by using IPCA-CMI for inferring GRNs which shows that the IPCA-CMI performs better than the PCA-CMI.
Results of DREAM3 with 100 variables and 166 edges are illustrated in Table 7. Threshold value 0.1 for MI and CMI tests is considered to determine the dependency between genes. As shown by Table 7 in the second-order network, the FP value is reduced from 25 to 15.16. The TP, ACC, PPV, F, MCC and TPR measures receive higher values by using IPCA-CMI for inferring about DREAM3 with 100 variables. Results of applying PCA-CMI and IPCA-CMI for the real data set with 9 genes and 24 edges are given in Table 8. We chose 0.01 as the threshold value. Table 8 indicates that ACC, F and MCC measures receive higher values by using IPCA-CMI for inferring about BNs which shows that the IPCA-CMI performs better than the PCA-CMI.
According to Tables 4 to 8, the number of FP is decreased, as a result of using IPCA-CMI. So it can be concluded that the IPCA-CMI is more suitable for learning the structure of GRNs. Tables (4 to 8) show that IPCA-CMI not only can reduce the number of FP but also it remarkably can find some true different edges in comparison with PCA-CMI. As shown by these Tables (4 to 8), some better results can be obtained by using IPCA-CMI. So, it can be concluded that IPCA-CMI performs better than the PCA-CMI for learning the structure of GRNs. Another comparison that can be made between these algorithms is a determination of the probability of selecting subgraph with k edges from graph G with m edges. These probabilities are calculated for two mentioned algorithms. The algorithm which receives smaller value of the probability is efficient for predicting the skeleton of GRN. Results of this comparison for networks which are obtained using DREAM3 and SOS real gene expression data are given in Table 9. As shown by Table 9, better results (e.g., smaller probability values) are obtained by using IPCA-CMI. Therefore, it can be concluded that the performance of IPCA-CMI is much better than that of PCA-CMI based on the better determination of the probability for selecting a subgraph in all data sets.

Discussion
In this study a new algorithm called IPCA-CMI for inferring GRNs from gene expression data was presented. Results of this study show that using IPCA-CMI improves the precision of the learning the structure of GRNs, considerably. Zhang [5] reported that the PCA-CMI performed better than linear programming method [44], multiple linear regression Lasso method [45], mutual information method [46] and PC-Algorithm based on partial correlation coefficient [27] for inferring networks from gene expression data such as DREAM3 Challenge and SOS DNA repair network. Therefore, it can be concluded that the results of IPCA-CMI will be more precise compared to the methods studied by Zhang [5].
Our algorithm starts with a complete undirected graph over all variables. IPCA-CMI constructs S i (the skeleton of order i) according to CMI test. Then perform the HC algorithm to direct the edges of S i . If X and Y are adjacent in S i , weight values are defined for variables in set Q XY~f ADJ(X )|ADJ(Y )g\fX ,Y g. Subsequently, variables with high weight values were selected as the members of the set R XY . The separator set being a subset of R XY , makes defining the set R XY in the algorithm very important. We adopted a method to select i number of genes from R XY . Suppose that, there are t number of genes in R XY (DR XY D~t). In order to construct the i-order (iƒt) network, all the i-order CMIs between X and Y given all possible combination of i genes from t genes are calculated and the maximum result compared with E threshold to decide whether to keep the edge between X and Y or to remove it.
The PC algorithm starts with a complete undirected graph over all variables. In order to construct S i , the Chi-square test is applied to determine dependency between variables. The separator set for adjacent genes X and Y in S i are selected from Q XY . The PC algorithm is fast to learn networks with many variables. The drawback of the PC algorithm is the requirement for large sample sizes to perform high order conditional independence (CI). The number of records in a microarray data set is rarely sufficient to perform reliable high-order CI tests. Using IPCA-CMI statistical error in the process of learning the skeleton of GRNs is reduced. This is a result of the reduction of the size of the set which includes the separator set.
On the other hand in PCA-CMI, only genes connected with both X and Y are considered for dependency determination. It means that the separator set for two adjacent genes X and Y are selected from V XY~f ADJ(X ) T A DJ(Y )g. So, small set of variables are considered for dependency determination. It can be concluded that some of the variables which play an important role in dependency determination are not considered in separator set. The achieved improvement of our algorithm in comparison with PCA-CMI is related to the consideration of important adjacent genes of one of X or Y. This method leads us to determine the separator set for X and Y more precisely.
For the aforementioned problem for PC and PCA-CMI, in this study we applied an iterative strategy to select R XY which includes separator set for adjacent genes X and Y. It can be concluded that DV XY DƒDR XY DƒDQ XY D. It means that, we chose the set of variables, among which to pick the separator set, in a somehow intermediate way between the standard PC algorithm and the method of Zhang et al. (2012). Therefore, the set of variables, among which we pick the separator set, is bigger than those considered by Zhang et al. (2012). The MIT scoring function is decomposable and is not score equivalent. However, it satisfies a restricted form of score equivalence which allows us to use it to search not only in the DAG space but also in the RPDAG space. In the future work we would like to investigate whether MIT score is more appropriate for gene expression data than other scores. It has been previously shown that the score equivalence is not an important feature to learn Bayesian networks by searching in the DAG space. This confirms the previous results stated by [11,47]. Gene expression data are typically modeled as continuous random variables. The MIT score can be applied in analyzing continuous random variables, but only after the data has been discretized. In the future work we would like to apply a more suitable method to discretize gene expression data [29].