Ensemble Inference and Inferability of Gene Regulatory Networks

The inference of gene regulatory network (GRN) from gene expression data is an unsolved problem of great importance. This inference has been stated, though not proven, to be underdetermined implying that there could be many equivalent (indistinguishable) solutions. Motivated by this fundamental limitation, we have developed new framework and algorithm, called TRaCE, for the ensemble inference of GRNs. The ensemble corresponds to the inherent uncertainty associated with discriminating direct and indirect gene regulations from steady-state data of gene knock-out (KO) experiments. We applied TRaCE to analyze the inferability of random GRNs and the GRNs of E. coli and yeast from single- and double-gene KO experiments. The results showed that, with the exception of networks with very few edges, GRNs are typically not inferable even when the data are ideal (unbiased and noise-free). Finally, we compared the performance of TRaCE with top performing methods of DREAM4 in silico network inference challenge.


Introduction
The discovery and analysis of biological networks have important applications, from finding treatment of diseases to engineering of microbes for the production of drugs and biofuels [1][2][3][4].With continued advances in high throughput and omics technology, the inference of biological networks from omics data has received a great deal of interest.In particular, the inference of gene regulatory networks from gene expression data constitutes a major research topic in systems biology.In the last decade, the number of methodologies that are dedicated for the GRN inference has increased tremendously [5][6][7].
The Dialogue for Reverse Engineering Assessments and Methods (DREAM) project is a community-wide effort initiated to fulfill the need for a rigorous and fair comparison of the strengths and weaknesses of methods for the reverse engineering of biological networks from data.To this end, challenges involving the inference of cellular networks are organized on a yearly basis (http://www.the-dream-project.org/challenges).Specifically, the inference of GRN has become a major focus of several DREAM challenges.The outcomes of these challenges indicate that the state-of-the-art algorithms for GRN inference are unable to provide accurate and reliable network predictions, even when large expression datasets are available and the number of genes is small (10-100 genes) [8][9][10].Nonetheless, a crowd-sourcing strategy that combines the predictions of different inference methods has been shown to be more reliable than any individual method [10].
Whether or not a direct regulation of one gene by another can be correctly inferred depends not only on the ability of an inference method to extract the relevant information from data, but also on the availability of such information in the data.In general, the information content of data is determined first and foremost by the conditions of the experiments.If the required information is unavailable or incompletely available, then the inference problem is underdetermined.In such a case, the network is not inferable from the data regardless of the method used.
The underdetermined nature is not exclusive to the inference of GRNs.Much of the difficulty in the inverse modeling of signaling and metabolic networks can also be attributed to the lack of inferability or identifiability of model structure and parameters [11][12][13][14].As the inference problem is underdetermined, there exist multiple solutions which are indistinguishable.The lack of model identifiability has motivated a paradigm shift toward ensemble modeling [15][16][17][18].While such a strategy has begun to gain traction in the modeling of signaling and metabolic networks, the ensemble paradigm has not been widely used in the inference of GRNs.Also, since the network representation and data for GRNs differ markedly from those for signaling and metabolic networks, existing algorithms for ensemble modeling cannot be directly applied for the inference of GRN.
In this work, we introduce new framework and algorithms, called Transitive Reduction and Closure Ensemble (TRaCE), for the ensemble inference of GRNs.Specifically, TRaCE produces the lower and upper bounds of the ensemble, i.e. the smallest network and the largest network that limit the complexity of networks in the ensemble.As the size of the ensemble reflects the uncertainty about the GRN inference, the bounds can also be used to analyze the inferability of GRNs.In this study, we have used TRaCE in two applications.First, we investigated the inferability of random GRNs and the GRNs of E. coli and S. cereviseae given steady-state gene expression data of single-and double-gene KO experiments.Then, we applied TRaCE to simulated gene expression data, generated in the same manner as the DREAM 4 in silico network inference challenge, and compared the performance of TRaCE with existing methods.

Theoretical Foundation
Definitions.Here, we provide a short synopsis of basic concepts in graph theory that are necessary for the development of our algorithm.A graph G is an ordered pair (V (G), E(G)), where V (G) is the set of vertices (or nodes) and E(G) is the set of edges.The number of vertices n and the number of edges m are called the order and the size of the graph, respectively.An edge e is defined by the pair (i,j)[E(G) 5 V (G) | V (G), describing the existence of a relationship between two vertices i and j.In this case, the edge e~i,j ð Þ is said to be incident to the vertices i and j.The set of edges of a graph G that are incident to a vertex i is denoted by E G (i), while the cardinality of E G (i) is called the degree of the vertex i.Similarly, the set of edges that are incident to a set of vertices Z5V (G) is denoted by (B).In this case, B is called the supergraph of A, and is also said to contain A. Furthermore, the union of two graphs A and B is denoted by A|B, where The intersection of two graphs A and B is denoted by A\B, where Finally, the difference between two graphs A and B with V (A)~V (B) is denoted by A{B and defined as the set of edges in the graph A that do not belong to the graph B, i.e. edges in the set difference E(A){E(B).
A directed edge is an ordered pair i,j ð Þ, representing an edge from the vertex i pointing to the vertex j.A directed graph or digraph G is a graph in which all of its edges are directed.A directed path is a sequence of vertices such that there exists a directed edge from one vertex to the next vertex in the graph.The first vertex in a directed path is called the start vertex, and the last is called the end vertex.A directed cycle is a directed path where the start and the end vertices are the same.A directed acyclic graph (DAG) is a digraph which does not contain any cycle.The adjacency matrix of a digraph G of order n, denoted by Adj(G), is an n | n matrix with Adj i,j ~1 when (i,j)[E(G), and Adj i,j ~0 otherwise.In other words, the non-zero elements of the adjacency matrix represent all directed edges from any node i to another node j in the graph G.Meanwhile, the accessibility matrix of G, denoted by Acc(G), is an n | n matrix with Acc i,j ~1 when there exists a directed path from node i to node j, and Acc i,j ~0 otherwise.When Acc i,j ~1, vertex j is said to be accessible from the vertex i.
A strongly connected component or strong component of a digraph G is a maximal subset of nodes in G where any two nodes in the subset are mutually accessible.Every pair of nodes that are part of a directed cycle belong to the same strong component, while any node that is not part of a cycle is a strong component of its own.The condensation of a digraph G is the DAG of the strong components of G, which is generated by lumping the nodes belonging to a cycle into a single node and replicating the edges that are incident to any of these nodes onto the lumped node [19].
A digraph is transitive if for every pair of vertices i and j, there exists an edge (i,j) when there is a directed path from i to j.The transitive closure of a digraph G, denoted by G G, is the smallest transitive supergraph of G (with the fewest edges) [20].When G is a DAG, we denote the transitive closure of G as G T .As shown in Fig. 1(a)-(b), the transitive closure of a digraph can be generated by adding a directed edge (i,j), whenever a directed path exists from vertex i to vertex j.Note that the accessibility matrix of a digraph is the adjacency matrix of its transitive closure, i.e.

Acc G
ð Þ~Adj G G À Á . For a digraph G, the set of digraphs that have the same transitive closure The transitive reduction of G, denoted by G t , is defined as the smallest member of S(G) in size (i.e. the graph with the fewest edges).The transitive reduction of a DAG is unique, given by [20].An algorithm for obtaining transitive reduction has been previously developed [19], in which any directed edge (i,j) is pruned whenever there exists a directed path from i to j that does not include (i,j) (for example, see Fig. 1(c)).Note that the transitive reduction of a digraph with cycles is not unique.
Inference of Network Ensemble Bounds.In this work, we consider the inference of GRNs as digraphs, where the nodes correspond to the genes and the directed edges represent the gene regulatory interactions.An edge i?j implies that the expression of gene i influences the expression of gene j.In the following, the GRN of interest is denoted by the digraph G 6 0 .For any set of genes V KO 5V (G 6 0 ), we use G VKO to denote a subgraph of G 6 0 that results from removing all edges incident to the genes in the set For example, G fBg associated with G 6 0 in Fig. 1(a) is the graph with all edges incident to gene B removed, as shown in Fig. 1(d).Here, we interchangeably use the notations for a graph G and its adjacency matrix Adj(G).
Gene KO experiments are commonly performed for the purpose of GRN inference.In these experiments, the resulting data typically consist of gene expression profiles taken after the effects of the gene perturbation have reached steady state.While temporal gene expressions are increasingly measured, here we focus on using more commonly available steady-state expression data.The treatments of time-series measurements and observational data are left to future publications.Many network inference algorithms have been developed for using data of gene KOs [7,9], and most of these algorithms produce a single network prediction.In contrast, an ensemble inference strategy is adopted in this work.
In order to illustrate the limitation of using steady-state gene expression data for GRN inference, we consider a GRN G 6 0 described by the graph in Fig. 1(a).Here, KO of gene A is expected to cause changes in the expression of genes B, C, D and E at steady state, even though A directly regulates only B and D. This simple illustration demonstrates that we cannot in principle discriminate direct and indirect gene regulations from steady-state gene KO expression data [19].In general, genes that are differentially expressed upon knocking out gene i in the GRN correspond to those that are directly and indirectly regulated by gene i, i.e. vertices in G 6 0 that are accessible from the vertex i. Motivated by such a limitation, in TRaCE we first convert gene KO data into gene accessibility lists or matrices.As the minimum input, TRaCE requires the complete dataset of single-gene KO experiments, from which one can construct the accessibility matrix G G 6 0 .More specifically, the j-th element in the i-th row of G G 6 0 (i.e. ( G G 6 0 ) i,j ) is set to 1 when gene j is differentially expressed in the KO experiment of gene i.The other elements of G G 6 0 are set to 0. The detailed procedure of differential expression analysis adopted in this work is described in the Numerical Implementation section.
For data of multi-gene KO experiments, we consider the accessibility matrix of G VKO for an appropriately chosen set of genes V KO .In principle, we can determine G G VKO from the complete set of experiments involving KOs of the genes in the set V KO and an additional gene i for all i[fV (G 6 0 ){V KO g.These experiments are equivalent to performing single-gene KOs of the GRN G VKO , and therefore G G VKO can be obtained by following the same procedure as that for G G 6 0 above.As an illustration, consider the GRN in Fig. 1(a) with V KO ~fBg.The graph G fBg is given in Fig. 1(d).In this case, we can construct the accessibility matrix G G fBg from the data of two-gene KO experiments, namely fA,Bg, fC,Bg, fD,Bg and fE,Bg KOs.As these experiments differ from each other in only one gene while sharing the KO of gene B, the differential expression analysis of the data thus correspond to changes in the expression of the GRN G fBg caused by a singlegene KO.Consequently, in this analysis, genes that are found to be differentially expressed in the KO of fi,Bg are those that are accessible from gene i (i[fA,C,D,Eg) in the graph G fBg .For example, the KO of fA,Bg is expected to cause differential expression in genes D and E. The full accessibility matrix of G fBg is illustrated by the digraph in Fig. 1(e).
We can generalize the simple example above to any set of genes V KO that could be derived from the available multi-gene KO experiments.More specifically, we set ( G G VKO ) i,j to 1 when knocking-out fi,V KO g leads to a differential expression of gene j with respect to its expression level in G VKO .The remaining elements of G G VKO are set to 0. Unfortunately, the construction of G G VKO of a large GRN G 6 0 would proportionally require a high number of KO experiments (the number of KO experiments is n exp ~n{n VKO , where n and n VKO is the number of genes in G 6 0 and V KO , respectively).However, when G 6 0 is sparse, G G VKO differs from G G 6 0 for only a few elements and importantly, these elements can be determined from G G 6 0 (see the next section).In the theoretical development below, we assume that the accessibility matrices G G 6 0 and G G V k KO for k~1,2, . . .,K, have already been obtained from the expression data.Here, K denotes the total number of accessibility matrices involving subgraphs of the GRN G 6 0 that can be constructed from data.For example, from the dataset of the complete double-gene KO experiments, we can obtain the accessibility matrix G G V k KO ~ G G fkg for k~1,2, . . .,n (here, K~n).In TRaCE, we consider the ensemble containing all digraphs that are consistent with the accessibility matrices G G 6 0 and G G V k KO 's, which is the set: where Note that the GRN G 6 0 is a member of the ensemble A. The size of the ensemble is a direct measure of uncertainty in the network inference problem.A GRN is therefore deemed inferable when the ensemble only contains a single (unique) network.
As the number of digraphs in the ensemble is often very large, in TRaCE we generate only the lower and upper bounds of the ensemble, denoted by G L and G U , respectively.The bounds are defined such that each digraph in the ensemble is a supergraph of G L and a subgraph of G U .For GRNs without any cycle (i.e. and their transitive reductions (i.e.G t 6 0 and G t V k KO 's), using the following equations (for details see the Numerical Implementation section): where , the upper bound of the ensemble is simply given by the accessibility matrix G T 6 0 and the lower bound is the transitive reduction is also a subgraph of G T 6 0 .In Eq. ( 3), the upper bound is constructed starting from G T 6 0 in which edges are removed based on G T

V k KO
. Here, edges incident to V k KO are not altered during the intersection of the accessibility matrix G T V k KO . For example, consider again the GRN in Fig. 1(a) with the accessibility matrices G T 6 0 and G T fBg in Figs.1(b) and 1(e).The resulting upper bound G U from the combination of these accessibility matrices will have one fewer edge than G T 6 0 , which is the edge (A,C) (see Fig. 1(f)).Thus, the size of the upper bound is generally reduced with the incorporation of G T V k KO 's.On the other hand, according to Eq. (2), the lower bound becomes larger with the inclusion of every available G t V k KO .In the same example above, the transitive reduction of G T fBg happens to be the graph G fBg (i.e. in this case G t fBg ~GfBg ).Here, the combination of G t 6 0 and G t fBg in Figs.1(c) and 1(d), respectively, gives the lower bound G L that is equal to G 6 0 .However, in general, G L is a subgraph of G 6 0 .
Theorem 1 establishes G L and G U in Eqs. ( 2)-(3) as valid lower and upper bounds of the set A for DAGs.
Theorem 1: For G L and G U described in Eqs. ( 2)-( 3), the following relationship applies: implies that either e[E G t . Therefore, we have either: 0 ).In addition, this edge satisfies either: Therefore, e[E G U ð Þ. % Remark: Since G 6 0 is a member of A, G L and G U can also be thought as the lower and upper bounds of G 6 0 , i.e.G L (G 6 0 (G U .For DAGs, the members of the set A can be obtained by combinatorially adding edges in the set (G U {G L ) to G L .Therefore, the dimension of A is equal to 2 N where N is the difference between the number of edges in G U and G L .Finally, Theorem 1 guarantees that when G L ~GU , G 6 0 is fully identifiable, i.e.G L ~GU ~G6 0 .
For digraphs with cycles, the upper bound can still be constructed using Eq. 3 with G G replacing G T .In this more general case, the relationship G 6 0 5G U in Theorem 1 is still valid.However, as mentioned earlier, the transitive reduction of digraphs with cycles is not unique.In a previous publication [19], Wagner proposed a procedure in which digraphs are first condensed into DAGs before constructing the transitive reduction [19].Similarly, in TRaCE, each input accessibility matrix is first condensed and the transitive reduction algorithm is subsequently applied to the DAG of the strong components.Here, edges incident to the condensations of cycles are also removed.Afterwards, the transitive reduction graph is expanded, reversing the condensation step.Except for cycles involving two nodes, edges of any directed cycle cannot be uniquely prescribed and are therefore pruned.The above procedure for reducing digraphs with cycles is referred to as Condensation, Transitive Reduction and Expansion (ConTREx).The ConTREx of an accessibility matrix G G, denoted by G, may no longer be a valid transitive reduction (i.e.G T may not necessarily be equal to G G). Nonetheless, the lower bound constructed using Eq. ( 2) with G's replacing G t 's, satisfies G L 5G 6 0 .The proof of this relationship is analogous to the one presented for Theorem 1.However, G L may not be a member of A (see Fig. S1).Finally, the enumeration of digraphs with cycles from G L and G U is more complicated than that for DAGs.The main difference is in the generation of all possible cycles among nodes belonging to a particular strong component, constrained by G L and G U (see an example in Fig. S2).
Error Correction and Filter.In practice, the accessibility matrices constructed from data contain errors.Some elements of the accessibility matrices maybe identified as 1 when they should be 0 (i.e.false positive, FP), and some maybe identified as 0 when they should be 1 (i.e.false negative, FN).These errors can affect the lower and upper bound constructed by Eqs. ( 2)-(3).We denote the erroneous lower and upper bound by G G L and G G U , respectively.In this case, neither G G L is guaranteed to be a subgraph of G 6 0 , nor G G U a supergraph of G 6 0 and G G L .There are several types of errors affecting G G L and G G U .In the first case (Type A error), an edge which is not present in As such error affects both G G L and G G U in the same manner, this error is not detectable from G G L and G G U .The second case (Type B error) involves either a FP in the accessibility matrix or a FN in the ConTREx matrix.In this case, the resulting bounds are still consistent with each other and are still valid for G 6 0 .However, the ensemble size and the network uncertainty increase due to this error.In the third case (Type C error), an edge erroneously appears only in G G L , or vice versa, an edge is errorneously missing only from ), where c denote the complement of a set).Here, the bounds become inconsistent with each other (i.e., G G L 6 ( G G U ). Thus, we refer such errors as inconsistent edges, which can be identified by searching for edges belonging to G ). Table 1 illustrates the three types of errors mentioned above.
A closer scrutiny of Eqs. ( 2)-( 3) reveals that errors from the input accessibility matrices are passed on and compounded in the bounds.For example, FN errors in KO , such that there exists a directed path from i to j involving one or more genes in V k KO .As directed paths involving genes in V k KO are disconnected in G VKO , we can potentially verify the existence of the testable edges from For example, the existence of the edge (A,D) in Fig. 1(a) can be verified using the transitive reduction of G G fBg , which is the graph shown in Fig. 1(d).Meanwhile, we can establish the absence of the edge (A,C) in G 6 0 using the accessibility matrix G G fBg (see Fig. 1(e)).The number of testable edges can also be used to estimate the information content of a G G VKO , where a higher number of testable edges indicates more informative G G VKO .For a given V KO , it is straightforward to show that the testable edges are the non-zero entries of the testability matrix: where ( G G 6 0 ) i and ( G G 6 0 ) i are the i-th column and i-th row of G G 6 0 , respectively, and 6 denotes the outer product.During the construction of the lower and upper bound GRNs, the incorporation of each G G V k KO will only need to be performed for the associated testable edges, i.

Numerical Implementation
The pseudo-codes and the MATLAB implementations of TRaCE are provided in the supporting material and the following website (http://www.cabsel.ethz.ch/tools/trace).Given steadystate gene expression data, we first group the data into datasets according to the KO experiments required for the construction of the accessibility matrices.We perform differential expression analysis for each dataset using Z-score transformation and obtain the corresponding accessibility matrix.We provide two implementations of TRaCE, one with and another without error correction.TRaCE without error correction should only be applied when the input accessibility matrices are error free (e.g. for inferability analysis).In any other scenario, TRaCE with error correction should be used.If desired, a ranked list of gene regulatory predictions can also be generated using the lower and upper bounds of the ensemble and the differential expression analysis.

Constructing Accessibility Matrices from Expression
Data.In the case studies, we have employed the Z-score transformation for differential gene expression analysis [21].Without loss of generality, we describe below the procedure for constructing G G VKO from the complete set of single-gene KOs of G VKO , i.e. all possible combinations of fi, V KO g genes KOs.The gene expression dataset is organized into a matrix in which the rows correspond to the experiments and the columns correspond to the genes.Technical replicates are arranged into separate data matrices.For microarray data, the gene expression is typically represented by log-10 transformed fluorescence intensity data.The following procedure is also illustrated in Fig. 2.
1. We first obtain the sample mean m j and standard deviation s j of the expression of each gene j in the dataset.More specifically, for each technical replicate, we calculate the sample mean and standard deviation of the j-th column in the data matrix.Then, we identify expressions that differ from the mean by more than a specified multiple z cutoff of the standard deviation.We subsequently recompute the sample mean and standard deviation m j and s j by excluding the data beyond z cutoff .When available, we also use the expression data from the KO experiment of genes V KO in calculating m j and s j .
2. For each replicate, we compute a z-score matrix z i,j for i,j[V (G 6 0 ) according to [21] z i,j ~gi,j {m j s j , Vi,j where g i,j is the expression level of gene j associated with knocking out gene i and genes in V KO .These z-scores reflect the significance of changes in the gene expression with respect to the GRN G V KO .
3. Subsequently, we average the z-score matrices over the technical replicates, producing the overall z-score matrix Z GV KO .
4. We determine the accessibility matrix G G VKO from Z GV KO using a threshold, as follows: In our experience, z cutoff ~3 and z threshold ~2 provide reliable network ensembles.In general, choosing higher z cutoff and z threshold will lead to fewer FPs but more FNs in the accessibility matrix.For the GRN examples considered in this work, the performance of TRaCE does not vary considerably within the selected ranges of z threshold between 1.5 and 2.5 and z cutoff between 2 and 3 (see Results).G G VKO from expression data.The data come from KOs of genes in the set V KO and an additional gene i, Vi[V(G 6 0 ) { V KO .For each replicate, the expression data are arranged into a matrix where the rows correspond to the experiments and the columns correspond to the genes.(1) The sample mean and standard deviation of the expression of gene j, denoted by m j and s j , respectively, are obtained using the expression data in the j-th column of the data matrix.(2) For each replicate, a z-score matrix is computed according to Eq. ( 5). ( 3) Subsequently, the z-score matrices are averaged over the technical replicates to give Z GV KO .(4) The accessibility matrix G G VKO is determined from Z GV KO based on a threshold criterion in Eq. ( 6).doi:10.1371/journal.pone.0103812.g002 TRaCE without error correction.TRaCE without error correction is implemented as matrix-operations of Eqs. ( 2) and (3).Briefly, the upper bound G U is constructed by performing Hadamard (element wise) multiplications of the accessibility matrices, excluding the rows and columns corresponding to genes in V k KO .On the other hand, the transitive reduction is based on the algorithm by Wagner [19], which has been re-implemented using matrix operations.When there is no cycle in G G 6 0 and G G V k KO 's, the transitive reduction algorithm is applied to each accessibility matrix and the construction of G L is done by binary additions of the transitive reductions, following Eq.( 2).Cycles and genes involved in cycles can be detected from entries of Acc(G 6 0 ) | Acc(G 6 0 ) T [22].For GRNs with cycles, the Con-TREx procedure is applied to each available accessibility matrix, and the resulting G matrices are again combined using binary .As testable edges are determined from G G 6 0 , a preprocessing step is performed to reduce errors in G G 6 0 .The premise behind the pre-processing step is that an error unlikely affects the same edge, and that testable edges of any G G V k KO constitute only a small subset of edges in G G 6 0 (i.e. the network is sparse).Following this premise, edges that appear in a majority of the accessibility matrices (above a certain threshold) are kept, but are otherwise removed.In our experience, a threshold of 65% gives a good and reliable performance, but any value between 50% to 80% works quite well in the case studies (see Results).A more detailed description of the pre-processing method can be found in text S1, while the filtering algorithm is provided in text S2.
The schematic diagram of TRaCE with error correction is given in Fig. 3(b).We consider two sets of accessibility matrices; the first set comes from differential expression analysis (based on Z G 's) and the second set comes from the transitive closure of the first set.We create the second set of matrices since the accessibility matrices identified from differential expressions may not satisfy the transitivity condition due to errors.The pre-processing step above is applied to both sets of matrices.Subsequently, two candidate upper bounds are generated using TRaCE with filtering.The upper bound obtained from the first set of accessibility matrices, denoted by G G U 1 , is expectedly smaller (in size) than the bound from the second set, denoted by G G U 2 .Note that ConTREx is only applicable to transitive digraphs, and therefore is applied only to the transitive closures (i.e. the second set).Using TRaCE with filtering, a candidate lower bound, denoted by G G L 1 , is generated from the results of ConTREx.
The last step in the procedure is to correct inconsistent edges, which is done by voting.For each inconsistent edge, we compared the number of times that the edge is present in the accessibility matrices and the ConTREx results (supporting the presence of the edge), with the number of times that the edge is absent from the accessibility and ConTREx matrices (supporting the absence of the edge).The upper bound is corrected (by addition of this edge) when the presence of the edge receives a (simple) majority vote.Vice versa, the lower bound is corrected (by removal of this edge) when the absence of the edge receives a majority vote.In the case of no majority vote, the edge is added to the upper bound and removed from the lower bound.The detail of the consistency check is described in text S3.As shown in Fig. 3(b), the consistency check and correction are first performed for the pair G G U 1 and G G L 1 , and subsequently the corrected lower bound, denoted by G G L 2 , is compared with G G U 2 to obtain the final corrected G G L and G G U .Ranking of Edges from Ensemble Bounds.If desired, a ranked list of edges can be generated using the lower and upper bounds of TRaCE in conjunction with the average z-scores for G 6 0 , i.e.Z G 6 0 .Here, we carry out the ranking of regulatory edges in two phases.In the first phase, we rank subsets of edges according to the lower and upper bounds in the following order: edges in and finally edges in S 5 ~( G G U ) c .In the second phase, we rank the edges within individual subsets according to the average z-scores.We implement the second phase by first computing the overall scores z z i,j according to Following the submission requirement of DREAM 4 network inference challenge, we then assign a confidence score x i,j to the edge (i,j) according to: A score x i,j of 1 reflects the highest confidence of the existence of an edge (i,j), and vice versa a zero confidence score indicates certainty in the inexistence of (i,j).Finally, the ranked list of edges is generated by sorting the edges in decreasing order of confidence scores.A similar procedure, called down-ranking, has been presented in Pinna et al. [23], where feed-forward edges are ranked lower than edges in the transitive reduction of the accessibility matrix.However, the down-ranking algorithm is described only for data of single-gene KO experiments.

Inferability Analysis
We first applied TRaCE to error-free accessibility matrices of G 6 0 and G V k KO by assuming ideal data (unbiased and error free) for the purpose of inferability analysis.Such an analysis is analogous to a priori identifiability analysis in the kinetic modeling of biological networks [12].Here, we evaluated the network distances between the lower and upper bounds and the GRN, i.e. the numbers of edges in the set G 6 0 {G L and G U {G 6 0 , respectively.
Random GRNs.We investigated the inferability of random GRNs of orders n~10 and n~100 genes.We set the network size  (i.e.number of edges) between 0 and 3n randomly with equal probability, and assigned the edges without any preference.The upper size limit of 3n was chosen based on the ratio between the number of edges and the number of nodes in E. coli and yeast GRNs [24].For each random network, we generated nz1 accessibility matrices associated with G 6 0 and G fig for every i[V (G 6 0 ): These accessibility matrices correspond to performing the full set of single-and double-gene KO experiments.
We applied TRaCE without error correction to construct the ensemble lower and upper bounds for each random network using the aforementioned accessibility matrices.The mean network distances of the bounds from G 6 0 are shown in Figs.4(a) and (b) as a function of network size.Here, we plotted the network distances of the lower bound using negative numbers and those of the upper bound using positive numbers.By doing so, we could illustrate the distance between the lower and upper bounds in the same plot.In particular, the number of edges in the set G U {G L is equal to the distance between the two network distance curves in Fig. 4.Not surprisingly, the network distance increased with the size of the networks, i.e. larger networks are more difficult to infer than smaller networks.The difference between the lower and upper bounds also broadened with network size, indicating higher network uncertainty in the inference of larger GRNs.For networks containing fewer edges than nodes, the GRN G 6 0 could generally be recovered from G G 6 0 and G G fig 's.Nevertheless, Fig. 4 demonstrated that the GRNs were typically (64% for 10 gene networks and 76% for 100 gene networks) not inferable, since the lower and upper bounds did not converge.
Fig. 5 shows two examples of GRN inference of order n~10 genes.In the first case (case I, m~8 edges), G 6 0 could be recovered from G G 6 0 and as few as 3 G G fig 's, while in the second case (case II, m~13 edges), the inference problem was underdetermined.Moreover, the results suggested that G G fig 's were not equally informative, as the reduction in the distance between the lower and upper bounds by incorporating an additional G G fig was not uniform.
Random scale-free GRNs.Many cellular networks have been shown to be scale-free with a power-law degree distribution [25], where the majority of the nodes have low degrees (1 to 2) and a few nodes (called hubs) are of high degrees.We also tested the Table 2. Ensemble inference of E. coli subnetworks (n~100 genes).performance of TRaCE using random scale-free networks.Here, we constructed two sets of 5000 scale-free GRNs with order n~10 and n~100 genes using the Baraba ´si-Albert model [26].Briefly, the GRNs were grown from a random seed network of small size (with 3 vertices) by sequentially adding nodes to the network.For each node addition, between 1 and 5 new edges were inserted to the network connecting the new node with existing ones, in a manner such that the degree distribution decayed exponentially.Again, for the purpose of inferabiliity analysis, we generated nz1 error-free accessibility matrices G G 6 0 and all G G fig 's, equivalent to having ideal data from single-and double-gene KO experiments.
We used the error-free implementation of TRaCE to construct the ensemble lower and upper bounds for each of the random scale-free GRNs.Fig. 6 shows the mean network distances of the bounds as a function of network size.Similar to the random GRNs, most (79% for 10 gene networks and 75% for 100 gene networks) scale-free GRNs were not inferable from single and double-gene KO experiments, as the ensemble lower and upper bound did not meet for the majority of the networks.The mean network distance of the lower and upper bounds again increased with network size.However, the inference of scale-free GRNs from the accessibility matrices G G 6 0 and G G fig 's appeared to be more difficult than that of random GRNs, as suggested by the larger distances between the lower and upper bounds for scale-free GRNs than for random GRNs of the same size.
E. coli and S. cerevisiae GRNs.Finally, we investigated the inferability of large, realistic GRNs of E. coli and S. cerevisiae available in GeneNetWeaver [24].The E. coli GRN consists of 1565 genes and 3758 edges, while the yeast GRN comprise 4441 genes and 12873 edges.For E. coli, we generated the accessibility matrices of G 6 0 and all G fig 's.To reduce computational complexity, in the case of yeast, we used only the 100 most informative G G fig 's based on the number of testable edges (i.e. the number of non-zero elements in the testability matrix T fig in Eq. ( 4)).The results are shown in Figs.7 and 8.Not surprisingly, both E. coli and yeast GRNs could not be completely inferred from the above accessibility matrices.There was a diminishing return of information after about 25 and 50 G G fig 's for the inference of E. coli and yeast GRNs, respectively.

Ensemble inference from errorneous accessibility matrices
We evaluated the performance of TRaCE with error correction using E. coli GRN and subnetworks, as well as yeast GRN.False positive errors were simulated by randomly adding edges to the accessibility matrices, while false negatives were simulated by randomly removing edges from the accessibility matrices.The performance of error correction in TRaCE was judged by the number of erroneous edges that remained in the bounds after correction for different FP and FN rates (abbreviated as FPR and FNR, respectively), defined with respect to the size of G 6 0 .
E. coli GRNs.We first used TRaCE with error correction for the ensemble inference of 50 random subnetworks of E. coli GRN with n~100 genes, generated using GeneNetWeaver [24].The average number of edges was 192.As in the above case study, we   created the accessibility matrices of G 6 0 and every G fig .We subsequently contaminated these matrices with FP and FN errors at the specified rates without any preference.The accuracy of the lower and upper bounds constructed using TRaCE with and without error correction is summarized in Table 2.
As in many scenarios above, none of the subnetworks was inferable.FP errors could be very effectively eliminated by error correction.FNs errors expectedly led to missing edges from the upper bound, as indicated by the number of edges of 2).The error correction could not completely eliminate Type A errors, leading to erroneous edges that appeared in the lower bound G G L but did not belong to G 6 0 (see m G G L {G 6 0 À Á in Table 2).A combination of FP and FN errors were more easily corrected than FN errors alone.While FNs were more difficult to eliminate than FPs, correcting FP errors tended to produce larger network ensembles than FNs, indicating higher network uncertainty (see 2).Nevertheless, even in the worst case (0% FP, 20% FN), roughly 90% of the errors in the lower and upper bounds could be removed by the error correction (compare before and after correction).For the inference of E. coli GRN, we generated erroneous accessibility matrices G G 6 0 and the 100 most informative G G fig 's corresponding to the top 100 highest numbers of testable edges.The performance of TRaCE with error correction for different FP and FN rates is summarized in Table 3.In addition, the structural Hamming distances of the lower and upper bounds before and after correction are reported in tables S1 and S2.As before, TRaCE with error correction could handle FPs more effectively than FNs, and a mixture of FP and FN errors in the accessibility matrices were more easily eliminated than FN alone.In the worst case (0% FP, 20% FN), more than 95% of the errors were corrected.The size of the ensemble also depended strongly on the FP errors, and at 20% FP, the number of edges between the lower and upper bound reached three times the size of the full GRN.
S. cerevisiae GRN.For yeast GRN, we generated erroneous G G 6 0 and the 100 most informative G G fig 's.The results of TRaCE with error correction using these accessibility matrices are summarized in Table 4.The performance of TRaCE here was notably better than the inference of E. coli GRN.In all cases, TRaCE could rectify almost all erroneous edges.However, the correction came at a price of high uncertainty, where the difference between the lower and upper bounds exceeded 20 times the number of edges in G 6 0 .Despite such high uncertainty, the gap between the bounds represented only 1.3% of the total possible edges.

Ensemble inference from expression data
We further evaluated the performance of TRaCE using in silico noisy gene expression data generated using GeneNetWeaver [24].We simulated steady-state gene expression data using the same settings as those in DREAM4 100-gene in silico network inference subchallenge.The simulated data are available at http://www.cabsel.ethz.ch/tools/traceor upon request.In the following case studies, we analyzed and converted the data into accessibility matrices following the procedure described in the Numerical Implementation section.Subsequently, we used the resulting Table 6.Ensemble inference of DREAM4 100-gene gold standard networks: single-and double-gene KO dataset.Based on the AUROC values, TRaCE performed as well as the downranking method (p~0:5) and GENIE3 (p~0:3), but better than TIGRESS (p~0:002).Based on the AUPR values, TRaCE performed as well as the downranking method (p~0:5), but better than GENIE3 (p~0:03) and TIGRESS (p~0:01).The statistical significance was evaluated using two sample t-test.doi:10.1371/journal.pone.0103812.g009accessibility matrices with TRaCE to produce the ensemble lower and upper bounds.For the purpose of comparison with existing network inference methods, we also ranked the gene regulatory interactions according to their confidence scores.In particular, we compared the rankings with those from top performing inference methods in DREAM4, namely the downranking method by Pinna et al. [27], GENIE3 [28] and TIGRESS [29].As mentioned earlier, the downranking method follows a two-phase procedure in ranking edges similar to our implementation, but the method only down-ranks feed-forward edges.On the other hand, GENIE3 uses a machine learning strategy called random forest, and TIGRESS is a regression-based method.For each method, we calculated the area under receiver operating characteristics (AUROC) and precision-recall (AUPR) using a redefined confusion matrix, in which methods were not penalized for any error within the set of non-inferable edges.We define non-inferable edges as edges belonging to the upper bound that are missing from the lower bound (i.e.all edges in the set G U { G L ), which are determined from error-free accessibility matrices of the gold standard GRNs.More details of the calculation of the AUROC and AUPR can be found in a recent publication [30].
DREAM4 in silico network inference 100-gene subchallenge.We first simulated 5 replicates of steady state gene expression data associated with the complete single-gene KO experiments.We then used the data to construct the accessibility matrices of the gold standard GRNs.In this case, the upper bound of the ensemble was simply given by the accessibility matrix, and the lower bound was the ConTREx of the upper bound.Table 5 shows the FPR and FNR in the accessibility matrices, the errors in the lower and upper bounds (m ), and the size of the ensemble (m G G U { G G L À Á ) constructed using TRaCE.We noted that the majority (90%) of FN errors in the accessibility matrices were associated with fan-in motifs, in which a gene was regulated by several genes.In such a case, the effect of knocking-out one of the regulator genes could be compensated by the others and thus, the KO experiment did not show any significant differential expression of the downstream genes.As FNs affected the accessibility matrices, the errors in the upper bound m G 6 0 { G G U À Á were higher than those in the lower bound m G G L {G 6 0 À Á .Subsequently, we created a ranked list of gene regulatory predictions and compared the list against those produced by the downranking method, GENIE3 and TIGRESS.Fig. 9 provides the comparison of AUROC and AUPR of the four methods.The comparison showed that TRaCE and the downranking method outperformed GENIE3 and TIGRESS, especially considering the AUPR values.Here, TRaCE performed as well as the downranking method, which was the best overall performer in DREAM 4 100-gene network inference subchallenge [27].
Ensemble inference from single-and double-gene Kos.We further simulated 5 replicates of steady state gene expression data for the complete set of single-and double-gene KO experiments using the gold standard GRNs of DREAM4 100gene subchallenge.We processed the data to obtain the accessibility matrices of G 6 0 and all G fig 's.We subsequently applied TRaCE with error correction to the accessibility matrices to obtain the ensemble lower and upper bounds, which are  summarized in Table 6.The average FPR and FNR were similar to the single-gene KO data since both datasets had the same number of replicates.Again, the majority (80%) of errors in the accessibility matrices were associated with fan-in motifs.By comparing Tables 5 and 6, the errors in the lower bounds improved slightly in comparison with those from only single-gene KO dataset (compare m G G L { G 6 0 À Á values).However, the errors in the upper bounds increased due to the accumulation of FN errors from fan-in motifs (compare m(G 6 0 { G G U ) values).Nevertheless, the additional data from double-gene KO experiments led to lower network uncertainties (compare m G G U { G G L À Á values).For each gold standard GRN, we also generated a ranked list of edges based on the confidence scores and compared the list with those using GENIE3 and TIGRESS.The downranking method could not be applied to double-gene KO data and was left out from the comparison.The AUROC and AUPR of the three methods are compared in Fig. 10.The AUPR of TRaCE was higher than GENIE3 and TIGRESS.Meanwhile, the AUROC values were generally high for all three methods, with TIGRESS having the lowest value.In comparison to single-gene KO data, the inclusion of double-gene KO data led to no change in the average AUROC (p~0:52, two sample t-test) and an increase in AUPR (p~0:17, two sample t-test).Finally, as shown in Tables 7  and 8, the AUROC and AUPR values of TRaCE were insensitive to z cutoff and z threshold used in the construction of the accessibility matrices, and the threshold value in the preprocessing of G G 6 0 .Because of the trade-off between FPs and FNs when using different z cutoff and z threshold , the maximum of AUROC and AUPR corresponded to intermediate values within the selected ranges of z cutoff and z threshold .
E. coli and Yeast GRNs.Finally, we simulated the complete set of single-gene KO experiments for E. coli and yeast.For each organism, we generated 10 replicates of steady state gene expression data.We performed differential expression analysis using the Z-score transformation and obtained the accessibility matrices using either 5 or 10 replicates.We then applied TRaCE with error correction to construct the ensemble lower and upper bounds, and created ranked lists of edges as done earlier.The errors in the accessibility matrices and in the bounds are reported in Table 9, along with the AUROC and AUPR values.Here, FPR in the accessibility matrices decreased with increasing number of replicates, but FNR did not change with the number of replicates.The errors in the upper bounds changed little with increasing technical replicates from 5 to 10, but those in the lower bounds dropped considerably.The sizes of the ensemble also decreased with increasing replicates.Finally, the AUPR values improved slightly with higher replicates, while the AUROC values were insensitive with respect to the number of replicates.

Discussion
The inference of GRNs from data of gene perturbation experiments is an important but unsolved problem.The difficulty stems from the underdetermined nature of such an inference [7,9], as the data do not contain the necessary information to establish the complete causal interactions among the genes.Consequently, there exist many indistinguishable solutions.We have developed TRaCE with this consequence in mind by employing an ensemble inference strategy.Specifically, we have taken into consideration the fundamental limitation in using steady-state expression data of gene KO experiments for establishing direct causal relationships among genes.In TRaCE, we first transform the expression data into accessibility relationships (matrices) among genes.The novel contribution of TRaCE is an algorithm for the construction of lower and upper bounds of network ensemble, where each member of the ensemble satisfies the accessibility matrices.Edges of the upper bound that do not appear in the lower bound are considered non-inferable, as the existence of such edges can not be verified.Here, the size of the ensemble provides a metric of uncertainty in the network inference problem, with which the GRN inferability can be rigorously assessed.The GRN is inferable when the lower and upper bounds coincide (i.e. the ensemble only contains one network).Thus, in TRaCE, the inference and inferability analysis are accomplished simultaneously.
In the case studies, we have demonstrated the use of TRaCE for analyzing the inferability of GRNs.With the exception of networks of low order and small size, the majority of the GRNs were not inferable even when using error-free accessibility matrices.As we have used sparse networks in the case studies, the lower bound of the ensemble was a better estimate of the GRN than the upper bound.Finally, the majority of double-gene KO experiments were non-informative as the reduction in the size of the ensemble diminished after only a small number of G G fig 's.The observation above suggests that experimental design contributes significantly to the underdetermined nature of the typical GRN inference.In this regard, the lower and upper bounds of the ensemble could be used for optimizing the gene perturbation experiments, for example by finding the KO experiment that provides the maximum reduction in the difference between the lower and upper bounds.A strategy for optimal design of experiments using ensemble inference will be presented in a future publication.
We have also used the ensemble lower and upper bounds in conjunction with the z-scores to produce a ranked list of gene regulatory predictions.In comparison with the top methods of DREAM4 network inference challenge, TRaCE could match the performance of the downranking method, the best overall performer in the 100-gene subchallenge.For single-gene KO dataset, TRaCE and the downranking method differed only for edges that were involved in cycles of more than 2 nodes.However, the two methods were fundamentally different, as TRaCE was developed for ensemble inference.Furthermore, the downranking method was created for single-gene KO experiments.Meanwhile, TRaCE significantly outperformed GENIE3 and TIGRESS when using single-and double-gene KO data.We note that GENIE3 and TIGRESS were also among the best performers in DREAM5 network inference challenge [10].
As expected, data noise negatively influenced the GRN inference and increased the uncertainty in the GRN inference.In the case studies, random errors in the accessibility matrices expectedly led to a larger ensemble.While the error correction in TRaCE was able to eliminate the majority of errors, some of the errors remained in the bounds.In particular, FN errors were harder to correct than FPs.The reason was that more Type A errors originating from FNs passed through the correction than those originating from FPs (see Fig. S3 and text S4).Meanwhile, FP errors could cancel out some Type A errors associated with FNs at the cost of increased uncertainty (see Fig. S4 and text S4).
The application of TRaCE to simulated noisy gene expression data indicated that the majority of errors in the accessibility matrices were due to FNs associated with fan-in motifs in the GRN.In such motifs, the effects of knocking-out one regulator gene could be compensated by other regulator(s), and differential expression analysis could only reveal the dominant regulator(s) of a gene.Note that such a problem could not be improved by increasing technical replicates (see Table 9).Meanwhile, errors associated with fan-in motifs usually lead to type A errors where the affected edges are absent from the lower and upper bounds.However, if the available experiments permit the construction of G G VKO in which V KO includes the dominant regulator(s) of a fan-in motif, then the related missing edge(s) may appear in G VKO , leading to a detectable and correctable type C error.We expect the issue above would improve when using more sensitive measurements of gene expression, for example RNAseq.

Conclusion
Inferring gene regulatory networks from DNA microarray data is an unsolved problem.Community-wide assessments of inference methods have shown that distinguishing direct and indirect regulatory interactions among genes is a common Achilles' heel of existing algorithms.In this study, we have adopted an ensemble inference strategy and develop new framework and algorithms for the creation of an ensemble of networks.Here, the ensemble represents the uncertainty associated with differentiating direct and indirect regulations using steady-state gene expression data of gene KO experiments.In particular, TRaCE produces the lower and upper bound of the ensemble.Using the bounds of the ensemble, a ranked list of gene regulatory predictions can also be generated.The case studies demonstrate that except for networks with few edges, most GRNs can not be fully inferred even when error-free data from complete single-and double-gene knock out experiments are available.In comparison with top performing methods of DREAM4 in silico network inference challenge, TRaCE performed equally well with the downranking method, the best overall performer in the challenge.However, the downranking method is not designed to handle data from multigene KO experiments.Meanwhile, TRaCE outperformed GE-NIE3 and TIGRESS when using single-and double-gene KO data.Nevertheless, the uncertainty in GRN inference is still significant, and systematic KOs of genes are often suboptimal as only a small fraction of the experiments are informative.We therefore hope that the shift of paradigm to ensemble inference will trigger further developments of methodologies for inferring and analyzing GRNs, in which network inference uncertainty is explicitly taken into consideration.

Figure 1 .
Figure 1.(a) An example of a directed graph G 6 0 .(b) The transitive closure G G 6 0 (in this case, G G 6 0 ~GT 6 0 since G 6 0 is a DAG).(c) The transitive reduction G t 6 0 of G G 6 0 .(d) The directed graph G fBg associated with G 6 0 .(e) The transitive closure G G fBg .In this case, the transitive reduction G t fBg happens to be the graph G fBg .(f) The ensemble upper bound G U obtained from G G 6 0 and G G fBg .The ensemble lower bound G L obtained from G t 6 0 and G t fBg while FPs in G 6 0 and any G V k KO will also appear in G G L .In order to reduce the transmission of errors, we have developed a filter such that only a subset of edges of G G V k KO are used for the construction of G G L and G G U .The filter is based on the concept of testable edges.Specifically, the testable edges of G G V k KO are any edge .1371/journal.pone.0103812.t001

Figure 2 .
Figure 2. Construction of accessibility matrixG G VKO from expression data.The data come from KOs of genes in the set V KO and an additional gene i, Vi[V(G 6 0 ) { V KO .For each replicate, the expression data are arranged into a matrix where the rows correspond to the experiments and the columns correspond to the genes.(1) The sample mean and standard deviation of the expression of gene j, denoted by m j and s j , respectively, are obtained using the expression data in the j-th column of the data matrix.(2) For each replicate, a z-score matrix is computed according to Eq. (5).(3) Subsequently, the z-score matrices are averaged over the technical replicates to give Z GV KO .(4) The accessibility matrix G G VKO is determined from Z GV KO based on a threshold criterion in Eq. (6).doi:10.1371/journal.pone.0103812.g002

Figure 3 .
Figure 3. Schematic diagrams of TRaCE with and without error correction.(a) Construction of the lower bound G L and upper bound G U from G G 6 0 and G G V k KO 's using TRaCE without error correction.Expression data from gene KO experiments are first converted into accessibility matrices.ConTREx is then applied to each accessibility matrix, removing feed-forward edges and edges incident to vertices belonging cycles with more than 2 nodes.The upper bound is constructed by taking the intersection of the accessibility matrices, while the lower bound is constructed by taking the union of the ConTREx outputs.(b) Construction of the lower bound G L and upper bound G U from G G 6 0 and G G V k KO 's using TRaCE with error correction.Expression data from gene KO experiments are converted into accessibility matrices G M 6 0 and G M V k KO

Figure 4 .
Figure 4. Ensemble inference and inferability of 10,000 random networks of order (a) n~10 and (b) n~100 genes.The mean network distances of the lower and upper bounds from G 6 0 are shown as a function of network size (i.e.number of edges).The error bars indicate the standard deviations.doi:10.1371/journal.pone.0103812.g004

Figure 5 .Figure 6 .
Figure 5. Examples of the ensemble inference of random networks with 10 genes.In case I, the GRN has 8 edges and is inferable from the accessibility matrices G G 6 0 and as few as three G G fig 's.In case II, the GRN has 13 edges and is not inferable.doi:10.1371/journal.pone.0103812.g005

Figure 7 .Figure 8 .
Figure 7. Ensemble inference of E. coli GRN from error-free G G 6 0 and the complete set of G G fig 's.The plot shows the network distances of the lower and upper bounds from G 6 0 as a function of the number of G G fig 's for the 50 most informative G G fig 's, i.e. the top 50 highest number of testable edges.The incorporation of G G fig 's and G fig 's was performed sequentially in decreasing number of testable edges.The inset shows the result for the complete set of G G fig 's.doi:10.1371/journal.pone.0103812.g007

Figure
Figure S1 An example of ConTREx of a simple GRN.The directed edges indicated by blue arrows in G G 6 0 are in the set of indirect regulations.The edges between A and B are retained, The reported values represent the averages over 50 subnetworks.FPR (FNR) is the ratio between the number of FP (FN) in the accessibility matrices and the number of edges in G 6 0 .Let m A{B ð Þof any two digraphs A and B denote the number of edges in the set E(A) { E(B).doi:10.1371/journal.pone.0103812.t002

Table 3 .
Ensemble inference of E. coli GRN.

Table 4 .
Ensemble inference of S. cerevisiae GRN.

Table 5 .
Þof any two digraphs A and B denote the number of edges in the set E(A) { E(B).doi:10.1371/journal.pone.0103812.t004Ensemble inference of DREAM4 100-gene gold standard networks: single-gene KO dataset.
FNR) is the average ratio between the number of FP (FN) in the accessibility matrices and the number of edges in G 6 0 .Let m G 6 0 À Á denote the number of edges in G 6 0 , and m A{B ð Þof any two digraphs A and B denote the number of edges in the set E(A){E(B).doi:10.1371/journal.pone.0103812.t006

Table 7 .
Effects of z cutoff and z threshold on AUROC and AUPR of TRaCE in DREAM4 100-gene subchallenge: single gene KO data.The AUROC and AUPR values are the average + standard devation over 5 gold standard networks.The values used in the comparison with existing methods are highlighted in bold. doi:10.1371/journal.pone.0103812.t007

Table 8 .
Effects of z cutoff , z threshold and the preprocessing threshold of G G 6 0 on AUROC and AUPR of TRaCE in DREAM4 100-gene subchallenge: single-and double-gene KO data.The AUROC and AUPR values are the average + standard devation over 5 gold standard GRNs.The values used in the comparison with existing methods are highlighted in bold. doi:10.1371/journal.pone.0103812.t008

Table 9 .
Ensemble inference of E. coli and Yeast GRNs from single-gene KO data.FNR) is the ratio between the number of FP (FN) in the accessibility matrices and the number of edges in G 6 0 .Let m A{B ð Þof any two digraphs A and B denote the number of edges in the set E(A) { E(B).doi:10.1371/journal.pone.0103812.t009