De-Novo Learning of Genome-Scale Regulatory Networks in S. cerevisiae

De-novo reverse-engineering of genome-scale regulatory networks is a fundamental problem of biological and translational research. One of the major obstacles in developing and evaluating approaches for de-novo gene network reconstruction is the absence of high-quality genome-scale gold-standard networks of direct regulatory interactions. To establish a foundation for assessing the accuracy of de-novo gene network reverse-engineering, we constructed high-quality genome-scale gold-standard networks of direct regulatory interactions in Saccharomyces cerevisiae that incorporate binding and gene knockout data. Then we used 7 performance metrics to assess accuracy of 18 statistical association-based approaches for de-novo network reverse-engineering in 13 different datasets spanning over 4 data types. We found that most reconstructed networks had statistically significant accuracies. We also determined which statistical approaches and datasets/data types lead to networks with better reconstruction accuracies. While we found that de-novo reverse-engineering of the entire network is a challenging problem, it is possible to reconstruct sub-networks around some transcription factors with good accuracy. The latter transcription factors can be identified by assessing their connectivity in the inferred networks. Overall, this study provides the gene network reverse-engineering community with a rigorous assessment of the accuracy of S. cerevisiae gene network reconstruction and variability in performance of various approaches for learning both the entire network and sub-networks around transcription factors.


Introduction
One of the fundamental problems of modern biology is reverseengineering of genome-scale regulatory networks.Addressing this problem is essential to expanding understanding of normal and pathologic cellular conditions and can lead to development of new drugs and therapies.While there are many databases that store biological pathways (e.g., KEGG and Ingenuity Pathway Analysis), these databases are often inaccurate and/or incomplete because their knowledge is derived from a multitude of biological systems and conditions that may not correspond to the problem at hand.Furthermore, pathways in these databases are affected by variability of the employed computational and experimental methods and their reproducibility characteristics [1][2][3].Therefore, there is a strong need for reverse-engineering of genome-scale regulatory networks de novo from data.
Gene regulatory networks can be constructed by integrating targeted perturbation data (e.g., gene knockouts or overexpression of transcription factors) with binding data (e.g., chromatin immunoprecipitation) (Figure 1).By knocking-out/deleting or over-expressing transcription factor X and comparing the expression level of other genes with the wild-type strain, one can determine regulatory targets of X.On the other hand, a binding assay allows identification of the binding targets of X.The overlap of regulatory targets and binding targets defines the set of direct regulatory targets of X which are graphically represented in gene regulatory networks.While modern methods in biology enable performing such studies in a variety of model systems, they are typically expensive to perform on a genome-scale and often unfeasible in humans.
However, the wide-spread use of genomic profiling technologies over the last two decades led to development of thousands of observational, i.e. non-perturbation datasets (e.g., from casecontrol and case-series studies), that are freely available in public repositories such as GEO [4] and ArrayExpress [5].In addition, the computational community has recently provided many algorithms that can infer causal relations from non-perturbation data [6][7][8][9][10]; some of them have been adopted to accommodate the high dimensionalities of modern genomics data [11,12], and some methods even lead to Nobel awards in domains outside of biomedicine [13][14][15][16].The question is whether these computa-tional methodologies can accurately learn de-novo gene regulatory networks from highly abundant data in the public domain?
Fortunately, this question has recently received attention in the scientific community [17][18][19][20][21].However, the major obstacle in testing gene network reverse-engineering methods is the absence of high-quality genome-scale gold-standards of direct regulatory interactions that are derived by integrating targeted perturbation with binding data (see Table 1).Another problem is that currently the scientific community primarily uses perturbation data for gene network inference (many studies use compendium microarray data that is obtained by merging a large number of studies, predominantly with deletion mutants), while results based on observational data are more important, since the latter data is easier and cheaper to obtain.In general, it is unknown what types of datasets are more suitable for gene network reverse-engineering studies.
To address gaps in prior research, this study focuses on S. cerevisiae, one of the most well-studied model organisms with a wide range of available genome-scale data.We first constructed high-quality genome-scale gold-standards of regulatory interaction and then assessed 18 statistical association-based approaches (from both bivariate analysis and multivariate causal graph-based methods) for de-novo network reverse-engineering in 13 different datasets that span over 4 data types: (i) observational data consisting of biological wild-type replicates, (ii) observational data obtained across time and/or environmental conditions, (iii) compendium (semi-perturbation) data, and (iv) perturbation data.This study uses de-novo methods based on statistical association [11,12,[22][23][24] because they are state-of-the-art [19] and are most prevalent in the community.In the course of this study, the following four questions are addressed: First, how accurately can one infer genome-scale networks with statistical association-based de-novo methods?Second, which datasets/data designs should be Table 1.Assessment of currently available genome-scale gold-standard networks used by prior gene network reverse-engineering studies.

Gold-standard Description
Limitations Used by #1 E. Coli network from RegulonDB, a curated database of regulatory interactions obtained through literature search [50] N Unknown quality DREAM2 [17], DREAM5 [18], [19], [21] N Heterogeneous data sources and experimental methods #2 S. Cerevisiae network from binding data [51] N Binding relations can be non-functional [28] [20] N Higher quality binding data exists [33] and is utilized in gold-standard #3 #3 S. Cerevisiae network from binding data [33] N Binding relations can be non-functional [28] DREAM5 [18], [19], [21] #4 S. Cerevisiae network from YEASTRACT, a curated database of regulatory interactions obtained through literature search [52,53] N Unknown quality DREAM5 [18] N Heterogeneous data sources and experimental methods #5 S. Cerevisiae network from deletion mutants [54] N Inferred transcription relations can be indirect DREAM5 [18] doi:10.1371/journal.pone.0106479.t001 used for network inference?Third, which statistical methods lead to better accuracy?Fourth, is it possible to identify sub-networks in the entire network that can be reconstructed with high accuracy?
To make conclusions of the study more useful to the community, results for 7 commonly used performance metrics are reported.

Results
Gold-standard gene regulatory networks integrate transcription factor-gene binding with perturbation (deletion mutants) data The analysis of targeted perturbation (deletion mutants) data described in the Methods section resulted in a network with 991,444 regulatory relations involving 5,395 genes, including 118 transcription factors (Spreadsheet S1).
The analysis of binding data described in the Methods section resulted in the following three networks: Binding network #1 (most conservative) involves Integration of binding and perturbation data resulted in three gold-standard networks with direct regulatory interactions (Table 2).Identified direct regulatory interactions are listed in Spreadsheet S3.Figures 2 and 3 visualize the gold-standard network #1 for all genes and only transcription factors, respectively.Figure 4 presents a topological analysis of that gold-standard network.Similar data is provided for gold-standard networks #2 and #3 in Figures S1-S6.

Assessment of the accuracy of network learning with sensitivity and specificity metrics
The network reconstruction results presented below were obtained from the most conservative gold-standard network #1 (Table 2).Results from the remaining two gold-standard networks are similar and are provided in Tables S4-S9.
Table 3 provides values of sensitivity and specificity and Table 4 provides a combined sensitivity/specificity Euclidean distance-based metric (see Methods) for 18 statistical approaches for reverse-engineering applied to 13 datasets, resulting in 234 inferred networks (see Table S1 for a colored version of Table 3 and Table 4, where color denotes ranking of performances).The best result for combined sensitivity/specificity metric ( = 0.64, corresponding to sensitivity = 0.52 and specificity = 0.58) is achieved in Hughes2 dataset by application of bivariate analysis with G2 test and 5% alpha threshold.The best 5% ranking results (see Table S1 part B) according to the combined metric (12 networks out of 234) correspond to bivariate analysis (10 networks) and GLL with conditioning on one gene (2 networks).In terms of datasets, 4 out of 12 best networks originate from Hughes1, 4 from Hughes2, 2 from GPL90, and 2 from Gasch.There is a large variability in accuracy of statistical approaches averaged over 13 datasets, and the most accurate approaches are bivariate (combined metric = 0.75-0.77versus 0.85-0.98 for other methods).The variability in accuracy of datasets averaged over 18 statistical approaches is smaller, and the best results are achieved in Gresham (combined metric = 0.82), Smith (0.84), and Holstege4 (0.84) datasets (versus 0.85-0.89for the remaining datasets).If we perform averaging over all statistical approaches and datasets belonging to the same data type, the best accuracy is achieved by observational data due to change in time/environ-ment and by compendium data (combined metric = 0.86), followed by perturbation data (0.87) and observational data consisting of biological wild-type replicates (0.88).
Figure 5 provides an additional visualization of sensitivity/ specificity pairs for 18 statistical approaches 613 datasets and the corresponding ROC curve [25,26] of the Pareto frontier [27].The resulting area under ROC curve (AUROC) is 0.546 (p-value = 1.12610 27 ).Figure 6 shows ROC curves and reports AUROC for each data type separately.It follows that observational data consisting of biological wild-type replicates leads to least accurate networks with AUROC consistent with prediction by chance (AUROC = 0.499, p-value = 0.55).Other data types lead to small but statistically significant AUROC values, with the best result achieved by perturbation data (AUROC = 0.541, p-value = 1.73610 26 ), followed by compendium data (AUROC = 0.536, p-value = 2.57610 25 ) and observational data due to change in time/environment (AUROC = 0.521, p-value = 0.01).S2 for a colored version of Table 5 and Table 6, where color denotes ranking of performances).The best result for combined PPV/NPV metric ( = 0.93, corresponding to PPV = 0.07 and NPV = 0.98) is achieved in the Smith dataset by application of GLL with a Z test, conditioning on 3 genes and using an AND rule.The best 5% ranking results (see Table S2 part B) according to the combined metric (17 networks out of 234) correspond to GLL with conditioning on either 2 or 3 genes.In terms of datasets, 5 out of 17 best networks originate from Yeung, 3 from Smith, 3 from Gasch, 3 from Hughes2, and the remaining 3 originate from M3D, GPL90, and Holstege4.There is a small variability in accuracy of statistical approaches averaged over 13 datasets, and the most accurate approach is GLL with Z test, conditioning on 3 genes and using an AND rule (combined metric = 0.96 versus 0.97-0.98 for other methods).The variability in accuracy of datasets averaged over 18 statistical approaches is even smaller, and the best results are achieved in Gasch, Smith, Yeung, and Hughes2 datasets (combined metric = 0.97 versus 0.98 for the remaining datasets).If we perform averaging over all statistical approaches and datasets belonging to the same data type, the best accuracy is achieved by observational data due to change in time/ environment (0.97), followed by other data types (0.98).S3 for a colored version of Table 7 and Table 8, where color denotes ranking of performances).The best results for combined recall/precision metric ( = 0.99, corresponding to recall = 0.89-0.91 and precision = 0.02) are achieved in GPL90 and M3D datasets by application of bivariate analysis with G2 test.The best 5% ranking results (see Table S3 part B) according to the combined metric (17 networks out of 234) also correspond to bivariate analysis.In terms of datasets, 5 out of 17 best networks originate from GPL90, 3 from M3D, 3 from Yeung, 3 from Smith, and 3 from Holstege2.There is a large variability in accuracy of statistical approaches averaged over 13 datasets, and the most accurate approaches are bivariate (combined metric = 1.04-1.09versus 1.27-1.38 for other methods).The variability in accuracy of datasets averaged over 18 statistical approaches is smaller, and the best results are achieved in GPL90 (combined metric = 1.19),Smith (1.20), and Gresham (1.20) datasets (versus 1.21-1.31for the remaining datasets).If we perform averaging over all statistical approaches and datasets belonging to the same data type, the best accuracy is achieved by compendium data (1.20), followed by observational data due to change in time/environment (1.23), observational data consisting of biological wild-type replicates (1.26), and perturbation data (1.27).

Connectivity of transcription factors is correlated with the accuracy of learning their sub-networks
Despite the overall low but statistically significant accuracies of gene network reverse-engineering in S. cerevisiae, some pathways or sub-networks can be learned with high accuracy from this data.
For example, application of GLL method (with Fisher's Z-test and conditioning on one gene) to Yeung dataset allowed us to learn a sub-network of direct regulatory interactions of transcription factor GCN4 (containing 44 genes) with sensitivity = 0.50, specificity = 0.91, PPV = 0.24, NPV = 0.97, which is statistically significant after adjustment for multiple comparison (Figure S7).We hypothesize that total connectivity of transcription factors (assessed either in gold-standard or inferred networks) is correlated with the reconstruction accuracy of their sub-networks.If this hypothesis is true, the connectivity measure may be used to identify transcription factors whose sub-networks can be learned accurately by de novo reverse-engineering methods.
The left panel of Figure 7 provides a scatter-plot showing significant correlation of transcription factor connectivity with the accuracy (combined PPV/NPV) of de novo reconstructing transcription factor sub-networks (that contain only direct regulatory interactions of each transcription factor).The right panel of Figure 7 shows the null distribution for assessing statistical significance of this correlation.Table 9 reports for each reverse-engineering approach and accuracy metric, the number of networks (in total we have 13 networks that were derived from 13 microarray gene expression datasets) with statistically significant correlation between connectivity of transcription factors and accuracy of reconstructing their subnetworks.As can be seen, for most reverse-engineering methods and accuracy metrics, connectivity of transcription factors in the inferred networks is significantly correlated with the reconstruction accuracy of their sub-networks.The correlations are sometimes robust and hold in multiple networks inferred from various datasets.However, the transcription factor connectivity assessed in the gold-standard networks correlates less robustly with the accuracy metrics; especially the combined sensitivity/specificity is rarely correlated.Overall, the correlations are typically negative, which implies that reverse-engineering methods can achieve higher accuracy (using each of the three combined distance metrics) for transcription factors with larger connectivity (i.e., more direct regulatory interactions).This behavior is particularly interesting for the combined sensitivity/specificity metric which is not influenced by the density of the network.

Construction of the gold-standard networks of direct gene regulatory interactions
The general process for construction of gold-standard networks with direct gene regulatory interactions is illustrated in Figure 1.Two types of genome-scale data are required for network construction: (i) targeted perturbation data with gene knocksouts/deletions or over-expressions that can be obtained by techniques for interference with RNA such as shRNA/siRNA or inducible promoters, and (ii) binding data that can be obtained by chromatin immunoprecipitation (ChIP) methods such as ChIPchip/ChIP-seq.Targeted perturbation data allows identification of regulatory targets, while binding data allows identification of binding targets of transcription factors.Using either data alone is not sufficient to infer direct regulatory relations because regulatory interactions resulting from targeted perturbation data may be either direct or indirect, and likewise binding interactions can be either functional or not [28].Therefore, we integrated regulatory and binding targets to obtain the set of direct regulatory targets which are graphically represented in gene regulatory networks.
In the current study, we used targeted perturbation data obtained by a co-author of this study (P.K.).The targeted perturbation data was obtained from 1,484 gene deletion (mutant) experiments.Full details of experimental procedures, normalization procedures and statistical analyses are described in [29].In summary, mutants from independent cultures were analyzed on dual-channel 70-mer oligonucleotide arrays using a batch of wildtype RNA as a common reference.In addition, wild-type profiles were obtained to statistically assess differences with mutant profiles.All gene expression profiles were normalized by loess method [30] followed by gene-specific dye-bias correction [31].Differentially expressed genes between wild-type and mutant profiles were determined using limma [32] at 5% alpha level adjusted for multiple comparisons using the methodology of [23,24].For the binding data, we used a previously published ChIP-chip dataset characterizing binding activity of 203 transcription factors to genes [33].The original study [33] suggested using two thresholds (0.001 and 0.005) for assessing significance of binding interactions.To further filter false-positive binding relations, the study [33] suggested assessing evolutionary conservation of binding sequences in 0, 1, or 2 of the related Saccharomyces species.The primary approach used in the current study for   identification of binding relations is based on the most conservative analysis of the above ChIP-chip data with binding threshold = 0.001 and conservation in 2 species (resulting in ''binding network #1'').In addition, we report in Supporting Information results for two other approaches: binding threshold = 0.005 and conservation in 1 species (resulting in ''binding network #2'') and binding threshold = 0.005 without conservation requirement (resulting in ''binding network #3'').
Finally, before the identified regulatory and binding relations were overlapped, all gene names were converted to systematic gene names using Saccharomyces Genome Database [34].Any gene that has no mapping or ambiguous mapping to a systematic name was removed.This resulted in 5,395 common genes between targeted perturbation and binding data.

Datasets for gene network reverse-engineering
We obtained 13 datasets to be used for reverse-engineering of S. cerevisiae gene regulatory networks.Datasets and their characteristics are listed in Table 10.The datasets span over 4 data types: (i) observational data consisting of biological wild-type replicates, (ii) observational data obtained by changing time and/or environmental conditions, (iii) compendium (semi-perturbation) data, and (iv) perturbation data.Data types (i) and (ii) contain samples collected by passive observation of the system without specific interference on the levels of genes.Data type (iii) was obtained by merging data from a large number of studies available in major public microarray data repositories.Those studies were predominantly perturbations-based (with gene knock-outs/overexpressions), and therefore we refer to such compendium data as ''semi-perturbation''.Data type (iv) originates from gene knockout/over-expression experiments.Out of 13 datasets used in the study, the following two are novel and are thus described in more detail below.
Dataset Gresham was obtained by a co-author of this study (D.G.), and it describes the transcriptional response of 5,590 S. cerevisiae genes to dynamic changes in environmental nitrogen.Cells in nitrogen limited chemostats were treated with an excess of nitrogen, and the transcriptional response was assessed at different time intervals after the nitrogen treatment, resulting in 100 gene expression profiles [35].
Dataset GPL90 was compiled by using all microarray chips from Affymetrix Yeast Genome S98 Array available in GEO [4].Specifically, 1,509 chips with raw data (CEL files) were downloaded from GEO on 08/21/2013.RMA normalization [36] was performed on all samples using Matlab function affyrma.Data for 39 out of 1,509 chips could not be processed and therefore discarded.The remaining data for 1,470 chips were processed as one batch.Affymetrix probe sets were mapped to gene names by a customized Matlab script using the platform annotation table for GPL90 (available on GEO) as reference.A total number of 6,740 genes over 1,470 samples were obtained upon completion of the process described above.The resulting dataset is provided in Spreadsheet S4.

Statistical methods for gene network reverseengineering
This study uses de-novo statistical association-based approaches for network reverse-engineering [11,12,[22][23][24] because they are state-of-the-art [19] and are most prevalent in the community.This is a very broad class of methods and it encompasses both traditional bivariate approaches (that consider only two genes/ variables at a time) and multivariate approaches (that perform conditioning based on other genes/variables).For the latter methods we use causal graph-based techniques from the Generalized Local Learning (GLL) algorithmic family [11,12].Under fairly broad distributional assumptions, GLL provably discovers genes/variables that are direct causes and direct effects of the gene/variable of interest [11,12], and is known to be one of the best performing methods for de novo gene network reverseengineering [19].
When we infer gene networks in this study, we follow the ''divide-and-conquer'' (also known as ''local-to-global'') approach whereby we first iteratively run each method to find direct upstream or downstream regulatory relations for each gene in the dataset, and then piece together the network.It may happen that the algorithm run on gene X may output that Y has a direct regulatory relation with X, however when the algorithm is run on gene Y, X does not belong to its output.We thus apply one of the two post-processing steps to piece together the network: (i) ''AND'' rule which implies that if the algorithm run on X outputs Y and if the algorithm run on Y outputs X, then X and Y have an edge in the resulting network, and (ii) ''OR'' rule which implies that if the algorithm run on X outputs Y or if the algorithm run on Y outputs X, then X and Y have an edge in the resulting network.Application of AND rule results in sparser networks, and OR rule results in denser networks.
The list of 18 approaches for network reverse-engineering is given in Table 11.Methods are based on two statistical association tests: Fisher's Z [37] and G 2 [38] test.The latter test requires application to categorical data, and therefore we discretized gene expression data into ternary by standardizing it to mean 0 and standard deviation 1 and considering three categories: smaller than -1, between -1 and 1, and greater than 1.
Finally, we note that all of the above approaches used in this study output undirected networks.Inference of directed networks from data remains a more challenging problem that is beyond the scope of the present study.

Metrics to assess accuracy of gene network reverseengineering
To assess accuracy of the network reverse-engineering, we used 4 core and 3 combined performance metrics.The core metrics used are: positive predictive value (PPV, also known as precision), negative predictive value (NPV), sensitivity (also known as recall), and specificity.PPV measures the probability that a regulatory interaction discovered by the algorithm exists in the gold-standard (i.e., the precision of the output network), while NPV measures the probability that an interaction not predicted by the algorithm does not exist in the gold-standard.Sensitivity measures the proportion of interactions in the gold-standard that are discovered by the algorithm (i.e., the completeness of the output network), whereas specificity measures the proportion of interactions absent in the gold-standard that are not predicted by the algorithm.The value of core metrics ranges from 0 to 1, with larger values corresponding to a more accurate algorithm.
Statistical significance of the output networks was assessed using the hyper-geometric test at 5% alpha level adjusted for multiple comparisons using the methodology of [23,24].The adjustment was performed over 3 (gold-standards) 618 (methods) 613 (datasets) = 702 applications of network reverse-engineering algorithms.

Assessing correlation between connectivity of transcription factors and the accuracy of learning their sub-networks
For every transcription factor we measured its total connectivity (either in the inferred or gold-standard network) and accuracy of learning its sub-network measured by one of the three combined metrics mentioned in the previous subsection.Then we measured correlation using Spearman correlation coefficient and assessed significance of correlation using exact statistical test following the theory of Good [39].The exact test is essential because transcription factors are not independent of each other.This test   Table 9. Number of networks that have significant correlations between transcription factor connectivity and accuracy of reconstructing their sub-networks.
The correlations were assessed for 13 different networks (derived from 13 gene expression microarray datasets) for each combination of network reverse-engineering approaches and combined accuracy metrics.Statistical significance is assessed at 5% alpha level adjusted globally for multiple comparisons (over all statistical tests performed for the table).The left portion of the table corresponds to transcription factor connectivity assessed in the gold-standard network, and the right portion corresponds to transcription factor connectivity assessed in the inferred network.doi:10.1371/journal.pone.0106479.t009involved 1,000 permutations of gene identifiers for a fixed network structure and establishing a null distribution for Spearman correlation coefficients.The p-value was computed as proportion of permuted networks where correlation was higher in magnitude than the observed one.When we evaluated correlation between connectivity and accuracy for multiple networks and accuracy metrics, statistical significance was assessed at 5% alpha level adjusted for multiple comparisons using the methodology of [23,24].

Comparison with prior results
The results of the current study indicate that gene network reverse-engineering in S. cerevisiae is a challenging problem.Given prior work in the field, it is interesting to compare current results with the prior studies in S. cerevisiae, while keeping in mind that prior studies used less comprehensive gold-standard networks (see Introduction and Table 1).Furthermore, the majority of prior work deals only with inferring likelihood scores of all possible network edges without establishing a threshold on these scores which would result in a discrete network [18,21].The latter studies do not report accuracy metrics of gene network reverse-engineering but typically report metrics related to ranking all possible network edges by the inferred likelihood scores.To the best of our knowledge, there are only two studies which inferred discrete genome-scale networks in S. cerevisiae.The study [20] applied two statistical methods, resulting in non-statistically significant networks, both with PPV = 0.The study [19] used 6 versions of S. cerevisiae binding data-based gold-standard and applied 30 approaches (many of which were not included in the current study) to learn a network.As can be seen in Table S7, results of the current study are much better in terms of sensitivity and specificity and related combined metric.However, in terms of PPV, NPV, and related combined metric, results are slightly worse (by 0.01 PPV).
While this study focuses on genome-scale regulatory network reverse-engineering in S. cerevisiae, there was significant prior work in other model systems/organisms, e.g.E. coli [17][18][19]21].Interestingly, inference of E. Coli networks seems to be an easier problem than inference of S. cerevisiae networks.For example, the best known result in terms of combined PPV/NPV metric for S. cerevisiae is 0.92 (PPV = 0.08 and NPV = 0.98) but for E. Coli it is 0.36 (PPV = 0.64, NPV = 0.98) [19].The results in terms of combined sensitivity/specific metric for S. cerevisiae are also worse than for E. Coli [19].Others have also made similar observation for additional metrics [18].It remains to be seen whether the difference in accuracy of learning S. cerevisiae and E. coli networks is due to the nature of transcription factor regulation, network complexity, quality of gold-standard networks, quality of datasets used for network learning, or combination of these factors.

Towards improving accuracy of gene network reverseengineering
While there are theoretical challenges of network reverseengineering from microarray data, e.g.impact of cellular aggregation on inference of statistical relations [44], we believe that there are several ways to improve the accuracy of learning gene regulatory networks.First, by further improving the quality and completeness of gold-standard networks.For example, one can improve networks obtained with current approaches by ensuring that all transcription factors participate in both binding and gene knockout data and by using a large number of biological replicates for gene knockouts.The binding data can be further improved by using ChIP-seq and inclusion of other indications of bindings, for example protein binding microarrays.Another possibility worth exploring is using protein-protein interaction data in addition to binding data which would allow enriching the ''FDR'' refers to thresholding associations at 5% FDR using the methodology of [23,24].''Alpha'' refers to thresholding associations at 5% alpha.''AND'' rule implies that if the algorithm run on X outputs Y and if the algorithm run on Y outputs X, then X and Y have an edge in the resulting network, and (ii) ''OR'' rule implies that if the algorithm run on X outputs Y or if the algorithm run on Y outputs X, then X and Y have an edge in the resulting network.doi:10.1371/journal.pone.0106479.t011 gold-standard networks that are currently based only on transcription factor-gene interactions.Second, by performing inference of gene networks from both observational and perturbation data with explicit knowledge of gene manipulations (current methods were not provided with information about targeted perturbations in the data).The latter methods (e.g., [45][46][47][48][49]) have promise because they allow to solve the theoretical problem of statistical indistinguishability of networks learned from observational data alone [6].

More on interpretation and analysis of obtained results
We used 4 widespread core performance metrics (sensitivity or recall, specificity, PPV or precision, and NPV) and 3 ways to combine them by equally weighting two antagonistic core performance metrics at a time (sensitivity and specificity, PPV and NPV, and recall and precision).Given that most methods output sparse graphs and the underlying gold-standard networks are also sparse, the combined sensitivity/specificity metric is significantly influenced by sensitivity (because many networks have specificity $0.90), and in particular combined PPV/NPV metric is largely influenced by PPV (because all networks but one have NPV$0.98).Combined recall/precision metric also suffers from similar issue since it is mostly influenced by sensitivity (because most methods have very low PPV#0.05).The interpretation of results and relevance to specific biological problems can be improved by using other combinations of core performance metrics (e.g., by using unequal weighting of PPV and NPV metrics in the Euclidean-based combined distance metric) or by devising new performance metrics.To facilitate the latter task, we are providing in Spreadsheet S5 detailed results with the numbers of true positive, true negative, false positive, and false negative edges computed for each network.

Conclusions
This study has two key contributions.First, we constructed highquality genome-scale gold-standards of direct regulatory interactions in S. cerevisiae that incorporate binding and gene knockout data.Second, we used 7 performance metrics to assess accuracy of 18 statistical association-based approaches for de-novo network reverse-engineering in 13 different real datasets spanning over 4 data types (observational data consisting of biological wild-type replicates, observational data obtained by changing time and/or environmental conditions, compendium/semi-perturbation data, and perturbation data).We found that inference of genome-scale regulatory networks in S. cerevisiae is a challenging problem and quantified resulting accuracies, most of which are statistically significant (see Table S10).We also found significant variability of the network reverse-engineering accuracy among statistical approaches for network inference.When accuracy is assessed based on sensitivity/specificity or recall/precision combined metrics, bivariate analysis is the best approach, and when accuracy is assessed based on PPV/NPV combined metric, Generalized Local Learning (GLL) with conditioning on 2-3 genes is the best approach.On the other hand, the variability of the network reverse-engineering accuracy is much smaller among various datasets and data types compared to variability among statistical approaches.However, some datasets/data types tend to dominate others for specific performance metrics, and in most cases using observational data consisting of biological wild-type replicates leads to worse accuracies compared with other datasets and data types.This indicates that considering that cost efficiency of various data types, observational data with changes in environments/time is preferable for network reconstruction.Finally, we found that for most reverse-engineering methods and accuracy metrics, connectivity of transcription factors is often significantly correlated with the reconstruction accuracy of their sub-networks.The correlations are sometimes robust and significant in multiple networks inferred from various datasets.Therefore, the connectivity measure may be used to identify transcription factors whose sub-networks can be learned accurately by de-novo reverse-engineering methods.We believe that the gene network reverse-engineering community will find this study useful in order to have a realistic perspective on this problem and performance of a variety of approaches.Table S1 Gold-standard network #1, sensitivity and specificity (panel A) and Euclidean distance from the optimal algorithm with sensitivity = 1 and specificity = 1 (panel B).

(DOCX)
Table S4 Gold-standard network #2, sensitivity and specificity (panel A) and Euclidean distance from the optimal algorithm with sensitivity = 1 and specificity = 1 (panel B).

(DOCX)
Table S7 Gold-standard network #3, sensitivity and specificity (panel A) and Euclidean distance from the optimal algorithm with sensitivity = 1 and specificity = 1 (panel B).

(DOCX)
Table S10 Comparison of accuracy of gene network reverse-engineering with the prior study.

Figure 1 .
Figure 1.Construction of gene regulatory networks by integrating targeted perturbation data with binding data.The relations in constructed gene regulatory network correspond to direct regulatory interactions.doi:10.1371/journal.pone.0106479.g001

Figure 2 .
Figure 2. Gold-standard gene regulatory network #1.Transcription factors are shown with large blue circles, and other genes are shown with small green circles.Edges in the network represent direct regulatory interactions.Inhibiting edges are shown with red, and excitatory edges are shown with black.doi:10.1371/journal.pone.0106479.g002

Figure 3 .
Figure 3. Direct regulatory interactions between transcription factors in gold-standard gene regulatory network #1.Inhibiting edges are shown with red, and excitatory edges are shown with black.doi:10.1371/journal.pone.0106479.g003

Figure 5 .
Figure 5. ROC curve of the Pareto frontier for sensitivity/ specificity pairs obtained by application of 18 network reverse-engineering approaches to 13 datasets.doi:10.1371/journal.pone.0106479.g005

Figure 6 .
Figure 6.ROC curves of the Pareto frontier for sensitivity/specificity pairs obtained by application of 18 network reverseengineering approaches to datasets of each type.doi:10.1371/journal.pone.0106479.g006

Figure 7 .
Figure 7. Example scatter-plot of transcription factor connectivity versus the accuracy (combined PPV/NPV metric) of reconstructing their sub-networks.The left panel shows the scatter-plot and the right panel shows the null distribution for establishing statistical significance of the observed correlation.doi:10.1371/journal.pone.0106479.g007

Figure
Figure S1 Gold-standard gene regulatory network #2.(PDF) Figure S2 Direct regulatory interactions between transcription factors in gold-standard gene regulatory network #2.(PDF) Figure S3 Topological analysis of gold-standard gene regulatory network #2.(PDF) Figure S4 Gold-standard gene regulatory network #3.(PDF) Figure S5 Direct regulatory interactions between transcription factors in gold-standard gene regulatory network #3.(PDF) Figure S6 Topological analysis of gold-standard gene regulatory network #3.(PDF) Figure S7 De-novo reconstruction of the GCN4 subnetwork.(PDF) metrics for all dataset, statistical approaches and gold-standard networks.(XLSX)

Table 5
provides values of positive predictive value (PPV) and negative predictive value (NPV) and Table6provides a combined PPV/NPV Euclidean distance-based metric (see Methods) for 18 statistical approaches for reverse-engineering applied to 13 datasets, resulting in 234 inferred networks (see Table

Table 2 .
Overlapping identified binding with regulatory relations results in gold-standard networks with direct regulatory relations. doi:10.1371/journal.pone.0106479.t002

Table 3 .
Sensitivity and specificity.Cells with bold font correspond to experiments with statistically significant reconstruction of regulatory networks.See Table 11 for abbreviations of row labels.See Table S1 part A for a colored version of this table.

Table 4 .
Euclidean distance from the optimal algorithm with sensitivity = 1 and specificity = 1.Cells with bold font correspond to experiments with statistically significant reconstruction of regulatory networks.See Table 11 for abbreviations of row labels.See Table S1 part B for a colored version of this table. doi:10.1371/journal.pone.0106479.t004

Table 5 .
Positive predictive value (PPV) and negative predictive value (NPV).Cells with bold font correspond to experiments with statistically significant reconstruction of regulatory networks.See Table 11 for abbreviations of row labels.See Table S2 part A for a colored version of this table.

Table 6 .
Euclidean distance from the optimal algorithm with PPV = 1 and NPV = 1.Cells with bold font correspond to experiments with statistically significant reconstruction of regulatory networks.See Table 11 for abbreviations of row labels.See Table S2 part B for a colored version of this table.

Table 7 .
Recall (sensitivity) and precision (PPV).Cells with bold font correspond to experiments with statistically significant reconstruction of regulatory networks.See Table11for abbreviations of row labels.See TableS3part A for a colored version of this table.

Table 8 .
Euclidean distance from the optimal algorithm with Sensitivity = 1 and PPV = 1.Cells with bold font correspond to experiments with statistically significant reconstruction of regulatory networks.See Table11for abbreviations of row labels.See TableS3part B for a colored version of this table.

Table 10 .
Datasets used for gene regulatory network reverse-engineering.

Table 11 .
Statistical approaches used for gene regulatory network reverse-engineering.