Causal network inference from gene transcriptional time-series response to glucocorticoids

Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately enabling regulatory network re-engineering. Network inference from transcriptional time-series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time-series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance and additionally infers whether causal effects are activating or inhibitory. We apply BETS to transcriptional time-series data of differentially-expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: Overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is available as an open source software package at https://github.com/lujonathanh/BETS.

In general, the presentation (figures, writing, review of related work, etc.), methodological choices, and analyses are superb. The synthesis of different types of time-series network inference techniques in Figure S1 is an excellent high-level overview of the field. The evaluation on the simulated-data DREAM4 challenge is valuable, demonstrating that BETS performs reasonably well on this popular benchmarking dataset and putting BETS in the context of many other types of network inference tools. There is a sizeable AUPR performance gap between BETS and the top method. However, this is not a concern because there are other advantages of BETS besides its AUPR on a simulated data set (e.g., parallelism, speed, and statistical rigor). Furthermore, other work in the gene network inference field has shown that AUPR on simulated data does not reflect the challenges of network inference in real mammalian systems, so maximizing AUPR on the DREAM4 data should not be the main objective of new methods.
Therefore, the glucocorticoid case study is more relevant and interesting. Because there are no complete gold-standard networks available for evaluating condition-specific human regulatory networks, the BETS predictions are assessed with two independent datasets. Overexpression of 10 transcription factors demonstrates that there is general agreement between the gene expression changes induced by overexpression and the edge signs predicted by BETS. The BETS predictions are not perfect, as four transcription factors have positive effects enriched among negative predicted edges. In addition, trans-eQTL analysis of GTEx lung gene expression data reveals many new trans-eQTLs that are nominated by the BETS network edges. These analyses demonstrate how BETS predictions can be used to gain biological insights.
One weakness with respect to the method's originality and relevance to the network inference field is that there is no direct analysis demonstrating that the novel methodological aspects of BETS have a practical impact. The DREAM4 results are 10 years old and do not reflect the state of the art in Granger causality analysis. Only BETS was run on the glucocorticoid expression data. Conceptually, the null distributions and FDR-based approach should improve network quality, but this claim is not directly assessed.

Major comments:
Reviewer 1 Comment 1: Related to the comment above, the manuscript would be improved by more specifically demonstrating to readers why they should use BETS over other modern Granger causality approaches. For instance, if the FDR framework is the main appeal, can the manuscript directly demonstrate the advantages of having a principled way to select the size of a network? If it is the scalability and parallelization, can the high-throughput software pipeline be made more robust and user-friendly (see below)? There is a comparison with elastic net regression without stability selection, but ensembling or stability selection is now commonplace in network inference. The most closely related Granger causality work that shares some features with BETS is not included in the DREAM4 benchmarking. Examples of closely related methods include the last two methods in Section 2.1.6 of the supplement (the references are broken, so the specific manuscripts are unknown) or SWING (Finkle 2018, doi:10.1073/pnas.1710936115), which was not referenced.

Author Response:
• As you suggested, we have performed a comparison against two other ensemble-based Granger causality methods on the DREAM data: SWING-RF and SWING-Lasso (Finkle et al. 2018). SWING-RF is the top performer of all 24 methods on the DREAM data set. However, SWING-RF is based on decision trees instead of vector autoregression, and therefore lacks interpretability of edges as being activating or inhibitory. It also does not have a thresholding approach for declaring a significant network. SWING-Lasso, on the other hand, is based on vector autoregression. However, it performs worse than BETS, achieving an average AUPR of 0.064 compared to BETS' 0.128 and an average AUROC of 0.596 compared to BETS' 0.688.
We have now referenced SWING and these results in the main manuscript.
• In our initial submission, we had compared BETS against two other modern Granger causality frameworks implemented using vector autoregression: OKVAR-Boost (Lim 2013) and the Granger Causal Connectivity toolbox (Seth 2010). BETS outperforms both of them on DREAM, as shown in Figure 2A and in the 3rd paragraph under "Leading Performance on DREAM Network Inference Challenge."
• An outdated supplement was uploaded in addition to the supplement already at the end of the main manuscript. We apologize for the confusion. We have fixed the broken references in the new supplement.

Reviewer 1 Comment 2:
The analyses all fix the lag at L = 2. It is unknown whether that lag would still be appropriate for time-series data with more time points. The lag can be set by the user, but the relationships between the lag hyperparameter, the length of the time series, and the effectiveness of the FDR procedures have not been assessed.

Author Response:
• We ran BETS with lag L = 1 on DREAM (Table S4) and find similar performance to lag L = 2. It achieves an AUROC of 0.686 (compared to lag L = 2's result of 0.688) and an AUPR of 0.14 (compared to lag L = 2's result of 0.12). We have added a paragraph under "Leading Performance on DREAM Network Inference Challenge" to highlight these results.
• Based on these comments, we have run BETS with lag L = 1 on the glucocorticoid data (Supplemental Information Section: "Sensitivity Analysis"). The discovered network has 2098 edges and shares 1286 edges with the lag L = 2 network. The overlap is significant (odds ratio (OR) = 393.9, Fisher's exact test (FET) p ≤ 2.2 × 10^−16). We validated the discovered network on GTEx study data and found 76 trans-eQTLs corresponding to 24 network edges. Of these, 74 trans-eQTLs were in lung samples, corresponding to 22 network edges (q-value FDR ≤ 0.2); the remaining 2 trans-eQTLs were in adipose samples, corresponding to the 2 remaining network edges. Five of the 74 discovered trans-eQTLs overlapped with the 341 trans-eQTLs discovered by lag L = 2, corresponding to 4 shared network edges. Thus, while using lag L = 1 reduced network size, the consistency and validity of the edges was still high.
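The significance of the overlap between the two networks can be computed from a 2 × 2 contingency table over all possible directed edges. A minimal sketch, using toy edge sets rather than the actual networks above (the function name `edge_overlap_test` is ours, not from the BETS code):

```python
from scipy.stats import fisher_exact

def edge_overlap_test(edges_a, edges_b, n_possible):
    """Fisher's exact test on the 2x2 overlap table of two directed
    edge sets, out of n_possible = p*(p-1) possible directed edges."""
    a, b = set(edges_a), set(edges_b)
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = n_possible - both - only_a - only_b
    table = [[both, only_a], [only_b, neither]]
    # one-sided test: is the overlap larger than expected by chance?
    return fisher_exact(table, alternative="greater")

# toy edge sets over 2768 genes (illustrative, not the paper's networks)
odds_ratio, p = edge_overlap_test({(1, 2), (2, 3), (3, 4)},
                                  {(1, 2), (2, 3), (5, 6)},
                                  2768 * 2767)
```

Even a small overlap is highly significant here because the universe of possible directed edges is so large.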
• Assessing how time-series length and choice of lag jointly affect performance would require finding another biological data set of different time-series length with clear options for out-of-sample validation, as in our glucocorticoid data. This is beyond the scope of this work. We chose lag L = 1 and L = 2 due to the short nature of our time series.
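For reference, the lag-L construction that turns a length-T series into a vector-autoregression design (each time point predicted from its L predecessors) can be sketched as follows; `unroll` is an illustrative helper under our assumptions, not the BETS implementation:

```python
import numpy as np

def unroll(X, L):
    """Unroll a (T, p) time series into a lag-L regression design.

    Returns (Y, Z) with N = T - L rows: Y (N, p) holds X[t], and
    Z (N, L*p) stacks the lagged inputs X[t-1], ..., X[t-L].
    """
    T, p = X.shape
    Y = X[L:]
    Z = np.column_stack([X[L - l: T - l] for l in range(1, L + 1)])
    return Y, Z

X = np.arange(10, dtype=float).reshape(5, 2)  # T = 5 time points, p = 2 genes
Y, Z = unroll(X, 2)
```

With short series, each extra lag costs a row of usable data (N = T − L), which is one reason small lags are preferred when T is small.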
Reviewer 1 Comment 3: The permutation and bootstrapping concepts in Figure 1 are understandable at a high level. The methodological details are challenging to follow. It appears that a single temporal permutation is made in the outer loop and another is made in the inner loop. Is a single permutation sufficient to obtain a robust null distribution? In addition, if the bootstrapping is done on the matrix form in Equation 10, which has already unrolled or expanded the L previous time points, how is the inner loop permutation performed? The details of the permutation and FDR procedures are difficult to verify.

Author Response:
• We think a single permutation provides a robust null distribution because each gene is independently shuffled across time. Furthermore, if there are p genes, a single permutation generates about p(p − 1) null coefficients. In the glucocorticoid data, this would be 2768 × 2767 = 7,659,056 null coefficients.
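A within-gene permutation of this kind can be sketched as follows (illustrative code, not the BETS implementation):

```python
import numpy as np

def permute_within_genes(X, rng):
    """Independently shuffle each gene (column) across time points (rows).

    This breaks temporal dependence per gene while preserving each gene's
    marginal distribution; with p genes, a single permuted data set yields
    ~p*(p-1) null causal coefficients.
    """
    Xp = np.empty_like(X)
    for j in range(X.shape[1]):
        Xp[:, j] = rng.permutation(X[:, j])
    return Xp

rng = np.random.default_rng(0)
X = np.arange(12, dtype=float).reshape(6, 2)  # 6 time points, 2 genes
Xp = permute_within_genes(X, rng)
```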
• The updated Methods now explain in detail the inner-loop permutation and the FDR procedure for the bootstrapped versions. Refer to the original data set as X^g_t, the inner-loop permuted data set as X̃^g_t (defined under "Permuted Coefficients" in Methods), and the outer-loop permuted data set as X̂^g_t (defined under "Stability FDR" in Methods). Let X^g_t and X^G_{t−} be the "unrolled" output and input of the regression model in equation 10.
Let X̃^g_t and X̃^G_{t−} be the unrolled output and input constructed from X̃^g_t, and let X̂^g_t and X̂^G_{t−} be the unrolled output and input constructed from X̂^g_t.
Let the jth bootstrap sample from the original data be X^{g,j}_t and X^{G,j}_{t−}; from the inner-loop permuted data, X̃^{g,j}_t and X̃^{G,j}_{t−}; and from the outer-loop permuted data, X̂^{g,j}_t and X̂^{G,j}_{t−}.
To generate bootstrap sample j, a set of N row indices, I_j, is sampled with replacement from [1, 2, . . ., N]. X^{g,j}_t and X^{G,j}_{t−} are created by choosing the rows I_j of X^g_t and X^G_{t−}. Similarly, X̃^{g,j}_t and X̃^{G,j}_{t−} are created by choosing the rows I_j of X̃^g_t and X̃^G_{t−}, and X̂^{g,j}_t and X̂^{G,j}_{t−} are created by choosing the rows I_j of X̂^g_t and X̂^G_{t−}.
In summary, the same row indices are used for generating the bootstrap samples from the original, inner-loop permuted, and outer-loop permuted data sets. We hope this clarifies the procedure. We have modified the text to reflect these clarifications.
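The shared-index bootstrap can be sketched as follows (a minimal illustration under our assumptions, with toy unrolled matrices standing in for the original and permuted data sets):

```python
import numpy as np

def shared_bootstrap(datasets, rng):
    """Draw one bootstrap sample using the SAME row indices I_j for
    every unrolled (output, input) pair, e.g. original, inner-loop
    permuted, and outer-loop permuted. `datasets` is a list of (Y, Z)
    pairs that all share the same number of rows N."""
    N = datasets[0][0].shape[0]
    idx = rng.integers(0, N, size=N)  # sample N rows with replacement
    return [(Y[idx], Z[idx]) for Y, Z in datasets], idx

rng = np.random.default_rng(1)
N = 5
Y = np.arange(N, dtype=float).reshape(-1, 1)     # toy unrolled output
Z = np.arange(2 * N, dtype=float).reshape(N, 2)  # toy unrolled lagged input
# second pair stands in for a permuted data set
samples, idx = shared_bootstrap([(Y, Z), (Y + 100.0, Z + 100.0)], rng)
```

Because one index vector `idx` is reused across all data sets, the observed and null coefficients for bootstrap sample j are computed on exactly corresponding rows.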
Reviewer 1 Comment 4: Using the BETS network's edge signs to predict the vector autoregressive model's edge coefficients is a creative way to use the overexpression data to validate the network that accounts for the edge signs. However, there are some unaddressed caveats with this approach. The overexpression data have fewer time points, so a simpler regression model is used (Equation 17). Any errors in that simple model's fit will confound the BETS network assessment. This approach may also ignore false negative edges in the BETS network. An alternative approach would be to focus on the edge directions in the BETS network by estimating the differentially expressed genes in each overexpression experiment using a temporally-aware statistical test. nsgp (Heinonen et al. 2015; doi:10.1093/bioinformatics/btu699) or a similar method may be able to accommodate the different number of time points in the original and overexpression data. Then, these genes can be treated as the targets of that regulator in a pseudo-gold standard network. The predicted edges for that regulator in the BETS network could be evaluated with a precision-recall curve, even if they have very few predicted target genes.
Author Response:
• We agree that any error in the simple model's fit may confound the BETS network assessment, and we acknowledge that caveat in the Results. However, the approach does not ignore false negative edges in the BETS network: any false negative edges in the BETS network are classified as "no edge" in the regression. We regress a one-hot encoding of the network's edge sign (i.e., positive versus no edge or negative sign; negative versus positive or no edge) as the predictor against the VAR model edge coefficients estimated from each of the overexpression time series as the response.
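A minimal numpy sketch of this regression, using toy edge signs and toy VAR coefficients (not the actual overexpression estimates):

```python
import numpy as np

# toy data: BETS edge signs (+1 / 0 / -1, with 0 = "no edge") and the
# VAR edge coefficients estimated from an overexpression time series
signs = np.array([+1, +1, 0, -1, 0, -1, +1, 0])
coefs = np.array([0.9, 1.1, 0.0, -0.8, 0.1, -1.2, 0.7, -0.1])

# one-hot encoding of the edge sign as predictors, "no edge" as baseline:
# columns = intercept, is-positive, is-negative
D = np.column_stack([np.ones_like(coefs),
                     (signs > 0).astype(float),
                     (signs < 0).astype(float)])
beta, *_ = np.linalg.lstsq(D, coefs, rcond=None)
# sign agreement would show up as beta[1] > 0 (positive edges have
# larger coefficients than "no edge") and beta[2] < 0
```

With indicator predictors, the fitted coefficients are simply the group-mean differences of the VAR coefficients relative to the "no edge" baseline.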
• Based on your recommendation, we have performed the temporally-aware statistical test nsgp. We generated a set of pseudo-targets for each overexpressed transcription factor tf. For every gene g ∈ G, we use nsgp to compute Δ^{tf,g}_MLL = MLL^{tf,g}_ind − MLL^{tf,g}_joint, where MLL^{tf,g}_ind is the marginal log likelihood of fitting one nonstationary GP for X^g_t in the TF overexpression data and an independent nonstationary GP for X^g_t in the original exposure data, and MLL^{tf,g}_joint is the marginal log likelihood of fitting the same nonstationary GP over X^g_t jointly in the TF overexpression data and the original exposure data.
nsgp does not provide a framework for quantifying the significance of differentially expressed genes. We therefore developed a procedure for generating a null distribution of Δ^{tf,g}_MLL. The null hypothesis is that g is not differentially expressed in the TF overexpression data set. We generated a null X̃^g_t for the TF overexpression data by randomly substituting an original exposure measurement of X^g_t for an overexpression measurement of X^g_t at every time point t. We then computed Δ^{tf,g}_MLL as before. We did this 10 times, generating Δ^{tf,g}_{MLL,n} for n ∈ {1, 2, . . ., 10}.
We control the FDR at ≤ 0.01 by finding the threshold T_tf such that the estimated FDR (the number of null Δ^{tf,g}_{MLL,n} values above T_tf, averaged over the 10 null sets, divided by the number of observed Δ^{tf,g}_MLL values above T_tf) is at most 0.01. Under this procedure, BETS weakly predicts targets of 5 of the 10 overexpression TFs called by nsgp.
Thus, random guessing also weakly predicts targets of 5 of the 10 overexpression TFs called by nsgp. BETS does not significantly improve over random guessing at FDR 0.1.
A caveat of this validation is that it is limited only to edges with one of the 10 overexpression TFs as a cause. We have included this full set of results in our supplemental information.
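The empirical FDR thresholding described above can be sketched as follows. This is a hypothetical reconstruction with toy scores (`fdr_threshold` is our illustrative name, and the manuscript's exact estimator may differ in detail):

```python
import numpy as np

def fdr_threshold(observed, null, target_fdr, n_null_sets):
    """Smallest threshold T (among the observed scores) whose estimated
    FDR, (# null >= T) / n_null_sets / (# observed >= T), is at most
    target_fdr. `null` pools scores from n_null_sets null data sets."""
    obs = np.sort(np.asarray(observed, dtype=float))
    null = np.asarray(null, dtype=float)
    for T in obs:  # ascending, so the first pass is the smallest T
        fp = (null >= T).sum() / n_null_sets   # expected false positives
        tp = (obs >= T).sum()                  # total calls at threshold T
        if fp / tp <= target_fdr:
            return T
    return np.inf  # no threshold achieves the target FDR

observed = [5.0, 4.0, 3.0, 0.4]                              # toy delta-MLL scores
null = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.5, 0.5, 3.5, 0.1]    # pooled over 10 null sets
T = fdr_threshold(observed, null, target_fdr=0.1, n_null_sets=10)
```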
Reviewer 1 Comment 5: The author contributions imply that the RNA-seq data were generated as part of this study. If that is the case, the experimental protocols and methods are incomplete.
In addition, the expression data should be made available.