Inference of Gene Regulatory Networks with Sparse Structural Equation Models Exploiting Genetic Perturbations

Integrating genetic perturbations with gene expression data not only improves the accuracy of regulatory network topology inference, but also enables learning of causal regulatory relations between genes. Although a number of methods have been developed to integrate both types of data, efficient and powerful algorithms are still needed. In this paper, sparse structural equation models (SEMs) are employed to integrate both gene expression data and cis-expression quantitative trait loci (cis-eQTL), for modeling gene regulatory networks in accordance with biological evidence that genes regulate, or are regulated by, a small number of genes. A systematic inference method named sparsity-aware maximum likelihood (SML) is developed for SEM estimation. Using simulated directed acyclic or cyclic networks, the SML performance is compared with that of two state-of-the-art algorithms: the adaptive Lasso (AL) based scheme, and the QTL-directed dependency graph (QDG) method. Computer simulations demonstrate that the novel SML algorithm offers significantly better performance than the AL-based and QDG algorithms across all sample sizes from 100 to 1,000, in terms of detection power and false discovery rate, in all the cases tested, which include acyclic or cyclic networks of 10, 30 and 300 genes. The SML method is further applied to infer a network of 39 human genes that are related to the immune function and are chosen to have a reliable eQTL per gene. The resulting network consists of 9 genes and 13 edges. Most of the edges represent interactions reasonably expected from experimental evidence, while the remaining ones may suggest previously uncharacterized interactions. The sparse SEM and efficient SML algorithm provide an effective means of exploiting both gene expression and perturbation data to infer gene regulatory networks. An open-source computer program implementing the SML algorithm is freely available upon request.


Cross-validation for ℓ1-regularized ML estimation
The CV procedure for selecting λ follows the steps used to select ρ in ridge regression. The sample is divided into K folds, and for κ = 1, . . . , K the κ-th fold is set aside for validation. For L values of λ between λ_min = 10⁻⁴λ_max and λ_max, the solution to (3) computed from (Y^(−κ), X^(−κ)) is denoted as (B̂(λ, κ), F̂(λ, κ)), and the validation error is computed for each κ using (B̂(λ, κ), F̂(λ, κ)) and (Y^κ, X^κ). Upon averaging the validation errors across κ, the optimal λ is selected by the one-standard-error rule: the largest λ whose mean CV error lies within one standard error of the minimum mean CV error.
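The one-standard-error selection step above can be sketched as follows; this is a minimal illustration in Python, where the function name and the (K × L) layout of the fold errors are our own conventions, not part of the SML software.

```python
import numpy as np

def one_se_lambda(lambdas, cv_errors):
    """Pick lambda by the one-standard-error rule described in the text:
    among a grid of candidate lambdas, choose the LARGEST lambda whose
    mean CV error is within one standard error of the minimum mean error.

    lambdas   : (L,) candidate values
    cv_errors : (K, L) validation errors, one row per fold
    """
    lambdas = np.asarray(lambdas, dtype=float)
    cv_errors = np.asarray(cv_errors, dtype=float)
    mean_err = cv_errors.mean(axis=0)
    # standard error of the mean across the K folds
    K = cv_errors.shape[0]
    se_err = cv_errors.std(axis=0, ddof=1) / np.sqrt(K)
    i_min = np.argmin(mean_err)
    threshold = mean_err[i_min] + se_err[i_min]
    # largest lambda whose mean error does not exceed the threshold
    return lambdas[mean_err <= threshold].max()
```

The grid of L values would be generated beforehand, e.g. with `np.logspace` between 10⁻⁴λ_max and λ_max, and each column of `cv_errors` filled by fitting (3) on the K training splits.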

Stability of model selection under CV with different folds
A set of simulations was run to test the robustness of the SML algorithm. First, the number of CV folds was changed from k = 5 to k = 10 for the DAGs of 30 genes in Figures 2(c) and 2(d) and the DCGs of 30 genes in Figures 3(c) and 3(d), with an expected number of edges N_e = 3. As shown in Figure S1, k = 5 and k = 10 offer almost identical performance. Simulations with a suboptimal λ that is 10% less than the optimal λ obtained from 5-fold CV were then run for the networks used in Figure S1. As expected, the suboptimal λ yielded slightly worse performance, as shown in Figure S2; the performance degradation is very small for the DAGs but relatively large for the DCGs, underscoring the importance of choosing the optimal value of λ, particularly for cyclic networks.

Discarding rules
In Lasso regression, it is known that for a given λ some predictors can be set to zero a priori without solving the Lasso inference problem [2,3]. Hence, these predictors can be discarded when inferring other predictors. Rules for discarding predictors were also derived in [2,3]. In particular, the strong rules in [3] can discard a large number of predictors, which significantly reduces the computation needed to solve the Lasso problem. To reduce computational burden and improve the speed of the SML algorithm, the technique in [3] is employed next to derive strong rules for setting some entries of matrix B to zero a priori, before running the coordinate-ascent algorithm.
Let B̂(λ) and F̂(λ) denote the optima of (3) for a given λ. Let also Q_ij(λ) stand for the derivative of the differentiable part of (3), i.e., of

$$N\hat\sigma^{2}\log|\det(I-B)| - \tfrac{1}{2}\|\tilde Y - B\tilde Y - FX\|_F^{2},$$

w.r.t. B_ij evaluated at B̂(λ) and F̂(λ). Then, Q_ij(λ) can be found as

$$Q_{ij}(\lambda) = -\frac{N\hat\sigma^{2}\,c_{ij}(\lambda)}{\det(I-\hat B(\lambda))} + \big[(\tilde Y - \hat B(\lambda)\tilde Y - \hat F(\lambda)X)\tilde Y^{T}\big]_{ij}, \qquad (S1)$$

where c_ij(λ) is the (i, j)th cofactor of I − B̂(λ), and σ̂² is the estimate of σ². Let λ_max denote the smallest value of λ that yields B̂_ij = 0, ∀i, j (an expression for λ_max will be given later). After recognizing that det(I − B̂(λ_max)) = 1 and that c_ij(λ_max) vanishes for i ≠ j, it follows that

$$Q_{ij}(\lambda_{\max}) = \big[(\tilde Y - \hat F(\lambda_{\max})X)\tilde Y^{T}\big]_{ij}, \qquad (S2)$$

where F̂(λ_max) is obtained by substituting B = 0 into (7). Note that Q_ij(λ_max) can be computed without knowledge of λ_max. For λ < λ_max, the discarding rule is given by

$$\hat B_{ij}(\lambda) = 0 \quad \text{if} \quad |Q_{ij}(\lambda_{\max})| < w_{ij}\,(2\lambda - \lambda_{\max}). \qquad (S3)$$

When trying to find solutions of (3) along a path of λ defined by a decreasing set of values λ_0 = λ_max > λ_1 > . . . > λ_min, as needed in CV, the following alternative rule is possible:

$$\hat B_{ij}(\lambda_l) = 0 \quad \text{if} \quad |Q_{ij}(\lambda_{l-1})| < w_{ij}\,(2\lambda_l - \lambda_{l-1}). \qquad (S4)$$

Let S_B(λ_l) denote the set of entries B̂_ij(λ_l) set to zero by (S4) or (S3). The rationale behind rules (S3) and (S4) is described in the following. By the optimality of B̂(λ), the KKT condition implies that

$$Q_{ij}(\lambda) = \lambda\,w_{ij}\,s_{ij}, \quad \text{with } |s_{ij}| \le 1 \text{ if } \hat B_{ij}(\lambda) = 0. \qquad (S5)$$

Taking the derivative w.r.t. λ on both sides of (S5) results in

$$\frac{dQ_{ij}(\lambda)}{d\lambda} = w_{ij}\Big(s_{ij} + \lambda\,\frac{ds_{ij}}{d\lambda}\Big). \qquad (S6)$$

Thus, under the assumption that |s_ij + λ ds_ij/dλ| ≤ 1 (see [3] for a discussion of this assumption), it follows that |dQ_ij(λ)/dλ| ≤ w_ij. Applying the mean-value theorem between λ_l and λ_{l−1} yields

$$|Q_{ij}(\lambda_l)| \le |Q_{ij}(\lambda_{l-1})| + (\lambda_{l-1} - \lambda_l)\,w_{ij}. \qquad (S7)$$

If the inequality in (S4) holds, then (S7) implies |Q_ij(λ_l)| < λ_l w_ij, which in accordance with (S5) yields |s_ij| < 1 and thus B̂_ij(λ_l) = 0, as specified by rule (S4). Similarly, one can justify rule (S3).
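Per path point, the screening implied by rule (S4) reduces to an elementwise comparison between the gradient magnitudes at the previous λ and a weighted threshold; a minimal sketch (the function name is hypothetical, and the gradient matrix at the previous path point is assumed supplied by the solver):

```python
import numpy as np

def strong_rule_discard(Q_prev, w, lam_l, lam_prev):
    """Sequential strong rule, rule (S4): entries of B whose gradient
    magnitude at the previous lambda on the path falls below
    w_ij * (2*lam_l - lam_prev) are set to zero a priori, so the
    coordinate-ascent updates can skip them.

    Q_prev : (Ng, Ng) gradient of the differentiable part at lam_prev
    w      : (Ng, Ng) adaptive weights w_ij
    Returns a boolean mask, True where B_ij may be discarded (fixed to 0).
    """
    return np.abs(Q_prev) < w * (2.0 * lam_l - lam_prev)
```

As discussed in [3], strong rules can occasionally discard a truly active predictor, so the surviving solution should still be checked against the KKT condition and any violating entries added back before the fit is accepted.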

Computation of λ max
When λ is sufficiently large such that B̂ = 0, (S5) and the definition of s_ij imply that

$$|Q_{ij}(\lambda)| \le \lambda\,w_{ij}, \quad \forall\, i \ne j, \qquad (S8)$$

with

$$\lambda_{\max} = \max_{i \ne j}\, \frac{|Q_{ij}(\lambda_{\max})|}{w_{ij}} \qquad (S9)$$

being the minimum possible value satisfying (S8) and thereby giving rise to the smallest λ yielding B̂ = 0 in (3).
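Since Q_ij(λ_max) only requires the residual obtained with B = 0, λ_max follows from a single matrix product; a sketch under the notation above (function and argument names are illustrative):

```python
import numpy as np

def lambda_max(R0, Y_t, w):
    """Sketch of the lambda_max computation: the smallest lambda that
    zeroes out all of B.

    R0  : (Ng, N) residual Ỹ - F̂X computed with B = 0
    Y_t : (Ng, N) centered expression matrix Ỹ
    w   : (Ng, Ng) adaptive weights w_ij; the diagonal is ignored
    """
    Q0 = R0 @ Y_t.T                 # gradient entries at lambda_max, B = 0
    ratio = np.abs(Q0) / w
    np.fill_diagonal(ratio, 0.0)    # B_ii = 0 is enforced by constraint
    return ratio.max()
```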

Extensions of the SML algorithm

Stability selection
In Algorithm 1, CV is used to select the optimal value of λ that determines the level of sparsity in B̂. However, it has been observed that a single run of CV may not yield a consistent selection of variables [4,5]. An alternative approach to choosing appropriate variables is stability selection (STS) [6], which offers a theoretical upper bound on the FDR. The procedure for applying STS to the SML algorithm is as follows. For each λ, N_STS random subsamples of size ⌊N/2⌋ are drawn without replacement, and (3) is solved on each subsample to obtain the estimates B̂_ν(λ), ν = 1, · · · , N_STS; edges selected with frequency at least π across the subsamples are declared stable. The FDR at a given λ can then be bounded using the result of [6], where q denotes the average number of nonzeros in B̂_ν(λ) across the ν = 1, · · · , N_STS estimates, and q_s the average number of stably identified edges. Both q and q_s, and thus FDR(λ), increase as the sparsity-controlling parameter λ decreases. Therefore, the optimal λ, denoted as λ_STS for a target FDR, is selected as the smallest λ satisfying FDR(λ) ≤ FDR. The overall result exhibits low sensitivity to the frequency threshold π, since a higher, more restrictive π is automatically compensated for by a lower, more permissive λ. Note that the original STS procedure [6] employs the randomized Lasso, where the weights are randomly drawn from a set of specified values. In our case, we do not use random weights but retain the weights obtained from ridge regression, since our simulations show that those weights yield improved performance.
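The subsampling loop of STS can be sketched as follows, with the ℓ1-regularized ML fit abstracted behind a hypothetical callable `solve_B`; the half-sample subsampling and the frequency threshold π follow [6], while all names here are our own:

```python
import numpy as np

def stability_select(solve_B, Y, X, lam, n_runs=100, pi_thr=0.75, rng=None):
    """Sketch of STS for one value of lambda.

    solve_B(Y_sub, X_sub, lam) -> (Ng, Ng) estimate of B; a hypothetical
    callable standing in for one l1-regularized ML fit.
    Returns (stable, q, q_s): the mask of stably selected edges, the
    average number of nonzeros per fit, and the stable-edge count.
    """
    rng = np.random.default_rng(rng)
    Ng, N = Y.shape
    freq = np.zeros((Ng, Ng))
    q_total = 0.0
    for _ in range(n_runs):
        idx = rng.choice(N, size=N // 2, replace=False)  # half-subsample
        B = solve_B(Y[:, idx], X[:, idx], lam)
        nz = B != 0
        freq += nz                   # accumulate selection frequencies
        q_total += nz.sum()
    freq /= n_runs
    stable = freq >= pi_thr          # edges selected in >= pi_thr of runs
    q = q_total / n_runs             # average number of nonzeros
    q_s = int(stable.sum())          # number of stably selected edges
    return stable, q, q_s
```

Sweeping λ downward and stopping at the smallest value whose estimated FDR stays below the target reproduces the λ_STS selection described above.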

Heteroscedasticity
Removing the assumption that the residual error ϵ_i in (1) has covariance matrix σ²I, the SML algorithm can be extended to the more general case where the covariance of ϵ_i is a diagonal matrix R = diag(σ₁², · · · , σ²_Ng) with unequal diagonal entries σ_i², i = 1, · · · , N_g. In this case, the log-likelihood function in (2) becomes

$$\log p(Y|X; B, F, \mu) = c + N\log|\det(I-B)| - \frac{N}{2}\sum_{i=1}^{N_g}\log\sigma_i^{2} - \frac{1}{2}\,\mathrm{Tr}\big[R^{-1}(Y - BY - FX - \mu 1^{T})(Y - BY - FX - \mu 1^{T})^{T}\big],$$

where c is a constant and Tr(·) denotes the trace of the matrix in parentheses. It is easy to show that maximizing this likelihood function w.r.t. µ yields the same expression for µ as the one obtained earlier by maximizing the likelihood function in (2). The objective function of the ridge regression problem (4) changes accordingly. It is therefore again possible to solve (4) row by row separately, after replacing the objective function in (5) with

$$\|\tilde y_i - b_i\tilde Y - f_iX\|^{2} + \rho_i\big(\|b_i\|^{2} + \|f_i\|^{2}\big),$$

where ρ_i = ρσ_i². Specifically, problem (5) can be solved with this new objective function and a specific value of ρ_i obtained from CV performed separately for each row. Variance σ_i² is then estimated from the residual error of the i-th row obtained with the estimated b_i and f_i. The ℓ1-regularized ML problem (3) can also be reformulated by replacing its objective function with

$$N\log|\det(I-B)| - \sum_{i=1}^{N_g}\frac{1}{2\hat\sigma_i^{2}}\|\tilde y_i - b_i\tilde Y - f_iX\|^{2} - \lambda\sum_{i \ne j} w_{ij}|B_{ij}|,$$

where σ̂_i² is the estimate of σ_i². Therefore, the coordinate-ascent algorithm can be easily modified by replacing σ̂² with σ̂_i² and w_ij with w_ij σ̂_i².
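The per-row ridge step with row-specific penalty ρ_i = ρσ_i² admits a closed-form solution via the normal equations; a minimal sketch, where the stacking of the regressors into a single matrix Z and all names are illustrative:

```python
import numpy as np

def ridge_row(y_i, Z, rho_i):
    """Sketch of the per-row ridge problem under heteroscedastic noise:
    minimize ||y_i - theta @ Z||^2 + rho_i * ||theta||^2, where Z (p x N)
    stacks the regressor rows (the rows of Ỹ other than i, and X in the
    full algorithm) and rho_i = rho * sigma_i^2 absorbs the row-specific
    variance.
    """
    p = Z.shape[0]
    # normal equations: (Z Z^T + rho_i I) theta = Z y_i
    theta = np.linalg.solve(Z @ Z.T + rho_i * np.eye(p), Z @ y_i)
    return theta
```

The residual of this fit for row i then provides the estimate σ̂_i² used to rescale the subsequent ℓ1 stage.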

Identification of eQTLs
The SML algorithm can be extended to handle the case where some or all phenotypes have unidentified cis-eQTLs, by adding to the objective function in (3) a new penalty term involving the weighted ℓ1-norm of the entries of F, excluding those corresponding to the identified cis-eQTLs. In this case, it is only necessary to modify line 13 of the SML algorithm as follows. Redefine f̂_i as the vector containing the entries of f_i corresponding to the known cis-eQTLs, and let f̂′_i contain the remaining entries of f_i. Similarly, let X̃_i collect the rows of X̃ corresponding to the known cis-eQTLs, and let X̃′_i contain the remaining rows of X̃. Then, on line 13 of the SML algorithm, (7) is replaced by its counterpart for f̂_i, with f̂′_i fixed to the values obtained in the previous iteration. The entries of f̂′_i can then be updated using the coordinate-ascent method in the glmnet algorithm [7] for Lasso-based linear regression.
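The coordinate-wise Lasso update for the penalized entries f′_i amounts to repeated soft-thresholding; the following sketch stands in for the glmnet inner loop and is not the authors' implementation (all names, and the assumption that the rest of row i's model has already been subtracted from the residual r, are ours):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by coordinate ascent for Lasso."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def update_f_prime(r, Xp, f_prime, lam_f, n_iter=50):
    """Sketch of the coordinate-wise Lasso update for the penalized
    entries f'_i.

    r       : (N,) residual of row i with all other model terms removed
    Xp      : (m, N) rows of X̃ without an identified cis-eQTL
    f_prime : (m,) current values of the penalized coefficients
    lam_f   : l1 penalty weight on f'_i
    Minimizes 0.5*||r - f' @ Xp||^2 + lam_f*||f'||_1 coordinate by coordinate.
    """
    f_prime = f_prime.copy()
    for _ in range(n_iter):
        for k in range(len(f_prime)):
            x_k = Xp[k]
            # partial residual with coordinate k's contribution added back
            r_k = r - f_prime @ Xp + f_prime[k] * x_k
            f_prime[k] = soft_threshold(x_k @ r_k, lam_f) / (x_k @ x_k)
    return f_prime
```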

State-of-the-art algorithms

Adaptive Lasso-based algorithm
The AL-based algorithm [8] involves three basic steps: the first performs standard eQTL mapping to identify a cis-eQTL per gene; the second applies the adaptive Lasso [9] to infer the SEM; and the third performs a permutation test to ensure that edges in the network obtained from the second step correspond to correct dependencies in the directed graph. Since the core of the AL-based algorithm is the adaptive Lasso in step 2, it is described here briefly for completeness. The adaptive Lasso estimates B and F as

$$(\hat B, \hat F) = \arg\max_{B, F}\; -\tfrac{1}{2}\|\tilde Y - B\tilde Y - FX\|_F^{2} - \lambda\sum_{i,j} w_{ij}|B_{ij}| - \rho\sum_{i,j} v_{ij}|F_{ij}|, \qquad (26)$$

subject to B_ii = 0, ∀i = 1, . . . , N_g, and F_jk = 0, ∀(j, k) ∈ S_q. Weights w_ij and v_ij are given by w_ij := |B̃_ij|^{−1/2} and v_ij := |F̃_ij|^{−1/2}, where B̃_ij and F̃_ij are obtained by solving the following Lasso problem:

$$(\tilde B, \tilde F) = \arg\max_{B, F}\; -\tfrac{1}{2}\|\tilde Y - B\tilde Y - FX\|_F^{2} - \psi(B, F), \qquad (27)$$

subject to the same constraints, with ψ(B, F) = λ Σ_{i,j} |B_ij| + ρ Σ_{i=1}^{N_g} Σ_{j=1}^{N_q} |F_ij|. Constants λ and ρ are obtained via CV. We obtained the program implementing the AL-based algorithm from the authors of [8] and used it in our simulation studies. In this program, the glmnet algorithm [7] is employed to solve the Lasso problems (26) and (27).
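The weight construction in step 2 can be sketched as follows. The small `eps` guard is our assumption: exact zeros in the initial Lasso fit receive an effectively infinite weight, which excludes those edges from the adaptive fit.

```python
import numpy as np

def adaptive_weights(B_lasso, F_lasso, eps=1e-8):
    """Sketch of the adaptive-Lasso weight construction:
    w_ij = |B_ij|^(-1/2), v_ij = |F_ij|^(-1/2), computed from an initial
    (unweighted) Lasso fit. eps is an illustrative guard so that zero
    entries yield a very large, rather than infinite, weight.
    """
    w = 1.0 / np.sqrt(np.abs(B_lasso) + eps)
    v = 1.0 / np.sqrt(np.abs(F_lasso) + eps)
    return w, v
```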

QDG Algorithm
The QDG algorithm [10] first builds an undirected graph for the phenotypes under consideration, using an undirected dependency graph [11] or a skeleton derived from the PC algorithm [12]. It then orients edges in the undirected graph by using a score calculated from the likelihood of the data for different edge directions. The edge orientation process is performed iteratively for each edge until no edge changes its direction. We obtained the program implementing the QDG algorithm from the authors [10] and used the default settings of the program in our simulations.