scPADGRN: A preconditioned ADMM approach for reconstructing dynamic gene regulatory network using single-cell RNA sequencing data

Disease development and cell differentiation both involve dynamic changes; therefore, the reconstruction of dynamic gene regulatory networks (DGRNs) is an important but difficult problem in systems biology. With recent technical advances in single-cell RNA sequencing (scRNA-seq), large volumes of scRNA-seq data are being obtained for various processes. However, most current methods of inferring DGRNs from bulk samples may not be suitable for scRNA-seq data. In this work, we present scPADGRN, a novel DGRN inference method using “time-series” scRNA-seq data. scPADGRN combines the preconditioned alternating direction method of multipliers with cell clustering for DGRN reconstruction. It exhibits advantages in accuracy, robustness and fast convergence. Moreover, a quantitative index called Differentiation Genes’ Interaction Enrichment (DGIE) is presented to quantify the interaction enrichment of genes related to differentiation. From the DGIE scores of relevant subnetworks, we infer that the functions of embryonic stem (ES) cells are most active initially and may gradually fade over time. The communication strength of known contributing genes that facilitate cell differentiation increases from ES cells to terminally differentiated cells. We also identify several genes responsible for the changes in the DGIE scores occurring during cell differentiation based on three real single-cell datasets. Our results demonstrate that single-cell analyses based on network inference coupled with quantitative computations can reveal key transcriptional regulators involved in cell differentiation and disease development.


Introduction
In systems biology, the reconstruction of dynamic gene regulatory networks (DGRNs) has proven to be a crucial tool for understanding processes related to disease development and cell differentiation, such as hematopoietic specification [1], T cell activation [2], influenza infection, acute lung injury, and type 2 diabetes [3]. DGRNs specify links between genes over time. By exploring the differences in dynamic networks, researchers are able to comprehend the mechanisms causing complex diseases [3]. Before single cell data are available, networks are reconstructed at group level using bulk data.
Recently, large quantities of single-cell RNA sequencing (scRNA-seq) data have been obtained for various biological processes due to advances in sequencing techniques [4][5][6][7]. scRNA-seq data possess unique advantages and new methods are being developed to exploit cell heterogeneity information from scRNA-seq. Review [8] highlights some computational approaches for interpreting scRNA-seq data. Specifically, clustering method is among the very first development to investigate such heterogeneity on cell level data [9,10]. Reviews [11][12][13] provide further information on clustering methods. Also on studying cell heterogeneity, the concept of pseudotime is introduced by [14] to provide a quantitative measure of progress through a biological process. We refer to [15,16] for a comprehensive review.
Much progress has been made in constructing gene regulation networks for scRNA-seq data [17]. For example, [18] uses a probabilistic model to infer gene regulatory networks with uncovered discrete transitions among cell states. Key regulators responsible for the transition can be identified by comparing neighboring networks. [19] proposes an information-theorybased approach by using partial information decomposition. [20] proposes to use pseudotime information estimated from static cell data to refine the regular gene networks constructed based on ordinary differential equations (ODE). In this work, we study the time-series scRNAseq data where cells are sampled at different time points. With this additional temporal information, dynamic networks can be constructed to model interactions over time and exercise information on cell ordering [14,21,22]. [23] extends [20]'s work by constructing ODE-based networks with pseudotime inference on the time-series data.
With time-series scRNA-seq data, methods for constructing DGRNs on bulk data are not directly applicable since the biological meaning of a sample changes from the average for several cells in bulk data to the value for a single cell. Several individual cells can be sequenced at once, causing the form of the gene expression data to change from a single vector to several vectors, or a matrix. The cells sequenced at different time points are different. It is not possible to describe the dynamics of a single cell because that cell does not even exist at the next time point. However, the dynamics of cells at the cluster level can be modeled.
In this work, we present scPADGRN, a novel method of inferring DGRNs from time-series scRNA-seq data. scPADGRN combines the preconditioned alternating direction method of multipliers (PADMM) with cell clustering for DGRN reconstruction. The cell clustering process includes ranking cells in accordance with their pseudotimes and merging cells into clusters. Our optimization model considers network precision, network sparsity and network continuity. The PADMM is used to solve the optimization model to obtain the DGRN. Multiple matrices are updated, and three subproblems are solved by the PADMM algorithm in each iteration.
Simulated data and three real datasets concerning cell differentiation have been used to test the performance of scPADGRN. We propose a quantity called Differentiation Genes' Interaction Enrichment (DGIE) to quantify the changes in the interactions of a certain set of genes in a DGRN. First, we chose genes involved in the same biological processes or KEGG pathways to visualize subnetworks of DGRNs and computed their DGIE scores. Then, we selected all genes known to contribute to the process of cell differentiation and computed the corresponding DGIE scores. We also identified several genes responsible for the drastic changes in the DGIE scores in each dataset. These genes might be key regulators in cell differentiation. Our results demonstrate that single-cell analyses based on network inference coupled with quantitative computations can reveal key transcriptional regulators in cell differentiation and disease development.

Simulated datasets
In this section, we describe the simulation of cluster-specific data Y = [Y(1), � � �, Y(N)]. First, we simulated the X 1 values in time-series single-cell data X = [X 1 , � � �, X N ] using the scRNAseq simulation tool Splatter [24]. Note that Splatter requires that the number of cells to be much greater than the number of clusters, otherwise it returns a simulated dataset with some columns containing all zero values. After setting appropriate numbers of genes (m), cells (n) and cell clusters (r), we generated the initial gene expression data X 1 using Splatter. Then, we constructed cluster-specific data Y(1) by merging vectors (cells) belonging to the same cluster into a single vector, representing the gene expression value of the cluster. The next step was to generate the Y(t), 2 � t � N. We defined the dynamic network {A(1), � � �, A(N − 1)} in the form of random 0-1 matrices and Y(t We denote {A(1), . . ., A(N − 1)} as the dynamic networks, where the elements of each network can take values between [-1,0), 0, and between (0, 1], representing down regulation, no regulation, or up regulation, respectively. In our simulation, we take {A(1), . . ., A(N − 1)} in the form of random 0-1 matrices. Note, operating on the log scale then exponentiating the values back to the original scale can guarantee that the simulated gene expressions are positive, which otherwise would make no sense biologically.
In the experiments on the simulated data, there were two main questions of concern: how noise and the number of clusters r affect the network accuracy. To answer these two questions, we conducted two separate experiments. In the first experiment, we set the number of genes to 100, 200, 300, 400 and 500, individually. The number of cells was set to 10 times the number of genes, and the number of clusters was equal to the number of genes. We also set the number of time points to N = 5. Thus, we obtained corresponding cluster-specific data Y = [Y(1), � � �, Y(N)].
Here, we considered the noise to be independent and to follow a Gaussian distribution with a mean value of μ = 0 and a standard deviation of σ = 0.01, 0.02 or 0.05. The final datasets were obtained by adding noise to the cluster-specific data In the second experiment, we set the number of genes to 200 and 400. The number of cells was set to 10 times the number of genes, and the number of clusters was varied from 40 to 200 and from 40 to 400, separately. The number of time points was again set to N = 5.

Three real datasets
Three time-series scRNA-seq datasets concerning cell differentiation were obtained from [23], with pseudotimes inferred by Monocle [14]. Dataset 1 was derived from mouse embryonic stem cells (ES cells) differentiating to primitive endoderm (PrE) cells. This dataset uses G6GR ES cells [25], and was produced with RamDA-seq protocol [26]. A total of 356 cells, which were sequenced at 0, 12, 24, 48 and 72 h, were used. Dataset 2 was derived from mouse embryonic fibroblast cells differentiating to myocytes [5]. A total of 405 cells were sequenced at days 0, 2, 5, and 22. Dataset 3 contains data from 758 cells sequenced at 0, 12, 24, 36, 72 and 96 h. Dataset 3 was derived from human ES cells differentiating to definitive endoderm cells [6]. For all three real data examples, reference networks from the Transcription Factor Regulatory Network database (http://www.regulatorynetworks.org) were used to validate the inferred networks.

DGRN reconstruction
In this work, we propose a novel DGRN inference method called scPADGRN. The framework of scPADGRN is shown in Fig 1. Fig 1(a) shows original single cell level RNA-seq data at different time points. Because in the single cell RNA-sequence data, cell numbers are possibly different at different time points, leading to different column dimensions for gene expression matrices (cell number) in Fig 1(a). We use cluster information to estimate cell dynamics at each time point. Fig 1(b) is used to illustrate single cell level RNA-seq data after clustering. Two main steps are needed to infer DGRNs. First, we cluster scRNA-seq data for different cells based on cell pseudotrajectories to convert single-cell-level data into cluster-level data. Details on the cell clustering process are provided in Fig 2. Second, the PADMM method is used to solve the optimization problem with the reshaped data. Data conversion: From single-cell-level data to cluster-level data. First, we introduce the time-series scRNA-seq data. The time-series scRNA-seq data are denoted by E t , 1 � t � N, representing matrices of gene expression values at N different time points. The E t , 1 � t � N, are m t × n t numerical matrices whose rows represent the genes (features) and whose columns represent the cells (samples) at time t. Element (E t ) ij of E t is the expression value of the i-th gene in the j-th cell at time t. Generally, the genes at each time point are identical. Namely, their features are identical, and the number of features is m 1 = m 2 = � � � = m N = m. In contrast, the cells at each time point are totally different individuals. Usually, the number of samples n i is not equal to n j if i 6 ¼ j.
In Fig 2, an example with three time points is used to illustrate the two steps of data conversion. The first step is to acquire the pseudotrajectory information of all cells and rank the cells at each real time point from early to late stages in accordance with their pseudotimes. Namely, we realign the columns of E t , 1 � t � N. The reshaped data are denoted by X t , 1 � t � N. Mature technologies such as Monocle [14] can be employed to infer the cell pseudotrajectories. As part of this step, we project the cells on the real timeline to cells on a pseudotime line.
The second step is to cluster the cells on the pseudotime line into clusters on the real timeline. In detail, the conversion process includes the following operations. We set the number of clusters r equal to the minimum of the numbers of cells n t , 1 � t � N. For the realigned X t , we compute the distance between the gene expression vectors of every pair of adjacent cells. Then, we take the largest n t − r distances among the obtained n t − 1 distances and link their corresponding cells. We consider linked cells to belong to the same cluster. In this way, r ordered clusters are obtained. For the r ordered clusters of X t , we use y j (t) to denote the gene expression of the j-th cluster at time t. y j (t) is a column vector consisting of the row means of the matrix composed of the cells in the j-th cluster at time t.
We adopt the notation are m × r matrices representing the gene expression levels of the r clusters at time t. Through these steps, we convert the time-series single-cell data X = [X 1 , � � �, X N ] into time-series cluster-specific gene expression data Y = [Y(1), � � �, Y(N)]. The dimensions for X and Y are m � P N t¼1 n t and m × Nr, respectively.
Since the cells at each time point are different, it is difficult to describe the expression dynamics at the single-cell level. For example, suppose that cell 1 is sequenced at t 1 and cell 2 is sequenced at t 2 , where t 1 < t 2 . Cell 1 will be destroyed upon being sequenced at t 1 . Therefore, cell 1 does not correspond to any cells at t 2 . One feasible solution is to describe the dynamics at the cluster level; in this way, little information about cell heterogeneity is lost.
where Y i (t) is a continuous vector in time t, representing the i-th row of Y(t). Y i (t) represents the expression level of the i-th gene. v ij (t) and p ij (t) denote the reaction and the reaction rate, respectively, from the j-th gene to the i-th gene at time t. P i (t) is a parameter set.
To construct the DGRN, we need to search for the optimal parameter set O ¼ [ t P i ðtÞ in Eq (1). This problem can be converted into the problem of finding a set O to fit the simulation results to the experimental results. We consider the augmentation of cluster-specific data between two adjacent time points. Denote t as the discrete time t = 1, 2, � � �, N. The optimization problem is as follows:  where DY T i ðtÞ is the difference in gene expression between t and t + 1. (�) (exp) and (�) (sim) denote the experimental and simulated results, respectively.
The objective of problem (2) is to optimize the augmentation of the gene expression of the i-th gene at time t, and it is a nonlinear dynamic optimization problem (DOP), which is one of the most difficult types of optimization problems to solve. To simplify this problem, we presume that the interactions among genes between two adjacent discrete time points t and t + 1 where A i (t) is the i-th row of the m × m matrix A(t). Thus, the optimization problem (2) is converted into The objective of problem (4) is to optimize the parameters of the dynamics of the i-th gene at time t.
Using the following linear approximation of Eq (3), the optimization can be written in the matrix form as shown in Eq (5). Detailed derivation can be found in Supporting Information File S1 Text min Að1Þ;���;AðNÀ 1Þ 1 2 In the DGRN {A(1), � � �, A(N − 1)}, the nodes stand for genes, and the links stand for gene regulatory relationships between genes. The DGRN is a directed dynamic network whose positive and negative links correspond to activation and suppression relationships, respectively. Usually, DGRNs are sparse and continuous. In other words, most parameters in problem (5) will be zero, and the differences between the network states at two adjacent time points should be slight. Therefore, we define the following optimization problem: min Að1Þ;���;AðNÀ 1Þ 1 2 where the first term evaluates the precision of problem (5), the second term is the L 1 -norm of the dynamic network to guarantee the sparsity of the network, and the third term imposes the continuity assumption on the dynamic network states at consecutive time points. Both sparsity and continuity need to be considered in biological networks [3]. The parameters α and β are tuning parameters that control the penalties for sparsity and continuity, respectively. In practice, a model with sparsity a P NÀ 1 t¼1 kAðtÞk 1 only [27] is also feasible. There are N − 1 matrices that need to be optimized in problem (6). We use the alternating descent method to iteratively solve the problem. In each iteration, we update the N − 1 matrices sequentially. For each matrix A(t), 1 � t � N − 1, we update A(t) while keeping the other N − 2 matrices fixed.

PADMM algorithm
In the k-th iteration, for the update of A(t), 1 � t � N, there are three different cases, each corresponding to a different subproblem.

• Subproblem 1
When t = 1, there are three terms in the objective function.
• Subproblem 3 When t = N − 1, there are three terms in the objective function.
We update A(t), 1 � t � N − 1, iteratively until the stopping criteria is met. Details of solving subproblems 1-3 are provided in Supporting Information File S2 Text.
Parameter selection.
• Algorithm parameters The number of clusters r is set to the minimum among the numbers of cells at all time points. When t = 1, we take A(t), U(t), V(t) and W(t) as zero matrices and B(t), C(t) and D(t) as random matrices. A maximum number of iterations M and a relative error threshold � are set. Iteration is terminated when the maximum number of iterations M is reached or when The parameter ρ is chosen such that ρ k+1 = ρ k /2. For details on the algorithm parameters, please refer to [28].
• Choice of network thresholds Once the weighted adjacent matrices are computed, different network thresholds may lead to different network structures. We assume that the first network state of the dynamic network has the same average degree as the reference network, whose links have been confirmed by biological experiments. In practice, the threshold can also be selected using BIC criterion or more computational intensive methods such as cross-validation.

Analysis of network differences DGIE scores for measuring changes in the interactions of a certain set of genes in a DGRN.
To quantify the differences in the dynamic network states over time, we propose the DGIE score. Suppose that we want to study the progress of cell differentiation. Let the DGRN states be denoted by (1) and V (2) . V (1) is the set of genes that are known to contribute to processes related to cell differentiation, including cell growth, proliferation, and development. This information is available in gene annotation databases, such as Metascape [29]. Another possible choice for V (1) is to select genes that belong to the same pathway. In this case, the DGIE scores can help identify the activation states of this pathway. After V (1) is determined, V (2) is the set of the remaining genes.
We define the DGIE score as (1) ) is the edge set of the subgraph whose vertex set is V (1) in the t-th network state of the DGRN. E t (V (1) , V (2) ) is the edge set of the bigraph whose vertex sets are V (1) and V (2) in the t-th network state of the DGRN. |�| is the number of elements of a set. The denominator jE t ðV t Þj V in the definition of DGIE t is the ratio of the number of links in G t to the number of genes in V, and it is used to alleviate the effects caused by different numbers of links at different time points. The numerator  (1) and the number of links between V (1) and V (2) to the number of genes in V (1) . The definition of DGIE t mainly concerns the sum of the number of links in V (1) and the number of links between V (1) and V (2) . To minimize the effects of parameters such as |V (1) |, |E t (V (1))| and |V|, we define DGIE t as shown above to measure the communication ability of the genes in V (1) .
Local differences: Dynamic subnetworks and DGIE scores for specific biological processes. Extracting subnetworks from a DGRN is an efficient way to clearly see the network differences. We choose genes related to the same biological process or pathway and extract their corresponding subnetworks. By comparing these subnetworks with the reference network, one can easily see the corresponding network differences from the subnetworks themselves, including changes in interactions and directions.
Then, we can compute the DGIE scores of the subnetworks and look for invariant characteristics. For real data applications, we focus here on subnetworks related to ES cell differentiation processes.
Global differences: DGIE scores of all known contributing genes. For the biological processes described by DGRNs, for example, differentiation from mouse ES cells to PrE cells, many genes contribute to related tasks, such as the regulation of embryonic development, the determination of cell fate, cell cycle regulation, and the encoding of de novo DNA methyltransferases. Information about gene annotation can help to identify these known contributing genes. With these contributing genes as V 1 , computing the DGIE scores enables us to learn more about changes in the communication strength of these genes.
Identifying key regulators responsible for changes in DGIE scores. To investigate the mechanisms underlying drastic changes in DGIE scores, it is important to identify the genes which are responsible for those changes. By removing one gene from V (1) at a time, we can observe the resulting changes in the DGIE scores. If the removed gene is irrelevant to the changes in the DGIE scores, the DGIE scores should still drastically vary. On the other hand, if the DGIE scores are almost identical at each time point after the removal of a certain gene, then this gene should be considered responsible for the originally observed variations. Furthermore, with the removal of a combination of genes (a complex), the standard deviation of the DGIE scores at all time points may also be reduced to a rather low level. In this case, the removed complex is our target. The method of complex identification involves the following steps. First, the differentiation-related genes are ranked in accordance with their ability to reduce the standard deviation of the DGIE scores. Then, the first d, d = 1, 2, � � �, genes in the ranked list are taken as a complex, and the DGIE scores after the removal of this complex are calculated. This process is repeated until the standard deviation of the DGIE scores no longer decreases. The corresponding complex is what we are looking for.
After identifying the complex responsible for the changes in the DGIE scores for each dataset, we can then investigate the role of complexes in DGRNs. We extract links adjacent to these genes at each time point and draw the corresponding differential network. By comparing the differential network with the reference network, some of the links can be confirmed to be biologically meaningful. The links without such confirmation are the links that we predict to be crucial to the biological process.

Results
In this section, we report simulation experiments carried out to demonstrate the effectiveness of the proposed algorithms. Then, we infer and analyze DGRNs based on three real scRNAseq datasets related to cell differentiation processes.

Numerical experiments on simulated data
Effects of noise level on network accuracy. The methods used to construct the simulated data are described in the materials and methods section. Here, two algorithms, the ADMM and PADMM algorithms, were tested. The runtime, numbers of iterations, reconstruction errors, and Areas Under the receiver operating characteristic Curves (AUCs) were calculated. Table 1 shows the results for 300 and 500 genes. The complete results are listed in S1 Table. From the results in Table 1 and S1 Table, reconstruction errors increase and AUCs decrease as the noise level increases, as expected. There is little difference on AUC for ADMM and PADMM while PADMM reduces runtime by 67.77% on average. From the perspective of binary classification, these two algorithms are both capable of identifying most links.
Effects of the number of cell clusters on network accuracy. We used two simulated datasets to examine the effects of the number of cell clusters. The number of clusters r is crucial because a smaller r corresponds to a smaller number of known variables. More specifically, the ratio of the number of known variables to the number of unknown variables is Nmr in problem (6). We need to know the extent of the effect of the number of clusters.

PLOS COMPUTATIONAL BIOLOGY
scPADGRN: A PADMM approach for reconstructing dynamic gene regulatory network The runtime, numbers of iterations, reconstruction errors and AUCs were computed. Table 2 shows the results obtained for 200 genes with numbers of clusters ranging from 40 to 200. The complete results are listed in S2 Table. As seen from the results in Table 2 and S2 Table, reconstruction errors increase and AUCs decrease with a decreasing number of clusters. When the number of clusters decreases to 2/5 of the number of genes (Table 2), the AUC remains above 0.99, which is sufficiently high. When the number of clusters decreases to 1/10 of the number of genes (with 400 genes), the AUC remains above 0.92, as shown in S2 Table. These results show that both algorithms are able to identify most of the links in a DGRN with a rather small number of clusters. The ADMM and PADMM algorithms both maintain good precision, as shown in the simulation experiments. In addition, PADMM is faster than ADMM by an average of 66.99%, as seen in Table 2.
As seen from the results of both simulation experiments, the ADMM and PADMM are both able to identify links in dynamic networks despite the occurrence of noise and a small number of clusters. However, the PADMM is superior to the ADMM in terms of runtime. Therefore, for the real data analyses reported below, we used the PADMM.

Applications to real scRNA-seq data Dataset 1: Mouse ES cells to PrE cells.
In accordance with the described methods for inferring DGRNs, we obtained the DGRN for dataset 1, as shown in S1 Fig. Furthermore, we visualized subnetworks of genes involved in GO:0048863 stem cell differentiation. We selected All network figures presented in this work were plotted using Cytoscape [30]. Transcription factor (TF)-TF interactions confirmed by biological experiments are marked in pink. Links marked with arrows and 'T' symbols represent positive and negative interactions, respectively. In these subnetworks, RBPJ and ESRRB regulate the other six genes without being regulated themselves. TRP53 and REST are activated at all times. FOXH1 is suppressed beginning at 24 h. GATA4 is both activated and suppressed at 24 h.
The DGIE scores of the genes in Fig 4 are shown in Fig 5(a). Datasets 1 and 3 describe differentiation processes for mouse and human ES cells, respectively. Therefore, we chose GO:0048863 stem cell differentiation for dataset 1 and hsa04550 signaling pathways regulating the pluripotency of stem cells for dataset 3, among other biological processes and KEGG pathways that are less relevant to the differentiation of ES cells. By observing the DGIE scores of genes in subnetworks, we may learn the activation states of the corresponding biological processes and KEGG pathways.   According to textbooks on cell biology, once ES cells begin to differentiate, they are no longer ES cells. The degree of differentiation of the cells becomes higher at that time. Therefore, it is natural to assume that the communication ability of these genes begins to fade since the cells become increasingly dissimilar to ES cells as time goes by. The same decreasing tendency is observed in both mouse and human ES cell differentiation, as shown in Fig 5. Next, we consider the process of the differentiation of mouse ES cells to PrE cells. We take all known contributing genes as V 1 . The DGIE scores are shown in Fig 6(a). The observed increasing tendency suggests that the interactions within the genes in V 1 and between V 1 and V 2 intensify over time.
In fact, the differentiation from ES cells to PrE cells is only an early stage of the differentiation of stem cells into terminally differentiated cells. Similar increasing tendencies are also observed in datasets 2 and 3. From the increasing tendency in Fig 6, we can infer that functions that facilitate cell differentiation, including cell growth, proliferation, and development, are gradually turned on. The DGIE score is a tool for determining the activation states of functions at the molecular level. S4 Fig shows boxplots of the DGIE scores when a gene or complex is removed from V 1 . We identify four genes, BHLHE40, MSX2, FOXA2 and DNMT3L, as targets.
According to the gene annotation information available from the Metascape database [29], BHLHE40 is involved in the control of the circadian rhythm and cell differentiation. MSX2 may promote cell growth under certain conditions. DNMT3L is crucial for embryonic development. Similar family members of FOXA2 regulate metabolism and play a role in the differentiation of pancreas and liver cells in mice. It is known that endoderm cells will differentiate into pancreas and liver cells. Thus, it is also natural to infer that FOXA2 may play a key role in early ES cell differentiation even before pancreas and liver cells are formed.
In addition, let T ðkÞ t denote the set of genes with the top k largest degrees in the DGRN at time t, with k = 10 and 50. We compare  Table. In S4(A) Table, it is clear that differentiation-related genes are denser among top-degree nodes, and top-degree nodes are usually regarded as possessing higher influence in a complex network.
S7 Fig shows the differential network formed based on the union of links that appear between the complex and other genes only once from 12 h to 72 h. Counts of the confirmed links in the differential network are shown in S5 Table. The unconfirmed links may play important roles in the biological process.
Dataset 2: Mouse embryonic fibroblast cells to myocytes. For dataset 2, we visualized subnetworks of genes involved in GO:0061614 pri-miRNA transcription by RNA. mi-RNA is hypothesized to regulate approximately one-third of human genes; therefore, we are interested in how genes interact with others to facilitate pri-miRNA transcription by RNA. Nine genes were selected, as shown in  In these subnetworks, ATF4, TGIF1, SP1, DDIT3 and FOSL2 are activated and suppressed at all times. EGR1 is suppressed beginning at 5 days. MAF is both suppressed and activated beginning at 5 days. A full image of the DGRN states is shown in S2 Fig. The DGIE scores of all known contributing genes are shown in Fig 6(b). As in the case of dataset 1, we perceive an increasing tendency of the DGIE scores over time. It is worth mentioning that dataset 2 does not describe cell differentiation from ES cells directly. Instead, it describes cell differentiation from less differentiated cells to myocytes, which are terminally differentiated cells.
For the process of differentiation from ES cells to terminally differentiated cells, we know that the DGIE scores increase from the ES cells to more highly differentiated cells, such as the PrE cells in dataset 1. The DGIE scores also increase from less differentiated cells (fibroblasts) to terminally differentiated cells (myocytes). Thus, it would not be too bold to infer that the communication strength of the known contributing genes increases from ES cells to terminally differentiated cells. Although no biological experiments yet confirm this claim, we present this speculation from the perspective of dynamic network analysis. S5 Fig shows boxplots of the DGIE scores when a gene or complex is removed from V (1) . We identify three genes as key transcriptional regulators: Scx, Fos and Tcf12. According to the gene annotation information available from the Metascape database, Scx regulates collagen type I gene expression in cardiac fibroblasts and myofibroblasts. Fos proteins regulate cell proliferation, differentiation, and transformation. Tcf12 is expressed in many tissues, including skeletal muscle.

Dataset 3: Human ES cells to definitive endoderm cells.
Dataset 3 describes differentiation from human ES cells to definitive endoderm cells. As in the case of dataset 1, we focused on biological processes or KEGG pathways that are directly involved in stem cell differentiation. Therefore, we chose ten genes in hsa04550 signaling pathways regulating the pluripotency of stem cells for visualization. The subnetworks are shown in Fig 8. The subnetworks in Fig 8 show that POU5F1 and NANOG are activated and suppressed at all times. According to the description of has04550, NANOG and its downstream target genes promote self-renewal and pluripotency. SRF and FOXH1 begin to be activated at 24 h. A full image of the DGRN states is presented in S3 Fig.  Fig 5(b) shows the DGIE scores of the genes in Fig 8. For dataset 3, we focus on hsa04550 signaling pathways regulating the pluripotency of stem cells. Fig 5(b) shows a decreasing tendency, along with Fig 5(a). Once ES cells start to differentiate, the communication ability of the genes in Fig 8 begins to fall. This finding suggests that the activation degree of the regulation of stem cell pluripotency is reduced.
The DGIE scores of all contributing genes in the DGRN are shown in Fig 6(c). Like datasets 1 and 2, dataset 3 also exhibits an increasing tendency of the DGIE scores. Notably, dataset 3 describes the differentiation of human cells from ES cells. The results help to confirm the conclusions drawn from datasets 1 and 2 with regard to the gradual turn-on of the functions of all known contributing genes. S6 Fig shows boxplots of the DGIE scores when a gene or complex is removed from V (1) . We identify Sox5, Meis2, Hoxb3, Tcf7l1 and Plagl1 as key regulators.
According to the gene annotation information available from the Metascape database, Sox5 is a member of the Sox family, which regulates embryonic development and determines cell fate. Meis2 essentially contributes to developmental processes. Hoxb3 is also involved in development. TCF7L1 plays a role in the regulation of cell cycle genes and cellular senescence. Overexpression of Plagl1 during fetal development causes transient neonatal diabetes mellitus.
The results in S4(C) Table are similar to those in S4(A) Table, indicating that differentiation-related genes are denser among top-degree genes. S9 Fig shows the differential network of the identified complex, and the counts of the confirmed links in the differential network are shown in S5 Table.

Discussion
A dynamic network is a powerful tool for elucidating relationships that change over time.
With the increasing popularization of single-cell sequencing technology, researchers are obtaining large quantities of time-series single-cell data, which are better able to characterize biological processes than a single snapshot is. To reveal dynamic changes based on time-series scRNA-seq data, we have proposed a novel method of inferring DGRNs with directed links. To ensure that the results are practically and biologically meaningful, we also incorporate the assumptions that the networks are sparse and that consecutive network states are similar into the modeling. Our method, with both the ADMM and PADMM algorithms, shows satisfactory performance on simulated and real datasets.
The greatest obstacle when shifting the level of analysis from bulk data to single-cell-level data lies in the fact that cells are ruined once sequenced by scRNA-seq technology. For this reason, the dynamics at the single-cell level cannot be directly established. Inspired by [31], we first order the cells by their pseudotimes and apply clustering to the ordered cells to obtain groups that can be linked over time. When modeling gene-gene interactions, non-linear differential equations [32][33][34] can offer more flexibility. However, it raises challenges in computation and modeling under our framework. Therefore, we use a piecewise linear interpolation to approximate nonlinear differential equations in Eq (1) for each time interval. We think it is a reasonable compromise between model fit and complexity. But we have to admit the limitation for using piecewise linear equations in reconstructing DGRNs. We also need to investigate the robustness of the scPADGRN when datasets are simulated with nonlinear gene-gene interactions or other scenarios that are ignored in scPADGRN. These will be our efforts for the further work.
In our algorithm, we specify a number of groups that is equal to the minimum number of cells across all time points in order to use the cell-level information to the greatest possible extent. Because of the complexity of the biological processes, our method may be a simple but compromised approach. The attempt to develop a better way to construct and link celllevel data is an ongoing effort. In practice, when group-level data are available, the proposed method can still be applied by skipping the ordering and clustering steps.
In applications of real time-series scRNA-seq data, it is of interest to characterize changes occurring during biological processes and identify the key regulators. Often, it is difficult to identify these essential differences by inspecting the dynamic graphs themselves (as shown in S1, S2 and S3 Figs). The proposed index DGIE serves this purpose by measuring the network differences. In our real data analysis, results obtained based on DGIE scores provide two major insights. First, the DGIE scores of the investigated subnetworks indicate that the differentiation functions of ES cells are most active initially and may gradually fade over time. Second, the DGIE scores of all known contributing genes indicate that the communication strength of known contributing genes increases from ES cells to terminally differentiated cells.

Conclusion
In this work, we have presented scPADGRN, a novel DGRN inference method using timeseries scRNA-seq data. scPADGRN shows advantages in terms of accuracy, robustness and fast convergence when implemented with the PADMM algorithm for network inference using simulated datasets.
In real scRNA-seq data applications, scPADGRN can be used to visualize gene-gene interactions among genes involved in the same biological process or KEGG pathway. These regulation relationships may either persist or disappear.
To quantify network differences, a quantitative index called DGIE has been presented. The DGIE score measures the communication ability of a certain set of genes. At the local level, we have computed the DGIE scores of processes or pathways that are directly related to ES cell differentiation. The decreasing tendency of the DGIE scores indicates that the differentiation functions of ES cells are most active initially and may gradually fade over time. At the global level, the DGIE scores of the three investigated datasets all show the same increasing tendency, indicating that the communication strength of the known contributing genes increases from ES cells to terminally differentiated cells. We have identified a set of genes responsible for changes in the DGIE scores during cell differentiation for each of the three single-cell datasets.
Our results affirm that single-cell analysis based on network inference coupled with quantitative computations can be applied to infer the activity states of gene functions in the process of differentiation from ES cells to terminally differentiated cells, thus potentially revealing key transcriptional regulators involved in cell differentiation and disease development.
In summary, our work provides three main contributions. First, we propose a new method of inferring DGRNs using scRNA-seq data. Second, a quantitative index, DGIE, is proposed to measure the communication ability of a certain set of genes in a DGRN; this index can reflect the activity states of functions in which these genes play a role. Third, key regulators of biological processes can be identified based on the DGIE scores.    Table. Number of links and confirmed links in the estimated differential networks. In the estimated differential networks, this table shows counts of links. (PDF)