System Identification Using Compressed Sensing Reveals Signaling-Decoding System by Gene Expression

Cells decode information of signaling activation at a scale of tens of minutes by downstream gene expression with a scale of hours to days, leading to cell fate decisions such as cell differentiation. However, no system identification method with such different time scales exists. Here we used compressed sensing technology and developed a system identification method using data of different time scales by recovering signals of missing time points. We measured phosphorylation of ERK and CREB, immediate early gene expression products, and mRNAs of decoder genes for neurite elongation in PC12 cell differentiation and performed system identification, revealing the input–output relationships between signaling and gene expression with sensitivity such as graded or switch-like response and with time delay and gain, representing signal transfer efficiency. We predicted and validated the identified system using pharmacological perturbation. Thus, we provide a versatile method for system identification using data with different time scales. Highlights We developed a system identification method using compressed sensing. This method allowed us to find a pathway using data of different time scales. We identified a selective signaling-decoding system by gene expression. We validated the identified system by pharmacological perturbation. eTOC Blurb We describe a system identification method of molecular networks with different time-scale data using a signal recovery technique in compressed sensing.

SUMMARY 1 Cells decode information of signaling activation at a scale of tens of minutes by downstream 2 gene expression with a scale of hours to days, leading to cell fate decisions such as cell 3 differentiation. However, no system identification method with such different time scales exists. 4 Here we used compressed sensing technology and developed a system identification method 5 using data of different time scales by recovering signals of missing time points. We measured 6 phosphorylation of ERK and CREB, immediate early gene expression products, and mRNAs of 7 decoder genes for neurite elongation in PC12 cell differentiation and performed system 8 identification, revealing the input-output relationships between signaling and gene expression 9 with sensitivity such as graded or switch-like response and with time delay and gain, 10 representing signal transfer efficiency. We predicted and validated the identified system using 11 pharmacological perturbation. Thus, we provide a versatile method for system identification 12 using data with different time scales.  (Gotoh et al., 1990;Marshall, 1995;Qiu and Green, 1992;Traverse et al., 1992), whereas 8 pituitary adenylate cyclase-activating polypeptide (PACAP) induces cell differentiation mainly 9 through cAMP-dependent CREB phosphorylation (Akimoto et al., 2013;Gerdin and Eiden, 10 2007;Saito et al., 2013;Vaudry et al., 2002;Watanabe et al., 2012). We showed that cell 11 differentiation in PC12 cells can be divided into two processes: a latent processes (0-12 h after 12 the stimulation) in preparation for neurite extension and a subsequent neurite extension process 13 (12-24 h) (Chung et al., 2010). We identified the three genes essential for cell differentiation, 14 Metrnl, Dclk1, and Serpinb1a, which are induced during the latent process and required for 15 subsequent neurite extension, and named LP (latent process) genes (Watanabe et al., 2012). 16 Although NGF and PACAP selectively induce the different combinations and temporal patterns 17 of signaling molecules, both growth factors commonly induce the LP genes (Watanabe et al., 18 2012). The expression levels of LP genes, but not the phosphorylation level of ERK, correlate 19 with neurite length regardless of growth factors (Watanabe et al., 2012), indicating that the LP 20 genes are the decoders of neurite length. Thus, how the distinct patterns of signaling molecules 21 are decoded by LP gene expression is critical for understanding the unknown mechanism 1 underlying cell differentiation in PC12 cells. Decoding the combinations and temporal patterns 2 of signaling molecules by downstream gene expression is a general mechanism underlying 3 various cellular functions (Behar and Hoffmann, 2010;Purvis and Lahav, 2013;Sumit et al., 4 2017). 5 Mathematical modeling is useful for the analysis of decoding mechanisms (Janes and 6 Lauffenburger, 2013). If the signaling pathways are well characterized, kinetic modeling based 7 on biochemical reactions reported in the literature is often used (Janes and Lauffenburger, 2006;8 Kholodenko et al., 2012;Price and Shmulevich, 2007). For example, growth factor-dependent 9 ERK activation in PC12 cells has been modelled by the kinetic model based on prior knowledge 10 of pathway information (Brightman and Fell, 2000;Filippi et al., 2016;Nakakuki et al., 2010;11 Ryu et al., 2015;Santos et al., 2007;Sasagawa et al., 2005). In general, however, decoding by 12 downstream genes involves more complex processes such as transcription and translation and 13 information on the precise pathway is not available. 14 To identify decoding mechanisms by gene expression, the system identification method (also 15 referred to as data-driven modeling) was developed for identifying quantitative input-output 16 relationships from time series data without detailed knowledge of signaling pathways (Janes and 17 Lauffenburger, 2006;Janes and Yaffe, 2006;Kholodenko et al., 2012;Ljung, 2010;Price and 18 Shmulevich, 2007;Zechner et al., 2016). We previously developed a system identification 19 method based on time series data of signaling molecules and gene expression, denoted as the 20 nonlinear autoregressive exogenous (NARX) model, and applied it to the signaling-dependent immediate early gene (IEG) expression during cell differentiation in PC12 cells (Saito et al., 1 2013). The NARX model involves the determination of lag-order numbers and use of the Hill 2 equation and the linear autoregressive exogenous (ARX) model (Saito et al., 2013). 3 Determination of lag-order numbers infers the selection of input molecules (Input) for an output 4 molecule (Output), which is referred to as the Input-Output (I-O). The Hill equation characterizes 5 sensitivity with a nonlinear dose-response curve (Hill, 1910). The linear ARX model 6 characterizes temporal changes with time constant and gain, the latter of which is an I-O 7 amplitude ratio, and indicates signal transfer efficiency (Ljung, 1998) such that time points and intervals for measuring upstream and downstream molecules may be 8 different. Thus, a system identification method using unequally spaced sparse time series data 9 with different time scale needs to be developed. 10 To solve this problem, here we used the signal recovery technique based on a low-rank 11 approach proposed in the field of compressed sensing to generate a sufficient number of time 12 points for equally spaced dense time series data from unequally spaced sparse time series data 13 with different time points and intervals. We applied this nonlinear system identification method 14 to the signaling-dependent gene expression underlying cell differentiation in PC12 cells and 15 identified the signaling-decoding system by gene expression. 16 Unequally spaced sparse time series data can be regarded as equally spaced dense time series 17 data with missing time points, and therefore we can generate equally spaced dense time series 18 data by applying a signal recovery technique, which has been studied in the field of compressed 19 sensing (Candès and Wakin, 2008;Donoho, 2006). Compressed sensing is a signal processing 20 method for efficient data acquisition by recovering missing signals/images from a small number 21 of randomly sampled signals including unequally spaced sparse data based on sparseness of a 1 vector (Candès et al., 2008) or low rankness of a matrix (Fazel, 2002). Both the sparse approach 2 and the low-rank approach have been applied to various fields, such as sampling and 3 reconstructing magnetic resonance images (Lustig et al., 2008;Ongie and Jacob, 2016), 4 super-resolution imaging (Candès and Fernandez-Granda, 2014;Yang et al., 2010), image 5 inpainting (Takahashi et al., 2012;Takahashi et al., 2016), and collaborative filtering (Candès 6 and Recht, 2009). In this study, we applied a matrix rank minimization algorithm (Konishi et al.,7 2014) to recover missing time points from unequally spaced time series data, and we generated 8 equally spaced time series data with the same time points from signaling and gene expression 9 data with different time scales. We previously developed a system idenfitication method from 10 equally spaced dense time series data of signaling and gene expression using the NARX model 11 (Saito, 2013). We developed a new system identification method from unequally spaced sparse 12 time series data with different time scales by integrating this signal recovery method using the 13 matrix rank minimization algorithm (Konishi et al., 2014)  In this study, we regarded unequally spaced sparse time series data as equally spaced dense time 21 series data with missing time points, and equally spaced time series data were generated by 1 restoring missing time points using a low-rank approach (Konishi et al., 2014). In the low-rank 2 approach for image recovery, we assumed that the value of each pixel is represented by a linear 3 combination of its neighbor pixels, which is mathematically represented by an autoregressive 4 (AR) model. Then a Hankel-like matrix composed of pixel values has a low rank because each 5 column is represented by the linear combination of the other columns (Figures 1 and S1A). This 6 means that the Hankel-like matrix is a low-rank matrix whose rank is determined by the system 7 order. Missing data can be recovered by estimating missing elements of the matrix so that the 8 rank of this matrix [Y] is r. When system order r is unknown, based on the idea that the system 9 can be described with as few parameters as possible, missing elements of this Hankel-like matrix 10 are recovered so as to minimize the rank of the matrix [Y]. Based on the low rankness of the 11 Hankel matrix, the signal recovery problem of the missing pixles can be formulated as a matrix 12 rank minimization problem, and we can restore an image by solving this problem (Takahashi et 13 al., 2012;Takahashi et al., 2016) (Figure 1). 14 We performed system identification from unequally spaced time series data of input 15 molecules (Inputs) and output molecules (Outputs). Although an AR model is used for image 16 recovery, we used an ARX model where the value at a time point is represented by a linear 17 combination of two kinds of signals, Inputs and Outputs. Therefore, we modified the 18 rank-minimization-based signal recovery method of the AR model to the ARX model and 19 performed system identification (Figures 1 and S1B). Several methods for system identification 20 using a linear ARX model with signal recovery of missing points of input and output based on 21 matrix rank minimization have been proposed (Liu et al., 2013;Markovsky, 2013). They can 1 recover missing time series input-output data even when missing time points of input are not 2 equal to those of output. 3 However, we cannot directly apply the method because we used the NARX model rather than 4 the ARX model due to the nonlinearity of signaling-dependent gene expression (Kudo et al., 5 2016;Saito et al., 2013). Therefore, by combining the nonlinear ARX system identification 6 method (Saito et al., 2013) and the signal recovery method based on the matrix rank 7 minimization problem (Konishi et al., 2014), we derived the signal recovery algorithm applicable 8 to the nonlinear ARX system and performed system identification using recovered equally 9 spaced time series input-output data (see "NARX Model and Data Representation" and 10 "Extension ARX System Identification from Unequally Spaced Time Series Data to the NARX 11 System" sections in the STAR METHODS).   Using these time series data sets, we selected three sets of Inputs-Outputs combinations from 10 upstream to downstream and performed system identification for each set ( Figure 5A). The  Table 1). The dose-responses from c-Jun and c-Fos to FosB 9 showed typical switch-like responses, whereas others showed graded or weaker switch-like 10 responses. Note that the gain from the converted c-Jun to FosB was much smaller than that from 11 the converted c-Fos (Table 1), indicating that FosB is mainly regulated by c-Fos but not c-Jun. 12 The time constants for c-Jun, c-Fos, Egr1, Metrnl, and Dclk1 were less than 1 h, whereas those 13 for FosB, JunB, and Serpinb1a were more than 100 min (Table 1), indicating that induction of 14 FosB and JunB temporally limit the overall induction of the LP genes from signaling molecules. 15 The transformation of Inputs by the Hill equation followed by the ARX model is shown in Figure   16 S2. In addition, when we integrated these three sets of the NARX model and simulated the 17 response using only pERK and pCREB as Inputs, we obtained a similar result ( Figure S3).

20
We validated the identified system by pharmacological perturbation. One of the key issues in PC12 cell differentiation is whether ERK or CREB phosphorylation mediates expression of the 1 downstream genes (Ravni et al., 2006;Vaudry et al., 2002;Watanabe et al., 2012). Therefore, we 2 selectively inhibited ERK phosphorylation by a specific MEK inhibitor, trametinib (Gilmartin et   3 al., 2011;Watanabe et al., 2013;Yamaguchi et al., 2007) in PACAP-stimulated PC12 cells. We 4 found that PACAP-induced ERK phosphorylation, but not CREB phosphorylation, was 5 specifically inhibited by trametinib ( Figure 6A, black dots). 6 For c-Jun, c-Fos, Egr1, and JunB, we recovered signals of the unequally spaced time series 7 data of Inputs and Output. For c-Jun, c-Fos, and Egr1, we simulated Outputs responses using 8 these recovered data and the identified NARX model ( Figure 6A black lines, see also the failure of the system identification of FosB and Metrnl in Figure 6A arose from the failure of 1 the system identification of FosB. Thus, all Outputs except FosB showed similar responses in the 2 experiment and simulation when the experimental and recovered data were used as Inputs, 3 indicating that in most cases the identified system is validated by pharmacological perturbation. In this study, we identified the system from signaling molecules to gene expression using the 7 unequally spaced time series data for 720 min after the stimulation. Given that expression levels are not likely to be reliable because the interpolation methods ignore biochemical property of 5 molecular network. By contrast, the interpolation used in this study is based on the NARX model, 6 which reflects biochemical property. Thus, the proposal method in this study is biologically more 7 plausible than a linear or spline interpolation.

CONTACTS FOR REAGENT AND RESOURCE SHARING
Further information and requests for reagents and resources should be directed to and will be fulfilled by the Lead Contact, Shinya Kuroda (skuroda@bs.s.u-tokyo.ac.jp).

Quantitative Image Cytometry
QIC was performed as previously described (Ozaki et al., 2010). Briefly, after stimulation by the growth factors, the cells were fixed, washed with phosphate-buffered saline, and permeabilized with blocking buffer (0.1% Triton X-100, 10% fetal bovine serum in phosphate-buffered saline).

The cells were washed and then incubated for 2 h with primary antibodies diluted in Can Get
Signal immunostain Solution A (Toyobo, Osaka, Japan

qRT-PCR analysis
Reverse transcription-polymerase chain reaction (RT-PCR) was performed as previously (Applied Biosystems). The sequences of the primers for the LP genes are shown in Table S2 ( Watanabe et al., 2012).

NARX model and data representation
Assuming that the input molecules (Input) and output molecules (Output) signals satisfy the following NARX model, Eqs. (1) and (2), the system identification is performed by estimating unknown parameters in the NARX model, and ‫ݏ‬ is an index of stimulation conditions defined as follows, with missing time points and identify the unknown NARX parameters after recovering missing time points based on the low rankness of the Hankel-like matrix, which is described in the next section.

Nonlinear ARX System
To deal with the nonlinear ARX system, we extend ARX system identification from unequally spaced time series data to the nonlinear ARX system. First consider the simple case of the linear ARX model and then extend it to the NARX model. To perform system identification from unequally spaced time series data, equally spaced time series data are generated by signal recovery of unknown values of missing time points. Because the Input and Output data of a linear system are missing and the order of the system is unknown as in this study, the system identification using the recovered Input and Output data based on the low rankness of the Hankel-like matrix has been proposed (Liu et al., 2013). This is a method to recover missing data by solving the matrix rank minimization problem and to generate equally spaced sampled data.
In this study, we apply this matrix rank minimization approach to simultaneously identify the NARX model and recover missing data.
First, for simplicity, let us consider the case of the linear ARX model with single Input and single Output described by is unknown in this study, we recover the unknown components so as to minimize the rank of the matrix [ܻ ܷ ሿ based on the idea that it is better to describe the system with as few parameters as possible. That is, the missing data are recovered by solving the matrix rank minimization problem as follows, In the NARX model used in this study, because the observed Input data is nonlinearly transformed using the nonlinear function f in Eq.
(2) and the Input data after transformation and the Output data follow the ARX system, signal recovery and system identification can be performed on the nonlinearly transformed Input data and Output data by the above method.
Based on this idea, we performed nonlinear ARX system identification. . The order of the system, the lag order of the ARX model, is determined based on the matrix rank obtained in step ii.

Procedure for System Identification by Integrating Signal Recovery and the NARX model
Step iv: Estimate and using the recovered data and ARX parameters.  (14) and (15) is recovered again by the IPMS algorithm.
Step   for each stimulation s of test data set.
Step ix: Obtain the Input combination with minimum . Perform steps i to viii for all combinations of Inputs. Select the combination of Inputs with the minimum Step x: Estimate the NARX model with signal recovery using all data sets. Using the combination of Input determined in step ix, estimate the NARX parameter with signal recovery by the procedure from steps i to v using all stimulation conditions as training data sets.
Note that, when simulating with the ARX model, set the value of Output to 0 before time 0, otherwise the value of the Output obtained by the simulation is used to obtain the next time value.
For stimulation in Figure 6, signal recovery was performed by step vi using experimental data with trametinib as the test data set.

Calculation of Gain and Time Constant from the Linear ARX Model
Gain and time constant ߬ were calculated from the frequency response function obtained from the linear ARX model. For simplicity, we consider here the case of a single Input single Output ARX model like Eq. (6), which can be re-described as follows, ݅ is an imaginary unit and ߱ is frequency. Therefore, gain and phase can be calculated from ARX parameters. The frequency response curve and phase diagram at each Input and Output of the identified linear ARX model are shown in Figure S4. Note that gain indicated in (23)

Simulation of the Integrated NARX Model
The simulation of the integrated NARX model was performed as follows. Experimental and recovered data of pERK and pCREB, and the simulated data of c-Jun, c-Fos, Egr1, FosB, and JunB were given as Input data and simulation was performed using the NARX model in Figure   5.  Step (i) Step (ii) Step (iii) Step (iv) Step (v) Step (vi) Step (vii) Step (viii) Step (ix) Step (