Analysis of Feedback Mechanisms with Unknown Delay Using Sparse Multivariate Autoregressive Method

  • Edward H. Ip ,

    Affiliation Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America

  • Qiang Zhang,

    Affiliation Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America

  • Tomasz Sowinski,

    Affiliation School of Information Sciences, University of Pittsburgh, Pennsylvania, United States of America

  • Sean L. Simpson

    Affiliation Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America


This paper discusses the study of two interacting processes in which a feedback mechanism exists between the processes. The study was motivated by problems such as the circadian oscillation of gene expression where two interacting protein transcriptions form both negative and positive feedback loops with long delays to equilibrium. Traditionally, data of this type could be examined using autoregressive analysis. However, in circadian oscillation the order of an autoregressive model cannot be determined a priori. We propose a sparse multivariate autoregressive method that incorporates mixed linear effects into regression analysis, and uses a forward-backward greedy search algorithm to select non-zero entries in the regression coefficients, the number of which is constrained not to exceed a pre-specified number. A small simulation study provides preliminary evidence of the validity of the method. Besides the circadian oscillation example, an additional example of blood pressure variations using data from an intervention study is used to illustrate the method and the interpretation of the results obtained from the sparse matrix method. These applications demonstrate how sparse representation can be used for handling high dimensional variables that feature dynamic, reciprocal relationships.


In randomized clinical trials, multivariate longitudinal data are often sampled, either sparsely or densely (intensively) [1], over a certain time period. A large part of the longitudinal data analysis literature has focused on sparsely sampled data; e.g., data acquired through annual or semi-annual visits. For intensive longitudinal data, relatively few methods have been proposed, and time series analysis is an important component of them [2]. Traditional time series approaches, such as univariate or multivariate autoregressive (AR) models, are typically applied to only one or several time series; e.g., a stock market index series or a commodity price series. In biological and clinical studies, however, we often observe one time series per subject, and in the multivariate case the data often show a three-dimensional tensor structure comprising the subject, variable, and time dimensions.

Jointly modeling multivariate intensive longitudinal data can introduce a large number of parameters. For example, the AR(m) model

$$Y_{ijt} = \sum_{\tau=1}^{m} \sum_{k=1}^{p} \rho_{kj\tau}\, Y_{ik(t-\tau)} + \epsilon_{ijt} \tag{1}$$

would require mp² parameters. Here Yijt is the observed outcome of subject i at time t on variable j, ρkjτ is the contribution of the kth variable at time t − τ to the jth variable at time t, and the error term, ϵijt, is assumed to be independently and identically distributed (i.i.d.) with a time-independent (stationary) distribution. Specifically, it can be assumed that ϵijt has zero mean and constant variance, and that ϵij1t1 and ϵij2t2 are independent if j1 ≠ j2 or t1 ≠ t2. We denote the number of variables by p and the order of the autoregression by m. The model specified by Eq (1) is rather comprehensive, as it can include multiple possibly correlated variables, time-lagged effects from the same variable, and cross-lagged effects from all the other variables in the model. This kind of model has been found useful in applications such as fMRI time series analysis, in which activities in various regions of the brain, intensively sampled over time, are modeled. For example, Harrison et al. [3] used a multivariate AR model (p = 4, m = 3) to make inferences about attentional modulation of connectivity within the dorsal visual pathway, specifically across brain regions including the posterior parietal cortex and right prefrontal cortex. Under such a model, activity in the posterior parietal cortex at time t − 2 may influence the right prefrontal cortex at time t.
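To make the parameter count concrete, the following is a minimal sketch that simulates one subject's series from a p-variate AR(m) model of the kind described above and counts its coefficients. The function name `simulate_var`, the coefficient values, and the noise level are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def simulate_var(rho, n_steps, sigma=0.1, seed=0):
    """Simulate one subject's p-variate AR(m) series.

    rho has shape (m, p, p); rho[tau - 1, k, j] is the contribution of
    variable k at time t - tau to variable j at time t.
    """
    rng = np.random.default_rng(seed)
    m, p, _ = rho.shape
    y = np.zeros((n_steps, p))
    y[:m] = rng.normal(scale=sigma, size=(m, p))   # arbitrary start-up values
    for t in range(m, n_steps):
        for tau in range(1, m + 1):
            # For each j: sum over k of rho[tau-1, k, j] * y[t - tau, k]
            y[t] += y[t - tau] @ rho[tau - 1]
        y[t] += rng.normal(scale=sigma, size=p)    # i.i.d. error term
    return y

p, m = 4, 3                    # same sizes as the fMRI example in the text
rho = np.zeros((m, p, p))
rho[0] = 0.5 * np.eye(p)       # lag-1 effects of each variable on itself
rho[1, 0, 1] = 0.3             # first variable at t - 2 influences second at t
y = simulate_var(rho, n_steps=200)
n_params = rho.size            # m * p**2 coefficients in the full model
print(y.shape, n_params)
```

Only five of the 48 coefficients are nonzero here, which is the situation a sparse formulation is meant to exploit.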

Indiscriminately including all variables and all time points, as in Eq (1), is not always optimal, especially when the sample size is small, because overfitting often arises in such cases. Model selection criteria, such as the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), or other variations [4], would limit selection on the temporal component to the first few orders, but when the time period is long, one could miss significant autoregressive contributions from outcomes farther back in time. For example, daily blood pressure measurements often show strong correlations between hour 1 and hour 24. Using only measurements from a few hours back would therefore miss the daily cycle. Another example is the circadian oscillation of gene expression [5], where two interacting protein transcriptions can form both negative and positive feedback loops, with delays as long as 12 hours, while gene expression is measured hourly. These delays are essential to forming the periodic time series of protein densities, and estimating them is an important step in understanding gene interactions at the molecular level. In terms of statistical modeling, neither an AR(1) nor an AR(12) model is appropriate for these data because there exist only a few nonzero entries in the parameter set {ρkjτ : k = 1,…,p, j = 1,…,p, τ = 1,…,T}. This motivates a sparse autoregressive model, in which we seek only the few most correlated autoregressive entries, regardless of time lag or variable. If we vectorize the parameter set into a vector, ρ, we assume that ρ is zero except at a few entries. Recent work on sparse autoregressive models includes Fujita et al. [6], who employ a multivariate AR model with l1 penalization to learn gene-regulatory mechanisms from time-course microarray data, and the Network Granger Causality (NGC) models of Lozano et al. [7] and Basu [8], which use group Lasso penalty terms.

We often assume that a time series has reached equilibrium when samples are taken, but after an intervention, which could be time dependent (e.g., treatment dropped, switched, or given at different dose levels depending on a subject's condition), we would like to know how the intervention alters the equilibrium. This is the case in our second motivating example, a multicenter randomized clinical trial in which hourly blood pressure data over a 24-hour period are recorded both before and after diet interventions. It is possible that after the interventions, the equilibrium reaches a different state than before. This can be modeled by combining a linear mixed-effects (LME) model with an autoregressive model [9], in which the LME model could include all the time-dependent or time-independent predictors. Introducing random effects could also be beneficial, as subjects often reach equilibrium differently, depending, for example, on demographics or certain physiological characteristics.

Here we propose a sparse multivariate autoregressive analysis that takes into account the autocorrelations within the multiple observed outcomes over an arbitrarily long history, but keeps only the most correlated entries in that history. Hence, while more variation can be explained, the model remains parsimonious. We then combine the AR part with the LME part and jointly estimate both sets of parameters. The combined AR and LME model specifically targets time series that are observed in clinical trials before and after an intervention. Such series would be difficult to analyze using a single sparse multivariate AR model, because an intervention often moves the time series to a different equilibrium state.

Motivating Examples

Example 1. Circadian rhythms reflect oscillating expressions of genes. Fig 1 schematically describes a simplified model of Drosophila circadian oscillations [5], in which dCLOCK and PER represent two proteins while dclock and per represent their transcriptors respectively. The model contains both a positive and a negative feedback loop. Using the dCLOCK protein level as an example, the two feedback loops work as follows: (1) dCLOCK activates per transcription and thus PER synthesis with lag τ1; PER binds with dCLOCK, decreasing the presence of dCLOCK (the negative feedback loop), and thereby also de-activates per transcription; and (2) an increase in dCLOCK also leads to more dCLOCK (the positive feedback loop) because the activated PER binds to dCLOCK, leading to the de-repression of dclock transcription, with lag τ2. The two different lagged feedback mechanisms can be respectively modeled by Eqs (2) and (3). (2) (3) where we use Y1 to denote PER and Y2 for dCLOCK. The model parameters, K1, K2, v1, v2, k1, and k2, are given as constants. The two time delays, τ1 and τ2, are essential to forming the circadian oscillations of Y1 and Y2. Eqs (2) and (3) are based on the ordinary differential equations in [5] with a slight modification. The quantity of freely available dCLOCK protein, Free dCLOCK, was originally calculated by the function Free dCLOCK(t) = max([dCLOCK(t) − PER(t)], 0). To avoid a possible discontinuity at zero in simulated data, we instead used the logistic transform exp(αx)/[1 + exp(αx)], where x is Free dCLOCK(t) and α is a scaling parameter.

Fig 1. A simplified model (a) of the Drosophila circadian oscillator and (b) the output of the system as a function of time.

Fig 1(b) is a rendition of Fig 1A in [5].

As we shall see later, the process can be approximated by the AR(m) model in Eq (1) in which an exponential transform exp(Y) replaces Y on the RHS of the equation. However, the traditional multivariate AR models would involve many unnecessary parameters, if, for example, the delays are long and/or more proteins are involved in the model; e.g., more complex ODE models in [10, 11]. It would be highly desirable if we could pinpoint the exact delays through an AR model but with nonzero entries only at certain delays. This example inspired our focus on sparsity.

Example 2. The Dietary Approaches to Stop Hypertension (DASH) trial was a multicenter, randomized parallel-arm feeding study that tested the effects of dietary patterns on blood pressure (BP). The three diets were a control diet (low in fruits, vegetables, and dairy products, with a fat content typical of the average diet in the United States), a diet rich in fruits and vegetables (a diet similar to the control except it provided more fruits and vegetables and fewer snacks and sweets), and a combination diet rich in fruits, vegetables, and low-fat dairy foods and reduced in saturated fat, total fat, and cholesterol (DASH diet). Participants were healthy adults 22 years of age or older who were not taking antihypertensive medication. The subjects’ BP measurements, including systolic blood pressure (SBP) and diastolic blood pressure (DBP), were taken over two 24-hour periods, one before the diet intervention and the other after. For more details, see [12] and [13].

After comparing the average BP (ABP) over a 24-hour period of cohorts before and after the intervention, Moore et al. [13] found that the fruit/vegetable and DASH diets significantly (p < 0.0001) lowered ABP when compared with the control diet (fruit/vegetable diet, -3.2/-1.0 mmHg; DASH diet, -4.6/-2.6 mmHg). However, after accounting for within-subject correlation, the model by Simpson and Edwards [14] found that the reduction in SBP attributed to the DASH diet shrank from -4.6 mmHg to -3.6 mmHg. Presumably, the intervention altered the equilibrium of the BP cycles, and we can model this effect additively by adding intervention predictors to the AR process. Because much of the BP variation is explained by previous measurements (the AR part), we expect a further reduction of the estimated diet effects. Adding random effects would also be useful for addressing subject-specific variation.

These two examples motivated us to combine a sparse multivariate AR model with a linear mixed effects (LME) model to form a sparse multivariate autoregressive linear mixed effects model (SMARLME). The first example was used as a basis for simulation studies designed to determine how accurately our parsimonious model can recover the original signal. We further analyzed the data from the second example to illustrate the utility of the model in a more traditional longitudinal data context. It is worth mentioning that the motivating example in [9] (i.e., parathyroid hormone (PTH) and serum calcium (Ca) levels interacting with the dose level of the treatment Maxacalcitriol) is also an excellent example for the SMARLME model. Compared to the AR(1) + LME model in [9], the SMARLME model could be more parsimonious and reach farther into the history of the interaction between PTH and Ca. These two examples demonstrate the flexibility of the SMARLME for modeling phenomena in which multiple variables in a system create feedback loops with specific lag times.


Let Yijt be the observed outcome of subject i at time t for variable j, and Xiut be the uth predictor for subject i at time t, i = 1,…,N, j = 1,…,p, and t = 1,…,T. The combined multivariate AR with the LME model can be described in scalar form as (4) where Yik(t−τ) is the observed kth outcome of subject i at time t − τ, ρkjτ represents the contribution of the kth outcome at time t − τ to the jth outcome at time t, βujτ represents the contribution of the uth predictor at time t − τ to outcome j, and Xiu(t−τ) represents the value of the uth predictor. The terms Zit and bi respectively represent the design matrix for the random effects and the vector of random effects. The simplest case would be Zit being the identity and bi being a single random effect bi that is normally distributed with mean zero and variance σ2. The error term, ϵijt, is assumed to be independent and normally distributed with constant variance, and independent of bi. We denote the number of included predictive outcome variables by q (q ≤ p), and the number of predictors by r.

In contrast to the linear AR(m) model, this model is more flexible as well as more comprehensive because it considers the entire history of observations of all variables, including both outcomes and predictors. Furthermore, to accommodate a wider array of dynamical systems, transformed versions of Yik(t−τ) can be included as predictors. For example, for the circadian system described by the two ODEs in Eqs (2) and (3), we included exponentiated terms of Yik(t−τ) on the RHS of Eq (1). For the dynamic system in the circadian rhythm example, the nonlinear feedback mechanism would ensure stationarity of the model without necessarily constraining the linear AR parameters, ρ. It is beyond the scope of this paper to discuss model stationarity, and we refer interested readers to [15].

In practical implementation of the model, we limit the history up to a certain period, d, such as a 24-hour period for observations with a strong daily cycle, and for shared parameters as in [14], we remove the variable index in βujl so that it becomes βul. With the assumptions of equilibrium and time-independent Xiu, we can further remove the time index and simply denote the regression parameter by βu.

The model specified by Eq (4) can be succinctly represented using matrix notation. To set up notation, we denote the vector (Yijt, j = 1,…,p) by Yit, and the q × q coefficient matrix (ρkj(t−τ), k, j = 1,…,q) at a given lag of τ by ρτ. Similarly, matrix βτ of size p × r is the coefficient matrix of Xi(t−τ), where Xi(t−τ) of size r × 1 is the vector of predictors of subject i, and ϵit is the vector (ϵijt) of length p.

In vector notation, the model now can be expressed as: (5)

The sparsity constraint is implemented through the following steps: (1) Group all autoregression coefficients into a single vector, i.e., ρ = (vec(ρ1)^T, vec(ρ2)^T, …, vec(ρd)^T)^T, and all predictor coefficients into a single vector, i.e., β = (vec(β1)^T, vec(β2)^T, …, vec(βd)^T)^T. Here vec(A) denotes the vector formed by stacking the columns of the I × J matrix A = (aij), giving (a11, a21, …, aI1, a12, …, aIJ)^T. (2) Limit the number of nonzero entries in ρ and β to a given constant; i.e., (6) where ‖⋅‖0 is the l0 norm, or the number of non-zero entries in the vector. This implementation enforces sparsity in the set of predictor coefficients when the predictors are time-varying and are not necessarily shared by all outcomes.
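The two ingredients of the sparsity constraint, column-stacking vectorization and the l0 count, can be sketched in a few lines. The helper names `vec` and `l0_norm` and the toy coefficient values are illustrative assumptions.

```python
import numpy as np

def vec(a):
    """Stack the columns of a 2-D array: (a11, a21, ..., aI1, a12, ..., aIJ)."""
    return np.asarray(a).flatten(order="F")

def l0_norm(v):
    """Number of nonzero entries of a vector."""
    return int(np.count_nonzero(v))

d, p = 3, 2                                   # toy history length and dimension
rho_lags = [np.zeros((p, p)) for _ in range(d)]
rho_lags[0][0, 0] = 0.6                       # lag 1: variable 1 -> variable 1
rho_lags[2][0, 1] = -0.4                      # lag 3: variable 1 -> variable 2

# Concatenate vec(rho_1), ..., vec(rho_d) into one long coefficient vector.
rho = np.concatenate([vec(r) for r in rho_lags])
print(rho.shape, l0_norm(rho))
```

With column stacking, `vec([[1, 2], [3, 4]])` gives `[1, 3, 2, 4]`, which matches the ordering stated in the text.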

For a simpler form of the model, observe that the time-varying predictors, Xi(t−τ), along with the predictors from which we seek sparse coefficients, can be included in the AR part and treated as part of the outcome set, Yi(t−τ). Mathematically, the two forms are equivalent. Hence, we separate out the time-independent and shared predictors and simplify the model to: (7) where vector β of size r × 1 is the shared time-homogeneous regression coefficient vector. The covariance structure of the error term, ϵit, is assumed to be conditionally independent given the other terms, including the fixed and random effects, in the model. We use the model specified in Eq (7) as the basic SMARLME model for subsequent discussions.

Estimation method

Operationally, solving model Eq (7) involves both model selection and parameter estimation. We shall see that the proposed algorithm resolves the two problems jointly. To estimate the sparse ρ and β and the random effects, we take an alternating approach. In other words, we alternate between the estimation of the AR parameters and the fixed and random effects. First, given AR parameters at the sth iteration, the model becomes a regular LME model with pseudo-outcomes, , and hence any LME estimating algorithm can be applied here with the independent covariance structure. The current estimates of the LME model can be used for the pseudo-outcomes, , where is the predicted random effect vector, and can be used to solve the following sparse least-squares problem, (8) where ‖⋅‖2 is the l2 norm of vectors. Denote matrix of size N × p as observations of all outcomes and all subjects at time t, and group all observations into a single vector, i.e., . Similarly, vectorize ρτ into ρ. After some matrix manipulations, we have the following l0 minimization problem, (9) For illustration purposes, we ignore the fixed and random effects. Matrix A of size Np(T − 1) × p²d has the following form, (10) where square matrix Ip of size p × p is the identity matrix, and ⊗ indicates the Kronecker product. An example of the A matrix and a practical refinement are given in S1 File.
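The role of the Kronecker product can be checked numerically through the identity vec(Yρ) = (Ip ⊗ Y) vec(ρ). The sketch below builds the block of rows that a single time point t would contribute to a design matrix, using random toy data. It is an assumption about the general shape of such a matrix, not a reproduction of Eq (10), whose exact form (for example, how early time points are handled) is given in S1 File.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, d = 5, 2, 3                                  # toy sizes, not from the paper

# Lagged observation matrices Y_{t-1}, ..., Y_{t-d}, each N x p.
Y_lags = [rng.normal(size=(N, p)) for _ in range(d)]
# Lag-specific coefficient matrices rho_1, ..., rho_d, each p x p.
rho_lags = [rng.normal(size=(p, p)) for _ in range(d)]

def vec(a):
    """Column-stacking vectorization."""
    return a.flatten(order="F")

# Matrix form of the AR part at time t: sum over lags of Y_{t-tau} rho_tau.
Y_t = sum(Y @ r for Y, r in zip(Y_lags, rho_lags))

# Vectorized form: one block kron(I_p, Y_{t-tau}) per lag, side by side.
A_t = np.hstack([np.kron(np.eye(p), Y) for Y in Y_lags])
rho = np.concatenate([vec(r) for r in rho_lags])

print(A_t.shape)                                   # (N * p, p**2 * d)
print(np.allclose(A_t @ rho, vec(Y_t)))
```

Stacking one such block per usable time point gives N·p rows per time point and p²d columns, which is consistent with the dimensions quoted for A.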

The minimization problem in Eq (9) can be solved by a fast Forward-Backward greedy algorithm (FoBa), which we briefly explain. For more details, see, e.g., [16, 17]. The FoBa algorithm consists of two steps. The first step is forward searching. This step is equivalent to what statisticians call Forward Stepwise Regression or what signal processing researchers call Orthogonal Matching Pursuit [16]. See [18]. In this step, FoBa initializes a residual vector b = y, the solution ρ = 0, and an index set Γ = ∅. At each iteration, it first finds the index i of the largest absolute entry of the vector A^T b, and adds it to Γ; i.e., Γ = Γ ∪ {i}. Next, it updates the solution entries in the index set Γ by solving b = AΓρΓ through Gauss elimination, where AΓ represents a matrix with only the columns of A in the index set Γ, and ρΓ denotes the solution entries in Γ. Then it updates the residual vector, b = y − AΓρΓ, before the next iteration.

Seeking the largest absolute entry of A^T b is equivalent to finding the column of A most correlated with y, removing its contribution to y, and then moving on to search for the next most correlated column. Conceptually, this is equivalent to seeking the Yt−τ most correlated with Yt. Because only matrix-vector multiplications are involved, the algorithm is very efficient. The procedure resembles the stepwise forward procedure in regression, which sequentially adds the variable that improves the model the most in terms of a criterion such as the residual sum of squares. Like the least squares procedure typically used in stepwise forward selection, FoBa applies a greedy algorithm to the history of Y by sequentially searching for the next "best" variable. Thus the two approaches are similar in their search strategy. However, they differ in the following aspects: (1) FoBa uses a selection procedure based on the largest inner product with the original elements in A, as opposed to the inner product with the normalized orthogonal elements in least squares forward selection regression, and (2) FoBa places a constraint on the number of elements to be included in the selection set, whereas forward selection regression stops adding variables according to a threshold on the change in the residual sum of squares. The first point is subtle and carries a computational implication: FoBa only needs to orthogonalize the elements that are being selected, whereas least squares forward selection needs to orthogonalize all elements. See [19] for a detailed explanation.

The second step in FoBa is the backward step. It is designed to circumvent the problem that once an entry is chosen and included in Γ, it cannot be removed, which implies that mistakes made in the early steps cannot be corrected later. The adaptive forward-backward algorithm (FoBa) addresses this issue in the backward step [17]. At each iteration, FoBa searches through Γ to remove entries whose removal would not significantly increase the least-squares penalty term. FoBa has been shown to be a serious competitor to other algorithms for sparsifying matrices, including LASSO [17, 20]. Recently, other modifications that use the underlying orthogonal matching pursuit engine for finding a sparse solution to underdetermined systems of linear equations have been proposed, e.g., [18].
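The forward and backward steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the paper or in [16, 17]: the function names, the stopping threshold `eps`, the backward factor `nu = 0.5`, and the use of `numpy.linalg.lstsq` in place of Gauss elimination are all choices made here for brevity.

```python
import numpy as np

def _fit(A, y, support):
    """Least-squares fit restricted to the columns in `support`."""
    rho = np.zeros(A.shape[1])
    if support:
        cols = sorted(support)
        rho[cols] = np.linalg.lstsq(A[:, cols], y, rcond=None)[0]
    return rho, float(np.sum((y - A @ rho) ** 2))

def foba_sketch(A, y, n_max, eps=1e-8, nu=0.5, max_iter=1000):
    """Simplified forward-backward greedy search with at most n_max nonzeros."""
    support = set()
    rho, loss = _fit(A, y, support)
    for _ in range(max_iter):
        if len(support) >= n_max:
            break
        # Forward step: column with the largest |A^T b| for the residual b.
        b = y - A @ rho
        scores = np.abs(A.T @ b)
        if support:
            scores[sorted(support)] = -np.inf
        i = int(np.argmax(scores))
        rho_new, loss_new = _fit(A, y, support | {i})
        gain = loss - loss_new
        if gain <= eps:            # no worthwhile improvement: stop
            break
        support.add(i)
        rho, loss = rho_new, loss_new
        # Backward step: drop entries whose removal raises the loss by
        # less than a fraction nu of the gain just obtained.
        while len(support) > 1:
            trials = {j: _fit(A, y, support - {j}) for j in support}
            j = min(trials, key=lambda k: trials[k][1])
            if trials[j][1] - loss >= nu * gain:
                break
            support.remove(j)
            rho, loss = trials[j]
    return rho, sorted(support)

# Toy problem: 3 true nonzeros among 30 candidate columns.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 30))
truth = {3: 1.0, 11: -0.8, 25: 0.6}
y = sum(c * A[:, j] for j, c in truth.items()) + rng.normal(scale=0.01, size=200)
rho_hat, support = foba_sketch(A, y, n_max=10, eps=0.1)
print(support)
```

In this toy run the cap `n_max = 10` is larger than the number of true nonzeros, and the search stops early once no remaining column reduces the squared error by more than `eps`.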

In terms of search strategy, the forward-backward approach in FoBa is analogous to forward-backward model selection in linear regression, where significant variables are added in a forward pass and then removed in a backward pass. Although the algorithm requires an input parameter, n, to restrict the number of nonzero entries in ρ, the backward-search step typically yields a solution with fewer nonzero entries. In other words, if the true model contains n* nonzero entries, we can select n ≥ n* and still be able to recover the n* nonzero entries in ρ. We illustrate this through our first example using a simulation study, which is described in the next section. In this sense, model selection and estimation in the proposed SMARLME procedure can be accomplished jointly. In practice, we recommend incrementing n in steps and selecting a model based on an information criterion; e.g., the Bayesian information criterion (BIC) or the Akaike information criterion (AIC).
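The recommendation to increase n stepwise and compare information criteria can be sketched as below. For brevity this uses a forward-only greedy path rather than the full forward-backward search, and a Gaussian BIC of the form n_obs·log(RSS/n_obs) + k·log(n_obs). Both are simplifying assumptions made here, as are the function names and the toy data.

```python
import numpy as np

def greedy_path(A, y, n_max):
    """Forward greedy path; returns selection order and RSS after each step."""
    support, rss = [], [float(y @ y)]          # rss[0]: model with no columns
    b = y.copy()
    for _ in range(n_max):
        scores = np.abs(A.T @ b)
        if support:
            scores[support] = -np.inf          # never pick a column twice
        support.append(int(np.argmax(scores)))
        coef = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
        b = y - A[:, support] @ coef
        rss.append(float(b @ b))
    return support, rss

def bic(rss, n_obs, k):
    """Gaussian BIC up to an additive constant, with k selected coefficients."""
    return n_obs * np.log(rss / n_obs) + k * np.log(n_obs)

# Toy data with two true nonzero coefficients among 12 columns.
rng = np.random.default_rng(1)
A = rng.normal(size=(500, 12))
y = 1.0 * A[:, 2] - 0.7 * A[:, 7] + rng.normal(scale=0.5, size=500)

order, rss = greedy_path(A, y, n_max=8)
scores = [bic(r, len(y), k) for k, r in enumerate(rss)]
best_n = int(np.argmin(scores))
print(order[:best_n], best_n)
```

The BIC drops sharply while true columns are being added and then changes little, which mirrors the flattening pattern described for the blood pressure application.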

We summarize the estimation procedure as follows:

Initialization. Initialize ρ as ρ(0) = 0.

Iterations. At the sth iteration,

  1. Given the current estimates , solve the linear mixed effects model through the pseudo-outcomes, (11) and denote the current estimates of β as , and also estimate the predicted random effects .
  2. Given , update the pseudo-outcomes as (12) and solve the l0 minimization problem stated in Eq (9) for , using the FoBa algorithm.

Note that if ρ = 0—i.e., the first step of the estimation procedure— we are solving a regular LME model without the AR part. The likelihood or the information criterion (AIC or BIC) of this model can be saved for later comparison with that of the SMARLME model for the justification of choosing the more complex SMARLME model.


Here we will present analysis results of two motivating examples, namely the circadian oscillator and BP measurements. The circadian-oscillator data were simulated using the ODEs in Eqs (2) and (3). The BP data set was a subset of data collected from the DASH study.

Simulation Studies: Circadian Oscillator

To facilitate simulation of data, we used the following discretized versions of Eq (2) and Eq (3), obtained by setting dt = 1. Here Y1 represents the variable PER, and Y2 represents the variable dCLOCK. (13) (14)

We further set the two delays as τ1 = τ2 = 12, and set the parameters in Eqs (2) and (3) as v1 = 0.5, v2 = 0.25, k1 = 0.5, k2 = 0.5, K1 = 0.3, K2 = 0.1, and α = 10. These values were based on values suggested by [5] and were chosen to produce realistic biological rhythms in the simulated data. Using Eqs (13) and (14), the true curves of simulated dCLOCK and PER over time, referred to as no-noise data hereafter, are shown in Fig 1(b). To simulate realistic data, Gaussian white noise of different levels was then added to the no-noise data. We chose three levels of Gaussian noise (σ = 0.01, 0.05, and 0.1), and the simulation and estimation were repeated 1,000 times for each noise level. A simulated sample of 100 curves with added noise of standard deviation σ = 0.1 is shown in Fig 2 over a 72-hour period.

Fig 2. Sample of 100 simulated curves with added Gaussian noise of σ = 0.1.

No time-independent or shared predictors are included in this simulation experiment; our sole purpose is to recover the entries in the history that are most correlated with Yt. In addition to the linear terms Yt−τ in the AR part, we also include exp(Yt−τ) terms. To make the model as parsimonious as possible, we set the history period d = 15, three hours greater than τ1 and τ2, and the number of nonzero entries in ρ as n = 25. Using the no-noise data and the FoBa algorithm, we identified 7 nonzero locations. Thus n = 25 is substantially larger than the true number of nonzeros, n* = 7. Setting n ≥ n* helps us assess whether the forward-backward greedy algorithm can successfully remove uncorrelated entries while keeping the most correlated entries.

The FoBa algorithm applied to the no-noise data resulted in 7 non-zero coefficients in the linear model. The positions, indexes, and values for the nonzero terms predicting the system (Y1,Y2) are depicted in Table 1. The observed no-noise data and the predicted values based on the linear system with 7 non-zero entries are depicted in Fig 3. It can be seen that the recovery of the original curve is almost perfect when noise is not present.

Fig 3. Observed values and fitted values based on estimates from FoBa for data without noise.

Fig 4 shows the mean and confidence intervals (as error bars) of estimates derived from 1,000 replicates by applying FoBa to simulated data for each noise level. The AR parameters ρkjτ are organized as a single vector, with the first index changing fastest. The vertical lines mark the positions at which the true non-zero terms are located. Because only exp(Yt−τ) terms were selected, and none of the Yt−τ terms were chosen in any replication, Fig 4 does not include parameters for the Yt−τ terms.

Fig 4. The mean and confidence interval of estimated ρ, shown as a single vector.

The vertical lines represent the positions of true nonzero values. The three panels (top to bottom) respectively show results for three levels of noise: σ = 0.01,0.05,0.1.

Fig 5 shows the observed and predicted values of the variables Y1 and Y2 at the three designated levels of noise. Because of space constraints, out of n = 1,000 samples we randomly selected two at each level to show how well the FoBa algorithm recovers the pattern. To further summarize the fits of the SMARLME model based on FoBa estimates, in Table 1 we present the results of the simulation study in the form of bias and mean squared error (MSE) of the estimates over 1,000 replications. Bias and MSE are defined as follows:

$$\mathrm{Bias}(\hat{\rho}_l) = \frac{1}{N}\sum_{n=1}^{N}\left(\hat{\rho}_l^{(n)} - \rho_l\right), \qquad \mathrm{MSE}(\hat{\rho}_l) = \frac{1}{N}\sum_{n=1}^{N}\left(\hat{\rho}_l^{(n)} - \rho_l\right)^2 \tag{15}$$

where $\hat{\rho}_l^{(n)}$, l = 1,…,L, denotes the estimate of the lth AR parameter, ρl, derived from the nth replication, where n = 1,…,N, and N was set at 1,000 in this experiment.
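As a small worked example, bias and MSE over replications can be computed as below, assuming the usual definitions (the average error and the average squared error of each parameter estimate across replications). The function name `bias_and_mse` and the two-replication toy input are illustrative.

```python
import numpy as np

def bias_and_mse(estimates, truth):
    """Per-parameter bias and MSE.

    estimates: array of shape (N, L), one row per replication.
    truth:     length-L vector of true parameter values.
    """
    err = np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)
    return err.mean(axis=0), (err ** 2).mean(axis=0)

# Two replications of two parameters whose true values are 1.0 and 0.0.
est = np.array([[0.9, 0.0],
                [1.1, 0.2]])
bias, mse = bias_and_mse(est, [1.0, 0.0])
print(bias, mse)
```

Here the first parameter has errors of -0.1 and +0.1, so its bias is zero while its MSE is 0.01; the second has bias 0.1 and MSE 0.02.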

Fig 5. Observed and fitted values of two randomly selected samples (as rows) from each noise level σ = 0.01,0.05,0.1 respectively from the leftmost column to the rightmost column.

The observed data are represented in triangles (PER) and circles (dCLOCK), and the lines represent fitted values.

In general, the SMARLME procedure recovers the parameters quite well, as evidenced by the small biases and mean squared errors. The results also show that both bias and MSE increase with the level of noise in the data, as expected. We also make the following observations: (1) FoBa provides an almost perfect fit to the nonlinear no-noise data using a small number of nonzero coefficients. The coefficients at lags 1 and 12 are substantial and are consistent with the way the data were generated. (2) When σ = 0.1, given a true signal variance of 0.0268, we have a rather low signal-to-noise ratio (SNR) of 2.7, suggesting that the algorithm can recover the true time lags reasonably well even under very noisy conditions. Here SNR is defined as the signal variance divided by the noise variance. (3) There exist non-zero coefficients in locations that are not expected; e.g., at lag 11. This may arise because data at lag 11 are highly correlated with data at lag 12. An implication of this observation is that there potentially exist multiple solutions that fit the observed data equally well. (4) There exist some small coefficients that are close to zero; e.g., ρ2,2,17 at position 68. For this position, as the noise level increases, FoBa is less likely to select the coefficient. This is reflected in the Count column in Table 1, which represents the number of times that the model selects the correct predictor position out of 1,000 replications. At σ = 0.1, FoBa does not select this location at all. An implication of this observation is that coefficients with small nonzero values are not always selected, especially when the noise level is substantial. (5) Regardless of the model selected, FoBa provides predicted values that fit the observed values quite well (see Fig 5). This simulation demonstrates that the SMARLME can effectively recover intrinsic, highly correlated delays in periodic data with feedback loops.
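The SNR quoted in observation (2) follows directly from the stated definition, signal variance divided by noise variance. The short check below applies it to the three noise levels used in the simulation; the helper name `snr` is illustrative.

```python
def snr(signal_var, sigma):
    """Signal-to-noise ratio: signal variance over noise variance."""
    return signal_var / sigma ** 2

signal_var = 0.0268                      # true signal variance from the text
levels = {sigma: round(snr(signal_var, sigma), 2) for sigma in (0.01, 0.05, 0.1)}
for sigma, value in levels.items():
    print(f"sigma = {sigma}: SNR = {value}")
```

At σ = 0.1 this gives 0.0268 / 0.01 = 2.68, which rounds to the value of 2.7 reported above.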

Data Application: Blood Pressure Data

As noted in [14], more work is needed on the longitudinal analysis of 24-hour blood pressure data given the lack of a generally accepted 'standard' analysis method, hence the appeal of illustrating our method with the DASH data. The 24-hour hourly BP data of a sample of 340 subjects before and after intervention are concatenated to form a 48 × 1 vector for the SBP and the DBP of each subject. The SBP and DBP data of a subsample of 50 subjects, before and after intervention, are shown in Fig 6, along with the sample mean curves in thick, black dashed lines. An intervention variable, δt, is introduced to differentiate the period before the intervention from the period after; i.e., δt = 0 for 1 ≤ t ≤ 24 and δt = 1 for 25 ≤ t ≤ 48. Thus, our model accounts for the three-week gap between the measurements before the intervention period and those after. The three diet groups are coded with two separate binary variables, namely the vegetable/fruit diet and the DASH diet. The eight predictors are intercept, vegetable/fruit diet, DASH diet, control diet and intervention period, vegetable/fruit diet and intervention period, DASH diet and intervention period, race, and age. The AR equilibria without intervention are assumed to be the same before and after intervention, and hence the intervention effects can be separately estimated. A subject-specific random effect is added to the intercept term. We set d = 23, and the total number of entries in ρ is p²d = 2² × 23 = 92. For model selection, we vary n from 0 to 59, and choose the minimum-BIC model within this range.
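The coding described above can be sketched for a single subject as follows. The function name `design_matrix`, the argument names, and the coding of race as a binary White indicator are assumptions made for illustration; the paper does not specify how the design matrix was built in software.

```python
import numpy as np

T = 48                                    # 24 hourly readings before + 24 after
t = np.arange(1, T + 1)
delta = (t >= 25).astype(int)             # 0 before the intervention, 1 after

p, d = 2, 23                              # SBP and DBP, history of 23 hours
n_entries = p ** 2 * d                    # total number of entries in rho

def design_matrix(veg_fruit, dash, white, age):
    """Eight predictor columns for one subject (hypothetical coding)."""
    control = 1 - veg_fruit - dash        # control group has both indicators at 0
    cols = [
        np.ones(T),                       # intercept
        np.full(T, veg_fruit),            # vegetable/fruit diet
        np.full(T, dash),                 # DASH diet
        control * delta,                  # control diet x intervention period
        veg_fruit * delta,                # vegetable/fruit diet x intervention
        dash * delta,                     # DASH diet x intervention period
        np.full(T, white),                # race indicator
        np.full(T, age),                  # age in years
    ]
    return np.column_stack(cols)

X = design_matrix(veg_fruit=0, dash=1, white=1, age=45)
print(n_entries, X.shape, int(delta.sum()))
```

For a DASH-diet subject, the "DASH diet and intervention period" column is zero for the first 24 hours and one for the last 24, which is what lets the intervention effect be estimated separately from the baseline group difference.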

Fig 6. The 24-h BP data of a subsample of 50 subjects before and after intervention.

From Fig 7(a), we see that the minimum BIC occurs at n = 56 (the actual number of nonzeros in ρ is 36) and is far lower than the BIC of the LME model, i.e., the model with n = 0. The sharp decline in BIC even at n = 1 suggests that adding the AR part to the LME model is more appropriate. Because the BIC flattens after n = 20, more parsimonious models could also be chosen. The convergence of the fixed-effects estimates is shown in Fig 7(b). Fig 8 shows the estimated ρ; for better presentation, we split ρ into four parts, each of length 23. For example, the first subplot shows {ρ_{11τ} : τ = 1, …, 23}, which corresponds to the contribution of SBP at time t − τ to SBP at time t. The U-shapes observed in Fig 8 correspond well to the correlation plots in Fig 9(a), which also inspired the circular autoregressive-correlation structure in [14]. We also plot the estimated correlations between predicted BP values in Fig 9(b); comparing Fig 9(a) with Fig 9(b), we can see that our model can capture more subtle structures, such as the W-shapes of the original correlations. The slight elevation of the predicted correlations is due to the removal of the noise term from the predicted values.
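The split of ρ into four lag profiles can be illustrated as follows. The sketch assumes that the 92 entries are stored block by block, with the 23 lags of each (i, j) pair contiguous, and that the first index denotes the response and the second the lagged predictor; this ordering is an assumption, and the values used are placeholders rather than estimates.

```python
import numpy as np

p, d = 2, 23
labels = ["SBP", "DBP"]

# Placeholder vector of length p * p * d = 92 (not fitted coefficients).
rho = np.arange(1, p * p * d + 1, dtype=float)

# Assumed layout: blocks[i, j, tau - 1] holds rho_{(i+1)(j+1)tau}.
blocks = rho.reshape(p, p, d)

for i in range(p):
    for j in range(p):
        profile = blocks[i, j]  # coefficients for lags tau = 1, ..., d
        print(f"rho_{i + 1}{j + 1}: {labels[j]} at t - tau -> "
              f"{labels[i]} at t, {profile.size} lags")
```

Each of the four profiles would then correspond to one subplot of the kind described for Fig 8.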

Fig 7. (a) The BICs of models with increasing n. (b) The convergence of fixed-effect estimates.

Fig 9. (a) The empirical correlations of SBP and DBP data, where the x-axis is the time lag, τ, and each circle represents the correlation between, for example, SBP_t and DBP_{t−τ}. Because for a fixed τ there can be multiple values of t, depending on the availability of data, multiple circles can appear at some τ. (b) The model-estimated correlations of SBP and DBP data. Because the model estimates are not limited by the availability of data, there is the same number of circles at each τ.
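The empirical lagged correlations described in the caption of Fig 9(a) could be computed along the following lines. This is a minimal sketch under stated assumptions: the data are arranged as subjects × time arrays with missing values coded as NaN, one correlation is computed across subjects for every available pair (t, t − τ), and the arrays named `sbp` and `dbp` are synthetic stand-ins, not the DASH data.

```python
import numpy as np


def lagged_correlations(x, y, max_lag, min_pairs=3):
    """Correlations between x[:, t] and y[:, t - lag], taken across subjects.

    Returns a dict mapping each lag to the list of correlations obtained
    for all time points t at which both series are available.
    """
    n_time = x.shape[1]
    result = {}
    for lag in range(1, max_lag + 1):
        values = []
        for t in range(lag, n_time):
            a, b = x[:, t], y[:, t - lag]
            keep = ~(np.isnan(a) | np.isnan(b))   # drop subjects with gaps
            if keep.sum() >= min_pairs:
                values.append(float(np.corrcoef(a[keep], b[keep])[0, 1]))
        result[lag] = values
    return result


# Synthetic example: "sbp" follows "dbp" with a one-step delay plus noise.
rng = np.random.default_rng(0)
dbp = rng.normal(size=(200, 24))
sbp = np.roll(dbp, 1, axis=1) + 0.1 * rng.normal(size=(200, 24))

corr = lagged_correlations(sbp, dbp, max_lag=23)
print(len(corr[1]), len(corr[23]))
```

Because the number of usable time points shrinks as the lag grows, the number of correlations per lag varies, which is why several circles can appear at one τ in an empirical plot.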

Table 2 presents the estimates, standard errors, and p-values from our SMARLME model fit, along with those from an LME model fit for comparison with the mean-value model in [13]. The intercept parameter represents the average of SBP and DBP for the non-white, control diet group during the pre-intervention period for the “No AR” model and the residual average after variation removal in the “AR” (SMARLME) model. The “diet group and intervention” parameters indicate the estimated differences in blood pressure from this average for the groups during the intervention period, while the diet parameters indicate these differences during the pre-intervention period. The “White” parameter indicates the estimated difference in blood pressure for White subjects, and the age parameter denotes the estimated change in blood pressure for each one-year increase in age. Clearly, the estimated effects of the DASH diet and the vegetable/fruit diet are both smaller under our (AR) model fit than under the mean-value (No AR) model, although they are still significant. Interestingly, even within the control group, there appears to be a significant difference before and after the intervention period with our model fit, as seen in the estimate of the control diet and intervention period, while the mean-value model shows otherwise.

Discussion


The main contribution of this paper lies in (1) its explicit modeling of reciprocal features of multiple time series, and (2) its offering of a simple and practical solution to the potentially high-dimensional lagged components in the model. There is a large literature on both components (1) and (2), dating back to early work such as [21]. More recent work on (2) includes the non-reciprocal dynamic-factor model [22], which aims to capture the dynamics of a time series such as a financial indicator with a large number of lagged-predictor variables, such as supply and order variables. With a goal similar to ours, different methods, such as principal component and shrinkage methods, have been proposed to solve the high-dimensional problem [23, 24]. For reciprocal-causal models in (1), earlier work arose both in the psychometric literature, especially in structural equation modeling, and in economics. For example, the so-called cross-lagged models have been developed for reciprocal time series [25], although the method mostly addresses problems of relatively low dimension and short panels of cohort data (in the notation of Eq (1), p = 2 and m = 1). Thus, one can view this paper as a way to extend reciprocal models in time series to high dimensions, in terms of both the interacting variables and the time variable, and to offer a sparse representation of the model structure. One interesting feature of the proposed method for SMARLME is that it simultaneously addresses the model selection and the estimation problem. Additionally, as we have shown in the circadian oscillation example, nonlinear variations over time can also be modeled using transformed terms in the linear predictive model. Further research is required to evaluate the scope and limits of using linear models for nonlinear feedback systems.
Although in the current SMARLME setup the sparsity is induced by the cardinality constraint, a specific sparse structure could also be defined a priori, as pointed out by a reviewer. Such an implementation would limit the FoBa search space and improve computational efficiency.
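A forward-backward search under a cardinality constraint, optionally restricted to an a priori sparsity pattern, can be sketched as follows. This is a simplified least-squares version in the spirit of the adaptive forward-backward greedy algorithm of [17], without the mixed-effects component of SMARLME, so it should not be read as the implementation used in this paper. The function names, the `allowed` argument, the thresholds `eps` and `nu`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def _fit(X, y, support):
    """Least squares restricted to `support`; returns (coef, squared loss)."""
    beta = np.zeros(X.shape[1])
    if support:
        idx = sorted(support)
        beta[idx] = np.linalg.lstsq(X[:, idx], y, rcond=None)[0]
    resid = y - X @ beta
    return beta, float(resid @ resid)


def foba(X, y, max_nonzero, allowed=None, eps=1e-8, nu=0.5):
    """Simplified forward-backward greedy selection with at most
    `max_nonzero` nonzero coefficients. `allowed` optionally restricts
    the candidate columns (an a priori sparse structure)."""
    candidates = set(range(X.shape[1])) if allowed is None else set(allowed)
    support = set()
    beta, loss = _fit(X, y, support)
    gains = {}  # gains[k]: loss reduction of the forward step reaching size k
    while len(support) < max_nonzero:
        # Forward step: add the candidate that lowers the loss the most.
        trials = [(_fit(X, y, support | {j})[1], j)
                  for j in candidates - support]
        if not trials:
            break
        new_loss, j_add = min(trials)
        if loss - new_loss <= eps:
            break
        support.add(j_add)
        gains[len(support)] = loss - new_loss
        beta, loss = _fit(X, y, support)
        # Backward steps: drop a variable if that costs little relative
        # to the gain recorded for the current support size.
        while len(support) > 1:
            drop_loss, j_drop = min((_fit(X, y, support - {j})[1], j)
                                    for j in support)
            if drop_loss - loss > nu * gains[len(support)]:
                break
            support.remove(j_drop)
            beta, loss = _fit(X, y, support)
    return beta, sorted(support)


# Synthetic data: only columns 0 and 11 carry signal (illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))
y = 0.8 * X[:, 0] - 0.5 * X[:, 11] + 0.05 * rng.normal(size=200)

beta_full, support_full = foba(X, y, max_nonzero=2)
beta_restricted, support_restricted = foba(X, y, max_nonzero=2,
                                           allowed=[0, 5, 11])
print(support_full, support_restricted)
```

Restricting `allowed` shrinks the number of least-squares refits in every forward step, which is where the computational saving mentioned above would come from.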

There are several limitations of the current work. First, we have not addressed the stationarity conditions of the model. It is possible that the estimated model is non-stationary. However, our focus has been on clinical applications in which the long-term behavior of the model may not be a primary concern. In fact, the FoBa algorithm proposed in this paper does not require the time series to be stationary. A second limitation is that we have not taken into account the impact of model selection on inference [26, 27]. In other words, the selected sparse model structure may not be correct, and it is therefore possible that the coefficients and standard errors reported in Table 2 are biased. This is an issue that cannot be adequately covered in this paper. Further research will examine the impact of selecting different sparse models on coverage properties. Finally, we have restricted the discussion to linear models and avoided nonlinear regression models. The nonlinear circadian rhythm example used for the generative model in our simulation study was linearized with exponentiated transformed variables. The estimation proceeds using the proposed linear algorithm, which simplifies the problem. The simplification could also be useful when interpreting parameters in the fixed and random effect components of SMARLME, which in some cases could be the primary goal of inference, for example in medical applications in which the AR component is treated as a nuisance factor.
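Whether a fitted AR part is stationary can be checked after estimation with the standard companion-matrix criterion for vector autoregressions: the process is stable when all eigenvalues of the companion matrix lie strictly inside the unit circle. The sketch below assumes the lag coefficients have been arranged as a (d, p, p) array whose slice τ holds the matrix with entries ρ_{ijτ}. That layout, the function names, and the toy coefficients are assumptions, and the check concerns only the AR component, not the fixed or random effects.

```python
import numpy as np


def companion_matrix(coef: np.ndarray) -> np.ndarray:
    """Companion matrix of a VAR(d) with coef[tau - 1] = A_tau, shape (d, p, p)."""
    d, p, _ = coef.shape
    top = np.hstack(list(coef))                        # (p, p * d)
    bottom = np.hstack([np.eye(p * (d - 1)),           # shifts past values down
                        np.zeros((p * (d - 1), p))])
    return np.vstack([top, bottom])


def is_stable(coef: np.ndarray) -> bool:
    """True when every companion eigenvalue has modulus below one."""
    eigenvalues = np.linalg.eigvals(companion_matrix(coef))
    return bool(np.max(np.abs(eigenvalues)) < 1.0)


# Toy coefficients for p = 2 series and d = 2 lags (illustration only).
stable_coef = np.zeros((2, 2, 2))
stable_coef[0] = 0.5 * np.eye(2)

unstable_coef = np.zeros((2, 2, 2))
unstable_coef[0] = 0.5 * np.eye(2)
unstable_coef[1] = 0.6 * np.eye(2)

print(is_stable(stable_coef), is_stable(unstable_coef))
```

In the second toy case each series satisfies λ² − 0.5λ − 0.6 = 0, whose larger root exceeds one, so the check flags the model as non-stationary.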

Supporting Information

S1 File. S1 File contains an example of the use of the matrix formulation for the FoBa estimation of the multivariate autoregressive model.

Acknowledgments



This study is funded by the following grants: NIH U01HL101066-01, NIH 1R21AG042761-01, NIBIB K25 EB012236-01A1, and NSF SES-1424875. The DASH dataset is a limited-access dataset obtained from the NHLBI and this manuscript does not necessarily reflect the opinion or views of the DASH study or the NHLBI.

Author Contributions

Conceived and designed the experiments: EI SLS. Performed the experiments: QZ EI TS. Analyzed the data: QZ EI TS. Contributed reagents/materials/analysis tools: QZ EI. Wrote the paper: EI QZ SLS.

References


  1. Walls TA. Intensive longitudinal data: The Oxford Handbook of Quantitative Methods in Psychology. OUP USA; 2013.
  2. Wei WWS. Time series analysis. Addison-Wesley Redwood City, California; 1994.
  3. Harrison L, Penny WD, Friston K. Multivariate autoregressive modeling of fMRI time series. NeuroImage. 2003;19(4):1477–1491. pmid:12948704
  4. de Waele S, Broersen PM. Order selection for vector autoregressive models. IEEE Transactions on Signal Processing. 2003;51(2):427–433.
  5. Smolen P, Baxter DA, Byrne JH. A Reduced Model Clarifies the Role of Feedback Loops and Time Delays in the Drosophila Circadian Oscillator. Biophysical Journal. 2002;83(5):2349–2359. pmid:12414672
  6. Fujita A, Sato JR, Garay-Malpartida HM, Yamaguchi R, Miyano S, Sogayar MC, et al. Modeling gene expression regulatory networks with the sparse vector autoregressive model. BMC Systems Biology. 2007;1(1):1–39.
  7. Lozano AC, Abe N, Liu Y, Rosset S. Grouped graphical Granger modeling for gene expression regulatory networks discovery. Bioinformatics. 2009;25(12):i110–i118. pmid:19477976
  8. Basu S, Shojaie A, Michailidis G. Network Granger Causality with Inherent Grouping Structure. arXiv preprint arXiv:1210.3711. 2012.
  9. Funatogawa I, Funatogawa T, Ohashi Y. A bivariate autoregressive linear mixed effects model for the analysis of longitudinal data. Statistics in Medicine. 2008;27(30):6367–6378. pmid:18825651
  10. Smolen P, Baxter DA, Byrne JH. Modeling circadian oscillations with interlocking positive and negative feedback loops. The Journal of Neuroscience. 2001;21(17):6644–6656. pmid:11517254
  11. Smolen P, Hardin PE, Lo BS, Baxter DA, Byrne JH. Simulation of Drosophila Circadian Oscillations, Mutations, and Light Responses by a Model with VRI, PDP-1, and CLK. Biophysical Journal. 2004;86(5):2786–2802. pmid:15111397
  12. Appel LJ, Moore TJ, Obarzanek E, Vollmer WM, Svetkey LP, Sacks FM, et al. A clinical trial of the effects of dietary patterns on blood pressure. New England Journal of Medicine. 1997;336(16):1117–1124. pmid:9099655
  13. Moore TJ, Vollmer WM, Appel LJ, Sacks FM, Svetkey LP, Vogt TM, et al. Effect of Dietary Patterns on Ambulatory Blood Pressure Results From the Dietary Approaches to Stop Hypertension (DASH) Trial. Hypertension. 1999;34(3):472–477. pmid:10489396
  14. Simpson SL, Edwards LJ. A circular LEAR correlation structure for cyclical longitudinal data. Statistical Methods in Medical Research. 2013;22(3):296–306. pmid:21216801
  15. Priestley MB. Spectral analysis and time series. Academic Press; 1981.
  16. Tropp JA, Gilbert AC. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory. 2007;53(12):4655–4666.
  17. Zhang T. Adaptive forward-backward greedy algorithm for learning sparse representations. IEEE Transactions on Information Theory. 2011;57(7):4689–4708.
  18. Donoho D, Tsaig Y, Drori I, Starck JL. Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit. IEEE Transactions on Information Theory. 2012;58(2):1094–1121.
  19. Blumensath T, Davies ME. On the difference between orthogonal matching pursuit and orthogonal least squares. Technical report. Available at http://eprints.soton.ac.uk/142469/. 2007.
  20. Hsu D, Kakade S, Langford J, Zhang T. Multi-Label Prediction via Compressed Sensing. In: Proceedings of Neural Information Processing Systems (NIPS); 2009. p. 772–780.
  21. Granger CW. Investigating causal relations by econometric models and cross-spectral methods. Econometrica. 1969;37:424–438.
  22. Forni M, Hallin M, Lippi M, Reichlin L. The generalized dynamic factor model: one-sided estimation and forecasting. Journal of the American Statistical Association. 2005;100(471):830–840.
  23. Stock JH, Watson MW. Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association. 2002;97(460):1167–1179.
  24. Giacomini R, White H. Tests of conditional predictive ability. Econometrica. 2006;74(6):1545–1578.
  25. Finkel SE. Causal analysis with panel data. Sage; 1995.
  26. Hurvich CM, Tsai C. The impact of model selection on inference in linear regression. The American Statistician. 1990;44(3):214–217.
  27. Kabaila P. The coverage properties of confidence regions after model selection. International Statistical Review. 2009;77(3):405–414.