Analysis of Feedback Mechanisms with Unknown Delay Using Sparse Multivariate Autoregressive Method

This paper discusses the study of two interacting processes in which a feedback mechanism exists between the processes. The study was motivated by problems such as the circadian oscillation of gene expression where two interacting protein transcriptions form both negative and positive feedback loops with long delays to equilibrium. Traditionally, data of this type could be examined using autoregressive analysis. However, in circadian oscillation the order of an autoregressive model cannot be determined a priori. We propose a sparse multivariate autoregressive method that incorporates mixed linear effects into regression analysis, and uses a forward-backward greedy search algorithm to select non-zero entries in the regression coefficients, the number of which is constrained not to exceed a pre-specified number. A small simulation study provides preliminary evidence of the validity of the method. Besides the circadian oscillation example, an additional example of blood pressure variations using data from an intervention study is used to illustrate the method and the interpretation of the results obtained from the sparse matrix method. These applications demonstrate how sparse representation can be used for handling high dimensional variables that feature dynamic, reciprocal relationships.


Introduction
In randomized clinical trials, multivariate longitudinal data are often sampled, either sparsely or densely (intensively) [1], over a certain time period. A large part of the longitudinal data analysis literature has been focused on the sparsely sampled data; e.g., data acquired by annual or semi-annual visits. For intensive longitudinal data, however, relatively fewer methods have been proposed, one important component of which is time series analysis [2]. The traditional time series approaches, such as univariate or multivariate autoregressive (AR) models, are only applied to one or several time series; e.g., a stock market index series or a commodity price series. However, in biological and clinical studies, we often observe one time series per subject, and in the multivariate case, data often shows a three-dimensional tensor structure, including the subject, the variable, and the time dimensions.
Jointly modeling multivariate intensive longitudinal data could introduce quite a few parameters. For example, the AR(m) model below: would require mp 2 parameters. Here Y ijt is the observed outcome of subject i at time t on variable j, ρ kjτ is the contribution of the k th variables at time t − τ to the j th variable at time t, and the error term, ϵ ijt , is assumed to be independently and identically distributed (i.i.d.) with timeindependent or stationary distribution assumption. Specifically, it can be assumed that ϵ ijt $ Nð0; s 2 j Þ, and that ϵ ij 1 t 1 and ϵ ij 2 t 2 are independent if j 1 6 ¼ j 2 or t 1 6 ¼ t 2 . We denote the number of variables as p and the order of the autoregression as m. The model specified by Eq (1) is a rather comprehensive model as it could include multiple possibly correlated variables, timelagged effects from the same variable, as well as cross-lagged effects from all the other variables in the model. This kind of model has been found to be useful in applications such as fMRI time series analysis in which brain activities in various regions of the brain, intensively sampled over time, are modeled. For example, Harrison et. al, [3] used a multivariate AR model (p = 4, m = 3) for making inference about attention modulation of connectivity within the dorsal visual pathway and specifically across brain regions including the posterior parietal cortex and right prefrontal cortex. Therefore, it is possible that activity in the posterior parietal cortex at time t − 2 influences the right prefrontal cortex at time t.
Indiscriminatingly including all the variables and all time points as in Eq (1) is not always optimal especially when the sample size is small and overfitting problems often arise in such cases. Model selection criteria, such as Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), or other variations [4], would limit selection on the temporal component to the first few orders, but when the time period is long, one could miss significant autoregressive explanations from outcomes farther back in time. For example, daily blood pressure measurements often show strong correlations between hour 1 and hour 24. Only using measurements a few hours back would therefore miss the daily cycle. Another example is the circadian oscillations of gene expressions [5], where two interacting protein transcriptions can form both negative and positive feedback loops, with delays as long as 12 hours, while gene expression is measured hourly. These delays are essential to forming the periodic time series of protein densities, and trying to estimate these delays is an important step in understanding gene interactions on the molecular level. In terms of statistical modeling, neither an AR(1) nor an AR (12) are appropriate for these data because there exists only a few nonzero entries in the parameter set {ρ kjτ jk = 1,. . .,p,j = 1,. . .,p,τ = 1,. . .,T}. This inspires a sparse autoregressive model, in which we only seek the first few most correlated autoregressive entries, regardless of the time lag or which variable. If we vectorize the parameter set into a vector, ρ, we would assume ρ is mostly zero except at a few entries. Some recent work on sparse autoregression models includes Fujita et. al. [6], who employ a multivariate AR model with l 1 penalization to learn gene-regulatory mechanisms from time-course microarray data, and the Network Granger Causality (NGC) models of Lozano et. al. [7] and Basu et.al. [8] using group Lasso penality terms.
We often assume a time series has reached the equilibrium when samples are taken, but after an intervention, which could be time dependent-e.g., treatment dropped, switched, or with different dose levels given a subject's conditions-we would like to know how the intervention alters the equilibrium. This is the case in our second motivating example, a multicenter randomized clinical trial in which hourly blood pressure data over a 24-hour period is recorded both before and after diet interventions. It is possible that after the interventions, equilibrium would reach a different state than before. This can be modeled through combining a linear mixed-effects (LME) model with an autogressive model [9], in which the LME model could include all the time dependent or independent predictors. Introducing random effects could also be beneficial, as subjects would often reach equilibrium differently, for example, depending on demographics or certain physiological characteristics.
Here we propose a sparse multivariate autoregressive analysis that takes into account the autocorrelations within the multiple observed outcomes over an arbitrarily long history, but only keeping those most correlated in the history. Hence while more variations can be explained, the model still remains parsimonious. We then combine the AR part with the LME part and jointly estimate both sets of parameters. The combined AR and LME model would specifically target time series that are often observed in clinical trials before and after intervention, which would be difficult to analyze using one sparse multivariate AR model, because an intervention often changes time series to a different equilibrium state.

Motivating Examples
Example 1. Circadian rhythms reflect oscillating expressions of genes. Fig 1 schematically describes a simplified model of Drosophilia circadian oscillations [5], in which dCLOCK and PER represent two proteins while dclock and per represent their transcriptors respectively. The model contains both a positive and a negative feedback loop. Using dCLOCK protein level as an example, the two feedback loops work as follows: (1) dCLOCK activates per transcription and thus PER synthesis with lag τ 1 ; PER binds with dCLOCK, decreasing the presence of dCLOCK (the negative feedback loop), and thereby also de-activates per transcription; and (2) increase in dCLOCK also leads to more dCLOCK (the positive feedback loop) because the activated PER binds to dCLOCK, leading to the de-repression of dclock transcription, with lag τ 2 . The two different lagged feedback mechanisms can be respectively modeled by eqs (2) and (3).
where we use Y 1 to denote PER and Y 2 for dCLOCK. The model parameters, K 1 ,K 2 ,v 1 ,v 2 ,k 1 , and k 2 , are given as constants. The two time delays, τ 1 and τ 2 , are essential to forming the circadian oscillations of Y 1 and Y 2 . Eqs (2) and (3) are based on the ordinary differential equations in [5] with a slight modification. The quantity freely available dCLOCK protein Free dCLOCK was originally calculated by the function Free dCLOCK(t) = max([dCLOCK(t) − PER(t)],0). To avoid a possible discontinuity at zero in simulated data, we instead used the logistic trans- , where x is Free dCLOCK(t) and α is a scaling parameter. As we shall see later, the process can be approximated by the AR(m) model in Eq (1) in which an exponential transform exp(Y) replaces Y on the RHS of the equation. However, the traditional multivariate AR models would involve many unnecessary parameters, if, for example, the delays are long and/or more proteins are involved in the model; e.g., more complex ODE models in [10,11]. It would be highly desirable if we could pinpoint the exact delays through an AR model but with nonzero entries only at certain delays. This example inspired our focus on sparsity.
Example 2. The Dietary Approaches to Stop Hypertension (DASH) trial was a multicenter, randomized parallel-arm feeding study that tested the effects of dietary patterns on blood pressure (BP). The three diets were a control diet (low in fruits, vegetables, and dairy products, with a fat content typical of the average diet in the United States), a diet rich in fruits and vegetables (a diet similar to the control except it provided more fruits and vegetables and fewer snacks and sweets), and a combination diet rich in fruits, vegetables, and low-fat dairy foods  Feedback Using Sparse Autoregressive Method and reduced in saturated fat, total fat, and cholesterol (DASH diet). Participants were healthy adults 22 years of age or older who were not taking antihypertensive medication. The subjects' BP measurements, including systolic blood pressure (SBP) and diastolic blood pressure (DBP), were taken over two 24-hour periods, one before the diet intervention and the other after. For more details, see [12] and [13].
After comparing the average BP (ABP) over a 24-hour period of cohorts before and after the intervention, Moore et. al. [13] found fruit/vegetable and DASH diets significantly (p < 0.0001) lowered ABP, when compared with the control diet (fruit/vegetable diet, -3.2/-1.0 mmHg; DASH Diet, -4.6/-2.6 mmHg). However, after considering within-subject correlation, the model by Simpson and Edwards [14] found the reduction in SBP by the DASH diet reduced from -4.6 mmHg to -3.6 mmHg. Presumably, the intervention altered the equilibrium of the BP cycles, and we can model this effect additively by adding intervention predictors onto the AR process. Because large BP variations are explained by previous measurements (the AR part), we expect a further reduction of the diet effects. Also, adding random effects would be useful for addressing subject-specific variations.
These two examples motivated us to combine a sparse multivate AR model with a linear mixed effects (LME) model to form a sparse multivariate autoregressive linear mixed effects model (SMARLME). The first example was used as a basis for simulations studies designed to determine how well our parsimonious model can accurately recover the original signal. We further analyzed the data from the second example to illustrate the utility of the model in a more traditional longitudinal data context. It is worth mentioning that the motivating example in [9]-i.e, parathyroid hormone (PTH) and serum calcium (Ca) levels interacting with the treatment Maxacalcitriol doze level-is also an excellent example for the SMARLME model. Compared to the AR(1) + LME model in [9], the SMARLME model could be more parsimonious and more farreaching into the history of the interaction between PTH and Ca. These two examples demonstrate the flexibility of the SMARLME for modeling phenomena in which multi-variables in a system create feedback loops with specific lag times.

Analysis
Let Y ijt be the observed outcome of subject i at time t for variable j, and X iut be the u th predictor for subject i at time t, i = 1,. . .,N, j = 1,. . .,p, and t = 1,. . .,T. The combined multivariate AR with the LME model can be described in scalar form as, where vector Y ik(t − τ) is the observed k th outcome of subject i at time t − τ, ρ kjτ represents the contribution of the k th outcomes at time t − τ to the j th outcome at time t, and β ujτ represents the contribution of the u th predictor at time t − τ to outcome j, and X iu(t − τ) represents the value of the u th predictor. The terms Z it and b i respectively represents the design matrix for the random effects and the vector of random effects. The simplest case would be Z it being identity and b i being a single random effect b i in which b i is normally distributed with mean zero and variable σ 2 . The error term, ϵ ijt , is assumed to be independent and normally distributed with constant variance, and independent from b i . We denote the number of included predictive outcome variables by q (q p), and the number of predictors by r.
In contrast to the linear AR(m) model, this model is more flexible as well as comprehensive because it considers the entire history of observations of all variables including both outcomes and predictors. Furthermore, to accommodate a wider array of dynamical systems, transformed variable of Y ik(t − τ) can be included as predictor. For example, for the circadian system described by the two ODEs in Eqs (2) and (3), we included exponentiated terms of Y ik(t − τ) on the RHS of Eq (1). For the dynamic system in the circadian rhythm example, the nonlinear feedback mechanism would ensure stationarity of the model without necessarily constraining linear AR parameters, ρ. It is beyond the scope of this paper to discuss model stationarity, and we refer interested readers to [15].
In practical implementation of the model, we limit the history up to a certain period, d, such as a 24-hour period for observations with a strong daily cycle, and for shared parameters as in [14], we remove the variable index in β ujl so that it becomes β ul . With the assumptions of equilibrium and time-independent X iu , we can further remove the time index and simply denote the regression parameter by β u .
The model specified by Eq (4) can be succinctly represented using matrix notation. To set up notation, we denote the vector ( is the vector of predictors of subject i, and ϵ it is the vector (ϵ ijt ) of length p.
In vector notation, the model now can be expressed as: The sparsity constraint is implemented through the following steps: (1) group all autoregression coefficients into a single vector-i.e., ρ = {vec(ρ 1  where kÁk 0 is the l 0 norm, or the number of non-zero entries in the vector. This implementation enforces sparsity in the set of the predictor coefficients when the predictors are time-varying and are not necessarily shared by all outcomes. For a simpler form of the model, observe that the time-varying predictors, X i(t − τ) , along with the predictors from which we seek sparse coefficients can be included into the AR part and treated as part of the outcome set, Y i(t − τ) . Mathematically, the two forms are equivalent. Hence, we separate out the time-independent and shared predictors and simplify the model to: where vector β of size r × 1 is the shared time-homogeneous regression coefficient vector. The covariance structure of the error term, ϵ it , is assumed to be conditionally independent given the other terms, including the fixed and random effects, in the model. We use the model specified in Eq (7) as the basic SMARLME model for subsequent discussions.

Estimation method
Operationally, solving model Eq (7) involves both model selection and parameter estimation.
We shall see that the proposed algorithm resolve the two problems jointly. To estimate the sparse ρ and β and the random effects, we take an alternating approach. In other words, we alternate between the estimation of the AR parameters and the fixed and random effects.
First, given AR parametersρ ðsÞ at the s th iteration, the model becomes a regular LME model ðsÞ t Y iðtÀtÞ , and hence any LME estimating algorithm can be applied here with the independent covariance structure. The current estimates of the LME model can be used for the pseudo-outcomes, i is the predicted random effect vector, and can be used to solve the following sparse leastsquares problem, where kÁk 2 is the l 2 norm of vectors. Denote matrix Y Ã t of size N × p as observations of all outcomes and all subjects at time t, and group all observations into a single vector-i.e., Similarly, vectorize ρ τ into ρ. After some matrix manipulations, we have the following l 0 minimization problem, min ρ ky Ã À Aρk 2 2 ; subject to kρk 0 n: For illustration purpose, we ignore the fixed and random effects. Matrix A of size Np(T − 1) × p 2 d has the following form, :::: I p Y tÀ1 ::: I p Y tÀd :::: I p Y TÀ1 ::: where square matrix I p of size p×p is the identity matrix, and indicates the Kronecker product. An example of the A matrix and a practical refinement are given in S1 File. The minimization problem in Eq (9) can be solved by a fast-computing Forward Backward greedy algorithm (FoBa), which we will briefly explain. For more details, see e.g., [16], [17]. The FoBa algorithm consists of two steps. The first step is forward searching. This step is equivalent to what statisticians call Forward Stepwise Regression or what signal processing researchers call Orthogonal Matching Pursuit [16]. See [18]. In this step, FoBa initializes a residual vector b = y, the solution ρ = 0, and an index set Γ = ;. At each iteration, it first finds the largest absolute entry i of the vector A T b, and attaches it to Γ; i.e., Γ = Γ [ {i}. Next, it updates the solution entries in the index set Γ by solving b = A Γ ρ Γ through Gauss elimination, where A Γ represents a matrix with only columns of A in the index set Γ, and ρ Γ denotes the solution entries in Γ. Then it updates the residual vector, b = y − Aρ, before the next iteration.
The step in seeking the largest absolute entry of A T b is equivalent to finding the most correlated column in A with y, before removing its contribution to y and moving on to search for the next most correlated. Conceptually, this is equivalent to seeking the most correlated Y t − τ with Y t . Because there are only matrix-vector multiplications involved, the algorithm is very efficient. The procedure bears some apparent resemblance to the stepwise forward procedure in regression, which involves sequentially adding variable that improves the model the most in terms of criterion such as minimizing the residual sum of squares. Like the least square procedure typically used in stepwise forward selection, FoBa uses a greedy algorithm on the history of Y by sequentially searching for the next "best" variable. Thus the two approaches are similar in terms of their search strategy. However, they are also different in the following aspects: (1) the FoBa uses a selection procedure that is based on the largest inner product with the original elements in A, as opposed to based on the inner product with the normalized orthogonal elements in least square forward selection regression, and (2) a constraint is placed on the number of elements to be included in the selection set in FoBa, as opposed to stop adding variable according to a threshold of changing residual sum of squares in forward selection regression. The first point is subtle and carries computational implication: the FoBa only needs to orthogonalize the elements that are being selected whereas least square forward selection needs to orthogonalize all elements. See [19] for a detailed explanation.
The second step in FoBa is the backward step. It is designed to circumvent the problem that when an entry is chosen and included in Γ, it cannot be removed, thus implying that mistakes made in the early steps cannot be later corrected. The adaptive (FoBa) addresses this issue in the backward step [17]. At each iteration, FoBa searches through Γ to remove entries that would not significantly increase the least-square penalty term. The FoBa has shown to be a serious competitor to other algorithms for sparsifying matrices including LASSO [20] [17]. Recently, other modifications in using the underlying orthogonal matching pursuit engine for finding a sparse solution to underdetermined systems of linear equation have been proposed, e.g., [18].
In terms of search strategy, the forward-backward approach in FoBa is analogous to forward-backward model selection in linear regressions, where significant variables are forwardly added in and then backwardly removed. Although the algorithm requires an input parameter, n, to restrict the number of nonzero entries in ρ, the backward-search step would typically generate results with fewer nonzero entries in its solution. In other words, if the true model containsn nonzero entries, we can select n ) n Ã , and still be able to recover n Ã nonzero entries in ρ. We illustrate this through our first example using a simulation study, which is described in the next section. In this sense, the model selection and estimation in the proposed SMARLME procedure can be jointly accomplished. In practice, we recommend a strategy of incrementing n in steps and select a model based on information criterion; e.g., Bayesian information criterion (BIC) or Akaike information criterion (AIC).
Iterations. At the s th iteration, 1. Given the current estimatesr ðsÀ1Þ , solve the linear mixed effects model through the pseudooutcomes, and denote the current estimates of β asβ ðsÞ , and also estimate the predicted random effectŝ b ðsÞ i .
Note that if ρ = 0-i.e., the first step of the estimation procedure-we are solving a regular LME model without the AR part. The likelihood or the information criterion (AIC or BIC) of this model can be saved for later comparison with that of the SMARLME model for the justification of choosing the more complex SMARLME model.

Results
Here we will present analysis results of two motivating examples, namely the circadian oscillator and BP measurements. The circadian-oscillator data were simulated using the ODEs in Eqs (2) and (3). The BP data set was a subset of data collected from the DASH study.

Simulation Studies: Circadian Oscillator
To facilitate simulation of data, we used the following discretized version of Eq (3) and Eq (3) by setting dt = 1. Here Y 1 represents the variable PER, and Y 2 represents the variable dCLOCK.
We further set the two delays as τ 1 = τ 2 = 12, and set parameters in Eqs (2) and (3) as v 1 = .5,v 2 = .25,k 1 = .5,k 2 = .5,K 1 = .3, K 2 = .1, and α = 10. These values were based on values suggested by [5] and for offering realistic biological rhythms in the simulated data. Using Eqs (13) and (14), the true curves of simulated dCLOCK and PER over time, referred to as no-noise data hereafter, are shown in Fig 1(b). To simulate realistic data, Gaussian white noise of different levels was then added to the no-noise data. We choose three levels of Gaussian noise-i.e., σ = 0.01,0.05, and 0.1-and the simulation and estimation are repeated 1,000 times for each noise level. The simulated sample of 100 curves with added noise of standard deviation σ = 0.1 are shown in Fig 2 over a 72-hour period.
No time-independent and shared predictors are given in this simulation experiment; our sole purpose is to recover the most correlated entries in history with Y t . In addition to having the linear terms of Y t − τ in the AR part, we also include expY t − τ terms. To make the model as parsimonious as possible, we set the history period d = 15, three hours greater than τ 1 and τ 2 , and the number of nonzero entries in ρ as n = 25. Using the no-noise data and the FoBa algorithm, we identified 7 nonzero locations. Thus using n = 25 is substantially larger than the true number of nonzeros, n Ã = 7. Setting n ) n Ã helps us justify whether the forward-backward greedy algorithm can successfully remove uncorrelated entries while keeping the most correlated entries.
The FoBa algorithm applied to the no-noise data resulted in 7 non-zero coefficients in the linear model. The positions, indexes, and values for the nonzero terms predicting the system (Y 1 ,Y 2 ) are depicted in Table 1. The observed no-noise data and the predicted values based on the linear system with 7 non-zero entries are depicted in Fig 3. It can be seen that the recovery of the original curve is almost perfect when noise is not present.   Table 1 we present the results of the simulation study in the form of bias and mean squared error (MSE) of the estimates over 1,000 replications. Bias and MSE are defined as follows: where r ðnÞ l , l = 1,. . .,L denotes the AR parameter derived from the n th replication, where n = 1,. . .,N, and N was set at 1,000 in this experiment.
In general, the SMARLME procedure recovers the parameters quite well, as evidenced by the small biases and mean squared errors. The result also shows that both bias and MSE increase with the level of noise in the data, as expected. We also make the following observations: (1) FoBa provides almost perfect fit to the nonlinear no-noise data using a small number of nonzero coefficients. The coefficients at lag 1 and 12 are substantial and are consistent with Feedback Using Sparse Autoregressive Method the way data were generated. (2) When σ = 0.1, and given that the true signal variance at 0.0268, we have a rather low signal-noise ratio (SNR) of 2.7, suggesting that the algorithm can recover true time lags reasonably well even under very noisy situations. Here SNR is defined as the signal variance divided by the noise variance. (3) There exist non-zero coefficients in locations that are not expected-e.g., at lag 11. This may arise because data at lag 11 are highly correlated with data at lag 12. An implication of this observation is that there potentially exist multiple solutions that fit the observed data equally well. (4) There exist some small coefficients which are close to zero-e.g., ρ 2,2,17 at position 68. For this position, as the noise level increases, FoBa is less likely to select coefficient at this location. This is reflected in the Count column in Table 1, which represents the number of times that the model select the correct position of predictor out of 1,000 replications. At σ = 0.1, FoBa does not select this location at all. An implication of this observation is that for coefficients with small nonzero values, they are not always selected especially when the noise level is substantial. (5) Regardless of the model selected, the FoBa provides predicted values that fit the observed values quite well (see Fig 5). This simulation indeed demonstrates that the SMARLME could effectively recover intrinsic highly-correlated delays in periodic data with feedback loops.

Data Application: Blood Pressure Data
As noted in [14], more work is needed on the longitudinal analysis of 24-hour blood pressure data given the lack of a generally accepted 'standard' analysis method. Hence the appeal of illustrating our method with the DASH data. The 24-hour hourly BP data of a sample of 340 subjects before and after intervention is concatenated together to form a 48 x 1 vector for the SBP and the DBP of each subject. The SBP and DBP data of a subsample of 50 subjects, before and after intervention, are shown in Fig 6, along with the sample mean curves in thick, black Feedback Using Sparse Autoregressive Method dashed lines. An intervention variable, δ t , is introduced to differentiate the before from the after intervention period; i.e., δ t = 0,1 t 24;δ t = 1,25 t 48. Thus, our model accounts for the three week distance between the measurements before the intervention period and those after. Three diet groups are coded as two separate binary variables, namely the vegetable/ fruit diet and the DASH diet. Eight predictors include intercept, vegetable/fruit diet, DASH diet, control diet and intervention period, vegetable/fruit diet and intervention period, DASH diet and intervention period, race, and age. The AR equilbria without intervention are assumed to be the same before and after intervention, and hence the intervention effects can be separately estimated. A subject-specific random effect is added onto the intercept term. We set d = 23, and the total number of entries in ρ is p 2 d = 2 2 ×23 = 92. For model selection, we vary n from 0 to 59, and choose the minimum-BIC model within this range. From Fig 7(a), we see the minimum BIC appears at n = 56 (the actual number of nonzeros in ρ is 36), which is far less than the BIC of the LME model; i.e., when n = 0. The sharp decline of BIC even at n = 1 suggests that adding the AR part to the LME model would be more appropriate. Observing the flattened BIC after n = 20, we can choose more parsimonious models. The convergence of fixed-effects estimates is shown in Fig 7(b). Fig 8 shows the estimated ρ,  Fig 9(a), which also inspire the circular autoregressive-correlation structure in [14]. We also plot the estimated correlations between predicted BP values in Fig 9(a) with 9(b), and comparing the two figures, we can see that our model can capture more subtle structures such as the W-shapes of the original correlations. The slight elevation of the predicted correlations is due to the removed noise term in the predicted values. Table 2 presents the estimates, standard errors, and p-values from our SMARLME mode fit, along with those from an LME model fit for comparison with the mean-value model in [13]. The intercept parameter represents the average of SBP and DBP for the non-white, control diet group during the pre-intervention period for the "No AR" model and the residual average after variation removal in the "AR" (SMARLME) model. The "diet group and intervention" parameters indicate the estimated differences in blood pressure from this average for the groups during the intervention period, while the diet parameters indicate these differences during the preintervention period. The "White" parameter indicates the estimated difference in blood pressure for White subjects, and the age parameter denotes the estimated change in blood pressure for each yearly increase in age. Clearly, the estimated effects of the DASH diet and the vegetable/fruit diet are both reduced from the mean-value (No AR) model with our (AR) model fit, although they are still significant. Interestingly, even within the control group, there appears to be a significant difference before and after the intervention period with our model fit, as seen in

Discussion
The main contribution of this paper is in its (1) explicit modeling of reciprocal features of multiple time series, and (2) offering of a simple and practical solution to the potentially highdimensional lagged components in the model. There exists a large literature on either components (1) and (2) that could be dated back to early work such as [21]. More recent work in (2) includes the non-reciprocal dynamic-factor model [22], which aims to capture the dynamics of a time series such as a financial indicator with a large number of lagged-predictor variables, such as supply and order variables. Similar to our goal here, different methods such as the principal component and shrinkage method have been proposed to solve the high-dimensional problem [23,24]. For reciprocal-causal models in (1), earlier work arose both in the psychometric literature, especially in structural equation modeling, and in economics. For example, the so-called cross-lagged models have been developed for reciprocal time series [25], although the method mostly addresses problems of relatively low dimension and short panel of cohort data (in the notation of Eq (1), p = 2 and m = 1). Thus, one can view this paper as a way to extend reciprocal models in time series to high dimensions-both in term of interacting variables and the time variable-and to offer a sparse representation of the model structure. One interesting feature of the proposed method for SMARLME is that it simultaneously addresses the model selection and the estimation problem. Additionally, as we have shown in the circadian oscillation example, nonlinear variations over time can also be modeled using transformed terms in the linear predictive model. Further research is required to evaluate the scope and limit of using linear models for nonlinear feedback systems. Although the current SMARLME modeling setup is such that the sparsity is induced by the cardinality constraint, it is possible that some specific sparse structure could be a priori defined, as pointed out by a reviewer. Such an implementation can indeed limit the FoBa search space and improve computational efficiency.
There are several limitations of the current work. First, we have not addressed stationary conditions of the model. It is possible that the estimated model is non-stationary. However, our focus has been in clinical applications in which the long-term behavior of the model may not be a primary concern. In fact, the FoBa algorithm proposed in this paper does not require that the time series are stationary. A second limitation is that we have not taken into account the impact of model selection on inference [26,27]. In other words, the selected sparse model structure may not be correct and therefore it is possible that the coefficients and standard errors reported in Table 2 are biased. This is an issue that cannot be adequately covered in this paper. Further research will examine the impact of selecting different sparse model on coverage properties. Finally, a limitation of the current work is that we have restricted the discussion to linear models and avoided nonlinear regression models. The nonlinear circadian rhythm example used for the generative model in our simulation study has been linearized with exponentiated transformed variables. The estimation proceeds using the proposed linear algorithm, which actually brings some simplification to the problem. The simplification could also be useful when interpreting parameters in the fixed and random effect components of SMARLME, which in some cases could be the primary goal of inference, for example in medical applications in which the AR component is treated as a nuisance factor.
Supporting Information S1 File. S1 File contains an example about the use of matrix formulation for the FoBa estimation of the multivariate autoregressive model. (PDF)