Hybrid modeling and prediction of dynamical systems

Scientific analysis often relies on the ability to make accurate predictions of a system’s dynamics. Mechanistic models, parameterized by a number of unknown parameters, are often used for this purpose. Accurate estimation of the model state and parameters prior to prediction is necessary, but may be complicated by issues such as noisy data and uncertainty in parameters and initial conditions. At the other end of the spectrum exist nonparametric methods, which rely solely on data to build their predictions. While these nonparametric methods do not require a model of the system, their performance is strongly influenced by the amount and noisiness of the data. In this article, we consider a hybrid approach to modeling and prediction which merges recent advancements in nonparametric analysis with standard parametric methods. The general idea is to replace a subset of a mechanistic model’s equations with their corresponding nonparametric representations, resulting in a hybrid modeling and prediction scheme. Overall, we find that this hybrid approach allows for more robust parameter estimation and improved short-term prediction in situations where there is a large uncertainty in model parameters. We demonstrate these advantages in the classical Lorenz-63 chaotic system and in networks of Hindmarsh-Rose neurons before application to experimentally collected structured population data.


Author summary
The question of how best to predict the evolution of a dynamical system has received substantial interest in the scientific community. While traditional mechanistic modeling approaches have dominated, data-driven approaches which rely on data to build predictive models have gained increasing popularity. The reality is, both approaches have their drawbacks and limitations. In this article we ask the question of whether or not a hybrid approach to prediction, which combines characteristics of both mechanistic modeling and data-driven modeling, can offer improvements over the standalone methodologies.

Introduction
Parametric modeling involves defining an underlying set of mechanistic equations which describe a system's dynamics. These mechanistic models often contain a number of unknown parameters as well as an uncertain state, both of which need to be quantified prior to use of the model for prediction. The success of parametric prediction is tied closely to the ability to construct accurate estimates of the model parameters and state. This can be particularly challenging in high dimensional estimation problems as well as in chaotic systems [1,2]. Additionally, there is often a degree of model error, or a discrepancy between the structure of the model and that of the system, further complicating the estimation process and hindering prediction accuracy. Despite these potential issues, mechanistic models are frequently utilized in data analysis. The question we aim to address is when is it advantageous to use them? Under suitable conditions where model error is relatively small and parameters can be reliably estimated, parametric predictions can provide a great deal of accuracy. However, as we will see in the subsequent examples, a large uncertainty in the initial parameter values often leads to inaccurate estimates resulting in poor model-based predictions.
An alternative approach to modeling and prediction abandons the use of any mechanistic equations, instead relying on predictive models built from data. These nonparametric methods have received considerable attention, in particular those methods based on Takens' delaycoordinate method for attractor reconstruction [3][4][5][6][7][8][9][10][11][12][13][14][15][16][17]. The success of nonparametric methods is strongly influenced by the amount of data available as well as the dimension of the dynamical system. If only a sparse amount of training data is available, the result is often inaccurate predictions due to the lack of suitable nearby neighbors in delay-coordinate space. Furthermore, as the dimension and complexity of the dynamical system increases, nonparametric prediction becomes significantly more difficult due to the necessary data requirements [17].
Several recent works have investigated the situation where only a portion of a mechanistic model is known [18,19]. Our motivation here though is to explore how best to use a full mechanistic model when it is available. We consider a hybrid methodology to modeling and prediction that combines the complementary features of both parametric and nonparametric methods. In our proposed hybrid method, a subset of a mechanistic model's equations are replaced by nonparametric evolution. These nonparametrically advanced variables are then incorporated into the remaining mechanistic equations during the data fitting and prediction process. The result of this approach is a more robust estimation of model parameters as well as an improvement in short-term prediction accuracy when initial parameter uncertainty is large.
Our proposed hybrid method is in a sense related to the field of grey-box modeling and identification [20][21][22][23][24][25]. This broad class of methods is referred to as "grey" since they exist somewhere between white-box/clear models (physical definitions and principles only) and black-box models (data-based only). For example, a grey method resulting from the combination of semi-physical and black-box modeling was proposed in [26,27], where a blackbox artificial neural network was used to explain the error residuals from the physical portion of the model. Our hybrid method offers a novel take on this grey box philosophy, merging white and black-box modeling to obtain robust estimation of model parameters.
The utility of the hybrid method is demonstrated in several example systems. The assumption throughout is that noisy training data from a system are available as well as a mechanistic model that describes the underlying dynamics. However, several of the model parameters are unknown and the model state is uncertain due to the noisy measurements. The goal is to make accurate predictions of the system state up to some forecast horizon beyond the end of the training data. We compare the prediction accuracy of the standard parametric and nonparametric methodologies with the novel hybrid method presented here.
We begin our analysis by examining prediction in the classical Lorenz-63 system [28], which exhibits chaotic dynamics. Motivated by the success of the hybrid method in the Lorenz-63 system, we consider a more sophisticated example of predicting the spiking dynamics of a neuron in a network of Hindmarsh-Rose [29] cells. Finally, we examine the prediction problem in a well-known experimental dataset from beetle population dynamics [30].

Materials and methods
The assumption throughout is that a set of noisy data is available over the time interval [t(0), t(T)]. This is referred to as the training data of the system. Using these training data, the question is how best to predict the system dynamics over the interval [t(T + 1), t(T + T F )], known as the prediction interval. Standard parametric and nonparametric methods are presented before our discussion of the novel hybrid method which blends the two approaches.

Parametric modeling and prediction
When a full set of mechanistic equations is used for modeling and prediction, we refer to this as the parametric approach. Assume a general nonlinear system of the form xðk þ 1Þ ¼ fðtðkÞ; xðkÞ; pÞ þ wðkÞ yðkÞ ¼ hðtðkÞ; xðkÞ; pÞ þ vðkÞ ð1Þ where x = [x 1 , x 2 , . . ., x n ] T is an n-dimensional vector of model state variables and p = [p 1 , p 2 , . . ., p l ] T is an l-dimensional vector of model parameters which may be known from first principles, partially known or completely unknown. f represents our system dynamics which describe the evolution of the state x over time and h is an observation function which maps x to an m-dimensional vector of model observations, y = [y 1 , y 2 , . . ., y m ] T . To simplify the description of our analysis, we assume that the training data maps directly to some subset of x. w(k) and v(k) are assumed to be mean 0 Gaussian noise terms with covariances Q and R respectively. While discrete notation is used in Eq 1 for notational convenience, the evolution of x is often described by continuous-time systems. In this situation numerical solvers, such as Runge-Kutta or Adams-Moulton methods, are used to obtain solutions to the continuoustime system at discrete time points.
When the state of a system is uncertain due to noisy or incomplete observations, nonlinear Kalman filtering can be used for state estimation [1,[31][32][33][34][35][36][37][38][39][40][41][42][43]. Here we choose the unscented Kalman filter (UKF), which approximates the propagation of the mean and covariance of a random variable through a nonlinear function using a deterministic ensemble selected through the unscented transformation [44][45][46]. We initialize the filter with state vector x + (0) and covariance matrix P + (0). At the kth step of the filter there is an estimate of the state x + (k − 1) and the covariance matrix P + (k − 1). In the UKF, the singular value decomposition is used to find the square root of the matrix P + (k − 1), which is used to form an ensemble of 2n + 1 state vectors.
The model f is applied to the ensemble, advancing it forward one time step, and then observed with h. The weighted average of the resulting state ensemble gives the prior state estimate x − (k) and the weighted average of the observed ensemble is the model-predicted observation y − (k). Covariance matrices P − (k) and P y (k) of the resulting state and observed ensemble, and the cross-covariance matrix P xy (k) between the state and observed ensembles, are formed and the equations KðkÞ ¼ P xy ðkÞ ðP y ðkÞÞ À 1 P þ ðkÞ ¼ P À ðkÞ À P xy ðkÞ ðP y ðkÞÞ À 1 P yx ðkÞ x þ ðkÞ ¼ x À ðkÞ þ KðkÞðyðkÞ À y À ðkÞÞ: ð2Þ are used to update the state and covariance estimates with the observation y(k).
The UKF algorithm described above can be extended to include the joint estimation problem allowing for parameter estimation. In this framework, the parameters p are considered as auxiliary state variables with trivial dynamics, namely p k+1 = p k . An augmented n + l dimensional state vector can then be formed consisting of the original n state variables and l model parameters allowing for simultaneous state and parameter estimation [1,43].
To implement parametric prediction, the UKF is used to process the training data and obtain an estimate of p, as well as the state at the end of the training set, x(T). The parameter values are fixed and Eq 1 is forward solved from t(T) to generate predictions of the system dynamics over the prediction interval [t(T + 1), t(T + T F )]. Namely, predictions x(T + 1), x(T + 2), . . ., x(T + T F ) are calculated.

Takens' method for nonparametric prediction
Instead of using the mechanistic model described by Eq 1, the system can be represented nonparametrically. Without loss of generality consider the observed variable x j . Using Takens' theorem [47,48], the d + 1 dimensional delay coordinate vector x d j ðTÞ ¼ ½x j ðTÞ; x j ðT À tÞ; x j ðT À 2tÞ; . . . x j ðT À dtÞ is formed which represents the state of the system at time t(T). Here d is the number of delays and τ is the time-delay.
The goal of nonparametric prediction is to utilize the training data in the interval [t(0), t(T)] to build local models for predicting the dynamics over the interval [t(T + 1), t(T + T F )]. Here, the method of direct prediction is chosen. Prior to implementation of direct prediction, a library of delay vectors is formed from the training data of x j . Direct prediction begins by finding the κ nearest neighbors, as a function of Euclidean distance, to the current delay-coordinate vector x d j ðTÞ within the library of delay vectors. Neighboring delay vectors x d j ðT 0 Þ ¼ ½x j ðT 0 Þ; x j ðT 0 À tÞ; x j ðT 0 À 2tÞ; . . . x j ðT 0 À dtÞ x d j ðT 00 Þ ¼ ½x j ðT 00 Þ; x j ðT 00 À tÞ; x j ðT 00 À 2tÞ; . . . x j ðT 00 À dtÞ . . .
x d j ðT k Þ ¼ ½x j ðT k Þ; x j ðT k À tÞ; x j ðT k À 2tÞ; . . . x j ðT k À dtÞ are found within the training data and the known x j (T 0 + i), x j (T 00 + i), . . ., x j (T κ + i) points are used in a local model to predict the unknown value x j (T + i) where i = 1, 2, . . ., T F . In this article, a locally constant model is chosen where w 0 j ; w 00 j ; . . . ; w k j are the weights for the j th state that determine the contribution of each neighbor in building the prediction. In its simplest form, Eq 3 is an average of the nearest More sophisticated weighting schemes can be chosen, for example assigning the weights based on the Euclidean distance from each neighbor to the current delay vector [18,49,50]. Selection of values for d, τ and κ is necessary for implementation of the direct prediction algorithm. There is a wide range of literature on the selection of embedding parameters d and τ (see for example [51][52][53][54][55][56][57][58]). These parameters can also be chosen using a cross-validation approach whereby the training data are partitioned into a smaller training set and a validation set to allow for within-sample prediction and evaluation. The known training data can thus be used to select d, τ and κ prior to the out-of-sample prediction that occurs during the prediction interval [t(T + 1), t(T + T F )]. Parameters for nonparametric prediction were chosen within each example.
The accuracy of the predicted x j (T + i) is subject to several factors. The presence of noise in the training data plays a substantial role in decreasing prediction accuracy. However, recent advancements in nonparametric analysis have addressed the problem of filtering time series without use of a mechanistic model. In [17], a nonparametric filter was developed which merged Kalman filtering theory and Takens' method. The resulting Kalman-Takens filter was demonstrated to be able to reduce significant amounts of noise in data. Application of the method was extended in [59] to the case of filtering stochastic variables without a model. In the results presented below, the training data used for nonparametric prediction are filtered first using the method of [17,59].

Hybrid modeling and prediction: Merging parametric and nonparametric methods
As an alternative to the parametric and nonparametric methods described above, we propose a hybrid approach which blends the two methods together. In this framework, we assume that a full mechanistic model as described by Eq 1 is available. However, rather than using the full model, a subset of the mechanistic equations are used and the remainder of the variables are represented nonparametrically using delay-coordinates.
In formulating this method it is convenient to first think of Eq 1 without vector notation Now assume only the first n − 1 equations of Eq 4 are used to model state variables x 1 , x 2 , . . ., x n−1 , while x n is described nonparametrically . .
We refer to Eq 5 as the hybrid model. Note, in Eq 5 only x n is assumed to be advanced nonparametrically. This is done purely for ease of presentation and the hybrid model can instead contain several variables whose equations are replaced by nonparametric advancement. Nonparametric advancement in the hybrid model similarly requires setting the number of delays, time-delay and the number of nearest neighbors with which to build the local model.
Throughout our examples, these values are kept the same as those used by the standalone nonparametric method to ensure fair comparison. The hybrid model has several distinguishing features. Notice, in this framework nonparametrically advanced dynamics are incorporated into mechanistic equations, essentially merging the two lines of mathematical thought. Furthermore, equations for state variables within Eq 4 can be replaced only if there are observations which map directly to them, otherwise their dynamics can not be nonparametrically advanced. Finally, the process of replacing equations in the hybrid method will generally result in a reduction in the number of unknown model parameters to be estimated. We will observe two benefits of this methodology. By reducing the number of mechanistic equations, and consequently the number of unknown parameters, we are able to obtain a more robust estimation of the model parameters in the mechanistic portion of the model. This is to be expected since in a sense, the hybrid approach allows us to reduce the dimension and complexity of the parameter space over which we have to optimize, allowing us to obtain better estimates of the parameters compared to the full dimensional estimation problem required by standard parametric modeling. This improvement is particularly noticeable when there is large uncertainty in the initial parameter values. Secondly, as a consequence of this robust estimation, the hybrid method offers improved predictions for those variables it models mechanistically. The more accurate parameterization results in improved short-term prediction accuracy when compared to the other methods.
In this hybrid scheme, obtaining an estimate of the unknown parameters in the n − 1 mechanistic equations and an estimate of x(T) requires a combination of the nonparametric analysis developed in [17] and traditional parametric methodology. The state variable x n , which is not defined by a mechanistic equation in Eq 4, is represented by delay coordinates within the UKF. Therefore at step k we have the hybrid state x H ðkÞ ¼ ½x 1 ðkÞ; x 2 ðkÞ; . . . ; x nÀ 1 ðkÞ; x n ðkÞ; x n ðk À tÞ; x n ðk À 2tÞ; . . . ; x n ðk À dtÞ T The UKF as described above is implemented with this hybrid state x H (k) and the model described by Eq 5. Notice that in the case of the hybrid model when we have to advance the state dynamics and form the prior estimate in the UKF, the advancement is done parametrically for the first n − 1 states and nonparametrically for the n th state. Similarly to before, we can augment x H with the unknown parameters in the n − 1 mechanistic equations allowing for simultaneous parameter estimation. Once the training data are processed and an estimate of x H (T) and the parameters are obtained, the hybrid model in Eq 5 is implemented to generate predictions The natural question to ask is which mechanistic equations should be replaced by nonparametric representation in this hybrid scheme? This is a complicated question and inevitably is situation dependent. In some situations, which hybrid model to choose may not be clear a priori, and several different models may need to be considered in combination with an uncertainty analysis on the results to determine the optimal hybridization. In other situations, the mechanistic equations to replace may be clearer. For example, if the researcher is less confident in the mechanistic description of certain variables, or certain equations include parameters about which one has less confidence, then those equations should be replaced by their nonparametric counterpart to reduce the overall error in the model. If one is only interested in certain parameters or processes, then the appropriate mechanistic equations need to be included and the remainder of the equations can be replaced by nonparametric evolution to reduce the estimation complexity and obtain better estimates and predictions for the included equations. Nevertheless, individual experimental situations will dictate if and how much confidence we have in our mechanistic model as well as which portions are most important to our analysis. These factors must be used as guidance for constructing the preferred hybrid formulation.

Results
We demonstrate the utility of the hybrid methodology, with comparison to standard parametric and nonparametric modeling and prediction, in the following example systems. When conducting this analysis, two types of error are considered. The first, error in the observations, manifests itself as noise in the training data which all three methods will have to confront. The second type, error in the parameters, takes the form of an uncertainty in the initial parameter values used by the UKF for parameter estimation. Only the parametric and hybrid methods will have to deal with this parameter error. Throughout, we will refer to a percentage uncertainty which corresponds to the standard deviation of the distribution from which the initial parameter value is drawn relative to the mean. For example, if the true value for a parameter p 1 is 12 and we have 50% uncertainty in this value, then the initial parameter value used for estimating p 1 will be drawn from the distribution N(12, (0.5 Ã 12) 2 ).
To quantify prediction accuracy, the normalized root-mean-square-error, or SRMSE, is calculated for each prediction method as a function of forecast horizon. Normalization is done with respect to the standard deviation of the variable as calculated from the training data. In using the SRMSE metric, the goal is to be more accurate than if the prediction was simply the mean of the training data (corresponding to SRMSE = 1). Thus a prediction is better than a naive prediction when SRMSE < 1, though for chaotic systems prediction accuracy will eventually converge to this error level since only short-term prediction is possible.

Prediction in the Lorenz-63 system
As a demonstrative example, consider the Lorenz-63 system [28] where σ = 10, ρ = 28, β = 8/3. Data are generated from this system using a fourth-order Adams-Moulton method with sample rate h = 0.05. We assume that 500 training data points of the x, y and z variables are available, or 25 units of time. The Lorenz-63 system oscillates approximately once every unit of time, meaning the training set consists of about 25 oscillations. The goal is to accurately predict the dynamics of x, y and z one time unit after the end of the training set. However, the observations of each variable are corrupted by Gaussian observational noise with mean zero and variance equal to 4. Additionally the true value of parameters σ, ρ and β are unknown. Fig 1 shows an example simulation of this system.
The parametric method utilizes Eq 6 to estimate the model state and parameters, and to predict the x, y and z dynamics. For the nonparametric method, delay coordinates of the variables are formed with d = 9 and τ = 1. The local constant model for prediction is built using κ = 20 nearest neighbors. For the hybrid method, the mechanistic equation governing the dynamics of y are replaced nonparametrically resulting in the reduced Lorenz-63 model Note, this hybrid model in Eq 7 does not require estimation of the ρ parameter since the mechanistic equation for y is removed .  Fig 2 shows a comparison of parametric (black), nonparametric (blue) and hybrid (red) prediction error as a function of forecast horizon. SRMSE results averaged over 500 system realizations with error bars denoting standard error. With an initial parameter uncertainty level of 80% (solid lines), the hybrid method offers improved short-term prediction of the Lorenz-63 x (Fig 2a), y (Fig 2b) and z (Fig 2c) variables over standalone parametric prediction. Note that parametric prediction at this uncertainty level does very poorly and in the cases of y and z its result is not seen due to the scale of the error.
The success of the hybrid method can be traced to more accurate estimates of the model parameters in the mechanistic equations that it uses. Table 1 shows the resulting hybrid and parametric estimation of the Lorenz-63 parameters. The hybrid method is able to construct accurate estimates of both σ and β, with a mean close to the true value and a small standard deviation of the estimates. The parametric method at the same 80% uncertainty level is unable to obtain reliable estimates, exemplified by the large standard deviation of the estimates. Error bars denote standard error over the 500 realizations. Training data consists of 500 data points generated from Eq 6 with σ = 10, ρ = 28 and β = 8/3 with sample rate h = 0.05. Data are corrupted by Gaussian observational noise with mean 0 and variance of 4. Parametric (black), nonparametric (blue) and hybrid (red) prediction accuracy with parameter uncertainty of 80% (solid line) plotted as a function of forecast horizon. Hybrid prediction, which utilizes mechanistic equations in describing x and z but nonparametrically represents y, offers an improvement in short-term prediction accuracy over standalone nonparametric prediction. Parametric prediction at this uncertainty level performs poorly in predicting all three variables and in the case of (b) and (c) is not seen due to the scale of the error. For comparison purposes, the parametric method with 50% (dotted line) and 20% (dashed line) uncertainty is also considered. As the uncertainty shrinks, performance of the parametric method improves. However, only at a small uncertainty level does it outperform the short-term improvement in prediction afforded by the hybrid method. For comparison purposes, we consider the parametric method at smaller parameter uncertainty levels of 50% (Fig 2, dotted line) and 20% (Fig 2, dashed line). As the uncertainty decreases for the parametric method, its performance improves. Similarly in Table 1, we observe that the shrinking uncertainty results in a more accurate estimation of the model parameters. However at 50% uncertainty, the parametric method is still outperformed by the hybrid method with a higher uncertainty level. When the uncertainty is small, parametric prediction offers better performance, which is to be expected since it has access to the true model equations and starts out with close to optimal parameter values. The hybrid method was tested at these additional uncertainty levels, however its performance did not change substantially. This can be attributed to the fact that the hybrid method already obtains an accurate estimation of the model parameters at 80% uncertainty and thus shrinking the initial parameter uncertainty does not drastically change the results.
The hybrid method also provides improved prediction of the x and z variables compared to standalone nonparametric prediction. We note that hybrid and nonparametric prediction of y is comparable, which is to be expected since y is represented nonparametrically in the hybrid model chosen. This is a subtle, but important point to note and throughout our results we will see that the hybrid method offers improved prediction only for those variables that it models mechanistically. Those that are modeled nonparametrically will have similar performance to standalone nonparametric prediction.
The formulation chosen in Eq 7 is of course one of several possible hybrid model decompositions. As another example, we can consider the hybrid model where y and z are represented nonparametrically. Notice that for this hybrid model, nonparametric advancement of z does not enter into the mechanistic equation.

Predicting neuronal network dynamics
We now consider the difficult high dimensional estimation and prediction problem posed by neuronal network studies. If we are only interested in predicting a portion of the network, then we can use the proposed hybrid method to refine our estimation and prediction while simultaneously reducing estimation complexity. As an example of this potential network application we consider the prediction of spiking dynamics in a network of M Hindmarsh-Rose neurons [29] _ where i = 1, 2, . . ., M. x i corresponds to the spiking potential while y i and z i describe the fast and slow-scale dynamics, respectively, of neuron i. Each individual neuron in the network has parameters a i = 1, b i = 3 and c i = 5 which are assumed to be unknown. β im represents the connectivity coefficient from neuron i to neuron m. For a network of size M, we have M 2 − M possible connection parameters since neuron self connections are not allowed (i.e. β ii = 0). These connection parameters are also assumed to be unknown. The neurons in this network are coupled to one another through the voltages by an exponential term which acts as a gating function, allowing the transfer of information only when a neuron is about to spike. This prevents constant communication across the network, and has biophysical plausibility [40]. For this example we examine networks of size M = 3 with 5 random connections. Data from these networks are generated using a fourth-order Adams-Moulton method with sample rate h = 0.08 ms. We assume that the training data consists of 3000 observations, or 240 ms, of the x 1 , x 2 , x 3 variables each of which are corrupted by Gaussian noise with mean 0 and variance of 0.2. Under the stated parameter regime, the neurons in the network spike approximately every 6 ms, meaning our training set has on average around 40 spikes per neuron. In this example, we restrict our focus to predicting 8 ms of the x 3 variable (though a similar analysis follows for the prediction of x 1 and x 2 ). Fig 4a shows a representative realization of this problem. Fig 4b shows the resulting accuracy in predicting x 3 when using parametric (black), nonparametric (blue) and hybrid (red) methods. Results are averaged over 200 realizations with error bars denoting standard error. Error bars are only shown for every tenth forecast point so as to aid in visualization. At the 80% uncertainty level (solid line), the parametric method performs poorly. The parametric approach uses the full mechanistic model described by Eq 9 for modeling and prediction, requiring estimation of the x, y and z state variables and parameters a, b and c for each neuron, as well as the full connectivity matrix. Notice that once again with 80% uncertainty, the scale of error for the parametric method is much larger compared to the other methods.
The nonparametric method for this example (τ = 1, d = 9) uses κ = 10 neighbors for building the local model for prediction. Network level dynamics such as these can prove problematic for nonparametric prediction due to the increased complexity of the system dynamics. As the size of the network grows, the requisite data needed to make accurate multi-step-ahead predictions substantially increases. For the small network in this example, the nonparametric method is still able to provide fairly accurate predictions. However, dimensionality of the system and the required data needed is a considerable limiting factor when analyzing highdimensional networks. When considering the parametric method with 50% uncertainty, prediction accuracy between it and the hybrid method is comparable over the first 2 ms.
The benefit of the hybrid method in analyzing this network is that since we are only interested in neuron 3, we can assume a mechanistic equation for only this neuron and nonparametrically represent neuron 1 and neuron 2. The hybrid model in Eq 10 results in a substantial reduction in the complexity of the estimation problem while still giving us some information about the dynamics of neuron 3. This blending once again results in improved prediction accuracy. We see that the the hybrid method outperforms both parametric and nonparametric predictions. For comparison purposes, we show the parametric method at a smaller uncertainty level of 50% (Fig 4b, dotted line). The hybrid method with higher uncertainty offers comparable prediction within the first 2 ms to the parametric method with this lower uncertainty level. Note that unlike in the Lorenz-63 example, we do not consider the parametric method with 20% uncertainty since reasonable parameter estimates and predictions are obtained with 50% uncertainty. Table 2 shows the robustness of the hybrid method in estimating the individual parameters for neuron 3. Even with a high uncertainty, the hybrid method is able to obtain accurate and reliable estimates of the neuron parameters compared to the parametric method at both the high and medium uncertainty levels.

Predicting flour beetle population dynamics
We now investigate the prediction problem in a well-known data set from an ecological study involving the cannibalistic red flour beetle Tribolium castaneum. In [30], the authors present experimentally collected data and a mechanistic model describing the life cycle dynamics of T. castaneum. Their discrete time model describing the progression of the beetle through the larvae, pupae, and adult stages is given by where L, P and A correspond to larvae, pupae and adult populations, respectively. The essential interactions described by this model are (i) flour beetles become reproductive only in the adult stage, (ii) adults produce new larvae, (iii) adults and larvae can both cannibalize larvae, and (iv) adults cannibalize pupae. We note that since Eq 11 only approximates the life cycle dynamics of the beetle, there is a degree of model error in the proposed system, unlike the previous examples. The authors of [30] experimentally set the adult mortality rate (μ a ) to 0.96 and the recruitment rate (c pa ) from pupae to adult to seven different values (0, 0.05, 0.10, 0.25, 0.35, 0.50, 1.0). Experiments at each recruitment rate value were replicated three times resulting in 21 different datasets. Each dataset consists of total numbers of larvae, pupae, and adults measured biweekly over 82 weeks resulting in 41 measurements for each life stage. These data were fit to Eq 11 in [30] and parameter estimates b = 6.598, c el = 1.209 × 10 −2 , c ea = 1.155 × 10 −2 and μ l = 0.2055 were obtained. We treat these parameter values as ground truth when considering the different parameter uncertainty levels for fitting the data to the model.
In our analysis of this system, we treat the first 37 measurements (or 74 weeks) within an experiment as training data and use the remaining 4 time points (or 8 weeks) for forecast evaluation. Fig 5 shows an example of this setup for a representative dataset. Fig 6 shows the results of predicting the larvae (Fig 6a), pupae (Fig 6b) and adult (Fig 6c) populations using parametric (black), nonparametric (blue) and hybrid prediction methods with 80% (solid lines) parameter uncertainty. Error bars correspond to the standard error over the 21 datasets. The parametric method uses the full mechanistic model described in Eq 11 to estimate the population state and parameters b, c el , c ea and μ l before prediction. We note in Fig 6 that the parametric method is not seen due to the scale of the error, and is significantly outperformed by the nonparametric prediction (τ = 1, d = 2, κ = 5). For the hybrid method, we only consider the mechanistic equations for pupae and adult population dynamics and nonparametrically represent larvae. Similarly to our previous examples, the hybrid method outperforms parametric prediction at the 80% uncertainty level. The parametric method with a smaller uncertainty of 50% (dotted lines) offers comparable performance to the hybrid method with larger uncertainty. As a result of the hybrid model described by Eq 12, prediction accuracy for larvae (Fig 6a) should be comparable between the hybrid and nonparametric methods, and we do in fact see overlap between the two prediction curves. However, the hybrid method in this example does give us improved one-step ahead (2 weeks) prediction of pupae levels and improved two-step ahead (4 weeks) prediction for adult population levels over standalone nonparametric prediction. After these forecast horizons, there is overlap in the error bars between the two methods.

Conclusion
By blending characteristics of parametric and nonparametric methodologies, the proposed hybrid method for modeling and prediction offers several advantages over standalone methods. From the perspective of model fitting and the required parameter estimation that arises in this process, we have shown that the hybrid approach allows for a more robust estimation of model parameters. Particularly for situations where there is a large uncertainty in the true parameter values, the hybrid method is able to construct accurate estimates of model parameters when the standard parametric model fitting fails to do so. At first this may seem counterintuitive, but in fact it is not that surprising. The replacement of mechanistic equations with their nonparametric representations in effect reduces the dimension of the parameter space that we have to optimize in, resulting in better parameter estimates. As we have demonstrated in the above examples, this refinement in the parameter estimates leads to an improvement in short-term prediction accuracy for the mechanistic equations used by the hybrid model.
The limitations of the hybrid method are similar to those of parametric and nonparametric methods in that if not enough training data are available then accurate estimation and prediction becomes difficult. However, the demonstrated robustness of the hybrid method to large parameter uncertainty is encouraging, particularly when considering experimental situations where we may not have a good prior estimate of the model parameters. One could consider implementing the hybrid method in an iterative fashion, estimating the parameters of each equation separately, then piecing the model back together for analysis and prediction. We can think of this as an iterative hybrid method, and is the subject of future work.
We view this work as complementary to recent publications on forecasting [49,50,60]. The authors of [49,50] advocate nonparametric methods over parametric methods in general, while a letter [60] addressing the work of [49] showed that a more sophisticated method for model fitting results in better parameter estimates and therefore model-based predictions which outperform model-free methods. Our results support the view that no one method is uniformly better than the other. As we showed in the above examples, in situations where the model error and uncertainty in initial parameters are relatively small, the parametric approach Average SRMSE over 21 experimental datasets when using parametric (black curve), nonparametric (blue curve) and hybrid (red curve) methods for predicting (a) larvae, (b) pupae and (c) adult population levels with uncertainty of 80% (solid line) and 50% (dashed-dotted line). Error bars correspond to standard error over the 21 datasets. Hybrid prediction with 80% uncertainty offers improved prediction over both nonparametric and parametric with 80% uncertainty (not visible due to scale of error), and comparable performance to parametric with 50% uncertainty.
https://doi.org/10.1371/journal.pcbi.1005655.g006 outperforms other prediction methods. Furthermore it gives us system relevant parameter values that have some quantifiable interpretation. Often in experimental studies though, we are not operating in this ideal situation and instead are working with a model that has substantial error with a large uncertainty in parameters which can lead to inaccurate system inference. In situations such as these, nonparametric methods are particularly useful given their reliance on data only. Of course their limitation is that they sacrifice the potential for biophysical interpretability during the model fitting process.
The main appeal of the hybrid method is that we can confront these situations without having to completely abandon the use of either parametric or nonparametric methodologies, effectively operating within the grey spectrum of mathematical modeling. While we explored in detail the robustness of the hybrid method to large levels of parameter uncertainty in the mechanistic portion of the model, its usefulness stretches well beyond that. In some instances, we may only have a model for some of the states, or portions of the model may have higher error than others. By supplementing these parts with their nonparametric representation, the hybrid method would allow us to only use the parts of the model we are confident in and thus improve our analysis. The quantification of model error and uncertainty is thus an important future direction to incorporate into this hybrid modeling framework. Uncertainty quantification methods will allow us to further evaluate the confidence of the various estimates and predictions. Additionally, incorporating ideas from adaptive filtering [38] would give us a framework with which to evaluate the error present during the data fitting process for each model. Leveraging these fields of mathematical analysis would provide us with additional guidance in forming these hybrid models.