Confidence intervals by constrained optimization—An algorithm and software package for practical identifiability analysis in systems biology

Practical identifiability of Systems Biology models has received a lot of attention in recent scientific research. It addresses the crucial question for models’ predictability: how accurately can the models’ parameters be recovered from available experimental data. The methods based on profile likelihood are among the most reliable methods of practical identification. However, these methods are often computationally demanding or lead to inaccurate estimations of parameters’ confidence intervals. Development of methods, which can accurately produce parameters’ confidence intervals in reasonable computational time, is of utmost importance for Systems Biology and QSP modeling. We propose an algorithm Confidence Intervals by Constraint Optimization (CICO) based on profile likelihood, designed to speed-up confidence intervals estimation and reduce computational cost. The numerical implementation of the algorithm includes settings to control the accuracy of confidence intervals estimates. The algorithm was tested on a number of Systems Biology models, including Taxol treatment model and STAT5 Dimerization model, discussed in the current article. The CICO algorithm is implemented in a software package freely available in Julia (https://github.com/insysbio/LikelihoodProfiler.jl) and Python (https://github.com/insysbio/LikelihoodProfiler.py).


Practical and structural identifiability
Reliability and predictability of a kinetic systems biology model depends on how precisely the parameters of the model can be recovered from the given experimental data. Fitting a model to experimental data is not enough to estimate all the parameters unambiguously. Noisy or incomplete experimental data as well as the models structure often result in uncertainty in parameters estimations.
Identifiability analysis is crucial for models verification. It addresses the question to what extent and with what level of certainty can parameters of a model be recovered from the available experimental data. Two branches of identifiability analysis are distinguished [1] often referred to as structural identification and practical identification. While structural identifiability is the characteristic of a model's structure and does not take into account available experimental data, practical identifiability considers real noisy and incomplete experimental data.
The goal of structural approach [2,3] (prior identifiability analysis) is to verify model's identifiability by exploring the model's structure independently from the experimental data. A wide range of methods have been proposed for testing structural identifiability. The strengths and weaknesses of those methods have been thoroughly analyzed in scientific literature [1,4].
Practical identification (posterior identifiability analysis) is a data-based approach. The approach addresses the possibility and the precision of parameters estimation based on available data. It takes into account the measurement noise and data incompleteness. Hence, parameters' values can be recovered only with some level of certainty, typically described by confidence intervals and confidence regions. The authors of the study [5] define practical identifiability on the basis of profile likelihood notion: identifiable parameter is one that has finite profile likelihood-based confidence interval. Accordingly, the non-identifiable parameters' profile likelihood-based confidence interval is infinite.
Even if a model includes only structurally identifiable parameters it doesn't imply their practical identifiability. While structural non-identifiability implies practical non-identifiability, structurally identifiable models often appear to be practically non-identifiable [6].
Profile likelihood is a reliable though computationally demanding approach to test parameters' identifiability in Systems Biology (SB). It helps us understand how the data can be mapped to parameters' values and how accurate the model predictions are.
Following the definitions of [5], in the current study we propose new algorithm for practical identification and confidence intervals estimation. This algorithm is designed to produce confidence intervals in shorter computational time compared to other profile likelihood-based approaches while controlling the accuracy of estimates. It does not require the intermediate points to lie on the likelihood profile, which leads to less likelihood function calls. We also propose an implementation of the algorithm in a free open source package tested on a number of published kinetic models.

A kinetic systems biology model
A kinetic systems biology model can be expressed as an ODE system: The state vector x(t) denotes variables of the model (e.g. concentrations of molecular compounds or other values), u(t)-known input or control (e.g. treatment regime), p -parameters of the model and f is defined by rate laws. x(t) variables can be numerically integrated for the time range (0, t end ) given nominal initial values x 0 = x(0) and parameters p.

Parameters evaluation and point estimates
The subset of unknown parameters can be estimated using the experimental dataset by solving the inverse problem. Typically, not all the variables x are directly measured and observableŝ y i ðtÞ denote experimentally accessible quantities. The observables can be defined as function of x(t), set of additional parameters s (observation parameters) and random values usually representing measurement errors. An important case of measurement error is additive error with known variance:ŷ where ε i are the measurement errors and g i are observation functions, n is the number of measured components. The unknown parameters θ�{p,s,x 0 } can be estimated by fitting simulated values y i (θ,t) = g i (x(t,p,x 0 ), s) to experimental dataŷ i . Assuming the joint distribution of the measurement noise ε is known, the estimates of parametersθ are typically obtained with MLE approach [7]. It implies maximizing the probability of obtainingŷ i values, given the model with θ parameters. This is usually performed by minimizing the corresponding negative logarithm of the likelihood function (objective function): lðθÞ ¼ À 2log½LðθÞ� The exact choice of the likelihood function Λ(θ) is based on measurement error model. For additive error with known variance according to (2) it can be represented as sum of squared residuals: Here the double summation is performed over n -the number of measured components and k -the number of measured time points.ŷ ij denote experimental data points, y i (θ,t j )-simulated values andŝ 2 ij is the error variance. MLE provides the point estimatesθ for the unknown parameters θ but does not tell us anything about the uncertainty in θ estimates. Indeed, the estimated parametersθ may not be unique: another set of parameters may give the same objective function value or be very close to it. The accuracy of the estimates can be expressed by confidence intervals or confidence bands.

Profile Likelihood based confidence intervals
Confidence interval (CI) is an estimate of the unknown parameter which characterizes it by the range of values for particular confidence level α. The confidence interval is a better alternative to the point estimate because it gives more information about possible parameter values. A confidence interval with confidence level α for the parameter θ i is an interval defined by probability It is important to note that the definition uses the probability term. It implies constructing a confidence interval many times using numerous data samples, which is typically impossible. Researchers often use different asymptotic methods to estimate confidence intervals, which can produce different estimations [8].
Different methods of CI estimation may lead to different definitions of parameters' identifiability. Profile likelihood is one of the most common and robust ways to construct CIs and state practical identifiability of the estimated parameters [9] based on likelihood-ratio test. It implies constructing likelihood-based CIs by exploring l(θ) as a function of a single parameter θ i [10] l PL ðy i Þ ¼ min Corresponding confidence interval for an estimateŷ i with confidence level α is defined by where Δ α is α quantile of the χ 2 distribution if the likelihood ratio test is used,θ is the point estimate of the unknown parameters θ which corresponds to the minimum of l(θ). Confidence intervals estimation is the major goal of practical identifiability analysis.
According to [5] "a parameter estimateŷ i is practically non-identifiable, if the likelihood-based confidence region is infinitely extended in increasing and/or decreasing direction of θ i , although the likelihood (negative log-likelihood) has a unique minimum for this parameter".

Available methods
Two general numerical approaches to construct parameters profiles and PL-based CIs are currently developed and implemented in software packages [11][12][13]. They can be distinguished as stepwise optimization-based approaches and integration-based approaches. These approaches sequentially calculate l PL (θ i ) until the profile function reaches the threshold lðθÞ þ D a .
Stepwise optimization-based approaches are based on the definition of l PL (θ i ). They imply exploring the shape of l PL (θ i ) by making small steps from the minima y i ¼ŷ i in the increasing or decreasing direction and re-optimizing l(θ) for all θ j6 ¼i at each step of θ i . The smaller θ i steps the numerical algorithm takes while exploring l PL (θ i ) the more accurate the profiles are. At the same time, re-optimizing l(θ) at each θ i step may require thousands of likelihood function calls, which can be inacceptable for high dimensional ODE models. Progressive derivativebased [5] and linearly extrapolated stepping [11] have been proposed to make appropriate steps and more accurate profile estimations.
Integration-based approaches suggest obtaining θ i profile as a solution of the ODE system. The ODE system itself is derived from optimal conditions for constrained optimization of l(θ) defined in Lagrangian form. Potentially solving the modified ODE system should produce arg min y j6 ¼i ½lðθÞ�. However, numerical integration of these ODEs requires Hessian of the likelihood function, which is hard or impossible to compute in many real cases. A number of ideas have been proposed to relax the requirements and either approximate Hessian [14] or obtain it from adjoint sensitivity analysis [12].
Various numerical implementations of stepwise optimization-based and integrationbased approaches have been developed [13,15] CI endpoints can be obtained with these methods as sequence of optimizations or numerical integration steps, which is often unstable or computationally expensive. The success of these methods critically depends on the initial step choice, and calculations become even more expensive when parameter is not identifiable or has wider confidence interval than expected. Existing PL methods are mainly focused on visualizing the profiles and stating if the parameter is identifiable or non-identifiable. The accuracy of CI endpoints estimation is in general beyond the scope of these methods.

Algorithm
The current study presents a new approach for confidence intervals estimation and profile likelihood-based analysis of identifiability: Confidence Intervals estimated by Constrained Optimization (CICO). It addresses the above-mentioned difficulties of stepwise optimizationbased and integration-based PL implementations, namely computational effort, accuracy of CI endpoints estimation and algorithm termination criteria. The key idea of the method is to obtain CI endpoints and avoid the calculation of profiles as the most computationally expensive part of the analysis.

Method rationale
According to [10] for a given significance level α CI a;y i endpoint values y � i ¼ fy L i ; y U i g can be found as solutions of the system of m equations: where j = 1,. . .,i−1,i+1,. . .,m; m is the number of parameters, and l � a ¼ lðθÞ þ D a in terms of (6).
Modified version of Newton-Raphson algorithm is proposed in [10] to solve (7) and obtain y � i . Here we propose a different approach to solve (7) based on constrained optimization. Assuming there exists a solution of (7) and l(θ) possesses derivatives at θ � , we can denote @l @y i θ � ð Þ ¼ s.
1. In case s<0, we can multiply the right and left side of Eq (7) by a positive parameter m ¼ À 1 s > 0 and rewrite the system in the following form: or using matrix notation: Note, that c T θ is a hyperplane with normal vector c T : The system (8) states the necessary optimality conditions (Karush-Kuhn-Tucker conditions) at θ � for the following Lagrangian function: which refers to minimization of target function f(θ) = c T θ = θ i with inequality constraint lðθÞ À l � a � 0. The minimal θ i value is the lower CI endpoint y L i : 2. Likewise, in case s>0 we can denote m ¼ 1 s > 0 and apply the similar transformations to the system (7) to obtain optimality conditions for Lagrangian function: which refers to minimization of target function f(θ) = −θ i with inequality constraint lðθÞ À l � a � 0 and. The maximal θ i value is the upper CI endpoint y U i : 3. @l @y i θ � ð Þ ¼ s ¼ 0 is a special case. In this case rl(θ � ) = 0 and θ � is a stationary point of l(θ) which can be a solution of (7) but does not satisfy (8). Theoretically, the CICO algorithm excludes this case and additional assumption @l @y i θ � ð Þ 6 ¼ 0 should be made for (7) and (8) to be equivalent. In practice, exact equality @l @y i θ � ð Þ ¼ 0 can hardly happen and derivatives close to zero can be handled by lowering the tolerance of the chosen optimizer and ODE solver.

Interpretation
In the previous section we have reformulated the problem of confidence intervals estimation in the terms of constrained optimization. This approach has a clear geometrical interpretation. We are looking for tangent hyperplanes to the confidence region CR a ¼ fθ : lðθÞ À l � a � 0g, which correspond to the minimal and maximal feasible θ i . For θ2R 2 the approach can be illustrated by Fig 1. The contour lines reflect confidence regions for different l � a values. (A) plot stands for identifiable case and (B) for non-identifiable. In identifiable case (A) each confidence region is limited. Hence, corresponding confidence intervals CI a;y i have finite endpoints. In non-identifiable case (B) confidence intervals for parameter θ 1 is infinite and confidence interval for θ 2 has no finite upper endpoint. CI endpoints were calculated using CICO method.

Scan bounds and termination criteria
All PL-based approaches: stepwise optimization, integration-based algorithm and CICO imply exploring θ space by calculating an objective function l(θ) at different θ points. For a given parameter θ i no a-priori information about its identifiability is usually available. In case θ i is identifiable we can expect that the profile will intersect with the threshold. In contrast, to state parameter's non-identifiability we have to check all θ i feasible values, which can be the whole R space. The definition of practical non-identifiability [9] requires exploration of the whole θ i domain but in practice it is never performed. Due to the limitations of computational resources a limited region of θ i is often utilized in practice for general identifiability analysis.
To address the discrepancy between identifiability definition and its practical application the numerical implementation of CICO proposes the notion of scan bounds ðy BL i ; y BU i Þ which represent feasible parameters' values. The scan bounds may be selected based on biologically acceptable values or available computational resources. In practice this approach was utilized by researchers implicitly but the bounds were not used for algorithms termination criteria.
The proposed scan bounds naturally suggest the notion of practical identifiability within the bounds. We will call a parameter "practically identifiable within the bounds" if its whole confidence interval for a particular confidence level α is located inside the pre-defined scan bounds, i.e. ½y L i ; y U i � � ðy BL i ; y BU i Þ. If the condition is not satisfied, i.e. 9y � i 2 ½y L i ; y U i �, but y � i 2 ðÀ 1; y BL i � [ ½y BU i ; þ1Þ we will call this parameter practically non-identifiable within the bounds.
It is necessary to note that the PL-based confidence intervals may be asymmetric relative tô θ in contrast to asymptotic confidence intervals. In some cases CIs have finite endpoint in one direction and infinite endpoint in another. In practice it is reasonable to analyze the identifiability of lower and upper sides separately. The definition of identifiability within the bounds is utilized in the CICO implementation.
If lower or upper CI endpoint is present within the scan bounds ðy BL i ; y BU i Þ the algorithm converges to the endpoint with preset tolerance. If one of confidence interval's point is found out of scan bounds ðy BL i ; y BU i Þ the algorithm terminates and the appropriate message is displayed.

Software implementation: LikelihoodProfiler
We provide an implementation of CICO algorithm in an open source free package Likelihood-Profiler https://github.com/insysbio/LikelihoodProfiler.jl written in Julia language [16]. The package was also translated to free open source package in Python https://github.com/ insysbio/LikelihoodProfiler.py. LikelihoodProfiler allows the user to perform CI estimation and state parameter's identifiability. The main function exposed to the end-user is get_interval which calculates the upper and lower CI endpoints for the selected parameter θ i . Currently the CICO implementation depends on NLopt package [17] and the user can choose any suitable optimization algorithm from this package.
To test parameters' identifiability the user should provide loss_func which is the likelihood function of unknown parameters θ. The function is expected to be based on MLE approach. The user should also set theta_init which is the initial values of parameters which are typically (but not necessary) the optimal valuesθ obtained by fitting parameters to experimental data. Other mandatory settings are loss_crit, which denotes l � a ¼ lðθÞ þ D a and index denoting the parameter of interest in vector. The user may also set scan_bounds which is the feasible θ i range ðy BL i ; y BU i Þ, or use the default values (1e-9, 1e9). The following Julia code loads LikelihoodProfiler package and evaluates theta endpoints for likelihood function l(theta). The implementation utilizes two termination criteria, which address two possible situations. In case there is a confidence interval endpoint within the scan_bounds, optimization stops when the algorithm converges to the endpoint with the preset tolerance and BORDER_-FOUND_BY_SCAN_TOL message is displayed. In case the algorithm doesn't find any feasible point above the threshold the algorithm stops with SCAN_BOUNDS_REACHED message.
The algorithm can also work in transformed space (log or logit) which can speed up the optimization process for complex nonlinear models. An optional argument scale of get_interval function can set search space for each parameter individually. It supports three options:: direct,: log,: logit with default scale set to: direct for all parameters. The package also includes a set of useful tools for visualization.
Internally LikelihoodProfiler uses Augmented Lagrangian algorithm [18,19] from NLopt package [17], which implies combining the objective function and the constraint into a single function. Then the augmented objective function with no constraints is passed to an optimization algorithm. Augmented Lagrangian implementation used in the package was proved to converge to KKT points [18]. The optimization of the augmented objective function can be performed with any gradient-based or derivative-free algorithm including global optimization methods.

Validation: The cancer taxol treatment model
Here we provide identifiability analysis of the cancer taxol treatment model [20]. Though the primary goal of this analysis is to verify CI endpoints computed with CICO, we also provide performance estimations of CICO algorithm vs. original implementation [20]. The original Matlab code is based on stepwise-optimization approach which implies recovering the whole parameters profile to obtain CI endpoint values (https://github.com/marisae/cancer-chemoidentifiability).
The taxol treatment model is defined by the set of ODEs with three state variables, five unknown parameters (a0, ka, r0, d0, kd), dosage regime and experimental data. The unknown parameters have been fitted to experimental data and their estimated values were taken from original Matlab implementation. Even though the model is structurally identifiable, practically available experimental data, as it was shown [20], is insufficient to recover all the unknown parameters.
The same authors provide an open GitHub repository with Matlab implementation of the taxol treatment model (https://github.com/marisae/cancer-chemo-identifiability). This implementation was used to verify the results obtained by CICO algorithm. The repository includes Matlab script for a0 identification. We have adapted this script to estimate CI for other four unknown parameters (ka, r0, d0, kd). No changes were made to the original Matlab code with the exception of counters, which were added to count the number of likelihood function calls the algorithm makes until it reaches the threshold. Internally the Matlab implementation uses lsqcurvefit function for fitting.
To run identifiability analysis with LikelihoodProfiler package the taxol treatment model was rewritten in Julia language. To make the numerical simulations comparable with original Matlab implementation Julia's analogue of Matlab ode23s solver Rosenbrock23 from Differen-tialEquations.jl package [21] was used with the same tolerances setup: relative 1e-3, absolute 1e-6. Search bounds for all unknown parameters were set to (1e-3,1e3). CICO CI endpoints were estimated with Nelder-Mead derivative-free solver from NLopt package.
CI endpoints estimated with CICO ( Table 1) correspond with the values obtained in the original code.
As most of computational efforts in "profiling" approach are focused on solving ODEs with different parameters' sets, the performance of the algorithms was measured by the number of likelihood function calls ( Table 1) the algorithm makes until it reaches (or converges to) the endpoint. In the taxol treatment model each likelihood function computation requires solving ODE system four times for four different treatment doses.
In general, CICO needs less likelihood function evaluations than stepwise optimizationbased profiling to converge to endpoint value. Efficacy of CICO is especially evident in nonidentifiable cases. This is due to the constraints incorporated in the objective function as a penalty part. It starts to penalize the algorithm only when optimizer gets near to the threshold, which doesn't happen in many non-identifiable cases where profiles are flat. Fig 2 illustrates the search path of stepwise "profiling" and CICO for identifiable a0 parameter and non-identifiable kd parameter. Stepwise-optimization tends to follow the profile path while CICO algorithm doesn't require the intermediate points to lie on the profile, which leads to fewer likelihood-function calls.
Validation: STAT5 dimerization model STAT5 Dimerization Model [22] consists of eight state variables, nine parameters and experimental dataset. It is proposed as one of the benchmark models in dMod simulation package [13]. We have translated the model from PEtab format used by dMod into Julia. The model's files include best-fit parameter values, which were taken as initial values for identifiability analysis. The boundaries for parameters deviance were set according to PEtab data to (1e-5,1e5). We have reproduced the identifiability analysis of the model in R with dMod and in Julia with LikelihoodProfiler.
dMod implements integration-based approach to parameters identification, according to which parameters' profiles are obtained as a solution of ODE system. This approach mentioned in Section 2.4 (Available methods) relies on first derivatives of the likelihood function and Hessian approximation. To ensure the integration accurately follows the profile path each point proposed by integration step can be used as the initial point for optimization. This option is controlled by method ="optimize" setting. In case of STAT5 Dimerization Model we have used the"optimize" method because default"integrate" method had not produced all the profiles due to Hessian-related issues. We have added iteration counter to R code to count likelihood function calls. dMod stops the profile integration when it intersects the threshold or when parameter bounds are reached. Hence, CI endpoints are reported as intervals with average width approximately equal to 3e-2 ( Table 2).
This allowed us to set tolerance of endpoint estimation in LikelihoodProfiler scan_tol = 1e-2. To make Julia simulations close to deSolve.lsoda used in dMod we have chosen LSODA differential equations solver (supported by DifferentialEquations.jl) with the same tolerance setup: relative 1e-7, absolute 1e-7. Nelder-Mead derivative-free solver from NLopt package was used to estimate CI endpoints.
Taking into account the difference of the underlying optimizers, the endpoints reported by LikelihoodProfiler correspond to the values obtained in dMod. The performance of each package was measured by the number of likelihood function evaluations and time required to compute CI endpoints. The results indicate the efficiency of CICO, which on average overperforms integration-based approach implemented in dMod even though dMod relies on model's functions compiled to C. Only for k_exp_hetero parameter dMod "optimize" method has recorded fewer likelihood function calls. Timings indicate significant practical efficacy of both CICO and Julia language for this task.
The detailed identifiability analysis of the Taxol treatment model and STAT5 dimerization model, the source code as well as other use-case models' identifiability analyses are published on our GitHub repository (https://github.com/insysbio/likelihoodprofiler-cases).

Discussion
A number of recent studies have demonstrated that profile likelihood-based methods are efficient to analyze identifiability of the parameters reconstructed on the basis of experimental data. In the absence of identifiability analysis one can never be certain how reliable parameters estimations and how accurate the model predictions are. However, practical usage of profile likelihood-based methods has not become a standard routine yet due to a number of challenges.
Indeed, profile likelihood-based methods are computationally demanding. Progressive stepping and other optimizations of the basic profile likelihood approach impose restrictions on the likelihood function (such as the need to calculate gradients) and limits the set of the applicable optimization methods. The CICO algorithm attempts to solve this problem by replacing multiple calculations of the likelihood function with constrained optimization. For each individual parameter only two optimization iterations are required to calculate the lower and upper CI endpoints. CICO doesn't require the gradient of the likelihood function and allows the user to choose derivative-free or gradient-based optimization algorithm. Other challenges originate from uncertainty in practical non-identifiability definition. It is implied that researchers have to scan sufficiently wide but finite intervals to state a non-identifiable case. In practice it is usually performed by visualizing the profiles on a chosen interval and extrapolating profiles behavior to the global parameters feasible region. In the current study we have proposed a formal criteria of the algorithm termination, utilizing the scan bounds notion, which can automate the analysis process and get rid of subjectivity.
The numerical experiments have demonstrated that confidence intervals obtained with CICO algorithm coincide with the results reported in the publications. As it was shown, on average the algorithm overperforms considered above optimization-based and integrationbased PL implementations. This comparison was performed with the default solver settings and can possibly be optimized for greater efficiency. Moreover, the optimization-based PL approach doesn't converge to the endpoint, while the CICO algorithm was developed to accurately estimate CI endpoints. Hence a more thorough comparison of the algorithms is difficult, since the termination criteria of the optimization-based PL doesn't take into account the accuracy of CI endpoints estimation.
To compare the methods we have measured efficacy in terms of elapsed time and likelihood function calls required to obtain CI endpoints. In general, CICO implementation in Likeli-hoodProfiler is about 100 times faster than dMod integration-based approach (R) and optimization-based method (Matlab). However, it is important to note that timings highly depend on the programming language, optimization method and ODE solver used while the number of likelihood function evaluations is a language independent measurement, though it also is affected by the efficacy of optimization algorithm and ODE solver.
In addition to confidence intervals, other interval estimates may also be of interest: confidence n-dimensional parameters' regions, prediction bands, etc. The CICO algorithm usage can be potentially expanded to calculate these generalizations of confidence intervals, and we plan to test its use for these classes of tasks in our future studies.