When the Optimal Is Not the Best: Parameter Estimation in Complex Biological Models

Background: The vast computational resources that became available during the past decade enabled the development and simulation of increasingly complex mathematical models of cancer growth. These models typically involve many free parameters whose determination is a substantial obstacle to model development. Direct measurement of biochemical parameters in vivo is often difficult and sometimes impracticable, while fitting them under data-poor conditions may result in biologically implausible values.

Results: We discuss different methodological approaches to estimating parameters in complex biological models. We make use of the high computational power of the Blue Gene technology to perform an extensive exploration of the parameter space in a model of avascular tumor growth. We explicitly show that the landscape of the cost function used to fit the model to the data has a very rugged surface in parameter space. This cost function has many local minima with unrealistic solutions, including the global minimum corresponding to the best fit.

Conclusions: The case studied in this paper is one example in which the model parameters that optimally fit the data are not necessarily the best ones from a biological point of view. To avoid force-fitting a model to a dataset, we propose that the best model parameters should be found by choosing, among suboptimal parameters, those that match criteria other than the ones used to fit the model. We also conclude that the model, data and optimization approach together form a new complex system, and point to the need for a theory that addresses this problem more generally.


Model Behavior
Although the model does not explicitly include the saturation of the tumor radius r_bd at a final stable value, varying the parameter δ allows the model to reach saturation. This parameter sets the relationship between dead and living cell volume: with δ = 1 both volumes are equal and the tumor grows indefinitely, whereas with δ = 0 the dead cell is absorbed by the tumor, and eventually secreted, leaving free space to be occupied by a nascent cell. In the latter case, when cell death compensates cell division, a saturation state can be reached. The three possible regimes for the evolution of the tumor radius as a function of time are illustrated in Fig. (S1): depending on the combination of parameters, the radius increases linearly, saturates, or decreases.

Figure S1: Avascular tumor growth model trends. Simulations were run using different parameter values, yielding three possible growth evolutions: (a) an exponential initial growth phase followed by indefinite linear growth, (b) an exponential initial growth phase followed by linear growth and then saturation, and (c) a tumor of shrinking size.
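The three regimes can be reproduced by a deliberately simplified toy equation (this is an illustration, not the paper's model; the rates b and d and the functional form are assumptions): suppose a proliferating rim adds volume at a rate proportional to R², while for δ < 1 dead-cell volume is freed at a rate proportional to R³, giving dR/dt = [b − (1 − δ) d R]/3.

```python
def radius_trajectory(r0, b, d, delta, dt=0.01, steps=2000):
    """Toy illustration (NOT the paper's model): Euler integration of
    dR/dt = (b - (1 - delta) * d * R) / 3.
    delta = 1 gives indefinite linear growth; delta < 1 gives saturation
    at R* = b / ((1 - delta) * d), or shrinkage if R starts above R*."""
    r, traj = r0, [r0]
    for _ in range(steps):
        r += dt * (b - (1.0 - delta) * d * r) / 3.0
        traj.append(r)
    return traj

linear = radius_trajectory(1.0, b=3.0, d=1.0, delta=1.0)      # grows without bound
saturating = radius_trajectory(1.0, b=3.0, d=1.0, delta=0.0)  # approaches R* = 3
shrinking = radius_trajectory(5.0, b=3.0, d=1.0, delta=0.0)   # starts above R*, decreases
```

The fixed point R* only exists when δ < 1, which mirrors the qualitative role δ plays in the full model.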

Parameter Estimation
The parameters included in the tumor growth model are δ, σ, β, c_c, c_d and c_bd, which are collected in a 6-dimensional array θ, with components (δ, σ, β, c_c, c_d, c_bd). In order to obtain numerical values for the model parameters, we minimize the differences between our tumor growth model and the data by finding the minima of the quadratic cost function [1]:

χ²(θ) = Σ_k [y_k(θ) − data_k]²,

where y_k(θ) is the model's prediction for observation k, which depends on the parameters θ, and data_k represents the experimentally measured data value (in our case, synthetic data) at observation k. In our particular application, synthetic experimental observations are taken at times t_k, and the value data_k corresponds to the radius of the spheroid at time t_k. The model prediction for the spheroid radius at time t_k is denoted by y_k(θ), and the sum runs over all time points t_k. For future reference, the residuals vector f is defined as f_k = y_k(θ) − data_k.
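As a minimal sketch, the cost function can be written as follows (the linear toy model and the data values below are hypothetical stand-ins, not the paper's tumor model):

```python
def chi_squared(theta, model, data, times):
    """Quadratic cost chi^2(theta) = sum_k (y_k(theta) - data_k)^2,
    built from the residuals f_k = y_k(theta) - data_k."""
    residuals = [model(theta, t) - d for t, d in zip(times, data)]
    return sum(r * r for r in residuals)

def toy_model(theta, t):
    # Hypothetical stand-in for the tumor model: radius = slope * t + intercept.
    return theta[0] * t + theta[1]

times = [0.0, 1.0, 2.0]   # synthetic observation times t_k
data = [1.0, 2.0, 3.0]    # synthetic spheroid radii data_k
print(chi_squared((1.0, 1.0), toy_model, data, times))  # perfect fit -> 0.0
```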
The methods used for the minimization of the cost function are described below.
• Levenberg-Marquardt Introduced by Levenberg [2] and rediscovered by Marquardt [3], this method (LM) belongs to the optimization family called descent methods [1]. One of the most famous members of this family is the Gauss-Newton method [1], which computes the descent direction from the Jacobian of the residuals. Like the Gauss-Newton approach, LM uses this Jacobian to find the descent direction, but incorporates a damping parameter µ. Let J be the Jacobian of the residuals f_k(θ) = y_k(θ) − data_k, i.e. J_ki = ∂f_k/∂θ_i, and let h be the descent direction; then

(JᵀJ + µI) h = −Jᵀf.

With large values of µ, LM is close to the steepest descent method, which converges faster when the current parameter values are far from the solution. For small values of µ, the method is close to the Gauss-Newton method, which is generally good in the final steps of the search. The strength of the LM method is that it adapts the value of µ to behave closer to either the Gauss-Newton or the steepest descent method. We followed Ref. [4] for the implementation of this method.
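A single damped step can be sketched for a one-parameter toy problem, where the linear system above reduces to a scalar division (the model and data are assumptions for illustration; the paper's implementation follows Ref. [4]):

```python
def lm_step(theta, mu, model, dmodel, data, times):
    """One Levenberg-Marquardt step for a scalar parameter theta.
    Solves (J^T J + mu) h = -J^T f, with residuals f_k = y_k(theta) - data_k
    and Jacobian J_k = df_k/dtheta. Large mu ~ steepest descent,
    small mu ~ Gauss-Newton."""
    f = [model(theta, t) - d for t, d in zip(times, data)]
    jac = [dmodel(theta, t) for t in times]
    jtj = sum(j * j for j in jac)
    jtf = sum(j * r for j, r in zip(jac, f))
    return theta - jtf / (jtj + mu)

def model(theta, t):   # toy linear model y = theta * t
    return theta * t

def dmodel(theta, t):  # its derivative with respect to theta
    return t

times = [1.0, 2.0, 3.0]
data = [2.0, 4.0, 6.0]  # generated with theta = 2
theta = 0.0
for _ in range(20):
    theta = lm_step(theta, mu=0.1, model=model, dmodel=dmodel,
                    data=data, times=times)
print(round(theta, 4))  # -> 2.0
```

In the multi-parameter case the division becomes a solve of the (JᵀJ + µI) linear system; in practice µ is adjusted from step to step depending on whether the cost decreased.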
• MIGRAD This is another well-known descent method, included in the MINUIT package developed at CERN [5]. It implements the Davidon-Fletcher-Powell (DFP) method, which belongs to a group of methods variously called quasi-Newton methods. Given a point θ_k in parameter space, the Newton step generates the search direction ∆θ_k = −H_k ∇χ²(θ_k), where H_k is the k-th estimate of the inverse of the Hessian matrix of χ²(θ), i.e. of (∇²χ²)_ij = ∂²χ²/∂θ_i∂θ_j. Quasi-Newton methods avoid computing the Hessian directly and successively estimate its inverse using only gradients of the cost function. With the step s_k = ∆θ_k and the gradient change γ_k = ∇χ²(θ_k + ∆θ_k) − ∇χ²(θ_k), the update rule for the inverse Hessian in the DFP method is

H_{k+1} = H_k + (s_k s_kᵀ)/(s_kᵀ γ_k) − (H_k γ_k γ_kᵀ H_k)/(γ_kᵀ H_k γ_k).

• Downhill Simplex This is a direct search method for function minimization introduced in 1965 [6,7]; we used the implementation included in the MINUIT package. The Nelder-Mead method operates on a simplex, defined as the convex hull of n + 1 vertices x_0, ..., x_n, e.g. a triangle in R² or a tetrahedron in R³. Starting from an initial simplex, the method performs a sequence of transformations of the working simplex S that decrease the function values at its vertices, computing at each step one or more test points together with their function values. This method computes neither the gradient nor the Hessian of the cost function.
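To make the DFP update concrete, here is a minimal sketch applied to a toy quadratic cost (2-D for readability; the paper's θ is 6-dimensional, and MIGRAD's actual implementation lives in MINUIT):

```python
def dfp_update(H, s, y):
    """DFP update of the inverse-Hessian estimate H, given the step
    s = theta_{k+1} - theta_k and the gradient change
    y = grad_{k+1} - grad_k:
    H' = H + (s s^T)/(s^T y) - (H y y^T H)/(y^T H y)."""
    n = len(s)
    Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
    sy = sum(s[i] * y[i] for i in range(n))
    yHy = sum(y[i] * Hy[i] for i in range(n))
    return [[H[i][j] + s[i] * s[j] / sy - Hy[i] * Hy[j] / yHy
             for j in range(n)] for i in range(n)]

# Toy quadratic chi^2 = theta_0^2 + 2*theta_1^2 has Hessian diag(2, 4);
# gradient changes along the two axes recover the exact inverse Hessian.
H = [[1.0, 0.0], [0.0, 1.0]]
H = dfp_update(H, s=[1.0, 0.0], y=[2.0, 0.0])
H = dfp_update(H, s=[0.0, 1.0], y=[0.0, 4.0])
print(H)  # [[0.5, 0.0], [0.0, 0.25]], the exact inverse Hessian
```

For a quadratic cost, updating along linearly independent directions reconstructs the inverse Hessian exactly, which is why quasi-Newton methods behave like Newton's method near a minimum.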
• Parallel Tempering This is a physics-inspired method for parameter-space sampling and optimization introduced in [8]; our implementation is based on [9]. In Parallel Tempering (PT), multiple independent replicas of the system are evolved, each at a different "temperature", and for the thermodynamic analogy the energy function is taken to be the cost function. High-temperature replicas can sample a large volume of the parameter space, while low-temperature replicas explore the local landscape. Neighboring simulations (i.e., replicas at adjacent temperatures) may exchange configurations, subject to the Metropolis acceptance criterion, which depends on the temperatures and cost-function values of the configurations to be exchanged. Note that the difference between parallel tempering and simulated annealing [10] is that the former allows fast sampling of a large space, whereas the latter tends to locally explore a smaller area of that space. If the lowest temperature of parallel tempering is sufficiently low, then PT can be considered both a method for global search (through the replicas at higher temperatures) and for local optimization (through the replicas at lower temperatures).
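The replica-exchange criterion can be sketched as follows (a minimal illustration of the Metropolis swap test only; the per-replica evolution at fixed temperature and the choice of temperature ladder are omitted):

```python
import math
import random

def pt_swap_accept(beta_cold, beta_hot, cost_cold, cost_hot, rng=random):
    """Metropolis test for exchanging configurations between neighboring
    replicas at inverse temperatures beta_cold > beta_hot, with the cost
    function playing the role of the energy. Accept with probability
    min(1, exp((beta_cold - beta_hot) * (cost_cold - cost_hot)))."""
    delta = (beta_cold - beta_hot) * (cost_cold - cost_hot)
    return delta >= 0.0 or rng.random() < math.exp(delta)

# If the hot replica has found a lower cost, the swap is always accepted,
# pulling good configurations down toward the low-temperature replicas.
print(pt_swap_accept(beta_cold=2.0, beta_hot=0.5,
                     cost_cold=5.0, cost_hot=1.0))  # True
```

Swaps in the other direction are accepted only probabilistically, which is what lets the cold replicas refine good solutions while the hot replicas keep exploring.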

Growth Curves
In Fig. (S2) we show the growth curves corresponding to the minima found by the PT and LM methods, with the Gompertz curve used as surrogate experimental data. It is evident that all of these minima generate growth curves that are good matches to the experimental data, at least qualitatively. In Fig. (S3) we divided the sets of minima into seven clusters for each method, and highlighted the identity of the clusters by red squares. The clusters resulting from the LM optimization are tighter and more internally homogeneous than those resulting from PT. For each of the seven clusters and each method, the mean value of each parameter was chosen as the representative solution.
The resulting cluster centroids are listed in Tables S1 and S2 for the LM and PT methods, respectively.
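The representative solution of each cluster is simply the component-wise mean of its members; a minimal sketch (the 2-D minima below are hypothetical, whereas the paper's θ is 6-dimensional):

```python
def centroid(cluster):
    """Representative solution of a cluster of parameter vectors:
    the mean value of each parameter, taken across the cluster."""
    n = len(cluster)
    dim = len(cluster[0])
    return [sum(theta[i] for theta in cluster) / n for i in range(dim)]

# Hypothetical cluster of three 2-D minima found by an optimizer.
cluster = [[0.9, 0.1], [1.1, 0.3], [1.0, 0.2]]
print(centroid(cluster))  # approximately [1.0, 0.2]
```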