Geometric analysis enables biological insight from complex non-identifiable models using simple surrogates

An enduring challenge in computational biology is to balance data quality and quantity with model complexity. Tools such as identifiability analysis and information criterion have been developed to harmonise this juxtaposition, yet cannot always resolve the mismatch between available data and the granularity required in mathematical models to answer important biological questions. Often, it is only simple phenomenological models, such as the logistic and Gompertz growth models, that are identifiable from standard experimental measurements. To draw insights from complex, non-identifiable models that incorporate key biological mechanisms of interest, we study the geometry of a map in parameter space from the complex model to a simple, identifiable, surrogate model. By studying how non-identifiable parameters in the complex model quantitatively relate to identifiable parameters in surrogate, we introduce and exploit a layer of interpretation between the set of non-identifiable parameters and the goodness-of-fit metric or likelihood studied in typical identifiability analysis. We demonstrate our approach by analysing a hierarchy of mathematical models for multicellular tumour spheroid growth experiments. Typical data from tumour spheroid experiments are limited and noisy, and corresponding mathematical models are very often made arbitrarily complex. Our geometric approach is able to predict non-identifiabilities, classify non-identifiable parameter spaces into identifiable parameter combinations that relate to features in the data characterised by parameters in a surrogate model, and overall provide additional biological insight from complex non-identifiable models.


Introduction
Mathematical models play an important role in the interpretation of data and the design of experiments. The complexity of many experiments and biological systems means that parameters relating to key biological mechanisms cannot be directly measured, but are rather quantified through the calibration of mechanistic mathematical models to experimental observations [1,2]. Given that biological data are often limited and noisy, model parameters provide an objective means of quantifying observations and comparing behaviours across different types of experiments or different conditions within the same experiments [3,4]. Minimising, or at least quantifying, parameter uncertainties is, therefore, of paramount importance for effective interpretation of experimental results.
A critical step in the application of mathematical models to interpret biological experiments is that of model selection [5][6][7]. Complex models-traditionally associated with a large number of unknown parameters-have potential to provide insights about a correspondingly large number of biological mechanisms, but often result in large parameter uncertainties when calibrated to typical experimental data [8][9][10]. Conversely, simpler models-including canonical models such as the logistic and Gompertz growth models-typically involve parameters that can be tightly constrained by data, but provide limited direct mechanistic insight [11].
In practice, model selection is routinely guided by information criterion; statistical metrics that quantify model parsimony, the trade-off between model fit and model complexity [7,12]. One of many criteria used is the Akaike information criterion (AIC), given by measurements, related to the overall size of radius of spheroids, are typically available throughout an experiment. We generate synthetic radius measurements from a mathematical model of intermediate complexity (the Greenspan model with k = 4 parameters) that was recently validated against experimental data for the first time [13,14]. We corrupt measurements with normally distributed measurement noise with standard deviation σ and attempt to distinguish between a range of spheroid growth models, with complexity ranging from the logistic growth model (k = 2) [15] to the complex multiphase spatial model of Ward and King (k = 8) [16,17]. In Fig 1B we set σ = 20 μm and in Fig 1C we vary σ. Once calibrated, all models lead to predictions of the spheroid radius that are visually indistinguishable (Fig 1B), and all except for Ward and King's model are indistinguishable using AIC for a sufficiently large, and biologically realistic, noise standard deviation (Fig 1C). Full mathematical details of all models are given in Models.
Aside from being unable to distinguish between models in the tumour spheroid example, criterion-based choices cannot account for the biological question-or more specifically, the biological mechanisms-of interest. For example, the logistic growth and Gompertz growth models produce an excellent match to synthetic tumour spheroid data and quantify behaviour in terms of a growth rate parameter and long-time limiting spheroid size. However, these  [18], harvested, and imaged using confocal microscopy at various time points. Cells are transduced with fluorescent cell cycle indicators, showing cells in gap 1 (purple) and gap 2 (green). From day 7, a necrotic core void of living cells is evident in the spheroid centre. (B) Synthetic spheroid data generated from Greenspan's model [13] (black discs) with additive normal noise with standard deviation σ = 20 μm (red diamonds). (B-C) Several mathematical models, including the Greenspan model, are able to match synthetic data. (C) AIC results for the model fitting exercise in (B) repeated over several values of the noise standard deviation. Shown is the mean and standard deviation from 100 repeats for each model. (D) Spectrum of the observed Fisher information matrix. Eigenvalues are shown on the log-scale and scaled such that the spectral radius is unity.
https://doi.org/10.1371/journal.pcbi.1010844.g001 models cannot provide information relating to the mechanisms that govern growth or determine the long-time limiting spheroid size; mechanisms such as sensitivity to and availability of oxygen and other essential nutrients. More recently, the mathematical modelling literature has moved toward tools such as parameter identifiability analysis to guide model selection [19][20][21]. Identifiability analysis can determine if model parameters are identifiable and can be estimated from data; both in a theoretical noise-free data limit (structural identifiability) [22][23][24][25], and in the more realistic case of a finite amount of noisy data (practical identifiability) [19,26]. In comparison to model selection criterion like AIC, identifiability analysis provides information about the identifiability of individual model parameters. While a complex model may have a large number of non-identifiable parameters and a high AIC value, it may still prove useful provided the parameters of interest (for example, the oxygen sensitivity) are identifiable.
In the vicinity of the MLE, the identifiability of model parameters can be assessed using the local curvature of the expected log-likelihood function, also known as the Fisher information matrix (FIM), denoted IðpÞ 2 R k�k . The FIM is a k × k positive semi-definite matrix that quantifies the amount of information about the parameters contained in the data, and has both a statistical and geometric interpretation. Statistically, the inverse of the FIM provides a lowerbound on the covariance of parameter estimates. Therefore, a FIM that is singular corresponds to at least one model parameter that can only be estimated with infinite variance and, therefore, cannot be determined from data. Geometrically, the FIM is related to the Hessian of the log-likelihood function and therefore contains information about the directions in parameter space in which the log-likelihood (and therefore the model) is sensitive and directions in which the log-likelihood is insensitive [27]. Specifically, the eigenvalues of the FIM correspond to the curvature in the direction of the corresponding eigenvectors; eigenvectors associated with zero or near-zero eigenvalues correspond to directions in parameter space (also referred to as eigenparameters) to which the model output is insensitive [28,29]. Conversely, eigenvectors associated with relatively large eigenvalues give informative directions; the directions to which the model is most sensitive. So-called analysis of model sloppiness is concerned with studying the spectrum of the FIM to determine the number of sloppy, or insensitive, eigenparameters in a model [8,[30][31][32]. To demonstrate, in Fig 1D we show the spectrum of the FIM for each tumour spheroid model. As the relative difference between eigenvalues is scale-dependent, it is difficult to interpret results from the two parameter models. However, results for the Greenspan and Ward and King models show two disparate clusters of eigenvalues, indicating a group of informative directions (corresponding to eigenvalues that are relatively large), and a group of uninformative or sloppy directions (corresponding to eigenvalues that are closer to zero). For the Greenspan model, the single insensitive direction identified from analysis of model sloppiness corresponds to a one-dimensional manifold (i.e., a curve) in parameter space along which the parameters can be identified. At the core of identifiability and sloppiness analysis is that data are unable to constrain the model parameter space to a point estimate, but rather a one-or higher-dimensional manifold [33]. Of practical application, analysis of this manifold allows for model reduction, where the number of parameters in a model can be reduced by pre-constraining or removing sloppy eigenparameters without significantly reducing the predictive power of a model [34,35]. However, to date, analysis of the interrelationship between models using the model parameter manifold has been constrained to simpler models nested within a complex model; that is, where simpler models can be recovered by placing constraints on the parameters in the complex model, for example by setting certain parameters to zero. Examples of nested models include recovering the logistic growth model from the Fisher-Kolmogorov model by assuming the population is well mixed [36], and recovering the Gompertz or logistic growth models from Richards growth model by constraining the shape parameter [21]. Moreover, the FIM is based on the expected log-likelihood, a one-dimensional measure of overall model fit that determines manifolds in parameter space to which parameters are constrained by data or to which the model output is insensitive. FIM-based tools cannot, therefore, provide information about how features of the model output change with parameters.
Our contribution is to study models with non-identifiable parameters using identifiable models that produce quantitatively similar behaviour; models that may be indistinguishable from information-criterion based analysis. To study the interrelationship between parameters in any two models (nested or non-nested), we define model equivalence in the least-squares sense, and study the associated map from the parameters in a complicated, possibly heavilyparameterised and non-identifiable model, to parameters in a simpler, identifiable model ( Fig  2B). For example, we study identifiability of mechanistic ordinary and partial differential equation (ODE and PDE) models of tumour spheroid growth-relatively complicated models containing parameters quantifying nutrient sensitivities, oxygen diffusion, and oxygen consumption-through simple models like the well known logistic and Gompertz growth models that do not explicitly incorporate biophysical mechanisms that influence growth, but rather describe behaviour with largely phenomenological parameters such as the early-time growth rate and long-time limiting spheroid size.
We demonstrate our framework through identifiability analysis of tumour spheroid data. Noisy measurements relating to the outer radius of tumour spheroids are collected ( Fig 1A) and quantified with models ranging from the phenomenological logistic growth model, to detailed spatial models involving coupled nonlinear PDEs which require experimental measurements in addition to spheroid radius to parameterise [14]. We work with synthetic data generated from a model of intermediate complexity, the Greenspan model (Fig 1B), and present a series of new and existing models from the literature that produce similar agreement with the data. Initially, we focus on models with a small number of parameters so that model equivalence manifolds can be visualised in R 3 . Subsequently, we study Ward and King's model, a model with a large number of parameters for which we must rely on non-graphical means, such as the sensitivity matrix and the Jacobian of the model link, for analysis. Aside from the requirement that the Jacobian of model outputs with respect to parameters be available, which may limit our analysis to primarily to models that are deterministic, we expect our methodology to generalise to any hierarchy of models in biology and systems biology.

Models
We study a hierarchy of mathematical models that describe the time-evolution of tumour spheroid radius. Such models have a long history in the mathematical biology literature, and range from simple sigmoid growth models [11], to models that describe the spatial distribution of cells in spheroids and the eventual saturation of growth due to nutrient deprivation and mass loss due to necrosis in the tumour core [17]. We generate synthetic data using the canonical Greenspan model [13] [14,18]. Therefore, we treat the Greenspan model as the true model, and the corresponding set of parameters as the true parameters. In this section, we present the mathematical models we use for analysis.
For all models, we denote the spheroid radius by R(t), and fix the initial spheroid size R(0) = R 0 = 10 μm as a known parameter that can be directly measured from data. The choice of R 0 = 10 μm � R max is made to ensure that the simple models we consider are identifiable. We consider that data comprise of spheroid radius measurements at discrete observation times The formulation in Eq (2) is highly flexible: for example, we could incorporate model predictions relating to the inner structure of the spheroid by appending elements to the end of m i (p).
In this work, we set T = [0, 1, 2, . . ., 21] ⊺ d, as shown in Fig 1B, and denote R i (t;p) predictions of the spheroid radius at time t from model i. In this work we take great care to connect our mathematical models with experimental data by working with a combination of dimensional and non-dimensional quantities. We are motivated by experimental data of tumour spheroids that comprise observations of spheroid radius over a period of approximately 16 days [14,18]. However, standard imaging techniques do not provide measurements of cell densities or nutrient concentrations. Therefore, we non-dimensionalise dependent variables relating to cell densities and nutrient concentrations, and leave the independent variables related to measurable time and space scales dimensional.

Model 1. Greenspan's model (k = 4)
First, we consider the canonical Greenspan model [13] that describes spherically symmetric spheroid growth due to cell proliferation dependent on a nutrient (such as oxygen) that diffuses into the spheroid from the surrounding medium ( Fig 3A). We have previously validated the Greenspan model against experimental data [14]. Spheroid growth progresses through the three phases observed in experimental data ( Fig 1A; [18,37,38]). First, if the initial spheroid size is sufficiently small, spheroids progress through an exponential growth phase, where nutrient is available throughout the spheroid above the minimum threshold concentration required for cell proliferation. We denote the critical concentration as c 1 = ω 1 /ω 1 , where ω 1 mol μm −3 is the threshold concentration required for cell proliferation, and ω 1 mol μm −3 is the concentration in the surrounding medium and at the spheroid boundary. During this first phase, cells proliferate exponentially at a per-unit-volume rate of λ d −1 .
Fluorescent cell cycle indicators indicate that spheroids eventually enter a second phase of growth where cell proliferation in the centre of the spheroid is inhibited such that cells remain viable but enter cell cycle arrest. We make the standard assumption that this is due to the nutrient concentration falling below the relative concentration c 1 at the spheroid centre, but remaining above the relative concentration required for cell viability, denoted by c 2 (Fig 3A). During this second phase, cells located close to the spheroid periphery remain proliferative since the nutrient density is sufficiently high in this region.
Finally, spheroids progress to a size such that the nutrient concentration at the centre of the spheroid is lower than that required for viability. Cells in the centre of spheroids die, leading to the formation of a necrotic core and resulting in a per-unit-volume mass loss of z d −1 . This phase is evident in experimental measurements from day 7 ( Fig 1A). In summary, the Greenspan model predicts the eventual formation of a three-layered compound sphere with a central necrotic core, an intermediate shell of living but non-proliferative cells, and an outer shell of living proliferative cells (Fig 3A). This final structure is consistent with experimental observations of spheroid growth shown in Fig 1A [18]. While the Greenspan model explicitly incorporates a proliferation and death process dependent upon a spatially diffusive nutrient, the assumption that the nutrient-related dependencies are Heaviside functions yields a series of implicit analytical expressions for the radius of the inhibited region, R i (t), and radius of the necrotic region, R n (t), in terms of the overall spheroid radius (Fig 3A) [13]. We define three composite parameters Q ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi 1 À c 1 1 À c 2 r 2 ð0; 1Þ; R d ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where α mol d −1 and D c μm 2 d −1 is the nutrient consumption rate and diffusivity, respectively. Therefore, R i (t) and R n (t) are given by the solution of the following algebraic equations [13] R n ðtÞ ¼ 0; R i ðtÞ ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi and during phases two and three, respectively. During the first phase, R i = R n = 0. Greenspan [13] also showed that the first phase applies for Overall, the time-evolution of the spheroid radius is governed by the ODE Here, we have expressed dR(t)/dt in the form of a generalised logistic growth model where λ is the volumetric growth rate for f(R(t)) = 1 (the factor of 1/3 arises from applying the chain rule to convert from working in terms of spheroid volume to spheroid radius). f(R) is sometimes referred to as a crowding function, defined such that f(R) ! 0 for R sufficiently large. We show the solution to the Greenspan model and the corresponding crowding function in Fig 4 using

Model 2. Logistic model (k = 2)
The logistic model is used widely throughout biology and ecology as, for example, a model of in vitro cell growth [11,15,[39][40][41] and coral regrowth [21,41]. Whereas in the Greenspan model assumptions relating cell proliferation to the local density of a diffusive nutrient yield a crowding function that eventually caused overall growth to cease, in the logistic model we make the simplistic assumption that spheroid growth eventually ceases when the size reaches a maximum radius, R max μm [42]. The logistic model and generalisations thereof (including the Gompertz and Richards models [42]) are phenomenological in the context of tumour spheroid models; they are not explicitly constructed from biological mechanisms by which overall growth is inhibited and eventually ceases.
While the logistic growth model is commonly used to describe the time-evolution of spheroid volume [11], we find that the time-evolution of spheroid radius in our synthetic data is more consistent with logistic growth (Fig 1B). The logistic model for spheroid radius is given by Here, the factor of 1/3 arises from working in terms of spheroid radius instead of spheroid volume. Both the Greenspan and logistic models describe exponential growth at the per-volume rate of λ d −1 for sufficiently small spheroids, R(t)/R max � 1. However, the logistic model differs in that it does not include an initial period of time where growth is exponential. We demonstrate this by comparing the crowding function for the logistic model to that of the Greenspan model in Fig 4; in the logistic model, f(R) is a linearly decreasing function of R. Therefore, in the logistic model, the overall growth rate of the spheroid is never equal to the maximum growth rate of λ/3. Rather, it is always less than this maximum for R(t) > 0. The logistic model depends on two unknown parameters p 2 = [λ, R max ] ⊺ .
β Shape parameter in Richards model.
z Necrotic material loss rate.
c 1 Relative nutrient concentration for median growth.

Model 3. Bounded Gompertz model (k = 2)
Gompertz's growth model has been used since the 1960s to describe the growth of solid tumours [11,43,44], and is given by One feature of the Gompertz model is that the growth rate is unbounded for R(t)/R max � 1.
As we do not see this in the experiments or the Greenspan model, we consider a bounded Gompertz model by setting Therefore, the bounded Gompertz model undergoes a period of exponential growth for R(t) < R max /exp(1) before growth becomes inhibited, which is similar to the first phase of growth described by the Greenspan model. The solution of the bounded Gompertz model and the corresponding crowding function is shown in Fig 4. The bounded Gompertz model depends on two unknown parameters p 3 = [λ, R max ] ⊺ each with the same interpretation as those in the logistic model.

Model 4. Richards' model (k = 3)
Richards' model interpolates between the logistic and standard Gompertz models by introducing an additional shape parameter into the crowding function, β, that alters the shape of the solution [21,42,45]. The Richards model is given by The logistic model can be recovered by setting β = 1, and the standard formulation of the Gompertz model in the limit β ! 0 + . The solution to the Richards model and the corresponding crowding function is shown in Fig 4. The Richards model has three unknown parameters

Model 5. Radial-death model (k = 3)
We introduce the radial-death model, a simplistic compartment model that captures the key elements of Greenspan's model, namely that the inhibition of overall growth is caused by the nutrient deprivation in the spheroid core. Due to consumption by cells, the nutrient concentration is a decreasing function of the distance to the spheroid periphary, so we assume that nutrients (such as oxygen) can diffuse into the spheroid up to a distance R d μm from the spheroid periphery before reaching a critically low concentration where cell proliferation ceases and cell death begins (Fig 3B). While R(t) � R d , the spheroid is composed proliferative cells, and for R(t) > R d of an expanding annulus of constant density with thickness R d , and a necrotic core of radius R(t) − R d . We assume that living cells proliferate at per-volume rate λ d −1 , and necrotic material is lost at per-volume rate z d −1 .
Denoting the volume of living and necrotic cells V 1 (t) and N(t), respectively, the dynamics are governed by dV 1 ðtÞ dt ¼ lV 1 ðtÞ À gðV 1 ðtÞÞ; Here, g(V 1 (t)) represents the transfer of living cells in the periphery annulus to the necrotic core at the spheroid centre to maintain an annulus width less than R d (Fig 3B). Note that g or equivalently Therefore, the radial-death model is fully described by a single independent variable R(t) (or equivalently V(t)).
We show the solution to the radial-death model and the corresponding crowding function in Fig 4. The radial-death model has three unknown parameters, p 5 = [λ, z, R d ] ⊺ . All three parameters have an equivalent interpretation in the Greenspan model, and λ has an equivalent interpretation in all other models.

Model 6. Ward and King's multiphase model (k = 8)
Lastly, we consider the growth-saturation spheroid model of Ward and King [16,17], a moving boundary PDE model that explicitly incorporates the cell density, cellular material density (i.e., the DNA, lipids, proteins, and other material that living cells are composed of), and nutrient density, as functions of space and time ( Fig 3C). By assuming that spheroid growth is spherically symmetric, we end up working with a system of three time-dependent PDEs with one spatial coordinate. Full details are available in [17], however we apply several simplifications and so now provide a summary of the key mechanics in the model.
We denote the relative density of living cells n(r, t), that of cellular material p(r, t), and that of nutrient (i.e., oxygen) c(r, t). Here, 0 � r � R(t) is the spatial variable describing the distance from the spheroid centre to the moving spheroid boundary at r = R(t). We assume that the spheroid contains no voids so that 1 = n(r, t) + p(r, t), and that living cells and cellular material are incompressible and are transported throughout the spheroid with velocity v(r, t). Cells are subject to a maximum per-unit-volume growth rate of λ d −1 , which is a increasing function of nutrient concentration, specified by the Hill function, We fix the Hill exponent m 1 = 10 to capture the Heaviside-like switch behaviour in Greenspan's model, and c 1 is the nutrient concentration at which the proliferation rate is half of the maximum. Similarly, cells are subject to the nutrient-dependent death rate, where the death rate is a decreasing function of nutrient concentration, Again, we fix m 2 = 10 and c 2 is the nutrient concentration for which the death rate is half of the maximum rate, denoted by δ d −1 . We assume that nutrient is consumed at rate k(C) = αk m (C). Cellular material, p(r, t) is assumed to have the same density as living cells, is consumed during mitosis, diffuses freely throughout the spheroid with diffusivity D p μm 2 d −1 , is available in the surrounding media at relative density p 0 , and enters the spheroid from the surrounding media at the spheroid boundary at flux Q p μm d −1 .
Since the nutrients diffuse faster that cells proliferate, we assume that the nutrient is in diffusive equilibrium. These assumptions give rise to the coupled system of PDEs The moving boundary r = R(t) corresponds to the outer radius of the spheroid. Here, v(r, t) denotes the velocity of material throughout the spheroid, describes the movement of both cells and cellular material. Therefore, the velocity at the spheroid boundary, v(R(t), t), corresponds to the growth rate of the spheroid outer radius. In contrast to the other models considered in this work, overall spheroid growth in the Ward and King model is driven explicitly by the spatial structure and composition of material inside the spheroid. While the Ward and King model cannot be expressed in the form of Eq 7 generalised logistic as a generalised logistic growth model, we can calculate a crowding function empirically In

Results
There are two main components to our analysis. First, we apply standard approaches to determine the practical identifiability of the Greenspan model and two simplistic models in one hierarchy using synthetically generated, noisy data. Second, we develop our novel geometric approach to study non-identifiabilities in the Greenspan model using the geometric relationship between models in this hierarchy. As this geometric analysis considers features of the model outputs, which are deterministic and do not depend on data, this analysis is akin to both structural identifiability and sensitivity analysis. However, as the models are not nested, model outputs do not become identical in the no-noise limit, and so our analysis implicitly incorporates modelling bias which is a feature of analysis of most experimental data, where every mathematical model is an abstraction.

Identifiability analysis
We make the standard assumption that observations, denoted y = [y (1) , y (2) , . . ., y (n) ] ⊺ , are subject to independent additive normal noise [46,47], such that where ε * N(0, σ 2 ) and m ðkÞ i ð�Þ denotes the kth element of m i (�). In our case, m i (p) = R(t (k) ;p). Therefore, the log-likelihood function is given by The maximum likelihood estimate (MLE) is the parameter vector that maximises the loglikelihood function, or equivalently, minimises the error term or loss function Therefore, the MLE is equivalent to the least-squares estimate. We calculate the MLE for each model using synthetic data generated from the Greenspan model with a pre-specified constant noise with standard deviation σ = 20 μm in Fig 1B, demonstrating that all models are capable of producing an excellent fit to the synthetic data. Profile likelihood. To establish the identifiability of individual model parameters, we apply the profile likelihood method [19,48,49]. The profile likelihood method profiles the loglikelihood function by finding the MLE subject to the constraint that the profiled parameter is fixed. Denoting the parameter to be profiled by φ and the remaining parameters as η such that p = (φ, η), the profile log-likelihood is given by wherep i is the MLE when all parameters are varied. The value of the profile likelihood at φ = φ 0 corresponds to the test-statistic for the likelihood ratio test and has an asymptotic χ 2 -distribution. Therefore, we can establish identifiability by comparing the profile log-likelihood of a parameter to the threshold for an approximate 95% confidence interval, equal to −Δ 1,0.95 /2 � −1.92, where Δ ν,q refers to the qth quantile of a χ 2 distribution with ν degrees of freedom [50].
Model parameters with tightly-constrained intervals for which the profile likelihood is above this threshold are classified as identifiable. In this work, we take the supremum of the log-likelihood function numerically using the Nelder-Mead algorithm over a region that covers the true parameters over several orders of magnitude [51]. We profile the log-likelihood in sequence, starting at the point closest to the MLE. To ensure we find the global maximum, we initiate the optimisation routine at three points, corresponding to the MLE, the previously profiled point, and the initially specified guess; in the case of a disparity between these three optimisations, the result with the largest log-likelihood is taken as the maximum. We establish the identifiability of parameters in the logistic, radial-death, and Greenspan models using the profile likelihood method in Fig 5A, 5C and 5D using synthetic data generated from the Greenspan model (shown previously in Fig 1B). Both the logistic and radial- PLOS COMPUTATIONAL BIOLOGY death models are identifiable; parameter estimates can be established to within a two-sided 95% confidence interval. We also see that confidence intervals for the per-volume growth rate, λ, are consistent between each model. Results in Fig 5D show that parameters in the Greenspan model; specifically, Q and γ, are non-identifiable, or one-sided identifiable. Estimates for the spheroid radius at which necrosis first occurs, R d , are constrained to a two-sided 95% confidence interval, however the upper bound of this confidence interval corresponds to the maximum spheroid size observed, suggesting that R d is also only one-sided identifiable.
To explore whether parameters in the Greenspan model are identifiable in the limit of noise-free data, we produce profiles of the error function (Eq 21) in the hypothetical case that noise free observations are made so that σ = 0 μm. This is done in a similar manner to profiles of the log-likelihood function, albeit where the maximisation is replaced by a minimisation. Results are shown in Fig 5E. Our aim is to determine whether the parameters uniquely map to the data in the noise-free limit or, equivalently, whether there exist other parameter combinations in the vicinity of p 1 where the error function is zero. Results in Fig 5E show that the error function has a clearly defined minimum, indicating that the parameters in the Greenspan model are theoretically identifiable in the noise-free limit.
Fisher information. For models with additive normal noise, the Fisher information matrix (FIM) for model i is given by where J i is the Jacobian of model i, sometimes referred to as the parameter sensitivity matrix [49]. For spheroid radius data collected at n time points, J i is an n × k i matrix with elements where k i is the number of parameters. The FIM is a k i × k i matrix related to the expected Hessian (and, therefore, the curvature) of the log-likelihood function (Eq 20) and least-squares cost function (Eq 21). The rank of the FIM at the MLE relates to the number of identifiable parameter combinations [49]. We find that the FIM of models studied in Fig 5 (the logistic, radial-death, and Greenspan models) have full rank. This is consistent with results from the profiled error function ( Fig 5E) where we found that, although model parameters were not practically identifiable from the available data, they are identifiable from noise-free data.
For many models the FIM may be of full rank but have a large condition number and is, therefore, close to singular. For example, we find that the condition number of the FIM for the Greenspan model is Oð10 9 Þ, suggesting that the FIM is full rank and non-singular, but has at least one uninformative direction. We can also see this from profile likelihood results in Fig  5D, which show that Q and γ cannot be constrained to be within 95% confidence, indicating a large parameter variance and correspondingly close-to-singular FIM. Analysis of model sloppiness provides finer-grained information about identifiability by gaining insight from the full spectrum of the FIM (Fig 1D). In summary, such analysis establishes directions in parameter space that are stiff, i.e., identifiable from data; and those that are sloppy, i.e., non-identifiable.
We demonstrate the relationship between the log-likelihood surface and the eigenvectors of the FIM for the logistic model in Fig 5B. The direction defined by the eigenvector with the largest eigenvalue, v 2 , points in the direction of steepest descent from the MLE. The direction defined by v 1 , the eigenvector with the smallest eigenvalue, points in the direction of shallowest descent from the MLE. Should an eigenvalue tend to zero, the likelihood becomes flat in the direction of the corresponding eigenvector and parameters that lie on this contour are indistinguishable: this direction is sloppy.
In Fig 1D we show the log eigenvalues of the FIM for each model, scaled by the largest eigenvalue for each model. All models, aside from the Greenspan model and Ward and King's model have eigenvalues constrained over a relatively small number of decades. Greenspan's model has one eigenvalue much smaller in magnitude than the remaining, suggesting a single sloppy or uninformative direction. Similarly, Ward and King's model has two eigenvalues much smaller in magnitude than the remaining, suggesting two sloppy directions.

Geometric analysis
Identifiability and model sloppiness analysis indicates that several parameters in the Greenspan model are not identifiable from spheroid radius measurements, however we cannot gain further information relating to the impact each non-identifiable parameter has on the features of the data. To study this further, we examine the geometric relationship between the Greenspan model, and the simplistic, identifiable, logistic and radial-death models.
We define a map from the parameters in model i, denoted p i , to parameters in the identifiable model j, denoted p j , in the least-squares sense such that An interpretation of p j = f ij (p i ) is the maximum likelihood or least-squares estimate for the parameters in model j if noise-free data from model i is observed. To quantify goodness-of-fit, which we interpret as a measure of the correspondence between models, we compute the R 2 statistic where E[�] denotes the sample mean. In this work, we take the supremum of the log-likelihood function numerically using the Nelder-Mead algorithm over a region that covers the true parameters over several orders of magnitude [51].
In Fig 4B, we show noise-free data generated from the Greenspan model with p 1 = [Q, R d , γ, λ] ⊺ = [0.8, 150, 1, 1] ⊺ . In this case, we have good correspondence between the logistic and Greenspan models (R 2 = 0.998) with p 2 = [λ, R max ] ⊺ = [1.21, 316] ⊺ . Despite both models sharing a parameter, λ, with an equivalent biological interpretation (the per volume growth rate), estimates for this parameter differ between models. This result highlights that parameter estimates are not necessarily directly comparable between models despite sharing a similar biological interpretation [21]. In our case, both the Greenspan and logistic models assume that cells proliferate exponentially at rate λ for infinitesimally small spheroids, however the crowding functions differ between the models such that the logistic model does not capture the initial exponential growth phase seen in the Greenspan model (Fig 4B). Given that the bounded Gompertz and Greenspan models have comparable crowding functions, we see better agreement between estimates for λ between these models (R 2 = 0.99997 with p 3 = [λ, R max ] ⊺ = [1.00, 317] ⊺ ). In all cases, estimates for the maximum radius, R max , agree with the long-term solution of the Greenspan model calculated numerically (R max = 320 μm).
To study between-model sensitivities, we compute the Jacobian matrix of f ij (p i ), denoted We compute J ij (p i ) numerically, using a finite difference approximation that is implemented in a standardised algorithm that is robust to numerical noise introduced from the optimisation algorithm used to calculate f ij (p i ) [52]. The rows of J ij (p i ) correspond to the gradients of each element in p j , denoted by rp ðkÞ j . These vectors are normal to, and hence define, a hyperplane in model i parameter-space that locally give identical estimates of p ðkÞ j in the vicinity of p i . For example, we can use J ij (p i ) to visualise the parameter combinations in the Greenspan model that give identical estimates of λ and R max in the logistic and bounded Gompertz models.
While geometrically useful, it is difficult to interpret the elements of J ij (p i ) directly as the scales of each parameters differ significantly, even within each model. Therefore, we introduce the sensitivity matrix of p j = f ij (p i ), denoted The (k 1 , k 2 ) element of the sensitivity matrix is given by and can be interpreted as the relative increase in parameters in model j due to increases in parameters in model i. For the map from the Greenspan to logistic model we have Here, we see a near one-to-one correspondence in λ between models and, for example, see that a 1% increase in Q in the Greenspan model is associated with a 0.0156% decrease in λ in the logistic model. Furthermore, this analysis demonstrates that, roughly speaking, the parameter subset (Q, R d , γ) corresponds primarily to the maximum spheroid size. We draw this conclusion based on the relative sizes of elements in each row of S 12 (p 1 ) (Eq 31). In other words, there exists a trivariate function of (Q, R d , γ) that maps to R max , and by extension the likelihood. This explains why the (Q, R d , γ) are practically non-identifiable Fig 5, and since R max is identifiable we expect that the relationship between (Q, R d , γ) is also identifiable while the individual parameters are not. This analysis is similar to existing techniques used to establish identifiable parameter combinations using the likelihood, through profiling [53,54] or the FIM [55]. From Eq (31) we also see that the per-volume proliferation rates correspond in each model. This latter observation is entirely consistent with profile likelihood analysis in Fig 5, where we see that estimates for λ in the Greenspan model are identifiable. Geometric analysis using the logistic model. As the Greenspan model has only four parameters, one of which is practically identifiable (Fig 5D), we can visually explore the geometric link between the Greenspan and logistic models to provide insight into the relationship between these models. To do this, we fix the identifiable parameter at the true value, λ = 1 h −1 , and numerically compute the coordinates of the remaining parameter values [γ, Q, R d ] ⊺ for which f 12 (p) is constant; that is, parameters in the Greenspan model that map to the same set of parameters in the logistic model as at the true value p 1 = [Q, R d , γ, λ] ⊺ = [0.8, 150, 1, 1] ⊺ . We show the associated two-dimensional manifolds that give a constant proliferation rate, λ, and constant maximum spheroid size, R max , in Fig 6A. The intersection of these manifolds corresponds to the one-dimensional manifold that give a set of parameters in the Greenspan model that map to the same solution curve to the logistic model in the least-squares sense. As the logistic model has good correspondence to the Greenspan model (R 2 = 0.998), we expect that  (32), and give the curve of near-constant likelihood. Note that this curve is visualised in front of the manifolds for clarity. In (B), we compare the manifolds (semi-transparent) to a linearisation corresponding to two planes normal to the rows of the model-map Jacobian (Eq (33)). We show the sloppy direction v 1 (black) and the cross product of the model-map Jacobian rows (green dotted). In (C) we show a third manifold corresponding to constant final necrotic core size, which is sufficient to identify the parameters in the Greenspan model. this intersection corresponds to a curve of near-constant likelihood; parameter sets that lie on this line are indistinguishable, leading to non-identifiability of individual parameters. The dimensionality of this manifold corresponds to the number of uninformative or sloppy directions in the Greenspan model, observed in Fig 1D. However, as the corresponding eigenvalue was small but non-zero, this curve is not a curve of constant likelihood, but rather a curve of near-constant likelihood. We consider that the eigenvector associated with the smallest eigenvalue of F 1 (p) (the FIM of the Greenspan model), denoted at each point by v 1 (p), defines a sloppy direction in parameter space; that is, a direction in which the model is relatively insensitive. Starting at the true values, we follow the sloppy direction through parameter space by solving the ODE dp dt subject to p(0) = p 1 , as shown in Fig 6A. As expected, this curve follows the intersection of the constant λ and constant R max manifolds. In Fig 6B we demonstrate that the sloppy direction is orthogonal to both manifolds using the linearisation of each manifold formulated from the model-map Jacobian, Here, u 1 gives the gradient of λ in the logistic model with respect to the parameters in the Greenspan model, and similar for u 2 to R max . Therefore, u 1 and u 2 are normal to the constant λ and constant R max manifolds at p 1 , and, therefore, define a tangent plane to each manifold at p 1 . In Fig 6B we show that the intersection of both tangent planes, given by the vector cross product u 1 × u 2 , corresponds to the sloppy direction v 1 .
We have established that data from outer radius measurements are insufficient to identify the parameters in the Greenspan model. Rather we can only constrain parameter estimates to a onedimensional line that corresponds to the intersection of two two-dimensional manifolds. These results are consistent with our previous work [14,18], where we demonstrate that measurements of the inner structure of spheroids, specifically, the necrotic core size, are required to identify parameters. We explore this in our geometric framework by considering a third two-dimensional manifold in the parameter space corresponding to realisations of the Greenspan model that give identical measurements of the necrotic core size at the conclusion of our synthetic experiment (t = 21 d). In Fig 6C we show that, as expected, the intersection of three two-dimensional manifolds corresponds to a zero-dimensional point, indicating parameter identifiability.
Geometric analysis using the radial-death model. The radial-death model, having three parameters, sits between the logistic and Greenspan model with respect to model complexity and identifiability. Therefore, we can apply the radial-death model to aid interpretation of parameters in the Greenspan model, and also learn about features of the radial-death model using the logistic model.
The sensitivity matrix for the map from the Greenspan to the radial-death model is given by First, we see a near one-to-one correspondence between the per-volume growth rate, λ, in each model. We expect this, as both models include a finite period of time where spheroid growth is exponential. Secondly, we see that γ and λ in the Greenspan model have a near oneto-one correspondence to z in the radial-death model. Finally, we see a near one-to-one correspondence between Q and R d in Greenspan's model and R d in the radial-death model. Although R d has a similar interpretation in both models, in the radial-death model it must capture both the second phase of growth inhibition and the third phase of necrosis, both of which relate to Q and R d .
We study the map between the radial-death and Greenspan models at p 5 = f 15 (p 1 ), the parameters in the radial-death model we find to be equivalent to those in the Greenspan model (Fig 4). The sensitivity matrix is given by The maximum spheroid size, R max is sensitive to all parameters in the radial-death model, whereas the per-volume growth rate, λ, has a near one-to-one correspondence between models. While the radial-death model is identifiable, we still see that the sloppiest direction corresponds to the intersection of the constant λ and constant R max manifolds defined by the map from the radial-death to logistic models (Fig 6D). In this case, we solve Eq (32) using the FIM for the radial-death model, and show the resultant curve within a 95% likelihood-based confidence region in the three-dimensional parameter space. Predicting non-identifiability. Whereas traditional identifiability and sloppiness analysis provides directions (if any) in the parameter space to which the model output is insensitive, our geometric analysis provides information about the sensitivity of model features to changes in parameters. This allows us to predict non-identifiability using known identifiability results for the simpler, phenomenological models. For instance, for R(t)/R max � 1 the solution of the logistic model corresponds to exponential growth which does not depend on R max . We conclude that R max is non-identifiable from early-time data. In Fig 7, we establish the practical identifiability of the logistic, radial-death, and Greenspan models in the case that 22 equally-spaced early-time observations are made for 0 � t � 5 d, using profile likelihood analysis. Again, we generate synthetic data from the Greenspan model, however reduce the variance of observations σ = 2 μm so that the confidence interval for estimates of λ in the logistic model are comparable to those in Fig 5, where measurements are taken for 0 � t � 21 d. As expected, R max is non-identifiable (specifically, R max is one-sided identifiable, as we can establish that the maximum spheroid size must be greater than the observed size of spheroids).
Geometric analysis of the map between the radial-death and logistic models (Eq (35)) indicates that λ in the logistic model is insensitive to changes in (z, R d ) in the radial-death model. Therefore, since only λ is identifiable from early-time data, we expect that (z;, R d ) are now non-identifiable. We see this in profile likelihood analysis for the radial-death model in Fig 7C. From the geometric analysis we were able to determine that information about (z, R d ) is contained in late-time data. Similarly, geometric analysis of the map between the Greenspan and logistic models (Eq (31)) established that changes in (Q, R d , γ) correspond to changes in R max , but not λ. Again, our analysis indicates that information about (Q, R d , γ) would not be available from early-time data. These results are consistent with profile likelihood analysis (Fig 7D), which shows that profiles for (Q, R d , γ) are flat; while (Q, R d , γ) are also non-identifiable from observations up to t = 21 d, there is less information about the parameters from early-time data.
Gaining insights from complex, non-identifiable models. For interpreting spheroid radius data, the Ward and King model performs poorly compared to the other models considered based on AIC and model sloppiness analysis (Fig 1C and 1D). As significantly simpler models can provide comparable fits, we conclude that the information contained in outer radius measurements is insufficient to identify the eight parameters in the (already simplified) Ward and King model. The reason for number of parameters in the Ward and King model is the number of biological mechanisms it includes: it is the only model we consider that explicitly incorporates the consumption of nutrients by cells, the passage of nutrients from the spheroid exterior inside the spheroid, cell death, and the initial spatial distribution of cells inside the spheroid.
We study the sensitivity of parameters in the Ward and King model to features indicated by the logistic model at p 6 = f 16 Table 1 contains a summary describing the biological interpretation and biological mechanism associated with each parameter. First, we see a correspondence in the per-volume growth rate between models. As expected, increasing the cell per-volume growth and death rates have opposing effects on the maximum spheroid size. Interestingly, we find that increasing the relative nutrient consumption rate, α, has little impact on the the growth rate, but does cause a decrease in the maximum spheroid size. Further, we see that increasing the nutrient threshold for growth inhibition, c 1 , causes an increase in the maximum spheroid size; if cells require more oxygen to proliferate, they do so more slowly but this may, overall, yield larger spheroids. Given that there are eight parameters in the Ward and King model, it is difficult to visualise the geometry of the map from the Ward and King model to the logistic model. However, we can apply the Jacobian of the Ward and King to logistic model-map, J 62 (p 6 ), to interpret the model sloppiness results for the Ward and King model in Fig 1D. We denote the rows of J 62 as u 1 and u 2 , corresponding to gradients with respect to λ and R max , respectively. Recalling that the eigenvalues of the FIM correspond to the curvature of the expected log-likelihood in the direction of each corresponding eigenvector, we interpret the relative magnitude of each eigenvalue as a measure of how informative the corresponding direction is. In Table 2 we tabulate the eigenvalues of the FIM and the dot products between the corresponding eigenvectors and unit vectors in the direction of u 1 and u 2 . Dot products close in absolute value to zero indicate orthogonality, dot products close in absolute value to one indicate a correspondence between the directions. These results confirm that sloppy or uninformative directions are orthogonal to directions that correspond to large changes in λ and R max .
The elements of the Jacobian of the model-map relate to absolute changes in the parameters, whereas elements of the sensitivity matrix relate to relative changes in the parameters. Therefore, we can use the rows of the sensitivity matrix, denoted s 1 and s 2 to move around the parameter space of the Ward and King model to achieve relative changes in the volumetric growth rate, λ, and relative changes in the maximum spheroid radius, R max , respectively (full details are available in S1 File). We demonstrate this in Fig 8A and 8B, where we choose Table 2. Orthogonality between the model-map and uninformative directions. We tabulate the eigenvalues of the FIM relative to the largest eigenvalue, and the dot product between each corresponding eigenvector and rows of the Jacobian of the Ward and King to logistic model-map. adjusted parameters in the Ward and King model to achieve a 10% relative increase and decrease in both λ and R max . Since s 1 and s 2 are not orthogonal, moving in the relative direction of s 1 results in an increase to both λ and R max . However, and potentially more usefully, we can move in the direction of an orthogonal projection of s 1 onto s 2 to achieve, approximately, a relative change in λ without changing R max . We demonstrate this in Fig 8C, showing an increase in the volumetric growth rate but a much smaller change in the maximum spheroid radius compared to results in Fig 8A.

Discussion
The nexus between model complexity and data quantity and quality is an ongoing challenge in computational biology that is often resolved subjectively rather than objectively. While new experimental technologies are rapidly increasing the detail and resolution obtainable in biological data, mathematical models can always be made arbitrarily complex. On the other hand, data is often limited in light of the biological questions that are posed. Identifiability and sloppiness analyses have been developed to harmonise model and data complexity, to guide model selection and reduction, in order to ensure parameter identifiability [19,56]. However, the complex, highly-detailed, heavily-parameterised models that are commonplace in mathematical and computational biology are often required to answer important biological questions: a model of tumour spheroid growth must incorporate nutrient dependencies to provide insight into the role nutrients play on growth [8,57]. As we show, for some data only simple phenomenological models, such as the logistic and Gompertz growth models, are those that are identifiable. These models can provide excellent agreement to experimental data, allow the comparison and interpretation of experiments, however not being constructivist, provide only limited insights into underlying biological mechanisms.
In many cases, simple phenomenological models produce a goodness-of-fit on par with that of a complex mechanistic model (Fig 1B and 1C). As a result, traditional model selection methodology will favour the simplicity and identifiability of the simple model, penalising the number of parameters in the complex model. Where the non-identifiable parameters in complex mechanistic models carry direct biological interpretations (the nutrient sensitivity, for instance) of prime interest to experimental scientists and biologists, the identifiable parameters in the simple model carry interpretations relating to features of the data (the early-time rate of change or the maximum spheroid size, for instance). In this work, we utilise this key difference We apply the rows in the sensitivity matrix to adjust the parameters in the Ward and King model to achieve approximate relative changes in the parameters in the logistic model. In (A-B) we move in the direction of each row of the corresponding row of the sensitivity matrix; in (C) we move in direction of a projection of row 1 orthogonal to row 2, resulting in relative changes to the volumetric growth rate, but not to maximum spheroid radius. https://doi.org/10.1371/journal.pcbi.1010844.g008 to draw biological insight from complex mechanistic models by studying the geometry of a map from the parameters in the complex model to those in the identifiable surrogate. One interpretation of our approach is to provide an intermediate mode of interpretation that sits between the model parameters and the likelihood (or other goodness-of-fit metric) that is traditionally studied in identifiability and model sloppiness analysis. In contrast to studying the sensitivity of the model in terms of the overall fit, we effectively decompose the fit into features and study the sensitivity of model parameters to these features. This approach enables us leverage mechanistic modelling to gain insights from data that would otherwise be lost if a onedimensional goodness-of-fit metric, such as the likelihood, were to be studied directly.
We demonstrate our approach by analysing common models and typical data of tumour spheroid growth. Mathematical models of tumour spheroid experiments range from the simplistic-however routinely and effectively applied-logistic growth models [11,58], to spatial models that can capture the density of arbitrary numbers of cell and nutrient species [13,17,59,60], and to individual-based models that describe the individual behaviour of every cell in the spheroid [61][62][63]. Despite the complexity of even this simple experimental model of tumour growth, data often comprise only measurements of overall tumour spheroid size. More complicated experimental systems, such as in vivo vascularised tumour growth, are accompanied by a corresponding menagerie of complex models [64][65][66], however data from these experiments can be even more limited, noisy, or sparse in comparison with experimental models of avascular tumour growth [18]. Even the relatively simple Greenspan model, which comprises only four unknown parameters, is non-identifiable without measurements of inner spheroid structure [14]. Our goal in this work is to draw insights from such models with complexity mismatched to that of the available data.
The model-data relationship is typically explored with structural or practical identifiability analysis [26]; the former in an infinite-data, model-only frame of reference, the latter in consideration of the noisy observation process that ties the model to the data. While we first establish the practical identifiability of each model, our geometric analysis does not fall into either of these classifications for a number of reasons. First, the model-map is defined in the least squares sense, and does not explicitly incorporate data. Secondly, as the surrogate model is not necessarily nested within the complex model of interest, the two are not equivalent in a meaningful infinite-noise-free-data limit. As a consequence, if the complex model is considered reality, the surrogate model produces predictions that are biased. In the context of data, we see this as an advantage as even complex models are by definition abstractions of reality. As we utilise the surrogate model to characterise features of the data, our approach is overall robust to this bias. We demonstrate this by using the logistic model as a surrogate for the Greenspan model in the main text, despite the bounded Gompertz model having a crowding function far more similar to that of the Greenspan model. Analysis using the Gompertz and Richards models (S1 File) is similar to that using the logistic model.
A limitation of FIM-based identifiability and sloppiness analysis, and our sensitivitymatrix-based geometric analysis, is a restriction to providing only local information. Effectively, these techniques relate to a quadratic approximation and linearisation, respectively, about the MLE (or parameter values otherwise under consideration) of the complex model and are consequentially sensitive in cases where the corresponding likelihood is multimodal. While the manifolds relating to the map between the logistic and Greenspan models (Fig 6A  and 6B) are locally linear near the parameter combination of interest, globally the manifold relating to λ appears hyperbolic. Different points on the constant-likelihood curve have the potential to produce substantially different sensitivity matrices. One approach to address this is to incorporate prior knowledge to regularise the parameter fitting problem. Recent work considers identifiability and sloppiness analysis based directly on the parameter covariance matrix estimated from Bayesian methods such as Markov-chain Monte-Carlo to provide an overall snapshot of the global parameter sensitivities [35]. However, we expect this approach to be problematic in our geometric framework, since the model-map is based on an equivalence between models that may only apply locally in the vicinity of the points used to compute the between-model sensitivity matrix.
The relatively small number of unknown parameters in the Greenspan model allow us to visually explore the geometry of the parameter space using surrogate models, providing insight into non-linearities that are not captured by the model-map sensitivity matrix. However, in contrast to traditional identifiability analysis where parameters are generally classified as identifiable or not, the model-map sensitivity matrix has the ability to further classify non-identifiable parameters by which feature they relate to. In the vicinity of the parameter values of interest, this classification can allow for graphical geometric analysis even for models with more than three unknown parameters, by decomposing the parameter space into low-dimensional subsets that relate to individual features. For example, in the Ward and King model, the three parameters with the strongest correspondence to the maximum spheroid size, (λ, α, δ), can be prioritised for further analysis ahead of the full, eight-dimensional parameter space [29].
Aside from providing insight into the sensitivity of model features to parameters and predicting non-identifiability, we provide a simple demonstration of how the model-map relationship can be used to move in the parameter space to produce changes to specific model features ( Fig  8C). Akin to moving in the direction of the sloppiest direction, these results show how to constrain movements in the parameter space to model feature manifolds. For heavily-parameterised models that are difficult to calibrate (perhaps, for example, due to multi-modal likelihoods), constraining movements in the optimisation algorithm used for model calibration to these manifolds allows successive matching of model features: for instance, first moving to parameter combinations that produce the desired maximum spheroid radius, and then moving on this manifold to match the growth curve shape and scale. More generally, applying surrogate models that are themselves candidate models raises interesting possibilities for future analysis. Generalisations of the logistic model, such as the three-parameter Richards model, provide a lowdimensional summary of model behaviour that can be characterised using machine learning or Gaussian processes [67]. Building up a global model-map between the complex and surrogate models is another approach to overcome the localisation limitation of our methods, and could be computationally advantageous in the case of computationally expensive complex models.
Experimental data are often limited in light of the biological questions posed of them. Likewise, in the mathematical and modelling literature, complex models are numerous and often more suitable to biological questions of interest, yet can be ill-suited for parameterisation from the available data. In this work, we develop a geometric analysis to gain insights from complex, non-identifiable models using simple surrogate models with parameters that relate to features in the data. We expect our analysis to apply to any hierarchy of non-identifiable and identifiable models of biological systems.
Supporting information S1 File. Supporting material document. (PDF)