Parameter Identifiability and Redundancy: Theoretical Considerations

Background: Models for complex biological systems may involve a large number of parameters. It may well be that some of these parameters cannot be derived from observed data via regression techniques. Such parameters are said to be unidentifiable, the remaining parameters being identifiable. Closely related to this idea is that of redundancy, that a set of parameters can be expressed in terms of some smaller set. Before data is analysed it is critical to determine which model parameters are identifiable or redundant to avoid ill-defined and poorly convergent regression.

Methodology/Principal Findings: In this paper we outline general considerations on parameter identifiability, and introduce the notion of weak local identifiability and gradient weak local identifiability. These are based on local properties of the likelihood, in particular the rank of the Hessian matrix. We relate these to the notions of parameter identifiability and redundancy previously introduced by Rothenberg (Econometrica 39 (1971) 577–591) and Catchpole and Morgan (Biometrika 84 (1997) 187–196). Within the widely used exponential family, parameter irredundancy, local identifiability, gradient weak local identifiability and weak local identifiability are shown to be largely equivalent. We consider applications to a recently developed class of cancer models of Little and Wright (Math Biosciences 183 (2003) 111–134) and Little et al. (J Theoret Biol 254 (2008) 229–238) that generalize a large number of other recently used quasi-biological cancer models.

Conclusions/Significance: We have shown that the previously developed concepts of parameter local identifiability and redundancy are closely related to the apparently weaker properties of weak local identifiability and gradient weak local identifiability; within the widely used exponential family these concepts largely coincide.


Introduction
Models for complex biological systems may involve a large number of parameters. It may well be that some of these parameters cannot be derived from observed data via regression techniques. Such parameters are said to be unidentifiable or non-identifiable, the remaining parameters being identifiable. Closely related to this idea is that of redundancy, that a set of parameters can be expressed in terms of some smaller set. Before data is analysed it is critical to determine which model parameters are identifiable or redundant to avoid ill-defined and poorly convergent regression.
Identifiability in stochastic models has been considered previously in various contexts. Rothenberg [1] and Silvey [2] (pp. 50, 81) defined a set of parameters for a model to be identifiable if no two sets of parameter values yield the same distribution of the data. Catchpole and Morgan [3] considered identifiability and parameter redundancy and the relations between them in a general class of (exponential family) models. Rothenberg [1], Jacquez and Perry [4] and Catchpole and Morgan [3] also defined a notion of local identifiability, which we shall extend in the Analysis Section. [There is also a large literature on identifiability in deterministic (rather than stochastic) models, for example the papers of Audoly et al. [5], and Bellu [6], which we shall not consider further.] Catchpole et al. [7] and Gimenez et al. [8] outlined use of computer algebra techniques to determine numbers of identifiable parameters in the exponential family. Viallefont et al. [9] considered parameter identifiability issues in a general setting, and outlined a method based on considering the rank of the Hessian for determining identifiable parameters; however, some of their claimed results are incorrect (as we outline briefly later). Gimenez et al. [8] used Hessian-based techniques, as well as a number of purely numerical techniques, for determining the number of identifiable parameters. Further general observations on parameter identifiability and its relation to properties of sufficient statistics are given by Picci [10], and a more recent review of the literature is given by Paulino and de Bragança Pereira [11].
In this paper we outline some general considerations on parameter identifiability. We shall demonstrate that the concepts of parameter local identifiability and redundancy are closely related to apparently weaker properties of weak local identifiability and gradient weak local identifiability that we introduce in the Analysis Section. These latter properties relate to the uniqueness of likelihood maxima and likelihood turning points within the vicinity of sets of parameter values, and are shown to be based on local properties of the likelihood, in particular the rank of the Hessian matrix. Within the widely-used exponential family we demonstrate that these concepts (local identifiability, redundancy, weak local identifiability, gradient weak local identifiability) largely coincide. We briefly consider applications of all these ideas to a recently developed general class of carcinogenesis models [12,13,14], presenting results that generalize those of Heidenreich [15] and Heidenreich et al. [16] in the context of the two-mutation cancer model [17]. These are outlined in the later parts of the Analysis and the Discussion, and in more detail in a companion paper [12].

General Considerations on Parameter Identifiability
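As outlined in the Introduction, a general criterion for parameter identifiability has been set out by Jacquez and Perry [4]. They proposed a simple linearization of the problem, in the context of models with normal error. They defined a notion of local identifiability, which is that in a local region of the parameter space there is a unique $\theta_0$ that fits some specified body of data, $(x_i, y_i)_{i=1}^{n}$, i.e. for which the model predicted mean $h(x|\theta)$ is such that the residual sum of squares:

$$\sum_{l=1}^{n} \left[ y_l - h(x_l|\theta) \right]^2 \qquad (1)$$

has a unique minimum. We present here a straightforward generalization of this to other error structures. If the model prediction $h(x) = h(x|\theta)$ for the observed data $y$ is a function of some vector parameters $\theta = (\theta_j)_{j=1}^{p}$ then in general it can be assumed, under the general equivalence of likelihood maximization and iteratively reweighted least squares for generalized linear models [18] (chapter 2), that one is trying to minimize:

$$\sum_{l=1}^{n} \frac{1}{v_l} \left[ y_l - h(x_l|\theta_0) - \sum_{j=1}^{p} \left. \frac{\partial h(x_l|\theta)}{\partial \theta_j} \right|_{\theta=\theta_0} \Delta\theta_j \right]^2 \qquad (2)$$

where $y_l$ ($1 \le l \le n$, $n \ge p$) is the observed measurement (e.g., the numbers of observed cases in the case of binomial or Poisson models) at point $l$ and the $v_l$ ($1 \le l \le n$) are the current estimates of variance at each point. This has a unique minimum in the perturbation $\Delta\theta = (\Delta\theta_j)_{j=1}^{p}$ (with $\theta = \theta_0 + \Delta\theta$) given by $H^T D H \Delta\theta = H^T D \delta$, where $H = \left( \partial h(x_l|\theta)/\partial\theta_j |_{\theta=\theta_0} \right)_{l=1,\ldots,n;\; j=1,\ldots,p}$, $\delta = \left( y_l - h(x_l|\theta_0) \right)_{l=1}^{n}$, $D = \mathrm{diag}[1/v_1, 1/v_2, \ldots, 1/v_n]$, whenever $H^T D H$ has full rank ($= p$).

More generally, suppose that the likelihood associated with observation $x_l$ is $l(x_l|\theta)$ and let $L(x_l|\theta) = \ln[l(x_l|\theta)]$. Then generalizing the least squares criterion (1) we now extend the definition of local identifiability to mean that there is at most one maximum of:

$$L(x|\theta) = \sum_{l=1}^{n} L(x_l|\theta) \qquad (3)$$

in the neighborhood of any given $\theta$. [...] Heuristically, expanding the first-order conditions for a maximum of (3) to first order about $\theta_0$ gives $\frac{\partial L}{\partial\theta_i}(x|\theta_0) + \sum_{j=1}^{p} \frac{\partial^2 L}{\partial\theta_i \partial\theta_j}(x|\theta_0)\, \Delta\theta_j = 0$ ($1 \le i \le p$), and in general this system of $p$ equations has a unique solution in $\Delta\theta$ whenever the Hessian $\left( \partial^2 L/\partial\theta_i \partial\theta_j \right)_{i,j=1}^{p}$ has full rank ($= p$). This turns out to be (nearly) the case, and will be proved later (Corollary 2). More rigorously, we have the following result.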
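The full-rank condition on $H^T D H$ can be checked numerically. The following is a minimal sketch only: the two toy mean functions `h_ok` and `h_redundant`, the design points and the unit variances are assumptions made for illustration and are not models from this paper. The Jacobian $H$ is approximated by central differences.

```python
import numpy as np

def jacobian(h, theta, x, eps=1e-6):
    """Central-difference Jacobian with entry [l, j] = d h(x_l | theta) / d theta_j."""
    theta = np.asarray(theta, dtype=float)
    H = np.empty((len(x), len(theta)))
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        H[:, j] = (h(x, theta + step) - h(x, theta - step)) / (2 * eps)
    return H

def normal_matrix_rank(h, theta, x, v):
    """Numerical rank of H^T D H with D = diag(1/v_l)."""
    H = jacobian(h, theta, x)
    D = np.diag(1.0 / np.asarray(v, dtype=float))
    return np.linalg.matrix_rank(H.T @ D @ H, tol=1e-8)

# Hypothetical mean functions (assumptions for illustration only).
def h_ok(x, th):
    # Intercept and slope enter separately.
    return th[0] + th[1] * x

def h_redundant(x, th):
    # Only the product th[0] * th[1] enters the mean.
    return th[0] * th[1] * x

x = np.linspace(1.0, 5.0, 10)
v = np.ones_like(x)
theta0 = [2.0, 0.5]
rank_ok = normal_matrix_rank(h_ok, theta0, x, v)
rank_red = normal_matrix_rank(h_redundant, theta0, x, v)
print(rank_ok, rank_red)
```

In the first model $H^T D H$ has rank $p = 2$, so the linearized problem has a unique minimizing $\Delta\theta$; in the second the rank drops to 1, because the data constrain only the product of the two parameters.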
We prove this result in Text S1 Section A. As an immediate consequence we have the following result.
Corollary 1. For a given $x = (x_1, \ldots, x_n) \in \Sigma^n$, a sufficient condition for the likelihood (3) to have at most one maximum and one turning point in the neighborhood of a given $\theta = (\theta_1, \ldots, \theta_p) \in \Omega$ is that $\mathrm{rk}\left[ \left( \frac{\partial^2 L(x|\theta)}{\partial\theta_i \partial\theta_j} \right)_{i,j=1}^{p} \right] = p$. In particular, if this condition is satisfied $\theta$ is gradient weakly locally identifiable (and therefore weakly locally identifiable). ($\Omega \subset \mathbb{R}^p$ is the parameter space.)

That this condition is not necessary is seen by consideration of the likelihood [...], where $C$ is chosen so that this has unit mass. Then [...], which has rank 0 at $\theta = x$ and a unique maximum there. In particular, this shows that the result claimed by Viallefont et al. [...]
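To make the non-necessity concrete, the sketch below uses one density with the stated properties. The quartic form $C \exp[-(x-\theta)^4]$ is an assumption chosen for illustration and is not necessarily the example the authors had in mind. Its log-likelihood is $\ln C - (x-\theta)^4$, so the second derivative in $\theta$ is $-12(x-\theta)^2$, which vanishes at $\theta = x$ even though the maximum there is unique.

```python
def log_lik(x, theta):
    """Log-likelihood, up to the constant ln C, for density C * exp(-(x - theta)**4)."""
    return -(x - theta) ** 4

def second_derivative(x, theta):
    """Analytic second derivative in theta: -12 * (x - theta)**2."""
    return -12.0 * (x - theta) ** 2

x_obs = 1.5  # hypothetical single observation
# Grid of candidate theta values centred on the observation.
grid = [x_obs + k / 100.0 for k in range(-200, 201)]
best = max(grid, key=lambda t: log_lik(x_obs, t))
# Count grid points attaining the maximal log-likelihood.
n_max = sum(1 for t in grid if log_lik(x_obs, t) == log_lik(x_obs, best))
hess_at_max = second_derivative(x_obs, best)
print(best, n_max, hess_at_max)
```

The grid search is only a crude check of uniqueness, but it illustrates the point: the 1 x 1 Hessian is zero (rank 0) at the maximizer, so full rank of the Hessian is sufficient but not necessary.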
[...] (for some permutation $\pi : \{1, 2, \ldots, p\} \to \{1, 2, \ldots, p\}$) is weakly maximal (respectively weakly gradient maximal) if for any permissible fixed $(\theta_{\pi(i)})_{i=k+1}^{p}$ the subset $(\theta_{\pi(i)})_{i=1}^{k}$ is weakly locally identifiable (respectively gradient weakly locally identifiable) at that point, but that this is not the case for any larger number of parameters. A subset of parameters $(\theta_{\pi(i)})_{i=1}^{k}$ is strongly maximal (respectively strongly gradient maximal) if for any permissible fixed $(\theta_{\pi(i)})_{i=k+1}^{p}$ and any open $U \subset \Omega$, $(\theta_{\pi(i)})_{i=1}^{k}$ restricted to the set $U$ is weakly maximal (respectively weakly gradient maximal), i.e., all [...] $\in U$ are weakly maximal (respectively weakly gradient maximal). From this it easily follows that a strongly (gradient) maximal set of parameters $(\theta_{\pi(i)})_{i=1}^{k}$ is a fortiori weakly (gradient) maximal at all points $\theta_0$ for any permissible $(\theta_{\pi(i)})_{i=k+1}^{p}$.

Assume now that $k$ of the $p$ parameters $\theta_i$ are a weakly maximal set of parameters. So for some permutation $\pi : \{1, 2, \ldots, p\} \to \{1, 2, \ldots, p\}$ and for [...] and some data $x = (x_1, \ldots, x_n)$ [...] but that this is not the case for any larger number of parameters. Assume that [...] in the obvious sense.

Assume now that the $(\theta_{\pi(i)})_{i=1}^{k}$ are strongly maximal. Suppose that for some $\theta_1 = (\theta_{1i})_{i=1}^{p} \in \Omega$ and some $x = (x_1, \ldots, x_n) \in \Sigma^n$ it is the case [...] is symmetric, there is a permutation $\pi' : \{1, \ldots, p\} \to \{1, \ldots, p\}$ for which $\mathrm{rk}\left[ \partial^2 L(x|\theta) \right.$ [...] is not a strongly maximal set of parameters in $N'$. With small changes everything above also goes through with "weakly gradient maximal" substituted for "weakly maximal" and "strongly gradient maximal" substituted for "strongly maximal". Therefore we have proved the following result.
We assume that the natural parameters $\zeta_l = \zeta_l[(\theta_i)_{i=1}^{p}, z_l]$ are functions of the model parameters $(\theta_i)_{i=1}^{p}$ and some auxiliary data $z_l$, but that the scaling parameter $\phi$ is not. Let $\mu_l = b'(\zeta_l) = E[x_l]$, so that $\mu_l = b'(\zeta_l[(\theta_i)_{i=1}^{p}, z_l])$. In all that follows we shall assume that the function $b(\zeta)$ is $C^2$. The following definition was introduced by Catchpole and Morgan [3].

Definition 3. With the above notation, a set of parameters $(\theta_i)_{i=1}^{p} \in \Omega$ is parameter redundant for an exponential family model if $\mu_l = b'(\zeta_l[(r_i)_{i=1}^{q}, z_l])$ can be expressed in terms of some strictly smaller parameter vector $(r_i)_{i=1}^{q}$ ($q < p$). Otherwise, the set of parameters $(\theta_i)_{i=1}^{p}$ is parameter irredundant or full rank.

Catchpole and Morgan [3] [...] is of full rank and so negative definite, so by the strong law of large numbers we can choose $x = (x_l)_{l=1}^{n} \in \Sigma^n$ so that the same is true of [...]. This implies [...] is of full rank, and therefore by Corollary 1 $\theta = (\theta_i)_{i=1}^{p}$ is (gradient) weakly locally identifiable. Furthermore, the above argument shows that if $\theta = (\theta_i)_{i=1}^{p}$ are a conditionally full rank set of parameters then on the (open) set [...]

Remarks: It should be noted that part (i) of this generalizes part (i) of Theorem 4 of Catchpole and Morgan [3], who proved that if a model is parameter redundant then it is not locally identifiable. However, one component of part (ii) (that being essentially full rank implies gradient weak local identifiability) is weaker than the corresponding result proved in part (ii) of Theorem 4 of Catchpole and Morgan [3], namely that if a model is of essentially full rank it is locally identifiable. As noted by Catchpole and Morgan [3] (pp. 193-4), there are exponential-family models that are conditionally full rank, but not locally identifiable, so part (iii) is about as strong a result as can be hoped for.
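Definition 3 can be illustrated with a small numerical check. Catchpole and Morgan [3] assess redundancy through the rank of the matrix of derivatives of the means with respect to the parameters; the sketch below evaluates that matrix at a single parameter value for a hypothetical log-linear mean $\mu_l = \exp(\theta_1 + \theta_2 + \theta_3 z_l)$, which is an assumption chosen for illustration. Here $\theta_1$ and $\theta_2$ enter only through their sum, so the means can be rewritten in terms of $r_1 = \theta_1 + \theta_2$ and $r_2 = \theta_3$. A numerical rank at one point is only indicative; a symbolic rank calculation would be needed for a proof.

```python
import numpy as np

def derivative_matrix(theta, z):
    """Matrix with entry [l, i] = d mu_l / d theta_i for
    mu_l = exp(theta_1 + theta_2 + theta_3 * z_l)."""
    mu = np.exp(theta[0] + theta[1] + theta[2] * z)
    return np.column_stack([mu, mu, z * mu])   # shape (n, p) with p = 3

def derivative_matrix_reduced(r, z):
    """Same means written with r_1 = theta_1 + theta_2 and r_2 = theta_3."""
    mu = np.exp(r[0] + r[1] * z)
    return np.column_stack([mu, z * mu])       # shape (n, q) with q = 2

z = np.linspace(0.0, 2.0, 6)                   # hypothetical auxiliary data
theta = np.array([0.3, -0.1, 0.7])
r = np.array([theta[0] + theta[1], theta[2]])

rank_full = np.linalg.matrix_rank(derivative_matrix(theta, z), tol=1e-9)
rank_reduced = np.linalg.matrix_rank(derivative_matrix_reduced(r, z), tol=1e-9)
print(rank_full, rank_reduced)
```

The three-parameter derivative matrix has rank 2, below $p = 3$, which is consistent with redundancy, whereas the two-parameter version has rank $q = 2$, consistent with a full rank parameterization.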
From Theorem 3 we deduce the following.

Corollary 2. (i) [...] $k$ then this subset is gradient weakly locally identifiable at this point.

(ii) If a subset of parameters $(\theta_{\pi(i)})_{i=1}^{k}$ is weakly locally identifiable and for some $x \in \Sigma^n$ this point is a local maximum of the likelihood then it is parameter irredundant, i.e., of full rank, so $\mathrm{rk}[I(\theta)] = k$, so that for some [...]

Proof. This is an immediate consequence of the remarks after Definition 1, Corollary 1, Theorem 3 (i) and Theorems 1 and 3 of Catchpole and Morgan [3]. QED.
Remarks: (i) By the remarks preceding Theorem 3 the conditions of part (i) (that for some $x = (x_1, \ldots, x_n) \in \Sigma^n$ it is the case that [...]) [...] are an essentially full rank set of parameters for the model.

(ii) Assume the model is constructed from a stochastic cancer model embedded in the exponential family, in the sense outlined in Text S1 Section B, so that the natural parameters $\zeta_l = \zeta_l[(\theta_i)_{i=1}^{p}, z_l]$ are functions of the model parameters $(\theta_i)_{i=1}^{p}$ and some auxiliary data $(z_l)_{l=1}^{n}$, and the means are given by [...] $, y_l$ [...] is the cancer hazard function. In this case, as shown in Text S1 Section B, [...] is a rank 1 matrix and can be made small in relation to the first term, e.g., by making $z_l$ small. Therefore finding data $(x, y, z) = (x_1, \ldots, x_n, y_1, \ldots, y_n, z_1, \ldots, z_n) \in \Sigma^n$ for which rk [...]

Hessian vs Fisher Information Matrix as a Method of Determining Redundancy and Identifiability in Generalised Linear Models
We, as with Catchpole and Morgan [3], emphasise use of the Hessian of the likelihood rather than the Fisher information matrix considered by Rothenberg [1]. In the context of GLMs, we have $L(x|\theta) = \sum_{l=1}^{n} \left[ \frac{x_l \zeta_l - b(\zeta_l)}{a(\phi)} + c(x_l, \phi) \right]$ and $g(\mu_i) = \sum_j A_{ij} \theta_j$ for some link function $g()$ and fixed matrix [...] $, \ldots, 1/b''(\zeta_n)$ [...]. The Fisher information is therefore given by [...] $_{i,j}$ is the data variance. Theorem 1 of Rothenberg [1] states that a model is locally identifiable if and only if $\mathrm{rk}[I(\theta)] = p$. As above (Corollary 2 (ii)), heuristically parameter irredundancy, local identifiability, gradient weak local identifiability and weak local identifiability are all equivalent and occur whenever $\mathrm{rk}(D \Delta V \Delta D^T) = \mathrm{rk}(D) = p$. Clearly evaluating the rank of $D$ is generally much easier than that of $D \Delta V \Delta D^T$. Catchpole and Morgan [3] demonstrate use of Hessian-based methods to estimate parameter redundancy in a class of capture-recapture models.
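The relation between the Hessian and the Fisher information in this setting can be written out directly from the GLM log-likelihood. The derivation below is a sketch under standard assumptions, namely $E[x_l] = b'(\zeta_l) = \mu_l$, $b''(\zeta_l) > 0$, and enough regularity that the Fisher information equals minus the expected Hessian. It works with the derivatives $\partial\mu_l/\partial\theta_i$ directly and does not attempt to reproduce the matrix notation used in the text.

```latex
\begin{align*}
\frac{\partial L(x|\theta)}{\partial\theta_i}
  &= \sum_{l=1}^{n} \frac{x_l - b'(\zeta_l)}{a(\phi)}\,
     \frac{\partial\zeta_l}{\partial\theta_i}, \\
\frac{\partial^2 L(x|\theta)}{\partial\theta_i\,\partial\theta_j}
  &= \sum_{l=1}^{n} \left[ -\frac{b''(\zeta_l)}{a(\phi)}\,
     \frac{\partial\zeta_l}{\partial\theta_i}
     \frac{\partial\zeta_l}{\partial\theta_j}
     + \frac{x_l - b'(\zeta_l)}{a(\phi)}\,
     \frac{\partial^2\zeta_l}{\partial\theta_i\,\partial\theta_j} \right], \\
I(\theta)_{ij}
  &= -E\!\left[ \frac{\partial^2 L(x|\theta)}
                     {\partial\theta_i\,\partial\theta_j} \right]
   = \sum_{l=1}^{n} \frac{b''(\zeta_l)}{a(\phi)}\,
     \frac{\partial\zeta_l}{\partial\theta_i}
     \frac{\partial\zeta_l}{\partial\theta_j}
   = \sum_{l=1}^{n} \frac{1}{a(\phi)\, b''(\zeta_l)}\,
     \frac{\partial\mu_l}{\partial\theta_i}
     \frac{\partial\mu_l}{\partial\theta_j}.
\end{align*}
```

The last equality uses $\partial\mu_l/\partial\theta_i = b''(\zeta_l)\, \partial\zeta_l/\partial\theta_i$. Because the weights $1/[a(\phi)\, b''(\zeta_l)]$ are strictly positive, the rank of $I(\theta)$ equals the rank of the matrix of derivatives $(\partial\mu_l/\partial\theta_i)$, while the Hessian differs from $-I(\theta)$ only by the data-dependent term in $x_l - b'(\zeta_l)$.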
However, for certain applications, both the Fisher information and the Hessian must be employed, as we now outline. Assume that the model is constructed from a stochastic cancer model embedded in an exponential family model in the sense outlined in Text S1 Section B. The key to showing that such an embedded model has no more than $N$ irredundant parameters is to construct (as is done in Little et al. [12]) some scalar functions $G_1(\cdot), G_2(\cdot), \ldots, G_N(\cdot)$ such that the cancer hazard function $h(\theta)$ can be written as $h(G_1(\theta), G_2(\theta), \ldots, G_N(\theta))$. Since the cancer model is embedded in a member of the exponential family (in the sense outlined in Text S1 Section B) the same will be true of the total log-likelihood $L(x|\theta) = L(x|G_1(\theta), G_2(\theta), \ldots, G_N(\theta))$. By means of the Chain Rule we obtain

$$\frac{\partial^2 L(x|\theta)}{\partial\theta_i \partial\theta_j} = \sum_{l,k=1}^{N} \frac{\partial^2 L(x|G_1, \ldots, G_N)}{\partial G_l \partial G_k} \frac{\partial G_l}{\partial\theta_i} \frac{\partial G_k}{\partial\theta_j} + \sum_{l=1}^{N} \frac{\partial L(x|G_1, \ldots, G_N)}{\partial G_l} \frac{\partial^2 G_l}{\partial\theta_i \partial\theta_j},$$

so that, since the score $\partial L/\partial G_l$ has zero expectation, the Fisher information matrix is given by:

$$I(\theta)_{ij} = -E\left[ \frac{\partial^2 L(x|\theta)}{\partial\theta_i \partial\theta_j} \right] = -\sum_{l,k=1}^{N} \frac{\partial G_l}{\partial\theta_i} E\left[ \frac{\partial^2 L(x|G_1, \ldots, G_N)}{\partial G_l \partial G_k} \right] \frac{\partial G_k}{\partial\theta_j},$$

which therefore has rank at most $N$. Therefore by Corollary 2 there can be at most $N$ irredundant parameters, or indeed (gradient) weakly locally identifiable parameters. [A similar argument shows that if one were to reparameterise (via some invertible $C^2$ mapping $\theta = f(\omega)$) then the embedded log-likelihood $L(x|f^{-1}(\theta)) = L(x|\omega)$ associated with $h(f^{-1}(\theta)) = h(\omega)$ must also have Fisher information matrix of rank at most $N$.] By remark (ii) after Corollary 2, to show that a subset of cardinality $N$ of the parameters $(\theta_i)_{i=1}^{p}$ is (gradient) weakly locally identifiable requires that one show that the Hessian of the hazard function, $\left( \partial^2 h(\theta, y_l)/\partial\theta_i \partial\theta_j \right)$, has rank at least $N$ for some $(\theta, y_l)$. This is the approach adopted in the paper of Little et al. [12].
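The rank bound can be illustrated numerically. The sketch below is not taken from the cancer models of [12]: the combinations $G_1 = \theta_1\theta_2$ and $G_2 = \theta_2 + \theta_3$, the Poisson mean $\mu_l = G_1 \exp(G_2 y_l)$ and the covariate values are all hypothetical choices made for illustration. It builds the Fisher information in $\theta$ from the Fisher information in $(G_1, G_2)$ by the chain rule, so that with $p = 3$ and $N = 2$ the rank cannot exceed 2.

```python
import numpy as np

def numerical_rank(M, rtol=1e-9):
    """Number of singular values above rtol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rtol * s[0]))

def G(theta):
    """Hypothetical combinations: G_1 = theta_1 * theta_2, G_2 = theta_2 + theta_3."""
    return np.array([theta[0] * theta[1], theta[1] + theta[2]])

def dG_dtheta(theta):
    """Jacobian of (G_1, G_2) with respect to theta, shape (N, p) = (2, 3)."""
    return np.array([[theta[1], theta[0], 0.0],
                     [0.0, 1.0, 1.0]])

def fisher_in_G(g, y):
    """Poisson Fisher information for means mu_l = G_1 * exp(G_2 * y_l)."""
    mu = g[0] * np.exp(g[1] * y)
    A = np.column_stack([mu / g[0], y * mu])   # entry [l, k] = d mu_l / d G_k
    return A.T @ (A / mu[:, None])             # sum over l of grad grad^T / mu_l

y = np.linspace(0.1, 1.0, 8)                   # hypothetical covariate values
theta = np.array([0.5, 2.0, -1.0])
J = dG_dtheta(theta)
I_G = fisher_in_G(G(theta), y)
I_theta = J.T @ I_G @ J                        # chain rule for the expected Hessian
print(numerical_rank(I_G), numerical_rank(I_theta))
```

Here the $2 \times 2$ information in $(G_1, G_2)$ has full rank, while the $3 \times 3$ information in $\theta$ has rank 2, one less than the number of parameters, as the bound requires.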

Discussion
In this paper we have introduced the notions of weak local identifiability and gradient weak local identifiability, which we have related to the notions of parameter identifiability and redundancy previously introduced by Rothenberg [1] and Catchpole and Morgan [3]. In particular we have shown that within the exponential family models parameter irredundancy, local identifiability, gradient weak local identifiability and weak local identifiability are largely equivalent.
The slight novelty of our approach is that the notions of weak local identifiability and gradient weak local identifiability that we introduce relate more closely to the Hessian of the likelihood than to the Fisher information matrix that was considered by Rothenberg [1]. However, in practice, the two approaches are very similar; Catchpole and Morgan [3] used the Hessian of the likelihood, as do we, because of its greater analytic tractability. The use of this approach is motivated by the application, namely to determine identifiable parameter combinations in a large class of stochastic cancer models, as we outline at the end of the Analysis Section. In certain applications the Fisher information may be best for estimating the upper bound to the number of irredundant parameters, but the Hessian may be best for estimating the lower bound of this quantity.
In the companion paper of Little et al. [12] we consider the problem of parameter identifiability in a particular class of stochastic cancer models, those of Little and Wright [13] and Little et al. [14]. These models generalize a large number of other quasi-biological cancer models, in particular those of Armitage and Doll [21], the two-mutation model [17], the generalized multistage model of Little [22], and a recently developed cancer model of Nowak et al. [23] that incorporates genomic instability. These and other cancer models are generally embedded in an exponential family model in the sense outlined in Text S1 Section B, in particular when cohort data are analysed using Poisson regression models, e.g., as in Little et al. [13,14,24]. As we show at the end of the Analysis Section, proving (gradient) weak local identifiability of a subset of cardinality $k$ of the parameters $(\theta_i)_{i=1}^{p}$ [...]

Little et al. [12] demonstrate (by exhibiting a particular parameterization) that there is redundancy in the parameterization for this model: the number of theoretically estimable parameters in the models of Little and Wright [13] and Little et al. [14] is at most two less than the number that are theoretically available, demonstrating (by Corollary 2) that there can be no more than this number of irredundant parameters. Two numerical examples suggest that this bound is sharp: we show that the rank of the Hessian, $\mathrm{rk}\left[ \left( \frac{\partial^2 h(\theta, y)}{\partial\theta_i \partial\theta_j} \right)_{i,j=1}^{p} \right]$, is two less than the row dimension of this matrix. This result generalizes previously derived results of Heidenreich and others [15,16] relating to the two-mutation model.