# On testing structural identifiability by a simple scaling method: Relying on scaling symmetries can be misleading

• Alejandro F. Villaverde

Roles Conceptualization, Funding acquisition, Investigation, Methodology, Writing – original draft, Writing – review & editing

afvillaverde@uvigo.gal

Affiliation Universidade de Vigo, Department of Systems Engineering and Control, Vigo, Galicia, Spain

• Gemma Massonis

Roles Investigation, Methodology, Writing – original draft, Writing – review & editing

Affiliations Universidade de Vigo, Department of Applied Mathematics II, Vigo, Galicia, Spain, Bioprocess Engineering Group, IIM-CSIC, Vigo, Galicia, Spain


## Abstract

A recent paper published in PLOS Computational Biology [1] introduces the Scaling Invariance Method (SIM) for analysing structural local identifiability and observability. These two properties define mathematically the possibility of determining the values of the parameters (identifiability) and states (observability) of a dynamic model by observing its output. In this note we warn that SIM considers scaling symmetries as the only possible cause of non-identifiability and non-observability. We show that other types of symmetries can cause the same problems without being detected by SIM, and that in those cases the method may lead one to conclude that the model is identifiable and observable when it is actually not.

## Notation and definitions

Consider an ordinary differential equation (ODE) model given by:

$$\dot{x}(t) = f\big(x(t), u(t), \lambda\big), \qquad y(t) = h\big(x(t), u(t), \lambda\big),$$

where x(t), λ, u(t), and y(t) are the state, parameter, input, and output vectors, respectively, and f and h are analytic vector functions. Assuming perfect knowledge of u(t) and y(t), the ith component λi of the parameter vector λ is structurally locally identifiable if for almost any λ* there is a neighbourhood $\mathcal{N}(\lambda^*)$ and an admissible input u(t) for which the following property holds:

$$\hat{\lambda} \in \mathcal{N}(\lambda^*) \ \text{ and } \ y(t, \hat{\lambda}) = y(t, \lambda^*) \ \Rightarrow \ \hat{\lambda}_i = \lambda_i^*.$$

Similarly, a state xi is observable if its initial value xi(t0) can be determined from y(t) and u(t) in some finite time tf > t0.

A parameter is structurally locally identifiable if its value can be distinguished from other values in its neighbourhood, but not necessarily from other discrete transformations. In contrast, structurally globally identifiable parameters are uniquely determined in the whole parameter space. The SIM test analysed in this paper is in the category of methods that study structural local identifiability.
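The difference between local and global identifiability can be illustrated with a minimal sketch. The toy model dx/dt = -p²·x, x(0) = 1, y = x is a hypothetical example chosen for illustration and is not taken from the paper; the function name `output` is likewise an assumption.

```python
import math

def output(p, t):
    """Output y(t) of the hypothetical toy model dx/dt = -p**2 * x, x(0) = 1, y = x."""
    return math.exp(-(p ** 2) * t)

times = [0.5 * k for k in range(1, 6)]
p_true = 1.3

# The mirrored value -p_true produces exactly the same output,
# so p is not globally identifiable.
same_for_mirror = all(output(-p_true, t) == output(p_true, t) for t in times)

# A value in a small neighbourhood of p_true changes the output,
# so p can be distinguished locally.
differs_nearby = any(
    abs(output(p_true + 1e-3, t) - output(p_true, t)) > 1e-9 for t in times
)
```

In this sketch p is structurally locally identifiable, because nearby values give different outputs, but not globally identifiable, because the discrete transformation p → -p leaves the output unchanged.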

It should be noted that structural local identifiability and observability are properties that hold for almost all points in parameter or state space—that is, with the possible exception of a subset of measure zero. Likewise, these properties may not hold for all the possible input vectors—the above definitions entail that at least a subset of the admissible inputs is sufficiently exciting for that purpose.

## Description of the SIM test

The existence of symmetries in the equations of a dynamic model is one reason a model may not be structurally identifiable and observable. Such symmetries [2, 3] allow for similarity transformations [4], that is, transformations of parameters and state variables that leave the model output invariant [5]. Since the parameters and states involved in such symmetries cannot be distinguished by observing the output, they are unidentifiable and unobservable, respectively.

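The SIM test proposed in [1] begins by decomposing the dynamic equations of a model as a sum of functionally independent functions. That is, the ordinary differential equation of each state is expressed as:

$$\dot{x}_i(t) = \sum_{k} f_{ik}\big(\tilde{x}(t), \tilde{\lambda}\big),$$

where $\tilde{x}$ and $\tilde{\lambda}$ are the subsets of states and parameters that appear in $f_{ik}$, and each $f_{ik}$ is functionally independent of $f_{il}$ (for $l \neq k$), meaning that they satisfy the generalized Wronskian theorem [6]. SIM assumes that the model output consists of a subset of the state variables, i.e. the observation function must be of the form y(t) = h(x(t)) = xY(t), where xY(t) is a subset of the elements of x(t). Next, unknown parameters λi and unobserved states xj are multiplied by unknown scaling factors, and each functionally independent function is equated to its scaled version,

$$f_{ik}\big(\tilde{x}, \tilde{\lambda}\big) = \frac{1}{u_{x_i}}\, f_{ik}\big(u_{\tilde{x}}\,\tilde{x},\; u_{\tilde{\lambda}}\,\tilde{\lambda}\big),$$

where $u_{\tilde{x}}\,\tilde{x}$ and $u_{\tilde{\lambda}}\,\tilde{\lambda}$ denote element-wise multiplication by the corresponding scaling factors, and the division by $u_{x_i}$ accounts for the scaling of the derivative of $x_i$.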

We remark that the observed (i.e. directly measured) states must not be multiplied by scaling factors. Finally, combinations of the scaling factors that leave the equations invariant are sought. SIM classifies as structurally locally identifiable those parameters λi for which the above equations imply that the corresponding scaling factor equals one ($u_{\lambda_i} = 1$) and, likewise, classifies as observable those state variables xj for which they imply $u_{x_j} = 1$.

Thus, the SIM approach to analysing structural identifiability and observability is to search for a particular type of symmetry, namely scaling symmetries. If a model only contains scaling symmetries, or if it does not contain symmetries of any kind, SIM provides a correct result. However, other types of symmetries can also be present in ordinary differential equation models. A number of examples from biology have been discussed in the recent literature, see e.g. [7–9]. If the equations of a model only have symmetries that are not of the scaling type, the SIM test does not detect them and wrongly classifies the related parameters as structurally identifiable and the related state variables as observable.
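To make the notion of a scaling symmetry concrete, the following minimal sketch uses a hypothetical two-state linear model, dx1/dt = -a·x1 + b·x2, dx2/dt = -c·x2, y = x1, which is not taken from the paper. Scaling the unobserved state x2 by a factor s while dividing b by s should leave the output unchanged, which is the kind of symmetry SIM is designed to find. The function name `simulate` and all numerical values are assumptions.

```python
def simulate(a, b, c, x1_0, x2_0, dt=0.01, steps=500):
    """Integrate dx1/dt = -a*x1 + b*x2, dx2/dt = -c*x2 with RK4; return y = x1."""
    def rhs(x1, x2):
        return (-a * x1 + b * x2, -c * x2)

    x1, x2 = x1_0, x2_0
    ys = [x1]
    for _ in range(steps):
        k1 = rhs(x1, x2)
        k2 = rhs(x1 + 0.5 * dt * k1[0], x2 + 0.5 * dt * k1[1])
        k3 = rhs(x1 + 0.5 * dt * k2[0], x2 + 0.5 * dt * k2[1])
        k4 = rhs(x1 + dt * k3[0], x2 + dt * k3[1])
        x1 += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        x2 += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        ys.append(x1)
    return ys

a, b, c, x1_0, x2_0 = 0.8, 0.5, 0.3, 1.0, 2.0
s = 2.5  # arbitrary non-trivial scaling factor

y_original = simulate(a, b, c, x1_0, x2_0)
y_scaled = simulate(a, b / s, c, x1_0, s * x2_0)  # x2 -> s*x2 together with b -> b/s
y_other = simulate(a, b / s, c, x1_0, x2_0)       # b changed alone, for contrast

max_gap_scaled = max(abs(p - q) for p, q in zip(y_original, y_scaled))
max_gap_other = max(abs(p - q) for p, q in zip(y_original, y_other))
```

The scaled pair (b/s, s·x2) reproduces the original output up to rounding error, whereas changing b alone does not. In this toy model b and x2 are therefore unidentifiable and unobservable because of a scaling symmetry.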

SIM’s limitation to scaling symmetries is mentioned in [1] (“our identifiability test (…) provides a simple way to find a type of symmetry that is related to scale invariance”). However, that paper does not mention that due to this limitation the SIM test can yield wrong results; instead, it claims that “scaling invariance of the model equations can be used to determine whether the parameters are unidentifiable or not”. Indeed, the existence of a scaling invariance indicates that the parameters are unidentifiable. However, the opposite is not true, i.e. its absence does not mean that the parameters are identifiable. We discuss two counter-examples to illustrate this risk.

## Counter-example 1: The FitzHugh-Nagumo model

We first consider the classical FitzHugh-Nagumo model, whose structural identifiability and symmetries were discussed in [7]. It is a nonlinear model that can describe an excitable system such as a neuron, and it can exhibit oscillatory behaviour. Its equations are: (1) (2) (3) where the xi(t) are the states, y(t) is the measurable output, and a, b, c, d are unknown parameters. The initial condition of the first state, x1(t = 0), is known, because it is a directly measured state; x2(t = 0) is unknown. In what follows we omit the dependency on t to simplify the notation.

We show the calculations of the SIM test below. Briefly, each functionally independent term in (1) and (2) is equated to its scaled counterpart, which introduces a scaling factor $u_{*}$ for every state and parameter (except for $x_1$, which is directly measured). The scaled terms in the ODE of $x_2$ are divided by $u_{x_2}$ to account for the fact that the derivative of the state is also scaled. The procedure yields the following equations, where (4)–(7) come from (1) and (8)–(10) come from (2): (4) (5) (6) (7) (8) (9) (10) The implications in the above equations are deduced in cascade, i.e. the finding from (4) that $u_c = 1$ is used in (6) to fix a further scaling factor, and so on. Note that these results are valid for almost all values of the involved parameters and states, but not if they are zero. Under these conditions, since the only possible solution is the trivial one (i.e. all the scaling factors are equal to one), the SIM test classifies this model as structurally locally identifiable (s.l.i.) and observable.
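As a sketch of how such a cascade can proceed, assume a commonly used form of the FitzHugh-Nagumo equations, $\dot{x}_1 = c\,(x_1 - x_1^3/3 - x_2 + d)$, $\dot{x}_2 = (x_1 + a - b\,x_2)/c$, $y = x_1$. This form and the ordering of its terms are assumptions made for illustration and may differ from the numbered equations of the paper. Equating each term to its scaled counterpart (signs omitted, since they appear on both sides) gives:

$$
\begin{aligned}
c\,x_1 &= u_c\,c\,x_1 &&\Rightarrow\ u_c = 1,\\
c\,\frac{x_1^3}{3} &= u_c\,c\,\frac{x_1^3}{3} &&\Rightarrow\ u_c = 1,\\
c\,x_2 &= u_c\,u_{x_2}\,c\,x_2 &&\Rightarrow\ u_{x_2} = 1,\\
c\,d &= u_c\,u_d\,c\,d &&\Rightarrow\ u_d = 1,\\
\frac{x_1}{c} &= \frac{1}{u_{x_2}}\,\frac{x_1}{u_c\,c} &&\Rightarrow\ u_c\,u_{x_2} = 1,\\
\frac{a}{c} &= \frac{1}{u_{x_2}}\,\frac{u_a\,a}{u_c\,c} &&\Rightarrow\ u_a = 1,\\
\frac{b\,x_2}{c} &= \frac{1}{u_{x_2}}\,\frac{u_b\,u_{x_2}\,b\,x_2}{u_c\,c} &&\Rightarrow\ u_b = 1.
\end{aligned}
$$

Under this assumed form every scaling factor is forced to one, so a purely scaling-based test finds no symmetry.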

However, this result is incorrect: the model is in fact unidentifiable and unobservable, due to the existence of an affine symmetry [7]. This result, which we obtained with the STRIKE-GOLDD toolbox [10], can be easily verified analytically, as we now show. The symmetry analysis, performed using the procedure described in [9], finds the following symmetries, expressed as one-parameter Lie groups of transformations: (11) (12) (13) The symmetries are defined as a function of a new parameter ε. The above expressions (11)–(13) indicate that replacing the terms to the left of the arrow with their right-hand equivalents does not modify the model output. Substituting the right-hand expressions of (11)–(13) into the model ODEs (1) and (2), it is straightforward to see that the above transformations leave the model equations invariant: (14) (15) Thus, the state x2 and the parameters a, d cannot be distinguished from their transformed values (11)–(13), and are therefore unobservable and structurally unidentifiable, respectively. However, since the cause of this lack of structural identifiability and observability is not a scaling symmetry but an affine one, SIM wrongly classifies them as observable and structurally identifiable.
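The invariance argument can be checked numerically with a minimal sketch. It assumes a commonly used form of the FitzHugh-Nagumo equations, dx1/dt = c·(x1 - x1³/3 - x2 + d), dx2/dt = (x1 + a - b·x2)/c, y = x1, and an affine transformation x2 → x2 + ε, a → a + b·ε, d → d + ε. Both the model form and the transformation are assumptions for illustration and may differ from the paper's numbered equations; the names `fhn_rhs`, `shifted` and `simulate_output` are hypothetical.

```python
from fractions import Fraction
import random

def fhn_rhs(x1, x2, a, b, c, d):
    """Right-hand side of the assumed FitzHugh-Nagumo form (an assumption, see text)."""
    dx1 = c * (x1 - x1 ** 3 / 3 - x2 + d)
    dx2 = (x1 + a - b * x2) / c
    return dx1, dx2

def shifted(x2, a, b, d, eps):
    """Assumed affine transformation: x2 -> x2 + eps, a -> a + b*eps, d -> d + eps."""
    return x2 + eps, a + b * eps, d + eps

# Exact check with rational arithmetic: the vector field is unchanged
# at randomly drawn states and parameter values.
rng = random.Random(0)

def rand_frac():
    return Fraction(rng.randint(1, 50), rng.randint(1, 50))

invariant_everywhere = True
for _ in range(100):
    x1, x2, a, b, c, d, eps = (rand_frac() for _ in range(7))
    x2_t, a_t, d_t = shifted(x2, a, b, d, eps)
    invariant_everywhere &= (
        fhn_rhs(x1, x2, a, b, c, d) == fhn_rhs(x1, x2_t, a_t, b, c, d_t)
    )

def simulate_output(x1, x2, a, b, c, d, dt=0.01, steps=2000):
    """Integrate with classical RK4 and return the output samples y = x1."""
    ys = [x1]
    for _ in range(steps):
        k1 = fhn_rhs(x1, x2, a, b, c, d)
        k2 = fhn_rhs(x1 + 0.5 * dt * k1[0], x2 + 0.5 * dt * k1[1], a, b, c, d)
        k3 = fhn_rhs(x1 + 0.5 * dt * k2[0], x2 + 0.5 * dt * k2[1], a, b, c, d)
        k4 = fhn_rhs(x1 + dt * k3[0], x2 + dt * k3[1], a, b, c, d)
        x1 += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        x2 += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        ys.append(x1)
    return ys

# Floating-point simulation: the output of the original and transformed
# models should agree up to rounding error.
x1_0, x2_0, a, b, c, d, eps = -1.0, 1.0, 0.2, 0.2, 3.0, 0.0, 0.7
x2_s, a_s, d_s = shifted(x2_0, a, b, d, eps)
y_ref = simulate_output(x1_0, x2_0, a, b, c, d)
y_shifted = simulate_output(x1_0, x2_s, a_s, b, c, d_s)
max_output_gap = max(abs(p - q) for p, q in zip(y_ref, y_shifted))
```

Under these assumptions, the transformed values of x2(0), a and d produce the same output as the original ones even though no parameter has been rescaled, which is why a search restricted to scaling factors cannot detect the symmetry.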

## Counter-example 2: A linear compartment model

This second case study shows that not only nonlinear models can have non-scaling symmetries. The following linear compartment model was presented as Example 6.3 in [11], where it was reported that it is unidentifiable and has no identifiable scaling reparameterization. The model consists of four states (one of which is directly measured, x1), ten parameters, and one known input, g, which is a generic analytic function. As in the previous example, only the initial condition of the measured state, x1(t = 0), is known. The initial conditions of the other states are assumed to have generic unknown values. (16) (17) (18) (19) (20) The model above is structurally unidentifiable and non-observable due to the existence of scaling symmetries and one higher order Lie symmetry. Reformulating the model by removing the scaling symmetries, it is possible to obtain an equivalent model with only seven parameters: (21) (22) (23) (24) where the measured output is again y = x1. By applying the SIM procedure we obtain the following equations: (25) (26) (27) (28) (29) (30) (31) (32) (33) (34) Note that, since x1 is measured and g is a known input, no scaling factors are introduced for them. Replacing the values obtained in (25), (26), (28), (30), and (34) in the remaining equations yields: (35) (36) (37) (38) (39) where the numbers above the ⇒ operator indicate the equations used in the derivations. Thus, we obtain that all scaling factors are equal to one and, consequently, the SIM test classifies this model as structurally locally identifiable and observable.

However, this result is incorrect: in fact, only a11, a21, a22 and a34 are s.l.i. The remaining parameters (a33, a43 and a44) are structurally unidentifiable, and x4 is non-observable. As in the previous example, we can verify the correctness of this result (obtained with the STRIKE-GOLDD toolbox) by examining the symmetries of the model, which can be written as follows: (40) (41) (42) (43) Following the same procedure as in the previous example, it can be checked that, by replacing in the model Eqs (23) and (24) the left-hand side of (40)–(43) with the right-hand side, the transformations cancel out and we obtain the original equations again (calculations not shown). This proves that the model is structurally unidentifiable, and therefore the result of the SIM test is incorrect.

## Simple methods are appealing, but their applicability must be examined with caution

The previous example demonstrates that even relatively simple models can have symmetries that make them unidentifiable and non-observable, and which are not of the scaling type. The counter-examples have also shown that, while the SIM test does not find these symmetries, it is possible to analyse the structural identifiability and observability of these models with symbolic computation methods. These results prompt us to comment on another aspect of [1], namely its assessment of the performance of computational methods. That paper analyses thirteen models with a number of symbolic computation methods, which, when compared to SIM, are portrayed as less applicable, less conclusive, and/or producing incompatible results. However, we have found that the performance of at least some of those computational tools is misrepresented. Whereas a fair comparison of computational times is generally hard to establish due to dependence on computational platforms and implementation/tuning details, correctness of the results can be established more objectively. In particular, we have examined the case of the STRIKE-GOLDD toolbox; other available computational tools include COMBOS [12], DAISY [13], EAR [14], GenSSI [15], ObservabilityTest [16], and SIAN [17]. In [1] it is claimed that the STRIKE-GOLDD toolbox cannot analyse four of the models and yields wrong results for a fifth one. In contrast, we analysed these five models with STRIKE-GOLDD and obtained conclusive and correct results in all cases. The files that reproduce our calculations can be downloaded from https://www.biorxiv.org/content/10.1101/2020.11.30.403956v1.supplementary-material; implementations of the two case studies analysed in this paper are also provided in the link. The issues reported in [1] suggest an incorrect use of the toolbox; however, without additional information we prefer to leave the speculations about their specific causes to the reader.

The ability to perform calculations by hand is a desirable feature, not only due to the convenience of not requiring a computing environment, but also because this process can provide unique insights about a problem. In this regard, the SIM test proposed in [1] is appealing and, indeed, it yields correct results in many cases. Unfortunately, it also gives wrong results in other cases, without any indication that the result may be wrong. As this note has shown, even apparently simple models can have non-scaling symmetries for which SIM fails. Therefore, SIM can only be used as a preliminary test for structural unidentifiability: if it finds a scaling symmetry, the parameters and states involved are unidentifiable and unobservable. If the test classifies a model as identifiable, the result is inconclusive, and it should be double-checked with a different method.

Structural identifiability and observability are properties that often defy intuition, and the search for a simple approach to analyse them has proven elusive for decades. Their analysis usually entails complex symbolic computations that require specialized software. Fortunately, there are a number of well-established tools, available in a variety of computing environments, which can help in this endeavour.

## References

1. Castro M, de Boer RJ. Testing structural identifiability by a simple scaling method. PLOS Computational Biology. 2020;16(11):e1008248. pmid:33141821
2. Bluman G, Anco S. Symmetry and integration methods for differential equations. vol. 154 of Applied Mathematical Sciences. New York, USA: Springer-Verlag; 2008.
3. Arrigo DJ. Symmetry analysis of differential equations: an introduction. Hoboken, NJ, USA: John Wiley & Sons; 2015.
4. Yates JW, Evans ND, Chappell MJ. Structural identifiability analysis via symmetries of differential equations. Automatica. 2009;45(11):2585–2591.
5. Vajda S, Godfrey KR, Rabitz H. Similarity transformation approach to identifiability analysis of nonlinear compartmental models. Mathematical Biosciences. 1989;93(2):217–248. pmid:2520030
6. Larson R. Elementary linear algebra. Nelson Education; 2016.
7. Anguelova M, Karlsson J, Jirstrand M. Minimal output sets for identifiability. Mathematical Biosciences. 2012;239(1):139–153. pmid:22609467
8. Merkt B, Timmer J, Kaschek D. Higher-order Lie symmetries in identifiability and predictability analysis of dynamic models. Physical Review E. 2015;92(1):012920. pmid:26274260
9. Massonis G, Villaverde AF. Finding and breaking Lie symmetries: implications for structural identifiability and observability in biological modelling. Symmetry. 2020;12(3):469.
10. Villaverde AF, Tsiantis N, Banga JR. Full observability and estimation of unknown inputs, states, and parameters of nonlinear biological models. Journal of the Royal Society Interface. 2019;16:20190043. pmid:31266417
11. Meshkat N, Sullivant S. Identifiable reparametrizations of linear compartment models. Journal of Symbolic Computation. 2014;63:46–67.
12. Meshkat N, Kuo CEz, DiStefano J III. On finding and using identifiable parameter combinations in nonlinear dynamic systems biology models and COMBOS: a novel web implementation. PLoS One. 2014;9(10). pmid:25350289
13. Saccomani M, Bellu G, Audoly S, D’Angiò L. A New Version of DAISY to Test Structural Identifiability of Biological Models. In: International Conference on Computational Methods in Systems Biology. Springer; 2019. p. 329–334.
14. Karlsson J, Anguelova M, Jirstrand M. An efficient method for structural identifiability analysis of large dynamic systems. IFAC Proceedings Volumes. 2012;45(16):941–946.
15. Ligon TS, Fröhlich F, Chiş OT, Banga JR, Balsa-Canto E, Hasenauer J. GenSSI 2.0: multi-experiment structural identifiability analysis of SBML models. Bioinformatics. 2018;34(8):1421–1423. pmid:29206901
16. Sedoglavic A. A probabilistic algorithm to test local algebraic observability in polynomial time. Journal of Symbolic Computation. 2002;33:735–755.
17. Hong H, Ovchinnikov A, Pogudin G, Yap C. SIAN: software for structural identifiability analysis of ODE models. Bioinformatics. 2019;35(16):2873–2874. pmid:30601937