## Abstract

Finding a model-based optimal design that can optimally discriminate among a class of plausible models is a difficult task because the design criterion is non-differentiable and requires 2 or more layers of nested optimization. We propose hybrid algorithms based on particle swarm optimization (PSO) to solve such optimization problems, including cases when the optimal design is singular, when the mean responses of some models are not fully specified, and when the problem involves 4 layers of nested optimization. Using several classical examples, we show that the proposed PSO-based algorithms are not model- or criterion-specific and, with a few repeated runs, can produce either an optimal design or a highly efficient design. They are also generally faster than current algorithms, which tend to be slow and work only for specific models or discriminating criteria. As an application, we use our techniques to find optimal discriminating designs for a dose-response study in toxicology with 5 possible models and compare their performance with that of traditional algorithms and a recently proposed algorithm. In the supplementary material, we provide an R package to generate different types of discriminating designs and evaluate efficiencies of competing designs so that the user can implement an informed design.

**Citation:** Chen R-B, Chen P-Y, Hsu C-L, Wong WK (2020) Hybrid algorithms for generating optimal designs for discriminating multiple nonlinear models under various error distributional assumptions. PLoS ONE 15(10): e0239864. https://doi.org/10.1371/journal.pone.0239864

**Editor:** Ping He, Jinan University, China, HONG KONG

**Received:** April 15, 2020; **Accepted:** September 14, 2020; **Published:** October 5, 2020

**Copyright:** © 2020 Chen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** There are no real data involved because this work describes how to collect data to identify the right statistical models in the most efficient manner under various criteria and assumptions. However, code is available to reproduce the results in the paper.

**Funding:** Authors P-YC, R-BC, and WKW received partial support for this study in the form of a grant from the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM107639. R-BC is also partially supported by the Mathematics Division of the National Center for Theoretical Sciences in Taiwan. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

Much of the work in optimal design of experiments assumes a known parametric model, apart from the unknown model parameters, and the objective is to develop a plan to collect data judiciously for accurate statistical inference. For example, one may wish to design a study to estimate parameters in a nonlinear regression model. In practice, the model is rarely known with certainty and it is likely that there are a few plausible models. Optimal discrimination design problems concern identifying the best design, i.e. how to collect data judiciously to select the right model among the plausible models. When there are 2 models, errors are normally distributed and one of the 2 models is fully known, [1] introduced *T*-optimality as a design discrimination criterion based on the squared difference between the 2 mean predictions. [2] reviewed optimal discriminating design problems and since then, locally *T*-optimal designs have been applied and studied in various setups; see, for example, [3–9]. When the outcomes are binary [10] or model errors are not normally distributed, [11] proposed the *KL*-optimality criterion based on the Kullback-Leibler (*KL*) divergence as the distance measure between the 2 competing models.

Analytical descriptions of optimal discriminating designs rarely exist except in simple settings, such as when we want to find an optimal design to discriminate between a constant model and a quadratic model, and both models have homoscedastic errors [1]. When there are multiple models to discriminate, [12] proposed a Fedorov-Wynn type algorithm to find a *T*-optimal design, and the convergence of such an algorithm to the optimal discriminating design was established recently under some restrictive conditions [13]. Over time, the algorithm has been modified several times to find various optimal designs, including by [11], who amended it to find *KL*-optimal designs.

Algorithms are a practical way to find optimal discriminating designs. Recently, nature-inspired metaheuristic algorithms have been repeatedly shown to be fast, flexible and efficient for solving hard and high-dimensional optimization problems in engineering and computer science. Two such algorithms are the differential evolution (DE) algorithm proposed by [14] and particle swarm optimization (PSO) proposed by [15]. [16] was the first to show that PSO outperformed traditional algorithms in statistics for finding a variety of optimal designs. Maximin design problems are much harder to solve because the design criterion is non-differentiable and requires multiple nested optimizations. [17] developed hybridized PSO-based algorithms to solve more complicated optimal design problems, such as those under the standardized maximin optimality criteria, which include the simpler minimax design problems. Most recently, [18] applied DE to find optimal approximate designs for logistic models with up to 5 factors with all pairwise interaction terms. Such a model has 16 parameters, so the number of variables to optimize is at least 95 if the optimal design is minimally supported (16 support points with 5 coordinates each, plus 15 free weights); otherwise, there will be many more variables to optimize. For example, if the optimal design has 30 support points, there are 179 variables to optimize.
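The variable counts quoted above follow from simple bookkeeping. As a minimal sketch, assuming each support point contributes one coordinate per factor and that one weight is redundant because the weights sum to one, the hypothetical helper `n_design_variables` (not part of the authors' software) reproduces both figures:

```python
def n_design_variables(n_support: int, n_factors: int) -> int:
    """Count the free variables of an approximate design.

    Each support point has `n_factors` coordinates, and because the
    weights sum to one only `n_support - 1` of them are free.
    """
    return n_support * n_factors + (n_support - 1)


# Logistic model with 5 factors and all pairwise interactions:
# 1 intercept + 5 main effects + 10 two-factor interactions.
n_params = 1 + 5 + (5 * 4) // 2

# A minimally supported design has as many support points as parameters.
minimal_count = n_design_variables(n_params, 5)
larger_count = n_design_variables(30, 5)
print(n_params, minimal_count, larger_count)
```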

Our goal is to develop flexible and effective algorithms to solve a broad class of optimal discriminating design problems when there are 2 or more nonlinear models and errors may or may not be normally distributed. Unlike the traditional setup in optimal discriminating design problems, we may not require the null model to be fully specified. The work is novel because we apply PSO-based algorithms to solve a broad class of optimal discrimination design problems, including those that require solving 4-level nested optimization problems. Further, we demonstrate that they are more effective than traditional algorithms for finding optimal discriminating designs. Commercial statistical software packages do not have programs for finding optimal discriminating designs and there is only one *R* package for searching specific types of optimal discriminating designs. We develop PSO-based *R* code that readers can freely use to replicate the results in this paper and amend to solve their own optimal discriminating design problems.

Section 2 reviews background, optimal discriminating criteria and search algorithms. In Section 3, we propose 2 algorithms based on PSO to find the optimal discriminating design when there are 2 or more competing nonlinear models with normal or non-normal errors. We also evaluate the performances of the proposed algorithms using several examples. In Section 4, we apply them to construct an optimal design to discriminate among 5 nonlinear models for a toxicology study. Section 5 further demonstrates flexibility and ability of the proposed algorithms to find optimal discriminating designs with singular information matrices and find a robust discrimination design proposed in [9] where the problem has 4 layers of optimization. In Section 6, we compare efficiencies of the proposed algorithms with a few other algorithms and describe a software package that we have developed for finding a user-selected optimal discriminating design. In addition, we compare the performance of the proposed algorithms with a recent R-package that finds optimal discriminating designs. The last section reinforces the importance and ubiquity of optimal discriminating design problems and contains a summary. The appendix further compares results from both current and the proposed algorithms and the supplementary material contains our R-codes.

## Background

Let *y* be the univariate response variable and let *f*(*y* ∣ *x*, *θ*, *σ*^{2}) be its probability distribution function. The mean response is *η*(*x*, *θ*), where *x* is an independent variable from a known compact design space $\mathcal{X}$, *θ* is an unknown parameter vector and *σ*^{2} is the variance of *y*, which we may treat as a nuisance parameter. Suppose that there are *K* models with different underlying probability distributions, $f_i(y \mid x, \theta_i, \sigma^2)$, where $\theta_i \in \Theta_i \subset \mathbb{R}^{m_i}$ for some known positive integers *m*_{i}, *i* = 1, …, *K*. Here Θ_{i} is the user-selected parameter space for the parameters in the *i*^{th} model and a compact subspace of the *m*_{i}-dimensional Euclidean space $\mathbb{R}^{m_i}$.

Approximate designs were proposed by [19] and they are probability measures defined on $\mathcal{X}$. If an approximate design *ξ* has support at $s_1, \ldots, s_n$ and *p*_{i} is its weight at the *i*^{th} support point *s*_{i}, we denote it by *ξ* = {*s*_{1}, *s*_{2}, …, *s*_{n};*p*_{1}, *p*_{2}, …, *p*_{n}} with ∑_{i} *p*_{i} = 1. If the total budget allows for taking a total of *N* observations for the study, the approximate design *ξ* takes roughly *Np*_{i} observations at the *i*^{th} support point of *ξ*, subject to each *Np*_{i} being rounded to an integer and *Np*_{1} + … + *Np*_{n} = *N*. When the design criterion is convex (or concave), there are algorithms for finding optimal approximate designs and we can use an equivalence theorem to confirm optimality of a design, including an efficiency lower bound to assess its proximity to the optimum, without knowing the optimum.
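To make the step from an approximate design to an implementable one concrete, the sketch below rounds the products *Np*_{i} to integers that sum to *N*. The largest-remainder rule used here is an assumption chosen for illustration; the text does not prescribe a rounding rule, and `round_design` is a hypothetical helper.

```python
def round_design(weights, n_total):
    """Turn approximate-design weights into integer replicate counts.

    Largest-remainder rounding (an illustrative choice): take
    floor(N * p_i) at each support point, then give the leftover
    observations to the points with the largest fractional parts.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to one"
    raw = [n_total * p for p in weights]
    counts = [int(r) for r in raw]  # floor, because every r is >= 0
    leftover = n_total - sum(counts)
    # Indices ordered by decreasing fractional part; ties keep input order.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical 3-point design with N = 10 observations.
counts = round_design([0.25, 0.5, 0.25], 10)
print(counts)
```

With equal fractional parts at the first and third points, the single leftover observation goes to whichever point comes first, so different tie-breaking rules can give different but equally valid exact designs.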

### *T*- and *KL*-optimal design criteria

Suppose we have 2 homoscedastic Gaussian models with common variance *σ*^{2} and different mean functions, *η*_{1}(*x*, *θ*_{1}) and *η*_{2}(*x*, *θ*_{2}) respectively. Additionally, suppose *η*_{tr}(*x*) = *η*_{1}(*x*, *θ*_{tr}) is the assumed true model with pre-specified parameter vector *θ*_{tr}. To discriminate *η*_{tr} from *η*_{2}(*x*, *θ*_{2}), [1] proposed the *T*-optimal criterion,

$$T_{2,tr}(\xi) = \min_{\theta_2 \in \Theta_2} \int_{\mathcal{X}} \Delta_{2,tr}(x, \theta_2)\, \xi(dx), \tag{1}$$

where Δ_{2,tr}(*x*, *θ*_{2}) = [*η*_{tr}(*x*) − *η*_{2}(*x*, *θ*_{2})]^{2} is the squared difference between the mean responses from the 2 models and Θ_{2} is a user-specified set. A design $\xi_T^*$ is *T*-optimal if it maximizes (1) over Ξ, the set of all designs on $\mathcal{X}$. Because the criterion is concave, optimality of $\xi_T^*$ can be checked using an equivalence theorem based on the directional derivative of the criterion evaluated at the optimum [1]: the design $\xi_T^*$ is *T*-optimal if and only if

$$\psi_T(x, \xi_T^*) = \Delta_{2,tr}\big(x, \hat{\theta}_2(\xi_T^*)\big) - T_{2,tr}(\xi_T^*) \le 0 \tag{2}$$

for all $x \in \mathcal{X}$, with equality at the support points of $\xi_T^*$, where $\hat{\theta}_2(\xi_T^*)$ is the parameter in Θ_{2} that minimizes $\int_{\mathcal{X}} \Delta_{2,tr}(x, \theta_2)\, \xi_T^*(dx)$.
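The criterion and its equivalence-theorem check can be illustrated on a toy problem that is not taken from the paper: an assumed true mean 1 + *x* + *x*^{2} on [−1, 1] against a straight-line rival. Because the rival is linear in its parameters, the inner minimization is a weighted least-squares fit with a closed form, so no numerical optimizer is needed. The function names and the candidate design below are illustrative assumptions.

```python
def t_criterion_linear_rival(support, weights, eta_true):
    """T-criterion when the rival mean is theta0 + theta1 * x.

    The inner minimum over (theta0, theta1) is the weighted
    least-squares fit, solved from the 2x2 normal equations.
    """
    pairs = list(zip(support, weights))
    m1 = sum(p * s for s, p in pairs)                 # weighted mean of x
    m2 = sum(p * s * s for s, p in pairs)             # weighted mean of x^2
    b0 = sum(p * eta_true(s) for s, p in pairs)
    b1 = sum(p * s * eta_true(s) for s, p in pairs)
    det = m2 - m1 * m1                                # weights sum to one
    theta1 = (b1 - m1 * b0) / det
    theta0 = b0 - theta1 * m1
    value = sum(p * (eta_true(s) - theta0 - theta1 * s) ** 2 for s, p in pairs)
    return value, (theta0, theta1)


def psi(x, eta_true, theta, t_value):
    """Directional-derivative function: squared difference minus criterion value."""
    return (eta_true(x) - theta[0] - theta[1] * x) ** 2 - t_value


def eta_true(x):
    return 1.0 + x + x * x


# Candidate design, derived by hand for this toy problem.
support, weights = [-1.0, 0.0, 1.0], [0.25, 0.5, 0.25]
t_value, theta_hat = t_criterion_linear_rival(support, weights, eta_true)

# The design passes the check if psi never exceeds zero on the design space.
grid = [-1.0 + 2.0 * k / 200 for k in range(201)]
max_psi = max(psi(x, eta_true, theta_hat, t_value) for x in grid)
print(t_value, theta_hat, max_psi)
```

A grid check of this kind only confirms the inequality at the grid points; a finer grid or a plot of `psi` is the usual safeguard in practice.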

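When models do not have homoscedastic or normally distributed errors, [11] proposed the *KL*-optimal criterion to discriminate between them. Suppose $f_{tr}(y \mid x, \sigma^2)$ and $f_2(y \mid x, \theta_2, \sigma^2)$ are the probability density functions of the 2 competing models and $f_{tr}(y \mid x, \sigma^2) = f_1(y \mid x, \theta_{tr}, \sigma^2)$ is the true model with a pre-specified *θ*_{tr}. To measure the difference between the 2 competing models, the criterion uses the Kullback-Leibler (*KL*) divergence given by

$$D_{KL}(f_{tr}, f_2, x, \theta_2) = \int f_{tr}(y \mid x, \sigma^2)\, \log \frac{f_{tr}(y \mid x, \sigma^2)}{f_2(y \mid x, \theta_2, \sigma^2)}\, dy. \tag{3}$$

The *KL*-optimal criterion of a design *ξ* is the minimal value of $D_{KL}(f_{tr}, f_2, x, \theta_2)$ over *θ*_{2} ∈ Θ_{2}, after the quantity is averaged out with respect to the design *ξ*. We denote this value by

$$I_{2,tr}(\xi) = \min_{\theta_2 \in \Theta_2} \int_{\mathcal{X}} D_{KL}(f_{tr}, f_2, x, \theta_2)\, \xi(dx), \tag{4}$$

and the design that maximizes *I*_{2,tr}(*ξ*) among Ξ is the *KL*-optimal for discriminating between *f*_{tr} and *f*_{2}. For simplicity, we also reference the assumed known mean response from the true model *f*_{tr} by “*tr*” and represent *f*_{2} by “2” when convenient, as the subscript of *I*_{2,tr} in (4). Clearly, *T*-optimality is a special case of the *KL*-optimal criterion when errors are homoscedastic and normally distributed. [11] showed that the design $\xi_{KL}^*$ is *KL*-optimal if and only if $\psi_{KL}(x, \xi_{KL}^*)$, the directional derivative of the criterion in the direction of the degenerate design at *x* evaluated at $\xi_{KL}^*$, satisfies

$$\psi_{KL}(x, \xi_{KL}^*) = D_{KL}\big(f_{tr}, f_2, x, \hat{\theta}_2(\xi_{KL}^*)\big) - I_{2,tr}(\xi_{KL}^*) \le 0 \tag{5}$$

for all $x \in \mathcal{X}$ with equality at the support points of $\xi_{KL}^*$. Here $\hat{\theta}_2(\xi_{KL}^*)$ is the *θ*_{2} value in (4) that minimizes the *KL* divergence when $\xi = \xi_{KL}^*$.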
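The remark that *T*-optimality is a special case can be verified in one line of algebra. The derivation below is a sketch in assumed notation: $D_{KL}$ labels the divergence in (3), $T_{2,tr}(\xi)$ labels the criterion in (1), and both densities are taken to be Gaussian with common variance $\sigma^2$ and means $\eta_{tr}(x)$ and $\eta_2(x, \theta_2)$.

```latex
\begin{aligned}
D_{KL}(f_{tr}, f_2, x, \theta_2)
  &= \mathbb{E}_{tr}\!\left[\log f_{tr}(y \mid x) - \log f_2(y \mid x, \theta_2)\right] \\
  &= \mathbb{E}_{tr}\!\left[\frac{\{y - \eta_2(x, \theta_2)\}^2 - \{y - \eta_{tr}(x)\}^2}{2\sigma^2}\right] \\
  &= \frac{\sigma^2 + \{\eta_{tr}(x) - \eta_2(x, \theta_2)\}^2 - \sigma^2}{2\sigma^2}
   = \frac{\Delta_{2,tr}(x, \theta_2)}{2\sigma^2}.
\end{aligned}
```

Averaging over *ξ* and minimizing over *θ*_{2} then gives $I_{2,tr}(\xi) = T_{2,tr}(\xi) / (2\sigma^2)$, so the two criteria differ only by a positive constant and are maximized by the same designs.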

Most algorithms for finding optimal discriminating designs are Fedorov-Wynn type algorithms and they work well for discriminating between 2 linear models. When there are several nonlinear models, [20] proposed using a weighted sum of the *T* (or *KL*)-optimal criteria values for discriminating between each pair of models in the class along with a Newton-type algorithm to enhance the search. A potential issue with this approach is that the choice of the weights can be problematic and an improper choice may result in a design having low efficiencies for discriminating between some of the pairs. [21] also proposed max-min optimal discriminating designs for discriminating among 4 logistic models with various predictor functions. By working with 2 models at a time, she modified the algorithm proposed in [20] to maximize the minimum efficiencies across all pairs among all models using a grid of weights. The algorithm took 2,400 seconds to find the maximin *KL*-optimal design.

[6] used nonlinear approximation theory to find *T*-optimal designs and characterized them by considering the maximal absolute difference, rather than the squared difference, between the means of the 2 models. They found that the number of support points could be determined by counting the number of sign changes in the differences between the mean responses over the design space. By taking the absolute value of this difference, they treated the *T*-optimal design problem as a uniform approximation problem and identified those support points in advance. They then calculated the weights for the resulting support points based on the equivalence theorem. To identify the support points, they used the Remes algorithm [22], which is motivated by uniform approximation theory. Based on the sign-changing positions in the difference function, this algorithm alternates the support points iteratively by allocating each one between 2 sign-changing positions. The algorithm stops when all absolute values of the difference at the support points are about the same. The success of their approach depends on the performance of the Remes algorithm, which we discuss later, including how this and Tommasi's algorithm perform relative to the proposed algorithms.

### Maximin *T*- and *KL*-optimal design criteria

In this subsection, we consider the case when there are 3 or more competing models to discriminate. We present the discussion for finding max-min *KL*-optimal designs, with the understanding that when Gaussian models with homoscedastic errors are assumed, the design maximizing (7) below is the max-min *T*-optimal design. [20] and [21] studied the *KL*-optimal discriminating design problems using relative design efficiencies. Without loss of generality, we assume the first model is the true model, *f*_{tr} = *f*_{1}, and describe their 2-step approach. First, we identify the *KL*-optimal designs, $\xi_i^*$, *i* = 2, …, *K*, for discriminating between the *i*^{th} rival model and the true model *f*_{tr}. Given a design *ξ*, the *KL*-efficiency of *ξ* relative to the *KL*-optimal design $\xi_i^*$ is defined by

$$\mathrm{Eff}_i(\xi) = \frac{I_{i,tr}(\xi)}{I_{i,tr}(\xi_i^*)}, \tag{6}$$

where *I*_{i,tr}(*ξ*) is given in (4). The optimal discrimination design maximizes the *KL*-efficiencies for all *i*. Therefore, one may find the optimal discriminating design by treating the problem as a multiple objective optimization problem. [20] assumed a pre-specified weight vector, *α* = (*α*_{2}, …, *α*_{K}) satisfying 0 ≤ *α*_{i} ≤ 1 with $\sum_{i=2}^{K} \alpha_i = 1$ is available and proposed finding generalized *KL*-optimal designs that maximize the weighted sum of the *KL*-efficiencies with weight vector *α*. The *i*^{th} component in *α* represents the relative importance of identifying the correct model from the *i*^{th} rival pair of models. If it is problematic to specify *α*, an alternative is to consider the worst possible *KL*-efficiencies [21] and find a design that maximizes the minimal *KL*-efficiency among Eff_{i}(*ξ*), *i* = 2, …, *K*, i.e. we want a max-min *KL*-optimal design $\xi_{mmKL}^*$ in Ξ that maximizes

$$I_m(\xi) = \min_{2 \le i \le K} \mathrm{Eff}_i(\xi). \tag{7}$$

This criterion is concave and we note that the subset $\mathcal{C}(\xi)$ comprising the indices of the closest rival models to the true model satisfies:

$$\mathcal{C}(\xi) = \{\, i : \mathrm{Eff}_i(\xi) = I_m(\xi),\ 2 \le i \le K \,\}. \tag{8}$$

[21] showed that there is a weight vector $\alpha^* = (\alpha_2^*, \ldots, \alpha_K^*)$ that satisfies

$$\alpha_i^* \ge 0, \qquad \sum_{i=2}^{K} \alpha_i^* = 1, \qquad \alpha_i^* = 0 \ \text{ for } i \notin \mathcal{C}(\xi_{mmKL}^*), \tag{9}$$

such that $\xi_{mmKL}^*$ is the max-min *KL*-optimal design if and only if it is also a generalized *KL*-optimal design with weight vector $\alpha^*$. The equivalence theorem then states that the design $\xi_{mmKL}^*$ is a generalized *KL*-optimal design if and only if

$$\psi_{mmKL}(x, \xi_{mmKL}^*) = \sum_{i=2}^{K} \alpha_i^*\, \frac{D_{KL}\big(f_{tr}, f_i, x, \hat{\theta}_i(\xi_{mmKL}^*)\big) - I_{i,tr}(\xi_{mmKL}^*)}{I_{i,tr}(\xi_i^*)} \le 0 \tag{10}$$

for all $x \in \mathcal{X}$ with equality at all the support points of $\xi_{mmKL}^*$. Here $D_{KL}(f_{tr}, f_i, x, \theta_i)$ denotes the *KL* divergence in (3) between *f*_{tr} and *f*_{i}, and $\hat{\theta}_i(\xi_{mmKL}^*) = \arg\min_{\theta_i \in \Theta_i} \int_{\mathcal{X}} D_{KL}(f_{tr}, f_i, x, \theta_i)\, \xi_{mmKL}^*(dx)$.
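The quantities in this subsection are simple to compute once the pairwise criterion values are available. The sketch below assumes those values have already been obtained by some inner optimizer; all function names and numbers are hypothetical and serve only to show how the efficiencies, the max-min criterion, the set of closest rivals and the weighted-sum criterion of [20] relate to one another.

```python
def efficiencies(criterion_values, optimal_values):
    """Efficiency of a design against each rival: I_i(xi) / I_i(xi_i*)."""
    return [v / v_opt for v, v_opt in zip(criterion_values, optimal_values)]


def maxmin_criterion(effs):
    """Max-min criterion value: the worst efficiency over all rivals."""
    return min(effs)


def closest_rivals(effs, tol=1e-6):
    """Indices of rivals attaining the minimal efficiency, up to `tol`.

    Rival indices start at 2 because model 1 is taken as the true model.
    """
    worst = min(effs)
    return [i + 2 for i, e in enumerate(effs) if e - worst <= tol]


def weighted_criterion(effs, alpha):
    """Generalized criterion: weighted sum of the efficiencies."""
    return sum(a * e for a, e in zip(alpha, effs))


# Hypothetical criterion values of one design against rivals 2, 3 and 4,
# and the corresponding values attained by the pairwise optimal designs.
effs = efficiencies([0.30, 0.12, 0.45], [0.40, 0.16, 0.50])
worst = maxmin_criterion(effs)
active = closest_rivals(effs)
weighted = weighted_criterion(effs, [0.5, 0.5, 0.0])
print(effs, worst, active, weighted)
```

In this example the weight vector puts no mass on rival 4, which is not among the closest rivals, so the weighted criterion coincides with the max-min value.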

To find the max-min *KL*-optimal design, [21] proposed a search algorithm based on the equivalence theorems for the max-min *KL*-optimal design and the generalized *KL*-optimal design. The algorithm first searches for a special *α* vector in (9) so that the generalized *KL*-optimal design corresponds to the sought max-min *KL*-optimal design. [21] implemented MATHEMATICA code for the iterative search on a laptop with a 2.3 GHz CPU and 4 GB RAM and reported in Section 4 of her paper that the CPU time required to generate the optimal design for discriminating among 4 nonlinear models was about 2,400 seconds, which is expensive. This motivates us to propose an algorithm that avoids the high computational burden of finding the right *α* vector by searching over a set of user-selected grid points. Our new algorithm uses a metaheuristic algorithm and directly optimizes the max-min *KL*-optimal criterion using a single optimization procedure.

## Hybrid algorithms for finding optimal discriminating designs

Hybridization of 2 or more numerical search methods is increasingly common in algorithmic development. The idea is to take advantage of the strengths of the selected algorithms and combine them to solve the optimization problem more effectively than either algorithm can on its own. For instance, some algorithms are more effective at determining where the optimum is roughly located (i.e. exploration) and others are more effective at determining the optimum precisely and quickly once the search is in its vicinity (i.e. exploitation). The literature is replete with hybrid algorithms and the questions are which algorithms are the most appropriate to hybridize and how to do so.

We now propose hybrid algorithms to find different types of optimal discriminating designs and show that they are generally more effective than current algorithms. The recent successes of using PSO to solve a variety of optimal design problems [16, 17] motivated us to hybridize PSO with another algorithm to find optimal discriminating designs more effectively. After a brief review of PSO, we show how PSO can be hybridized to solve various types of optimal discriminating design problems. These are more challenging design problems than those tackled earlier and as an example, we also apply PSO to solve a complex problem that requires 4 levels of nested optimization.

### Particle swarm optimization

Particle swarm optimization (PSO) is a metaheuristic optimization method proposed by [15]. This nature-inspired algorithm simulates how birds fly in a coordinated way to look for the optimum, which is where the food is on the ground. Throughout the search, the birds communicate and adjust their velocities and positions iteratively until convergence or until the algorithm is terminated by a user-specified stopping rule.

We initiate PSO by generating a flock of *N* birds (particles) randomly in the given design space. Each particle is a design *ξ* and we represent it by a vector (*s*_{1}, …, *s*_{n}, *p*_{1}, …, *p*_{n−1})^{⊤}, since $p_n = 1 - \sum_{i=1}^{n-1} p_i$. Let $\xi_i^{(t)}$ be the *i*^{th} particle at the *t*^{th} iteration. PSO has 2 defining concepts: local best and global best. The design with the maximal design criterion value discovered by the *i*^{th} particle before the *t*^{th} iteration is the local best for the *i*^{th} particle and we denote it by $\xi_{L,i}^{(t)}$. The global best design is the one found by the whole swarm before the *t*^{th} iteration and we denote it by $\xi_g^{(t)}$. The velocity of the *i*^{th} particle at the *t*^{th} iteration is $v_i^{(t)}$ and each particle updates its velocity and position iteratively as follows:

$$v_i^{(t+1)} = \omega^{(t)} v_i^{(t)} + c_1 R_1 \otimes \big(\xi_{L,i}^{(t)} - \xi_i^{(t)}\big) + c_2 R_2 \otimes \big(\xi_g^{(t)} - \xi_i^{(t)}\big) \tag{11}$$

and

$$\xi_i^{(t+1)} = \xi_i^{(t)} + v_i^{(t+1)}. \tag{12}$$
Here *R*_{1} and *R*_{2} are 2 independent random vectors whose components are independently drawn from a uniform variate on [0, 1] and the notation ⊗ indicates component-wise product. As with all metaheuristic algorithms, there are tuning parameters. The inertia weight, *ω*^{(t)}, represents how active the particles are and it is chosen to be a linearly decreasing sequence from 0.95 to 0.2 over the first 80% iterations and fixed at 0.2 for the remaining 20% of the iterations. [15] proposed the parameters *c*_{1} and *c*_{2} have default values equal to 2 and these choices have been consistently reported to work well in the literature, including [16], who applied PSO to find different types of optimal designs for several biomedical models. [23] provides more details on PSO.
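The inertia schedule described above can be written as a small function. This is a minimal sketch with a hypothetical function name; the defaults mirror the values stated in the text (0.95 decreasing linearly to 0.2 over the first 80% of the iterations, then held at 0.2).

```python
def inertia_weight(t, max_iter, w_start=0.95, w_end=0.2, decay_fraction=0.8):
    """Inertia weight at iteration t (0-based).

    Decreases linearly from w_start to w_end over the first
    `decay_fraction` of the iterations and stays at w_end afterwards.
    """
    decay_iters = decay_fraction * max_iter
    if t >= decay_iters:
        return w_end
    return w_start + (w_end - w_start) * t / decay_iters


# Schedule for a run of 100 iterations.
schedule = [inertia_weight(t, 100) for t in range(100)]
print(schedule[0], schedule[40], schedule[80], schedule[99])
```

A large early weight keeps the particles moving across the design space, while the small late weight lets them settle around the best designs found so far.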

The choice of the initial flock size *N* is quite arbitrary and likely depends on the size and complexity of the optimization problem. All designs in the flock must have the same number of support points, which is usually chosen to be the number of parameters in the mean function, or larger. The typical stopping criterion of PSO is a pre-specified maximum number of iterations, CPU time or number of function evaluations. Because PSO is quite fast for moderate-sized problems and typically converges in a few seconds of CPU time, we can allow a large maximum number of iterations or function evaluations. This also suggests the choice of *N* is likely not very important because if the algorithm does not find the optimum, it can be quickly rerun using another value of *N*. PSO can also be terminated when the generated design *ξ*_{g} satisfies the equivalence theorem up to a user-specified tolerance or meets the user-specified efficiency lower bound requirement. Algorithm 1 summarizes the basic PSO algorithm.

**Algorithm 1** PSO for finding optimal designs

1: Define the design criterion function Φ(*ξ*, *θ*), e.g. (1), and **Input** the following: the swarm size *N*, along with values for the tuning parameters (other than the default values)

- (1.1). Generate initial particles (designs) $\xi_i^{(0)}$ and velocities $v_i^{(0)}$, *i* = 1, …, *N*.
- (1.2). Calculate design criterion values for each *i*.
- (1.3). Initialize the local and global best designs, $\xi_{L,i}^{(0)}$ and $\xi_g^{(0)}$.

2: At the *t*^{th} iteration, **do**

- (2.1). Calculate particles’ velocities by (11).
- (2.2). Update particles by (12).
- (2.3). Calculate design criterion values.
- (2.4). Update the local best designs.
- (2.5). Update the global best design.

3: **Output** the final global best design *ξ*_{g} and Φ(*ξ*_{g}, *θ*).
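The steps of Algorithm 1 can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: particles start with zero velocity, positions are clipped to the search box, infeasible weights are penalized with minus infinity, and the criterion is a toy *T*-criterion (assumed true mean 1 + *x* + *x*^{2} against a straight-line rival on [−1, 1], with 3-point designs). All names are hypothetical.

```python
import random


def toy_t_criterion(z):
    """T-criterion of the design z = (s1, s2, s3, p1, p2), with p3 = 1 - p1 - p2."""
    s, p = z[:3], [z[3], z[4], 1.0 - z[3] - z[4]]
    if min(p) < 0.0:
        return float("-inf")                      # infeasible weights
    eta = [1.0 + x + x * x for x in s]
    m1 = sum(pi * x for pi, x in zip(p, s))
    m2 = sum(pi * x * x for pi, x in zip(p, s))
    det = m2 - m1 * m1
    if det < 1e-12:
        return 0.0                                # one effective point: rival fits exactly
    b0 = sum(pi * e for pi, e in zip(p, eta))
    b1 = sum(pi * x * e for pi, x, e in zip(p, s, eta))
    th1 = (b1 - m1 * b0) / det
    th0 = b0 - th1 * m1
    return sum(pi * (e - th0 - th1 * x) ** 2 for pi, x, e in zip(p, s, eta))


def pso_maximize(criterion, lower, upper, n_particles=40, max_iter=300,
                 c1=2.0, c2=2.0, seed=0):
    """Basic PSO maximizing `criterion` over the box [lower, upper]."""
    rng = random.Random(seed)
    dim = len(lower)
    # Step 1: initial particles, velocities, local and global bests.
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    lbest_pos = [p[:] for p in pos]
    lbest_val = [criterion(p) for p in pos]
    g = max(range(n_particles), key=lambda i: lbest_val[i])
    gbest_pos, gbest_val = lbest_pos[g][:], lbest_val[g]
    history = [gbest_val]
    for t in range(max_iter):
        # Inertia: 0.95 -> 0.2 over the first 80% of iterations, then 0.2.
        w = 0.95 + (0.2 - 0.95) * min(t / (0.8 * max_iter), 1.0)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (lbest_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))        # (2.1)
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lower[d]), upper[d])  # (2.2)
            val = criterion(pos[i])                                          # (2.3)
            if val > lbest_val[i]:                                           # (2.4)
                lbest_pos[i], lbest_val[i] = pos[i][:], val
        g = max(range(n_particles), key=lambda i: lbest_val[i])              # (2.5)
        if lbest_val[g] > gbest_val:
            gbest_pos, gbest_val = lbest_pos[g][:], lbest_val[g]
        history.append(gbest_val)
    return gbest_pos, gbest_val, history


lower, upper = [-1.0, -1.0, -1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0, 1.0]
best_design, best_value, history = pso_maximize(toy_t_criterion, lower, upper)
print(best_design, best_value)
```

For this toy problem the criterion cannot exceed 0.25, the value attained by the design with points −1, 0, 1 and weights 1/4, 1/2, 1/4, so the returned value gives a direct efficiency estimate for the design found in a given run.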

### PSO-QN algorithm for finding an optimal design for discriminating between 2 competing models

We now extend PSO to find *T*- and *KL*-optimal designs when there are 2 competing models. As an illustration, we describe the search for a *T*-optimal design. Given the design space $\mathcal{X}$, the assumed true model *η*_{tr}(*x*) and the alternative mean function *η*_{2}(*x*, *θ*_{2}), our objective is to find a design $\xi_T^*$ that satisfies

$$\xi_T^* = \arg\max_{\xi \in \Xi}\ \min_{\theta_2 \in \Theta_2} \int_{\mathcal{X}} \Delta_{2,tr}(x, \theta_2)\, \xi(dx). \tag{13}$$
To find *KL*-optimal designs, we replace the inner objective function in (13) by (4).

There are 2 layers of optimization in this maximin problem with outer and inner optimization problems. To tackle a similar maximin optimization problem, [17] showed their Nested-PSO algorithm was successful in finding different types of maximin optimal designs. The Nested-PSO algorithm utilizes another PSO in Step (2.3) of Algorithm 1 to obtain the fitness value for the outer problem. However, a direct application of the Nested-PSO algorithm to find optimal discrimination designs is computationally demanding and our first proposed algorithm reduces the computational burden by incorporating properties of the optimal discriminating design criteria.

Specifically, we note that the inner objective function in (13) is differentiable with respect to the parameter vector *θ*_{2}, which implies that we can use derivative-based optimization algorithms, such as Newton’s method, instead of PSO to solve the inner problem. We used the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm, a quasi-Newton method that is widely available, for example in the R package lbfgs [24] or through the MATLAB function fminunc. We also compare its performance with that of PSO for solving the inner optimization problem in (13).

In summary, the proposed algorithm uses PSO to solve the outer problem in (13) with a non-differentiable objective function. Its value found from Step (2.3) in Algorithm 1 is obtained by the L-BFGS algorithm. We call this proposed search strategy the PSO-QN algorithm. Our experience is that the L-BFGS algorithm may fail to work if an improper initial point of *θ*_{2} is chosen. We suggest that when this happens, we randomly choose another initial point and rerun L-BFGS.
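The inner evaluation in Step (2.3), including the random-restart safeguard, can be sketched as below. To keep the example free of external dependencies, a projected gradient descent with backtracking stands in for L-BFGS; that substitution, the single-parameter rival, the models and the design are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random


def inner_objective(theta, support, weights, eta_true, eta_rival):
    """Design-averaged squared difference between true and rival means."""
    return sum(p * (eta_true(s) - eta_rival(s, theta)) ** 2
               for s, p in zip(support, weights))


def descend(objective, grad, theta0, lower, upper, max_iter=1000, tol=1e-12):
    """Projected gradient descent with backtracking (stand-in for L-BFGS)."""
    theta, value = theta0, objective(theta0)
    for _ in range(max_iter):
        g, step = grad(theta), 1.0
        while step > 1e-10:
            cand = min(max(theta - step * g, lower), upper)
            cand_value = objective(cand)
            if cand_value < value:
                break
            step *= 0.5
        else:
            return theta, value            # no improving step: stop here
        if value - cand_value < tol:
            return cand, cand_value
        theta, value = cand, cand_value
    return theta, value


def inner_minimum(support, weights, eta_true, eta_rival, d_eta_rival,
                  lower, upper, n_starts=5, seed=0):
    """Best inner minimum over several random starting points."""
    rng = random.Random(seed)

    def objective(th):
        return inner_objective(th, support, weights, eta_true, eta_rival)

    def grad(th):
        return sum(-2.0 * p * (eta_true(s) - eta_rival(s, th)) * d_eta_rival(s, th)
                   for s, p in zip(support, weights))

    best_theta, best_value = None, float("inf")
    for _ in range(n_starts):
        theta, value = descend(objective, grad, rng.uniform(lower, upper), lower, upper)
        if math.isfinite(value) and value < best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value


# Toy models: x / (1 + x) assumed true, 1 - exp(-theta * x) as the rival.
def eta_true(x):
    return x / (1.0 + x)


def eta_rival(x, theta):
    return 1.0 - math.exp(-theta * x)


def d_eta_rival(x, theta):
    return x * math.exp(-theta * x)


support, weights = [0.5, 2.0, 5.0], [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
theta_hat, value = inner_minimum(support, weights, eta_true, eta_rival,
                                 d_eta_rival, lower=0.1, upper=5.0)
print(theta_hat, value)
```

The returned `value` is the fitness that the outer PSO would assign to this design; restarting from several random points guards against a single poor starting value, at the cost of a few extra inner solves per particle.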

We applied the PSO-QN algorithm to find an optimal discrimination design for the 2 rival pharmacokinetic models considered in [11]. The design found by our PSO-QN algorithm is similar to their *KL*-optimal designs; details are in Section A.1 of the appendix.

### PSO-S-QN algorithm for finding a maximin optimal design for discriminating among 3 or more models

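This subsection discusses how we use PSO to find the max-min *KL*- (or *T*-)optimal design, $\xi_{mmKL}^*$. Let the reference model *f*_{tr} be *f*_{1} and let $\xi_{mmKL}^*$ solve the following nested optimization problem:

$$\xi_{mmKL}^* = \arg\max_{\xi \in \Xi}\ \min_{2 \le j \le K} \frac{I_{j,tr}(\xi)}{I_{j,tr}(\xi_j^*)},$$

where *I*_{j,tr}(*ξ*), *j* = 2, …, *K*, is defined in (4) and $\xi_j^*$ is the *KL*-optimal design for discriminating between *f*_{tr} and *f*_{j}.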

To find $\xi_{mmKL}^*$, we apply the PSO-QN algorithm *K* − 1 times to identify the *KL*-optimal designs, $\xi_j^*$, for each *j* = 2, …, *K*. These optimal designs are then incorporated into the max-min *KL*-optimal criterion *I*_{m}(*ξ*) before we solve the 3-layer optimization problem. To solve this optimization problem, we propose modifying Step (2.3) in Algorithm 1 in 2 ways at the *t*^{th} iteration:

- (2.3a). For the *i*^{th} particle, $\xi_i^{(t)}$, use the L-BFGS algorithm to compute $I_{j,tr}(\xi_i^{(t)})$ for *j* = 2, …, *K*.
- (2.3b). Calculate the design criterion value

  $$I_m(\xi_i^{(t)}) = \min_{2 \le j \le K} \frac{I_{j,tr}(\xi_i^{(t)})}{I_{j,tr}(\xi_j^*)}. \tag{14}$$

We call this modified algorithm PSO-S-QN; it also works for finding max-min *T*-optimal designs after replacing the objective function by the *T*-optimality criterion. The “S” in PSO-S-QN stands for “screening” because we need to find the minimal efficiency among all the *K* − 1 rival models, and “QN” stands for quasi-Newton.
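Steps (2.3a) and (2.3b) amount to a screening function that evaluates one design against every rival and keeps the worst efficiency. The sketch below uses the *T*-criterion on a toy problem that is not from the paper: an assumed true mean 1 + *x* + *x*^{2} on [−1, 1] with a constant rival and a straight-line rival. Both inner minima have closed forms here, so no quasi-Newton step is needed, and the pairwise optimal values 1.265625 and 0.25 were derived by hand for this toy problem. All names are hypothetical.

```python
def t_vs_constant(support, weights, eta):
    """Inner minimum against a constant rival: the weighted variance of eta."""
    mean = sum(p * eta(s) for s, p in zip(support, weights))
    return sum(p * (eta(s) - mean) ** 2 for s, p in zip(support, weights))


def t_vs_linear(support, weights, eta):
    """Inner minimum against a straight-line rival via weighted least squares."""
    pairs = list(zip(support, weights))
    m1 = sum(p * s for s, p in pairs)
    m2 = sum(p * s * s for s, p in pairs)
    det = m2 - m1 * m1
    if det < 1e-12:
        return 0.0                        # one effective point: the line fits exactly
    b0 = sum(p * eta(s) for s, p in pairs)
    b1 = sum(p * s * eta(s) for s, p in pairs)
    theta1 = (b1 - m1 * b0) / det
    theta0 = b0 - theta1 * m1
    return sum(p * (eta(s) - theta0 - theta1 * s) ** 2 for s, p in pairs)


def screening_fitness(support, weights, eta_true, rival_criteria, optimal_values):
    """Step (2.3a): criterion value per rival; step (2.3b): minimal efficiency."""
    effs = [crit(support, weights, eta_true) / opt
            for crit, opt in zip(rival_criteria, optimal_values)]
    return min(effs), effs


def eta_true(x):
    return 1.0 + x + x * x


rivals = [t_vs_constant, t_vs_linear]
optimal_values = [1.265625, 0.25]         # hand-derived for this toy problem

# Evaluate the design that is optimal against the straight-line rival.
fitness, effs = screening_fitness([-1.0, 0.0, 1.0], [0.25, 0.5, 0.25],
                                  eta_true, rivals, optimal_values)
print(fitness, effs)
```

The design that is fully efficient against the straight-line rival has an efficiency of only about 59% against the constant rival, which is the kind of imbalance the max-min criterion is meant to remove.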

To show the PSO-S-QN-generated design, *ξ*_{mmKL}, is max-min *KL*-optimal, we first identify the model index set $\mathcal{C}(\xi_{mmKL})$ satisfying (8) where the corresponding efficiency values are minimum among all competing pairs. We then implement a basic PSO algorithm to find the weight vector $\alpha^*$ in (9) by minimizing the departure from equality in (10) over the weight vectors $\alpha$, evaluated at the support points of *ξ*_{mmKL}, subject to the constraints in (9).

In Sections A.2 and A.3 of the Appendix, we re-visit a couple of max-min optimal design problems for discriminating 3 and 4 models in the literature and demonstrate that the PSO-S-QN algorithms are able to find the same optimal designs or designs that are very close to the reported optimum.

## Application to toxicological experiments

We now apply the PSO-QN and PSO-S-QN algorithms to find an optimal design to discriminate among 5 models in a toxicological study. [25] proposed 5 dose-response models which they found adequate for modelling a continuous endpoint in toxicology. The mean responses from these models are
(15)
(16)
(17)
(18)
(19)
All errors are assumed to be independent with mean 0 and homoscedastic, and the design space is user-specified. [25] were interested in how exposure to butyl benzyl phthalate (BBP) in maternal animals during gestation affects fetal weights. Their study design had eight dose groups with BBP dosages at 0, 270, 350, 450, 580, 750, 970 and 1250 mg/kg body weight/day, and 10 pregnant female rats were assigned to each dose. We denote their design by *ξ*_{P2000} on the dose interval $\mathcal{X} = [0, 1250]$.

[26] used the same study design *ξ*_{P2000} to illustrate the model selection procedure from this class of models and concluded that model (19) accurately describes the data and the estimated parameters were . To fix ideas, we assume the largest model (19) is the true model with nominal values given by . Here, we address design issues, i.e. how to judiciously collect observations early on to have a good sense of which one of the models is likely to be the true model. To this end, we apply the PSO-QN and PSO-S-QN algorithms to search for an optimal design to discriminate among the models (15) to (19) when errors are normally distributed and when errors are lognormally distributed.

When errors are normally distributed, we use the PSO-QN algorithm to identify all the *T*-optimal designs for discriminating between model (19) and each of the rival models, (15), (16), (17) and (18). The left panel of Table 1 shows the *T*-optimal designs. We then applied PSO-S-QN algorithm to find the max-min *T*-optimal design for discriminating among the models (15)–(19). The max-min *T*-optimal design is *ξ*_{mmT} = {0.000, 433.345, 1027.333, 1250.000; 0.214, 0.338, 0.249, 0.200} and its *T*-efficiencies relative to each *T*-optimal design are all 77.47%. This implies that . To show that the PSO-S-QN-generated design *ξ*_{mmT} is max-min *T*-optimal, we calculated the vector in (9) to be . Fig 1(a) shows the graph of *ψ*_{mmKL} on the left-hand-side of (10) and confirms the max-min *T*-optimality of the generated design.

The figures confirm the max-min *T*-optimality and the max-min *KL*-optimality of *ξ*_{mmT} and *ξ*_{mmKL}, respectively.

When errors are lognormally distributed and the nuisance parameters have a constant coefficient of variation as described in Section A.1 of the Appendix, we follow a similar procedure to find the max-min *KL*-optimal design. The right panel of Table 1 displays the *KL*-optimal designs for pairwise discrimination and we observe that they are similar in structure to the *T*-optimal designs. Interestingly, regardless of whether the errors are normally distributed or not, the maximum dose of the optimal designs for discriminating between models (19) and (15) and between models (19) and (17) is the largest possible dose allowed, whereas for the other 2 cases, the largest dose in the optimal designs is about the same and equal to about 1064.5. The max-min *KL*-optimal design found by the PSO-S-QN algorithm is *ξ*_{mmKL} = {0.000, 451.530, 1043.591, 1250.000; 0.223, 0.342, 0.248, 0.188} and its *KL*-efficiencies relative to each of the *KL*-optimal designs are all equal to 76.78%. A direct calculation shows the vector in (9) is (0.504, 0.001, 0.145, 0.350) and the plot in Fig 1(b) confirms its optimality by (10). Our conclusion is that the PSO-S-QN-generated design *ξ*_{mmKL} is max-min *KL*-optimal.

We now compare our optimal designs with the design *ξ*_{P2000} with eight doses in [25]. Table 2 shows the *T*- and *KL*-efficiencies for our max-min discrimination designs, *ξ*_{mmT} and *ξ*_{mmKL}, and *ξ*_{P2000}. The notation *T*−Eff_{j} is the *T*-efficiency of a design relative to the *T*-optimal design for discriminating between models with mean responses *υ*_{5} and *υ*_{j}, *j* = 1, 2, 3, 4; similarly, *KL*−Eff_{j} is the corresponding *KL*-efficiency. On the left panel of Table 2, the competing models have normally distributed errors and *ξ*_{mmT} is the best design because its maximized minimal value of *T*-efficiency is 77.47%. If one uses *ξ*_{mmKL} as the design to discriminate models with normally distributed data, its *T*-efficiency is at least 71.45%. In contrast, the design *ξ*_{P2000} has less than 60% *T*-efficiency for discriminating any of the other models from model (19). When the data are lognormally distributed, the right panel of Table 2 shows the *KL*-efficiency of each design. The performances of the various designs are similar except for the design *ξ*_{P2000}, which has poor minimal *KL*-efficiency relative to the max-min *T*-optimal design, *ξ*_{mmT}, which has at least 73.02% *KL*-efficiency. These findings suggest that care must be exercised when implementing a design to discriminate among a class of models. For this application, it appears that the performances of the various optimal discriminating designs are not much affected by whether the errors are normally distributed or not.

## Further examples

We now further demonstrate that the proposed algorithms are flexible and are also able to (i) generate singular optimal discriminating designs, (ii) discriminate models when there are constraints on the model parameters, and (iii) solve optimal discrimination design problems that require 4 layers of nested optimization over different spaces. For (i), we use an example from [6] and for (ii) we use an example from [1]. For (iii), [9] proposed robust discrimination designs when there is uncertainty in both the models and their model parameters, and we show that our algorithms are also able to solve the 4-layer nested optimization problem and produce the same designs as they obtained analytically.

### Optimal design with singular information matrix

The problem of finding an optimal design to discriminate between a cubic polynomial model and a linear model defined on [−1, 1] was considered in [6]. The mean responses from the 2 models are
and
and the vector of nominal values of the parameters in *η*_{1} is *γ* = (1, 1, 0, 1).

A direct application of the PSO-QN algorithm shows that the *T*-optimal design for this example is . This design has 3 unequally supported points and is singular. Its *T*-optimality is confirmed by its directional derivative function (2) plot on the left panel of Fig 2. Clearly, a drawback of this optimal discriminating design is that it cannot be used to estimate the 4 parameters in *η*_{1}. The PSO-QN algorithm first searches for the best 4-point design, which is . Its *T*-optimality is confirmed by the directional derivative plot (2) on the right panel of Fig 2. This design has 4 points and so it can estimate all the parameters in *η*_{1}. Both these designs agree with the designs reported in [6], who also showed that such optimal designs are not unique.

### A larger rival model with a constraint on the model parameters

The design problem to discriminate between 2 models, where the hypothesized true model *η*_{1} is simpler in structure than the alternative model *η*_{2} with a constraint on its model parameters was considered in [1]. The 2 models are defined on [−1, 1] and their mean responses are
and
where .

We first transform the constraint in *η*_{2} to box-type constraints by letting *θ*_{1} = *r* cos *ϕ* and *θ*_{2} = *r* sin *ϕ* where *r* ∈ [1, ∞) and *ϕ* ∈ [0, 2*π*]. The PSO-QN algorithm generated the *T*-optimal design *ξ** = {−1.000, 0.000, 1.000;0.25, 0.50, 0.25}, which coincides with the *T*-optimal design found in [1]. Fig 3 displays the plot of the directional derivative of the *T*-optimality criterion evaluated at *ξ** and confirms its optimality.
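The reparametrization can be sketched in a few lines. This is a minimal Python illustration, not code from our package, and the function name `polar_to_params` is introduced here only for the example. Because *θ*_{1}² + *θ*_{2}² = *r*², every point of the box *r* ≥ 1, 0 ≤ *ϕ* ≤ 2*π* maps to a parameter pair whose squared norm is at least 1, so an optimizer that handles only box constraints can search over (*r*, *ϕ*) directly.

```python
import math

def polar_to_params(r, phi):
    """Map the box-constrained pair (r, phi) to (theta1, theta2)."""
    if r < 1.0 or not (0.0 <= phi <= 2.0 * math.pi):
        raise ValueError("need r >= 1 and 0 <= phi <= 2*pi")
    return r * math.cos(phi), r * math.sin(phi)

# Any admissible (r, phi) yields theta1^2 + theta2^2 = r^2 >= 1.
theta1, theta2 = polar_to_params(1.5, math.pi / 3)
squared_norm = theta1 ** 2 + theta2 ** 2
print(squared_norm)
```

In practice the unbounded range of *r* would have to be truncated at some large upper bound chosen by the user; that bound is an assumption of the sketch and is not specified here.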

### Standardized maximin *T*-optimal design

To find *T*- and *KL*-optimal designs, we need to pre-specify the true model with assumed parameter values. However, mis-specified model parameter values can lead to a much less efficient discrimination design. To overcome the mis-specification problem, a robust *T*-optimal criterion was proposed by [9]. Let Θ_{tr} be a user-selected set containing plausible true values of the model parameters. The strategy is to find a design which is robust to mis-specification of the nominal values of Θ_{tr}. If *θ*_{tr} is the vector of model parameters in the true model, the *T*-efficiency of a design *ξ* is
and is the locally *T*-optimal design when the true model has parameter *θ*_{tr}, i.e. . [9] proposed finding a standardized maximin *T*-optimal design, , that maximizes the minimal *T*-efficiency, i.e.
(20)
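In generic notation, the nesting in this criterion can be sketched as follows. The symbols Δ and *ζ* are introduced here for illustration only, and the expression assumes the usual squared-difference *T*-criterion for homoscedastic normal errors with a single rival model *η*_{2}; it is a sketch of the structure, not a transcription of (20).

```latex
\max_{\xi}\; \min_{\theta_{tr} \in \Theta_{tr}}\;
\frac{\displaystyle \min_{\theta_2} \Delta(\xi, \theta_{tr}, \theta_2)}
     {\displaystyle \max_{\zeta} \min_{\theta_2} \Delta(\zeta, \theta_{tr}, \theta_2)},
\qquad
\Delta(\xi, \theta_{tr}, \theta_2)
  = \int \bigl[\eta_{tr}(x, \theta_{tr}) - \eta_2(x, \theta_2)\bigr]^2 \, \xi(dx).
```

Reading from the outside in, the search runs over the design *ξ*, the true parameter *θ*_{tr}, the reference design *ζ* in the denominator, and the rival parameter *θ*_{2}, which gives the four nested layers.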

To tackle this 4-layer optimization problem, we propose the Nested-PSO-QN algorithm that combines the Nested-PSO in [17] and the PSO-QN algorithm. The outer loop of the Nested-PSO-QN maximizes the minimal *T*-efficiency across the design space, and this minimal *T*-efficiency is obtained by searching the interior of the parameter space, Θ_{tr}, in the inner loop. In calculating the *T*-efficiency, we note that the term in the numerator is differentiable and so we used the L-BFGS algorithm to optimize it. The denominator in the *T*-efficiency formula is a locally *T*-optimal design problem, and we solved it using the PSO-QN algorithm. When the locally *T*-optimal design can be described analytically, the Nested-PSO-QN algorithm can be accelerated and the computation time greatly reduced. In the example below, we used 64 particles and 200 iterations for the outer loop of the Nested-PSO-QN algorithm and 64 particles and 50 iterations for its inner loop.

**Example in Dette et al. [9]**. Consider 2 homoscedastic polynomial models defined on *x* ∈ [−1, 1] with normally distributed errors and the mean responses are
(21)
(22)
Here the larger of the 2 nested models is the true model, i.e. *η*_{tr} = *η*_{1}. If *θ*_{tr} = *β*_{m−1}/*β*_{m} and *θ*_{tr} ∈ Θ_{tr}, [9] showed that the problem of finding a standardized maximin *T*-optimal design to discriminate between (21) and (22) is equivalent to that for discriminating between the 2 models with means given by

Suppose the parameter of *η*_{tr} is known to be in the interval Θ_{tr} = [−1, 1] and we use the Nested-PSO-QN algorithm to find a standardized maximin *T*-optimal design. In this example, the locally *T*-optimal design has a closed-form solution for each *θ*_{tr} ∈ Θ_{tr} [8], and so we were able to accelerate the Nested-PSO-QN algorithm by using this closed form in the denominator of the *T*-efficiency formula instead of running the PSO-QN algorithm. We ran the Nested-PSO-QN algorithm for the 2 cases *m* = 2 and *m* = 3 in the above problem. Table 3 displays the standardized maximin designs (*ξ*_{rbstT}), along with the optimal designs *ξ*_{DMS2013} found by [9], who used a special algorithm that converts the problem to one of finding the root of a Chebyshev polynomial. The table also displays the various *T*-efficiencies and shows that our algorithms were able to produce optimal designs similar to those in [9].

## Implementation, computational efficiency of proposed algorithms and an online tool for finding optimal discriminating designs

We now discuss (i) the performance of our algorithms PSO-QN, PSO-S-QN and Nested-PSO-QN relative to other algorithms, (ii) our package for generating a tailor-made optimal discriminating design, and (iii) how to implement our algorithms using C++ code via the Rcpp package in R [27]. All computations were done on a Linux server with an Intel Xeon CPU E5-2620 2.0 GHz and 64GB RAM. In addition, we compare the performance with an R package that contains 2 functions, one for *T*-optimal designs and one for *KL*-optimal designs.

### Runtime

Table 4 shows the CPU times of one run for all cases investigated in this paper. The computing time for the PSO-QN algorithm depends on the complexity of the model structure. For example, it took only 7 seconds of CPU time to find the optimal design for discriminating between model (19) versus a constant rival model (15). When the rival model is a more complicated model, like model (17), the algorithm took 90 seconds to find the optimal design for discriminating between models (19) and (17).

We expect the PSO-S-QN algorithm to require more time to search for the max-min optimal discriminating designs because we have a 3-layer optimization problem. The total computing time for finding such an optimal discrimination design becomes noticeably longer when we include the time for finding the optimal designs for all the pairwise optimal discriminating design problems. For example, consider the problem with 5 competing models, where the PSO-S-QN algorithm was applied to find a max-min *T*-optimal design. We first applied the PSO-QN algorithm to find *T*-optimal designs for discriminating between the assumed true model and each of the rival models (15), (16), (17) and (18). The computing times for finding the *T*-optimal designs for these four 2-model discrimination problems were 7.12, 29.02, 90.94 and 73.28 seconds, respectively. We then ran the PSO-S-QN algorithm and it took 399.12 seconds to find the max-min *T*-optimal design. The total computing time for finding the max-min discrimination design is the sum of these computing times, which equals 599.48 seconds.
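The total can be checked directly from the reported times. The short Python snippet below only re-adds the numbers quoted above; the variable names are ours.

```python
# CPU seconds for the four pairwise PSO-QN runs, rivals (15), (16), (17), (18)
pairwise_seconds = [7.12, 29.02, 90.94, 73.28]
# CPU seconds for the PSO-S-QN run that finds the max-min T-optimal design
max_min_seconds = 399.12

pairwise_total = round(sum(pairwise_seconds), 2)
total_seconds = round(pairwise_total + max_min_seconds, 2)
print(pairwise_total, total_seconds)
```

The pairwise problems account for about a third of the total, so the PSO-S-QN step dominates the cost in this example.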

For the standardized maximin *T*-optimal design problems, the Nested-PSO-QN algorithm required 1057.83 and 8344.12 seconds for solving the same problems just discussed when *m* = 2 and *m* = 3, respectively. The computational time for each problem is unsurprisingly long because we were trying to solve 4-layer optimization problems.

### Efficiency of the PSO-QN algorithm

This subsection compares the performance of the PSO-QN algorithm with some well-known algorithms for finding optimal discriminating designs. For *T*-optimal design problems, we consider 2 algorithms, the Fedorov-Wynn algorithm in [1] and the Remes algorithm in [6]. For *KL*-optimal design problems, we consider the Fedorov-Wynn algorithm and also the Nested-PSO algorithm proposed in [17] for solving *T*- and *KL*-optimal design problems.

We used 32 particles and 200 iterations for the PSO-QN and Nested-PSO algorithms, and 32 particles and 100 iterations in the inner loop of the Nested-PSO algorithm to minimize the squared difference between the means of the 2 models over the parameter space. For the Fedorov-Wynn type algorithm, we started with a random initial design and pruned the design every 3 iterations during the 200 iterations. For the Remes algorithm, the initial support points were randomly chosen before we ran it for 200 iterations. We implemented all algorithms using the Rcpp package in R [27], ran each of them 50 times with randomly selected initial states, and computed the efficiencies of the resulting designs relative to the optimal designs.

Table 5 shows the performances of the 4 algorithms for finding the *T*-optimal designs for the toxicological experiment and for the example in Section A.2 of the Appendix. The results are based on 50 replications and show the range of *T*-efficiency values of the designs generated by the different algorithms and how often each algorithm found a design with at least 90% *T*-efficiency. We also report the average computing time for each algorithm.

Our overall numerical results show that the PSO-QN algorithm outperforms the other 3 algorithms in 5 out of 6 cases in terms of how often it finds the optimal design. For example, to discriminate between toxicological models (19) and (17), the PSO-QN algorithm can find the *T*-optimal design while the other 3 algorithms cannot. For the case of discriminating between models (19) and (18), the PSO-QN algorithm finds designs with at least 90% *T*-efficiency in all 50 replications, and 49 of them are *T*-optimal. For the same case, the Nested-PSO algorithm finds the *T*-optimal design 26 times; the Fedorov-Wynn and Remes algorithms perform the worst because they rarely identify the optimal design. Only when a simple competing model like model (15) is involved do all algorithms perform similarly. In terms of computational cost, the Fedorov-Wynn and Remes algorithms require shorter computing time than the PSO-based algorithms because they start with a single initial design. However, with the same 32 initial designs, the PSO-QN algorithm is faster and more efficient than the Nested-PSO algorithm. This shows the need for a specialized algorithm for optimal discrimination design problems.

The Nested-PSO algorithm may not converge when it searches in the inner loop of PSO. One may wonder whether the performance of Nested-PSO depends on the accuracy sought for the optimal solution in the inner loop. Our experience is that the L-BFGS algorithm as the inner loop solver in our PSO-QN algorithm tends to work better. For example, consider the case of discriminating between models (A25) and (A26) in Section A.2 of the Appendix. We calculated the inner optimization problem in (1) at the *T*-optimal design, *T*_{2,tr}(*ξ*_{T,2}), by the L-BFGS and PSO algorithms. For a fair comparison, we terminated both algorithms when the stopping criterion |*g*(*t* − 1) − *g*(*t*)|/*g*(*t*) < 10^{−6} was met, where *g*(*t*) is the value of the objective function at the *t*^{th} iteration.
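The relative-change stopping rule is simple to express in code. The Python sketch below is illustrative only: the helper names and the toy objective are ours, and the rule is applied to a plain gradient step rather than to L-BFGS or PSO. It assumes *g*(*t*) > 0, as is the case for a positive criterion value.

```python
def has_converged(g_prev, g_curr, tol=1e-6):
    """Relative-change rule |g(t-1) - g(t)| / g(t) < tol (assumes g_curr > 0)."""
    return abs(g_prev - g_curr) / g_curr < tol

def run_until_converged(step, x0, objective, max_iter=10_000):
    """Iterate x <- step(x) until the rule fires; returns (x, iterations used)."""
    x, g_prev = x0, objective(x0)
    for t in range(1, max_iter + 1):
        x = step(x)
        g_curr = objective(x)
        if has_converged(g_prev, g_curr):
            return x, t
        g_prev = g_curr
    return x, max_iter

# Stand-in problem: minimize g(x) = (x - 2)^2 + 1 with a fixed-rate gradient step.
def objective(x):
    return (x - 2.0) ** 2 + 1.0

def step(x):
    return x - 0.25 * 2.0 * (x - 2.0)   # x - rate * g'(x) with rate 0.25

x_final, n_iter = run_until_converged(step, 10.0, objective)
print(x_final, n_iter)
```

Because the rule looks only at successive objective values, it can be attached to any iterative solver, which is what makes it a common yardstick when two different algorithms are compared.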

We used 4 swarm sizes in PSO: 32, 64, 128 and 256 particles. We ran both algorithms 100 times, each time with a randomly chosen initial value of *θ*_{2}, and report the mean value of *T*_{2,tr}(*ξ*_{T,2}) in Table 6.

Our results suggest that with more particles, PSO is more likely to find the value of *T*_{2,tr}(*ξ*_{T,2}). This can be seen from Table 6, which shows that the standard deviation of the minimal values decreases as the swarm size increases. However, the L-BFGS algorithm finds a minimal value that is smaller than those found by PSO with any of the swarm sizes. Table 6 also reports the average computing time required for convergence and suggests that the L-BFGS algorithm is also faster than PSO. This is one reason why we use the L-BFGS algorithm to solve the inner optimization problem in (13).
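The role of the swarm size can be explored with a small stand-alone PSO. The sketch below is a generic global-best PSO in Python with commonly used inertia and acceleration constants; these constants, the function name `pso_minimize` and the toy quadratic objective are assumptions made for illustration, and they are not the settings or the inner problem used for Table 6.

```python
import random

def pso_minimize(f, bounds, n_particles=32, n_iter=200, seed=0,
                 inertia=0.729, c1=1.494, c2=1.494):
    """Basic global-best PSO minimizing f over a box; returns (best_x, best_f)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_f = [f(p) for p in pos]               # personal best values
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]    # global best so far
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # move the particle and clamp it to the search box
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            fi = f(pos[i])
            if fi < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], fi
                if fi < gbest_f:
                    gbest, gbest_f = pos[i][:], fi
    return gbest, gbest_f

def toy_objective(p):
    """Toy quadratic with its minimum value 0 at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

# Same budget of iterations, different swarm sizes.
results = {n: pso_minimize(toy_objective, [(-5.0, 5.0)] * 2, n_particles=n)[1]
           for n in (32, 64, 128, 256)}
print(results)
```

On a smooth problem like this one, every swarm size gets close to the minimum; the comparison becomes informative when the objective is harder and the runs are repeated from different seeds.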

Lastly, we compare the performances of the various algorithms for finding *KL*-optimal designs. The Remes algorithm in [6] is not included because we could not find details in that paper on how to modify it to find *KL*-optimal designs. Table 7 shows the performances of the PSO-QN, Nested-PSO and Fedorov-Wynn algorithms. The results are similar to those discussed above and suggest that the proposed PSO-QN algorithm is more effective for finding *KL*-optimal discrimination designs, since it identifies the *KL*-optimal designs most often in all cases. The Fedorov-Wynn algorithm seems adequate for finding highly efficient designs under the *KL*-optimality criterion but seems to have trouble finding the optimal designs. Nested-PSO requires more computing time to find the optimum and its overall performance is not as good as that of PSO-QN.

### Comparison with an R package

It is instructive to compare the performance of the proposed algorithm with other algorithms coded in R. After an extensive search, we were able to find only one appropriate R package for comparison, called *rodd*. The R package was published in 2016 and it generates locally and Bayesian optimal discriminating designs [28]. In the *rodd* package, the function tpopt constructs *T*-optimal designs and the function KLopt.lnorm finds *KL*-optimal designs with lognormal errors. These 2 functions were coded based on the algorithms in [29] and [30], respectively. After an initial design is provided, the 2 functions search for an optimal discriminating design using 2 common steps. The first common step updates the candidate set of support points by combining the current support points with points that locally maximize Φ_{T} or Φ_{KL}. The second common step determines the weights of the candidate support points by maximizing the *T*- or *KL*-criterion directly, and support points with extremely small weights are removed. To speed up the optimization process, a quadratic programming method was proposed, and [29, 30] showed that these functions were able to find the optimal discriminating designs after a few iterations.

We report the performances of these 2 functions in searching for *T*-optimal and *KL*-optimal designs for the 5 models, (15)–(19), in the toxicological experiment; errors are lognormally distributed when we consider the *KL*-optimality criterion. We assume model (19) is the true model, as was the case in the earlier comparison section. The tuning parameters in the 2 functions are the default settings in the package. For each function, we ran the algorithm independently 50 times using a specially selected initial design. In the first instance, the initial design was the design equally supported at 10 points generated from Uniform[0, 1250]. For the other 49 instances, the initial design was selected as follows. The number of support points of each initial design was randomly generated from a Poisson distribution with mean 10. We then independently sampled the required number of support points from Uniform[0, 1250], generated a random value *w*_{i} from Uniform(0,1] for each support point, and assigned weight *w*_{i}/∑_{i} *w*_{i} to the *i*^{th} support point. The relative *T*- and *KL*-efficiencies were then recorded and compared with those of the other search algorithms. Because the initial designs differ, we also report how often each function generated a design without an error message.
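The random initial designs described above can be sketched as follows. This is a minimal Python illustration of the sampling scheme, not the R code used for the comparison; the function names are ours, and redrawing when the Poisson draw equals zero is an assumption added so that every design has at least one support point.

```python
import math
import random

def poisson_draw(rng, mean):
    """Poisson variate via Knuth's multiplication method (adequate for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def random_initial_design(rng, lower=0.0, upper=1250.0, mean_points=10):
    """Poisson number of support points, uniform doses, normalized uniform weights."""
    n = 0
    while n == 0:                                  # assumption: redraw a zero count
        n = poisson_draw(rng, mean_points)
    points = sorted(rng.uniform(lower, upper) for _ in range(n))
    raw = [1.0 - rng.random() for _ in range(n)]   # 1 - U[0, 1) lies in (0, 1]
    total = sum(raw)
    weights = [w / total for w in raw]             # w_i / sum_i w_i
    return points, weights

rng = random.Random(2020)
points, weights = random_initial_design(rng)
print(len(points), round(sum(weights), 12))
```

Sorting the support points is only for readability; since the raw weights are independent and identically distributed, it does not change the distribution of the design.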

Tables 8 and 9 report the comparison results. We observe that the function tpopt for finding *T*-optimal designs is sensitive to the initial design. In particular, for discriminating between models (15) and (19), the tpopt function generated a design without an error message in only 29 of the 50 runs. In contrast, the other function, KLopt.lnorm, appeared more numerically stable: it produced no error message in any of the 50 runs and it ran quickly. The designs generated by KLopt.lnorm frequently had more than 90% efficiency, except when discriminating between models (15) and (19), where the efficiency can be low. In contrast, Table 7 shows that the designs generated by PSO-QN consistently have higher *KL*-efficiencies in all 4 cases.

### An open resource in R software for finding optimal discrimination designs

We have developed a software package called **DiscrimOD** that lets R users find the various types of optimal discrimination designs discussed in this paper. The user can download the file DiscrimOD_0.1.1.tar.gz from the supplementary material and install the **DiscrimOD** package with the R code DiscrimOD_Install.r. This package allows users to apply the PSO-QN and PSO-S-QN algorithms to find discrimination designs for their own problems. For comparison purposes, we have included both the Fedorov-Wynn and Remes algorithms for finding optimal discrimination designs when there are 2 competing models.

Previously developed R packages, such as Rcpp [27], RcppDE [31] and lbfgs [24], provide high-end programming techniques, and we incorporated them to make our software package more flexible and broadly applicable. For instance, users can supply their own distance measure between 2 models, along with the error distributional assumptions, and compute the optimal discriminating design of interest. All the algorithms in the **DiscrimOD** package are written in C++ for faster computation. The user needs only basic knowledge of R programming to modify the codes by redefining a function or list object in R. An advanced R user can input the competing models and the distance function as C++ code to accelerate the computation.

We provide R codes for implementing all the examples in this paper and Sections A.1 to A.3 in the Appendix. For example, by running the R codes in demo_Section_4_tox_T.r, our package will generate *T*-optimal and max-min *T*-optimal designs for the 5 toxicology models in the section of application to toxicological experiments. Specifically, there are 6 steps:

- (#1) define the 5 competing models (15)–(19) using the R function object;
- (#2) specify the set of nominal values for the parameters in the true model and the parameter space for each rival model;
- (#3) define the distance measure function, which is the squared difference, between any 2 models;
- (#4) set the values of the tuning parameters for the algorithms;
- (#5) use the PSO-QN algorithm to find the *T*-optimal designs for each pair of the models to be discriminated and check their *T*-optimality by the equivalence theorem; and
- (#6) use the *T*-optimal designs obtained in the previous step and the PSO-S-QN algorithm to find the max-min *T*-optimal design for discriminating among the 5 models, and confirm its max-min *T*-optimality by the equivalence theorem.
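To make steps (#1) to (#3) concrete, the Python sketch below evaluates the squared-difference *T*-criterion of a fixed design for a toy pair of models. It does not use the **DiscrimOD** interface: the true model 1 + *x* + *x*³, the linear rival, the candidate design and all function names are hypothetical choices made for illustration. Because the rival is linear in its parameters, the inner minimization reduces to weighted least squares and is solved in closed form here instead of by L-BFGS.

```python
def eta_true(x):
    """Hypothetical 'true' mean response with fixed nominal parameters."""
    return 1.0 + x + x ** 3

def t_criterion_linear_rival(points, weights):
    """min over (b, a) of sum_i w_i * (eta_true(x_i) - b - a * x_i)^2.

    The rival b + a*x is linear in (b, a), so the inner minimization is a
    weighted least-squares fit, solved through the 2x2 normal equations.
    """
    pairs = list(zip(points, weights))
    sw = sum(w for _, w in pairs)
    sx = sum(w * x for x, w in pairs)
    sxx = sum(w * x * x for x, w in pairs)
    sy = sum(w * eta_true(x) for x, w in pairs)
    sxy = sum(w * x * eta_true(x) for x, w in pairs)
    det = sw * sxx - sx * sx        # nonzero with at least 2 distinct support points
    a = (sw * sxy - sx * sy) / det  # fitted slope of the rival
    b = (sxx * sy - sx * sxy) / det # fitted intercept of the rival
    return sum(w * (eta_true(x) - b - a * x) ** 2 for x, w in pairs)

# Hypothetical candidate designs on [-1, 1] with the same support points.
support = [-1.0, -0.5, 0.5, 1.0]
t_unequal = t_criterion_linear_rival(support, [1 / 6, 1 / 3, 1 / 3, 1 / 6])
t_equal = t_criterion_linear_rival(support, [0.25, 0.25, 0.25, 0.25])
print(t_unequal, t_equal, t_equal / t_unequal)
```

The ratio in the last line is the kind of quantity that steps (#5) and (#6) work with: a criterion value of one design divided by that of a reference design. In the package, the outer search over designs is carried out by PSO, which this sketch does not attempt.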

Similar to the first case shown in Table 5, we also provide an illustrative set of the R codes that we have implemented in demo_Section_62_comparison.r. This file shows how to run PSO-QN, NestedPSO, Fedorov-Wynn and Remes algorithms in R and compare the resulting designs. We also provide the codes to generate the results in Table 6, where we show that the L-BFGS algorithm is more efficient than PSO in solving the inner optimization problem in the *T*-optimal design criterion.

## Summary

Optimal discriminating design problems are common across disciplines. For example, [32] developed an optimal design for model discrimination and parameter estimation for studying population pharmacokinetics in cystic fibrosis patients treated with itraconazole. Their design found optimal sampling times to provide reliable estimates of the population parameters and at the same time, discriminate between 2 competing models. Other examples of optimal discriminating design problems are available in cognitive science [33], psychology [34] and chemical engineering [35], to name a few. These are important optimization problems that are still both theoretically and computationally challenging.

We believe the practical way to solve optimal discriminating design problems is to develop increasingly effective algorithms and make them available to the reader. This paper proposes, for the first time, using nature-inspired metaheuristic algorithms to find these hard-to-find optimal discriminating designs, and we show that they generally perform as well as or outperform current algorithms for finding optimal discriminating designs; the Remes algorithm appears competitive in terms of CPU times, except that in all our examples, it did not find the optimal designs as often as our algorithms. Unlike traditional algorithms, PSO is able to generate optimal designs neatly without the need to periodically collapse clusters of points into distinct points. It is also able to generate singular optimal designs seamlessly. Another advantage of PSO is that it does not require the design space to be discretized, which is helpful for solving high-dimensional optimization problems. We applied our algorithms to a toxicology study and generated a design that optimally discriminates among 5 nonlinear models, all with a continuous outcome.

To help practitioners implement the proposed algorithms, we provide as supplementary material an R package for generating the optimal designs in this paper. The user-friendly codes can additionally evaluate the efficiencies of other designs and be amended to find tailor-made optimal discriminating designs for user-specified problems.

## Appendix

We revisit several optimal discriminating design problems and demonstrate that our algorithms can find the same optimal designs. For all the examples, we set the tuning parameters for the proposed algorithms in the following way. For the PSO-QN algorithm to identify the *T*- and *KL*-optimal designs, we employed 32 particles and the stopping criterion was 200 iterations. For the PSO-S-QN algorithm to find max-min *T*- and *KL*-optimal designs, we used 32 particles and 400 iterations. The remaining PSO parameters were the same as those we had set before. In the inner loop of both algorithms, we ran the L-BFGS algorithm 4 times with randomly chosen initial values to check whether it had converged to the same criterion value. The values of the tuning parameters we used for the L-BFGS algorithm were their default values in [24].

### A.1 Two pharmacokinetic models

[11] constructed *KL*-optimal designs for discriminating between the Michaelis-Menten (MM) model and modified Michaelis-Menten (MMM). The 2 mean functions, respectively, are
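*η*_{1}(*x*, *θ*_{1}) = *V*_{1}*x* / (*K*_{1} + *x*) (A23)
*η*_{2}(*x*, *θ*_{2}) = *V*_{2}*x* / (*K*_{2} + *x*) + *F*_{2}*x* (A24)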
The variable *x* is the substrate concentration in an experimental range . For *j* = 1, 2, the parameters *V*_{1} and *V*_{2} are the reaction rates at maximal concentration level, and *K*_{1} and *K*_{2} are the Michaelis-Menten constants that represent the concentrations at which half of the maximum velocity rates are reached for the 2 models. The MMM model generalizes the MM model by adding a linear term with coefficient *F*_{2}.

In this example, we assumed that the MMM model *η*_{tr}(*x*) = *η*_{2}(*x*, *θ*_{2}) is the true model with nominal values *θ*_{2} = (*V*_{2}, *K*_{2}, *F*_{2}) = (1, 1, 1). [36] assumed the model errors can have a log-normal or gamma distribution. For such distributions, a common assumption of the nuisance parameters is that the response has a constant coefficient of variation [37]. Let and be the variances of the random errors in the MM and MMM models, respectively, and assume that . The analytical form of the *KL*-divergence is given in [11].

Table 10 shows the PSO-QN-generated designs *ξ*_{KL} and their *KL*-optimal criterion values, along with the corresponding designs, *ξ*_{LTT2007} for the 2 error distributions from [11]. We observe that they are similar. Fig 4 shows the plot for the directional derivative of the criterion evaluated at the generated design for each error distribution and confirms that the PSO-QN generated designs are numerically *KL*-optimal because both graphs have non-positive values with values close to zero at the support points of the generated designs.

The figures confirm the *KL*-optimality of the 2 designs in Section A.1.

### A.2 Three models with normal errors

Suppose we wish to find an optimal design to discriminate among 3 linear models with homoscedastic errors defined on with mean responses given by (A25) (A26) (A27)

We assume the true model is *η*_{tr}(*x*) = *η*_{1}(*x*, *θ*_{1}) with nominal values *θ*_{1} = (*θ*_{10}, *θ*_{11}, *θ*_{12}) = (4.5, −1.5, −2). We first applied the PSO-QN algorithm to find *T*-optimal designs for discriminating between the 2 pairs of rival models, (A25) and (A26), and (A25) and (A27). The *T*-optimal designs are, respectively, given by
and their *T*-optimal criterion values are *T*_{2,tr}(*ξ*_{T,2}) = 0.001087 and *T*_{3,tr}(*ξ*_{T,3}) = 0.005715. [1] considered discriminating between the first pair only as an example in their work and their *T*-optimal design is the same as ours. Results for the second rival pair are new. Fig 5(a) and 5(b) display plots of the directional derivative of the *T*-optimality criterion evaluated at these designs in the direction of the degenerate design at *x* and they confirm their *T*-optimality because the graphs satisfy the conditions of the equivalence theorem.

The figures confirm the *T*-optimality and the max-min *T*-optimality of the 3 designs in Section A.2.

To use the above results to discriminate among 3 models, the first step is to substitute the 2 *T*-optimal designs for discriminating each pair of the rival models and their optimal values into the numerator of
to calculate their *T*-efficiencies required in the PSO-S-QN algorithm. The resulting max-min *T*-optimal design found from the PSO-S-QN algorithm is
and the max-min *T*-optimal criterion value is *I*_{m}(*ξ*_{T,23}) = Eff_{2}(*ξ*_{T,23}) = Eff_{3}(*ξ*_{T,23}) = 0.806.

We next show that *ξ*_{T,23} is max-min *T*-optimal. Our numerical results suggest that the model index set is and a further application of PSO gives . Fig 5(c) displays the directional derivative plot of the criterion in the direction of the degenerate design at *x*, evaluated at the 5-point design *ξ*_{T,23}, and its graph confirms its optimality.

### A.3 Four logistic regression models

[21] considered the design problem for discriminating among 4 logistic models with different regression mean structures:
(A28)
(A29)
(A30)
(A31)
It is assumed that the true model is *η*_{tr}(*x*) = *η*_{4}(*x*, *θ*_{4}) with nominal values *θ*_{4} = (1, 1, 1).

To find the max-min *KL*-optimal design, we first use PSO-QN to find *KL*-optimal designs for discriminating between the true model *η*_{tr} = *η*_{4} and each of the rival models *η*_{i}, *i* = 1, 2, 3. A direct application of the proposed algorithm produces designs that are similar to those in [21]. Then we apply the PSO-S-QN algorithm and obtain the max-min *KL*-optimal design, *ξ*_{KL,123} = {0.0000, 0.3598, 1.0000; 0.6185, 0.2393, 0.1423}. The optimal criterion value is *I*_{m}(*ξ*_{KL,123}) = 0.619. We were also able to use the proposed algorithm to reproduce the design *ξ*_{TML2016} = {0.0000, 0.3615, 1.0000; 0.6184, 0.2391, 0.1425} found by [21]. The optimal value for this design is *I*_{m}(*ξ*_{TML2016}) = 0.618 and the *KL*-efficiencies of *ξ*_{KL,123} relative to the *KL*-optimal designs are
which imply that . A further calculation shows the sought vector of *α* is and the directional derivative plot in Fig 6 confirms the max-min *KL*-optimality of *ξ*_{KL,123}.

This figure confirms the max-min *KL*-optimality of *ξ*_{KL,123}.

## References

- 1. Atkinson AC, Fedorov VV. The design of experiments for discriminating between two rival models. Biometrika. 1975;62(1):57–70.
- 2. Hill PDH. A review of experimental design procedures for regression model discrimination. Technometrics. 1978;20(1):15–21.
- 3.
Ucinski D, Bogacka B. Heteroscedastic
*T*-optimum designs for multiresponse dynamic models. In: Proceedings of the 7th International Workshop on Model-Oriented Design and Analysis. Physica-Verlag HD; 2004. p. 191–199. - 4.
Uciński D, Bogacka B.
*T*-optimum designs for discrimination between two multiresponse dynamic models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2005;67(1):3–18. - 5. López-Fidalgo J, Tommasi C, Trandafir PC. Optimal designs for discriminating between some extensions of the Michaelis–Menten model. Journal of Statistical Planning and Inference.2008;138(12):3797–3804.
- 6. Dette H, Titoff S. Optimal discrimination designs. The Annals of Statistics. 2009;37(4):2056–2082.
- 7.
Atkinson AC. The non-uniqueness of some designs for discriminating between two polynomial models in one variable. In: mODa 9–Advances in Model-Oriented Design and Analysis. Springer; 2010. p. 9–16.
- 8.
Dette H, Melas VB, Shpilev P.
*T*-optimal designs for discrimination between two polynomial models. The Annals of Statistics. 2012;40(1):188–205. - 9.
Dette H, Melas VB, Shpilev P. Robust
*T*-optimal discriminating designs. The Annals of Statistics. 2013;41(4):1693–1715. - 10.
Carlos Monteiro Ponce de Leon A. Optimum experimental design for model discrimination and generalized linear models. London School of Economics and Political Science (United Kingdom); 1993.
- 11. López-Fidalgo J, Tommasi C, Trandafir PC. An optimal experimental design criterion for discriminating between non-normal models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2007;69(2):231–242.
- 12. Atkinson AC, Fedorov VV. Optimal design: experiments for discriminating between several models. Biometrika. 1975;62(2):289–303.
- 13.
Aletti G, May C, Tommasi C.
*KL*-optimum designs: theoretical properties and practical computation. Statistics and Computing. 2016;26(1-2):107–117. - 14. Storn R, Price K. Differential Evolution A Simple and Efficient Heuristic for global Optimization over Continuous Spaces. Journal of Global Optimization. 1997;11:341–359.
- 15.
Eberhart RC, Kennedy J. A new optimizer using particle swarm theory. In: Proceedings of the sixth international symposium on micro machine and human science. IEEE; 1995. p. 39–43.
- 16. Qiu J, Chen RB, Wang W, Wong WK. Using animal instincts to design efficient biomedical studies via particle swarm optimization. Swarm and evolutionary computation. 2014;18:1–10. pmid:25285268
- 17. Chen RB, Chang SP, Wang W, Tung HC, Wong WK. Minimax optimal designs via particle swarm optimization methods. Statistics and Computing. 2015;25(5):975–988.
- 18. Xu W, Wong WK, Tan KC, Xu JX. Finding high-dimensional D-optimal designs for logistic modles via differential evolution. IEEE Access. 2019;7:7133–7146. pmid:31058044
- 19. Kiefer J. On the nonrandomized optimality and randomized nonoptimality of symmetrical designs. The Annals of Mathematical Statistics. 1958; p. 675–699.
- 20. Tommasi C. Optimal designs for discriminating among several non-normal models. In: mODa 8-Advances in Model-Oriented Design and Analysis. Springer; 2007. p. 213–220.
- 21. Tommasi C, Martín-Martín R, López-Fidalgo J. Max–min optimal discriminating designs for several statistical models. Statistics and Computing. 2016;26(6):1163–1172.
- 22. Rice JR. The approximation of functions: nonlinear and multivariate theory. vol. 2. Addison-Wesley; 1969.
- 23. Yang XS. Engineering optimization: an introduction with metaheuristic applications. In: Particle Swarm Optimization. John Wiley & Sons; 2010.
- 24. Coppola A, Stewart BM. lbfgs: Efficient L-BFGS and OWL-QN Optimization in R; 2014.
- 25. Piersma AH, Verhoef A, te Biesebeek J, Pieters MN, Slob W. Developmental toxicity of butyl benzyl phthalate in the rat using a multiple dose study design. Reproductive Toxicology. 2000;14(5):417–425. pmid:11020653
- 26. Slob W. Dose-response modeling of continuous endpoints. Toxicological Sciences. 2002;66(2):298–312. pmid:11896297
- 27. Eddelbuettel D, François R, Allaire J, Chambers J, Bates D, Ushey K. Rcpp: Seamless R and C++ integration. Journal of Statistical Software. 2011;40(8):1–18.
- 28. Guchenko R. rodd: Optimal Discriminating Designs; 2016. Available from: https://CRAN.R-project.org/package=rodd.
- 29. Dette H, Melas VB, Guchenko R. Bayesian *T*-optimal discriminating designs. The Annals of Statistics. 2015;43(5):1959–1985. pmid:26997684
- 30. Dette H, Guchenko R, Melas VB. Efficient Computation of Bayesian Optimal Discriminating Designs. Journal of Computational and Graphical Statistics. 2017;26(2):424–433.
- 31. Eddelbuettel D, Ardia D, Mullen K, Peterson B, Ulrich J, Storn R. RcppDE: Global Optimization by Differential Evolution in C++; 2016.
- 32. Waterhouse TH, Redmann S, Duffull SB, Eccleston JA. Optimal design for model discrimination and Parameter Estimation for Itraconazole Population Pharmacokinetics in Cystic Fibrosis Patients. Journal of Pharmacokinetics and Pharmacodynamics. 2005;32(3):521–545. pmid:16307208
- 33. Cavagnaro DR, Myung JI, Pitt MA, Kujala JV. Adaptive design optimization: a mutual information-based approach to model discrimination in cognitive science. Neural Computation. 2010;22:887–905. pmid:20028226
- 34. Myung JI, Pitt MA. Optimal experimental design for model discrimination. Psychol Rev. 2009;116(3):499–518. pmid:19618983
- 35. Alberton AL, Schwaab M, Lobão MWN, Pinto JC. Experimental design for the joint model discrimination and precise parameter estimation through information measures. Chemical Engineering Science. 2011;66:1940–1952.
- 36. Lindsey JK, Jones B, Jarvis P. Some statistical issues in modelling pharmacokinetic data. Statistics in Medicine. 2001;20(17-18):2775–2783. pmid:11523082
- 37. McCullagh P, Nelder JA. Generalized linear models. CRC Press; 1989.