A cost/benefit analysis of clinical trial designs for COVID-19 vaccine candidates

We compare and contrast the expected duration and number of infections and deaths averted among several designs for clinical trials of COVID-19 vaccine candidates, including traditional and adaptive randomized clinical trials and human challenge trials. Using epidemiological models calibrated to the current pandemic, we simulate the time course of each clinical trial design for 756 unique combinations of parameters, allowing us to determine which trial design is most effective for a given scenario. A human challenge trial provides maximal net benefits—averting an additional 1.1M infections and 8,000 deaths in the U.S. compared to the next best clinical trial design—if its set-up time is short or the pandemic spreads slowly. In most of the other cases, an adaptive trial provides greater net benefits.


A1 Efficacy Analysis
The protective effect of a vaccine, that is, vaccine efficacy, is defined as [1]:
$$\varepsilon = 1 - \frac{p_1}{p_0}, \quad p_1 = \frac{c_1}{n_1}, \quad p_0 = \frac{c_0}{n_0} \quad (A.1)$$
where $\varepsilon$ refers to the vaccine efficacy, $p_1$ and $p_0$ are the attack rates observed in the treatment arm and the control arm, respectively, $n_1$ and $n_0$ refer to the sample sizes of the treatment arm and the control arm, respectively, and $c_1$ and $c_0$ refer to the number of infections observed in the treatment arm and the control arm, respectively. The attack rate is defined as the fraction of a cohort at risk that becomes infected during the surveillance period. There are conflicting views on the possibility of human reinfection [2,3]; for simplicity, we rule out recurrent infections in our simulations.
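The point estimate of Eq. A.1 can be sketched as a small helper; the counts in the example are illustrative, not trial data.

```python
def vaccine_efficacy(c1, n1, c0, n0):
    """Point estimate of vaccine efficacy from trial counts (Eq. A.1).

    c1, n1: infections and sample size in the treatment arm.
    c0, n0: infections and sample size in the control arm.
    """
    p1 = c1 / n1  # attack rate, treatment arm
    p0 = c0 / n0  # attack rate, control arm
    return 1.0 - p1 / p0

# e.g., 8 infections among 15,000 vaccinees vs. 80 among 15,000 controls
# gives an estimated efficacy of 0.90
print(vaccine_efficacy(8, 15_000, 80, 15_000))
```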

Superiority Testing
First, we consider superiority testing to determine the licensure of a vaccine candidate at the end of a clinical study, e.g., RCT, ORCT, or HCT. The aim is to demonstrate that the efficacy of the candidate in the prevention of infections is greater than zero. Such a criterion might be appropriate for emergency use authorization during a pandemic where no alternative treatments are available. We consider the following null and alternative hypotheses:
$$H_0: p_1/p_0 \geq 1 \quad \text{versus} \quad H_1: p_1/p_0 < 1 \quad (A.2)$$
The test statistic under the null hypothesis is given by:
$$z = \frac{|p_1 - p_0| - a}{\sqrt{\bar{p}\bar{q}\,(r+1)/(r n_0)}}, \quad a = \frac{r+1}{2 r n_0}, \quad r = \frac{n_1}{n_0} \quad (A.3)$$
$$\bar{p} = \frac{c_1 + c_0}{n_0 (r+1)} = \frac{r p_1 + p_0}{r+1}, \quad \bar{q} = 1 - \bar{p} \quad (A.4)$$
where $z$ is the test statistic. For large samples, $z$ approximately follows the standard Normal distribution.
The power of a vaccine efficacy study under superiority testing is given by [4,5]:
$$z_\beta = \frac{|P_1 - P_0|\sqrt{r n_0 - (r+1)/|P_1 - P_0|} - z_{\alpha/2}\sqrt{(r+1)\bar{P}\bar{Q}}}{\sqrt{P_1 Q_1 + r P_0 Q_0}} \quad (A.5)$$
$$\bar{P} = \frac{r P_1 + P_0}{r+1}, \quad \bar{Q} = 1 - \bar{P} \quad (A.6)$$
where $\alpha$ is the level of significance, $\beta$ refers to the type II error under the alternative hypothesis, $z_a$ is the $100(1-a)$th percentile of the standard Normal distribution, $P_1$ and $P_0$ refer to the underlying (true) attack rates in the treatment arm and the control arm, respectively (with $Q_i = 1 - P_i$), and $\varepsilon = 1 - P_1/P_0$ refers to the true vaccine efficacy.
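A small helper following the continuity-corrected two-proportion power formula of [4,5]; the attack rates and sample size in the example are illustrative.

```python
from scipy.stats import norm

def superiority_power(P1, P0, n0, r=1.0, alpha=0.05):
    """Approximate power of a superiority test of two attack rates,
    following the large-sample, continuity-corrected formula of [4,5].

    P1, P0: true attack rates in treatment and control arms.
    n0: control-arm sample size; r = n1/n0."""
    Pbar = (r * P1 + P0) / (r + 1)
    Qbar = 1 - Pbar
    # continuity-corrected effective sample size term
    corrected = r * n0 - (r + 1) / abs(P1 - P0)
    z_beta = (abs(P1 - P0) * corrected ** 0.5
              - norm.ppf(1 - alpha / 2) * ((r + 1) * Pbar * Qbar) ** 0.5) \
             / (P1 * (1 - P1) + r * P0 * (1 - P0)) ** 0.5
    return norm.cdf(z_beta)  # power = 1 - beta

# e.g., true attack rates of 1% (vaccine) vs. 2% (placebo),
# 3,000 subjects per arm:
print(round(superiority_power(0.01, 0.02, 3_000), 3))
```

As expected, power increases with the sample size of the study, holding the attack rates fixed.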

Superiority-by-Margin Testing
Next, we consider the case where superiority by margin (also known as super-superiority), that is, a vaccine efficacy greater than some minimum threshold, must be demonstrated for full licensure. We consider the following null and alternative hypotheses:
$$H_0: \vartheta \geq \theta \quad \text{versus} \quad H_1: \vartheta < \theta$$
where $\vartheta = p_1/p_0$, and $\theta$ is a specified threshold larger than 0 and smaller than 1, corresponding to a minimum vaccine efficacy of $1 - \theta$.
The test statistic under the null hypothesis is given by [4]:
$$z = \frac{p_1 - \theta p_0}{\sqrt{\tilde{p}_1(1-\tilde{p}_1)/n_1 + \theta^2\,\tilde{p}_0(1-\tilde{p}_0)/n_0}}$$
where $z$ is the test statistic, and $\tilde{p}_1$ and $\tilde{p}_0$ are the large-sample approximations of the constrained maximum likelihood estimates of $P_1$ and $P_0$, respectively, under the null hypothesis (see below for closed-form solutions). For large samples, $z$ approximately follows the standard Normal distribution.
The power of a vaccine efficacy study under superiority-by-margin testing is given by:

Asymptotics for Superiority-by-Margin Testing
The constraint under the null hypothesis is $\tilde{p}_1 = \theta\,\tilde{p}_0$, where $\tilde{p}_1$ and $\tilde{p}_0$ are the constrained maximum likelihood estimates of $P_1$ and $P_0$, respectively.

A2 Adaptive Vaccine Efficacy RCT
We propose an adaptive vaccine efficacy RCT design (ARCT) based on group sequential methods. First, we consider an alternative definition of vaccine efficacy based on the relative force of infection, as opposed to the relative risk of infection in Eq. A.1:
$$\varepsilon = 1 - \frac{\lambda_1 t_s}{\lambda_0 t_s} = 1 - \frac{\lambda_1}{\lambda_0}$$
where $\lambda_1$ and $\lambda_0$ refer to the force of infection in the treatment arm and the control arm, respectively, and $t_s$ refers to the duration of the surveillance period. The force of infection of an infectious disease is defined as the expected number of new cases of the disease per unit of person-time at risk. When the risk of infection is small, e.g., smaller than 0.10, the risk of infection is approximately equal to the cumulative force of infection $\lambda t_s$ [1].
Next, we note that the force of infection and the hazard function in survival analysis take the same functional form [1]. This suggests that infections can also be treated as time-to-event data, in addition to binary variables as in Eq. A.1. By performing Cox regression on the time-to-infection data of a clinical trial, we can estimate the efficacy of the vaccine candidate from the hazard ratio of the treatment arm versus the control arm:
$$\lambda(t \mid z) = \lambda_{\text{baseline}}(t)\, e^{\beta z}, \quad \varepsilon = 1 - e^{\beta}$$
where $z$ refers to the treatment variable, i.e., whether the patient is vaccinated or not, $\lambda_{\text{baseline}}$ is the baseline hazard function, and $\beta$ is the log hazard ratio. We note that the proportional hazards assumption is not unreasonable if we assume that the proportion of cases prevented by the vaccine is independent of the possibly non-homogeneous force of infection [1].
We consider the following null and alternative hypotheses based on the coefficient of the treatment variable in the Cox model:
$$H_0: e^{\beta} \geq e^{\beta_0} \quad \text{versus} \quad H_1: e^{\beta} < e^{\beta_0}$$
where the null hazard ratio $e^{\beta_0}$ is 1 for superiority testing and smaller than 1 for superiority-by-margin testing.
The test statistic under the null hypothesis is given by:
$$z = \frac{\hat{\beta} - \beta_0}{\mathrm{se}(\hat{\beta})}$$
where $\hat{\beta}$ is the maximum partial likelihood estimate of $\beta$, $\mathrm{se}(\hat{\beta})$ is its standard error, and $z$ is asymptotically Normal. This is also known as the Wald test. It turns out that this statistic satisfies the criteria for group sequential testing [6], allowing us to perform periodic interim analyses of accumulating trial data, rather than just a single final analysis at the end of a traditional vaccine efficacy RCT (see Fig. A.1). Under the group sequential testing framework, we estimate a new Cox model at each interim calendar time point based on the infections data that have accrued up to that point, over the course of the study surveillance period. At each interim analysis, we decide whether to stop the study early by rejecting the null hypothesis, i.e., approving the vaccine candidate, or to continue on to the next analysis by monitoring the subjects for a longer period of time [6].
We adopt Pocock's test for sequential testing [7]. It involves repeated testing at successive interim analyses at some constant nominal significance level over the course of the study (see Algorithm 1). The critical value is chosen to satisfy the maximum type I error requirement, e.g., 5%.
In our simulations, we consider a maximum of six interim analyses spaced 30 days apart, with the first analysis performed when the first 10,000 subjects enrolled have been monitored for at least 30 days. To keep the overall type I error at 5%, we use a critical value of 2.453 at each interim analysis [7].
For each of the epidemiological-model and population-vaccination schedule assumptions, we compute the expected net value of ARCT over 100,000 Monte Carlo simulation paths. For each path, we track the infections data of 30,000 patients for up to 180 days of surveillance. In addition, we estimate up to six Cox proportional hazards models, one at each interim analysis. The simulation process is computationally intensive despite parallelization, requiring approximately 8 hours to complete on the MIT Sloan "Engaging" high-performance computing cluster using over 400 processors.
While we have considered a simple adaptive design in this paper, we note that our framework can be easily extended to other sequential boundaries, such as the O'Brien & Fleming test; to two-sided tests that allow for early stopping under the null hypothesis, i.e., early stopping for both futility and efficacy; and to flexible monitoring using the error-spending approach, instead of a constant nominal significance level for all interim analyses [6].

Fig. A.1: Infections as time-to-event data, measured from the start of surveillance. The horizontal lines represent the time to infection of ten subjects enrolled at different times. We monitor the subjects until an infection occurs or the end of the study, whichever comes earlier. A solid circle at the right end denotes an infection, whereas a hollow circle indicates censoring. In the figure, we consider up to six analyses. At an interim analysis, subjects are considered censored if they are known to be uninfected and at risk at that point in time. Information on these subjects continues to accrue through the surveillance period.
Algorithm 1 Pocock's test. $k$ refers to the $k$th interim analysis, $K$ refers to the maximum number of interim analyses planned, $z_k$ refers to the test statistic at the $k$th interim analysis, and $c(K, \alpha)$ refers to the nominal significance level, which is a function of $K$ and $\alpha$, the maximum type I error allowed.

for k = 1, ..., K do
    compute z_k from the data accrued so far
    if |z_k| >= c(K, alpha) then
        stop: reject H_0 (approve the vaccine candidate)
    else if k = K then
        stop: do not reject H_0
    end if
end for
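Pocock's stopping rule can be sketched as follows; the interim statistics in the example are made-up values standing in for the Wald statistics from interim Cox fits.

```python
def pocock_test(z_stats, K=6, c=2.453):
    """Pocock group sequential test: compare each interim test statistic
    against a constant critical value c(K, alpha); c = 2.453 corresponds
    to K = 6 analyses and a 5% maximum type I error [7].

    Returns (approved, analysis index), where the index is 1-based, or
    (False, None) if H0 is not rejected by the last analysis."""
    for k, z in enumerate(z_stats[:K], start=1):
        if abs(z) >= c:
            return True, k   # reject H0: stop early, approve the vaccine
    return False, None       # H0 not rejected by the last analysis

# e.g., the boundary is first crossed at the third interim analysis:
print(pocock_test([1.2, 2.1, 3.0]))  # (True, 3)
```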

A4 Financial Cost of Vaccine Efficacy Studies
There are many sources of cost in a clinical trial, e.g., patient recruitment and retention, medical and administrative staff, clinical procedures and central laboratory work, site management, and data collection and analysis. For a back-of-the-envelope calculation, we assume that the cost per subject in a phase 3 vaccine efficacy trial is around US$5,000. This implies a cost of US$150M for a study with 30,000 subjects, close to the estimate for rotavirus vaccines [8] in one of the very few studies that estimate the cost of vaccine development [9]. This figure is high compared with the median cost of a phase 3 trial for novel therapeutic agents, estimated at US$19M [10], but this is not surprising: vaccine efficacy studies are notoriously costly due to their large sample sizes and lengthy follow-up durations. If we assume that challenge studies have a per-subject cost ten times higher, i.e., US$50,000 per volunteer, the estimated cost of an HCT is approximately US$37.5M, where we have assumed a cost of US$5,000 per subject for the follow-up single-arm safety study comprising 5,000 subjects. This is just 25% of the cost of an RCT with 30,000 subjects.
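The arithmetic above can be laid out explicitly. Note that the 250-volunteer challenge cohort is inferred from the stated totals (US$37.5M less the US$25M safety study, at US$50,000 per volunteer), not stated in the text.

```python
# Back-of-the-envelope trial costs (all figures in US$).
COST_PER_SUBJECT_RCT = 5_000
COST_PER_SUBJECT_HCT = 50_000     # assumed 10x the RCT per-subject cost

rct_cost = 30_000 * COST_PER_SUBJECT_RCT       # 30,000-subject efficacy RCT
safety_cost = 5_000 * COST_PER_SUBJECT_RCT     # 5,000-subject safety study
challenge_cost = 250 * COST_PER_SUBJECT_HCT    # challenge cohort (size inferred
                                               # from the US$37.5M total)
hct_cost = challenge_cost + safety_cost

print(rct_cost)             # 150000000
print(hct_cost)             # 37500000
print(hct_cost / rct_cost)  # 0.25
```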

A5 SIRDC with Social Distancing (SIRDC-SD) Model
We assume that there is a constant population of $N$ people. The numbers of people who are susceptible to infection, infected, resolving their infected status, dead, and recovered are denoted by $S_t$, $I_t$, $R_t$, $D_t$, and $C_t$, respectively.
The dynamics of the epidemic are governed by the following differential equations:
$$\dot{S}_t = -\beta(t)\, S_t I_t / N$$
$$\dot{I}_t = \beta(t)\, S_t I_t / N - \gamma I_t$$
$$\dot{R}_t = \gamma I_t - \theta R_t$$
$$\dot{D}_t = \delta \theta R_t$$
$$\dot{C}_t = (1 - \delta)\, \theta R_t$$
where $\delta$ denotes the infection fatality rate. Unlike most epidemiological models, the SIRDC-SD model assumes a contact rate parameter, $\beta(t)$, that decreases exponentially over time at a rate of $\lambda$ from an initial value of $\beta_0$ to $\beta_*$,
$$\beta(t) = \beta_* + (\beta_0 - \beta_*)\, e^{-\lambda t},$$
instead of a static one.
This dynamic β(t) incorporates the belief that social distancing over time will lead to a lower contact rate. This is particularly true in the U.S., where many cities have issued stay-at-home orders. Many people are also voluntarily wearing masks and avoiding crowded places, both of which serve to reduce the contact rate.
The model also assumes that infections resolve at a Poisson rate γ, which implies that a person is infectious for a period of 1/γ on average. Thereafter, they stop being infectious and transition into the 'resolving' state. Resolving cases clear up at a Poisson rate of θ. There is an implicit assumption that people who recover from the virus gain immunity and cannot be reinfected.
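The dynamics described above can be integrated numerically as a minimal sketch. The parameter values below, including the death fraction δ of resolved cases, are illustrative assumptions rather than our calibrated state-level estimates.

```python
import numpy as np
from scipy.integrate import solve_ivp

def sirdc_sd(t, y, beta0, beta_star, lam, gamma, theta, delta, N):
    """SIRDC-SD right-hand side, assembled from the verbal description:
    the contact rate beta(t) decays exponentially from beta0 to beta_star,
    infections resolve at rate gamma, resolving cases clear at rate theta,
    and a fraction delta of cleared cases die (delta is illustrative)."""
    S, I, R, D, C = y
    beta_t = beta_star + (beta0 - beta_star) * np.exp(-lam * t)
    dS = -beta_t * S * I / N
    dI = beta_t * S * I / N - gamma * I
    dR = gamma * I - theta * R
    dD = delta * theta * R
    dC = (1 - delta) * theta * R
    return [dS, dI, dR, dD, dC]

N = 1_000_000
y0 = [N - 100, 100, 0, 0, 0]   # seed the epidemic with 100 infections
sol = solve_ivp(sirdc_sd, (0, 180), y0,
                t_eval=np.linspace(0, 180, 181),
                args=(0.5, 0.2, 0.05, 0.2, 0.1, 0.008, N))
# population conservation: S + I + R + D + C = N at every time step
```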

A6 Parameter Estimation/Calibration for SIRDC-SD Model
Let D t and d t be the cumulative and daily number of deaths from data at time t, respectively. Let variables with hats denote the model's estimated values. We use the following optimization program to estimate the parameters of the model.
Our loss function is given by Eq. A.27: we minimize the sum of 1) the natural logarithm of the sum of squared errors for cumulative deaths and 2) the natural logarithm of the sum of squared errors for daily deaths. The minimization is subject to four constraints: Eq. A.28 requires that the initial number of infected be less than the entire population; Eq. A.29 imposes that the number of initial resolving cases be less than the number of initial infected cases; Eq. A.30 states that the conservation of population must hold at time t = 0; and Eq. A.31 constrains the initial contact rate to be greater than the final contact rate.
The optimization program is solved using the constrained Trust-Region algorithm as implemented in the SciPy Optimize package for each of the 50 U.S. states and Washington, D.C. Our estimated parameters for each state are reported in Table A.3.
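The loss in Eq. A.27 can be written as a short helper; the death series passed in are illustrative inputs, with the daily series obtained by differencing the cumulative one.

```python
import numpy as np

def calibration_loss(D_hat, D_obs):
    """Loss of Eq. A.27: log-SSE of cumulative deaths plus log-SSE of
    daily deaths, where the daily series is the first difference of
    the cumulative series.

    D_hat: model-implied cumulative deaths; D_obs: observed."""
    d_hat = np.diff(D_hat)            # model-implied daily deaths
    d_obs = np.diff(D_obs)            # observed daily deaths
    sse_cum = np.sum((D_hat - D_obs) ** 2)
    sse_daily = np.sum((d_hat - d_obs) ** 2)
    return np.log(sse_cum) + np.log(sse_daily)
```

In our implementation, this loss is minimized with SciPy's constrained trust-region method subject to Eqs. A.28–A.31.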

A8 SIRDCV Model
We let $V_t$ and $e$ be the number of persons vaccinated at each time step and the effectiveness of the vaccine, respectively. Effectiveness is defined as the performance of the vaccine under real-world conditions in a general population, whereas efficacy is defined as the ability to protect against the virus under ideal conditions in a homogeneous population. The former is usually less than the latter for several reasons, e.g., improper storage of vaccines leading to loss of potency and non-compliance with the vaccine dosing schedule. For simplicity, we assume that the effectiveness of the vaccine in the epidemiological model is identical to the efficacy of the vaccine in the clinical trials. $V^r_t$ and $V^{nr}_t$ represent the stocks of people who are inoculated and respond (r) or do not respond (nr) to the vaccine, respectively.
Eq. A.21 has been modified to remove vaccinated persons at every time step in Eq. A.32. We also modify Eq. A.22 to allow people who are vaccinated but do not respond to the inoculation to be infected in Eq. A.33. Eq. A.34 and Eq. A.35 keep track of the stock of people who are vaccinated. With this specification, the virus is allowed to spread even when the entire population is vaccinated because not everyone will respond to the mass inoculation.
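A minimal discrete-time sketch of these modifications, based on our reading of Eqs. A.32–A.35: at each step, vaccinated susceptibles are removed from $S$, a fraction $e$ of them respond and become protected, and the non-responders remain at risk of infection. Symbols and rates here are illustrative assumptions.

```python
def sirdcv_step(state, beta_t, gamma, theta, delta, nu, e, N, dt=1.0):
    """One Euler step of the SIRDCV dynamics as described in the text:
    nu susceptibles are vaccinated per unit time; a fraction e of them
    respond (V_r, protected) and 1 - e do not (V_nr, still at risk)."""
    S, I, R, D, C, Vr, Vnr = state
    doses = min(nu, S)                    # cannot vaccinate more than S
    force = beta_t * I / N                # per-capita force of infection
    dS = -force * S - doses               # infections plus vaccinations (A.32)
    dI = force * (S + Vnr) - gamma * I    # non-responders can be infected (A.33)
    dR = gamma * I - theta * R
    dD = delta * theta * R
    dC = (1 - delta) * theta * R
    dVr = e * doses                       # responders (A.34)
    dVnr = (1 - e) * doses - force * Vnr  # non-responders (A.35)
    return [x + dt * dx for x, dx in
            zip(state, (dS, dI, dR, dD, dC, dVr, dVnr))]
```

Because the non-responder stock $V^{nr}_t$ remains exposed to the force of infection, the virus can keep spreading even after the entire population is vaccinated.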

A9 Evolution of the Epidemic
As mentioned in the main text, we model three different scenarios regarding the evolution of the epidemic after lockdown is relaxed. We explain them here. Below, $\beta_{ss}$ is defined to be $\max(0.22, \beta(T_v))$, where $\beta(T_v)$ is the value of β when the lockdown is released.

Status Quo
For the 'status quo' scenario, we will use the estimated dynamic β(t) to perform our forecast.

Ramp Response
For the 'ramp' scenario, we model β(t) with Eq. A.39. We have explained our rationale for this function in the main text.

Behavioral Response
The 'behavioral' scenario is modeled by making the percentage change in the contact rate parameter negatively proportional to the change in the observed death rate over an interval of $t_o$ (Eq. A.40). Integrating Eq. A.40 yields Eq. A.41.
The exponential of c, $e^c$, is the long-term steady-state value of β, and k can be interpreted as the percentage increase (decrease) in β following a decrease (increase) in the death rate. In our simulations, $t_o$, c, and k are set to 7, $\ln \beta_{ss}$, and 50,000, respectively. The default scenario of $c = \ln 0.2$ corresponds to an $R_0$ of 1 when approximately 16,000 deaths per week are observed in the U.S. This behavior starts on June 15, 2020, to be consistent with the second scenario.
The new contact rate parameter in this case is defined by Eq. A.42.

Illustration of the Evolution of Epidemic
We give an example of how $R_0 = \beta/\gamma$ may evolve under each of the scenarios in Fig. A.3. The actual evolution of $R_0$ for a given state may differ depending on its estimated parameters.

A10 Trade-off Between Time and Power
As mentioned in the main text, there is a trade-off between time and power. A shorter surveillance period will, ceteris paribus, reduce the power of the RCT. However, it will also reduce the time to licensure of the vaccine (if approved), which would prevent more infections and save more lives. Conversely, a longer surveillance period would increase the power of the RCT but also prolong the time it takes for the vaccine to be approved. We illustrate the interaction between power and infections avoided over time in Fig. A.4.

A12 Steps in HCT Setup
Steps in HCT setup include:
- Selection of a SARS-CoV-2 challenge strain (assuming a currently circulating and predominant wild-type strain), with careful validation of the provenance and health status of the subject from which the strain is procured, or generation of the viral strain by reverse genetics
- Selection of a high-level containment laboratory to prepare and manufacture the challenge strain, and contracting with said laboratory

- Recruitment of volunteers
- Intensive screening of volunteers for susceptibility to SARS-CoV-2, including prior exposure to human coronaviruses and known risk factors for severe COVID-19, such as comorbidities, preexisting conditions, known genetic risk factors, and anti-interferon antibodies
- Final go-ahead from the study sponsor and regulatory authority
- A dose-ranging study to determine the lowest infectious dose/appropriate inoculum that reliably infects susceptible volunteers with the challenge virus, before proceeding with vaccinating and challenging volunteers per the updated/revised study protocol