Exploring the impact of inoculum dose on host immunity and morbidity to inform model-based vaccine design

Vaccination is an effective method to protect against infectious diseases. An important consideration in any vaccine formulation is the inoculum dose, i.e., amount of antigen or live attenuated pathogen that is used. Higher levels generally lead to better stimulation of the immune response but might cause more severe side effects and allow for less population coverage in the presence of vaccine shortages. Determining the optimal amount of inoculum dose is an important component of rational vaccine design. A combination of mathematical models with experimental data can help determine the impact of the inoculum dose. We illustrate the concept of using data and models to inform inoculum dose determination for vaccines, wby fitting a mathematical model to data from influenza A virus (IAV) infection of mice and human parainfluenza virus (HPIV) infection of cotton rats at different inoculum doses. We use the model to map inoculum dose to the level of immune protection and morbidity and to explore how such a framework might be used to determine an optimal inoculum dose. We show how a framework that combines mathematical models with experimental data can be used to study the impact of inoculum dose on important outcomes such as immune protection and morbidity. Our findings illustrate that the impact of inoculum dose on immune protection and morbidity can depend on the specific pathogen and that both protection and morbidity do not necessarily increase monotonically with increasing inoculum dose. Once vaccine design goals are specified with required levels of protection and acceptable levels of morbidity, our proposed framework can help in the rational design of vaccines and determination of the optimal amount of inoculum.


Introduction
Vaccines are the best and most cost-effective defenses we have against many infectious diseases. While the composition of a vaccine can be complex, the most important component is the antigen of the pathogen against which one wants to immunize [1]. Different types of vaccines exist, those based on antigens that contain the pathogen in a non-replicating form, and those that contain the pathogen in a replicating form, usually attenuated to reduce morbidity and mortality [1].
When deciding on the inoculum dose for a vaccine, one often needs to strike a balance between conflicting goals. Higher doses generally lead to more immunity and better protection [2]. Lower doses might reduce vaccine side effects and might also be required if there is a vaccine shortage, for instance, due to a pandemic emergency, manufacturing issues or high costs [3,4]. The ability to predict how changes in inoculum dose impact immune protection and morbidity, and how to achieve the best balance between low dose to reduce costs and side effects and high dose to trigger a robust immune response would significantly contribute toward better vaccine design [5][6][7][8][9][10][11][12].
Currently, the main way to determine vaccine inoculum dose is by trial and error, which is expensive and logistically challenging [13][14][15][16]. A way to improve this approach is to combine mathematical models with experimental data. Such approaches are commonly applied to drugs, where pharmacokinetic/pharmacodynamic (PK/PD) models are used in combination with experimental data to optimize drug dosing regimens [17]. Application of a similar approach to vaccines has been recently proposed for tuberculosis [18].
Here, we develop and analyze a quantitative modeling framework that might allow us to eventually predict the optimal inoculum dose for a given vaccine and setting. We develop our modeling framework for live attenuated vaccines using data from two infection experiments, namely influenza A virus (IAV) and human parainfluenza virus (HPIV). We further investigate a scenario for an inactivated vaccine.
Influenza A virus remains a serious health concern. While a vaccine exists, it needs to be reformulated regularly. Even when the vaccine is well-matched to the circulating strain, its efficacy is not as good as that of other vaccines, especially in the elderly. It has been suggested that using a higher inoculum dose in vaccines for this population might be beneficial [19]. Development of a better vaccine that remains protective in the presence of antigenic drift and that has a higher efficacy remains a priority.
While the two pathogens we analyze here are important, the fact that the data we use comes from animal studies means are results are not directly applicable to possible human vaccines. We consider the important contribution of this study to be the development of a conceptual, quantitative framework that may be used to rationally design vaccines and determine an optimal inoculum dose for any pathogen.

Experimental data
We analyzed data from two previously published studies, one on influenza A virus (IAV) infections in mice [26] and the other on human parainfluenza virus (HPIV) type 3 infection in cotton rats [27].
For the IAV study, groups of mice were infected with 6 different inoculum doses of the H1N1 PR8 strain of influenza. Geometric mean viral titers were recorded at different times following the infection with each dose. In addition, lung damage was measured and scored.
For the HPIV study, groups of cotton rats were infected with 5 different doses of HPIV-3. Geometric mean viral titers were recorded at different times following infection in both lung and nose. For the highest inoculum dose, the study additionally reported several virus measurements over the first 96 hours. The study also reported antibody titers 21 days after infection for the 3 lowest inoculum doses for which virus data was reported.
We used an additional data set to estimate a mapping between innate immune response strength and morbidity. This data was taken from a previously reported challenge study of influenza infection in human volunteers [28]. We used the reported values for different components of the innate response (IFN-a, IL6, IL8 and TNF-a) and total symptom score as measure of morbidity.
For further experimental details, we refer the reader to the original studies.

Mechanistic dynamical infection model
We formulated and implemented a mechanistic, dynamical model of the infection dynamics based on a set of ordinary differential equations. The model is based on our previous work, where we analyzed the relationship between inoculum dose and viral load dynamics [29]. The model is also similar to many other models that have recently been used to model acute viral infections (see e.g. [30][31][32][33] Free virus infects cells at rate b 0 , is cleared by antibodies at rate k 0 A or removed due to other mechanisms (e.g. mechanical transport) at rate d V . Note that b 0 and k 0 A differ from parameters b and k A to account for experimental units. Since we are modeling short, acute infections, we follow the usual assumption and ignore growth and death of uninfected target cells [30,31].
In addition to the basic infection process, we model components of the innate and adaptive immune response. We consider a generic innate response, F, which is produced and decays at rates p F and d F in the absence of an infection. Presence of virus leads to an increase in the innate response, with growth leveling off at a maximum rate g F . The virus level at which growth levels off is determined by the parameter h V . The maximum level the innate response can reach is given by the saturation parameter F max . Since the innate response units are arbitrary, the model is set up such that in the absence of infection, the innate response is at a steady level of F = 1, which leads to p F = d F . We fix the parameter representing the decay rate at d F = 1 per day, which is in line with estimates from an influenza infection analysis in ponies [34].
The innate response is modeled as having two main mechanisms of action. First, it can directly counteract the virus by reducing virus production rate of infected cells [35]. In our model, the strength of production suppression is determined by the parameter s F . The second action of the innate response is to induce the adaptive response, as described next.
For the adaptive response, we focus on B-cells and antibodies, which are the major correlates of protective immunity for most vaccines, including HPIV and IAV [23,36]. The dynamics of activated B cells is modeled as increasing in a sigmoidal manner dependent on both the amount of virus (antigen) and the innate response, with a maximum rate g B . The parameter h F determines the level of virus and innate response at which B-cell growth saturates. Since we are focusing on the short-term dynamics of the system, B-cell decay is ignored. In the absence of an infection, B-cells are set to an arbitrary level of 1. B-cells produce antibodies at rate r A . Antibodies decay naturally at rate d A and bind to and remove free virus at rate k A .
The model is implemented as a set of ordinary differential equations given by the following set of equations:

Model fitting
The model is fit to the IAV and HPIV data. For IAV the fit is to the virus load and lung damage data. For HPIV, the fit is to the virus load and antibody data. For each pathogen, we fit the model to data for all different inoculum doses simultaneously. For each inoculum dose, i, we estimate the starting value for the virus inoculum, V i . All other model parameters are shared across different inoculum doses. Model performance is assessed by the sum of squared residuals (SSR). To allow computation of a single SSR value for the different experimental variables, the contribution of each variable is non-dimensionalized by dividing by the variance of the data. To give the different experimental variables comparable importance, we divide each variable by the number of data points. This amounts to over-weighting the few data points for lung damage (IAV) and antibody response (HPIV) and reducing the weight for the more plentiful viral load data. Mathematically, the expression for the SSR is given by Here, V is viral load (on a log scale), and X represents either antibodies (for HPIV) or damage (for IAV), the superscript indicates model (m) or data (d), the sum runs over all inoculum doses, i, and all time points, t. N indicates the number of data points for either the virus or the other variable, σ 2 indicates the variance for that variable. Since both damage and antibodies (X) are measured in different units in the data and the model, each are normalized before subtracting and squaring. The experimental data for virus load is quantified in hemagglutination units and expressed in virus per lung (IAV), and plaque forming units per gram of lung (HPIV). We assume that the virus variable in our model is in the same units. For damage, the experimental data is reported as percent of lung area that is damaged/destroyed. For IAV, both the model-predicted damage (number of dead cells) and the damage as reported in the data is divided by the maximum value of model prediction or data measurements (for the whole dataset, across all inoculum doses). This leads to a re-scaling of both quantities, thus allowing one to compare model and data. We thus assume that damage as tracked by the model is proportional to damage reported in the data. The same approach is used for the antibody data for HPIV, where the model predictions are 'antibodies in the system' while the experimental data are antibodies based on a plaque reduction neutralization assay. The re-scaling again allows for fitting, with the assumption that the model and experimental quantities are proportional.
While this re-scaling is only necessary for the instances where we compare model and data (lung damage for IAV and antibodies for HPIV), for consistency between models, we plot rescaled values for both lung damage and antibodies for both IAV and HPIV.
The model is being fit by varying model parameters to minimize the SSR. When doing so, we take into account left-censored nature of the data. If the reported virus load is at or below the limit of detection (LOD, which is 0.27 log10 units for IAV and 2 log10 units for HPIV as reported in the original studies), and the model prediction is below the data, we do not count any difference between model and data for any model prediction [37,38].

Model implementation
All computations were done in the R programming language version 3.4.3 [39]. Fitting was done using the nloptr optimizer package [40], differential equations were integrated using the deSolve package [41]. All data and code required to reproduce all results presented here are supplied in S1 Code.

Data extraction
The data used for our study was obtained from the original reports as follows.
For the IAV study, we obtained log viral load and lung lesion score expressed in percent lung damage from table 1 of [26]. The viral kinetics of the highest inoculum dose strongly hints at survivor bias (see figure 1 of [26]). Specifically, the data suggest that sicker mice, with presumably higher virus load, were killed and sampled first, while less sick mice, with presumably lower virus load, were kept alive and sampled later. We, therefore, decided to exclude the data for the highest inoculum dose from consideration, leaving us with viral load and percent lung damage data for 5 different inoculum doses.
For the HPIV study, we focused on lung viral load. The data was extracted from figures 1 and 2 of [27] using Engauge Digitizer [42]. Viral load kinetics for the highest inoculum were measured twice with some overlap in times (24h and 96h). We averaged data for these times from the 2 experiments. We additionally obtained data on antibody titers for those inoculum doses for which viral load was reported (the 3 lowest inoculum doses) from figure 3 of [27].
For the data linking innate response to symptoms, total symptoms score data was extracted from figure 3 and innate immune response data from figures 5 and 6 of [28] using Engauge Digitizer. More details on how this data was used are provided in other parts of this paper. The data are part of S1 Code.

Model development and fitting
The model we developed is described in detail in the methods section. For each data set, we fit the model to the viral load data, and either lung damage (IAV) or antibody (HPIV) data. Details on the fitting approach are provided in the methods section. The model fits and data for IAV and HPIV are shown in Figs 1 and 2 respectively. Parameter values for the best fits are given in the S1 Text.
While the dynamics for the 2 pathogen-host systems look broadly similar, it is noticeable that the IAV infection leads to a stronger innate response and more host damage (fraction dead cells). This is a particular feature of the influenza strain used in those experiments. Different influenza strains in mice, and certainly most influenza infections in humans, do not cause a large amount of cell damage-though some of the more serious influenza infections, e.g. with the H5N1 strain, can lead to a large amount of damage.

Quantifying immune protection
We want to quantify the amount of protective immunity induced by different inoculum doses. We focus on the B-cell and antibody component of the adaptive immune response. Provided antibodies are specific to the pathogen, higher levels of antibodies generally lead to better protection [43][44][45]. Recent studies for influenza vaccines [46,47] demonstrated that the following function provides a good mapping from antibody titer to the level of protection from infection: Here, the level of protection, P, varies between 0 and 1, with low protection for low levels of antibody titer, A, and maximum protection at high levels. The constants k 1 and k 2 determine the slope of the curve and the level at which protection is at 50% respectively (see [46] for more details). This functional shape is also consistent with data for other pathogens [43][44][45]. Fig 3  illustrates this relationship between antibody levels and protection graphically. Our model represents antibodies in units of numbers of antibodies. In general, experimental studies report antibody neutralizing titers or similar assay-specific units. For this reason, and because we have no data for the correlation between antibody titers and protection for either the HPIV or IAV data we analyze, it is impossible to determine specific choices for k 1 and k 2 for our study systems. We instead chose values such that the antibody levels considered span the full range from low to high protection levels. Specifically, we set k 1 = 1 and k 2 = E(log (A)) where A is the range of antibody levels predicted by our model for different inoculum doses and E is the expected value. This choice is essentially arbitrary and therefore the protection curves we present below are to be understood conceptually. Exploring the impact of inoculum dose to inform model-based vaccine design

Quantifying morbidity
It is still not fully understood how virus and immune response affect host morbidity, i.e. the severity of symptoms. For virus infections, host morbidity can be due to virus-induced death of infected cells, as well as immune response mediated pathology. A study of influenza infection in humans suggested that a model in which symptom score was proportional to innate cytokine levels provided an adequate fit to the data [48]. Another study of influenza infections used a combination of innate cytokine (interferon) levels and cell death to define morbidity as where D is the total number of dead cells, and g(F) was chosen to be a sigmoidal mapping of log interferon levels [49]. Similarly, a previous model for dengue infections assumed that morbidity was proportional to the peak of the innate response, i.e. M~max(F) [50]. In the case of vaccines, strong pathological effects such as the death of a meaningful fraction of target cells do not occur. It therefore seems most reasonable to express morbidity (strength of symptoms) as a function of the innate immune response.
To obtain an estimate for a mapping between innate immune response and morbidity, we use data from a previously reported challenge experiment of influenza infection in human volunteers [28]. We use the reported values for different components of the local innate response (IFN-a, IL6, IL8, and TNF-a) and, after scaling each component to a maximum value of 1, sum them to obtain an estimate for the total innate response strength. This total response quantity is again scaled and then mapped to morbidity, measured as total symptom score. Fig 4 shows the data and the best fit of a sigmoidal model that provides a mapping between innate response and morbidity.
The model is given by where M is morbidity as measured by total symptom score, and F is the scaled innate response. Best fit parameter values are b = 6.5 and c = 0.66. The parameter a was fixed at 36, corresponding to the maximum score possible based on the study protocol [28]. While a simpler linear model would fit the data equally well, it is less biologically reasonable since it would allow an unbounded increase in symptoms. From our simulations, we obtain the time course of the innate response, F, for each inoculum. After rescaling this quantity, i.e. dividing by the maximum of F across all doses, we use Eq (3) to compute the time course for morbidity, M. Finally, we take the integral of the morbidity to compute total morbidity as the area under the morbidity curve (MAUC), given by: The integral goes over the duration of the infection. Since this approach mixes model simulations based on animal infections with morbidity estimates based on human data, the resulting morbidity curve should be interpreted in a similar conceptual way as the protection curve described above.

Immunity and pathogenesis as function of inoculum
After fitting the dynamical infection model (1) to each data set, we used the best-fit parameter values and ran simulations for a range of inoculum doses. Several time-series for the IAV and HPIV model simulations spanning the whole range of simulated inoculum doses are plotted in Figs 5 and 6.
The model fit to IAV shows increase in the innate response as inoculum dose increases, and an increase, followed by a leveling off, for the antibodies. For HPIV, we see the innate response increasing more rapidly as dose increases but reaching the same saturating level. Antibodies initially increase, peak at intermediate inoculum dose, and then decrease.
From these time-series, the level of protection and morbidity was computed. The model is simulated for 21 days, predicted antibodies are recorded at the final time. From these antibody levels, we compute immune protection using Eq (2). We also record the predicted innate immune response, and, after scaling, use Eq (3) to compute morbidity, and by integrating the area under the curve, determine the total amount of morbidity during the infection. Those results are shown for IAV in Fig 7 and for HPIV in Fig 8. We find for these examples that protection initially increases with inoculum dose, followed by a slight (IAV) or strong (HPIV) decrease for very high doses. Morbidity increases for IAV through the whole range, and shows the same "increase, then decrease" pattern as HPIV.

An inactivated vaccine model
The model and data above are for replicating pathogens, as such representing live, attenuated vaccines. Another important category of vaccines are those where the pathogen is killed and    Exploring the impact of inoculum dose to inform model-based vaccine design  Exploring the impact of inoculum dose to inform model-based vaccine design non-replicating. A modification of the above model can be used to simulate such a vaccine. For such a vaccine, cells do not get productively infected, and one can remove the variables tracking uninfected and infected cells. The model simplifies to We were not able to find data in the published literature that was detailed enough to allow model fitting. We therefore instead chose values for model parameters that produced reasonable dynamics and explored the impact of inoculum/antigen dose on protection and morbidity for such a generic model.

Determining an optimal inoculum dose
For any given vaccine, one wants to find out the inoculum dose which is "optimal". There is no single definition of "optimal". One important criterion is to get immune memory and thus protection as high as possible. Another criterion could be to achieve high protection with as few side effects (morbidity) as possible. Yet another criterion could be to achieve maximum protection at the lowest possible dose to allow vaccination of as many individuals as possible in the presence of vaccine shortages. These different targets lead to a general optimization problem, where the objective is to find a dose D that maximizes a function f(D) which is made up of Exploring the impact of inoculum dose to inform model-based vaccine design targets for each criterion. For instance a simple hypothetical function to be maximized could be f(D) = b 1 P(D) + b 2 M(D), where b 1 is a positive parameter weighing the criterion of protection, and b 2 is a negative parameter weighing the criterion for morbidity. Another function could be a min-max scenario, where one tries to find a range bounded by the minimum dose that still provides some critical level of required protection and the maximum dose above which morbidity is above an acceptable level.
For illustrative purposes, we show results for one such hypothetical objective function f(D), namely the ratio of protection and morbidity, in S1 Text. This is purely illustrative and not meant to suggest this as a particularly useful choice of criteria. The specific criteria will be different for different pathogens and scenarios. For instance for a very severe disease in an outbreak setting (e.g. Ebola), the goal could be to produce vaccines at the lowest required dose to achieve some critical level of protection, without worrying much about side effects. In contrast, routinely given childhood vaccines where shortage is not a problem will need to focus on the balance between protection and side-effects, likely without too much emphasis on dose sparing to stretch vaccine supply.

Discussion
In this study, we used a combination of data and models to explore how inoculum dose impacts immune protection and morbidity. We found that for the examples investigated, sometimes there is a monotonic or almost monotonic increase of protection and morbidity as inoculum increases (inactivated vaccine and IAV examples), while other times protection and morbidity can decline once dose increases beyond some value (HPIV example). In this latter scenario, we found with our model that once inoculum doses increase beyond some threshold, the innate immune response is triggered so strongly that it leads to quick pathogen clearance, which in turn leads to a reduced activation of the adaptive immune response and thus reduced immune protection. Exploring the impact of inoculum dose to inform model-based vaccine design If or how this pattern shown in our model applies to real settings is unclear. In many situations, a higher inoculum dose of either live attenuated or killed antigen in a vaccine leads to a stronger immune response and subsequently likely better antibody or T-cell mediated immune protection [51]. We are not aware of a clear example showing increased inoculum dose of an attenuated or killed vaccine leading to reduced immune protection, however, our model and general biological mechanisms suggests it as a possibility.
We want to re-emphasize that our analysis presented here should be understood as the development of a conceptual framework towards the use of models and data for potential vaccine optimization. Several obvious limitations do not allow one to consider this study more than conceptual. The data we had available are reported as averages of multiple animals, the hosts being studied (mice and cotton rats) are not natural hosts for the pathogens they were infected with, we combined some human data with the animal data, and the relative sparsity of the data does not allow us to fully validate our model. We built and used a model that we believe can be justified based on our biological understanding of the processes happing during such acute infections. However, our model is too complex for the data at hand and leads to overfitting. We chose to use this model based on biological grounds. There are likely many alternative, biologically reasonable, model formulations which might lead to different results, again emphasizing the conceptual nature of our work.
We believe that using an approach that combines modeling with data can help in the development of more efficient vaccines. The key toward that goal is the availability and integration of the right kind of models and data. Data should be collected for correlates of immune protection (e.g. specific antibodies, B-cells, T-cells) and direct or indirect markers of morbidity (e.g. symptom score, side-effects) for a few different inoculum doses. Further measurements that can help validate models, e.g. pathogen load over time, might be also be useful. Similarly, data that measures risk of infection at several different levels of the correlates of protection will be needed. Combining those data with models would allow testing and calibration of a model, which could then be used to produce predictions for protection and morbidity over a full range of inoculum doses, including those not experimentally measured.
Being able to predict the expected protection achieved for a given inoculum dose can help in the design of vaccines in cases when only limited antigen is available, e.g. in emergency situations [4]. Having information on both immune protection and expected morbidity allows one to determine an optimal inoculum dose based on the-often conflicting-goals of high protection and low morbidity. For instance, one could systematically answer questions such as, "If we require at least 80% immune protection, what would the minimum amount of inoculum need to be? And what level of morbidity/side-effects would this induce?" Currently, both modeling and experiments are not yet able to be used in such a specific manner. However, a tighter integration of experiments with models, and further model refinement should allow one to use the modeling approach discussed here in the future to help design vaccines.
Some promising extensions and refinements of the models are inclusion of further components of the immune response. For instance, given that T-cells are also known to play an important role in immune protection and are affected by inoculum dose [52,53], it would be beneficial to extend experimental and modeling studies in the future and consider both the Bcell and T-cell components of the adaptive response. Similarly, provided more detailed data on specific components of the innate response is available, including those components explicitly in the models might be useful. Another extension would be to consider stochastic models, which would better be able to capture variation among patients. This would require individual host data to be available for analysis and modeling.
Our study fits into the recently proposed framework of Immunostimulation/Immunodynamic (IS/ID) modelling, which has been proposed as a framework to combine models and data for better vaccine formulation decisions [18], in analogy to the well-established pharmacokinetic/pharmacodynamic (PK/PD) modelling approach widely used in drug development [17].
To summarize, we developed a modeling framework that might allow a systematic and quantitative determination of the impact of different inoculum doses on resulting immune protection and morbidity. We applied this approach to several data sets to illustrate how the general concept can lead to important insights, e.g. 'more inoculum does not always lead to more immune protection'. The modeling and analysis framework presented here could be applied to data from specific vaccine candidates and help to more efficiently determine the optimal dose. Supporting information S1 Text. Additional details and results. (DOCX) S1 Code. All R scripts and data needed to reproduce all results shown in the paper.