Computational Characterization of Transient Strain-Transcending Immunity against Influenza A

The enigmatic observation that the rapidly evolving influenza A (H3N2) virus exhibits, at any given time, a limited standing genetic diversity has been an impetus for much research. One of the first generative computational models to successfully recapitulate this pattern of consistently constrained diversity posits the existence of a strong and short-lived strain-transcending immunity. Building on that model, we explored a much broader set of scenarios (parameterizations) of a transient strain-transcending immunity, ran long-term simulations of each such scenario, and assessed its plausibility with respect to a set of known or estimated influenza empirical measures. We evaluated simulated outcomes using a variety of measures, both epidemiological (annual attack rate, epidemic duration, reproductive number, and peak weekly incidence), and evolutionary (pairwise antigenic diversity, fixation rate, most recent common ancestor, and kappa, which quantifies the potential for antigenic evolution). Taking cumulative support from all these measures, we show which parameterizations of strain-transcending immunity are plausible with respect to the set of empirically derived target values. We conclude that strain-transcending immunity which is milder and longer lasting than previously suggested is more congruent with the observed short- and long-term behavior of influenza.

. Visualization of epidemiological measures. A hypothetical weekly incidence curve is shown in blue. The x-axis represents time, and the y-axis represents incidence. The four epidemiological measures label the properties of the epidemic curve they capture.
➔ AAR: cumulative incidence per season. ➔ ED: number of weeks of sustained epidemic per season. Initially defined as 1 week (the single week of peak incidence), and expanded until 90% of the season's attack rate has been captured. Duration expands one week at a time, either forward or backward, whichever direction has a higher incidence (Listing S1). ➔ R p : reproductive number of the virus per season, assuming a host population with lingering immunity from previous exposure. It is defined as the average ratio of incidence between consecutive weeks during the four weeks prior to the peak week. See [1]. ➔ PWI: peak incidence per season. start = end = <week of peak incidence> sum = <peak incidence> limit = <attack rate> * 0.9 while sum < limit: if(incidence[start 1] > incidence[end + 1]): start = start 1 sum = sum + incidence[start] else: end = end + 1 sum = sum + incidence[end] ED = 1 + (end start) Reproduction of the results reported herein is straightforward, assuming appropriate hardware is available. The simulator is CPU and memory intensive, and at least 32 GB of RAM is recommended. After compiling the simulator (assuming the Java version is being used), the first step is to generate a population file. The random seed used for the simulations in this study was 0x123abc (1194684 decimal). The total population size is 100 million hosts, but these are distributed evenly among 20 patches of 5 million hosts. The population file can be generated with the following command: java jar FM_Java.jar patch 1194684 5000000 The outfile file, named "patch-5000000.bin" will have an MD5 hash of 57a831d1dd1d8c66e6fc20525af3badb. The next step is to simulate the initial 100-year burnin period. The purpose of this genesis simulation is to allow the system time to equilibrate under the default parameterization of the [2] model. At the end of this initial 100-year simulation, the simulator will dump it's state to disk for further processing. To run the simulator, a compressed parameter string (also called the data string) is required. For the particular genesis simulation used here, the random seed was 0x12345 (74565 decimal), and the other parameters were set according to Table S1. The following command can be used to run the genesis simulation and generate the requisite saved state file for subsequent simulations: java jar FM_Java.jar new eJxjZCAXpNg5aYS11pCsj1HZFYULAKqtAs8= After the simulation finishes, the simulator will generate a 64-bit final state hash to uniquely identify the simulated trajectory. The value of this hash will be 906e89e62c5def76 for the genesis simulation described above. The output file will be named according to the following pattern: "stats-#.bin", where # is the current system time in milliseconds. For the sake of convenience later on, the file can be renamed to, for example, "genesis.bin". At this point, the 1200 individual 50-year simulations can be run. To repeat any simulation, lookup its data string (see the supplementary file "S1_Dataset.csv", Dataset S1) and instruct the simulator to resume the genesis simulation with the new set of parameters. The command, using as an example the first simulation listed, would be: java jar FM_Java.jar resume genesis.bin eJxjZMAFFKC0bQJ2+RQ7J42w1hqc+pHBr/jUzRmfFjIwMDq+0qgFCpy8B5cDAIoXCwY= The final state of the simulation executed by the above command will be 33db064d66af386e. The simulator includes an API to read the generated output file, and the algorithms described in Text S1 can be used to calculate the value of each of the 8 measures for any given simulation (these values are also provided in the supplementary file "S1_Dataset.csv", Dataset S1). Table S2 contains, among other things, the mean and sample standard deviation of these values grouped by parameterization (n=20 for each set of parameters).

Analysis Text S3: Explored Parameter Space
The aim of this study is to explore all computationally practical, non-trivial parameterizations of strain-transcending immunity and to then suggest which of those are also likely to be biologically plausible. Because this parameter space is continuous (and extends infinitely in the half-life dimension), it is necessary to place some reasonable bounds on the exploration space and to sample a discrete set of parameterizations within those bounds. In the limits of strength approaching zero efficacy or half-life approaching zero time, no protection is conferred by strain-transcending immunity; in the limit of exceedingly long half-life at any appreciable strength, strain-transcending immunity degenerates into a static reduction in susceptibility [2]. Pragmatically, we bounded the primary exploration to strengths within [0.25, 1.00] and half-lives within [41 days, 60 years]. Additional simulations (not shown) indicate that only extremely implausible parameterizations lie outside of these bounds. The discrete grid of 60 points that we imposed on the bounded region is linear in strength of strain-transcending immunity and exponential in half-life of strain-transcending immunity. The exponential step-up in half-life was chosen to satisfy two constraints: selecting a computationally practical number of grid points while sufficiently sampling each regime.

Text S4: Relaxing Assumptions
Throughout the main-text and supporting information, plausibility at a particular point in parameter space is determined by (1) measuring the Mahalanobis distance (MD) from averaged measures taken on model output to the empirically derived targets for influenza and (2) testing the null hypothesis that the point is not a multivariate outlier. If the hypothesis is rejected (at the α =0.05 level), then the parameterization is deemed implausible; otherwise, the point is considered to be plausible. In the main-text, two simplifying assumptions are made: that empirical targets are normally distributed and that all measures are independent (Figures 1 and 2; Table S3). Here, we relax both assumptions and show that outcomes in terms of plausibility are largely unaffected by these assumptions.
To relax the assumption that the empirical targets ranges are normally distributed for all measures, we exclude the two of the eight measures which are explicitly not normally distributed: MRCA and κ. We repeat the process of establishing plausibility by now comparing Mahalanobis distance to a chi-squared distribution with six degrees of freedom ( Figure S2, Table S3). We observe only minor changes to the plausibility region and conclude again that the intermediate regime of strain-transcending immunity is plausible.
Seperately, we relax the assumption of independence among measures. We take two approaches: (1) excluding measures which are suspected to be highly correlated, and (2) using an estimated covariance matrix to calculate MD. Regarding the first approach, we exclude two presumably highly correlated measures: PWI and AD. We again compare Mahalanobis distance to a chi-squared distribution with six degrees of freedom ( Figure S2, Table S3) and observe only minor variation in the shape of the plausibility region.
➔ Of the PWI (incidence, height ), ED (duration, width ) AAR (attack rate, area ) triplet, any individual measure can be reasonably estimated given the other two, assuming there is some canonical epidemic shape. We exclude PWI because we are more confident in the target distributions of ED and AAR than of PWI. ➔ AD (diversity) and MRCA (common ancestor) both essentially measure the amount of divergence among extant strains. Given a distant MRCA, high diversity is not surprising; given a recent MRCA, low diversity is not surprising (and vice versa ). We exclude AD because true MRCA can be reasonably estimated from empirical sequence data, whereas it is not clear how to estimate true AD from empirical sequence data. Regarding the second approach, we first estimate the sample correlation among measures. We generate this estimate separately at each grid point (recall that n = 20 simulations at each point) and observe that much of the correlation structure is preserved across parameter space. We illustrate this correlation structure below, showing the average of all individual correlation matrices ( Figure S3). Until now we implicitly employed a diagonal covariance matrix when calculating MD, where diagonal elements were variances and non-diagonal elements were zero (no correlation between measures). Now, we build a more complex covariance matrix by using simulator output to estimate correlations between measures. We generate each covariance matrix separately at each grid point by (1) converting the estimated correlation matrix into a covariance matrix (scaling by standard deviations of target values across rows and columns) and (2) element-wise averaging the old (diagonal) and new (non-diagonal) covariance matrices (diagonals are equal; non-diagonals take 50% of the correlation structure to avoid overfitting). We use the resulting covariance matrices to calculate a MD which takes into account correlations between measures ( Figure S2, Table S3). Again, the shape of the plausible region remains essentially unchanged.

Host Lifespan
Global life expectancy is on the order of 60-70 years [6], and this is the value that we matched in our simulations.
However, because the simulated population is unable to accommodate a variable (increasing) number of hosts, it is also interesting to simulate a lifespan that is closer to the global median age, which is on the order of 30 years [7]. To determine to what extent shorter host lifespans affect our results regarding the plausibility of transient strain-transcending immunity, we (1) held the immunity parameters constant (at a strength of 87.5% and half-life of 1 year) while varying host lifespan from 30-60 years and (2) remapped a large portion of parameter space under a host lifespan of 30 years. For the first sensitivity analysis of host lifespan (constant immunity parameters, variable lifespan), we ran sets of at least 10 replicates at 30, 40, 50, and 60 year lifespans and measured median Mahalanobis distance for each set ( Figure S4). Although the distance increases slightly as the host lifespan is decreased, all of the parameterizations remain well within the plausible range; a distance of 15.51 or greater is required to reject a hypothesis using a chi-squared distribution with 8 degrees of freedom. For the second sensitivity analysis of host lifespan (constant 30-year lifespan, variable immunity parameters), we ran sets of at least 5 replicates across the most salient portion of parameter space to determine how the shape of the plausible region is affected ( Figure S5). Although the shape of the plausible region changes slightly, we still find that our main conclusion -that it is possible that transient strain-transcending immunity could be weaker and longer lasting than previously thought -still holds. As expected, reducing lifespan (equivalently, increasing respawn frequency) results in an overall increase in the number of naive hosts, and attack rates are uniformly, but not unreasonably, increased compared to those of the simulations using 60 year lifespans (shifting the region of acceptable attack rates to the right on the map; Figure S6). Figure S5. Plausibility map using a reduced host lifespan of 30 years. Map illustrates rejection of immunity parameterizations as previously described. Although the plausible region is smaller, the shape is essentially unchanged; immunity half-lives on the order of years are plausible with a reduction in strength. (n=5 at all points, except the point used in the lifespan sweep described above, for which n=10.) Figure S6. Annual attack rate map using a reduced host lifespan of 30 years. Map is as in main text Figure 1A. Reduced lifespan results in more frequent host respawning and a more rapid increase in the number of naive hosts. Although attack rates are correspondingly higher across parameter space, they are not always unreasonably high. (n=5 at all points, except the point used in the lifespan sweep described above, for which n=10.)

Inclusion of a Tropical Deme
The simple world of the model consists of two equally populated hemispheres which are subjected to sinusoidal seasonal forcing in opposite phase. In the real world, seasonal forcing is much more complex, and only the temperate regions of the world experience the type of seasonal forcing that can be described by a simple sinusoid with an annual peak in the winter months. The highly-populated tropical regions of the world are not subject to such well-defined forcing, yet it has been shown that strains arising and circulating in these tropical regions play a significant role in the diversification and spread of flu globally [8,9]. It is therefore of great interest what impact the simple geography of the model has on immunity results and how a more realistic model of the world would change those results. To this end we made a straightforward extension to the model to allow for patches representing tropical regions. These patches differ from the temperate patches only in that there is no imposed seasonal forcing. Additionally we redistributed the 20 patches (originally divided into 10 northern and 10 southern patches) into 8 northern, 10 tropical, and 2 southern patches to more closely approximate the actual distribution of the world population. All other aspects of the model, including between-patch contact rates, remained the same. As with the sensitivity analysis of host lifespan, we ran sets of 5 replicates across a broad range of parameter space to assess the impact of including tropical dynamics on the shape of the plausible region of immune parameters ( Figure S7). We find that the shape of the plausible region changes somewhat more significantly, in particular becoming more jagged (presumably an artifact of a small number of replicates), and more importantly only extending up to a half-life of 2 years. Despite these moderate changes to the shape of the plausible region under a remarkably different model of the world, our conclusion remains in essence the same: transient strain-transcending immunity could plausibly persist with a half-life meaningfully longer than originally anticipated.

Data
The tables below contain the values which were used to generate the figures in the main-text. The complete list of parameter strings, random seeds, final state hashes, and measured values for all 1200 main-text simulations are available in the supplementary data file "S1_Dataset.csv" (S1 Dataset). Table S2. Epidemiological and evolutionary parsimony measured across strain-transcending immunity parameter space.
Half-life is shown in years; other units are as described in the main text. n=20 for all rows.  Half-life is shown in years; strength is dimensionless; other units are Mahalanobis distances. n=20 for all rows. *Not rejected (p > 0.05), shaded green.