Abstract
The rapid growth of biological datasets presents an opportunity to leverage past studies to inform and predict outcomes in new experiments. A central challenge is to distinguish which serological patterns are universally conserved and which are specific to individual datasets. In the context of human serology studies, where antibody-virus interactions assess the strength and breadth of the antibody response and inform vaccine strain selection, differences in cohort demographics or experimental design can markedly impact responses, yet few methods can translate these differences into the value±uncertainty of future measurements. Here, we introduce CAPYBARA, a data-driven framework that quantifies how serological relations map across datasets. As a case study, we applied CAPYBARA to 25 influenza datasets from 1997-2023 that measured vaccine or infection responses against multiple influenza variants using hemagglutination inhibition (HAI). To demonstrate how a subset of measurements in each study can infer the remaining data, we withheld all HAI measurements for each variant and accurately predicted them with a 2.0-fold mean absolute error—on par with experimental assay variability. Although studies with similar designs showed the best predictive power (e.g., children data are better predicted by children than adult data), predictions across age groups, between vaccination and infection studies, and across studies conducted <10 years apart showed comparable 2‒3-fold accuracy. By analyzing feature importance in this interpretable model, we identified global cross-reactivity trends that can be directly applied in future longitudinal or vaccine studies to infer broad serological responses from a small subset of measurements.
Author summary
The potential to integrate data from multiple studies is hampered by differences in cohort demographics or study design – such effects, even when known, are hard to estimate. Here, we analyzed 25 studies quantifying how influenza antibody responses inhibited different sets of viral variants. We developed a computational approach that learned which studies accurately predict one another and estimated the value and uncertainty of antibody inhibition against each variant. Although these studies were conducted over two decades, used different study designs, and assessed different age groups, each study was well-predicted by at least one other dataset, enabling rapid cross-dataset integration. This tool can be readily applied to future studies by unbiasedly quantifying study similarity based on prediction accuracy, or by measuring the antibody response against a small set of variants to predict the response against dozens of other variants, thereby leveraging the wealth of prior data to fuel future efforts.
Citation: Orsinelli-Rivers S, Beaglehole D, Einav T (2026) CAPYBARA: A generalizable framework for predicting serological measurements across human cohorts. PLoS Comput Biol 22(3): e1014129. https://doi.org/10.1371/journal.pcbi.1014129
Editor: Tyler Cassidy, University of Leeds, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
Received: June 16, 2025; Accepted: March 16, 2026; Published: March 30, 2026
Copyright: © 2026 Orsinelli-Rivers et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Code is available through the accompanying GitHub repository (https://github.com/TalEinav/CAPYBARA). Analyses were implemented in Python using standard scientific libraries (NumPy, SciPy, scikit-learn).
Funding: This research was supported by LJI & Kyowa Kirin, Inc. (KKNA - Kyowa Kirin North America, TE), and the Bodman family (TE). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
As biological datasets continue to expand in size and complexity, it is becoming increasingly challenging to integrate information from prior datasets to inform and predict the outcomes of future experiments. Patterns found in one group of individuals may not apply to another group where factors such as age, exposure history, or immune state differ [1–6]. More subtle, and often unknowable, differences in experimental methodology or batch effects may further affect which datasets can predict one another. While many studies have identified that cohorts differ in some way (e.g., children and adults show significantly different immune responses [7–11]), we lack methods that estimate how these differences translate into future measurements. Such quantitative predictions are not only the hallmark of deeply understanding a system, but they also facilitate head-to-head comparisons across studies measuring different features.
This work tackles this problem in the context of the antibody response against the rapidly evolving influenza virus, which underpins the annual vaccine selection process [12,13]. Specifically, we consider serum hemagglutination inhibition (HAI) against multiple influenza variants, where higher HAI titers correlate with greater protection [14–16]. While thousands of new variants (or strains) emerge each year [17–19], only a small fraction can be functionally characterized using HAI, and the variants measured often differ between studies. Critically, we still lack methods that take a person’s HAI titers against a few variants and infer their titers for other variants, which would quantify the holes in a population’s immunity that should be closed when the vaccine is next updated.
Currently available HAI datasets have several direct clinical applications. Prior work has shown that a person’s HAI against multiple strains can infer their influenza exposure history [20,21] or help predict their response to future vaccines [22]. Serum-virus HAI titers have been shown to be inherently low dimensional [23,24], where titers against some variants can infer the titers of other strains [25,26]. As such, a new study seeking to measure HAI against numerous variants could theoretically extract these cross-reactivity relations from existing datasets, measure a minimal number of variants, and then predict the HAI of the remaining strains. One key hurdle is that cross-reactivity relations may differ with age, influenza exposure, and other immune variables. As the number of prior studies continues to increase, it is unclear a priori which datasets will best predict the cross-reactivity relations in another study, nor what form those relations will take.
To that end, we introduce the method Cross-study Adaptive Predictions Yielding Bayesian Aggregation with Recursive Analysis (CAPYBARA), a generalizable framework that efficiently selects the most predictive features within each dataset, determines their cross-reactivity relations, estimates prediction error, and then combines predictions from multiple studies weighted by their confidence. Fig 1 provides an overview of the CAPYBARA workflow, including the feature learning process, model training, error calibration, and Bayesian weighting of predictions across datasets. We demonstrate the utility of this approach by applying CAPYBARA to 25 influenza HAI datasets, providing a comprehensive analysis of cross-study prediction performance in a large-scale serology compilation. We first apply CAPYBARA to H3N2 data and then demonstrate its generalizability by predicting H1N1, B Victoria, and B Yamagata titers.
(A) Given studies (...Sj-1, Sj, Sj+1…) measuring serum HAI against a subset of influenza variants V0-Vn, and study-of-interest S0 measuring HAI against V1-Vn, CAPYBARA predicts V0’s measurements in S0. (B) CAPYBARA first identifies the most predictive features (HAI against a subset of variants) using Recursive Feature Machines (pink boxes). Ridge regression is applied using those features, training on a subset of data in Sj and cross-validating on the rest (error σInternal, Table 1). This model predicts titer values μj from Sj → S0 without uncertainty. (C) To estimate cross-study prediction error, every other variant is withheld and predicted from Sj → S0 to determine the internal (σInternal) and cross-study (σExternal) error. Combining the errors from every overlapping variant yields the transferability function fj that is applied to V0’s σInternal from Panel B to estimate the uncertainty σj in Sj. (D) Predictions from all studies are combined through a Bayesian approach to yield a consensus prediction for the study-of-interest (S0).
Models are trained to infer HAI titers for variant V0 in study Sj and then applied to predict V0’s titers in study S0. Titers from other variants Vk can be chosen as model features.
In the context of influenza immunity, CAPYBARA addresses two essential questions: First, how accurately can we leverage prior studies to predict future antibody inhibition data? Second, how few measurements are needed in order to extrapolate all antibody-virus interactions for any set of variants? Accurate cross-study predictions in the face of differences in study populations, experimental conditions, and virus panels would not only expedite future experiments but also help quantify the magnitude and breadth of the immune response in greater resolution.
Results
Overview of the algorithm
The CAPYBARA algorithm predicts the HAI titers of multiple sera against a withheld or unmeasured variant-of-interest V0 in study-of-interest S0. As input, we assume that HAI titers from other variants V1, V2, V3… were measured for these same sera, and that other studies S1, S2… also measured HAI for V0 and a subset of other variants (Fig 1A). Model features are the titers against different influenza variants, with HAI against V1, V2, V3… used to predict the withheld HAI titer against V0.
The algorithm proceeds as follows: 1) In every other study Sj∊{S1, S2… Sn}, identify the subset of variants (features) that best predict HAI titers for V0 (Fig 1B). 2) Train a model in Sj to predict each subject’s titer (μj) for V0 (Fig 1B). 3) Repeat step 2 on all other variants V1, V2, V3… whose values are known, so that within-study and cross-study error can be computed. This determines the error relationship when predicting from Sj to S0 (Fig 1C), which is then applied to determine the uncertainty σj for V0 predictions in Sj. 4) Combine predictions from all studies to estimate the HAI titer±error for each subject (Fig 1D, Methods). Prediction accuracy should only increase as more datasets are included, and adding a very noisy dataset (σj→∞) will negligibly change predictions. Multiple methods were tested for each model component (e.g., random forest, ridge, lasso) on the Fonv studies, and the final CAPYBARA method combines Recursive Feature Machines, ridge regression, and Bayesian weighting, which proved the most accurate. Missing HAI entries were imputed using the row- and column-means of all measured values, yet all error metrics (e.g., σActual) were computed using the measured titers. Table 1 describes the four σ terms used throughout this study.
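The per-study training step (step 2) can be sketched in Python. This is an illustrative closed-form ridge regression, not the authors’ exact implementation; the helper name, the regularization strength alpha, and the assumption that titers are log2-transformed are all assumptions for this sketch, and feature_idx stands in for the RFM-selected variants from step 1.

```python
import numpy as np

def predict_variant(train_X, train_y, test_X, feature_idx, alpha=1.0):
    """Sketch of step 2: ridge regression on the RFM-selected features.

    train_X / test_X hold log2-transformed HAI titers (rows = sera,
    columns = variants); feature_idx lists the columns chosen in step 1.
    Solves the closed-form ridge system with an intercept via centering.
    """
    X = train_X[:, feature_idx]
    x_mean, y_mean = X.mean(axis=0), train_y.mean()
    Xc, yc = X - x_mean, train_y - y_mean
    # (Xc^T Xc + alpha I) beta = Xc^T yc
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return (test_X[:, feature_idx] - x_mean) @ beta + y_mean
```

The same function, retrained on each withheld variant in turn, also supplies the within-study and cross-study errors of step 3.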
To validate how well CAPYBARA predicted unmeasured serum-virus interactions across a compendium of influenza studies, we entirely withheld antibody responses from each variant within 20 vaccine studies and 5 longitudinal infection studies conducted between 1997–2023 (Table 2). These studies covered a variety of vaccine types (inactivated, live attenuated), age groups (children and adults), and geographic regions, containing ~200,000 HAI titers from 3,855 unique subjects (Table 2 and S1 Fig). Given this diversity, it was unclear a priori which datasets would be most informative for imputing the HAI titers in any other study.
Antibody responses are predicted between infection and vaccination studies within experimental noise
To test how well the HAI of new variants could be inferred across diverse biological contexts, we first examined how a longitudinal 6-year infection study (2007–2011 FonvInf) predicted the titers of overlapping variants in a vaccine study conducted six years later (2017 UGAVac), by which point subsequent infections or vaccinations could have dramatically altered HAI cross-reactivity. In total, N = 4,336 titers were predicted in the vaccine study with root-mean-squared error (RMSE) σActual = 3.1x (where “x” denotes fold-error) between the predicted and measured titers (Fig 2A, green), implying that a measured HAI = 20 will typically be predicted as a titer between 20/3.1 = 6.5 and 20·3.1 = 62. The model’s estimated error σPredict = 7.7x represents an upper bound (worst case) error, and although this bound was not tight, it satisfied σActual≲σPredict as expected. In contrast, when we predicted this same 2017 UGAVac dataset using a vaccine study from one year earlier (2016 UGAVac), we found a smaller prediction error σActual = 2.0x and a tighter estimated error σPredict = 2.3x when predicting these same N = 4,336 titers (Fig 2A, blue).
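The fold-error notation used throughout can be made concrete with a short sketch: one way such a metric could be computed is as the RMSE of log2 residuals, exponentiated back to a fold-change (the function name is hypothetical, and the paper’s exact implementation may differ).

```python
import numpy as np

def fold_error(measured, predicted):
    """RMSE in log2 space, reported as a fold-change ("x").

    Illustrative sketch: a fold-error of 3.1x means a measured titer of 20
    is typically predicted between 20/3.1 and 20*3.1.
    """
    log_resid = np.log2(np.asarray(predicted, float)) - np.log2(np.asarray(measured, float))
    return 2.0 ** np.sqrt(np.mean(log_resid ** 2))

# Predictions of 62 and 6.5 for a measured titer of 20 both lie ~3.1-fold off
fe = fold_error([20, 20], [62, 6.5])
```

Perfect predictions give a fold-error of exactly 1x, the smallest possible value.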
(A,B) Example predictions trained on an individual dataset (left and middle columns) and the combination of both datasets (right column). Labels above each plot identify the training → testing datasets. (C-E) Predicting three datasets using all other studies in Table 2. The estimated fold-error (σPredict), measured fold-error (σActual), and the number (N) of predicted titers are shown, with the gray diagonal bands representing σPredict.
For each serum-virus HAI, the estimated titer and error (μ1 ± σ1 from study 1, μ2 ± σ2 from study 2) were combined using Bayesian statistics, (μ1/σ1² + μ2/σ2²)/(1/σ1² + 1/σ2²) ± (1/σ1² + 1/σ2²)^(-1/2), which places more weight on the more confident prediction with smaller σPredict (Methods). In this case, 2016 UGAVac was weighted ~20x more heavily (1/σ1² = 0.19 vs 1/σ2² = 0.01), as may be expected from its similar study design. The combined predictions remained as good as the predictions from the 2016 UGAVac study alone (σActual = 2.0x, σPredict = 2.1x), demonstrating that the model is not hampered by adding the poorly predicted infection study (Fig 2A, purple).
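This inverse-variance combination rule can be sketched directly (function name hypothetical; titers assumed to be on a log2 scale, as in the ridge models above):

```python
import numpy as np

def combine_predictions(mus, sigmas):
    """Precision-weighted (inverse-variance) combination of per-study
    predictions mu_j +/- sigma_j, following the formula in the text.
    Illustrative sketch of the Bayesian aggregation step."""
    mus = np.asarray(mus, dtype=float)
    w = 1.0 / np.asarray(sigmas, dtype=float) ** 2   # weights 1/sigma_j^2
    mu = np.sum(w * mus) / np.sum(w)
    sigma = np.sum(w) ** -0.5
    return mu, sigma

# A very uncertain study (large sigma) barely shifts the consensus:
mu, sigma = combine_predictions([5.0, 9.0], [0.5, 10.0])
```

Because the weights are 1/σj², a study with σj→∞ contributes negligibly, which is why adding a noisy dataset leaves the consensus essentially unchanged.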
As another example, we used an infection study (2010–2014 ErtInf) and a vaccine study (1997 FonvVac) to both individually and jointly predict another infection study (2007–2011 FonvInf). Interestingly, the infection→infection predictions (σActual = 3.1x) were slightly worse than the vaccine→infection predictions (σActual = 2.9x), even though the infection studies overlapped in time while the vaccine study occurred ten years earlier (Fig 2B, green/blue). Combining both studies led to more accurate predictions than either dataset alone (σActual = 2.4x) with similarly tight estimated error (σPredict = 2.5x, Fig 2B, purple).
More generally, the predictions from any number of datasets can be combined using error estimation (Methods). As representative examples, we used every dataset in Table 2 to predict HAI titers in a vaccine study of adult health care workers (2016 FoxHCW,Vac, Fig 2C), vaccinated children (2014 HinV,Vac, Fig 2D), and an adult infection study (2012–2015 HayInf, Fig 2E; all studies in S2 Fig). The 10³–10⁴ predicted HAIs in each study had low RMSE (σActual = 1.8-2.1x, similar to the error of the HAI assay) and comparably low estimated error (σPredict = 1.4-1.7x), demonstrating that combining datasets can precisely and confidently extrapolate HAI titers for completely unmeasured variants. These results corroborate that studies do not need to be pre-screened, since the framework will identify the most predictive datasets and ignore the poorly predictive ones.
More training datasets lead to better prediction accuracy across 25 years of data
To test generalizability, we compared the prediction error when training on any single dataset versus the combined predictions from all studies (Fig 3A). As expected, the individual datasets showed far larger variability (σActual = 1.6-10.7x) than the combined predictions (1.7-2.5x), which were always comparable to the most accurate pairwise predictions in each column. Interestingly, prediction accuracy does not need to be symmetric. For example, multiple studies had poor predictions with σActual > 6x when trained solely on 1997 FonvVac, yet using any training dataset to predict values in 1997 FonvVac led to more accurate σActual < 4x. Computing signed prediction error showed that predictions were skewed to be slightly larger when the measured titer = 5 and slightly smaller when the measured titer was higher (≥640), yet the median predictions were within 2-fold for measured titers ≤1280 and within 4-fold for larger titers (S3 and S4 Figs).
(A) Heatmap of the average RMSE (σActual) across all subjects and overlapping variants in a study-of-interest (column). Training is either done using all studies (top row) or using a single study (all other rows). (B-C) All predicted versus measured HAIs when training on (B) a single study or (C) all other studies. The number N of predictions is larger for pairwise predictions since the same serum-virus pair is predicted multiple times using different training datasets. The diagonal line y = x represents perfect predictions.
The greatest signed error deviations occurred when the two 2012–2013 ErtLAIV,Vac studies were used for training where, unlike in all adult studies, HAI titers for nearly all viruses hit the limit of detection (HAI = 5), leading to markedly different cross-reactivity relations. Even so, the combined predictions using all studies uniformly had signed error ≈0, demonstrating improved accuracy when using more datasets (S4 Fig).
At the individual-person level, there was a noticeably greater spread in pairwise predictions (σActual = 2.6x across all subjects, Fig 3B) than in the combined predictions (2.0x, Fig 3C), with 14.3% of the former predictions having an error > 4x while only 5.3% of the latter predictions had such error (S5 Fig). Indeed, CAPYBARA does better than averaging the individual predictions from each study by heavily weighting the more reliable, and hence more accurate, predictions (S6 Fig).
A few general trends can be seen from pairs of studies that poorly predict one another (Fig 3A). The two oldest studies (1997/1998 FonvVac) tend to poorly predict studies from 2010 and beyond. The LAIV studies (2012/2013 ErtLAIV,Vac) were sometimes poorly predicted by the more common IIV studies. Beyond these few rules, it was often unclear which studies would poorly predict one another, emphasizing the utility of CAPYBARA to infer such relationships directly from the data.
To demonstrate the generalizability of this approach, we next used CAPYBARA to predict all H1N1, B Victoria, and B Yamagata HAI titers in these same studies. Although the virus panels were smaller in each case, the UGA studies had the necessary overlap of ≥3 variants. Combined predictions of HAI titers using all datasets led to σActual = 1.9-2.6x for H1N1, comparable to H3N2 prediction accuracy (S7 Fig). B Victoria and B Yamagata achieved σActual = 1.7-3.7x, with most studies being predicted with ≈2-fold error (S7 Fig).
Lastly, we showed that CAPYBARA can disregard an extremely noisy dataset by generating a new study with random H3N2 HAI titers drawn from the same distribution and with the same variants as the 2016 UGAVac and 2007–2011 FonvInf studies (S8 Fig). Adding this noisy dataset to the existing studies did not affect overall prediction accuracy, as expected. Thus, more datasets can be added until the desired prediction accuracy is achieved. Systematic shifts (e.g., all titers increased by 4x) are automatically accounted for, since HAI titers for one variant are inferred using titers from other variants within that same study (S9 Fig).
Subsetting datasets helps explain prediction dynamics
Since age is well known to affect the antibody response, we assessed how well children (age ≤ 18) can predict adult responses (age > 18) and vice versa. Datasets were categorized as containing children only, adults only, or a combination of both (S1 Fig). HAI titers from studies in each category were exclusively predicted using models from either the same or a different category (Fig 4A). As expected, the best predictions came from models trained within the same category. For example, children’s titers were better predicted by children’s data (σActual = 1.7x) than by adult data (σActual = 2.4x; p < 0.05, two-sided permutation test). Studies containing both children and adults represented an intermediate phenotype, which was itself best predicted by studies containing titers from both children and adults. Despite most of these age effects being significant, the absolute effect of age was small, where even purposefully mismatching datasets (predicting children→adults or adults→children) led to median error < 2.5x.
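A two-sided permutation test of the kind used for these comparisons can be sketched as follows. The difference-of-means statistic is an assumption for illustration (the paper's statistic may differ), and the Benjamini–Hochberg correction applied across comparisons is a separate step not shown here.

```python
import numpy as np

def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test on |mean(a) - mean(b)| (sketch).

    Labels are repeatedly shuffled between the two groups; the p-value
    is the fraction of shuffles whose statistic is at least as extreme
    as observed (with a +1 correction to avoid p = 0)."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.asarray(b, float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(pooled[:len(a)].mean() - pooled[len(a):].mean()) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

Identical groups give p = 1, while well-separated groups give small p, mirroring the significance calls in Fig 4.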
Cross-study RMSE (σActual) when training and predicting between datasets based on (A) the age groups adult-only, children-only, or mixed (child + adult); (B) vaccination or infection studies; (C) datasets grouped in 5-year intervals based on their median year; or (D) pre-vaccination (Day 0) vs post-vaccination (~1 month) data. Each box plot shows the distribution of errors for all possible withheld variants. The horizontal line denotes the median, boxes show the interquartile range, and whiskers extend to 1.5 times the interquartile range. Circles denote outliers. Statistical significance was assessed using two-sided permutation tests with Benjamini–Hochberg correction for multiple testing. Asterisks denote adjusted p-values: **** = p < 0.0001, *** = p < 0.001, ** = p < 0.01, * = p < 0.05.
We next split datasets by study type (vaccine versus infection). As before, there was a small but significant improvement in prediction accuracy when the same type of dataset was used for training (Fig 4B). For example, predicting from infection→infection studies (σActual = 1.7x) was more accurate than vaccination→infection (σActual = 2.0x; p < 0.05, two-sided permutation test), although predictions in either case were surprisingly accurate, with similar results when predicting vaccination responses.
The worst prediction accuracy was seen when splitting datasets by their year of study and using old datasets to predict responses >10 years into the future (Fig 4C). Studies were binned in five-year increments, with studies conducted over multiple years represented by their median year. Training on studies from the same bin either led to the best predictions or to predictions comparable with the best bin (median σActual 1.6–2.1x). Measured accuracy dropped, often significantly, when using older datasets to predict more recent ones. In particular, training on the oldest 1996–2000 datasets led to poor predictions and large variation on 2011–2015 (σActual = 3.2x; p < 0.05, two-sided permutation test) or 2016–2020 data (σActual = 2.6x; p < 0.05, two-sided permutation test), although predictions going backwards in time by >10 years tended to be more accurate. Estimated accuracy (1/σPredict²) showed similar behavior, with larger confidence for studies conducted within 10 years of one another (S4 Fig).
Lastly, we examined how accurately pre-vaccination titers predicted the peak post-vaccination titers (21–43 days post-vaccination) across vaccine studies. Surprisingly, we observed nearly identical prediction accuracies (median within ~0.02x of each other; p = 1.0, two-sided permutation test), suggesting that the HAI cross-reactivity across variants holds over time, with most variants increasing in tandem post-vaccination.
Identifying universal relations between influenza variants
To demonstrate how future studies can leverage CAPYBARA to measure a few variants and infer the response from others, we sought universal relations that could be applied to a new study without requiring dataset reweighting through CAPYBARA. To that end, we used RFM to determine which variant features were the most important when predicting each of the 112 variants across these studies (Figs 5A and S10; red represents greater importance).
(A) Rainbow diagram of feature importance between any pair of variants (connections are bidirectional). (B) Examples of universal HAI titer equations for multiple influenza vaccine strains, using titers from one variant (when possible) or two variants. Each virus name stands for its log2(HAI/5) titer. See S1 File for all relations using ≤5 variants. (C) Measured versus predicted HAI titers for all vaccine strains in each study. Predictions were averaged from all other studies that measured the necessary variants. (D) Example using a small subset of five variants to predict ten other vaccine strains.
While variants circulating more than 10 years apart could be important (average feature importance = 0.3), the most important variant pairs tended to circulate less than a decade apart (average feature importance = 0.5, S10 Fig). However, feature importance could only be determined when two viruses were measured in at least two studies, so the more frequently selected vaccine strains tend to have far better coverage than non-vaccine strains. For example, the 1968 pandemic strain Hong Kong 1968 was often measured, and it exhibited strong feature importance of ≈1 against viruses circulating as late as Hong Kong 2014.
Each variant-of-interest V0 in study S0 is predicted by every other study with at least three overlapping strains, leading to multiple potential distinct HAI relations. While 53.1% of relations required 1–2 variants, 20.3% of equations required 4 or more variants (all relations shown in S1 File). Since vaccine strains were frequently measured, many relations exclusively use these strains (examples in Fig 5B).
To evaluate the accuracy of these relations, each vaccine strain’s HAI titers were individually withheld from a study-of-interest and derived by averaging the relations from all other studies. To make these results as generalizable as possible, predictions were not weighted by their estimated accuracy as in the sections above, but instead averaged equally across all studies. The resulting predictions showed an RMSE of 2.1x, comparable to the ≈ 2-fold error of the HAI assay, provided that each study measured all of the necessary variants to apply these relationships (Fig 5C).
To further expedite future studies, we assessed whether measuring a smaller set of only five influenza variants (comprising four vaccine strains and one non-vaccine strain) could predict ten other vaccine strains (Fig 5D and S1 File). This reduced set of variants only had a slightly larger RMSE of 2.7x, showing that cross-study relationships can increase the amount of data generated by a few experiments, and that prediction accuracy should increase as more datasets are measured, or by applying CAPYBARA to heavily weigh the most accurate studies.
Finally, we determined the key HA sites that underlie influenza cross-reactivity in two different ways. Method #1 identified the smallest number of HA1 amino acids whose edit distance best matches the RFM-derived virus-virus similarity (a.k.a. Minimal Key Residues). Starting from the full HA1, we eliminated, one at a time, the amino acid whose removal led to the greatest similarity (smallest Frobenius norm) between the virus edit distance at the remaining sites and the RFM importance. This identified a small set of 31 HA1 positions, where the virus-virus edit distance at these sites (S11A Fig) roughly reproduced their RFM-derived feature importance (Fig 5A) with Pearson correlation r = 0.60. This approach recovered the expected similarity between contemporaneous variants as well as between some variants circulating 10+ years apart.
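The backward elimination in Method #1 can be sketched as a greedy loop. This is a hypothetical implementation for illustration: the target matrix stands in for the RFM-derived (dis)similarity being matched, and the stopping criterion (a fixed number of retained sites) is an assumption.

```python
import numpy as np

def greedy_site_selection(seqs, target, n_keep):
    """Greedy backward elimination of sequence positions (Method #1 sketch).

    seqs: equal-length sequences (e.g., HA1 segments); target: pairwise
    matrix to match (e.g., RFM-derived dissimilarity); n_keep: number of
    positions to retain. At each step, remove the site whose elimination
    minimizes the Frobenius norm between the remaining-site edit distance
    and the target."""
    arr = np.array([list(s) for s in seqs])
    # Per-site mismatch indicator for every pair of sequences: (m, m, n)
    diff = (arr[:, None, :] != arr[None, :, :]).astype(float)
    keep = list(range(arr.shape[1]))
    while len(keep) > n_keep:
        scores = []
        for s in keep:
            rest = [k for k in keep if k != s]
            dist = diff[:, :, rest].sum(axis=2)   # edit distance at remaining sites
            scores.append(np.linalg.norm(dist - target))
        keep.pop(int(np.argmin(scores)))
    return keep
```

On toy sequences, the loop retains the position whose mismatches best reproduce the target matrix, mirroring how the full procedure arrives at the 31 HA1 positions.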
Method #2 identified the key conserved residues across distant variants necessary for accurate predictions (a.k.a. Conserved Sites in Distant Variants). Thus, we analyzed the 11 pairs of variants circulating 10+ years apart with ≥0.7 RFM importance. Taking the sites conserved in >80% of these pairs identified 167 amino acids, with this larger number arising because the 11 pairs explored a limited part of sequence space. Across all variants, the edit distance at these sites was at most 6, demonstrating that these are highly conserved sites in HA1 (S11B Fig).
When we compared the residues found from each method, we found that 21 of them overlapped, representing the positions most likely to be important for both variants circulating in similar years as well as 10+ years apart (S11C and S11D Fig). As a further check, 14 residues fell within the canonical H3N2 epitopes A-E and the receptor binding site, highlighting their biological plausibility. 4 other sites lie outside these canonical sites but have been previously reported to impact H3N2 antigenicity, while 3 additional sites outside the canonical epitopes were identified by this analysis [5,32–35]. Taken together, the HAI data suggest that these 21 sites are responsible for both the short-range and long-range similarity between influenza variants.
Discussion
Here, we developed CAPYBARA, a general algorithm that combines feature learning, model generation, and error estimation to predict unmeasured interactions based on existing datasets. As a case study, CAPYBARA was applied to identify universal patterns in serum-virus cross-reactivity and predict each serum’s HAI against variants that were entirely withheld from a study. While factors such as age [7–10,36] or exposure history [1,3,5,6] are known to affect the antibody response, it is unclear how these impact serum cross-reactivity. To that end, CAPYBARA quantifies how accurately the local relationships in one dataset translate into another dataset using all non-withheld data, testing this approach across 25 different influenza studies. A key innovation from previous methods is that this model combines state-of-the-art feature selection [37] and error estimation techniques [38] while leveraging ridge regression for greater interpretability. By using predictions of overlapping variants to estimate prediction error between studies, this approach unbiasedly determined which studies exhibit the same cross-reactivity relations directly from the data. Interestingly, there was always at least one study with accurate predictions, and hence combined predictions trained on all datasets were uniformly accurate with 1.7-2.5x prediction error. The resulting cross-reactivity relations could be partially recovered by a subset of 21 amino acids in the HA head, offering a direct link between model-derived relationships and underlying sequence features.
Subject age had a small but significant effect, suggesting that cross-reactivity changes from childhood (age ≤ 18) into adulthood (age > 18). Children predicted other children’s responses better than adults, while adults predicted other adult responses better than children, with mixed datasets containing both children and adults falling in the middle. The year a study was conducted also had a significant effect, with studies within a 10-year window exhibiting 1.6x-2.4x error while studies done further apart in time had 2.0x-3.2x error. However, studies conducted within 10 years were not always highly predictive, and future studies should explore what other study features lead to similar cross-reactivity relations and better predictive power. Vaccination and infection studies similarly predicted their own category better than the other category. Surprisingly, within vaccine studies, the pre-vaccination (day 0) and peak response (day 21–40) time points predicted one another with comparable accuracy, suggesting that pre- and post-vaccination cross-reactivity resemble one another. This could arise if all variant HAIs increase by a similar amount post-vaccination, or if post-vaccination responses are relatively weak, both of which held true across these datasets and were previously reported [39].
One limitation of this approach is that a variant’s HAI titers can only be predicted in a dataset-of-interest if that variant has been measured in at least one other study. Thus, this method is not equipped to predict the HAI of new variants, although a variant measured in one dataset can be predicted in all other studies. As datasets measuring more variants are added, the number of predictions in each study grows combinatorially.
As such, CAPYBARA lays the foundation to design more efficient experiments that leverage existing studies. It further provides a quantitative foundation to determine the minimum number of variants that should be measured to infer the HAIs from multiple variants of interest. To facilitate such use, we also provide the average cross-reactivity relations between all H3N2 influenza variants examined in this work (S1 File). These relations can be immediately applied to a new study, or they can be further augmented with CAPYBARA, which will derive new dataset-specific relations weighted by dataset similarity.
In principle, CAPYBARA could help augment vaccine strain selection by unifying the insights gained from influenza surveillance from around the world. For example, each year the WHO Collaborating Centers measure how human sera from their country inhibit potential vaccine strains from their region, resulting in partially overlapping virus panels that are local to each site. If each site agrees to measure the same three viruses (e.g., the most recent three vaccine strains), they could immediately infer how sera measured at any WHO site would inhibit each variant measured at every other site, thereby quantifying the potential for viral escape across the world.
Methods
Overview of the datasets
We analyzed a collection of 25 influenza vaccine and infection studies spanning 1997–2023 (Table 2). If one participant had multiple sera (e.g., pre-vaccination and post-vaccination), each serum was analyzed independently. Predictions were carried out between two datasets if they measured HAI against at least three of the same H3N2 variants, since this ensures that there are enough features for cross-study prediction.
To assess generalizability, CAPYBARA was applied without modification to HAI titers for H1N1, B Victoria, and B Yamagata in these same datasets. Each subtype or lineage was predicted independently, with model fitting, cross-dataset prediction, and error estimation performed identically to the H3N2 analysis.
Analyzing HAI titers
All studies used hemagglutination inhibition, which measures the highest dilution of serum at which hemagglutination is inhibited. A larger HAI titer will reflect a more potent serum, but it may also reflect differences in virus passaging (egg- vs cell-grown) or study design (incubation conditions, type or batch of red blood cells). Missing HAI titers, comprising 2.1% of all measurements, were imputed using the row–column mean since both RFM and ridge regression require complete data. While these imputed values were used for model training, model evaluation was only carried out on the measured values.
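As an illustration, the imputation step can be sketched as follows. This is a minimal sketch that assumes the "row–column mean" means the average of the serum's (row) mean and the variant's (column) mean; the paper's exact scheme may differ in detail.

```python
import numpy as np

def impute_row_column_mean(titers):
    # Fill each missing entry with the average of its row mean and column mean
    # (one plausible reading of the row-column mean; an assumption here).
    titers = titers.astype(float).copy()
    row_means = np.nanmean(titers, axis=1)   # per-serum means, ignoring NaN
    col_means = np.nanmean(titers, axis=0)   # per-variant means, ignoring NaN
    rows, cols = np.where(np.isnan(titers))
    titers[rows, cols] = (row_means[rows] + col_means[cols]) / 2
    return titers

# toy log-titer matrix: 2 sera x 3 variants with one missing measurement
X = np.array([[1.0, 2.0, 3.0],
              [4.0, np.nan, 6.0]])
X_full = impute_row_column_mean(X)
# missing entry -> (row mean 5.0 + column mean 2.0) / 2 = 3.5
```

As in the paper, such imputed values would be used only for model fitting, with evaluation restricted to measured entries.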
As in prior analyses, titers were transformed to log2(HAI/5), which reduces the bias toward large titers [14,38]. All prediction errors are shown in unlogged units so that they can be compared to the measured HAI titers. More precisely, the root-mean-square error (σActual) of the logged titers is exponentiated to obtain the unlogged error 2^σActual (e.g., σActual = 1.0 on log2 titers corresponds to a 2^1.0 = 2-fold error, with “fold” or “x” indicating an un-logged number). Prior work has shown that the HAI has an inherent 2-fold error on average [22], and hence predictions with ≈2-fold error are as accurate as possible. Batch correction across studies was not applied, since HAI titers are standardized using reference antisera to maintain comparability across experiments.
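The transformation and error convention can be sketched directly:

```python
import numpy as np

def to_log(hai):
    # transform titers as in the paper: log2(HAI / 5)
    return np.log2(np.asarray(hai, dtype=float) / 5)

def fold_error(logged_true, logged_pred):
    # RMSE on log2 titers, exponentiated back to an unlogged fold-error ("x")
    diff = np.asarray(logged_true) - np.asarray(logged_pred)
    rmse = np.sqrt(np.mean(diff ** 2))
    return 2.0 ** rmse

true = to_log([40, 80, 160])
pred = to_log([80, 40, 160])   # two titers off by one 2-fold dilution, one exact
fold_error(true, pred)         # -> 2**sqrt(2/3) ≈ 1.76-fold
```

A value of 2.0 from `fold_error` corresponds to the assay's inherent 2-fold measurement noise.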
We tested if including demographic covariates (age, sex, BMI) as additional features in the ridge regression improved prediction accuracy. Sex and BMI yielded prediction accuracy comparable to models without covariates. Age led to far worse prediction accuracy when extrapolating from adults→children or children→adults studies, since linear regression is poorly suited to such large extrapolations. However, when combining predictions from all studies, even with age information, CAPYBARA up-weighted the more accurate children→children or adult→adult predictions and retained ≈2-fold prediction error in every study. Although demographic information was removed during the analyses above, these results demonstrate that titers can be robustly predicted even when adversarial information is added, provided that at least some studies yield accurate predictions.
A synthetic noisy dataset was created by randomly sampling HAI titers from the joint distribution of real training studies (2016 UGAVac and 2007–2011 FonvInf), preserving the same virus panel and assuming the average number of subjects between the two studies (214 subjects).
Overview of CAPYBARA
We first outline the four main steps of the algorithm and then describe each in detail:
- Step 1: Feature Learning (Fig 1B): For each external study Sj that measured the target virus V0, a Recursive Feature Machine [37] identifies a small subset of variants that best predict V0.
- Step 2: Model Training (Fig 1B): Ridge regression is applied to a subset of sera within Sj, using the selected variants as inputs and V0 as the output. The internal root-mean-square error σInternal (V0) is computed on the withheld sera.
- Step 3: Cross-Study Error Calibration (Fig 1C): To extrapolate the regression relation from Sj (where σInternal is known) to the new dataset S0, CAPYBARA withholds each variant Vk ≠ V0 and measures how its internal error in Sj maps to its external error σExternal(Vk) in S0. A piecewise linear function is then fit to the (σInternal, σExternal) pairs for all Vk, and this function is applied to σInternal(V0) to estimate the error in S0, denoted by σPredict(V0).
- Step 4: Combined Predictions (Fig 1D): When multiple studies can predict a virus V0 in S0, their predictions are combined using Bayesian weighting, i.e., weighting each prediction inversely by its squared predicted error, (1/σPredict)2. This yields a single predicted HAI titer and a calibrated uncertainty estimate for that titer.
CAPYBARA was designed to balance simplicity, efficiency, and interpretability. RFM and ridge regression ensure that the minimum number of predictive features are used, although each component could be modified independently (e.g., using random forests instead of ridge). The resulting CAPYBARA architecture was chosen after testing a number of different architectures, with the current method optimizing speed and accuracy.
Predictions between two studies are carried out independently of any other studies. Hence, studies can be weighted and combined in a modular fashion, and introducing a new study does not require rerunning the model on prior datasets. A new study’s predictions for a study-of-interest can be directly combined with all prior predictions using Bayesian inverse-variance weighting (as in Step 4).
Step 1: Using Recursive Feature Machines to identify the most predictive features
A Recursive Feature Machine (RFM) is a supervised machine learning model that incorporates feature learning into general non-parametric models through the Average Gradient Outer Product (AGOP) [37]. Unlike prior methods that used brute force (randomly selecting five variants V1-V5 to predict a target virus V0, assessing that selection using cross-validation), RFM gives the feature importance of all variants so that the top candidates can be used to predict V0. This identifies the predictive features more efficiently, is not restricted to a pre-imposed number of features (e.g., RFM does not always choose five features), and yields better predictions than a random search through a subset of possibilities (S12A Fig). For example, the Fig 3 analysis requires approximately 300 minutes using the brute force approach versus 20 minutes using CAPYBARA’s RFM.
Given any differentiable predictor f: ℝ^d → ℝ trained on n data points x^(1), …, x^(n) ∈ ℝ^d, the AGOP operator G(f) is the covariance matrix of the input-output gradients of the predictor over the training data, G(f) = (1/n) ∑_{j=1}^{n} ∇_x f(x^(j)) ∇_x f(x^(j))^T ∈ ℝ^(d×d).
This covariance captures the most predictive directions in its top eigenvectors, and the most important coordinates on its diagonal. RFM proceeded by obtaining an initial estimate of the target function using a standard kernel machine without feature learning. Given this initial estimate of the predictor, the AGOP of the predictor was computed on the training data, after which the inner product function was updated using the AGOP. RFM then recursed this procedure beginning with the transformed data. Formally, the algorithm proceeded as follows.
Algorithm 1: Recursive Feature Machine (RFM)
Inputs:
•Training data:
◦x(1),..., x(n)∊ℝd: HAI of feature variants V1-Vd for all n subjects
◦y ∊ ℝn: HAI of withheld virus-of-interest V0 for each subject
•Mahalanobis Laplace kernel k_M(x, x′) = exp(−dist_M(x, x′)/σ): Kernel function used to define similarity between samples for kernel ridge regression. dist_M(x, x′) = [(x − x′)^T M (x − x′)]^(1/2) is the Mahalanobis distance, with σ chosen by the median heuristic based on the training data and M ∈ ℝ^(d×d) a positive semidefinite matrix
•T = 5: Number of RFM iterations
•μ = 10-5: Ridge regularization coefficient, chosen to be small enough to avoid over-regularization yet still stabilize kernel inversion
Steps:
•Initialize feature matrix X_0 = [x^(1), …, x^(n)]^T ∈ ℝ^(n×d)
•Initialize positive semidefinite matrix M = I_d
•For t = 0 to T−1:
◦Set α = (k_M(X_t, X_t) + μI)^(−1) y
◦Define predictor f^(t)(x) = k_M(x, X_t) α
◦Update M to be the AGOP, M = (1/n) ∑_{j=1}^{n} ∇_x f^(t)(x^(j)) ∇_x f^(t)(x^(j))^T ∈ ℝ^(d×d)
Output: The diagonal elements of the learned feature transformation matrix M_(T−1) indicate the importance of each variant. Variants with diagonal values > 0.1 are chosen as the predictive features for target virus V0.
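Algorithm 1 can be sketched in a few dozen lines of NumPy. This is a minimal, illustrative implementation, not the paper's code: it recomputes the median-heuristic bandwidth each iteration, skips the singular zero-distance term when differentiating the Laplace kernel, and rescales M by its largest diagonal entry (our assumption, so that the >0.1 feature threshold is scale-free).

```python
import numpy as np

def mahalanobis_dist(X, Z, M):
    # pairwise Mahalanobis distances sqrt((x - z)^T M (x - z))
    XM, ZM = X @ M, Z @ M
    d2 = (np.sum(XM * X, axis=1)[:, None]
          + np.sum(ZM * Z, axis=1)[None, :] - 2 * XM @ Z.T)
    return np.sqrt(np.maximum(d2, 0.0))

def rfm_importance(X, y, T=5, mu=1e-5):
    """Sketch of RFM: kernel ridge regression with a Mahalanobis Laplace
    kernel, alternated with AGOP updates of the matrix M."""
    n, d = X.shape
    M = np.eye(d)
    for _ in range(T):
        D = mahalanobis_dist(X, X, M)
        s = np.median(D[D > 0])                    # median-heuristic bandwidth
        K = np.exp(-D / s)                         # Laplace kernel matrix
        alpha = np.linalg.solve(K + mu * np.eye(n), y)
        # AGOP: gradient of f(x) = sum_j alpha_j k_M(x, x_j) at each training
        # point; the j == i (zero-distance) term is skipped to avoid division by 0
        grads = np.zeros((n, d))
        for i in range(n):
            w = np.where(D[i] > 0, alpha * K[i] / (s * D[i]), 0.0)
            grads[i] = -M @ ((X[i] - X).T @ w)
        M = grads.T @ grads / n
        M /= M.diagonal().max()                    # rescale (assumption)
    return M.diagonal()                            # per-variant importance

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5))     # HAI features for 5 variants
y = X[:, 0] - 2 * X[:, 1]             # target virus depends only on variants 0 and 1
imp = rfm_importance(X, y)
# imp concentrates on variants 0 and 1; the noise variants fall toward zero
```

In CAPYBARA, the variants whose diagonal importance exceeds 0.1 would then be passed to ridge regression (Step 2).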
Step 2: Model training and internal error
Within each external dataset, ridge regression models were trained on selected features identified by RFM. Note that RFM importance was not considered during ridge regression, since in the case of multiple degenerate but highly important features, only a single feature should be selected. Hyperparameters (ridge regularization strength, kernel bandwidth, diagonal thresholds) were optimized via internal cross-validation (80% training, 20% validation splits), but were found to minimally vary (S12C Fig).
Following ridge regression, each variant feature with ridge coefficient >0.2 (in absolute value) was retained. When deriving universal cross-reactivity relations, if two studies predicted a target virus V0 using the same variants as features, the ridge coefficients were averaged for each of the viruses in their equation.
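The coefficient-retention rule can be sketched with scikit-learn on hypothetical data (the variant features and regularization strength below are illustrative, not the paper's fitted values):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))       # log2 titers of three RFM-selected variants
y = 1.5 * X[:, 0] + 0.05 * X[:, 1]      # target virus V0 depends mostly on variant 0

model = Ridge(alpha=1.0).fit(X, y)
kept = np.where(np.abs(model.coef_) > 0.2)[0]   # retain features with |coef| > 0.2
# kept -> [0]: only the strongly predictive variant survives the cutoff
```

Averaging the retained coefficients across studies that selected the same feature set would then yield the universal cross-reactivity relations described above.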
Step 3: Cross-study error calibration
Following prior work [38], to calibrate how accurately the model trained on study Sj applied to study S0, every possible virus Vk ≠ V0 was withheld one-by-one (in addition to excluding V0) from both the training and testing datasets. 80% of sera in the training set were used to fit a ridge regression model in the training dataset, with the remaining sera used to compute the internal error σInternal (Vk). All sera in the testing dataset were used to compute σExternal (Vk). Performing this for all Vk resulted in multiple points (σInternal, σExternal) that mapped the transferability of error between the two studies.
These paired internal–external errors were fit using a total-least-squares (orthogonal-distance) line, σExternal = α σInternal + β. To account for the uncertainty of this fit (i.e., highly scattered points with a poor best-fit line are more uncertain), we added to σExternal the root-mean-square vertical distance of each point from the fitted line, δ = [(1/m) ∑_{k=1}^{m} (α σInternal(Vk) + β − σExternal(Vk))^2]^(1/2). Lastly, the external error was forced to always be at least as large as the internal error. Altogether, the estimated error when predicting variant V0 in S0 is given by σPredict(V0) = max[σInternal(V0), α σInternal(V0) + β + δ].
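The calibration step can be sketched as follows, using the leading principal direction of the centered points as the orthogonal-distance (total-least-squares) line; this is an illustrative implementation, not the paper's code:

```python
import numpy as np

def calibrate_error(internal, external, sigma_internal_v0):
    """Sketch of Step 3: fit sigma_external = a*sigma_internal + b by total
    least squares, widen by the rms scatter delta, and floor at the internal error."""
    pts = np.column_stack([internal, external]).astype(float)
    center = pts.mean(axis=0)
    # leading right singular vector = direction of the orthogonal best-fit line
    _, _, vt = np.linalg.svd(pts - center)
    vx, vy = vt[0]
    a = vy / vx                        # slope alpha
    b = center[1] - a * center[0]      # intercept beta
    resid = a * np.asarray(internal) + b - np.asarray(external)
    delta = np.sqrt(np.mean(resid ** 2))   # rms vertical scatter about the line
    return max(sigma_internal_v0, a * sigma_internal_v0 + b + delta)

internal = [0.5, 1.0, 1.5, 2.0]
external = [2.0, 3.0, 4.0, 5.0]        # lies exactly on sigma_ext = 2*sigma_int + 1
calibrate_error(internal, external, sigma_internal_v0=1.2)   # -> 2*1.2 + 1 = 3.4
```

With perfectly colinear points, δ = 0 and the calibrated error is just the fitted line evaluated at σInternal(V0); scattered points inflate the estimate through δ.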
Step 4: Combining predictions from multiple datasets
When multiple studies S1, S2, … predicted the HAI titers of virus V0 in S0, each subject had predictions μ1 ± σ1, μ2 ± σ2, … (where σ is a shorthand for σPredict). Predictions were combined using Bayesian weighting that is inversely proportional to predicted error squared, namely, μCombined = (∑_j μ_j/σ_j^2)/(∑_j 1/σ_j^2), with combined uncertainty σCombined = (∑_j 1/σ_j^2)^(−1/2).
More confident predictions (smaller σj) are weighted more heavily, while highly inaccurate predictions (σj→∞) have little-to-no influence. As a result, all datasets can be included, and the algorithm will unbiasedly determine the most accurate predictions and use their values more heavily.
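This inverse-variance combination is a few lines of code:

```python
import numpy as np

def combine_predictions(mus, sigmas):
    """Sketch of Step 4: precision-weighted (inverse-variance) combination
    of per-study predictions mu_j +/- sigma_j."""
    mus, sigmas = np.asarray(mus, float), np.asarray(sigmas, float)
    w = 1.0 / sigmas ** 2              # weight each study by 1/sigma^2
    mu = np.sum(w * mus) / np.sum(w)
    sigma = 1.0 / np.sqrt(np.sum(w))   # combined uncertainty
    return mu, sigma

# two equally confident studies plus one nearly uninformative study
mu, sigma = combine_predictions([3.0, 5.0, 100.0], [1.0, 1.0, 100.0])
# mu ≈ 4.0: the noisy third prediction barely moves the combined estimate
```

Because weights shrink as 1/σ², a wildly uncertain study contributes essentially nothing, which is why all datasets can be included without hand-picking the relevant ones.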
Identification of HA1 residues underlying RFM similarity
To identify which HA1 sites best explain the RFM-derived virus-virus similarity, we performed iterative backward-elimination on the aligned HA1 head sequences (Method #1, Minimal Key Residues). At each iteration, every remaining position was temporarily removed, the edit-distance matrix was recomputed, and the position whose removal most decreased the Frobenius distance was eliminated. Iterations proceeded until removal no longer reduced the Frobenius norm (resulting in 28 sites). Robustness was assessed using one additional forward add-back (yielding 32 sites) and then backward pruning (31 sites). All HA1 positions are reported in H3 numbering, including the 16 amino-acid signal peptide.
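The backward-elimination loop can be sketched on a toy alignment. This is an illustrative sketch: Hamming distance stands in for the edit distance on aligned sequences, and the target matrix plays the role of the RFM-derived similarity.

```python
import numpy as np

def hamming_matrix(seqs, positions):
    # pairwise Hamming distances restricted to the given alignment positions
    n = len(seqs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = sum(seqs[i][p] != seqs[j][p] for p in positions)
    return D

def backward_eliminate(seqs, target):
    """Sketch of Method #1: greedily drop the position whose removal most
    reduces the Frobenius distance to the target matrix; stop when no
    removal helps."""
    positions = list(range(len(seqs[0])))
    best = np.linalg.norm(hamming_matrix(seqs, positions) - target)
    while len(positions) > 1:
        trials = [(np.linalg.norm(
                       hamming_matrix(seqs, [q for q in positions if q != p]) - target), p)
                  for p in positions]
        score, p = min(trials)
        if score >= best:              # removal no longer reduces the distance
            break
        best, positions = score, [q for q in positions if q != p]
    return positions

# toy target built from positions {0, 2} only; the loop should drop position 1
seqs = ["AAB", "ABB", "BAB"]
target = hamming_matrix(seqs, [0, 2])
backward_eliminate(seqs, target)   # -> [0, 2]
```

On the real HA1 alignment the same loop runs over all head positions, terminating at the 28-site minimum reported above.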
Software and computational resources
Analyses were implemented in Python using standard scientific libraries (NumPy, SciPy, scikit-learn). Code is available through the accompanying GitHub repository (https://github.com/TalEinav/CAPYBARA).
Supporting information
S1 Fig. Subject age distributions across datasets.
Datasets are ordered chronologically and by study group.
https://doi.org/10.1371/journal.pcbi.1014129.s001
(TIF)
S2 Fig. Prediction accuracy using all studies is consistently comparable to experimental noise.
Every other study in Table 2 is used to predict HAI titers for all variants in the study-of-interest (shown by the plot label).
https://doi.org/10.1371/journal.pcbi.1014129.s002
(TIF)
S3 Fig. Signed prediction error as a function of the measured ground truth titer.
Boxplots show median and interquartile range of prediction residual across all datasets, with sample counts annotated above each bin.
https://doi.org/10.1371/journal.pcbi.1014129.s003
(TIF)
S4 Fig. Prediction accuracy between every pair of studies.
(A) The estimated prediction accuracy for each pair of studies, computed as the mean (1/σPredicted)2 over all overlapping variants. Larger values indicate that the training dataset will be weighted more heavily in combined-study predictions. (B) Signed prediction error on log titers, log2(measured HAI/predicted HAI) for all variants in each pair of studies. Red indicates that measured titers were larger than predicted titers on average. Training is either done using all studies (top row) or using a single study (all other rows).
https://doi.org/10.1371/journal.pcbi.1014129.s004
(TIF)
S5 Fig. Distribution of errors for individual and combined predictions.
Fold-error (σActual) of predictions for every subject and virus using (A) each dataset to make a separate prediction and (B) all datasets to make combined predictions. Red shading marks the region of ≤4x error, and the annotated percentages indicate the fraction of predictions that fall within this threshold.
https://doi.org/10.1371/journal.pcbi.1014129.s005
(TIF)
S6 Fig. Combined predictions outperform averaged predictions from individual studies.
Prediction errors (σActual) for all viruses in all infection studies were computed using two other datasets for training. These two datasets either independently predicted each virus, and their resulting predictions were averaged [x-axis], or CAPYBARA was used to combine these predictions by more heavily weighting the dataset that was more similar to the target infection study [y-axis]. Points below the diagonal indicate improved performance with the combined model.
https://doi.org/10.1371/journal.pcbi.1014129.s006
(TIF)
S7 Fig. Predicting HAI responses across other influenza subtypes.
(A) Heatmap of the predicted vs measured RMSE (σActual) across all subjects and overlapping variants for H1N1 (left column), B Victoria (middle column), and B Yamagata (right column). Within each heatmap, training is either done using all studies (top row) or using a single study (all other rows). (B-C) All predicted vs measured HAIs when training on (B) a single study or (C) all other studies. The number N of predictions is larger for pairwise predictions since the same serum-virus pair is predicted multiple times using different training datasets. The diagonal line y = x represents perfect predictions.
https://doi.org/10.1371/journal.pcbi.1014129.s007
(TIF)
S8 Fig. Predictions remain robust with the inclusion of a noisy dataset.
(A) Example predictions from Fig 2A using an individual dataset (left and middle columns) or the combination of both datasets (right column) to predict titers in 2017 UGAVac. Labels above each plot identify the training → testing dataset. (B) A study with random data predicted this same testing dataset either individually (left) or in addition with the other two training datasets from Panel A (right).
https://doi.org/10.1371/journal.pcbi.1014129.s008
(TIF)
S9 Fig. Multiplying all HAI titers in one study does not significantly change prediction accuracy.
Predicted versus measured HAI titers for two representative vaccine studies based on all other studies. Titers are predicted in (A) 2010 FonvVac and (B) 2016 FoxHCW,Vac using the original data (left) or after multiplying all titers in that single study by 4x (right) to demonstrate the effects of one study having systematically higher titers.
https://doi.org/10.1371/journal.pcbi.1014129.s009
(TIF)
S10 Fig. Feature importance via RFM.
The importance of each virus feature (column) when predicting a target virus (V0, row). Feature importance is quantified within a single study. Only viruses with feature importance ≥ 0.1 are shown, as these viruses are subsequently used in ridge regression when predicting the target virus. Any virus not picked is shown in white.
https://doi.org/10.1371/journal.pcbi.1014129.s010
(TIF)
S11 Fig. Identification of HA1 head positions that best reproduce RFM-derived virus-virus similarity.
(A-B) Arc plots connecting each virus to its predictive partners. (A) Arcs colored by raw sequence distance computed from the selected 31 positions. (B) Arcs colored by the sequence distance computed from the 167 positions selected from virus pairs 10+ years apart and with high importance (≥0.7) in 80% of pairs. (C) Example sequence H3N2 A/Perth/16/2009 with canonical epitopes denoted by different colored lines above each position, and key residues found from two methods in this study highlighting the corresponding position numbers. (D) List of the 31 HA1 amino acids leading to the minimum Frobenius norm, annotated by which H3N2 epitope they fall into. Amino acid positions include the 16 amino-acid signal peptide.
https://doi.org/10.1371/journal.pcbi.1014129.s011
(TIF)
S12 Fig. CAPYBARA achieves lower prediction error than brute force approaches and allows for predictive uncertainty estimation.
(A) Comparison of fold-error for pairwise models generated by brute-force selection (running ridge regression on five randomly selected viruses, repeating 50 times to find the best five viruses) versus CAPYBARA (runs RFM a single time to identify the most predictive features and then ridge regression). Each point represents an overlapping virus between each dataset pair. More points lie above the diagonal and the average error is slightly smaller along the x-axis, with both traits indicating better performance with CAPYBARA. (B) Predicted versus actual error across all datasets using CAPYBARA, with each point representing all measurements for one virus in one study. We expect the predicted error to represent an upper bound, worst case error (σActual≲σPredict), which is satisfied in the vast majority of cases. (C) Heatmap of mean σActual across all dataset pairs for different hyperparameter settings for the diagonal threshold and bandwidth in RFM, showing nearly comparable prediction accuracy across all parameter choices.
https://doi.org/10.1371/journal.pcbi.1014129.s012
(TIF)
Acknowledgments
We especially thank the experimental groups who shared their data, and we hope this paper will inspire other groups to integrate their datasets for everyone’s benefit. We always welcome pointers to new datasets. We further acknowledge Adit Radha and Mikhail Belkin for useful discussions.
References
- 1. Andrews SF, Huang Y, Kaur K, Popova LI, Ho IY, Pauli NT, et al. Immune history profoundly affects broadly protective B cell responses to influenza. Sci Transl Med. 2015;7(316):316ra192. pmid:26631631
- 2. Gostic KM, Ambrose M, Worobey M, Lloyd-Smith JO. Potent protection against H5N1 and H7N9 influenza via childhood hemagglutinin imprinting. Science. 2016;354(6313):722–6. pmid:27846599
- 3. Mosterín Höpping A, McElhaney J, Fonville JM, Powers DC, Beyer WEP, Smith DJ. The confounded effects of age and exposure history in response to influenza vaccination. Vaccine. 2016;34(4):540–6. pmid:26667611
- 4. Vinh DN, Nhat NTD, de Bruin E, Vy NHT, Thao TTN, Phuong HT, et al. Age-seroprevalence curves for the multi-strain structure of influenza A virus. Nat Commun. 2021;12(1):6680. pmid:34795239
- 5. Fox A, Carolan L, Leung V, Phuong HVM, Khvorov A, Auladell M, et al. Opposing effects of prior infection versus prior vaccination on vaccine immunogenicity against influenza A(H3N2) viruses. Viruses. 2022;14(3):470. pmid:35336877
- 6. Loes AN, Tarabi RAL, Huddleston J, Touyon L, Wong SS, Cheng SMS, et al. High-throughput sequencing-based neutralization assay reveals how repeated vaccinations impact titers to recent human H1N1 influenza strains. J Virol. 2024;98(10):e0068924. pmid:39315814
- 7. Lessler J, Riley S, Read JM, Wang S, Zhu H, Smith GJD, et al. Evidence for antigenic seniority in influenza A (H3N2) antibody responses in southern China. PLoS Pathog. 2012;8(7):e1002802. pmid:22829765
- 8. Henry C, Zheng N-Y, Huang M, Cabanov A, Rojas KT, Kaur K, et al. Influenza virus vaccination elicits poorly adapted B cell responses in elderly individuals. Cell Host Microbe. 2019;25(3):357-366.e6. pmid:30795982
- 9. Gouma S, Kim K, Weirick ME, Gumina ME, Branche A, Topham DJ, et al. Middle-aged individuals may be in a perpetual state of H3N2 influenza virus susceptibility. Nat Commun. 2020;11(1):4566. pmid:32917903
- 10. Brouwer AF, Balmaseda A, Gresh L, Patel M, Ojeda S, Schiller AJ, et al. Birth cohort relative to an influenza A virus’s antigenic cluster introduction drives patterns of children’s antibody titers. PLoS Pathog. 2022;18(2):e1010317. pmid:35192673
- 11. Kim K, Vieira MC, Gouma S, Weirick ME, Hensley SE, Cobey S. Measures of population immunity can predict the dominant clade of influenza A (H3N2) in the 2017-2018 season and reveal age-associated differences in susceptibility and antibody-binding specificity. Influenza Other Respir Viruses. 2024;18(11):e70033. pmid:39501522
- 12. Xie H, Wan XF, Ye Z, Plant EP, Zhao Y, Xu Y, et al. H3N2 mismatch of 2014–15 Northern Hemisphere influenza vaccines and head-to-head comparison between human and ferret antisera derived antigenic maps. Sci Rep. 2015;5:15279.
- 13. Morris DH, Gostic KM, Pompei S, Bedford T, Łuksza M, Neher RA, et al. Predictive modeling of influenza shows the promise of applied evolutionary biology. Trends Microbiol. 2018;26(2):102–18. pmid:29097090
- 14. Zhao X, Fang VJ, Ohmit SE, Monto AS, Cook AR, Cowling BJ. Quantifying protection against influenza virus infection measured by hemagglutination-inhibition assays in vaccine trials. Epidemiology. 2016;27(1):143–51. pmid:26427723
- 15. Cowling BJ, Lim WW, Perera RAPM, Fang VJ, Leung GM, Peiris JSM, et al. Influenza hemagglutination-inhibition antibody titer as a mediator of vaccine-induced protection for influenza B. Clin Infect Dis. 2019;68(10):1713–7. pmid:30202873
- 16. Krammer F. The human antibody response to influenza A virus infection and vaccination. Nat Rev Immunol. 2019;19(6):383–97. pmid:30837674
- 17. Harvey WT, Benton DJ, Gregory V, Hall JP, Daniels RS, Bedford T, et al. Identification of low- and high-impact hemagglutinin amino acid substitutions that drive antigenic drift of influenza A(H1N1) viruses. PLoS Pathog. 2016;12:e1005526.
- 18. Harvey WT, Davies V, Daniels RS, Whittaker L, Gregory V, Hay AJ, et al. A Bayesian approach to incorporate structural data into the mapping of genotype to antigenic phenotype of influenza A(H3N2) viruses. PLoS Comput Biol. 2023;19(3):e1010885. pmid:36972311
- 19. Lee J, Hadfield J, Black A, Sibley TR, Neher RA, Bedford T, et al. Joint visualization of seasonal influenza serology and phylogeny to inform vaccine composition. Front Bioinform. 2023;3:1069487. pmid:37035035
- 20. Kucharski AJ, Lessler J, Read JM, Zhu H, Jiang CQ, Guan Y, et al. Estimating the life course of influenza A(H3N2) antibody responses from cross-sectional data. PLoS Biol. 2015;13(3):e1002082. pmid:25734701
- 21. Kucharski AJ, Lessler J, Cummings DAT, Riley S. Timescales of influenza A/H3N2 antibody dynamics. PLoS Biol. 2018;16(8):e2004974. pmid:30125272
- 22. Stacey H, Carlock MA, Allen JD, Hanley HB, Crotty S, Ross TM, et al. Leveraging pre-vaccination antibody titres across multiple influenza H3N2 variants to forecast the post-vaccination response. EBioMedicine. 2025;116:105744. pmid:40424667
- 23. Lapedes A, Farber R. The geometry of shape space: application to influenza. J Theor Biol. 2001;212(1):57–69. pmid:11527445
- 24. Smith DJ, Lapedes AS, de Jong JC, Bestebroer TM, Rimmelzwaan GF, Osterhaus ADME, et al. Mapping the antigenic and genetic evolution of influenza virus. Science. 2004;305(5682):371–6. pmid:15218094
- 25. Anderson CS, McCall PR, Stern HA, Yang H, Topham DJ. Antigenic cartography of H1N1 influenza viruses using sequence-based antigenic distance calculation. BMC Bioinform. 2018;19(1):51. pmid:29433425
- 26. Einav T, Cleary B. Extrapolating missing antibody-virus measurements across serological studies. Cell Syst. 2022;13(7):561-573.e5. pmid:35798005
- 27. Fonville JM, Wilks SH, James SL, Fox A, Ventresca M, Aban M, et al. Antibody landscapes after influenza virus infection or vaccination. Science. 2014;346(6212):996–1000. pmid:25414313
- 28. Ertesvåg NU, Cox RJ, Lartey SL, Mohn KG-I, Brokstad KA, Trieu M-C. Seasonal influenza vaccination expands hemagglutinin-specific antibody breadth to older and future A/H3N2 viruses. NPJ Vaccines. 2022;7(1):67. pmid:35750781
- 29. Hinojosa M, Shepard SS, Chung JR, King JP, McLean HQ, Flannery B, et al. Impact of immune priming, vaccination, and infection on influenza A(H3N2) antibody landscapes in children. J Infect Dis. 2021;224(3):469–80. pmid:33090202
- 30. Carlock MA, Allen JD, Hanley HB, Ross TM. Longitudinal assessment of human antibody binding to hemagglutinin elicited by split-inactivated influenza vaccination over six consecutive seasons. PLoS One. 2024;19(6):e0301157. pmid:38917104
- 31. Hay JA, Zhu H, Jiang CQ, Kwok KO, Shen R, Kucharski A, et al. Reconstructed influenza A/H3N2 infection histories reveal variation in incidence and antibody dynamics over the life course. PLoS Biol. 2024;22(11):e3002864. pmid:39509444
- 32. Wolf YI, Viboud C, Holmes EC, Koonin EV, Lipman DJ. Long intervals of stasis punctuated by bursts of positive selection in the seasonal evolution of influenza A virus. Biol Direct. 2006;1:34. pmid:17067369
- 33. Ndifon W, Dushoff J, Levin SA. On the use of hemagglutination-inhibition for influenza surveillance: surveillance data are predictive of influenza vaccine effectiveness. Vaccine. 2009;27(18):2447–52. pmid:19368786
- 34. Koel BF, Burke DF, Bestebroer TM, van der Vliet S, Zondag GCM, Vervaet G, et al. Substitutions near the receptor binding site determine major antigenic change during influenza virus evolution. Science. 2013;342(6161):976–9. pmid:24264991
- 35. Shih AC-C, Hsiao T-C, Ho M-S, Li W-H. Simultaneous amino acid substitutions at antigenic sites drive influenza A hemagglutinin evolution. Proc Natl Acad Sci U S A. 2007;104(15):6283–8. pmid:17395716
- 36. Welsh FC, Eguia RT, Lee JM, Haddox HK, Galloway J, Van Vinh Chau N, et al. Age-dependent heterogeneity in the antigenic effects of mutations to influenza hemagglutinin. Cell Host Microbe. 2024;32(8):1397-1411.e11. pmid:39032493
- 37. Radhakrishnan A, Beaglehole D, Pandit P, Belkin M. Mechanism for feature learning in neural networks and backpropagation-free machine learning models. Science. 2024;383(6690):1461–7. pmid:38452048
- 38. Einav T, Ma R. Using interpretable machine learning to extend heterogeneous antibody-virus datasets. Cell Rep Methods. 2023;3(8):100540. pmid:37671020
- 39. Lane A, Quach HQ, Ovsyannikova IG, Kennedy RB, Ross TM, Einav T. Characterizing the short- and long-term temporal dynamics of antibody responses to influenza vaccination. medRxiv [Preprint]. 2025. pmid:40061340