Peer Review History

Original Submission: June 16, 2025
Decision Letter - Tyler Cassidy, Editor

CAPYBARA: A Generalizable Framework for Predicting Serological Measurements Across Human Cohorts

PLOS Computational Biology

Dear Dr. Einav,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days (by Nov 23, 2025, 11:59 PM). If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Tyler Cassidy

Academic Editor

PLOS Computational Biology

James Faeder

Section Editor

PLOS Computational Biology

Additional Editor Comments (if provided):

In general, the reviewers appreciated this manuscript. However, they raised important concerns that must be addressed before publication (specifically, both major points from Reviewer 3 and the final two points from Reviewer 2). Further, as mentioned by Reviewers 1 and 4, the manuscript may benefit from a discussion of how this approach may apply to vaccine design.

Journal Requirements:

If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

1) Please ensure that the CRediT author contributions listed for every co-author are completed accurately and in full.

At this stage, the following authors require contributions: Daniel Beaglehole, Sierra Orsinelli-Rivers, and Tal Einav. Please ensure that the full contributions of each author are acknowledged in the "Add/Edit/Remove Authors" section of our submission form.

The list of CRediT author contributions may be found here: https://journals.plos.org/ploscompbiol/s/authorship#loc-author-contributions

2) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type 'LaTeX Source File' and leave your .pdf version as the item type 'Manuscript'.

3) Please provide an Author Summary. This should appear in your manuscript between the Abstract (if applicable) and the Introduction, and should be 150-200 words long. The aim should be to make your findings accessible to a wide audience that includes both scientists and non-scientists. Sample summaries can be found on our website under Submission Guidelines:

https://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-parts-of-a-submission

4) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines:

https://journals.plos.org/ploscompbiol/s/figures

5) We notice that your supplementary Figures are included in the manuscript file. Please remove them and upload them with the file type 'Supporting Information'. Please ensure that each Supporting Information file has a legend listed in the manuscript after the references list.

6) Please amend your detailed Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published.

1) State the initials, alongside each funding source, of each author to receive each grant. For example: "This work was supported by the National Institutes of Health (####### to AM; ###### to CJ) and the National Science Foundation (###### to AM)."

2) State what role the funders took in the study. If the funders had no role in your study, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

3) If any authors received a salary from any of your funders, please state which authors and which funders.

If you did not receive any funding for this study, please simply state: "The authors received no specific funding for this work."

7) Please send a completed 'Competing Interests' statement, including any COIs declared by your co-authors. If you have no competing interests to declare, please state "The authors have declared that no competing interests exist". Otherwise please declare all competing interests beginning with the statement "I have read the journal's policy and the authors of this manuscript have the following competing interests"

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: CAPYBARA: A Generalizable Framework for Predicting Serological Measurements Across Human Cohorts

This manuscript develops an algorithm to predict HAI titers to a large number of variants using serological measurements of only a few variants, by exploiting cross-reactivity information from historical data sets. A significant limitation is that the variant of interest must already exist in the historical data set and that at least some variants must be tested in the current study — unlike ab initio models based on viral sequence, such as in Loes et al (Ref #6). The CAPYBARA algorithm could potentially be useful in reducing cost and labor, and for understanding the landscape of cross-reactivity, but the latter is not fully explored. Specific comments and questions:

• The algorithm is rather ad-hoc in nature, combining approaches from ML (RFM), frequentist statistics (ridge regression), and Bayesian statistics (study weighting). Depending on the reader’s preference for methodological consistency, this is either a trivial or serious issue.

• I could not get the GitHub notebook to work directly — for example, the torchmetrics package is not specified in requirements.txt, so the import statement failed. Even after fixing that, other import errors arose. This was done in a virtual environment with Python 3.10 as suggested, so environment conflicts are unlikely to be the issue.

• It does not appear that batch correction or “normalization” methods were applied to the data sets from different studies. Can you explain why this is not feasible or necessary or useful?

• The term “feature” is ubiquitous but not very well-defined. It appears to refer to the HAI titer of a variant, but this is easily missed by the reader. Can you provide a short description of the features used to help the reader?

• Since other features like “age” and “year of study” influence prediction accuracy, why not include these as additional features in CAPYBARA?

• Can new vaccine/infection studies with HAI data be easily added to CAPYBARA using Bayesian updating, or must the algorithm be re-trained?

• Can you provide a clearer explanation of what CAPYBARA revealed about the cross-reactivity landscape? For example, does it provide useful guidance for designing the next seasonal or universal influenza vaccine?
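The dependency issue raised in Reviewer #1's second bullet can be surfaced mechanically before running a notebook by test-importing each package it relies on. A minimal sketch (the package names are illustrative; torchmetrics stands in for the dependency the review found missing, and this is not a script from the CAPYBARA repository):

```shell
# Test-import each package the notebook needs; any failure indicates a
# dependency missing from the environment (and possibly from requirements.txt).
missing=""
for pkg in numpy torchmetrics; do
  python3 -c "import $pkg" 2>/dev/null || missing="$missing $pkg"
done
if [ -n "$missing" ]; then
  echo "missing packages:$missing -- add them to requirements.txt"
else
  echo "all imports resolved"
fi
```

Running such a check in a fresh virtual environment (as the reviewer did with Python 3.10) distinguishes an incomplete requirements.txt from a local environment conflict.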

Reviewer #2: Please see attached comments

Reviewer #3: The manuscript by Orsinelli-Rivers et al. describes the development of an analytical workflow for modeling and mapping the relationships among different influenza virus serological datasets. I commend the authors for tackling this important problem and for providing an initial framework to infer influenza virus cross-reactivity from existing data. However, an important issue not addressed in the study is the accuracy of the model’s predictive power in relation to antigenic changes. Notably, one of the main conclusions is that predictions are accurate when datasets are derived from studies conducted less than 5 years apart. This inherently suggests that serological assays performed on antigenically similar viruses, typically those within a 5-year window, are more likely to yield more accurate cross-reactivity predictions. Another important point not addressed in the study is the generalizability of this analytical workflow, despite this being emphasized in both the abstract and the title of the manuscript. These and other issues, included below, should be addressed before the manuscript can be considered for further evaluation.

Major comments:

1. While the analytical workflow is useful for predicting serological cross-reactivity among H3N2 influenza viruses, it has a major limitation regarding the biological interpretation of antigenic differences between the viruses assessed. As shown in Figure 4, the accuracy of predictions was influenced by the time elapsed between the datasets used for analysis and the variants being evaluated—viruses within a five-year window were more likely to yield accurate cross-reactivity predictions. This likely reflects lower antigenic divergence within such a timeframe; in other words, H3 hemagglutinins (HAs) within five years of each other are generally more antigenically similar, with this similarity declining beyond that period. This trend is clearly visible in Figure 5 and Supplementary Figure S5. However, there are notable exceptions—some viruses that are many years apart still produced accurate predictions. This discrepancy is not addressed in the study. It would be helpful for the authors to explore this further, possibly by analyzing HA sequences to identify amino acid residues that may underlie cross-reactivity between antigenically distant viruses. An amino acid sequence alignment could provide insights into this issue. While it is understandable that the primary focus of the manuscript is the development of the analytical workflow, the biological interpretation of the results should not be overlooked. Including such analysis in future iterations of the tool could improve its utility and generalizability. This is particularly important given the history of antigenic shifts in influenza viruses—for example, the 2009 H1N1 pandemic virus exhibited close antigenic similarity to strains that circulated prior to the 1968 H3N2 pandemic, highlighting the complexity of antigenic evolution across time.

2. While the authors mention that this analytical tool is generalizable, they did not provide evidence that the workflow works effectively for other influenza subtypes (e.g. H1N1) or other viruses. Therefore, it would be more appropriate to avoid making this assumption in the manuscript. I would also recommend a slight revision of the title to better reflect the specific scope and findings of the study: “CAPYBARA: A Framework for Predicting H3N2 Influenza Serological Measurements Across Human Cohorts”.

Minor Comments:

1. In the introduction, the authors should indicate that the study was performed only with H3N2 viruses.

2. The second paragraph of the introduction indicates “While thousands of new variants (or strains) emerge each year, …”. Please provide a reference for this statement.

3. The fifth paragraph of the introduction, which starts with “A key innovation…”, provides discussion points about the workflow, and therefore should be moved to the discussion section.

4. I am a bit confused by the last sentence of the introduction, as the authors did not specifically quantify differences among study populations, experimental conditions, and virus panels. Did they mean to say that, despite differences among study populations, experimental conditions, and virus panels, the analysis workflow allows estimation of cross-reactivity among different influenza virus serological datasets? I suggest revising this sentence. Also edit the beginning of the sentence to read: “By predicting these interactions for H3N2 viruses, …”

5. In the second paragraph, line 2 of the discussion, did the authors mean to say “…cross-reactivity across influenza variants.”?

6. In the fourth paragraph of the discussion, did the authors mean to say “…didn’t require”?

Reviewer #4: I find this work interesting and I think the manuscript is for the most part well-written.

There is a sentence on page 8 of the manuscript, "Prediction accuracy can only decrease..." Please clarify this statement; it is vague to say that adding more datasets decreases accuracy when the number of datasets is not restricted (it lists studies {S1, S2, ...}). Perhaps this is intentionally left vague because an exact number of datasets to stop at cannot easily be stated; even so, I believe the statement can be made clearer. Later in the paper, very noisy datasets are said to have little impact, and it is also stated that all datasets can be included, which seems to contradict the statement that adding more datasets will decrease accuracy. Hence, some clarification of these statements would benefit the reader.

On page 16 of the manuscript, "The algorithm did require..." It seems you meant to write didn't or did not.

I would be interested in seeing this methodology utilized in other studies for vaccine development. I think with some minor writing revisions this is an acceptable paper.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code — e.g. participant privacy or use of data from a third party — those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: None

Reviewer #4: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Reviewer #4: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

Figure resubmission:

After uploading your figures to PLOS’s NAAS tool (https://ngplosjournals.pagemajik.ai/artanalysis), NAAS will process the files provided and display the results in the "Uploaded Files" section of the page once processing is complete. If the uploaded figures meet our requirements (or NAAS is able to fix the files to meet our requirements), the figure will be marked as "fixed" above. If NAAS is unable to fix the files, a red "failed" label will appear above. When NAAS has confirmed that the figure files meet our requirements, please download the files via the download option and include these NAAS-processed figure files when submitting your revised manuscript.

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Attachments
Attachment
Submitted filename: PLOS_CB_Review___CAPYBARA.pdf
Revision 1

Attachments
Attachment
Submitted filename: Response to Reviewers.docx
Decision Letter - Tyler Cassidy, Editor

Dear Dr. Einav,

We are pleased to inform you that your manuscript 'CAPYBARA: A Generalizable Framework for Predicting Serological Measurements Across Human Cohorts' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Tyler Cassidy

Academic Editor

PLOS Computational Biology

James Faeder

Section Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I have reviewed the revised manuscript. The authors have done a thorough job addressing reviewer comments and the GitHub code is now executable. I am happy to recommend Acceptance of the manuscript.

Reviewer #2: Please see my attachment.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code — e.g. participant privacy or use of data from a third party — those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Attachments
Attachment
Submitted filename: PLOS_CB_Review___CAPYBARA_2.pdf
Formally Accepted
Acceptance Letter - Tyler Cassidy, Editor

PCOMPBIOL-D-25-01204R1

CAPYBARA: A Generalizable Framework for Predicting Serological Measurements Across Human Cohorts

Dear Dr Einav,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

For Research, Software, and Methods articles, you will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Anita Estes

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.