Comparison of sequence- and structure-based antibody clustering approaches on simulated repertoire sequencing data

Katharina Waury; Stefan Lelieveld; Sanne Abeln; Henk-Jan van den Ham

doi:10.1371/journal.pcbi.1013057

Peer Review History

Original Submission: June 25, 2024
25 Jun 2024 Author Response https://doi.org/10.1371/journal.pcbi.1013057.r001
12 Nov 2024 Decision Letter - Claude Loverdo, Editor

PCOMPBIOL-D-24-01065
Comparison of sequence- and structure-based antibody clustering approaches on simulated repertoire sequencing data
PLOS Computational Biology

Dear Dr. van den Ham,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days, by Jan 12 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.
* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission.
Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Claude Loverdo, Ph.D.
Academic Editor
PLOS Computational Biology

Amber Smith
Section Editor
PLOS Computational Biology

Feilim Mac Gabhann
Editor-in-Chief
PLOS Computational Biology

Jason Papin
Editor-in-Chief
PLOS Computational Biology

Journal Requirements:

Additional Editor Comments (if provided):

While all the reviewers had positive things to say about the manuscript, reviewer 3 raised substantial points that should be addressed.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:
Please note here if the review is uploaded as an attachment.

Reviewer #1: In this paper Waury et al. investigate the utility of antibody structural clustering methods, specifically SAAB+ and SPACE2, for their ability to group sequences that bind to the same epitope on a given antigen amidst a diverse simulated repertoire. They compare these two structural clustering methods to the traditional method of ‘clonotyping’, which relies on the CDRH3 sequence identity and V/J genes to group similar sequences. Structural methods instead group by shape of the paratope region and therefore can identify relationships which may not be evident from sequence alone. SAAB+ relies on homology modelling whereas SPACE2 clusters length-matched CDRs. The ability of these methods to group pairs of antibodies which bind the same epitope amidst a large artificial repertoire was analysed. No method was able to correctly group most of the epitope-specific antibody pairs. While clonotyping outperformed both structural methods, the authors demonstrated that structural clustering did group some pairs with highly dissimilar CDRH3 sequences. The limitations of each method are also assessed in detail.
Of particular interest were the observations that the CDR length matching applied by SPACE2 is very restrictive, as well as the ability to increase cluster size (through lower stringency) without overly compromising specificity. Overall, this paper is well thought out, figures are nicely presented, and the text is clearly written. I have only minor comments.

Minor comments

1. Line 199-201: Although the authors have tried to explain how pairs can be found in the clonotyping method that exceed the identity threshold, it is still not fully clear (I assume they mean that within a cluster the overall distance might be greater, but distance from the cluster centroid is still below the threshold). Please spell this out exactly.
2. Line 435-437: In generating the artificial dataset, does it matter that the authors have not used lambda chains and only focused on kappa? Please explain why.
3. Figure 2: Please state the statistical test used in the legend.
4. Line 116: Typo – should say clusters (plural).

Reviewer #2: The authors curated a dataset of antibody pairs confirmed to bind to the same epitope; they then placed these into a simulated naïve repertoire, clustered the simulated repertoire via 3 algorithms and looked at the sensitivity and specificity of these methods in the context of a naïve human background. In contrast to previous findings, they found that the sequence-based method had the highest sensitivity, and was able to identify a larger number of sequence-dissimilar/same-epitope pairs. The methodology is more precise and realistic than previous benchmarks, the claims are well-substantiated, and the dataset could be widely used.

Minor

- Line 26 – DOI in text.
- Line 93 – Capitalize Immune Epitope Database.
- Please cite the original SPACE paper.
- Line 58-59 – SPACE2 can work with any numbered antibody structure (predicted with ImmuneBuilder or otherwise), so this is not entirely accurate (although indeed this is recommended).
- Line 197 – Is this not just a consequence of hierarchical clustering?
- Figure 4 caption mentions specificity but I can’t see this plotted anywhere (presumably because it is perfect at all thresholds)? It says “specificity stays high” – I would just say “specificity remains 100%”.
- I disagree with the usage of “functional clustering” to refer to non-validated clusterings in the simulated data: there is no evidence of shared function in these large clusters.
- It is not clear to me whether having any RMSD cut-off at all improves specificity, or whether most of SPACE2’s predictive power is in CDR length matching. At the highest cut-off there is still perfect specificity. Please could you report sensitivity and specificity for the CDR length matching alone. Further to this, I haven’t seen a simple CDR sequence identity clustering compared to SPACE2. Please could you calculate the sensitivity and specificity using sequence identity instead of RMSD in the SPACE2 clustering algorithm? A threshold could be selected to produce a similar number of clusters to RMSD clustering, or you could consider a reasonable range such as 70% and above. This would be stronger evidence that SPACE2 works because of structure specifically.

Reviewer #3: The manuscript by Waury et al. compares two recent approaches to detect functionally related antibodies from B cell repertoire data, SAAB+ and SPACE2 (refs. 22 and 23). The benchmark highlights several interesting features of these methods and points out significant limitations in applying structure-based algorithms to real repertoires. A comparison of existing methods for detecting convergent evolution is important to researchers analyzing immune repertoires. However, the benchmark presented in this study is limited in scope as it only considers a single synthetic repertoire with a trivial clonal structure.
Some of the conclusions, in particular regarding the comparison of structure-based methods with clonotyping, are poorly supported by evidence presented in the manuscript. I think two factors significantly diminish the methodological insight of this paper and its potential impact.

1. A single synthetic dataset was used for comparison and its parameters don't reflect the potential difficulties in detecting convergent evolution. These include varying degrees of mutation, repertoire size (depth of sampling), and, seemingly most important for the task at hand, a nontrivial clonotype structure. Antibody repertoires come in multiple shapes and sizes, and a meaningful comparison of clustering approaches must take that into account.
2. The comparison with clonotyping is done with a relatively old method, which suffers from low accuracy, as suggested by more recent studies (see the following references and comparisons to other methods there: Ralph and Matsen, 2016 and 2022; Nouri and Kleinstein, 2020; Lindenbaum et al., 2020; Spisak et al., 2024), as well as by the results presented here (IGX-Cluster merges sequences that are not clonally related and have >40% divergence in CDRH3; in larger datasets this leads to positive predictive value close to zero).

Links to references:

Ralph and Matsen, 2016, doi.org/10.1371/journal.pcbi.1005086
Nouri and Kleinstein, 2020, doi.org/10.1371/journal.pcbi.1007977
Lindenbaum et al., 2020, doi.org/10.1093/nar/gkaa1160
Ralph and Matsen, 2022, doi.org/10.1371/journal.pcbi.1010723
Spisak et al., 2024, doi.org/10.7554/eLife.86181

The paper is appropriately structured. Data and code are readily available and the methodology is clearly described in the manuscript. The clarity of the presentation could be improved by avoiding the use of jargon. A few concrete suggestions and comments are listed below.

1. Abstract: I'm afraid the phrase "multiple-occupancy clusters" will be confusing to the reader, and probably shouldn't be introduced in the abstract. I find the formulation in "author's summary" easier to follow.
2. The reference in line 26 is not formatted.
3. In line 27: "them" suggests "some of them".
4. In line 84: this sentence is hard to parse.
5. In line 95: the 75% threshold is not justified.
6. When discussing sequence identity, it should be stated whether this refers to amino acid or nucleotide sequence.
7. In line 133: suggest avoiding "spiked".
8. In line 149: The sentence "Antibodies (...)" is not grammatical.
9. In line 167: One cannot infer that any of the methods is highly specific from the observation that no two curated antibodies were incorrectly assigned to the same cluster. More generally, in discussing the performance of the methods, it's not specificity but positive predictive value, or precision, that's (1) a relevant measure of accuracy and (2) difficult to achieve in clonotyping.
10. In line 184: suggests rather "identified by"?

********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

******

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Charlotte Deane
Reviewer #2: No
Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

Figure resubmission: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions.

Reproducibility: To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future.
Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

https://doi.org/10.1371/journal.pcbi.1013057.r002
Revision 1
21 Feb 2025 Author Response (attachment: Response_to_reviewers_PLOSCB.pdf) https://doi.org/10.1371/journal.pcbi.1013057.r003
17 Apr 2025 Decision Letter - Claude Loverdo, Editor

Dear van den Ham,

We are pleased to inform you that your manuscript 'Comparison of sequence- and structure-based antibody clustering approaches on simulated repertoire sequencing data' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Claude Loverdo, Ph.D.
Academic Editor
PLOS Computational Biology

Amber Smith
Section Editor
PLOS Computational Biology

******

Reviewer's Responses to Questions

Comments to the Authors:
Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have addressed all of the points in their response.

Reviewer #2: All of my comments were addressed well. I congratulate the authors for their valuable contribution to this field.
Reviewer #3: The response by the authors is lengthy but not consequential; the manuscript hasn't gone through a major revision. As one example, my comment about using PPV rather than specificity to quantify the methods' accuracy has led to rephrasing of one or two sentences but not a change in data analysis. The actual statistic used in the figures is "random clustering rate," and (as far as I can tell from its confusing definition on page 16) it's neither specificity nor PPV (and certainly not both).

The main points of my critique have not been addressed. The analysis relies on a single synthetic dataset. No evidence is presented on how closely it matches realistic repertoires beyond the PCA plot in the supplementary figure. This plot is uninformative because (1) the simulated dataset was supposed to resemble a typical repertoire and not the curated dataset of antibody pairs, and (2) it's not clear what characteristics of the datasets are picked up by the two principal components. As a consequence, a number of claims made by the authors remain poorly supported.

******

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: None

******

Do you want your identity to be public for this peer review?

Reviewer #1: Yes: Charlotte Deane
Reviewer #2: Yes: Eve Richardson
Reviewer #3: No

https://doi.org/10.1371/journal.pcbi.1013057.r004
Formally Accepted
Acceptance Letter - Claude Loverdo, Editor

PCOMPBIOL-D-24-01065R1
Comparison of sequence- and structure-based antibody clustering approaches on simulated repertoire sequencing data

Dear Dr van den Ham,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Judit Kozma
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

https://doi.org/10.1371/journal.pcbi.1013057.r005

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.