Modeling and predicting the overlap of B- and T-cell receptor repertoires in healthy and SARS-CoV-2 infected individuals

María Ruiz Ortega; Natanael Spisak; Thierry Mora; Aleksandra M. Walczak

doi:10.1371/journal.pgen.1010652

Peer Review History

Original Submission: October 17, 2022
30 Nov 2022 Decision Letter - Scott M. Williams, Editor, Mark J Cameron, Editor

Dear Dr Walczak,

Thank you very much for submitting your Research Article entitled 'Modeling and predicting the overlap of B- and T-cell receptor repertoires in healthy and SARS-CoV-2 infected individuals' to PLOS Genetics. The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the impact of your work, but raised substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review a much-revised version. We cannot, of course, promise publication at that time.

Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Mark J Cameron, PhD
Guest Editor
PLOS Genetics

Scott Williams
Section Editor
PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:
Please note here if the review is uploaded as an attachment.

Reviewer #1: In this work, Ortega and colleagues analyse overlap of TCR/BCR repertoires among S-CoV-2 neg and pos individuals using statistical models. I don't have many comments.
The main comment is: how does this work conceptually differ from previous works of the research group? The research group has recently published on both Covid repertoires as well as statistical models for generation and sharing of immune receptors. What exactly is the novelty in this work as compared to the group's previous research as well as to other works in this space? Can this be made more clear? For example, this sentence from the abstract, "Yet many of these public receptors are shared by chance. We present a statistical approach, defined in terms of a probabilistic V(D)J recombination model enhanced by a selection factor, that describes repertoire diversity and predicts with high accuracy the spectrum of repertoire overlap in healthy individuals.", sounds very similar to previous manuscripts from this group. Please reformat the manuscript to isolate the advance made in this manuscript. Or, if there is no conceptual advance made here, rewrite to focus on the biological insight gained by the data analysis (but also there explain how it is novel compared to prior work in this space).

Minor: Figure 1 looks a bit disorganized and could benefit from streamlining.

Reviewer #2: In their manuscript “Modeling and predicting the overlap of B- and T-cell receptor repertoires in healthy and SARS-CoV-2 infected individuals” Dr. Ruiz Ortega et al. describe bioinformatic processes to investigate public B and T cell repertoires, an approach that was also applied to studies of immune repertoires identified in subjects infected with SARS-CoV-2. They provide a convincing analysis framework, but one that also, as other pipelines designed to interrogate big data, requires further assessment.

1. In several instances the focus of the study is entirely on CDR3. This is a common approach in particular in studies of T cell repertoires. CDR3 is certainly at the centre of determination of specificity but it operates in the context of other sequences, such as those encoded by the IGHV. The light chain is of course also a key feature of factors that determine specificity but, as the authors rightfully describe, it is currently not economically feasible to collect sufficient single cell data to carry out a study like this. However, the authors ought to take in particular sequences encoded by IGHV into account, for instance in studies of SARS-CoV-2-specific antibodies.
2. In some instances, the authors do take sequences encoded by the V gene into consideration. Annotation of such data is complicated by the fact that the TRBV and IGHV loci are highly variable in terms of structural variation, gene duplication and the existence of highly similar alleles in different gene locations. How does the analytical approach deal with these aspects, for instance but not restricted to genes like IGHV3-30/3-30-3/3-30-5/3-33?
3. The authors identify a set of non-productive sequences that are used for some comparisons. These reads might have an origin in non-productive transcripts but may also be a consequence of PCR or sequencing errors. How do the authors deal with these matters to ensure that the reads are truly non-functional in the B cell population and not technical artefacts?
4. Public clonotypes seem often to be associated with IGHJ6 (see for instance Figure 3D, Figure 4 and Suppl Fig 2A). This might be a technical artifact. The alleles of IGHJ6 add a particularly long stretch of residues into CDR3, thereby allowing for a higher similarity score in the bioinformatic pipeline that allows sequences to reach the identity cutoff. The authors must comment on this.
5. In Figure 4 the authors show CDR3s that differ in length. These differences might certainly be a consequence of insertion/deletion hypermutation events but are more likely a consequence of differential V-DJ or D-J splicing events that exploit similar D/J genes. The frequency of some of the length differences suggests that they do not represent insertion/deletion events. Differences in length of, for instance, clones 1, 5, 7, 9, 14, 16, 17, 18, 20, 21, 23, 24 are indicative of such artifacts. That puts the true clonal relationship of these groups into question. Are they really clonally related or just similar clones that happen to represent common sequences? It is not obvious to me if they were derived each from a single subject or derived from multiple individuals; please clarify.
6. Sequences that carry long IGHJ genes but also those that incorporate long IGHD might be better at passing thresholds of the analysis pipeline as N nucleotides represent a smaller fraction of the CDR3. Do the authors identify such aspects of the analysis (for instance, are incorporated D genes of public clones longer than those typically found in rearrangements, as such a finding identifies a limitation of the pipeline)?
7. In Figure 4 it is stated that light chain CDR3 conservation is even more remarkable than that of heavy chains. This is entirely expected as these CDR3 show much less diversity in general and depend only on the V-J rearrangement and a limited incorporation of N nucleotides. Human TRBV rearrangements use only a very limited set of D genes but more extensive V gene trimming (as compared to IGHV). How does this affect analysis?
8. CDR3 regions similar to those of SARS-CoV-2 specific clones in an antibody database have been identified (page 7/32). Do these sequences also share a V gene? Even if CDR3 is an important part of generation of the specificity, it is commonly dependent on other features of the antibody's sequence. We cannot in this context relate to the light chain but the authors would be able to relate to the V gene.
9. On page 7 (line 227) the authors state "2.7±1.5 sequences". What does it represent in relative terms?
10. In several instances, including in Figure 3 and in the text, the authors state that they observe a difference but that it is not statistically significant. It would be much better to say something like “the data was not sufficiently powered to with certainty identify a statistically significant difference between the groups”.
11. Are all data sets used full length or do they cover only a part of the IGHV and TRBV genes? If the latter is the case, this may affect gene assignment and clonal binning. Please comment how this may affect analysis and results.
12. Public antibodies have been identified in antibodies against the S-protein / RBD of SARS-CoV-2. Many of the public clones recognize early strains of the virus but not omicron. Were the present data sets generated from samples collected during the early phase of the pandemic or following the circulation of mutated viruses? If some data sets were collected at a point in time when other strains were dominant, this may impact the outcome of analysis. Please discuss how this relates to the results.
13. On line 455 the authors relate to a particular, common CDR3. This sequence has been generated using very little (if any) incorporation of a D gene and limited N nucleotide addition. I can myself identify this sequence in IgM data sets in association with different IGHV genes like IGHV1-2 and IGHV3-74. It is thus not unexpected that it is identified. It may however not relate to one but several clonotypes. Please discuss or delete this particular part of the discussion.
14. The authors may consider shortening the manuscript as it is quite extensive, e.g. but not limited to the paragraph starting on line 503.

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: Yes

https://doi.org/10.1371/journal.pgen.1010652.r001
Revision 1
15 Dec 2022 Author Response
Attachment (submitted filename): PLoS Genetics sharing paper response.pdf
https://doi.org/10.1371/journal.pgen.1010652.r002
2 Feb 2023 Decision Letter - Scott M. Williams, Editor, Mark J Cameron, Editor

Dear Dr Walczak,

We are pleased to inform you that your manuscript entitled "Modeling and predicting the overlap of B- and T-cell receptor repertoires in healthy and SARS-CoV-2 infected individuals" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Mark J Cameron, PhD
Guest Editor
PLOS Genetics

Scott Williams
Section Editor
PLOS Genetics

www.plosgenetics.org
Twitter: @PLOSGenetics

Comments from the reviewers (if applicable):

Reviewer's Responses to Questions

Comments to the Authors:
Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have addressed all my concerns.

Reviewer #2: Thank you for providing an extensively updated manuscript. I noted, though, that the authors decided not to try to reduce the manuscript in length, which is probably a disadvantage to the readers.

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: Yes

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re-enter its bibliographic information, and can upload your files directly: http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-22-01187R1

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

https://doi.org/10.1371/journal.pgen.1010652.r003
Formally Accepted
22 Feb 2023 Acceptance Letter - Scott M. Williams, Editor, Mark J Cameron, Editor

PGENETICS-D-22-01187R1
Modeling and predicting the overlap of B- and T-cell receptor repertoires in healthy and SARS-CoV-2 infected individuals

Dear Dr Walczak,

We are pleased to inform you that your manuscript entitled "Modeling and predicting the overlap of B- and T-cell receptor repertoires in healthy and SARS-CoV-2 infected individuals" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Anita Estes
PLOS Genetics

On behalf of:
The PLOS Genetics Team
Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
plosgenetics@plos.org | +44 (0) 1223-442823
plosgenetics.org | Twitter: @PLOSGenetics

https://doi.org/10.1371/journal.pgen.1010652.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.