scRepertoire 2: Enhanced and efficient toolkit for single-cell immune profiling

Qile Yang; Ksenia R. Safina; Kieu Diem Quynh Nguyen; Zewen Kelvin Tuong; Nicholas Borcherding

doi:10.1371/journal.pcbi.1012760

Peer Review History

Original Submission: December 30, 2024
30 Dec 2024 Author Response https://doi.org/10.1371/journal.pcbi.1012760.r001
Decision Letter - Amber M Smith, Editor, Pramod Shinde, Editor

PCOMPBIOL-D-24-02249

scRepertoire 2: Enhanced and Efficient Toolkit for Single-Cell Immune Profiling

PLOS Computational Biology

Dear Dr. Borcherding,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 30 days (by May 23, 2025, 11:59 PM). If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.
* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission.
Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Pramod Shinde
Guest Editor
PLOS Computational Biology

Amber Smith
Section Editor
PLOS Computational Biology

Additional Editor Comments:

The manuscript on scRepertoire 2 presents a significant update to the existing toolkit, with improved computational efficiency, expanded data format compatibility, and integration with machine learning frameworks. The new features and performance enhancements add value to single-cell immune repertoire analysis, and the erythema migrans case study effectively demonstrates the toolkit’s capabilities. However, the reviewers have raised several critical issues that need to be addressed to strengthen the manuscript and broaden its impact. They emphasized the need for more rigorous benchmarking against existing tools to provide a clearer comparative analysis of performance and functionality. The machine learning integration, while promising, would benefit from a more concrete example to show how it can generate meaningful insights. Reviewers also suggested clarifying statistical uncertainty in diversity analysis and expanding the results section to better illustrate how the toolkit enhances biological interpretation. Improvements in presentation and clarity, such as defining acronyms at first use, providing more detailed figure legends, strengthening references, and ensuring overall consistency, were also recommended. I suggest major revisions to address all the key points raised by the reviewers; please resubmit after thoroughly incorporating the feedback.

Journal Requirements:

1) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex.
If you are providing a .tex file, please upload it under the item type ‘LaTeX Source File’ and leave your .pdf version as the item type ‘Manuscript’.

2) Please provide an Author Summary. This should appear in your manuscript between the Abstract (if applicable) and the Introduction, and should be 150-200 words long. The aim should be to make your findings accessible to a wide audience that includes both scientists and non-scientists. Sample summaries can be found on our website under Submission Guidelines: https://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-parts-of-a-submission

3) We noticed that you used the phrase 'data not shown' in the manuscript. We do not allow these references, as the PLOS data access policy requires that all data be either published with the manuscript or made available in a publicly accessible database. Please amend the supplementary material to include the referenced data or remove the references.

4) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files, please see our guidelines: https://journals.plos.org/ploscompbiol/s/figures

5) Some material included in your submission may be copyrighted. According to PLOS’s copyright policy, authors who use figures or other material (e.g., graphics, clipart, maps) from another author or copyright holder must demonstrate or obtain permission to publish this material under the Creative Commons Attribution 4.0 International (CC BY 4.0) License used by PLOS journals. Please closely review the details of PLOS’s copyright requirements here: PLOS Licenses and Copyright. If you need to request permissions from a copyright holder, you may use PLOS's Copyright Content Permission form.
Please respond directly to this email and provide any known details concerning your material's license terms and permissions required for reuse, even if you have not yet obtained copyright permissions or are unsure of your material's copyright compatibility. Once you have responded and addressed all other outstanding technical requirements, you may resubmit your manuscript within Editorial Manager.

Potential Copyright Issues:

i) Figure 1. Please confirm whether you drew the images / clip-art within the figure panels by hand. If you did not draw the images, please provide (a) a link to the source of the images or icons and their license / terms of use; or (b) written permission from the copyright holder to publish the images or icons under our CC BY 4.0 license. Alternatively, you may replace the images with open source alternatives. See these open source resources you may use to replace images / clip-art:
- https://commons.wikimedia.org
- https://openclipart.org/

ii) The following Figure contains a logo or branding: 1. We are not permitted to publish this under our CC-BY 4.0 license, even with permission. We ask that you please remove or replace it.

Reviewers' comments:

Reviewer's Responses to Questions

Reviewer #1: Exploring a fresh version of scRepertoire is a significant and valuable contribution. It is necessary to improve and expand the evaluation and presentation of results and performance. The abstract is quite generic, so the authors could include summary statistics to assess the performance of their method and support their assertions in the abstract. Please define acronyms before using them. For instance, what does 'scAIRR-seq' mean? You should add multiple references to support your statements.
For instance, this sentence requires multiple references: "Many tools lack robust integration for immune receptor profiling with transcriptomic data or flexibility in data export formats, which hinders reproducibility and cross-platform compatibility in expanding datasets." It is necessary to create a table and compare the functionality of your package with those of the published packages, including scRepertoire v1. The aim of scAIRR-seq and Single Cell Immune Profiling is to provide comprehensive insights into the immune system at the single-cell level. However, their scope and integration of different data modalities have nuances. You need to discuss this in your article. Can your tool be used for Single Cell Immune Profiling? Can your tool help improve the antigen specificity information of scAIRR-seq? The method section focuses mainly on reviewing functions. It would be helpful if you could give more details about the methodology behind each function to help understand their functionality and as a guide for picking the appropriate one for downstream analysis. Additionally, it is important to highlight any new features in your package. Is it simply a collection of previously released tools, or do you include additional components to address the shortcomings in the field? The results section would be strengthened by presenting examples of how this new tool enhances biological interpretation. Can you provide a demonstration of biological insights that can be achieved using scRepertoire? I am unable to find a discussion section. Is it not necessary to include a discussion section?

Reviewer #2: The manuscript presents scRepertoire 2, a major update to an R-based toolkit for single-cell adaptive immune receptor repertoire sequencing (scAIRR) analysis. The package optimizes computational efficiency, expands data format compatibility, introduces novel repertoire analysis features, and integrates with deep learning frameworks.
A case study on erythema migrans lesion samples demonstrates its practical application, though validation across multiple disease contexts would strengthen the work.

Major Strengths
- Substantial performance gains via C++ integration, reducing computational complexity from quadratic to linear scaling
- Expanded data format support with automatic format detection, eliminating the need for file conversion
- Novel analytical tools (positional entropy, amino acid property analysis, k-mer distributions) enable deeper repertoire characterization
- Enhanced visualization modules and improved metrics for repertoire diversity analysis
- Seamless integration with machine learning frameworks supports predictive modeling in computational immunology
- The erythema migrans case study effectively demonstrates the package's capabilities

Major Weaknesses
- No benchmarking against competing tools to quantitatively demonstrate efficiency gains
- Limited validation based on a single dataset; additional disease models would improve generalizability
- Insufficient technical details on algorithmic implementations, particularly for novel features
- Lack of error case analysis: no discussion of potential failure scenarios
- Minimal validation of new metrics across diverse datasets

Recommendations
- Include benchmarking results comparing performance against tools like Immcantation, scirpy, and TRUST4
- Expand dataset validation using scAIRR data from cancer immunotherapy, autoimmune diseases, or vaccine studies
- Provide detailed algorithmic descriptions for novel features
- Discuss potential failure cases (e.g., low cell counts, dropout effects, misaligned assignments)
- Enhance comparison with alternative tools to contextualize unique contributions
- Include runtime efficiency comparisons on large-scale datasets
- Explore future directions such as spatial transcriptomics integration and multi-omics approaches

The manuscript makes a significant contribution to single-cell immunogenomics.
The performance improvements, expanded analytical capabilities, and deep learning integration are valuable for immune profiling research. However, addressing the benchmarking gaps, dataset limitations, and validation concerns would substantially strengthen the work.

Reviewer #3: In the manuscript of Yang et al., the authors present an updated version of the single-cell immune repertoire analysis tool, scRepertoire2, with a strong emphasis on integrating diverse single-cell data formats for upstream processing and incorporating various amino acid autoencoders in the downstream analyses. In the updated package, the enhancements in immune repertoire summarization, with the focus on amino acid composition and gene usage, have the potential to drive further developments in the field. Overall, the manuscript is well-constructed and I recommend acceptance of the manuscript after minor revisions.

1. The authors demonstrate the updated scRepertoire2 by analyzing transcriptomic and immune repertoire profiling data from erythema migrans lesions. The authors did a good job of detailing the new features. More quantitative benchmarking with other packages would help highlight the unique advantages of scRepertoire2.

2. One unique improvement of scRepertoire 2 is its compatibility with ML packages (like Trex, Ibex and ImmApex). Although the authors have included vignettes in the online tutorials, it would be helpful if the authors could include a more detailed example in the manuscript, demonstrating how scRepertoire 2 can integrate with those packages and what unique biological insights can be acquired.

3. Single-cell dataset volumes are increasing dramatically. For example, ParseBio has published several immune profiling datasets with over 1 million cells. Are there specific optimizations for large-scale datasets (like disk storage or memory management) implemented in this version? How compatible is scRepertoire2 with large-scale data (>1 million cells)?
The authors should provide information on scalability and performance with very large datasets.

4. In the future directions section, the authors should discuss alternative approaches currently addressing similar problems and how further development of scRepertoire2 would fit within this landscape.

5. Please ensure that all figures have clear legends and that all abbreviations are well defined.

Reviewer #4: The study by Yang et al. presents scRepertoire 2, a computational R package designed for immune repertoire analysis. The tool integrates single-cell RNA-seq and TCR/BCR sequencing data, enabling users to characterize immune clonotypes in their transcriptomic context. The key strengths of this work include:
- Enhanced usability & accessibility: The redesign introduces intuitive function names, improved documentation, and a pkgdown website to support users.
- Expanded data compatibility: Supports multiple sequencing pipelines (e.g., 10x Genomics, AIRR, MiXCR, TRUST4), improving adaptability.
- Performance optimizations: Efficient C++ integration reduces computational complexity, allowing linear scalability for key functions.
- Comprehensive repertoire summarization: Advanced features such as entropy analysis, positional amino acid property mapping, and k-mer analysis enhance functional interpretation.
- Machine learning (ML) applications: Integration with Trex, Ibex, and ImmApex expands predictive modeling capabilities, particularly for immunology-related deep learning applications.

Overall, scRepertoire 2 is a well-designed computational tool for immune repertoire analysis, introducing significant enhancements in data compatibility, performance, and machine learning applications. However, critical aspects such as comparative benchmarking, validation of functional predictions, batch effect correction, and scalability testing remain underexplored. Addressing these limitations will further strengthen its utility in high-throughput immunogenomics.
I have listed some concerns and suggested improvements below.

1. Design and Implementation / Workflow: The workflow efficiently integrates immune repertoire data, but it lacks explicit benchmarking against existing tools such as Immunarch, scIR, or TCRgrapher to demonstrate comparative performance improvements.
- Suggested Change: Include quantitative benchmarking (e.g., accuracy of clonal assignments, runtime comparisons).

2. Expanded Data Compatibility: The newly introduced `loadContigs()` function automatically detects formats, which is a strong feature. However, the details of how misclassified formats are handled are not described.
- Suggested Change: Provide error-handling mechanisms (e.g., warnings for ambiguous format detection).

3. Performance Optimizations: While C++ acceleration improves runtime efficiency, the scalability for large repertoires (>1M cells) is not explicitly tested.
- Suggested Change: Include real-world benchmarking for large single-cell datasets, preferably using public datasets from immune cell atlases.

4. Clonal Diversity Analysis: While rarefaction analysis is robust, it lacks a discussion of statistical variability (e.g., how confidence intervals are derived).
- Suggested Change: Report statistical uncertainties (e.g., confidence intervals, bootstrap variability).

5. ML Applications: The ML component is promising, but it lacks clarity on training data availability and whether models are trained on independent datasets.
- Suggested Change: Provide details on model generalizability, including cross-validation performance on independent cohorts.

********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file).
The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

******

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Isar Nassiri
Reviewer #2: No
Reviewer #3: Yes: Junyue Cao
Reviewer #4: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

Figure resubmission:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions.

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

https://doi.org/10.1371/journal.pcbi.1012760.r002
Revision 1
23 Apr 2025 Author Response
Attachment submitted filename: scRepertoire2_reviewer_comments.docx
https://doi.org/10.1371/journal.pcbi.1012760.r003
Decision Letter - Amber M Smith, Editor, Pramod Shinde, Editor

Dear Dr Borcherding,

We are pleased to inform you that your manuscript 'scRepertoire 2: Enhanced and Efficient Toolkit for Single-Cell Immune Profiling' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted, you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Pramod Shinde
Guest Editor
PLOS Computational Biology

Amber Smith
Section Editor
PLOS Computational Biology

*********************************************************

Thank you to the authors for thoroughly addressing the reviewers' comments. The revised manuscript satisfactorily resolves the concerns raised during peer review. I am pleased to recommend the manuscript for acceptance.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.
Reviewer #1: I appreciate the authors' diligent work on the revisions for the manuscript. Upon reviewing their responses and the revised manuscript, I am convinced that all my concerns have been addressed. I suggest that the manuscript is now suitable for publication.

Reviewer #2: The authors have addressed all the comments. The manuscript can be accepted.

Reviewer #3: The authors have addressed my concerns and suggestions. The addition of quantitative benchmarking, demonstration of ML packages, report on scalability, and the discussion of alternative approaches have significantly improved the manuscript. I am satisfied with the revision and recommend this manuscript for publication.

Reviewer #4: As the comments are fairly addressed, I would recommend the acceptance of the article for publication.

******

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

******

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: Yes: Ankush Bansal
Reviewer #3: No
Reviewer #4: Yes: Aquib Ehtram

https://doi.org/10.1371/journal.pcbi.1012760.r004
Formally Accepted
Acceptance Letter - Amber M Smith, Editor, Pramod Shinde, Editor

PCOMPBIOL-D-24-02249R1

scRepertoire 2: Enhanced and Efficient Toolkit for Single-Cell Immune Profiling

Dear Dr Borcherding,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Lilla Horvath
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

https://doi.org/10.1371/journal.pcbi.1012760.r005

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.