Multiplets in scRNA-seq data: Extent of the problem and efficacy of methods for removal

Dimitris Ttoouli; Daniel Hoffmann

doi:10.1371/journal.pone.0333687

Peer Review History

Original Submission: June 13, 2025
9 Jul 2025 Decision Letter - Nagarajan Raju, Editor

PONE-D-25-32117
Multiplets in scRNA-seq data: extent of the problem and efficacy of methods for removal
PLOS ONE

Dear Dr. Ttoouli,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Aug 23 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future.
For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Nagarajan Raju
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

2. Please note that PLOS One has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating the following financial disclosure: [This study was funded in part by grant HO 1582/12 from Deutsche Forschungsgemeinschaft.]. Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed.
Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. When completing the data availability statement of the submission form, you indicated that you will make your data available on acceptance. We strongly recommend all authors decide on a data sharing plan before acceptance, as the process can be lengthy and hold up publication timelines. Please note that, though access restrictions are acceptable now, your entire data will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If you are unable to adhere to our open data policy, please kindly revise your statement to explain your reasoning and we will seek the editor's input on an exemption. Please be assured that, once you have provided your new statement, the assessment of your exemption will not hold up the peer review process.

Additional Editor Comments:

I suggest that the authors go through the reviewer's comments and address them in the revised version of the manuscript.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Single-cell RNA sequencing (scRNA-seq) has revolutionized the field of transcriptomics, allowing for in-depth analysis of individual cells. However, multiplets (droplets containing more than one cell) are a known artifact that can significantly impact the accuracy and interpretation of scRNA-seq data. This study comprehensively evaluates the prevalence of multiplets across diverse datasets and assesses the effectiveness of commonly used detection tools. The authors utilize cell hashing as a benchmark to determine the true multiplet rate and refine a Poisson-based model to estimate multiplet frequencies.
Major:

1. Is the assessment of multiplet removal limited to specific sequencing technologies? What is the typical prevalence of multiplets across datasets? Please provide supplemental information about the datasets used (e.g., cell types, sequencing platforms, library preparation methods).

2. Can the "gold standard" dataset with a 26.13% multiplet rate be considered representative of typical scRNA-seq studies? Furthermore, do multiplets (capturing >2 cells) exhibit more pronounced distinguishing features compared to doublets (2 cells), and if so, how does this impact detection?

3. The observation that the unprocessed data ("No processing") does not consistently yield the worst clustering performance (Fig. 6) requires further investigation. Additionally, could the improved performance after multiplet removal be partially confounded by the reduced cell count? Please address how changes in dataset size post-removal might influence clustering metric comparisons.

4. The analysis of clustering results should be extended:
a) Does multiplet removal eliminate spurious clusters predominantly composed of multiplets?
b) Conversely, could it potentially hinder the discovery of rare, biologically relevant cell types or states?
c) Beyond clustering, what impact does multiplet contamination (and its removal) have on other critical downstream analyses (e.g., differential expression, trajectory inference, cell-cell communication)?

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

https://doi.org/10.1371/journal.pone.0333687.r001
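The reviewer's summary above refers to a Poisson-based model of multiplet frequencies, and comment 2 asks whether droplets with more than two cells behave differently from doublets. As a minimal Python sketch, assuming the textbook loading model in which the number of cells per droplet K follows a Poisson distribution with mean λ (`lam`), and not the refined model developed in the manuscript, the multiplet rate among cell-containing droplets is P(K ≥ 2) / P(K ≥ 1), and the share of multiplets holding three or more cells is P(K ≥ 3) / P(K ≥ 2). The function names and loading values are illustrative assumptions.

```python
import math


def poisson_tail(lam: float, k_min: int) -> float:
    """Return P(K >= k_min) for K ~ Poisson(lam), as 1 minus the CDF up to k_min - 1."""
    cdf = sum(lam**k * math.exp(-lam) / math.factorial(k) for k in range(k_min))
    return 1.0 - cdf


def multiplet_rate(lam: float) -> float:
    """Fraction of cell-containing droplets that hold two or more cells."""
    return poisson_tail(lam, 2) / poisson_tail(lam, 1)


def higher_order_share(lam: float) -> float:
    """Among multiplets, the fraction that hold three or more cells."""
    return poisson_tail(lam, 3) / poisson_tail(lam, 2)


if __name__ == "__main__":
    # Illustrative mean loadings (cells per droplet), not values from the study.
    for lam in (0.05, 0.2, 0.6):
        print(
            f"lambda={lam:.2f}  multiplet rate={multiplet_rate(lam):.3f}  "
            f"share with >=3 cells={higher_order_share(lam):.3f}"
        )
```

In this simple model both quantities increase with λ, so higher-order multiplets remain a small minority at low loading but become more common as loading rises.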
Revision 1
25 Aug 2025 Author Response

Dear Editor, dear Reviewer,

We thank you both for the constructive comments on our manuscript. We have carefully revised the text and figures in response. Below we provide a point-by-point response to Reviewer #1’s comments, as well as a marked-up version of the manuscript highlighting changes relative to the first version. Line numbers given in our responses below refer to the consolidated revised manuscript, not the marked-up version.

Reviewer #1 – Major Comments

1. “Is the assessment of multiplet removal limited to specific sequencing technologies? What is the typical prevalence of multiplets across datasets? Please provide supplemental information about the datasets used (e.g., cell types, sequencing platforms, library preparation methods).”

We have revised the Introduction (page: 2, lines: 12-14, 18-21) to clarify that multiplets are a universal phenomenon in single-cell technologies, though particularly prominent in droplet-based methods, and we report typical multiplet prevalences in general. In the Datasets section of Methods (page: 3, lines: 83-85), we have rephrased the text to be explicit about the sequencing technology and platform. The dataset-specific information was already provided in the same section (page: 3-4, lines: 86-116), but for clarity we have also created a new Supplemental Table S1 summarizing source, tissue and number of samples, library preparation chemistry, multiplexing method, droplet counts, observed multiplet counts, and resulting multiplet rate for all datasets analyzed for this work. We now explicitly state in the Limitations section of the Discussion (page: 16, lines: 608-611) that our analysis is restricted to droplet-based scRNA-seq using 10x Genomics chemistries (3′ v2, 3′ v3, and 5′ Immune Profiling v1) and may not generalize to other platforms.

2. “Can the ‘gold standard’ dataset with a 26.13% multiplet rate be considered representative of typical scRNA-seq studies?
Furthermore, do multiplets (>2 cells) exhibit more pronounced distinguishing features compared to doublets, and if so, how does this impact detection?”

The gold-standard dataset was chosen not because it is representative of the average scRNA-seq experiment but because it uniquely combines multimodal data with cell hashing, which provides additional information about multiplets. We added a clarification in the Discussion (page: 15, lines: 539-546) noting that ~26% is not representative of older studies but reflects current high-throughput droplet practice, and that this rate falls within our observed range across all datasets (16–37%; Fig. 3). Regarding the comparison between doublets and multiplets, we have clarified this point in the Limitations section of the Discussion (page: 16, lines: 600-608). Specifically, we acknowledge that while Poisson statistics predict higher-order multiplets should be rare, their prevalence may be somewhat elevated under the intentionally high cell-loading conditions typical of cell-hashing experiments. Such higher-order multiplets are more transcriptionally heterogeneous and potentially more disruptive, which in principle makes them easier to detect. However, most computational tools are optimized for doublet detection, whereas cell hashing is agnostic to multiplet order.

3. “The observation that unprocessed data does not consistently yield the worst clustering performance requires further investigation. Could the improved performance after multiplet removal be partially confounded by the reduced cell count?”

We thank the reviewer for bringing this important point to our attention. We agree that dataset size differences could indeed confound clustering quality comparisons. To address this, we performed stratified subsampling prior to metric calculation, fixing all datasets to a common cell count (just below the smallest dataset size).
We repeated subsampling 100 times per dataset and method combination and calculated clustering metrics on the resulting subsets. The error bars (IQR) were very small, and from the figure alone one could not judge whether they were large enough to change the best-to-worst ranking of methods; for completeness, we therefore placed the revised version of Figure 6 with error bars in the Supporting Information as Figure S5 (the previous Figure S5 is now Figure S6, following order of appearance). We then summarized performance by computing average ranks across all three metrics, as well as separately for the Silhouette index, which is commonly used to evaluate clustering quality and the optimal number of clusters in scRNA-seq. This analysis confirmed that the “No processing” condition consistently performs worst, across datasets and across clustering metrics. Besides the new figure, the new findings are reflected in the manuscript in a new “Clustering quality assessment” section in Methods (page: 7, lines: 214-234) and a new Results section detailing these findings in the new Figure 6 (pages: 10-11, lines: 369-400).

4. “The analysis of clustering results should be extended: a) Does multiplet removal eliminate spurious clusters predominantly composed of multiplets?”

We have addressed this point directly in the manuscript. In the gold-standard dataset, only two pre-removal clusters (10 and 13) contained a majority of multiplets (64.5% and 58% cell-hashing multiplets, respectively). After removal, however, these clusters were not eliminated but redistributed and merged with adjacent, closely related clusters. In Fig. 10 we indicate which cluster pairs fused into single new clusters after multiplet removal, and in Fig. 9 we show the bubble tree demonstrating that these clusters are closely related to each other and that, with the exception of the two aforementioned clusters, no cluster comprises more than 50% cell-hashing multiplets.
Silhouette values improved after multiplet removal, indicating increased cluster cohesion rather than wholesale elimination of cell types. We have reworded this conclusion in the Discussion for clarity (page: 15-16, lines: 564-571). Uncertainty remains for very small clusters that could consist of undetected multiplets, which we note as a caveat in the Results (page: 12, lines: 445-448).

b) “Conversely, could it hinder the discovery of rare, biologically relevant cell types or states?”

As noted in the Results (page: 12, lines: 443-445), small clusters such as 11 and 13 in the gold-standard dataset were called almost entirely as multiplets by DoubletFinder and DoubletDetection, despite having low cell-hashing multiplet counts. This overcalling illustrates exactly the risk the reviewer points out: false-positive multiplet calls can lead to removal of rare or biologically relevant populations. We therefore added text to the Discussion (page: 16, lines: 75-585) emphasizing that both false negatives and false positives in multiplet detection can mislead downstream analysis.

c) “Beyond clustering, what impact does multiplet contamination (and its removal) have on other critical downstream analyses (e.g., differential expression, trajectory inference, cell-cell communication)?”

We agree that multiplet contamination can in principle affect multiple stages of downstream analysis. One of the most basic downstream analyses is differential gene expression (DGE) between clusters, which can be meaningfully interpreted irrespective of the nature of the dataset. We therefore selected clusters that are strongly affected by multiplet removal and performed DGE. Specifically, we compared two CD14 monocyte clusters before and after multiplet removal (new Figure 11). One of these clusters loses many droplets after multiplet removal, and the other is one of the small, disappearing clusters.
The pair was chosen to test whether the droplets remaining in the disappearing cluster would display a signal more representative of their cell-type annotation; this was not the case. While overall log fold-change estimates were highly correlated between conditions (Spearman ρ = 0.96), we observed notable changes in the top differentially expressed genes: several NK cell–associated markers lost statistical significance after multiplet removal, though they remained among the strongest residual signals. The analysis also yielded a considerable number of significant DEGs, and the sets of DEGs before and after multiplet removal were not identical (Jaccard index = 0.72). We highlighted the canonical CD14 monocyte markers that were also significant DEGs; unlike the NK cell markers mentioned above, these were not affected by multiplet removal. This highlights how multiplets can introduce misleading transcriptional signatures into DGE results and potentially confound biological interpretation. We added a new Methods subsection (page: 7-8, lines: 262-281), a new paragraph in Results (page: 14, lines: 506-530), the new Figure 11, and a brief reference in the Discussion (page: 16, lines: 580-585) to reflect the findings of this additional analysis.

We did not pursue trajectory inference or cell–cell communication analysis. While these are valuable downstream applications, they are not uniformly applied across scRNA-seq studies and are relevant only for tissues in which differentiation trajectories and/or cell–cell communication are the focus of the analysis (which was not the case for our datasets). In contrast, multiplet removal, principal component analysis, clustering, cell-type annotation, and DGE are core analyses performed in nearly every scRNA-seq workflow.

We believe these revisions substantially strengthen the manuscript and directly address the reviewer’s concerns.
We thank the editor and the reviewer again for their insightful feedback and hope the revised version will now be suitable for publication.

Sincerely,
Dimitris Ttoouli and Daniel Hoffmann

Attachment: Response to Reviewers.docx

https://doi.org/10.1371/journal.pone.0333687.r002
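The size-matched comparison described in the response to comment 3 above (stratified subsampling to a common cell count, 100 repetitions per dataset and method combination, then average ranks across metrics) can be illustrated with the minimal Python sketch below. It is not the authors' code, and several details are assumptions: strata are taken to be cluster labels, repeats are summarized by their median (the response mentions IQR error bars but not the central statistic), tied methods share the mean rank, and the metric callback, method names, and metric names are hypothetical placeholders. Only the Silhouette index is named in the response; the other two metrics are not.

```python
import random
import statistics
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def stratified_subsample(labels: Sequence[str], n: int, rng: random.Random) -> List[int]:
    """Draw n cell indices without replacement, keeping cluster proportions."""
    by_cluster: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_cluster[label].append(i)
    total = len(labels)
    # Proportional quotas, rounded down; leftover slots go to the largest remainders.
    quotas = {c: n * len(idx) / total for c, idx in by_cluster.items()}
    alloc = {c: int(q) for c, q in quotas.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    picked: List[int] = []
    for c, idx in by_cluster.items():
        picked.extend(rng.sample(idx, alloc[c]))
    return sorted(picked)


def median_over_subsamples(
    labels: Sequence[str],
    metric: Callable[[List[int]], float],
    n: int,
    repeats: int = 100,
    seed: int = 0,
) -> float:
    """Median of a clustering metric evaluated on repeated size-matched subsamples."""
    rng = random.Random(seed)
    values = [metric(stratified_subsample(labels, n, rng)) for _ in range(repeats)]
    return statistics.median(values)


def average_ranks(
    scores: Dict[str, Dict[str, float]], higher_is_better: Dict[str, bool]
) -> Dict[str, float]:
    """Rank methods per metric (1 = best, ties share the mean rank), then average."""
    methods = list(scores)
    totals = {m: 0.0 for m in methods}
    for metric, high_good in higher_is_better.items():
        ordered = sorted(methods, key=lambda m: scores[m][metric], reverse=high_good)
        i = 0
        while i < len(ordered):
            j = i
            while j + 1 < len(ordered) and scores[ordered[j + 1]][metric] == scores[ordered[i]][metric]:
                j += 1
            tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
            for m in ordered[i : j + 1]:
                totals[m] += tied_rank
            i = j + 1
    return {m: totals[m] / len(higher_is_better) for m in methods}


if __name__ == "__main__":
    # Toy, made-up scores; method and metric names are hypothetical placeholders.
    toy = {
        "No processing": {"silhouette": 0.21, "metric_b": 1.9},
        "Method A": {"silhouette": 0.27, "metric_b": 1.4},
        "Method B": {"silhouette": 0.25, "metric_b": 1.4},
    }
    print(average_ranks(toy, {"silhouette": True, "metric_b": False}))
```

For a real Silhouette computation, the metric callback could close over an embedding and cluster labels (for example via scikit-learn's `silhouette_score`); that wiring is left out to keep the sketch dependency-free.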
17 Sep 2025 Decision Letter - Nagarajan Raju, Editor

Multiplets in scRNA-seq data: extent of the problem and efficacy of methods for removal

PONE-D-25-32117R1

Dear Dr. Ttoouli,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Nagarajan Raju
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Based on the responses to the reviewer's comments, we are accepting your article for publication.

Reviewers' comments:

https://doi.org/10.1371/journal.pone.0333687.r003
Formally Accepted
Acceptance Letter - Nagarajan Raju, Editor

PONE-D-25-32117R1

PLOS ONE

Dear Dr. Ttoouli,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Nagarajan Raju
Academic Editor
PLOS ONE

https://doi.org/10.1371/journal.pone.0333687.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.