Quantifying the clusterness and trajectoriness of single-cell RNA-seq data

Hong Seo Lim; Peng Qiu

doi:10.1371/journal.pcbi.1011866

Peer Review History

Original Submission
August 20, 2023
8 Oct 2023 Decision Letter - Shihua Zhang, Editor, Jian Ma, Editor

Dear Dr. Qiu,

Thank you very much for submitting your manuscript "Quantifying the clusterness and trajectoriness of single-cell RNA-seq data" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission.
We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Shihua Zhang
Academic Editor
PLOS Computational Biology

Jian Ma
Section Editor
PLOS Computational Biology

*********************

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors of this submission have presented an approach to quantify "clusterness" and "trajectoriness" of scRNA-seq data, which could help researchers decide which analysis methods would be suitable for given scRNA-seq data. This review applauds the efforts of developing such a quantitative computational pipeline and believes that the presented work can indeed help researchers, especially biomedical researchers, to better explore and understand data. There are a few questions/concerns that the authors may want to address:

1. Are the selected five metrics, pairwise distance distribution, persistent homology, vector magnitude, Ripley's K, and degrees of separation, comprehensive? Are there other possibly better metrics to use? The authors may want to discuss the motivations for using these five specific metrics. Also, as computing some of these metrics depends on some selected algorithms or hyperparameters, will the final "clusterness" and "trajectoriness" results change depending on these choices? Finally, it appears that higher degree of separation scores indicate better connectivity (or worse separation). Is this the historical convention, or would it be better to call it degree of connectivity instead of separation?

2. The generative 'geometric landscape' based on simulated data uses UMAP. Will the results depend on different embedding / projection methods?

3. Such geometric landscapes can probably change with simulated data, for example, the smoothness of trajectories, the number of clusters, etc., which does not necessarily help derive consistent/useful "clusterness" and "trajectoriness" scores. How would the authors take care of these?

4. It is indeed encouraging to see the results in Figs 4 and 5. However, the authors may want to discuss some non-intuitive situations. For example, why do some single clusters still appear in the right 'clusterness' branch of the 'geometric landscape' in Fig 5a? In Fig 4b, it might be better to separately show the distributions of these metrics for "cluster"-like and "trajectory"-like scRNA-seq data to indeed check whether they are similar to simulated data, instead of pooling all 169 datasets together. Regarding the biological ground truth, are these labels in Fig 4c based on 20 PCs as described in the text, or is there more biologically grounded knowledge?

5. In the generated landscape in Fig. 3, it is non-intuitive that both P-dist and degree of separation scores are high at the right-bottom tip. The authors may want to provide more discussion to help readers better understand the potential reason/limitation.

Reviewer #2:

Summary of Article

This article aims to address a key challenge in scRNA sequencing analyses. They discuss two key approaches to scRNA seq analysis: cluster analysis and trajectory inference analysis. Each of these approaches operates under distinct assumptions. This article aims to quantify the strength of each approach based on the geometric characteristics of the data. The authors present scoring metrics that aim to quantify “clusterness” and “trajectoriness” of the data. The authors use simulated data sets to differentiate between cluster-like data sets and trajectory-like datasets. They apply their metrics on real scRNA seq data sets to demonstrate how the metrics can be used as indicators of whether clustering or trajectory inference is appropriate for the data.
This article would be of interest to the community of researchers using scRNA seq. It offers tools that can be used to help with the interpretation of data. With some revisions, this article is suitable for publication.

Methods: The 5 scores that the authors use are well described in the methods and absolutely essential to understanding the manuscript, though there are some inconsistencies between what the authors expect and the outcomes in the figures. Comments are below in the detailed notes.

The Github repository does share the code associated with the manuscript, but the README is empty and there is very little information on how to reuse the data set or run the analysis. Without a great deal of experience, this code would be difficult or impossible to reuse. The code would benefit substantially from additional annotation on what the different steps of the code are doing, the order in which files would need to be used, etc. If someone had their own data set, where would it be brought into the pipeline so they can do their own comparison?

Detailed Notes

Figure 1: Depiction of approach. No comments.

Figure 2: Depiction of score distributions grouped by set-type using simulated data sets. I agree this shows some indication of separation between clustered data sets and trajectory data sets.

Figure 3: UMAP clustering of 5-score outcomes from simulated data indicates separation between clearly clustered data sets and clearly trajectory data.

3c – It appears that there is a gradient low-to-high for all metrics defined by the authors except for Homology and Degree of Separation. Can the authors comment on why these do or do not match the expected outcomes described in the methods section?

Figure 4: The authors depict where the real scRNA seq data sets fall along their UMAP spectrum of simulated data sets.

4a – Many of the data sets fall somewhere in the middle of the UMAP spectrum.
It would be interesting to see a supplemental figure depicting some examples of scRNA seq data sets that fall in the middle, where they are not exactly determined to be cluster-like but also not exactly trajectory-like. In practice, someone might want to assess the clusterness or trajectoriness characteristics in their data sets, but examples of how they might interpret data if it is not clearly one or the other would be very helpful for a reader.

4b – It appears as though the range of the data sets for all 5 metrics are similar between the simulated and scRNA-seq data sets, but the distributions are not all similar. The vector measurements and the degree of separation are particularly different. Can the authors comment on why the distributions are different between the scRNA seq and simulated data sets for these two metrics?

4c – The authors mention around a 70% agreement between the prediction of cluster-like/trajectory-like geometry for real scRNA seq data sets and their presumed geometric intuitions based on the experimental design and underlying biology. It might benefit readers to see the scores and distributions of those that agreed versus the scores and distributions of those that did not agree. While the n is low for some of the presumed geometries, there could be enough data sets to display this for Cluster, Organs, Tree, Linear and possibly Bifurcation.

Figure 5: Authors clustered the scores for the 169 scRNA seq data sets using Seurat. It is not entirely clear what this figure is adding to the manuscript. It appears the authors are adding an additional transformation to the data set by clustering in Seurat and seeing if it holds up in their geometric landscape. Since the data are already abstracted through the use of the scores, adding an additional layer of clustering feels difficult to interpret. I might rather see a comparison of score distributions for data sets that agreed compared to those that did not agree, as mentioned in 4c.

Figure 6: Representative examples of how real scRNA seq data sets described as trajectories and clusters fall into the geometric landscape. The examples are clear and illustrative.

Figure 7: Representative examples of how real scRNA seq data sets that are not clearly described as trajectories or clusters fall into the geometric landscape. The authors show that their method can help determine the clusterness and trajectoriness for data sets that are not clearly described as either.

Discussion: Overall, this manuscript is interesting. It offers tools to researchers to help with the determination of whether to use cluster analysis or trajectory inference analysis on a data set. While I agree with the authors that using one analysis will bias the outcome towards the type of analysis conducted (i.e., clustering analysis will generate clusters), the authors could benefit from describing examples of where this could make a difference when studying the biological significance of different pathways. It does not feel sufficient to say that it will help us distinguish between trajectories of clusters or clusters of trajectories; why is this important biologically? It is always useful to have more tools to work with, but it remains unclear how this makes a difference in our fundamental study of biological systems.

Reviewer #3: The authors provide a report that is centered around the development of a method/pipeline to evaluate/quantify “clusterness” and “trajectoriness” of scRNA-seq data. To achieve this, the authors propose five scores, namely pairwise distance distribution, persistent homology, vector magnitude, Ripley's K, and degrees of separation, to quantify the “clusterness” and “trajectoriness” of single-cell RNA-seq data. The subject is current and extremely relevant, and this beautiful study brings further understanding of the choices on how to analyze and interpret single-cell RNA-Seq data.
Since many of the single cell RNA-Seq datasets show both cluster-like and trajectory-like characteristics, I suggest that the authors apply clustering analysis to data showing this feature, and then evaluate the clusterness and trajectoriness for both the original dataset and the individual clusters. This would greatly improve the original manuscript.

******

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

******

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: No

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user.
Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

https://doi.org/10.1371/journal.pcbi.1011866.r001
Revision 1
28 Dec 2023 Author Response
Attachments
Attachment Submitted filename: Response_to_reviewers_v14.docx
https://doi.org/10.1371/journal.pcbi.1011866.r002
28 Jan 2024 Decision Letter - Shihua Zhang, Editor, Jian Ma, Editor

Dear Dr. Qiu,

We are pleased to inform you that your manuscript 'Quantifying the clusterness and trajectoriness of single-cell RNA-seq data' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Jian Ma
Section Editor
PLOS Computational Biology

*********************************************************

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: The revised manuscript and discussions in the response letter have addressed all my questions.

Reviewer #2: I feel satisfied with how the authors responded to my comments and the comments of my fellow reviewers and have no further comments.

Reviewer #3: The authors answered satisfactorily all the questions raised by the reviewers.

******

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: None
Reviewer #2: Yes
Reviewer #3: Yes

******

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: Yes: Ibraheem Ali
Reviewer #3: No

https://doi.org/10.1371/journal.pcbi.1011866.r003
Formally Accepted
21 Feb 2024 Acceptance Letter - Shihua Zhang, Editor, Jian Ma, Editor

PCOMPBIOL-D-23-01334R1
Quantifying the clusterness and trajectoriness of single-cell RNA-seq data

Dear Dr Qiu,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Anita Estes
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

https://doi.org/10.1371/journal.pcbi.1011866.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.