Peer Review History

Original Submission: July 31, 2022
Decision Letter - Sushmita Roy, Editor, Maxwell Wing Libbrecht, Editor

Dear Doctor Zhao,

Thank you very much for submitting your manuscript "scDSSC: Deep Sparse Subspace Clustering for scRNA-seq Data." for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

As you will see from the reports below, the reviewers note that scDSSC performs well, but they raise multiple concerns about the manuscript, including its references to related work, the lack of description of the methods used, and the understandability of the text and figures.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Maxwell Wing Libbrecht, Ph.D.

Academic Editor

PLOS Computational Biology

Sushmita Roy

Section Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: General

This paper proposes a deep sparse subspace clustering method scDSSC combining noise reduction and dimensionality reduction for scRNA-seq data. Experiments on a variety of scRNA-seq datasets from thousands to tens of thousands of cells have shown that scDSSC can significantly improve clustering performance. My concerns can be found below.

1 You should describe the data preprocessing in more detail, such as size factor calculation, normalization, and log-transformation; giving the formulae for these steps would make it clearer.

2 In your work, the three hyperparameters λ_1, λ_2 and λ_3 are set to 0.2, 1.0 and 0.5, respectively. Is there any special condition on their values? What is the basis for the values you have given them?

3 Is there any connection between pre-training and fine-tuning in the optimization process? Or what are the differences between them?

4 The entire network is built on a fully connected layer, so why not use the more popular convolutional network?

5 In the literature, there are already many similar works, e.g. Nucleic Acids Res, 49(D1):D1029-D1037; Bioinformatics. 2021 May 10:btab250, doi: 10.1093/bioinformatics/btab250. The authors should mention related works in their paper.

6 There are some grammatical errors in the manuscript, for example, line 67 “Recently, some new single cell analysis methods based-subspace clustering are proposed”. Please check the entire text carefully and correct any grammatical errors and writing mistakes in the manuscript.
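For readers unfamiliar with the preprocessing steps named in comment 1, the following is a minimal sketch of one common convention for scRNA-seq count data: the size factor of a cell is its library size divided by the median library size, counts are divided by the size factor, and the result is log-transformed with a pseudocount of 1. This convention is an assumption made for illustration; the review does not state which formulae the manuscript uses, and the function name `preprocess` is hypothetical.

```python
import math
from statistics import median


def preprocess(counts):
    """Normalize a cells-by-genes list of raw counts (illustrative sketch).

    Assumes every cell has a nonzero total count.
    """
    # Library size: total number of counts observed in each cell.
    library_sizes = [sum(cell) for cell in counts]
    # Size factor (assumed convention): library size / median library size.
    med = median(library_sizes)
    size_factors = [size / med for size in library_sizes]
    # Divide each count by the cell's size factor, then apply log(1 + x).
    log_norm = [
        [math.log1p(x / sf) for x in cell]
        for cell, sf in zip(counts, size_factors)
    ]
    return size_factors, log_norm


# Toy data: 3 cells x 3 genes with library sizes 4, 8 and 16.
counts = [[1, 0, 3], [2, 2, 4], [0, 5, 11]]
size_factors, log_norm = preprocess(counts)
```

In this toy example the median library size is 8, so the three cells receive size factors of 0.5, 1.0 and 2.0.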

Reviewer #2: In this manuscript, the authors proposed a novel model, scDSSC, for clustering single-cell data. This model employs deep sparse subspace clustering by using a self-expressiveness matrix. The extensive experiments show that scDSSC outperforms most competing methods in multiple real datasets.

Major

L121. The authors should explain, from a biological point of view, why a cell can be represented by other cells. This is the assumption of this model.

Line 63. “However, the cell clusters obtained by these methods lack the support of mathematical theory”. This statement needs more explanation. The authors should give the mathematical theory that supports only the proposed model.

L168. Could the authors explain why they use both MSE and ZINB as the reconstruction loss? Ablation studies are suggested.

L259. The authors should explain, from a computational point of view, why scDSSC can cope with the batch effect.

The authors should enhance the resolution of the figures. Many figures are not legible. It is suggested to highlight the differences between scDSSC and scDeepCluster in Figures 1 and 2. From the current figures, I cannot see any differences.

Minor

Line 73. “most proposed models subspace-based” should be ‘most proposed subspace-based model’?

Figure 1 and Figure 2 can be combined.

Line 100. The authors should clarify how to calculate size factor and library size.

Line 100-104. Double-check the grammar.

Figure 2. What are the differences between the red and the orange layers? Please specify.

It is suggested to put real data description in the method section.

I suggest that the authors include SC3 among the competing methods.

It is suggested to describe the way of running the competing methods in the method section. For example, for using scDCC, how did the authors build the constraints?

Line 211. ‘We infer that the reason may be that the ZINB distribution cannot well approximate the real distribution of Human kidney dataset.’ The authors may try to run the model without ZINB loss.

L283. It is suggested to move this part to the method section.

I suggest showing NMI in Fig 3 since ACC is influenced by the cluster sizes.

L226. ‘Next, calculate the nearest neighbors of each cell according to the given number’. The author should explain what this step is for.

In the UMAP figures, I cannot find any advantages of scDSSC over the competing methods.

L272 ‘’ For, example,’ typo
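The remark above that ACC is influenced by cluster sizes can be illustrated with a small sketch. It assumes the usual definitions: ACC is the best accuracy over one-to-one mappings from predicted clusters to true labels, and NMI is the mutual information divided by the arithmetic mean of the two label entropies. The function names are hypothetical, and the brute-force search over mappings is only practical for a handful of clusters.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(true, pred):
    """Best accuracy over one-to-one mappings of predicted clusters to labels."""
    true_ids = sorted(set(true))
    pred_ids = sorted(set(pred))
    # Pad with None so that surplus predicted clusters can stay unmatched.
    k = max(len(true_ids), len(pred_ids))
    targets = true_ids + [None] * (k - len(true_ids))
    best = 0
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(mapping[p] == t for t, p in zip(true, pred))
        best = max(best, hits)
    return best / len(true)


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(true, pred):
    """Mutual information normalized by the mean of the two entropies."""
    n = len(true)
    ct, cp = Counter(true), Counter(pred)
    joint = Counter(zip(true, pred))
    mi = sum(
        (c / n) * math.log(c * n / (ct[t] * cp[p]))
        for (t, p), c in joint.items()
    )
    denom = (entropy(true) + entropy(pred)) / 2
    return mi / denom if denom > 0 else 0.0


# Imbalanced toy data: 90 cells of one type, 10 of another.
true = [0] * 90 + [1] * 10
# A trivial "clustering" that puts every cell in a single cluster.
pred = [0] * 100
acc_trivial = clustering_accuracy(true, pred)
nmi_trivial = nmi(true, pred)
```

On this toy data the trivial single-cluster assignment reaches an ACC of 0.9 simply because one cell type dominates, while its NMI is 0, which is the kind of effect the reviewer's suggestion guards against.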

Reviewer #3: The study by Zhao et al. proposed a deep sparse subspace clustering method scDSSC. Considering the noise and large number of genes in scRNA-seq data, autoencoder based on ZINB distribution is applied to reduce the dimension and noise of the data simultaneously in scDSSC. The authors showed scDSSC outperformed state-of-the-art methods under various clustering performance metrics.

Major concern:

1. Simulated data rather than real data should be used to evaluate the performance of scDSSC; in particular, real data are complex and cannot provide the exact scenarios under which scDSSC performs better. Furthermore, the clusters in real data are unknown and cannot be used to evaluate the performance of these methods. (Fig 3–6)

2. scDSSC clustered the cells into subgroups as in previous studies (Fig 3–6), which did not provide novel biological insight.

3. The authors did not provide information on how the reference for judging the performance of these methods was established for the real data.

Minor comments:

1. The manuscript is not well organized and should be carefully reorganized.

2. The figure resolution is low. Some words in Fig 4 and Fig 6 cannot be read.

3. The figure legends are too brief to understand the figures, in particular Fig 4–6.

4. Fig 4 should be separated into panels A, B, and C for easy reference.

5. Fig 5A and 5B are redundant.

6. Although all the figure legends are appended at the end of the main text, it is surprising that the legends of Fig 3, 4 and 5 also appear in the main text, which is redundant.

7. The grammatical errors should be carefully checked.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: None

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Wenfei Jin

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Revision 1

Attachments
Submitted filename: Response letter.docx
Decision Letter - Sushmita Roy, Editor, Maxwell Wing Libbrecht, Editor

Dear Doctor Zhao,

Thank you very much for submitting your manuscript "scDSSC: Deep Sparse Subspace Clustering for scRNA-seq Data." for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

I am pleased to report that the reviewers found the manuscript to be acceptable for publication, subject to the revisions below. In particular, Reviewer 2 notes several statements that must be clarified.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Maxwell Wing Libbrecht, Ph.D.

Academic Editor

PLOS Computational Biology

Sushmita Roy

Section Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have addressed all my concerns.

Reviewer #2: The authors addressed most of my concerns. Some claims should be further clarified.

1) The authors state that “the gene expression profile of a cell can be linearly represented by the gene expression profiles of other cells only within the same subspace. This also means that these cells belong to the same cell type, which refers to their genotypes being the same and the expression patterns of genes within the cells are similar.” This statement is misleading because the cell-type information (ground truth) cannot be used for clustering. Do the authors mean that a cell is represented by the cells that are predicted to be in the same cell type? Besides, how do the authors initialize C? This step might be very important for this model.

2) The authors state that “However, thanks to the self-representation property of subspaces, the gene expression profile of a cell within the same subspace can be represented as a linear combination of the expression profiles of other cells, which tends to capture the global structural information and thus curbs the aforementioned random effects on the integrated data.” I believe this works only when the batch effects are smaller than the differences between cell types. The authors should specify this in the manuscript.

3) The authors state that “When using scDCC, the “constraint-pairs” are first computed according to the way proposed in the paper, and then a semi-supervised learning is performed”. The performance of scDCC depends heavily on the constraints used. It is suggested to specify how the constraints were built, such as the marker gene(s) used and the number of ML and CL links.

4) If SC3 requires too much RAM or takes too long, it is suggested to run the experiments on a subset of cells.

5) Some text in the figures is still too small to read.
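Comment 3 above turns on how must-link (ML) and cannot-link (CL) pairs are constructed. As a minimal sketch of one simple possibility, assuming reference labels are available for a subset of cells, random pairs of cells can be sampled and assigned to ML when their labels match and to CL otherwise. This is an illustrative assumption only; it is not claimed to be the procedure used by scDCC or by the authors, and `sample_constraints` is a hypothetical helper.

```python
import random


def sample_constraints(labels, n_pairs, seed=0):
    """Sample distinct cell pairs and split them into ML and CL lists."""
    rng = random.Random(seed)
    n = len(labels)
    # Never ask for more distinct pairs than exist.
    n_pairs = min(n_pairs, n * (n - 1) // 2)
    ml, cl = set(), set()
    while len(ml) + len(cl) < n_pairs:
        i, j = rng.sample(range(n), 2)
        pair = (min(i, j), max(i, j))
        if labels[pair[0]] == labels[pair[1]]:
            ml.add(pair)  # same label: must-link
        else:
            cl.add(pair)  # different labels: cannot-link
    return sorted(ml), sorted(cl)


# Toy labels for five cells (hypothetical cell types).
labels = ["T", "T", "B", "B", "NK"]
ml_pairs, cl_pairs = sample_constraints(labels, n_pairs=6)
```

Reporting the number of ML and CL pairs produced by such a step, together with how the labels were obtained, is the kind of detail the comment asks for.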

Reviewer #3: The manuscript has improved. The authors have addressed my comments to some extent.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Wenfei Jin

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Revision 2

Attachments
Submitted filename: Response letter.docx
Decision Letter - Sushmita Roy, Editor, Maxwell Wing Libbrecht, Editor

Dear Doctor Zhao,

We are pleased to inform you that your manuscript 'scDSSC: Deep Sparse Subspace Clustering for scRNA-seq Data.' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Maxwell Wing Libbrecht, Ph.D.

Academic Editor

PLOS Computational Biology

Sushmita Roy

Section Editor

PLOS Computational Biology

***********************************************************

Formally Accepted
Acceptance Letter - Sushmita Roy, Editor, Maxwell Wing Libbrecht, Editor

PCOMPBIOL-D-22-01159R2

scDSSC: Deep Sparse Subspace Clustering for scRNA-seq Data.

Dear Dr Zhao,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.