Peer Review History

Original Submission - December 15, 2020
Decision Letter - Qing Nie, Editor, Alice Carolyn McHardy, Editor

Dear Prof. Dr. Claassen,

Thank you very much for submitting your manuscript "Mixture-of-Experts Variational Autoencoder for Clustering and Generating from Similarity-Based Representations on Single Cell Data" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Qing Nie

Associate Editor

PLOS Computational Biology

Alice McHardy

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The manuscript "Mixture-of-Experts Variational Autoencoder for Clustering and Generating from Similarity-Based Representations on Single Cell Data" proposes a generative clustering model based on a variational autoencoder with a Gaussian mixture model for the latent space and a decoder consisting of a mixture of networks, where each mode in the latent space is decoded by an expert network. The gating network that assigns samples to modes can be guided by prior knowledge of sample similarity. The authors demonstrate the performance of their model on MNIST data as well as on single cell RNA-seq and mass cytometry clustering tasks. Overall, the work is interesting and the experiments are conducted thoroughly. To add value for the community, the authors should make their methods and the scripts needed to reproduce the results accessible and provide additional details on their experiments. Other points are listed below.
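For readers unfamiliar with the architecture summarised above, the following is a minimal sketch of a mixture-of-experts decoder. It is not the authors' implementation: the linear "experts", the linear gate, the soft gate-weighted combination, and all names and sizes (`moe_decode`, `make_layer`, `latent_dim`, `data_dim`, `n_experts`) are assumptions made for illustration. The manuscript's model uses neural networks and a similarity-guided training objective, neither of which is shown here.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def linear(weights, bias, z):
    """Affine map; weights holds one row per output dimension."""
    return [sum(w * v for w, v in zip(row, z)) + b
            for row, b in zip(weights, bias)]

def make_layer(rng, n_out, n_in):
    """Random weights and zero bias (hypothetical stand-in for a trained network)."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def moe_decode(z, gate, experts):
    """Decode latent z as the gate-weighted sum of the expert outputs."""
    probs = softmax(linear(gate[0], gate[1], z))      # one probability per expert
    outputs = [linear(w, b, z) for w, b in experts]   # one reconstruction per expert
    n_out = len(outputs[0])
    recon = [sum(p * out[d] for p, out in zip(probs, outputs))
             for d in range(n_out)]
    return recon, probs

rng = random.Random(0)
latent_dim, data_dim, n_experts = 2, 5, 3             # hypothetical sizes
gate = make_layer(rng, n_experts, latent_dim)
experts = [make_layer(rng, data_dim, latent_dim) for _ in range(n_experts)]

z = [0.5, -1.0]                                       # a made-up latent sample
recon, probs = moe_decode(z, gate, experts)
cluster = max(range(n_experts), key=lambda k: probs[k])  # hard cluster assignment
```

In this sketch the gate probabilities double as soft cluster assignments, and taking the most probable expert gives a hard cluster label. Decoding with only that expert would be the hard-assignment variant of the same idea.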

Major points:

- Please make the code for the method and the scripts to reproduce the experiments accessible.

- What pre-processing was applied to the data presented in the Results? Please specify this and also provide details on the split of train and test data.

- Several methods exist that use non-Gaussian variational distributions in a VAE, especially count-based models in the single cell domain. These should be mentioned in p.2 l.14 and the results of a clustering in their latent space included in Table 2.

- The ablation study nicely highlights the benefits of including a similarity matrix. However, the exact impact of the chosen similarity metric remains unclear. For this, a clustering based solely on the chosen similarity matrix should be included as a reference in all experimental results. Does the influence of the similarity matrix depend on the dimensions of the data or number of clusters? How sensitive is the method to misspecifications in the similarity (e.g. wrongly chosen signature genes in the cell type clustering problem)?

Minor points:

- Could the authors comment on the motivation for the binary cross-entropy as reconstruction loss in Eq. (4)? This seems to lead to blurry images in Fig. 2 and it is unclear why this should be a suitable loss for RNA-seq data.

- What would be the guidelines to choose the number of clusters K in applications? This seems to be crucial for the ability to generate samples and according to Fig S1 many more experts than actual clusters in the data might be found.

- Table 1 would be more insightful if the authors could provide a short description for these methods. Also, GANs for clustering seem an important alternative for the task but are neither included in the comparison nor mentioned in the text.

- The ablation study could be included as part of Table 1.

- l117-136 are hard to follow without going back to the original publication of VaDE.

- The authors could remove repetitive parts of the text (e.g. method description of VaDE and MoE-Sim-VAE in Results, eq (2) and (3)) and better separate general concepts and technical details in the introduction.

- How sensitive is the method to the choice of pi_1 and pi_2?

Reviewer #2: The authors report a computational method, Mixture-of-Experts Similarity Variational Autoencoder, for clustering and data generation, with applications to large-scale single-cell data. The proposed mathematical framework is solid and builds upon ideas that are appropriate for the analysis of large-scale single-cell data. The authors have demonstrated the applicability of their method using publicly available data sets, and the figures and tables are simple and clear. However, I have a few suggestions to improve the manuscript.

Major comments:

1. The authors do not provide a link to their implementation. This should be a red line for academic computational tools, and I would request that the authors share their implementation, in reproducible form, via a GitHub repository in a revised version of the manuscript. In addition, the authors should include a vignette or a tutorial reproducing the results from at least one of the applications presented in the manuscript.

2. In Table 2 the authors extract the F-measure and NMI scores from a public review (Lukas et al. 2016). This is fine as long as the processing pipeline that the authors followed is the same as in the review. Minor changes in the data processing pipeline can lead to significant differences in the results. If this is the case, the authors should state so. If not, the authors should verify the reported results with their own data processing pipeline.

Minor comments:

1. Fig. 3 is missing the colour legend.

2. Could the authors back the quantitative results shown in Fig. 4 with a visualisation of the data, with cells coloured by cluster / cell type?

3. Why do the authors use the binary cross entropy as a reconstruction loss instead of the mean-squared error? Single-cell data is usually not scaled between 0 and 1.
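The concern both reviewers raise about the reconstruction loss can be illustrated with a small sketch. The values below are made up, and the helper names (`binary_cross_entropy`, `mean_squared_error`, `min_max_scale`) are hypothetical and not taken from the manuscript. The sketch only shows that binary cross-entropy is a sensible loss when targets lie in [0, 1], and that it becomes negative and unbounded below when applied to unscaled values such as raw counts.

```python
import math

def binary_cross_entropy(x, x_hat, eps=1e-12):
    """Mean binary cross-entropy between targets x and predictions x_hat."""
    total = 0.0
    for t, p in zip(x, x_hat):
        p = min(max(p, eps), 1.0 - eps)   # keep the logarithms finite
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(x)

def mean_squared_error(x, x_hat):
    """Mean squared error; defined for targets on any scale."""
    return sum((t - p) ** 2 for t, p in zip(x, x_hat)) / len(x)

def min_max_scale(x):
    """Rescale values to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0 for _ in x]
    return [(v - lo) / (hi - lo) for v in x]

counts = [0.0, 3.0, 12.0, 1.0]   # hypothetical unscaled expression values
pred = [0.1, 0.5, 0.9, 0.2]      # hypothetical decoder outputs in (0, 1)

bce_raw = binary_cross_entropy(counts, pred)                    # targets outside [0, 1]
bce_scaled = binary_cross_entropy(min_max_scale(counts), pred)  # targets inside [0, 1]
mse_raw = mean_squared_error(counts, pred)

# For a target above 1, pushing the prediction towards 1 keeps lowering the loss.
bce_target_12_at_090 = binary_cross_entropy([12.0], [0.9])
bce_target_12_at_099 = binary_cross_entropy([12.0], [0.99])
```

With targets scaled to [0, 1], every term of the cross-entropy is non-negative and the per-element loss is minimised when the prediction equals the target. For a target of 12 the loss is negative and decreases further as the prediction approaches 1, so it no longer measures reconstruction quality. This is why the choice of loss depends on how the data were pre-processed.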

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: Code to reproduce the results is missing.

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see http://journals.plos.org/compbiol/s/submission-guidelines#loc-materials-and-methods

Revision 1

Attachments
Submitted filename: PCB-review-comments.pdf
Decision Letter - Qing Nie, Editor, Alice Carolyn McHardy, Editor

Dear Prof. Dr. Claassen,

We are pleased to inform you that your manuscript 'Mixture-of-Experts Variational Autoencoder for Clustering and Generating from Similarity-Based Representations on Single Cell Data' has been provisionally accepted for publication in PLOS Computational Biology.

Reviewer 1 has two minor comments; please address them.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Qing Nie

Associate Editor

PLOS Computational Biology

Alice McHardy

Deputy Editor

PLOS Computational Biology

***********************************************************

Please address Reviewer 1's two minor comments in the final version of the submission.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have addressed my previous comments in their revised manuscript.

Before publication they should still address the following minor points:

1. Please fix typos and sentence structure in lines 66/67 and 261.

2. Please complete the code repository to make all examples from the paper reproducible and include the code that was used for the comparison to other methods in the benchmarks.

Reviewer #2: The authors have correctly addressed all my comments. I recommend publication of this manuscript.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No: see comments to the Authors

Reviewer #2: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Formally Accepted
Acceptance Letter - Qing Nie, Editor, Alice Carolyn McHardy, Editor

PCOMPBIOL-D-20-02250R1

Mixture-of-Experts Variational Autoencoder for Clustering and Generating from Similarity-Based Representations on Single Cell Data

Dear Dr Claassen,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Katalin Szabo

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.