Peer Review History
| Original Submission April 26, 2022 |
|---|
|
PONE-D-22-12274 Sparse Representation Learning Derives Biological Features with Explicit Gene Weights from the Allen Mouse Brain Atlas PLOS ONE Dear Dr. Bartelle, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please revise the manuscript to accentuate the goals and conclusions/achievements. Please incorporate a biological interpretation of the results. The method section could be revised to give a brief description of all the methods and their associated hyperparameters (with tested ranges). A lot of observations in the paper are simply reported, and a follow-up discussion about their potential reasons and implications could improve the narrative. Please address the technical issues raised by the reviewers: Visual clustered representations in the area of the olfactory bulb and nucleus will be less accurate in these regions due to registration issues. The brains were distorted somewhat in these areas through processing. Please clarify what you mean by a single coronal slice as referred to on line 144. Application of these methods to sagittally oriented sections may be problematic due to the limited resolution in this orientation. What precisely is the difference in results of SVD versus PCA methods here, as the former is essentially equivalent at full rank? What anatomical features differ? Please consider providing a basic explanation of how sparse representation learning algorithms work in this application. References to advances in applied information theory should be given. Long section titles such as “Variance based…” on line 151 are overly descriptive, and the reader cannot easily parse their meaning. 
A more intuitive section title with explaining the result early in the first paragraph would be a little easier on the reader. Have unsupervised representation learning methods never been applied to or studied for the AMBA dataset? If so, why is that the case and how does this paper address it? It seems the hyperparameters for different algorithms have been selected using the performance on ground truth labels. How will these be selected for other datasets? One experiment could be to split the existing dataset into different sets and see if the hyperparameters selected for one set generalize to another. It is important to answer how can SFt (the proposed method) generalize to other similar datasets and be useful for the community? How do the applied methods in the paper help fix the drawback “Initial dimensionality reduction filters for high variance global trends over localized features”? “ AMI scores peaked well below the 574 labeled brain regions, with most fitting optimally at ~200 clusters” - what does it imply for the applicability of these methods in the domain? How is the feature selection done using the sparse methods affected by the correlation structures in the data? Would these methods suffer from identifiability issues due to highly correlated genes? For data compression, how was the threshold selected to filter out the genes at each step? Why was the SFt step run multiple times? Was the original gene list analyzed with different thresholding? What is the significance of the other genes reported in the list of 12 highest weighted genes that are not associated with the brain region (highlighted in orange)? For example, what is the role of Pip5k1a that appears for 2 different regions? Table 1 could be better formatted and maybe have the relevant genes in bold text. Some figures have empty white spaces on the sides. Does the logistic regression task have any class label imbalance? If so, AUPRC score might be worth observing as well. 
Please submit your revised manuscript by Sep 05 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Gennady S. Cymbalyuk, Ph.D. Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide. Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. 
Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Partly ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: No ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. 
(Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: “Sparse Representation Learning Derives Biological Features with Explicit Gene Weights from the Allen Mouse Brain Atlas” is an interesting and comprehensive view of the relationship between spatial transcriptomic patterns and neuroanatomy. There have been several papers over the years since the release of the Allen Mouse Brain Atlas, but this study is among the most careful and detailed. The authors have a solid understanding of the issues of transcriptomically defined clusters and marker gene identification. To eliminate the need for dimension reduction and clustering, the authors explored sparse learning methods for building localized features from minimal numbers of inputs. This topic is also highly relevant to the more modern multiplex spatial transcriptomics data sets, and the authors may wish to comment on this point. The authors use the 200-micron isotropic voxel-based data from the Allen Atlas as a basis for studying the detailed correspondence of genetics with neuroanatomy over 574 anatomically labelled voxels. The methods studied here provide a natural means of identifying minimal marker sets of genes defining anatomic patterning. The range of k-means clusters used is appropriate and covers up to the right order of the number of voxels. The use of these methods to determine an optimal fitting at around 200 clusters and using sparse methods is an interesting result and may indicate a fundamental level of resolution in the data. Finally, the paper is written rather technically from an informatics perspective for this journal, and I might recommend shortening the work somewhat with a bit more biological interpretation for readers. Major • Visual clustered representations in the area of the olfactory bulb and nucleus will be less accurate in these regions due to registration issues. The brains were distorted somewhat in these areas through processing. 
I’m not certain what the authors mean by a single coronal slice as referred to on line 144. • Application of these methods to sagittally oriented sections may be problematic due to the limited resolution in this orientation. The results of the paper are ample with restriction to the coronal datasets in this reviewer’s opinion. • What precisely is the difference in results of SVD versus PCA methods here, as the former is essentially equivalent at full rank? What anatomical features differ? • The lack of improvement of KPCA to the parcellations is interesting. What is the authors’ intuition for this result? • The application of sparse representation learning algorithms is interesting and important in this context. It might be helpful for the authors to give a basic explanation of how these methods work in this context for the reader. • The use of this approach to identify more minimal defining gene sets is an important result and highlight of the work. I would make more of this section and produce a comprehensive defining set by anatomic region to the extent possible, together with biological annotations where available. • One important consideration for this work, which I think should be remarked on in the discussion and presented as a caveat, is that the anatomic labels are determined by neuroanatomists’ interpretation of the definition of these regions. This itself has variability and potential discrepancy from what might be considered ground truth, whatever that might be. Thus, we are making a comparison with annotations which are themselves potentially inaccurate. This in no way diminishes the importance of the approach but should be noted. With respect to the last point, a comprehensive set of tables showing how predicted transcriptomic regions using the various approaches intersect with the Allen Reference Atlas anatomy would be useful. An abbreviated form of this could be given in the main paper and the full set supplementally. 
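As a small illustration of the SVD versus PCA question raised above, the following sketch (not taken from the manuscript) shows one well-known reason the two can differ in practice: PCA corresponds to an SVD of the mean-centered data, so an SVD applied to uncentered data gives different components. The function names and the synthetic "voxels x genes" matrix are assumptions made for illustration only; whether centering explains any difference in the authors' results is not stated in this letter.

```python
import numpy as np

def pca_via_eigh(X, k):
    """Top-k PCA scores and variances from the covariance eigendecomposition."""
    Xc = X - X.mean(axis=0)                     # PCA always mean-centers
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)
    evals, evecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]         # keep the k largest
    return Xc @ evecs[:, order], evals[order]

def scores_via_svd(X, k, center=True):
    """Top-k scores and variances from an SVD, with optional centering."""
    A = X - X.mean(axis=0) if center else X
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k], s[:k] ** 2 / (A.shape[0] - 1)

# Synthetic stand-in for a voxels x genes matrix with a nonzero mean (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6)) + 3.0

pca_scores, pca_var = pca_via_eigh(X, 3)
svd_scores, svd_var = scores_via_svd(X, 3, center=True)
raw_scores, raw_var = scores_via_svd(X, 3, center=False)

# Components are only defined up to sign, so compare absolute values.
same_centered = np.allclose(np.abs(pca_scores), np.abs(svd_scores))
same_uncentered = np.allclose(np.abs(pca_scores), np.abs(raw_scores))
print("PCA == SVD on centered data:  ", same_centered)
print("PCA == SVD on uncentered data:", same_uncentered)
```

With centered input the two decompositions agree up to the sign of each component; without centering, the leading SVD component mostly tracks the mean expression level instead of the variance.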
Minor • References to advances in applied information theory should be given. • Long section titles such as “Variance based…” on line 151 are overly descriptive, and the reader cannot easily parse their meaning. A more intuitive section title, with the result explained early in the first paragraph, would be a little easier on the reader. These types of titles are attractive, but if they become too long they start to mix with the actual description in the paper. Reviewer #2: This paper applies and compares various unsupervised representation learning methods - Independent Component Analysis (ICA), Principal Component Analysis (PCA), Kernel PCA (KPCA), Sparse PCA (SPCA), Dictionary Learning and Sparse Components (DLSC), and Sparse Filtering (SFt) - to the transcriptomic data from the Allen Mouse Brain Atlas (AMBA) project. Given the ground truth anatomy labels, the paper evaluates the quality of the representation learned by different algorithms and ways to generate gene lists and compressed information. The paper first demonstrates that out of the applied unsupervised methods (+ K-means to assign anatomical label) SFt gives the best AMI and ARI scores, which test whether the cluster labels match the ground truth labels. The paper then performs other analyses and makes multiple observations: (1) SFt overall gives good performance for a variety of evaluation metrics like DICE scores etc. (2) Sparse representation learning methods without secondary clustering provide a ranked gene list of samples. (3) The ranked list can be used to perform data compression and accurate supervised classification of regions. While the paper presents potentially useful results, the main contribution of the work is unclear. It would be very useful for the paper to have a single coherent message of how its observations could help the community assess/better analyze the new spatial transcriptomics datasets. 
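For readers unfamiliar with the scores mentioned in this summary, the following is a minimal sketch of the Adjusted Rand Index (ARI) between ground truth labels and cluster labels, written from the standard pair-counting definition. It is not the authors' code; the function name and toy label lists are illustrative assumptions. AMI is not implemented here; in practice both scores are usually taken from a library, for example `adjusted_rand_score` and `adjusted_mutual_info_score` in `sklearn.metrics`.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI from pair counts: 1.0 for identical partitions, about 0 for random ones."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0                                   # fewer than two samples
    # Pairs of samples that share a cell, a true class, or a predicted cluster.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs     # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0                                   # degenerate case: trivial partitions
    return (sum_cells - expected) / (max_index - expected)

# Toy example: three "regions" of three "voxels" each (hypothetical labels).
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
relabeled = [2, 2, 2, 0, 0, 0, 1, 1, 1]   # same partition, different cluster ids
merged = [0, 0, 0, 0, 0, 0, 1, 1, 1]      # two regions merged into one cluster

ari_relabeled = adjusted_rand_index(truth, relabeled)
ari_merged = adjusted_rand_index(truth, merged)
print(ari_relabeled, ari_merged)
```

Because the score depends only on which samples are grouped together, it is unaffected by how cluster ids are numbered, and it drops when the clustering uses fewer groups than the annotation, as in the merged example.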
Specifically, addressing the following questions/suggestions: Major comments: Have unsupervised representation learning methods never been applied to or studied for the AMBA dataset? If so, why is that the case and how does this paper address it? The paper focuses on only one dataset, making it hard to assess the robustness of these methods. All of the results and conclusions were dependent on the availability of ground truth data. Can the paper include some other datasets to see if similar conclusions can be drawn if the ground truth is hidden? For example, it seems the hyperparameters for different algorithms have been selected using the performance on ground truth labels. How will these be selected for other datasets? One experiment could be to split the existing dataset into different sets and see if the hyperparameters selected for one set generalize to another. It is important to answer how can SFt (the proposed method) generalize to other similar datasets and be useful for the community? How do the applied methods in the paper help fix the drawback “Initial dimensionality reduction filters for high variance global trends over localized features”? I am assuming this was the reason to exclude tSNE and other methods mentioned earlier in the comparison. “ AMI scores peaked well below the 574 labeled brain regions, with most fitting optimally at ~200 clusters” - what does it imply for the applicability of these methods in the domain? How is the feature selection done using the sparse methods affected by the correlation structures in the data? Would these methods suffer from identifiability issues due to highly correlated genes? For data compression, how was the threshold selected to filter out the genes at each step? Why was the SFt step run multiple times? Was the original gene list analyzed with different thresholding? 
What is the significance of the other genes reported in the list of 12 highest weighted genes that are not associated with the brain region (highlighted in orange)? For example, what is the role of Pip5k1a that appears for 2 different regions? Minor comments: The paper could be revised to highlight the main conclusions and contributions of the work. The method section could be revised to give a brief description of all the methods and their associated hyperparameters (with tested ranges). Table 1 could be better formatted and maybe have the relevant genes in bold text. Some figures have empty white spaces on the sides. A lot of observations in the paper are simply reported and a follow-up discussion about their potential reasons and implications could improve the narrative. Does the logistic regression task have any class label imbalance? If so, AUPRC score might be worth observing as well. ********** 6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Michael Hawrylycz Reviewer #2: No ********** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. 
To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
| Revision 1 |
|
Sparse Representation Learning Derives Biological Features with Explicit Gene Weights from the Allen Mouse Brain Atlas PONE-D-22-12274R1 Dear Dr. Bartelle, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Gennady S. Cymbalyuk, Ph.D. Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. 
Reviewer #1: All comments have been addressed Reviewer #3: (No Response) ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #3: Partly ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #3: N/A ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #3: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #3: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. 
You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The authors have addressed all of my questions and concerns and provided much additional information related to other reviewers’ critiques. Reviewer #3: This is a would-be interesting paper studying the contributions of individual genes to some feature using unsupervised learning methods, with the spatial transcriptomic data and anatomical labels of the Allen Mouse Brain Atlas as a test dataset. In the introduction, the authors mention that some other methods have weaknesses and suggest that the methods they use are better. I would not look at these methods as competitors. They are just different tools; it is better to take advantage of each of them rather than using just the one that currently gives the best solution. For example, to find which genes represent major features, it would be better to use methods that are less sensitive to rare genes. But additionally, one could study how combinations of rare or low-expressed genes could be related to other features. It would be interesting to compare Table 1 with the results of other methods. pp. 4-5. The authors wrote: “Initial dimensionality reduction filters for high variance global trends over localized features, offering low sensitivity to rare or low expressing genes (Torgerson, 1952).” This is not a weakness of the method but rather a weakness of the dataset, as the authors’ approach offers low sensitivity to features that are less represented in the dataset used. The authors wrote: “the contribution of any one gene to a cluster is not explicit”. Actually, the approach suggested by the authors, like any other approach based on statistics, is not explicit either. The Methods section should come before the Results and Discussion. Unreadable characters on page 16, Figure 4, pages 39-41, 45 ********** 7. 
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Michael Hawrylycz Reviewer #3: No ********** |
| Formally Accepted |
|
PONE-D-22-12274R1 Sparse Representation Learning Derives Biological Features with Explicit Gene Weights from the Allen Mouse Brain Atlas Dear Dr. Bartelle: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Gennady S. Cymbalyuk Academic Editor PLOS ONE |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.