Supervised deep learning with gene functional annotation for cell classification

Zhexiao Lin; Yuanyuan Gao; Wei Sun

doi:10.1371/journal.pcbi.1014327

Peer Review History

Original Submission
January 27, 2026
1 Mar 2026 Decision Letter - Peng Wei, Editor
PCOMPBIOL-D-26-00188
Supervised deep learning with gene annotation for cell classification
PLOS Computational Biology
Dear Dr. Sun,
Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please submit your revised manuscript by May 01 2026 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
* A letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.
* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
We look forward to receiving your revised manuscript.
Kind regards,
Peng Wei, Ph.D.
Academic Editor
PLOS Computational Biology
Marc Birtwistle
Section Editor
PLOS Computational Biology
Additional Editor Comments: Your manuscript has been assessed by our reviewers who are experts in the field. Based upon these reviews, we would be willing to consider a revision. You would need to compare and evaluate the proposed method with both standard benchmark methods and more recently proposed methods such as scNET. In addition, please address reviewers’ comments on clarification of the primary objective, filtering prior to GNN, and interpretation of real data analysis results.
Journal Requirements:
If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.
1) Please ensure that the CRediT author contributions listed for every co-author are completed accurately and in full. At this stage, the following authors require contributions: Yuanyuan Gao, Zhexiao Lin, and Wei Sun. Please ensure that the full contributions of each author are acknowledged in the "Add/Edit/Remove Authors" section of our submission form. The list of CRediT author contributions may be found here: https://journals.plos.org/ploscompbiol/s/authorship#loc-author-contributions
2) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type ‘LaTeX Source File’ and leave your .pdf version as the item type ‘Manuscript’.
3) We have noticed that you have uploaded Supporting Information files, but you have not included a list of legends. Please add a full list of legends for your Supporting Information files after the references list.
4) Please amend your detailed Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published.
   1) State the initials, alongside each funding source, of each author to receive each grant. For example: "This work was supported by the National Institutes of Health (####### to AM; ###### to CJ) and the National Science Foundation (###### to AM)."
   2) State what role the funders took in the study. If the funders had no role in your study, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."
   3) If any authors received a salary from any of your funders, please state which authors and which funders. If you did not receive any funding for this study, please simply state: “The authors received no specific funding for this work.”
5) Please send a completed 'Competing Interests' statement, including any COIs declared by your co-authors. If you have no competing interests to declare, please state "The authors have declared that no competing interests exist". Otherwise please declare all competing interests beginning with the statement "I have read the journal's policy and the authors of this manuscript have the following competing interests".
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Authors: Please note here if the review is uploaded as an attachment.
Reviewer #1: The manuscript introduces Supervised Deep learning with gene ANnotation (SDAN), a graph neural network (GNN)-based framework designed for the supervised classification of cells and individuals using single-cell RNA-sequencing (scRNA-seq) data.
The core motivation is to move beyond traditional gene-by-gene differential expression (DE) analysis, which often yields long, uninterpretable lists of genes with small effect sizes, toward identifying functionally coherent gene sets that directly optimize classification performance. SDAN integrates gene expression profiles with gene-gene interaction networks (e.g., BioGRID) and employs a graph pooling operation to cluster genes into latent components. These components serve as "gene sets" used to classify cell states (e.g., severe vs. mild COVID-19) and are subsequently aggregated to provide individual-level clinical predictions.
My comments are listed below:
Major:
1. SDAN has a similar design to scNET (Ron Sheinin et al. Nature Methods. 2025): both use scRNA-seq data and gene-gene interaction as input into a graph neural network for better functional annotation of scRNA-seq across cell types or biological conditions. The authors should compare scNET with SDAN, and discuss the strengths and weaknesses of each method.
2. The authors state that they pre-select marker genes using a one-sided Mann-Whitney U test to facilitate training. While this reduces computational overhead, it may introduce a "winner's curse" where the model only builds gene sets around genes that already show strong individual separation. A key promise of GNNs is the ability to pick up on subtle, coordinated signals from connected nodes. The authors should discuss whether SDAN can identify meaningful gene sets if pre-selection is relaxed, or provide a sensitivity analysis on how the pre-selection threshold affects the discovery of novel biology.
3. While the source code for SDAN is provided on GitHub, I encountered difficulties in creating a functional runtime environment due to unresolved dependency conflicts.
To ensure the method is accessible to the broader research community, I strongly recommend that the authors distribute SDAN as a formal Python package (e.g., installable via pip or conda). At a minimum, the authors must provide a comprehensive requirements.txt or environment.yml file with strictly pinned versions for all dependencies to ensure long-term reproducibility.
Minor:
1. The term “gene annotation” is inappropriately used. According to Ensembl, gene annotation is the plotting of genes onto genome assemblies, and indexing their genomic coordinates (http://useast.ensembl.org/info/genome/genebuild/index.html). In the manuscript, it would be better described as “gene-gene interaction”, “protein-protein interaction”, or similar terms.
2. For figure 5, “Hoyer sparsity” is described in the legend, but “hyper sparsity” is shown in the figure. Are they the same thing? In addition, the evaluation metrics, including Hoyer sparsity and quantiles of edges per component, should be clearly defined in the main Methods section, with an explanation of why these metrics indicate better performance.
3. The authors compared SDAN, Spectra, and sciRED across multiple datasets. It would be beneficial to include a direct comparison of the "functionally coherent" gene sets found by each method for the same biological process (e.g., show the gene weights of a biological pathway known to be involved in COVID-19, dementia or immunotherapy in all three methods) to demonstrate the "unambiguous assignment" claimed in the abstract.
Reviewer #2: Please see the attachment for review.
Reviewer #3: This study presents SDAN, a computational framework which incorporates a graph neural network to learn gene set assignments for the classification of cell-level and patient-level metadata.
While the manuscript evaluates SDAN on three datasets and benchmarks it against two existing methods, the presentation of the results could be strengthened to more clearly highlight the model’s novelty and its performance advantages. Below are my detailed comments.
Major Comments:
1. The main motivation of this work is to solve the effect size issue of differential expression (DE) analysis arising from the large sample size typical of scRNA-seq data, where even genes with minimal biological effect can achieve extremely small p-values. To mitigate this, the study proposes learning gene set assignments based on the top DE genes identified. While this is an interesting direction, it would be helpful to more clearly explain how this strategy substantively mitigates the stated issue. Specifically, because genes are initially selected based on statistical significance, genes with negligible biological relevance may still be maintained. Conversely, genes that do not meet the initial DE threshold are excluded entirely and cannot be reconsidered later, potentially introducing an irreversible selection bias. It would strengthen the manuscript to provide additional justification or empirical evidence demonstrating that the proposed framework meaningfully alleviates the effect size concern.
2. Figure 2C&F, 3A-B & C-D show the cell-level and individual-level precision score respectively. However, the x-axis and y-axis of Figures 2F and 3A&B are not labeled, which makes interpretation difficult. In addition, only the distribution of prediction scores is shown, while the classification performance is not presented. It would be more informative to report the classification metrics, such as AUC score, of SDAN and compare them to the benchmarking methods.
3. Although the study reports gene set enrichment analysis results to demonstrate the biological relevance of the identified gene sets, it remains unclear how these gene sets are intended to replace or improve upon traditional DE analysis.
The manuscript would benefit from a clearer explanation of how gene set–level outputs provide comparable or superior interpretability relative to standard DE results. Additionally, some gene sets may contribute to both classes. For example, mild and severe conditions of COVID-19 may share similar pathways, but the degree of contribution can be different. Clarifying how SDAN captures and quantifies such differential contributions would enhance the interpretability claims of the framework.
4. An interesting aspect of SDAN is that the learned mapping matrix S is cell independent. In principle, this suggests that the model could be applied to other datasets that share the same gene measurements. Beyond evaluating performance through a train–test split within the same dataset, it would be valuable to assess the generalizability of SDAN on more biologically heterogeneous datasets, such as across independent studies. Such analysis would provide stronger evidence of the robustness and transferability of the learned representation.
5. In lines 171-173, the manuscript states that “Combining pTau and amyloid beta measurement may give a better characterization of the dementia-i donors (Fig 3(H)), though a larger sample size is needed to confirm this conclusion.” Does the current modeling framework allow for the integration of two or more cell types simultaneously, and if so, how would it be extended to accommodate such multi-cell-type inputs?
Minor comments
1. The gene-gene network is constructed from PPI. It is unclear how this information is mapped or utilized at the gene level in the proposed framework. Clarification on this point would be helpful.
********
Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
******
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
Figure resubmission: While revising your submission, we strongly recommend that you use PLOS’s NAAS tool (https://ngplosjournals.pagemajik.ai/artanalysis) to test your figure files. NAAS can convert your figure files to the TIFF file type and meet basic requirements (such as print size, resolution), or provide you with a report on issues that do not meet our requirements and that NAAS cannot fix.
After uploading your figures to PLOS’s NAAS tool - https://ngplosjournals.pagemajik.ai/artanalysis, NAAS will process the files provided and display the results in the "Uploaded Files" section of the page as the processing is complete. If the uploaded figures meet our requirements (or NAAS is able to fix the files to meet our requirements), the figure will be marked as "fixed" above. If NAAS is unable to fix the files, a red "failed" label will appear above. When NAAS has confirmed that the figure files meet our requirements, please download the file via the download option, and include these NAAS processed figure files when submitting your revised manuscript.
Reproducibility: To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols
Attachments
Submitted filename: Review.pdf
https://doi.org/10.1371/journal.pcbi.1014327.r001
Revision 1
30 Apr 2026 Author Response
Attachments
Submitted filename: SDAN_responses.pdf
https://doi.org/10.1371/journal.pcbi.1014327.r002
12 May 2026 Decision Letter - Peng Wei, Editor
Dear Dr. Sun,
We are pleased to inform you that your manuscript 'Supervised deep learning with gene annotation for cell classification' has been provisionally accepted for publication in PLOS Computational Biology.
Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.
IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.
Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.
Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.
Best regards,
Peng Wei, Ph.D.
Academic Editor
PLOS Computational Biology
Marc Birtwistle
Section Editor
PLOS Computational Biology
*********************************************************
We appreciate the authors' efforts in addressing the reviewers' comments. Reviewer 1 has a minor suggestion regarding the use of "gene function". Please consider changing it if possible.
Reviewer's Responses to Questions
Comments to the Authors: Please note here if the review is uploaded as an attachment.
Reviewer #1: The authors have addressed most of my comments.
I suggest changing “gene annotation” to either “gene functional annotation”, “functional annotation” or “gene-gene interaction” throughout the manuscript depending on the context. This avoids confusion with gene structural annotation, which is also commonly known as “gene annotation” in genetics and bioinformatics studies.
Reviewer #2: The authors have addressed my questions.
Reviewer #3: Thank you for your revised manuscript and detailed response letter addressing my previous comments. I appreciate the efforts made to improve the manuscript. I have no additional comments.
******
Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #2: None
Reviewer #3: Yes
******
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
https://doi.org/10.1371/journal.pcbi.1014327.r003
Formally Accepted
Acceptance Letter - Peng Wei, Editor
PCOMPBIOL-D-26-00188R1
Supervised deep learning with gene annotation for cell classification
Dear Dr Sun,
I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.
The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.
Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.
For Research, Software, and Methods articles, you will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.
Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!
With kind regards,
Anita Estes
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
https://doi.org/10.1371/journal.pcbi.1014327.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.