Towards robust medical machine olfaction: Debiasing GC-MS data enhances prostate cancer diagnosis from urine volatiles

Adan Rotteveel; Wen-Yee Lee; Zoi Kountouri; Nikolas Stefanou; Howard Kivell; Clifford Gluck; Shuguang Zhang; Andreas Mershin

doi:10.1371/journal.pone.0314742

Peer Review History

Original Submission: November 16, 2024
16 Nov 2024 Author Response https://doi.org/10.1371/journal.pone.0314742.r001
14 Feb 2025 Decision Letter - Li Yang, Editor

PONE-D-24-52113
Towards robust machine olfaction: debiasing GC-MS data enhances prostate cancer diagnosis from urine volatiles
PLOS ONE

Dear Dr. Rotteveel,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 31 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Li Yang, M.D.
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following financial disclosure: “startup RealNose.ai has supported part of this project. Co-authors involved in startup: A Rotteveel (as a contractor) C Gluck N Stefanou A Mershin Startup: RealNose Inc realnose.ai The paper was mostly written by startup staff.” Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

3. Thank you for stating the following in the Competing Interests section: “The authors have read the journal’s policy and have the following competing interests: Co-authors Rotteveel, Stefanou, Gluck and Mershin have a financial interest in startup RealNose.ai which has supported part of this project.” Please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials, by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared. Please include your updated Competing Interests statement in your cover letter; we will change the online submission form on your behalf.

4. In the online submission form, you indicated that “The data underlying the results presented in the study are available from Prof Wen-Yee Lee (contact via email wylee@utep.edu)”. All PLOS journals now require all data underlying the findings described in their manuscript to be freely available to other researchers, either (1) in a public repository, (2) within the manuscript itself, or (3) uploaded as supplementary information. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If your data cannot be made publicly available for ethical or legal reasons (e.g., public availability would compromise patient privacy), please explain your reasons on resubmission and your exemption request will be escalated for approval.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Partly
Reviewer #3: Yes
Reviewer #4: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: No
Reviewer #3: Yes
Reviewer #4: No

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: No
Reviewer #3: Yes
Reviewer #4: No

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This is amazing work. Although, as the limitations rightly acknowledge, the true mimicry of the original concept may be too far off at this point, the attempt itself to apply both MS and ML to olfactory senses in diagnosing PCa is a brilliant idea and fully worth pursuing in the future. However, I do recommend some additional corrections to improve readability.

1. Workflow of data. As with most ML papers, a comprehensive workflow of how the data was processed would improve understanding. A diagram depicting the 4th paragraph of the methods would help improve understanding.

2. CNN for MS signals? My understanding of the methods, thus, may be unclear, but is it correct for me to understand that you have processed the MS signals through ResNet as images? While GC-MS outputs are, in some way, two-dimensional, they could hardly be considered an 'image' that should/could be processed through a CNN, as the two-dimensional array is only something based on charge/size and doesn't convey any comprehensive information of individual identities of the molecules in the array. For instance, wouldn't a configuration of similarly distributed charge/size molecules, but in an entirely unrelated molecular composition, be deemed similar? TL;DR: Please help me understand the wisdom behind applying two-dimensional image analysis to a molecular array.

3. Small sample. Primarily as a urologic surgeon, I am more curious as to why the authors chose to use such a diverse patient group. Limiting them to a limited Gleason Score group might have homogenized the characteristics better. 400 or so samples are too few to train a model. Could you offer a reply? Thank you.

Reviewer #2: This study proposes an innovative approach to enhance machine olfaction for prostate cancer (PCa) diagnosis using gas chromatography-mass spectrometry (GC-MS) data. The authors introduce a novel debiasing method to address source-related biases and employ machine learning techniques to classify urinary samples. The topic is highly relevant, especially given the need for non-invasive diagnostic tools for PCa. The manuscript is well-structured and clearly presents the motivation, methodology, and results. However, there are several areas where the manuscript can be improved to meet the standards of a high-impact international journal. These areas relate to methodological transparency, scientific rigor, and presentation.

1. The introduction is well-written but could benefit from a stronger emphasis on the novelty of the "scent character" approach.

2. Provide more detail on the configuration of the Empirical Bayes debiasing technique, such as the exact distributions used and how they were validated.

3. Elaborate on how hyperparameters for the convolutional neural network (CNN) and ResNet18 were selected. Was there any hyperparameter tuning process?

4. Clarify the rationale for transforming GC-MS data into 3D images instead of other approaches, such as feature extraction or embeddings.

5. Provide access to the Python code or pseudo-code of the debiasing pipeline to ensure reproducibility.

6. Provide a brief explanation of how the “emergent scent character” relates to standard biomarkers and how it advances existing methodologies.

7. Potential data imbalance and demographic bias could influence the model's performance. Quantify and report the impact of these imbalances on model performance using metrics like accuracy, precision, recall, and F1.

8. Provide a confusion matrix and classification report, including sensitivity, specificity, and AUC, for a complete evaluation of model performance.

9. Improve the clarity of figures by adding descriptive legends and using distinct colors for different categories.

Reviewer #3:

1) The abstract does not express the novelty of the proposed approach. The whole abstract is not impressive and needs to be rewritten. It should focus on what the problem is, why it is important to solve, how it is solved, and what the findings are.

2) The introduction section requires reorganization and is missing the novelty of the proposed approach; hence, the author is advised to rewrite the introduction section.

3) The author is requested to include a literature review section within the manuscript, and there should be some lines of text between each main heading and subheading. This rule should be followed throughout the paper.

4) Data characteristics should be placed in the experiments or results section. In the meantime, the methodology section needs revision and the author's major attention.

5) The results section needs major revision so that it demonstrates the training as well as the testing results, and discusses the impact of each considered parameter in detail; this would improve the validity of the approach and the quality of the manuscript.

Reviewer #4: The research is inspiring. Suggest it be accepted. However, improvement is needed. For example:

Readability:

(1) Prostate cancer (PCa) is the second most frequent cancer globally in men in 2022, while being the first most frequent cancer for men in 118 countries

(2) Conventional prostate-specific antigen (PSA) blood tests and digital rectal examinations (DRE) are widely used at the initial detection stage of prostate cancer, when combined, the probability of prostate cancer evading detection, when the results are within normal levels, is only 10% [5].

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jin Wook Kim
Reviewer #2: No
Reviewer #3: No
Reviewer #4: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

https://doi.org/10.1371/journal.pone.0314742.r002
Revision 1
4 Mar 2025 Author Response

Summary of major changes based on all four reviewers’ feedback

Stronger Emphasis on Novelty: Clearly articulated how our scent character approach moves beyond traditional VOC-based biomarker identification by creating an olfactory fingerprint in perceptual space (a Synesthetic Memory Object representing scent character and its recording) instead of relying on molecular composition as a proxy for scent character.

Justification for Machine Vision Techniques Applied to Machine Olfaction: Expanded our discussion to explain why GC-MS chromatograms were transformed into structured images, leveraging CNNs for feature extraction, akin to perceptual olfaction.

Expanded Literature Review: Added references to key machine olfaction, VOC biomarker research, and CNN applications in mass spectrometry, contextualizing our contributions.

Detailed Explanation of Empirical Bayes Debiasing: Specified the exact statistical distributions used and provided validation metrics in the Methods section.

Clarification of Model Training & Hyperparameters: Described the grid search hyperparameter tuning process and reported validation results in supplementary material.

Bias Quantification & Model Evaluation: Provided confusion matrices, precision-recall metrics, and F1 scores to assess data imbalance and potential demographic bias.

Improved Readability & Figures: Revised multiple sections for clarity, added two new figures, enhanced figure legends, and improved category differentiation using symbols and colors.

One-By-One Responses to All Reviewers’ Comments

Reviewer #1

1. Workflow of data. As with most ML papers, a comprehensive workflow of how the data was processed would improve understanding. A diagram depicting the 4th paragraph of the methods would help improve understanding.

We thank Reviewer #1 for this suggestion; they rightly ask for a clearer visual representation of our data processing workflow. To enhance clarity, we have now added a workflow diagram to the Methods section (see Figure 4) summarizing the key steps from raw GC-MS data to the final classification model.

2. CNN for MS signals? My understanding of the methods, thus, may be unclear, but is it correct for me to understand that you have processed the MS signals through ResNet as images? While GC-MS outputs are, in some way, 2-dimensional, it could hardly be considered an 'image' that should/could be processed through a CNN, as the 2-dimensional array is only something based on charge/size and doesn't convey any comprehensive information of individual identities of the molecules in the array. For instance, wouldn't a configuration of similarly distributed charge/size molecules, but in an entirely unrelated molecular composition, be deemed similar?

Reviewer #1’s perspective here is important to address. We acknowledge that GC-MS data differ from conventional images in many ways; however, we transform the data into a 2D representation in order to exploit the well-established feature extraction techniques of computer vision, which is at its core a source-agnostic methodology. To an algorithm, electrical current amplitudes recorded over time across different sensors, pixels, or frequency bins are still just a time series of data points, and adding more rows only adds dimensionality to the signal. Most image-manipulation techniques can be used in machine olfaction once the data are transformed, although not all of them would necessarily be useful to us here. We saw CNNs as suitable in this case because we can expect them to capture meaningful structural patterns in the chromatograms, akin to how CNNs detect edges and textures in visual data. The key motivation is that the emergent scent character is not a direct mapping of molecular identity or physicochemical parameters but rather a more holistic representation of the data in perceptual space (PubMed).

While two unrelated molecular compositions could have similar charge/size distributions, our approach includes domain-adapted pre-processing techniques to preserve diagnostically relevant signals that might not easily correspond to any one recipe of molecules in analytical chemistry space or charge/mass space, since those relationships do not guide olfactory receptor responses. The physics of the receptor-ligand interactions is non-obvious. In crystallography-determined protein structures, the proteins sit in a crystal, largely unmoving and not in their natural state, which is membrane-bound, with aqueous intracellular and extracellular environments in contact with a continuously moving set of five extracellular loops. In biological olfaction, by contrast, the membrane protein receptors are labile and are constantly bombarded by thermal noise at 37 °C, meaning that their interactions with ligands, both specific and non-specific, are highly dynamic. In other words, knowing the molecular composition is not the same thing as knowing what something smells of. The two are related, but not by a one-to-one function.

A closer analogy is the relation of photon wavelength to color perception. A photon is first recorded by the photoisomerization of a retinol molecule, held as an antenna for light with peak absorption determined by the bioprogrammable sequence of amino acids in the rhodopsin GPCR expressed in the membrane of retinal cone cells (a tellingly similar system to the GPCRs involved in olfaction, the difference being that the role of retinol is played by the odorant). That photon might sit right in the middle of, say, the red absorption peak at 633 nm, delivered by a laser to a human eye, but registering “red” in the human mind is not at all guaranteed by the wavelength being what we have designated as the very middle of “red”. The color that is actually experienced depends on context (such as total intensity of light and contrast between shapes surrounding the 633 nm stimulus), and both historical and immediate prior experience (such as photobleaching) play a role in whether one reports “red”, “green”, or “gray”. The same holds for shapes and sounds: a cartoon version of the Mona Lisa or an off-key version of Beethoven's Fifth may share not a single brushstroke or frequency with the original, yet the pieces are instantly recognizable in their transformed form. A similar situation exists with scent. In the world of scent perception you can write “banana” using one molecule or many; the scent character can be recreated in many molecular “dialects” using many different molecules at different concentrations.

Reviewer #1 makes the point well that this approach needs clarification, so we have added a version of the above justification to the methodology section, explaining why CNNs, particularly ResNet18, were chosen.

3. Small sample. Primarily as a urologic surgeon, I am more curious as to why the authors chose to use such a diverse patient group. Limiting them to a limited Gleason Score group might have homogenized the characteristics better. 400 or so samples are too few to train a model. Could you offer a reply?

This is an excellent point, and we now more explicitly acknowledge the limitations of sample size in machine learning. However, the goal of our study was not purely to train a deployable diagnostic classifier. We were primarily interested in testing the feasibility of an emergent scent character approach, and in doing so we stumbled upon havoc-causing bias that we then had to learn to remove efficiently while keeping the useful signal intact. That is the main thrust behind the innovation here. Our dataset was intentionally chosen to be diverse to assess whether patterns in volatile compounds generalize across different Gleason score groups.
If we had restricted the dataset, we might have overfitted to a narrower spectrum of cases, limiting the model’s broader applicability. That said, we agree that stratifying the dataset by Gleason score could provide additional insights, and to this end we have included a short analysis in the results section breaking down model performance by Gleason score group.

Reviewer #2

1. The introduction is well-written but could benefit from a stronger emphasis on the novelty of the "scent character" approach.

We thank Reviewer #2 for this suggestion. We have thoroughly revised the introduction to better highlight the novelty of our approach, specifically how the emergent scent character framework moves beyond traditional molecular biomarkers by creating a holistic olfactory fingerprint: a “scent character” rather than a list of compounds. We have also addressed a similar point raised by Reviewer #1 in the methodology, combined the responses to “what is novel” and “why machine vision” into the same discussion, and expanded on these concepts elsewhere in the body of the paper.

2. Provide more detail on the configuration of the Empirical Bayes debiasing technique, such as the exact distributions used and how they were validated.

We welcome the opportunity to provide additional transparency, as our goal is for others to reproduce and build upon the findings here. In response to this comment we have added details on the priors used (gamma distribution for multiplicative batch effects, normal distribution for mean shifts) and now provide validation metrics for the bias removal step in the Methods section.

3. Elaborate on how hyperparameters for the convolutional neural network (CNN) and ResNet18 were selected. Was there any hyperparameter tuning process?

As Reviewer #2 anticipates, there was a tuning process: we selected hyperparameters via grid search with cross-validation. We have now added to the supplementary information a section outlining the hyperparameter space explored (learning rate, optimizer, batch size).

4. Clarify the rationale for transforming GC-MS data into 3D images instead of other approaches, such as feature extraction or embeddings.

We agree that alternative approaches, such as feature extraction, embeddings, or others such as reverse autostereography, could have been used, and we hope to explore these and many more in future work. The scope of the current paper was limited, so our rationale for choosing the image transformation modality was mainly immediacy of implementation and ease of use by others wishing to copy or expand upon our work. This reasoning led us to leverage well-established CNN architectures while avoiding premature assumptions about which features are most relevant. We have now clarified this choice in the text by explaining that other modalities can and should be explored as future directions. We do not claim to have exhausted the potential of the available tools.

5. Provide access to the Python code or pseudocode of the debiasing pipeline to ensure reproducibility.

We apologize for this omission and have now provided a pseudo-code representation of our debiasing pipeline in the supplementary information.

6. Provide a brief explanation of how the “emergent scent character” relates to standard biomarkers and how it advances existing methodologies.

We have added a discussion section explicitly comparing our approach to traditional biomarker-based diagnostics. It emphasizes that dogs do not classify based on molecular lists but on holistic scent characters that survive changes to the volatilome of urine, and it describes how our thinking was influenced by the analogy to how melodies and images are stored and encoded by brains, as opposed to how lists of names and numbers are encoded.

7. Potential data imbalance and demographic bias could influence the model's performance. Quantify and report the impact of these imbalances on model performance using metrics like accuracy, precision, recall, and F1.

Reviewer #2 makes an important point here and we thank them for highlighting this oversight on our part. We have now included an appropriate confusion matrix, precision-recall metrics, and F1 scores in the results section.

Reviewer #3

1) The abstract is not expressing the novelty of the proposed approach. The whole abstract is not impressive and needs to be rewritten.

We appreciate this feedback from Reviewer #3, which echoes others’ comments. We have taken it to heart and completely rewritten not just the abstract but also the entire introduction and significant parts of the rest of the text, to better highlight the novelty of the scent character approach and its departure from conventional molecular biomarker methods.

2) Introduction section requires reorganization and is missing the novelty of the proposed approach.

In response, we have rewritten the introduction to clearly state the gap in current GC-MS-based PCa detection and to explain how our approach offers an interesting avenue to explore towards addressing the problems of current methods.

3) The author is requested to include a literature review section.

We have now expanded the previously too brief literature review in our manuscript to include key studies on machine olfaction, volatile organic compound (VOC) biomarkers, and the application of convolutional neural networks (CNNs) in mass spectrometry. These additions provide a comprehensive context for our research and highlight the advancements in these fields.

Machine Olfaction and VOC Biomarkers: The integration of canine olfaction with chemical and microbial profiling has shown promise in detecting lethal prostate cancer through urinary VOC analysis. Guest et al. (2021) demonstrated the feasibility of this approach, indicating its potential for non-invasive diagnostics (PLOS). That study, together with subsequent work comparing dogs to machine olfactors (PubMed), forms the core of our motivation for this work. Additionally, Warli et al. (2023) conducted a systematic review and meta-analysis on the olfactory ability of medical detection canines to identify prostate cancer from urine samples. Their findings support the potential of VOC profiling in non-invasive cancer detection (WJON).

Past applications of CNNs to mass spectrometry: Deep learning techniques, particularly CNNs, have significantly advanced the analysis of mass spectrometry data. For instance, Wang et al. (2020) introduced MSpectraAI, a platform using deep neural networks to analyze proteome profiles from mass spectrometry data across multiple tumor types, achieving high prediction accuracy (BMC Bioinformatics). Furthermore, Hu et al. (2022) developed a self-supervised clustering approach using contrastive learning to analyze mass spectrometry imaging data. This method effectively identifies molecular colocalizations without manual annotations, enhancing the understanding of biochemical pathways (RSC Publishing).

These studies collectively underscore the advancements in machine olfaction-adjacent methods, such as VOC biomarker research and the application of CNNs in mass spectrometry, and provide the literature foundation for the work discussed here.

Reviewer #4

The research is inspiring. Suggest it be accepted. However, improvement is needed.

We thank Reviewer #4 for their enthusiasm and for the actionable, positive feedback. We appreciate your support for this work! We too feel this is only the beginning!

(1) Readability: Certain sentences need restructuring for clarity.

We have taken this and the other reviewers’ critique of our readability to heart and revamped our prose (none of us are native English speakers, but we feel the new version is much improved and it has now passed our readability level check).

(2) Improve clarity of figures by adding descriptive legends and using distinct colors for different categories.

We have enhanced the figure legends and added two figures to better explain our logic and process. We have paid particular attention to making the different categories sufficiently distinguishable to the reader by using symbols in addition to colors.

Attachment
Submitted filename: PLoS One Debiaser GC-MS rebuttal letter-2.pdf

https://doi.org/10.1371/journal.pone.0314742.r003
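For readers unfamiliar with the normal prior on mean shifts mentioned in the response to Reviewer #2 (comment 2), the following is a minimal sketch of how such shrinkage can work. It is not the authors' pipeline, whose pseudo-code is in their supplementary information. It assumes a samples-by-features intensity matrix and integer batch labels, handles only the additive (mean-shift) part, and omits the multiplicative part entirely. The function name `shrink_batch_means` and the synthetic data are hypothetical.

```python
import numpy as np


def shrink_batch_means(X, batch):
    """Remove per-batch mean shifts using normal-normal empirical Bayes shrinkage.

    X: (n_samples, n_features) array; batch: (n_samples,) batch labels.
    Assumes at least two samples per batch and at least two features.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)

    # Standardize each feature so shifts are comparable across features.
    grand_mean = X.mean(axis=0)
    pooled_sd = X.std(axis=0, ddof=1)
    pooled_sd = np.where(pooled_sd > 0, pooled_sd, 1.0)
    Z = (X - grand_mean) / pooled_sd

    Z_adj = Z.copy()
    for b in np.unique(batch):
        idx = batch == b
        n_b = int(idx.sum())
        gamma_hat = Z[idx].mean(axis=0)      # observed shift, one per feature
        delta2 = Z[idx].var(axis=0, ddof=1)  # within-batch variance per feature
        gamma_bar = gamma_hat.mean()         # prior mean, estimated across features
        tau2 = gamma_hat.var(ddof=1)         # prior variance, estimated across features

        # Posterior mean of the shift: a precision-weighted blend of the
        # per-feature estimate and the prior mean shared by the batch.
        denom = n_b * tau2 + delta2
        weight = np.divide(n_b * tau2, denom,
                           out=np.ones_like(denom), where=denom > 0)
        gamma_star = weight * gamma_hat + (1.0 - weight) * gamma_bar
        Z_adj[idx] = Z[idx] - gamma_star

    # Map back to the original intensity scale.
    return Z_adj * pooled_sd + grand_mean


# Synthetic illustration: two batches, the second offset by a constant.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 50))
batch = np.repeat([0, 1], 20)
X[batch == 1] += 2.0
X_adj = shrink_batch_means(X, batch)

gap_before = np.abs(X[batch == 0].mean(axis=0) - X[batch == 1].mean(axis=0)).mean()
gap_after = np.abs(X_adj[batch == 0].mean(axis=0) - X_adj[batch == 1].mean(axis=0)).mean()
print(gap_before, gap_after)
```

The shrinkage pulls noisy per-feature shift estimates toward the batch-wide average, which is the usual reason to prefer an empirical Bayes estimate over plain per-batch centering when batches are small. Any real use would also need to protect the diagnostic signal, for example by modelling the class label as a covariate, which this sketch does not do.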
13 Apr 2025 Decision Letter - Li Yang, Editor PONE-D-24-52113R1Towards robust machine olfaction: debiasing GC-MS data enhances prostate cancer diagnosis from urine volatilesPLOS ONE Dear Dr. Rotteveel, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by May 28 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. 
For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.
We look forward to receiving your revised manuscript.
Kind regards,
Li Yang, M.D.
Academic Editor
PLOS ONE
Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #2: All comments have been addressed
Reviewer #3: All comments have been addressed
Reviewer #4: All comments have been addressed
********
2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions.
Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #2: Yes Reviewer #3: Yes Reviewer #4: Yes ****** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #2: Yes Reviewer #3: Yes Reviewer #4: Yes ****** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #2: Yes Reviewer #3: Yes Reviewer #4: Yes ****** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #2: Yes Reviewer #3: Yes Reviewer #4: Yes ****** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #2: The authors have satisfactorily addressed all of the concerns and suggestions raised in my previous review. 
The revised manuscript demonstrates a clear improvement in response to the feedback, and the authors have provided adequate explanations for all modifications made.
Reviewer #3: (No Response)
Reviewer #4: All the concerns have been accepted. Suggest it be accepted. However, pls note that some descriptions can be optimized. One example: Various studies have identified different lists of VOCs that all show a significant correlation with cancer in their specific datasets, yet these sets of VOCs often differ completely drastically and do not generalize from one study to another.
******
7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #2: No
Reviewer #3: No
Reviewer #4: No
********
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
https://doi.org/10.1371/journal.pone.0314742.r004
Revision 2
15 Apr 2025 Author Response
Dear Dr. Li,
Thank you for your careful review of our manuscript. In response to your comments regarding the corrections needed before publication, specifically the erroneously referenced papers, we have made the following revisions:
James et al. (2024): We have updated the reference list to include the erratum concerning the minor figure correction. In the text, we now cite both the original article and the erratum to ensure full transparency.
Luedemann et al. (2022): Recognizing the correction issued in 2023, we have updated the reference list and the citation.
Salinas et al. (2024): For this paper, we have also recognized the erratum and updated both the reference list and the citation.
Yang et al.: We have removed a duplicate reference.
As Reviewer #4 asked, we have also combed through the text for sentences such as the one they highlighted to make sure they are optimized.
Thank you all again. We are eager to see our work published as open-access material for all to see in PLoS One.
On behalf of the co-authors and with much appreciation for you, the reviewers, and PLoS One,
Attachments
Attachment Submitted filename: #2 PLoS One Debiaser GC-MS rebuttal letter.pdf
https://doi.org/10.1371/journal.pone.0314742.r005
16 Apr 2025 Decision Letter - Li Yang, Editor
Towards robust machine olfaction: debiasing GC-MS data enhances prostate cancer diagnosis from urine volatiles
PONE-D-24-52113R2
Dear Dr. Rotteveel,
We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the 'Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
Kind regards,
Li Yang, M.D.
Academic Editor
PLOS ONE
Additional Editor Comments (optional): Thanks for the authors' efforts to comprehensively improve your manuscript according to editor's and reviewers' comments. I am pleased to inform you that your paper can be accepted for publication now. Thanks for the chance to assess your interesting and important work.
Additionally, many thanks for all the reviewers' precious inputs. Reviewers' comments: https://doi.org/10.1371/journal.pone.0314742.r006
Formally Accepted
Acceptance Letter - Li Yang, Editor
PONE-D-24-52113R2
PLOS ONE
Dear Dr. Rotteveel,
I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team. At this stage, our production department will prepare your paper for publication. This includes ensuring the following:
* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset
You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.
Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
If we can help with anything else, please email us at customercare@plos.org.
Thank you for submitting your work to PLOS ONE and supporting open access.
Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Li Yang
Academic Editor
PLOS ONE
https://doi.org/10.1371/journal.pone.0314742.r007

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.