Semi-supervised Bayesian integration of multiple spatial proteomics datasets

Stephen Coleman; Lisa Breckels; Ross F. Waller; Kathryn S. Lilley; Chris Wallace; Oliver M. Crook; Paul D. W. Kirk

doi:10.1371/journal.pcbi.1013799

Peer Review History

Original Submission January 17, 2025
7 May 2025 Decision Letter - John Parkinson, Editor, Mark Alber, Editor
PCOMPBIOL-D-25-00095
Semi-supervised Bayesian integration of multiple spatial proteomics datasets
PLOS Computational Biology
Dear Dr. Coleman,
Thank you for submitting your manuscript to PLOS Computational Biology. First, I would like to apologise for the delay in the review process; this was largely a function of finding willing and knowledgeable reviewers. As you will see from the reports, both reviewers appreciated the value of your study and recommend only relatively minor edits to improve the clarity of the manuscript, provide additional useful context and promote adoption of your method. Consequently, we invite you to submit a revised version of the manuscript that addresses the excellent points the reviewers raised.
Please submit your revised manuscript within 30 days (Jul 07 2025 11:59PM). If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org.
When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.
* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
We look forward to receiving your revised manuscript.
Kind regards,
John Parkinson
Guest Editor
PLOS Computational Biology
Mark Alber
Section Editor
PLOS Computational Biology
Journal Requirements:
1) Please ensure that the CRediT author contributions listed for every co-author are completed accurately and in full. At this stage, the following authors require contributions: Stephen Coleman, Lisa Breckels, Ross F Waller, Kathryn S Lilley, Chris Wallace, Oliver M Crook, and Paul D.W. Kirk. Please ensure that the full contributions of each author are acknowledged in the "Add/Edit/Remove Authors" section of our submission form. The list of CRediT author contributions may be found here: https://journals.plos.org/ploscompbiol/s/authorship#loc-author-contributions
2) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type 'LaTeX Source File' and leave your .pdf version as the item type 'Manuscript'.
3) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines: https://journals.plos.org/ploscompbiol/s/figures
4) We have noticed that you have uploaded Supporting Information files, but you have not included a list of legends. Please add a full list of legends for your Supporting Information files after the references list.
5) Thank you for stating "All data is available publicly from previous publications".
Please provide a complete Data Availability Statement in the submission form, ensuring you include all necessary access information. If your research concerns data from external sources, please amend your Data Availability Statement to include the full links to the data.
6) Please amend your detailed Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published.
   1) Please clarify all sources of financial support for your study. List the grants, grant numbers, and organizations that funded your study, including funding received from your institution. Please note that suppliers of material support, including research materials, should be recognized in the Acknowledgements section rather than in the Financial Disclosure.
   2) State the initials, alongside each funding source, of each author to receive each grant. For example: "This work was supported by the National Institutes of Health (####### to AM; ###### to CJ) and the National Science Foundation (###### to AM)."
   3) State what role the funders took in the study. If the funders had no role in your study, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."
   4) If any authors received a salary from any of your funders, please state which authors and which funders.
7) Your current Financial Disclosure states, "The author(s) received no specific funding for this work." However, your funding information on the submission form indicates receiving funds. Please ensure that the funders and grant numbers match between the Financial Disclosure field and the Funding Information tab in your submission form. Note that the funders must be provided in the same order in both places as well.
Reviewers' comments:
Reviewer's Responses to Questions
Reviewer #1: The manuscript presents an innovative and well-supported approach to integrating spatial proteomics datasets using Bayesian models. The methodology is rigorous, and the results are compelling. Minor revisions to improve clarity, provide additional justification for certain methodological choices, and better contextualize findings in the broader field would strengthen the manuscript significantly.
Abstract
Lines 20–21: The statement "existing approaches... do not quantify uncertainty" is strong. Consider specifying which methods lack this feature to prevent overgeneralization. Ensure "semi-supervised Bayesian approach" is briefly explained for a broader audience.
Line 24: What types of data?
Introduction
Lines 78–88: Consider restructuring the paragraph that discusses multi-omic integration challenges to emphasize why Bayesian methods are particularly suited for this task. A direct comparison between past integrative methods and the proposed approach would help contextualize the contribution.
Lines 95–97: The sentence “Our approach is applicable beyond MS-based spatial proteomics...” is strong but could be supported with an example outside proteomics.
Materials and Methods
The section introducing Gaussian mixture models (GMM) is mathematically rigorous but slightly dense. Equations (6)–(9) are helpful, but it would be beneficial to provide a brief practical interpretation of each parameter.
Lines 177–197: A brief comparison to other integration models or simpler methods could be included in the Introduction.
Results
Simulation Study
The explanation of different generative models (Gaussian, MVT, Log-Poisson) is well-structured, but an intuitive justification for their selection (e.g., why these specific distributions?) would be helpful. Figure 3 conveys important insights, but the differences between semi-supervised MDI and overfitted semi-supervised MDI should be better explained.
Discussion
The limitations section is somewhat brief. Potential areas to mention: computational costs of the methods, including MCMC for large datasets (big O complexity), and challenges in parameter selection and prior specification.
Reviewer #2: This manuscript by Coleman et al. addresses the problem of finding the subcellular localization of proteins in spatial proteomics experiments. Specifically, “LOPIT” type mass spectrometry experiments involve separating cellular proteins into subcellular fractions through multi-step centrifugation, followed by quantification of each fraction’s TMT profile using MS. Each organelle would have a different profile due to its different density. The computational task then is to find the latent subcellular localization from the TMT profile and potentially estimate uncertainty. This is typically done using supervised or semi-supervised classification approaches. Here, the authors propose a multiple dataset integration framework that incorporates Gaussian process modeling to incorporate different types of auxiliary data (categorical, time-series, etc.). This allows spatial proteomics mass spectrometry data to be combined with other data types (such as prior literature annotations) to improve classification performance.
This work expands on a series of prior papers from the authors. Most notably, Crook et al. PLoS Comp Biol 2018 introduced the Bayesian mixture model approach to analyze TMT fraction profiles, and also introduced the outlier T-distribution to catch non-conforming data (TAGM approach). Crook et al. PLoS Comp Biol 2020 then extended this approach toward semi-supervised discovery of new clusters/localizations. Crook et al. Annals of Applied Statistics 2022 introduced the Gaussian process mixture modeling approach. Breckels et al. PLoS Comp Biol 2016 introduced the KNN-TL classifier to include GO terms in compartment assignment.
Overall this looks to be an excellent paper that addresses an important need.
As the authors stated, the subcellular localization of proteins is a major determinant of their function. How to determine the dynamic localization of proteins in an unbiased and context-specific manner remains an incompletely solved problem. The proposed approach here appears to be robust and well justified. The authors demonstrate performance using both simulated data and existing experimental data sets. The rationale and notations are well explained, and the manuscript is well written. Another major strength is the availability of an R package so other investigators can take advantage of this advance. I have no problem recommending acceptance, and only have the following minor comments. These are not necessary for acceptance but are offered for the authors’ consideration.
Comments:
1. The primary goal of this work, as I understand it, or at least as laid out in the abstract/introduction, is to improve on the inference of protein subcellular localization. This appears to have evolved somewhat as the manuscript progresses to the T. gondii results section. It would be nice to have more details on why the authors opted to incorporate the time-series data as opposed to other types of data (e.g., co-expression or interactome data), and intuitively how this may help with localization assignment. When the protein assignment changes this way (e.g., ERK7 or the dense granule proteins), does it indicate a change in actual subcellular localization, or that different proteins within the same localization can take on different function or temporal behavior?
2. Since the AOAS 2022 paper already described the GP model in some detail, it would be nice to have more clarity on how the current method compares, and also to include the semi-supervised GP model as a comparison along with the TAGM mixture models in benchmarking (e.g., in Figures 3-5). Also, as far as I know, previous methods from the authors already had the capacity to integrate multiple independent sets of LOPIT TMT data.
To show the value of multi-datatype integration, it would be nice to directly compare whether incorporating
3. Did the authors investigate the effect of GO categorical annotation on difficult-to-localize outliers, e.g., if a protein is annotated to be in either the cytosol or the nucleus in GO or UniProt, would the performance gain in protein assignment reflect this by preferentially localizing a protein to those compartments? It would be interesting as well if the authors could comment on whether this integration can help the assignment of differential localization between the two annotated compartments.
4. I was not completely clear on how accuracy and other performance metrics were calculated in the validation study. Were the true positives used to calculate these metrics picked from holdout markers that were then not used to train the supervised/semi-supervised models? As far as I know, training the TAGM mixture model requires 20-30% of proteins to be designated as markers, and how the markers are chosen can have an effect on the classification outcome. Did the authors investigate whether the change in markers affected the TAGM mixture model and the MDI approach differently? (For instance, it is not clear how the use of GO annotations for integration affects marker selection. Is manual marker selection still required for the MDI method, or can literature annotation be used to skip manual markers?)
5. Can the authors explain in greater detail what the phi parameter is and how it is estimated, and how 12, 8, 4 were chosen for the simulation study?
6. A major strength here is the implementation of an R package, but the GitHub repository of MDIr appears to contain little to no documentation. While the repository is not strictly part of the paper, I believe this should be fixed to make the paper more useful to the readers.
Besides documentation and/or a tutorial, it would also be useful to have more discussion on how users may apply the method to their own data, e.g., computational requirements, what type of mass spec data is allowed, how to tune parameters, etc.
7. It would be useful to have additional discussion on how robust the method is to data input (e.g., some existing LOPIT data are rather sparse with low mass spec depth, others may use suboptimal or misannotated markers, etc.).
8. Some typos/formatting errors, e.g., line 37 time-course; supplement lines 56 and 105, missing ref.
********
Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes
******
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
Figure resubmission: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions.
Reproducibility: To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols
https://doi.org/10.1371/journal.pcbi.1013799.r001
Revision 1
4 Nov 2025 Author Response Attachments Attachment Submitted filename: Response to Reviewers.pdf https://doi.org/10.1371/journal.pcbi.1013799.r002
30 Nov 2025 Decision Letter - John Parkinson, Editor, Mark Alber, Editor
Dear Mr Coleman,
We are pleased to inform you that your manuscript 'Semi-supervised Bayesian integration of multiple spatial proteomics datasets' has been provisionally accepted for publication in PLOS Computational Biology.
Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.
IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.
Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.
Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.
Best regards,
Mark Alber, Ph.D.
Section Editor
PLOS Computational Biology
*********************************************************
Reviewer's Responses to Questions
Comments to the Authors: Please note here if the review is uploaded as an attachment.
Reviewer #1: The authors have addressed my concerns.
Reviewer #2: The authors have addressed my previous (minor) comments to my satisfaction. I applaud the authors for this excellent contribution.
******
Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes
******
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
https://doi.org/10.1371/journal.pcbi.1013799.r003
Formally Accepted
Acceptance Letter - John Parkinson, Editor, Mark Alber, Editor
PCOMPBIOL-D-25-00095R1
Semi-supervised Bayesian integration of multiple spatial proteomics datasets
Dear Dr Coleman,
I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.
The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.
Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.
For Research, Software, and Methods articles, you will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.
Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!
With kind regards,
Anita Estes
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
https://doi.org/10.1371/journal.pcbi.1013799.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.