Peer Review History

Original Submission: March 16, 2025
Decision Letter - Nir Ben-Tal, Editor, Arturo Medrano-Soto, Editor

PCOMPBIOL-D-25-00506

AI-Driven Discovery of Novel Extracellular Matrix Biomarkers in Pelvic Organ Prolapse

PLOS Computational Biology

Dear Dr. Mi,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days, by Aug 18 2025 11:59PM. If you need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Arturo Medrano-Soto, Ph.D.

Guest Editor

PLOS Computational Biology

Nir Ben-Tal

Section Editor

PLOS Computational Biology

Journal Requirements:

1) Please ensure that the CRediT author contributions listed for every co-author are completed accurately and in full.

At this stage, the following authors require contributions: Yanlin Mi, Ben Cahill, Venkata VB Yallapragada, Reut Rotem, Barry A O’Reilly, and Sabin Tabirca. Please ensure that the full contributions of each author are acknowledged in the "Add/Edit/Remove Authors" section of our submission form.

The list of CRediT author contributions may be found here: https://journals.plos.org/ploscompbiol/s/authorship#loc-author-contributions

2) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type ‘LaTeX Source File’ and leave your .pdf version as the item type ‘Manuscript’.

3) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines: 

https://journals.plos.org/ploscompbiol/s/figures

4) Some material included in your submission may be copyrighted. According to PLOS’s copyright policy, authors who use figures or other material (e.g., graphics, clipart, maps) from another author or copyright holder must demonstrate or obtain permission to publish this material under the Creative Commons Attribution 4.0 International (CC BY 4.0) License used by PLOS journals. Please closely review the details of PLOS’s copyright requirements here: PLOS Licenses and Copyright. If you need to request permissions from a copyright holder, you may use PLOS's Copyright Content Permission form.

Please respond directly to this email and provide any known details concerning your material's license terms and permissions required for reuse, even if you have not yet obtained copyright permissions or are unsure of your material's copyright compatibility. Once you have responded and addressed all other outstanding technical requirements, you may resubmit your manuscript within Editorial Manager. 

Potential Copyright Issues:

- The following Figure contains a logo or branding: Figure 2. We are not permitted to publish this under our CC-BY 4.0 license, even with permission. We ask that you please remove or replace it.

5) Please ensure that the funders and grant numbers match between the Financial Disclosure field and the Funding Information tab in your submission form. Note that the funders must be provided in the same order in both places as well.

- State the initials, alongside each funding source, of each author to receive each grant. For example: "This work was supported by the National Institutes of Health (####### to AM; ###### to CJ) and the National Science Foundation (###### to AM)."

- State what role the funders took in the study. If the funders had no role in your study, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.".

If you did not receive any funding for this study, please simply state: “The authors received no specific funding for this work.”

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The study introduces the "Extracellular Matrix Protein Predictor" (EPOP), a transfer learning framework designed to leverage protein language models for identifying disease-specific proteins. Specifically, the research focuses on pelvic organ prolapse (POP), a significant condition affecting a large number of women worldwide. The authors aim to demonstrate how AI-driven protein analysis can uncover new therapeutic targets and enhance our understanding of disease mechanisms. They have developed a fine-tuning protocol for a protein language model (ESM-2) and present findings that highlight the utility of machine learning in understanding complex pathologies. Overall, this study appears solid, and I applaud the model the authors have established for implicating pathological protein functions. That said, I have several methodological comments:

The manuscript mentions rigorous manual curation of the positive dataset. However, it lacks specific details on the criteria used for this curation. Clarifying the selection process and the qualifications of the curators would enhance transparency.

The criteria for selecting the negative dataset (46,427 proteins) are not fully explained. While it mentions the use of SignalP and TMHMM to exclude certain proteins, a more detailed rationale for these exclusions would be beneficial. For instance, why were these specific tools chosen, and how do they relate to ECM protein characteristics?

The use of CD-HIT with a 30% sequence identity threshold is a common practice, but the implications of this threshold on the diversity of the dataset should be discussed. A justification for this specific threshold and its impact on model training would provide clarity.

The two-stage transfer learning approach is innovative, but the manuscript should elaborate on the rationale behind the choice of training parameters (e.g., epochs, batch size, learning rate). How were these parameters determined, and were they optimized through preliminary experiments? Providing this information would allow for better reproducibility.

The validation strategy using 20,000 independent sequences is mentioned, but the manuscript does not specify how these sequences were selected or their relevance to the training data. A clearer explanation of the validation set's composition and its relationship to the training data would enhance the robustness of the findings.

The literature-based validation process is mentioned, but the criteria for selecting the curated list of ECM-related proteins should be detailed. How were these proteins identified, and what sources were used?

The manuscript should discuss potential limitations of the interpretability analyses. For instance, how might the findings be affected by the inherent biases of the model or the datasets used?

The manuscript suggests that the EPOP framework could be extended to other ECM-related disorders. However, it should address how the model's findings might generalize to different contexts or populations. Are there specific considerations or adaptations needed for applying this model to other diseases?

Reviewer #2: This paper presents EPOP, a novel AI-based framework that uses fine-tuned protein language models (ESM-2) for predicting ECM-related proteins in the context of pelvic organ prolapse (POP). The manuscript is highly novel, technically strong, well-structured, and written in a clear and engaging style. However, the paper would benefit significantly from addressing the following major and minor issues.

1- 99.7% accuracy is repeatedly emphasized. This is extremely high and may not generalize to real-world datasets. There’s no test on independent, external datasets (e.g., from another tissue, experiment, or species). Include external validation to test generalizability or tone down the performance claims.

2- No wet-lab or histological validation for novel biomarkers (e.g., EMILIN-1). Acknowledge this limitation more clearly in the discussion. Suggest an experimental pipeline for validation.

3- Negative class = "proteins lacking ECM-related GO terms" and filtered by SignalP/TMHMM. This may introduce annotation bias and exclude many unknown ECM-related proteins from the negative set. Add a discussion about potential bias and the implications on classification fairness.

4- A threshold of 0.6 for peptide classification is used without statistical justification.

5- Discussion of novel proteins (e.g., EMILIN-1, G protein subunit beta-1) is too brief and speculative.

6- Was k-fold cross validation used during training? Please specify.
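
For readers unfamiliar with the procedure raised in point 6, the following is a minimal sketch of a stratified k-fold split in plain Python. It illustrates the general technique only and is not the authors' training code; the function name `stratified_kfold`, the fold count, and the toy labels are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k folds.

    Indices of each class are shuffled and dealt round-robin into the
    folds, so every fold keeps roughly the same class ratio as the
    full dataset. Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)

    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Toy labels (assumed, not from the manuscript): 10 positives, 40 negatives.
labels = [1] * 10 + [0] * 40
splits = list(stratified_kfold(labels, k=5))
for train_idx, test_idx in splits:
    n_pos = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), n_pos)
```

Each sequence appears in exactly one held-out fold, so averaging a metric over the folds gives an estimate that does not depend on a single train/test partition.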

Reviewer #3: Comments to the authors:

This study introduces EPOP (Extracellular Matrix Protein Predictor), a deep learning framework specifically designed for ECM protein classification and disease mechanism discovery. The authors apply a sophisticated two-stage transfer learning approach using the ESM-2 protein language model, fine-tuned on large curated datasets comprising over 86,000 proteins. The model integrates attention mechanisms and interpretability modules and is rigorously validated using 20,000 independent sequences and clinical proteomics data, with a focus on pelvic organ prolapse (POP) as a clinically relevant application.

EPOP achieves impressive classification accuracy of 99.7%, significantly outperforming existing Transformer and LSTM architectures. Beyond performance, the model identifies 10 novel disease-associated ECM proteins, including EMILIN-1, and reveals new patterns of ECM remodeling in disease. Importantly, interpretability analyses provide biological insights into sequence motifs and structural features critical to ECM function. Ablation studies further confirm the robustness of the architecture.

The manuscript is well-written and presents a thoughtful approach to identifying POP-associated ECM proteins. The study is methodologically sound, and the results are compelling, with only a few minor clarifications needed for improved transparency.

Minor Comments and Suggestions:

1. There are minor typographical errors in the manuscript at several points, including Figure 4 caption, lines 142, 263, 285, and 305.

2. Threshold Selection Rationale (Line 142): The manuscript mentions a confidence threshold of 0.6 for ECM classification. While this value appears reasonable, the rationale behind its selection is not provided. I recommend specifying whether this threshold was chosen based on a particular statistical criterion, such as ROC curve optimization, Youden's index, or empirical tuning on a validation set.

3. Performance Evaluation of ECM Function Prediction Models (Lines 179-206): The comparative performance analysis of the ECM function prediction models is informative. However, in cases like the LSTM model, where recall is high but precision is low, the inclusion of the F1-score would provide a more balanced view of model performance. Please consider adding this metric to Table 1 or the main text for completeness.

4. Prediction Results of POP-Associated ECM Proteins (Lines 207-249):

o The phrase "no significant change" should be clearly defined. Was this determined by a fold-change cutoff, a p-value threshold, or another statistical test? Including this information will improve the interpretability of the results.

o When noting that certain proteins "were not previously reported", it would be helpful to clarify whether this determination was made through a systematic literature review, database query (e.g., DisGeNET, PubMed), or both.
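
The criteria named in comments 2 and 3 (Youden's index, the F1-score, and empirical tuning on a validation set) can be sketched as follows. This is an illustrative example in plain Python, not the manuscript's procedure; the helper names and the toy scores and labels are assumptions made for the example.

```python
def confusion(scores, labels, threshold):
    """Count (tp, fp, tn, fn) when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def youden_j(tp, fp, tn, fn):
    """Youden's J = sensitivity + specificity - 1."""
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens + spec - 1.0


def best_threshold(scores, labels, criterion="f1"):
    """Scan every observed score as a candidate threshold and keep the
    first one that maximizes the chosen criterion ("f1" or "youden")."""
    best_t, best_v = None, float("-inf")
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, t)
        v = f1_score(tp, fp, fn) if criterion == "f1" else youden_j(tp, fp, tn, fn)
        if v > best_v:
            best_t, best_v = t, v
    return best_t, best_v


# Toy validation scores and labels (assumed, not from the manuscript).
scores = [0.1, 0.3, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(best_threshold(scores, labels, "f1"))
print(best_threshold(scores, labels, "youden"))

# A high-recall, low-precision case: recall 0.9, precision 0.3.
print(f1_score(tp=90, fp=210, fn=10))
```

The last call shows why the F1-score is informative when recall is high but precision is low: recall alone would read 0.9, while the F1-score is pulled down by the many false positives.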

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

Figure resubmission:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions.

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Attachment
Submitted filename: Comments to the authors.docx

Revision 1

Attachment
Submitted filename: Response_to_Reviewers.pdf
Decision Letter - Nir Ben-Tal, Editor, Arturo Medrano-Soto, Editor

Dear Miss Mi,

We are pleased to inform you that your manuscript 'AI-Driven Discovery of Novel Extracellular Matrix Biomarkers in Pelvic Organ Prolapse' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Arturo Medrano-Soto, Ph.D.

Guest Editor

PLOS Computational Biology

Nir Ben-Tal

Section Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have addressed all my concerns. Good job.

Reviewer #2: After carefully reviewing the revised manuscript and the authors’ detailed responses, I confirm that all of my previous comments have been fully addressed. The revision adds the requested methodological clarifications, strengthens validation and analysis where appropriate, and improves the clarity and reproducibility of the presentation. These changes resolve my earlier concerns and materially enhance the rigor and contribution of the work. I have no further substantive requests and recommend the manuscript be accepted for publication.

Reviewer #3: I thank the authors for their thoughtful and thorough revisions. The authors have adequately addressed the previously noted concerns.

1. The typographical errors have been corrected, and the manuscript has been proofread.

2. The threshold selection for ECM classification is now statistically justified based on F1-score optimization, which is clearly explained in the revised text.

3. The inclusion of F1-scores in Table 1 and discussion provides a more balanced and comprehensive evaluation of model performance.

4. The clarifications regarding expression pattern definitions and the systematic literature/database review for identifying novel POP-associated ECM proteins greatly improve the clarity and transparency of the work.

Overall, the revisions substantially strengthen the manuscript. I have no further concerns.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: None

Reviewer #2: Yes

Reviewer #3: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Zhiyi Chen

Reviewer #2: No

Reviewer #3: No

Formally Accepted
Acceptance Letter - Nir Ben-Tal, Editor, Arturo Medrano-Soto, Editor

PCOMPBIOL-D-25-00506R1

AI-Driven Discovery of Novel Extracellular Matrix Biomarkers in Pelvic Organ Prolapse

Dear Dr Mi,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Judit Kozma

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom | ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.