Peer Review History

Original Submission: May 6, 2025
Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

Decision Letter - Ivan S Petrushin, Editor

PONE-D-25-24402

OpDetect: A convolutional and recurrent neural network classifier for precise and sensitive operon detection from RNA-seq data

PLOS ONE

Dear Dr. Peña-Castillo,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We are sorry for the delay with the review.

Please submit your revised manuscript by Jul 21 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Ivan S Petrushin, Ph.D

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1.  Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf  and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Review of OpDetect: A convolutional and recurrent neural network classifier for precise and sensitive operon detection from RNA-seq data

Summary

The authors develop OpDetect, a deep learning approach for detecting operons in a species-agnostic manner using RNA-sequencing (RNA-seq) data. They use only genome sequences, RNA-seq data and existing operon annotations, and no additional information. They treat RNA-seq coverage as one would treat signal data, and process it as such to maximize information utilization. They demonstrate that their approach outperforms state-of-the-art algorithms on recall, F1 and AUROC.

Overall, there is always a need to continually improve predictive biology programs as new data and cutting-edge algorithms come into play, and operon identification, particularly in understudied species, still requires significant research, so this work is relevant to microbiologists and computational biologists and addresses an area of need. While the approach is distinct from anything that has been attempted and produces high metrics, a few fundamental concerns need to be addressed before publication can be recommended. Additional details are provided below, but in summary: first, the authors need to justify their methodological choices more thoroughly. Second, the metrics used are skewed by the very low number of ‘non-operon’ genes, and evaluation on such an unbalanced dataset can affect these metrics significantly. Third, the use of C. elegans data has to be justified with the biology of C. elegans operons in mind, which could change the interpretation of the data significantly.

Major points

1) The combination of two architectures to properly capture the complexity of RNA-seq data is appreciated. However, the justification is not clear. For example, did the authors try a CNN or LSTM alone and find reduced performance? The inclusion of an attention layer is interesting, but it seems like there is a missed opportunity for cross-attention or dual-attention here when asking about the relationship of RNA-seq data between two genes and an intergenic region – why the choice of self-attention? Also, is the LSTM required, or would more attention layers perform the same function?

2) The small set of non-operons is a potential issue, both with training and testing. I understand the motivation behind only wanting to use experimentally validated non-operons (I assume this is the case? The authors mention the reverse case of a predicted operon being missed, but I assume the reason there are so few non-operons is because most of them are not experimentally validated to the satisfaction of the authors?). The validity of the F1 score will indeed be impacted by this as mentioned, and while using AUROC and recall are good options, they will not tell you the extent to which non-operon calls are valid with this algorithm. It would be worth considering testing performance on putative non-operons, just to demonstrate the broader applicability of the algorithm outside of confirmed non-operons.
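The reviewer's concern about evaluating on an unbalanced dataset can be made concrete with a small numeric sketch (the sensitivity and specificity values below are hypothetical, not the paper's actual results): holding a classifier's error rates fixed, precision and F1 inflate sharply as the negative (non-operon) class shrinks, while recall is unaffected.

```python
# Toy illustration of class-imbalance sensitivity (hypothetical rates,
# not the paper's data): the same error profile yields very different
# precision/F1 depending on how many negatives are in the test set.
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

sensitivity, specificity = 0.95, 0.80  # fixed, hypothetical error profile

for n_pos, n_neg in [(1000, 1000), (1000, 50)]:  # balanced vs. few non-operons
    tp = sensitivity * n_pos
    fn = n_pos - tp
    fp = (1 - specificity) * n_neg        # false positives scale with negatives
    p, r, f1 = precision_recall_f1(tp, fp, fn)
    print(f"pos={n_pos} neg={n_neg}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

With 1000 negatives the classifier's F1 is about 0.88; with only 50 negatives the identical classifier scores about 0.97, purely because fewer negatives are available to become false positives.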

3) The observation with C. elegans operon identification is very interesting but raises a number of concerns that the authors need to address. C. elegans operons are fundamentally biologically different from prokaryotic ones. While prokaryotic polycistronic mRNAs are not processed, but rather the genes on the mRNA are individually translated, the operon mRNAs from C. elegans are processed co-transcriptionally before translation. The other methods evaluated in this study are focused on prokaryotic operons, not eukaryotic ones, and are tailored towards detecting those. This then leads to the question of how stably detectable C. elegans operon RNAs are, and whether the RNA preparation involved any selection (e.g. polyA, which would likely preclude these intermediate RNAs, or ribo-depletion, which might preserve them though they would remain transient, or neither, which would call into question the quality of the dataset altogether). The C. elegans dataset referenced does not have this level of detail and does not have an associated study, so clarification is needed here. As such, I wonder if another non-C. elegans eukaryotic dataset would predict operons? And given the scarcity of non-operons and the methods used to evaluate, it’s very difficult to tell whether these predictions are valid. So before claiming that this method can find the C. elegans operons, I would do a lot of due diligence with respect to:

• How the RNA from C. elegans was collected and processed, and as such, are you likely to find these polycistronic mRNAs in the sample?

• Does the algorithm predict operons in higher organisms known not to have operons?

4) The authors do not differentiate between data leakage and using the same organism for training and testing. While there is validity in training on a set of organisms and testing on a different, unseen organism, using the same organism but different datasets for training and testing is not data leakage or cross-validation. The authors should address this discrepancy and re-evaluate their assessments in the data leakage section.

Minor points

1) All vectors are re-sampled to a fixed size of 150. While this is a technique used with image data, the reconstruction fidelity in RNA-seq data is not clear. Visualization of this resampled data, or some other quantitative metric to assess this would be helpful.
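The fidelity question raised here can be probed with a minimal sketch (assuming simple linear interpolation, which may differ from the resampling method the authors actually used): downsample a simulated coverage profile to 150 points, reconstruct it at the original length, and quantify agreement.

```python
import numpy as np

def resample_to_fixed(signal, target_len=150):
    """Linearly interpolate a 1-D coverage vector onto a fixed-length grid.
    Hypothetical helper; the paper's exact resampling scheme may differ."""
    src = np.linspace(0.0, 1.0, num=len(signal))
    dst = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(dst, src, signal)

# Simulate a noisy read-coverage profile with a highly expressed region.
rng = np.random.default_rng(0)
coverage = np.concatenate([rng.poisson(5, 400),
                           rng.poisson(60, 200),
                           rng.poisson(5, 400)]).astype(float)

down = resample_to_fixed(coverage, 150)          # the fixed-size input vector
recon = resample_to_fixed(down, len(coverage))   # reconstruct at original length

# Correlation between original and reconstructed coverage as a fidelity metric.
r = np.corrcoef(coverage, recon)[0, 1]
print(f"reconstruction correlation: {r:.3f}")
```

For a broad expression plateau like this one the correlation stays high (the step structure survives; mostly per-base Poisson noise is smoothed away), but the same check on sharply localized features would reveal how much structure the 150-point grid discards.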

2) The inclusion of a label to account for lack of annotation is interesting, but redundant (i.e. how is this different from just removing unannotated data?).

3) Line 141 – I do not understand the meaning of this threshold (and it is not immediately obvious from skimming the referenced paper).

4) Table 3 – why use lambda layers instead of another convolution (first lambda) or fully connected linear layer (second lambda)? Or is that lambda layer a wrapper for just that? A justification would be helpful.

5) Table 7 – recall is an incomplete metric. I would include precision, specificity, F1 (or a subset at least) so the reader can get a better picture of overall performance.

6) All figures were very low resolution and difficult to read.

7) The observation about non-contiguous genes within an operon is interesting, but there is not enough discussion of why existing programs do not identify these while OpDetect does. It is likely due to the requirement that two genes be adjacent, not a performance issue of the existing programs. If this is the case, it should be clarified. Also, for the gene pairs in Table 9, there should be a column stating whether the intervening antisense gene is expressed or not (i.e. is there detectable signal in the intergenic region?).

Reviewer #2: The paper titled "OpDetect: A Convolutional and Recurrent Neural Network Classifier for Precise and Sensitive Operon Detection from RNA-seq Data" presents a novel, species-agnostic deep learning approach for operon detection using RNA-seq data and sequence information.

However, a number of questions arise:

1) Could the authors clarify how they construct the gene pair vector in cases where the intergenic region is smaller than 150 base pairs?

2) While the architecture proposed by the authors is highly parameter-efficient and its hyperparameters are well justified, it remains an open question whether this architecture is truly optimal. Could the authors conduct an ablation study to show that modifying the number of CNN layers or removing certain components of the neural network leads to worse performance?

3) I suggest that the authors report additional metrics on the set of operons that all methods were able to predict. The procedure described by the authors—where “if a method failed to make a prediction for a gene pair, we considered this as predicting that the gene pair was a non-operon and assigned a probability of 0.49”—may influence the comparison results. Clarification on how this assumption affects the evaluation would be valuable.
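How the 0.49 fill-in shifts AUROC can be illustrated with a toy example (synthetic scores, not the paper's data): every unscored pair becomes a tie at 0.49, each tied positive-negative pair contributes only 0.5 to the rank statistic, and AUROC is pulled toward 0.5 in proportion to the missing coverage.

```python
import numpy as np

def auroc(y_true, scores):
    """AUROC via the rank-sum (Mann-Whitney) formulation, with tie handling."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    order = np.argsort(s)
    ranks = np.empty(len(s), dtype=float)
    ranks[order] = np.arange(1, len(s) + 1)
    for v in np.unique(s):            # tied scores share their average rank
        tied = s == v
        ranks[tied] = ranks[tied].mean()
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Pairs a hypothetical method actually scored: perfectly separated.
y_cov = [1] * 10 + [0] * 10
s_cov = [0.9] * 10 + [0.1] * 10
# Pairs it never scored, filled in at 0.49 as in the evaluation protocol.
y_all = y_cov + [1] * 10 + [0] * 10
s_all = s_cov + [0.49] * 20

print(auroc(y_cov, s_cov))  # → 1.0   (covered pairs only)
print(auroc(y_all, s_all))  # → 0.875 (after the 0.49 fill-in)
```

A method with perfect ranking on its covered pairs thus drops to 0.875 once half the test set is filled in, which is why reporting metrics restricted to the commonly predicted pairs, as suggested above, would disentangle ranking quality from coverage.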

4) The reformulation of the operon detection problem as a binary classification task is somewhat limiting, as it hinders the ability to capture more complex scenarios—such as when a model fails to predict some intermediate genes within a multi-gene operon, thereby compromising the full operon prediction. Could the authors comment on their choice of metric and perhaps provide counterarguments to this concern?

5) It would also be interesting to conduct an ablation study on the features provided to the network. Is RNA-seq data alone sufficient for accurate operon prediction, or does the nucleotide sequence contribute significantly to model performance?

6) If the nucleotide sequence plays an important role in prediction, could the authors investigate which parts of the sequence the model considers most informative? Are there specific motifs that the model relies on for prediction? This would be especially intriguing given the model’s ability to generalize to previously unseen eukaryotic species.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Dr. Penzar Dmitry

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachments
Attachment
Submitted filename: PLoSOne_OpDetect_Review.docx
Revision 1

Our response to reviewers has been uploaded as a separate file.

Attachments
Attachment
Submitted filename: OpDetect_ResponseToReviewers.pdf
Decision Letter - Ivan S Petrushin, Editor

OpDetect: A convolutional and recurrent neural network classifier for precise and sensitive operon detection from RNA-seq data

PONE-D-25-24402R1

Dear Dr. Peña-Castillo,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. 

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the 'Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Sincerely,

Ivan S Petrushin, Ph.D

Academic Editor

PLOS ONE

Additional Editor Comments:

Please check, that all references contain the DOI or URL if available.

Formally Accepted
Acceptance Letter - Ivan S Petrushin, Editor

PONE-D-25-24402R1

PLOS ONE

Dear Dr. Peña-Castillo,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Ivan S Petrushin

Academic Editor

PLOS ONE

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.