Semantic code clone detection using hybrid intermediate representations and BiLSTM networks

M. Shahbaz Ismail; Sara Shahzad; Fahmi H. Quradaa

doi:10.1371/journal.pone.0340971

Peer Review History

Original Submission: October 9, 2025
26 Nov 2025 Decision Letter - Sajid Anwar, Editor
PONE-D-25-54710
Semantic Code Clone Detection Using Hybrid Intermediate Representations and BiLSTM Networks
PLOS ONE
Dear Dr. Quradaa,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please submit your revised manuscript by Jan 10 2026 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
* A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.
We look forward to receiving your revised manuscript.
Kind regards,
Sajid Anwar, Ph.D
Academic Editor
PLOS ONE
Journal Requirements:
When submitting your revision, we need you to address these additional requirements.
1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf
2. Please note that PLOS One has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.
3. If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.
4. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes
Reviewer #2: Yes
2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: I Don't Know
Reviewer #2: Yes
3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes
4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
Reviewer #2: Yes
5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: Based on the reviewer’s feedback, the manuscript titled “Semantic Code Clone Detection Using Hybrid Intermediate Representations and BiLSTM Networks” requires further enhancement through the inclusion of relevant and studies. It is recommended to review and integrate insights from the following articles to strengthen the theoretical background and related work section of the paper:
1. https://www.edupij.com/index/arsiv/79/801/exploring-the-artificial-intelligence-era-in-influencing-self-paced-learning-systematic-and-bibliometric-review-of-literature
2. https://doi.org/10.1007/978-3-031-80334-5_11
Reviewer #2:
Summary
The paper presents a deep learning approach for detecting semantic code clones by using a mix of two intermediate representations, Baf and Jimple, both produced through the Soot framework. The idea behind combining them is that Baf captures low-level structural details while Jimple provides a clearer, higher-level view of the code. When used together, they offer a more complete picture of how the code actually behaves. To analyze the code fragments, a Siamese BiLSTM model with an attention layer is used so the network can focus on important patterns and relationships. The model was trained and tested on the BigCloneBench dataset and performed strongly across different clone categories. It reached around 97 percent recall and similar F1 scores even for the more challenging semantic clone types like WT3 and WT4. The results show that the hybrid IR method works noticeably better than using either Baf or Jimple alone and also performs better than many traditional clone detection tools and recent deep learning models.
Review
The paper gives a clear and well organized contribution to the area of semantic code clone detection. The idea of using a mix of Baf and Jimple is creative and helps connect low level structure with higher level meaning in the code. The reasons for choosing these two representations are explained well, and the experiments support the choice. The model that uses a Siamese BiLSTM with attention adds more depth because it allows the system to focus on important parts of the code and understand context better. The evaluation is strong and covers different types of clones, and the comparisons with existing tools make the results more reliable.
There are still a few areas where the paper could improve. One is scalability, since working with very large datasets or real industry-level code could be costly in terms of computation. Another point is that even though transformer models are mentioned as future work, adding at least one comparison with a transformer method would make the study more complete. Still, the work is carefully done, and it makes a meaningful contribution to software engineering and code analysis.
Strengths
1. The work introduces a new approach by combining Baf and Jimple, which helps capture both the structure of the code and its deeper meaning.
2. The neural model, built with a Siamese BiLSTM and an attention layer, is able to learn important relationships between code fragments.
3. The method is tested well and shows strong results even on the harder clone categories, including MT3 and WT3 and WT4.
4. The system performs better than many existing clone detection tools, both older ones and newer deep learning models.
5. The approach is useful for real software tasks such as maintenance, refactoring, detecting malware, and finding security issues.
Weaknesses
1. The study works only with Java code, so the findings may not fully apply to other programming languages.
2. The paper does not give much detail about how well the method scales when dealing with very large codebases or heavy workloads.
3. Even though transformer models are mentioned for future research, the study does not compare its results with any transformer-based approaches, which leaves an important gap.
4. The model still behaves like a black box, and the attention mechanism does not fully explain how decisions are being made.
6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: Yes: Zeyad Ghaleb Al-Mekhlafi
Reviewer #2: No
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
To ensure your figures meet our technical requirements, please review our figure guidelines: https://journals.plos.org/plosone/s/figures. You may also use PLOS’s free figure tool, NAAS, to help you prepare publication quality figures: https://journals.plos.org/plosone/s/figures#loc-tools-for-figure-preparation. NAAS will assess whether your figures meet our technical requirements by comparing each figure against our figure specifications.
https://doi.org/10.1371/journal.pone.0340971.r001
Revision 1
16 Dec 2025 Author Response
Reviewer Response Report
We would like to express our sincere gratitude to the Editor and the reviewers. Your valuable feedback and insights are greatly appreciated, and we are committed to addressing all your comments and suggestions to enhance the quality and impact of our work.
Editor's Journal comments and responses
• Comment 1: [Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at….]
Response: We have adhered to PLOS ONE's style requirements as outlined in the PLOS ONE style templates.
• Comment 2: [Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work.]
Response: Thank you for raising the PLOS ONE code-sharing requirement. We assure you that we are fully committed to transparency and are willing to provide any details about the source code to reviewers upon request.
• Comment 3: [If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.]
Response: We thank the reviewer for the suggestion. We reviewed the two recommended works, found them relevant, and have added them to the revised manuscript.
• Comment 4: [Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.]
Response: We have carefully reviewed the reference list to ensure that it is complete and accurate. Kindly refer to the revised version of the manuscript for these updates.
Comments from Reviewer 2 and Responses
• Comment 1: [The paper gives a clear and well organized contribution to the area of semantic code clone detection. The idea of using a mix of Baf and Jimple is creative and helps connect low level structure with higher level meaning in the code. The reasons for choosing these two representations are explained well, and the experiments support the choice. The model that uses a Siamese BiLSTM with attention adds more depth because it allows the system to focus on important parts of the code and understand context better. The evaluation is strong and covers different types of clones, and the comparisons with existing tools make the results more reliable.]
Response: Thank you for your positive comments on our paper. We respond to your comments and suggestions below.
• Comment 2: [The study works only with Java code, so the findings may not fully apply to other programming languages.]
Response: We thank the reviewer for this valuable comment. We have clearly stated in the manuscript that our study focuses on the Java programming language and provided justification for this choice. We have also clarified in the Threats to Validity and Limitations section that our findings are primarily applicable to Java and may not directly generalize to other programming languages.
• Comment 3: [The paper does not give much detail about how well the method scales when dealing with very large codebases or heavy workloads.]
Response: We thank the reviewer for this insightful comment. The primary focus of this work is to propose and evaluate a novel code representation that combines two intermediate representations. A full scalability study was beyond the scope of this paper. We have clarified this in the revised manuscript and also highlighted scalability evaluation as an important direction for future work in the conclusion section.
• Comment 4: [Even though transformer models are mentioned for future research, the study does not compare its results with any transformer-based approaches, which leaves an important gap.]
Response: We thank the reviewer for this valuable comment. In this work, our primary objective was to investigate the effectiveness of the proposed code representation and its integration with BiLSTM-based models. A direct comparison with transformer-based approaches was beyond the scope of this study due to differences in model architectures and computational requirements. We have clarified this limitation in the manuscript and strengthened the discussion by explicitly positioning transformer-based models as an important direction for future work.
• Comment 5: [The model still behaves like a black box, and the attention mechanism does not fully explain how decisions are being made.]
Response: We thank the reviewer for this important comment. To address this concern, we have substantially revised the manuscript and added detailed explanations of the internal computations of the BiLSTM model, particularly on pages 14–17. These additions clarify how hidden states are computed and how attention weights are derived, thereby improving the transparency and interpretability of the model and reducing its black-box nature.
Attachment (submitted filename): Response to Reviewers.docx
https://doi.org/10.1371/journal.pone.0340971.r002
30 Dec 2025 Decision Letter - Sajid Anwar, Editor
Semantic Code Clone Detection Using Hybrid Intermediate Representations and BiLSTM Networks
PONE-D-25-54710R1
Dear Dr. Quradaa,
We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.
Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.
An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager and clicking the ‘Update My Information' link at the top of the page. For questions related to billing, please contact billing support.
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
Kind regards,
Sajid Anwar, Ph.D
Academic Editor
PLOS One
Additional Editor Comments (optional):
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #2: All comments have been addressed
2. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #2: Yes
3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #2: Yes
4. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.
Reviewer #2: Yes
5. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #2: Yes
6. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #2: (No Response)
7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #2: No
https://doi.org/10.1371/journal.pone.0340971.r003
Formally Accepted
Acceptance Letter - Sajid Anwar, Editor
PONE-D-25-54710R1
PLOS One
Dear Dr. Quradaa,
I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS One. Congratulations! Your manuscript is now being handed over to our production team.
At this stage, our production department will prepare your paper for publication. This includes ensuring the following:
* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset
You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.
Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.
If we can help with anything else, please email us at customercare@plos.org.
Thank you for submitting your work to PLOS ONE and supporting open access.
Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Sajid Anwar
Academic Editor
PLOS One
https://doi.org/10.1371/journal.pone.0340971.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.