Peer Review History

Original Submission
May 17, 2024
Decision Letter - Agnieszka Konys, Editor

PONE-D-24-19871

Predicting Software Reuse using Machine Learning Techniques— A Case Study on Open-source Java Software Systems

PLOS ONE

Dear Dr. Yee Yen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please carefully review the comments from the reviewers and make improvements to the manuscript. The reviews provide details about areas that need enhancement.

Please submit your revised manuscript by Sep 20 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Agnieszka Konys, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Please amend your list of authors on the manuscript to ensure that each author is linked to an affiliation. Authors’ affiliations should reflect the institution where the work was done (if authors moved subsequently, you can also list the new affiliation stating “current affiliation:….” as necessary).

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Summary of the paper:

This paper aims to automate software reuse prediction through leveraging ML algorithms so that future research and practitioners can identify highly reusable software. Data consisting of software metrics are extracted from Maven artefacts, fitted into classification and regression models to predict, and estimate software reuse. The ground truth to software reuse in this study is the detection of code clones in popular GitHub projects.

Comments:

The problem this paper addresses is both interesting and significant within the field of software engineering. The quality of English in the paper is commendable. Additionally, the paper provides links to their tool and the dataset collected, which is beneficial. However, I recommend making the source code publicly available to enhance transparency and reproducibility.

The paper’s structure could be improved. I suggest reorganizing it into sections: "Research Questions," "Experiment Design," "Experiment Setup," and "Experiment Results," with subsections for "Quantitative Results" and "Qualitative Results." Currently, the research questions are presented as Section 1 of the paper. I recommend just mentioning them briefly in the introduction section and then moving the detailed discussion of the research questions to the proposed approach section.

The current introduction is overly lengthy, containing background information that can be moved to a separate background section. The introduction should instead focus on summarizing the proposed approach, existing problems in related work, the contributions of the paper, and the structure of the paper.

The acknowledgement section contains pseudo text which should be removed or corrected.

The abstract is vague regarding the use of cross-project code clones in the proposed approach. The authors need to clarify this, specifically detailing its relationship with identifying software artifacts suitable for reuse.

Figure 1 lacks informativeness and could be omitted.

A software system may include many different software artifacts. Thus, the authors need to precisely specify for what software artifacts they want to predict the reusability.

Several technical details of the proposed approach are missing. For instance, the list of features in Section 3.2.7, the justification for selecting K=4 in K-fold validation, and the reasoning for choosing only two levels (High and Low) for reusability and not including a Moderate level.
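To make the K=4 question concrete, the sketch below shows a stratified K-fold split over binary High/Low labels in plain Python. It is illustrative only: the helper name `stratified_k_fold`, the toy labels, and the seed are assumptions and are not taken from the authors' pipeline.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=4, seed=0):
    """Yield (train_idx, test_idx) pairs with similar class ratios per fold.

    Hypothetical helper for illustration; not taken from the manuscript.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets a share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            j for f, fold in enumerate(folds) if f != i for j in fold
        )
        yield train_idx, test_idx


# Toy labels: 8 "High" and 12 "Low" reuse artefacts (made-up data).
labels = ["High"] * 8 + ["Low"] * 12
splits = list(stratified_k_fold(labels, k=4))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

With K=4, each fold holds out a quarter of the samples for testing. A larger K leaves more data for training in each round but makes each test fold smaller, which is the trade-off a justification for K would need to address.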

The evaluation section misses an empirical study to gauge user perceptions of the proposed approach. Including interviews with expert developers to identify important features and comparing their feedback with the paper’s findings could provide valuable insights.

The paper primarily uses traditional machine learning techniques as mentioned in Section 3.3. It would be beneficial to explore newer deep learning or modern LLM-based techniques, as these have shown promising results in various software engineering tasks.

Currently, the paper provides the quantitative results of the experiments but lacks the qualitative results. The quantitative results need to be qualitatively analyzed to offer better insights.

Overall, the paper addresses a crucial problem in software engineering with a well-written manuscript. However, it requires significant restructuring, additional details, and a broader experimental evaluation as well as exploration of modern techniques to enhance its comprehensiveness and impact.

Reviewer #2: The paper is mostly well written and makes an important contribution. It is well-deserving of publication provided the authors make the following small changes:

1. Explain in more detail the limitations of your approach, for example, its restriction to Java-only projects.

2. Move figures and tables closer to where they are first referenced.

3. Please test the application at https://reuse-assessment.streamlit.app/Evaluate with invalid input such as empty strings. When I did that, the application did not fail gracefully and did not give an appropriate error.

4. Please make your dataset available publicly for reproducibility and benefit of other researchers.
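The kind of check requested in comment 3 can be sketched as a small validation helper. This is a minimal illustration under stated assumptions: the application's real input format is not described in this review, so `validate_input` and its messages are hypothetical, and only emptiness is checked.

```python
def validate_input(raw):
    """Return (ok, value_or_message) for a user-supplied text field.

    Hypothetical helper; it rejects non-text, empty, and whitespace-only
    input and returns the trimmed value otherwise.
    """
    if not isinstance(raw, str):
        return False, "Input must be a text value."
    cleaned = raw.strip()
    if not cleaned:
        return False, "Input is empty. Please enter a value."
    return True, cleaned


# Made-up inputs, including the empty string mentioned in the comment.
for candidate in ["", "   ", None, "some-artifact"]:
    ok, result = validate_input(candidate)
    print(repr(candidate), "->", ok, result)
```

Running such a check before any prediction step lets the application show a clear message instead of failing on empty input.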

English corrections:

1. 'The ground truth of software reuse ...'.

2. '40% of reused code'.

3. 'such a manual approach is not readily available and is expensive'.

4. 'an existing software'.

5. Use correct commas in 'utilisation of code developed by third parties'.

6. 'an artefact'

7. Use correct commas around "35"

8. Correct commas around 'Maven Usages'.

9. Correct commas around 'Usages'.

10. The 'yellow line' appears as a 'red line' in Fig. 6.

11. Duplicate sentence: 'Cross-validation splits are performed at this stage to avoid data leakage'.

12. Shouldn't the acronyms be all uppercase? KBF, RFI?

13. Put equation (3) after it is first referenced, and not before.

14. In the Fig. 7 caption, should it be 'R' or 'RR'? In the previous references you use 'R' but here you are using 'RR'. Be consistent.

15. 'Practitioners can use these pipeline methods ...'

16. Appendix number is missing in last line on Page 39.

17. '... these findings further validate the advantage ...'

18. Appendix number missing on Line 1082, Page 42.

19. Provide a mathematical formula for this computation: 'Table 7 shows the top important features after aggregating the feature importance score weighted by the F1-score of the cross-validation instance and the overall F1-score of the model'.

20. The section under Acknowledgments has garbage text. Please remove it or put an actual acknowledgement.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Waseem Sheikh

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Revision 1

Rewrote the introduction to make it more concise. Split the first part of the original “Proposed approach and experimental setup” section into two sections: “experimental design” and “experimental setup”.

Made some amendments to the second paragraph of the threats to validity section.

Figures and tables were moved closer to where they are referenced.

Updated Fig 6.

Added appendices which were initially missing.

Added an equation above Table 7 to make the importance calculation clearer.
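The manuscript's equation is not reproduced in this review history. One plausible reading of the description quoted in Reviewer #2's comment 19 is that each cross-validation fold's importance vector is weighted by that fold's F1-score, normalised, and then scaled by the model's overall F1-score. The sketch below follows that reading only; the function name `aggregate_importance`, the normalisation, and all numbers are assumptions rather than the authors' formula.

```python
def aggregate_importance(fold_importances, fold_f1, overall_f1):
    """Combine per-fold feature importances into one score per feature.

    Assumed form (not confirmed by the source):
        score_j = overall_f1 * sum_k(f1_k * imp_kj) / sum_k(f1_k)
    """
    if len(fold_importances) != len(fold_f1):
        raise ValueError("one F1-score is required per fold")
    total_weight = sum(fold_f1)
    if total_weight == 0:
        raise ValueError("fold F1-scores sum to zero")
    n_features = len(fold_importances[0])
    scores = []
    for j in range(n_features):
        weighted = sum(
            f1 * imp[j] for f1, imp in zip(fold_f1, fold_importances)
        )
        scores.append(overall_f1 * weighted / total_weight)
    return scores


# Made-up values: two features across four folds of one model.
fold_importances = [[0.6, 0.4], [0.5, 0.5], [0.7, 0.3], [0.6, 0.4]]
fold_f1 = [0.8, 0.8, 0.8, 0.8]
scores = aggregate_importance(fold_importances, fold_f1, overall_f1=0.5)
print(scores)
```

Under this reading, folds on which the model performed better contribute more to the ranking, and models with a low overall F1-score contribute smaller scores when results from several models are compared.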

Attachments
Submitted filename: Response to Reviewers.docx
Decision Letter - Agnieszka Konys, Editor

PONE-D-24-19871R1

Predicting Software Reuse using Machine Learning Techniques— A Case Study on Open-source Java Software Systems

PLOS ONE

Dear Dr. Yee Yen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

I kindly ask you to post a response to each reviewer's review.

Please submit your revised manuscript by Nov 18 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Agnieszka Konys, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments:

I kindly ask you to post a response to each reviewer's comments.

Revision 2

A point-by-point response to the reviewers has been uploaded.

Attachments
Submitted filename: Response to Reviewers.docx
Decision Letter - Agnieszka Konys, Editor

Predicting Software Reuse using Machine Learning Techniques— A Case Study on Open-source Java Software Systems

PONE-D-24-19871R2

Dear Dr. Yee Yen,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the 'Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Agnieszka Konys, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Formally Accepted
Acceptance Letter - Agnieszka Konys, Editor

PONE-D-24-19871R2

PLOS ONE

Dear Dr. Yee Yen,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Agnieszka Konys

Academic Editor

PLOS ONE

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.