Peer Review History

Original Submission: November 2, 2023
Decision Letter - Sathishkumar Veerappampalayam Easwaramoorthy, Editor

PONE-D-23-36197

A refined approach for evaluating small datasets via binary classification using machine learning

PLOS ONE

Dear Dr. Steinert,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 05 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Sathishkumar Veerappampalayam Easwaramoorthy

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Please upload a copy of Supporting Information S1 Table, S2 Table, S3 Table and S4 Table, which you refer to in your text on pages 11 and 12.

4. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors presented a method for evaluating binary classification problems with small datasets. The problem is highly significant, but I have a few concerns that need to be addressed.

1. The authors have not clearly specified the background of the problem and main contributions of the work.

2. The introduction section needs to be properly organized with more clarity.

3. The experiment needs more rigorous analysis. Moreover, a hyperparameter tuning experiment as well as an ablation study could be included.

4. I suggest adding an additional experiment via error analysis, which is usually helpful for model testing under diverse conditions.

5. Have the authors considered model evaluation via cross-domains? My suggestion is to do more cross-domain experimentation with small datasets.

6. Finally, use a real-time case of both images and other datasets to measure the efficacy of the proposed method.

Reviewer #2: Dear Authors,

The manuscript "A refined approach for evaluating small datasets via binary classification using machine learning" presents an interesting topic by suggesting the use of a pre-processing of small datasets before using the Support Vector Machine Learning technique.

In this context, another widespread non-parametric technique is Random Forest. It would be interesting to mention this technique in the introduction chapter and highlight the reason for choosing the SVM algorithm.

To help you improve your manuscript, I offer the following recommendations:

1) It is necessary to review punctuation, especially the use of commas. Revise long paragraphs as they tend to confuse the reader. I've pointed out in the comments on the digital file the passages where the writing should be revised.

2) When acronyms are presented for the first time in the text, their meaning should be presented.

3) Highlight the objectives of the study in chapter 1.

4) Lines 62 to 85: a flowchart would illustrate the methodology used more effectively.

5) Line 106: "We do not examine dataset sizes below 25 data points because the performance is already not good for 25 data points". Please detail this restriction.

6) Line 153: Were the SVM parameters determined empirically? Please detail.

7) Line 188: "However, the focus of this work is on the qualitative behavior and the demonstration of the evaluation methodology and, therefore, fewer permutations are used for reasons of computational time." Computational time is mentioned repeatedly in the text, making it clear that fewer permutations were chosen in order to reduce it. In this case, the use of cloud computing could solve this restriction.

8) Detail the applications used, as well as the hardware used.

9) The chapter reserved for discussions explores the results achieved. However, when compared with the results obtained by the authors cited in the references, the superior results achieved by the proposed methodology could be better substantiated.

10) Line 432: it would be interesting to create a chapter on conclusions. What are your recommendations for future studies?

I end my review by congratulating you on your study.

Respectfully,

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Arvind Selwal

Reviewer #2: Yes: Marcos Benedito Schimalski, Professor at Santa Catarina State University, Brazil.

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachments
Attachment
Submitted filename: PONE-D-23-36197.pdf
Revision 1

Dear reviewers and editor,

We would like to thank you for your helpful feedback, which helped us improve the quality of our paper. We have substantially revised the manuscript to follow your suggestions and hope that the current version is to your satisfaction.

Below we would like to address the points you raised.

Academic editor:

1. Please ensure that your manuscript meets PLOS ONE’s style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Answer:

We use the PLOS ONE LaTeX template (https://journals.plos.org/plosone/s/file?id=9a7a/plos_latex_template_v3.6.zip) and checked that everything fulfils the style criteria. If this is not the case, we would be glad if you could point out the issues.

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

Answer:

Thank you for this comment. We shared our code using a static reference link (https://osf.io/8rkjb/?view_only=4352e24595d34cb29b763b71b018766d) and also uploaded the relevant documentation.

3. Please upload a copy of Supporting Information S1 Table, S2 Table, S3 Table and S4 Table, which you refer to in your text on pages 11 and 12.

Answer:

Because of the major changes to the paper, the table numbers have changed, but we uploaded all referenced supplementary tables as separate files.

4. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Answer:

We checked our references and we cite no retracted article. Unfortunately, we found some minor errors in references 3, 4, 9, 14 and 20. These errors are fixed now.

5. While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Answer:

We uploaded all figure files and they were converted to TIF.

Reviewer 1:

1. The authors have not clearly specified the background of the problem and main contributions of the work.

Answer:

In the introduction, we have now significantly extended the background and given examples of papers where we think our method would be useful.

2. The introduction section needs to be properly organized with more clarity.

Answer:

We have added more details and tried to make it clearer what the paper is about. First, we motivate the usefulness of ML for data evaluation and its application in different fields. Next, we systematically list some problems and possible mistakes for data evaluation using ML. Finally, we propose our solution.

3. The experiment needs more rigorous analysis. Moreover, a hyperparameter tuning experiment as well as an ablation study could be included.

Answer:

We followed your advice and conducted an ablation study, and we used confusion matrices to gain insight into possible errors. Our interpretation of the confusion matrices focuses on differences between metrics, because we did not aim to interpret the SVM classifier itself, which serves only as an example and can be chosen freely.
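
As an illustration of why differences between metrics can matter for small, imbalanced datasets, the following is a minimal sketch that is not taken from the manuscript. The counts, the `Confusion` class and the `metrics` helper are hypothetical and chosen only for demonstration. The sketch derives three common metrics from a single binary confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary confusion matrix (hypothetical helper)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 if the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def metrics(c: Confusion) -> dict[str, float]:
    """Compute accuracy, balanced accuracy and F1 from one confusion matrix."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # recall on the positive class
    specificity = _ratio(c.tn, c.tn + c.fp)  # recall on the negative class
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Made-up example: 25 test points, 5 positives and 20 negatives.
example = Confusion(tp=1, fp=1, fn=4, tn=19)
result = metrics(example)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

On these made-up counts, accuracy is 0.8 while balanced accuracy is 0.575 and F1 is roughly 0.29. A classifier that mostly predicts the majority class can therefore look good on one metric and poor on another.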

4. I suggest adding an additional experiment via error analysis, which is usually helpful for model testing under diverse conditions.

Answer:

We aimed to fulfil this request with the above-mentioned analysis of confusion matrices and the ablation study.

5. Have the authors considered model evaluation via cross-domains? My suggestion is to do more cross-domain experimentation with small datasets.

Answer:

Our synthetic data does not belong to a specific domain, but we did evaluate other well-known real-world datasets from computer vision and medical research.

6. Finally, use a real-time case of both images and other datasets to measure the efficacy of the proposed method.

Answer:

Unfortunately, the proposed method is not suitable for real-time evaluation, because it is only intended to find connections in data and to specify accurately how well an ML method would predict new data if trained on the investigated dataset. Its purpose is, for example, to compare classifiers accurately or to demonstrate the existence of weak connections in data, which is of great interest in fields such as education.

Reviewer 2:

1. Another widespread non-parametric technique is Random Forest. It would be interesting to mention this technique in the introduction chapter and highlight the reason for choosing the SVM algorithm.

Answer:

In the introduction, we have now explained why we have chosen the SVM for the demonstration of the method and briefly mentioned Random Forest (see lines 16 to 25 of the revised manuscript).

2. It is necessary to review punctuation, especially the use of commas. Revise long paragraphs as they tend to confuse the reader. I’ve pointed out in the comments on the digital file the passages where the writing should be revised.

Answer:

We attempted to use shorter paragraphs and revised punctuation.

3. When acronyms are presented for the first time in the text, their meaning should be presented.

Answer:

We revised the use of acronyms to follow best practice more closely.

4. Highlight the objectives of the study in chapter 1.

Answer:

Yes, we have included the study objectives in the introduction section.

5. Lines 62 to 85: a flowchart would illustrate the methodology used more effectively.

Answer:

We noticed that the explanation of synthetic data generation was unnecessarily complicated, so we revised it and added a flowchart (see Fig. 1).

6. Line 106: "We do not examine dataset sizes below 25 data points because the performance is already not good for 25 data points". Please detail this restriction.

Answer:

We have rewritten this part and explained our reasoning. A smaller size would not give better results, and our results show that this dataset size is a good example of a dataset where the trained classifier may not generalise to more data points. Therefore, we do not expect further insights from a smaller dataset.

7. Line 153: Were the SVM parameters determined empirically? Please detail.

Answer:

No, we used the default parameters of the function in scikit-learn as a starting point and tried all possible values where applicable, or tried heavily modified values. The aim was to show the effect of hyperparameter tuning, not to achieve the best possible result. We added the details of how we obtained the parameters to the revised version of the manuscript.

8. Line 188: "However, the focus of this work is on the qualitative behavior and the demonstration of the evaluation methodology and, therefore, fewer permutations are used for reasons of computational time."

Computational time is mentioned repeatedly in the text, making it clear that fewer permutations were chosen in order to reduce it. In this case, the use of cloud computing could solve this restriction.

Answer:

We reran some calculations for the CESAR dataset so that everything is now calculated using 50 permutations. This recalculation, together with the evaluation of the MNIST and BCWD datasets, took almost a month of computation time using a whole node on our cluster. By far the most time was taken by the full rnCV because of the feature selection. It would be possible to use even more computing power to double or even triple the number of permutations, but we do not believe that this would improve our results: we have been able to show that a higher number of permutations yields lower probabilities on average but shows the same qualitative trend. It is therefore difficult for us to justify such an intensive use of resources.
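
To make the trade-off between the number of permutations and computing cost concrete, here is a minimal sketch of a label-permutation test. It is not the authors' pipeline: the toy data, the 1-D nearest-centroid classifier, and the function names `centroid_accuracy` and `permutation_p_value` are assumptions made for illustration, and the p-value uses the common (count + 1) / (n_permutations + 1) estimator, which may differ from the one used in the manuscript.

```python
import random
from statistics import mean


def centroid_accuracy(xs, ys):
    """Leave-one-out accuracy of a 1-D nearest-centroid classifier (toy stand-in)."""
    hits = 0
    for i in range(len(xs)):
        # Train on every point except the i-th one.
        train = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j != i]
        class0 = [x for x, y in train if y == 0]
        class1 = [x for x, y in train if y == 1]
        if not class0 or not class1:
            continue  # a class vanished from the training fold: count as a miss
        pred = 0 if abs(xs[i] - mean(class0)) <= abs(xs[i] - mean(class1)) else 1
        hits += int(pred == ys[i])
    return hits / len(xs)


def permutation_p_value(xs, ys, n_permutations=50, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones."""
    rng = random.Random(seed)
    observed = centroid_accuracy(xs, ys)
    shuffled = list(ys)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between features and labels
        if centroid_accuracy(xs, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (at_least_as_good + 1) / (n_permutations + 1)


# Made-up, well-separated toy data.
xs = [0.0, 0.2, 0.4, 0.6, 0.8, 5.0, 5.2, 5.4, 5.6, 5.8]
ys = [0] * 5 + [1] * 5
p = permutation_p_value(xs, ys, n_permutations=50)
print(f"p-value with 50 permutations: {p:.4f}")
```

With this estimator the smallest reachable p-value is 1 / (n_permutations + 1), which is about 0.0196 for 50 permutations. The cost grows linearly with the number of permutations, because every permutation repeats the full evaluation.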

9. Detail the applications used, as well as the hardware used.

Answer:

We have added a short paragraph at the end of the 'Evaluated data' subsection about the hardware and applications used.

10. The chapter reserved for discussions explores the results achieved. However, when compared with the results obtained by the authors cited in the references, the superior results achieved by the proposed methodology could be better substantiated.

Answer:

The introduction and the discussion now contain text clarifying the merit of our proposed method and giving examples of use cases.

11. Line 432: it would be interesting to create a chapter on conclusions. What are your recommendations for future studies?

Answer:

We added a conclusion chapter and gave explicit recommendations.

Thank you again for taking the time to review our paper and for your valuable comments.

Yours sincerely,

Steffen Steinert

Attachments
Attachment
Submitted filename: Response to Reviewers.pdf
Decision Letter - Sathishkumar Veerappampalayam Easwaramoorthy, Editor

A refined approach for evaluating small datasets via binary classification using machine learning

PONE-D-23-36197R1

Dear Dr. Steinert,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at http://www.editorialmanager.com/pone/ and clicking the 'Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Sathishkumar Veerappampalayam Easwaramoorthy

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Formally Accepted
Acceptance Letter - Sathishkumar Veerappampalayam Easwaramoorthy, Editor

PONE-D-23-36197R1

PLOS ONE

Dear Dr. Steinert,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Sathishkumar Veerappampalayam Easwaramoorthy


PLOS ONE

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.