Next generation insect taxonomic classification by comparing different deep learning algorithms

Song-Quan Ong; Suhaila Ab. Hamid

doi:10.1371/journal.pone.0279094

Peer Review History

Original Submission: September 19, 2022
3 Nov 2022 Decision Letter - Vijayalakshmi G V Mahesh, Editor

PONE-D-22-26033
Next Generation Insect Taxonomic Classification by Comparing Different Deep Learning Algorithms
PLOS ONE

Dear Dr. Ong,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 17 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:
- A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
- A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
- An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future.
For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Vijayalakshmi G V Mahesh, Ph.D
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

3. We noticed some minor occurrences of overlapping text with the following previous publication(s), which need to be addressed:
- https://onlinelibrary.wiley.com/doi/10.1002/ps.7028
The text that needs to be addressed involves the Introduction and the Results sections. In your revision ensure you cite all your sources (including your own works), and quote or rephrase any duplicated text outside the methods section.
Further consideration is dependent on these concerns being addressed.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No
Reviewer #2: N/A
Reviewer #3: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: No
Reviewer #3: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have proposed a novel algorithm version of Insect Taxonomic Classification using CNN. The following comments need to be addressed in the manuscript:
- The abstract and conclusion need to specify the accuracy/performance evaluation results.
- The research gap and the proposed solution should be highlighted before the methodology.
- The novelty of the proposed work should be highlighted.
- Is there any open-source database available for this application? If yes, then results should be obtained on that database and reported in the article.
- The Discussion needs to be elaborated, including how the proposed method is more efficient than other existing algorithms.
- References should be recent (less than 5-7 years old).

Reviewer #2: In this work, the authors study the classification performance of four deep CNN models (InceptionV3, VGG19, MobileNetV2 and Xception) in classifying insect images into three taxonomic levels (order, family and genus). I have only a few minor comments to improve the paper:
1. The introduction section could elaborate on the motivation for the work.
2. The authors propose that the classification pipeline must include several classification algorithms for different taxonomic ranks. It would be interesting if the authors could elaborate on the characteristics that a classification algorithm should possess to perform remarkably well at each taxonomic rank.
3. How did the authors choose the optimal hyperparameters for the model?
4. Even though 2000 images may not be sufficient, it might be interesting to see how the model performs on the original dataset of ~2000 images and compare the performance with the dataset that also includes the rotated images.
5. The authors also fixed the number of training epochs at 100, which might be quite low. The authors might consider increasing the number of training epochs and evaluating the performance.

Reviewer #3: The paper addresses the classification task according to the taxonomic ranks of insects (order, family, and genus) and compares the generalization of four state-of-the-art deep convolutional neural network (DCNN) architectures. The statistical analysis for all four deep learning models with respect to taxonomic levels is showcased. Model classification for the individual groups is also well depicted. Concern: a little more detail on the preprocessing of the data and on the InceptionV3 model layers could be added for relevance.

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Rajkumar Palaniappan
Reviewer #2: No
Reviewer #3: Yes: ROOPA B S

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/.
PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

https://doi.org/10.1371/journal.pone.0279094.r001
Revision 1
15 Nov 2022 Author Response

Reviewer #1: The authors have proposed a novel algorithm version of Insect Taxonomic Classification using CNN. The following comments need to be addressed in the manuscript:

- The abstract and conclusion need to specify the accuracy/performance evaluation results.
Response: Thank you for your comment. The F1-score results for InceptionV3 have been added to the abstract in the sentence: "The InceptionV3 model has advantages over other models due to its high performance in distinguishing insect order and family, which is having F1-score of 0.75 and 0.79, respectively"

- The research gap and the proposed solution should be highlighted before the methodology.
Response: Thank you for your comment. The research gap and hypothesis were added at the end of the introduction, before the methodology, in lines 82-95: "However, most of these previous studies of DL models on insect classification were not designed to assess the capability of DL in classifying different taxonomic levels. For instance, research questions such as “What will the performance of a DL model be as the taxonomic level decreases?” and “Will a single DL architecture be sufficient to classify specimens regardless of their taxonomic levels?” remain. Since previous studies assumed that insect classification can be done according to the concept of one-size-fits-all, the most appropriate algorithm could be the solution for most classifications at the taxonomic level. We hypothesise that different algorithms for classification are needed for different taxonomic levels, because the lower the level, the closer the external morphology. For this reason, this study aims to evaluate the ability of DL models in classifying insect specimens at different taxonomic levels. We compared the performances of four DL models, InceptionV3, VGG19, MobileNetV2, and Xception, in classifying three taxonomic levels: order, family, and genus."
- The novelty of the proposed work should be highlighted.
Response: We have restructured the sentences to emphasise the novelty of the study, which lies in: 1. customised datasets (lines 191 to 193); and 2. the absence of a one-size-fits-all model, since each taxonomic level has its own best-performing algorithm (line 221).

- Is there any open-source database available for this application? If yes, then results should be obtained on that database and reported in the article.
Response: Yes, there is an open-source dataset available in [15]. We have mentioned the dataset in line 197 and in the data availability statement.

- The Discussion needs to be elaborated, including how the proposed method is more efficient than other existing algorithms.
Response: Thank you for your comment. In lines 281-283 we elaborated on how our result is more effective than those of other studies, describing how it is more comprehensive and offers broader performance coverage, including the F1-score and precision.

- References should be recent (less than 5-7 years old).
Response: Thank you for your comment. We updated reference [12] (the one reference older than 7 years) to: Tang L, Zhang H, Zhang B. A note on error bars as a graphical representation of the variability of data in biomedical research: choosing between standard deviation and standard error of the mean. Journal of Pancreatology. 2019 Sep 1;2(03):69-71. This reference was published in 2019 and has a more comprehensive discussion of the error bars that we used as the statistical tool in this study.

Reviewer #2: In this work, the authors study the classification performance of four deep CNN models (InceptionV3, VGG19, MobileNetV2 and Xception) in classifying insect images into three taxonomic levels (order, family and genus). I have only a few minor comments to improve the paper:

1. The introduction section could elaborate on the motivation for the work.
Response: Thank you for your comment.
The motivation, research gap and hypothesis were added at the end of the introduction in lines 82-95: "However, most of these previous studies of DL models on insect classification were not designed to assess the capability of DL in classifying different taxonomic levels. For instance, research questions such as “What will the performance of a DL model be as the taxonomic level decreases?” and “Will a single DL architecture be sufficient to classify specimens regardless of their taxonomic levels?” remain. Since previous studies assumed that insect classification can be done according to the concept of one-size-fits-all, the most appropriate algorithm could be the solution for most classifications at the taxonomic level. We hypothesise that different algorithms for classification are needed for different taxonomic levels, because the lower the level, the closer the external morphology. For this reason, this study aims to evaluate the ability of DL models in classifying insect specimens at different taxonomic levels. We compared the performances of four DL models, InceptionV3, VGG19, MobileNetV2, and Xception, in classifying three taxonomic levels: order, family, and genus."

2. The authors propose that the classification pipeline must include several classification algorithms for different taxonomic ranks. It would be interesting if the authors could elaborate on the characteristics that a classification algorithm should possess to perform remarkably well at each taxonomic rank.
Response: Thank you for your comment. In the Discussion section we elaborate on the algorithm characteristics. In particular, we note how the number of trainable parameters relates to the taxonomic level: models with fewer parameters (from VGG19 to MobileNetV2) performed better at lower taxonomic levels.

3. How did the authors choose the optimal hyperparameters for the model?
Response: The optimal hyperparameters were chosen manually by comparing different learning rates and two standard optimisers. We have described how the model was optimised in lines 156-159: "This study trained deep learning neural networks by using the adaptive learning rate optimization (Adam) algorithm with learning rate hyperparameters of 0.001, 0.0001, and 0.00001 to control the rate of change of the model during each step of the optimization process."

4. Even though 2000 images may not be sufficient, it might be interesting to see how the model performs on the original dataset of ~2000 images and compare the performance with the dataset that also includes the rotated images.
Response: Thank you for your comment. Comparing model performance on the original images and on the augmented dataset is not the objective of this study. We therefore added a reference [10] to justify the need to augment the data before developing the deep models: Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. Journal of Big Data. 2019 Dec;6(1):1-48.

5. The authors also fixed the number of training epochs at 100, which might be quite low. The authors might consider increasing the number of training epochs and evaluating the performance.
Response: We applied an early stopping mechanism (Appendices II and III) to prevent overfitting in image classification. In other words, more epochs could lead to overfitting. We further justified the number of epochs with an additional reference, A survey on Image Data Augmentation for Deep Learning (especially Fig 1): Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. Journal of Big Data. 2019 Dec;6(1):1-48.

Reviewer #3: The paper addresses the classification task according to the taxonomic ranks of insects (order, family, and genus) and compares the generalization of four state-of-the-art deep convolutional neural network (DCNN) architectures. The statistical analysis for all four deep learning models with respect to taxonomic levels is showcased. Model classification for the individual groups is also well depicted. Concern: a little more detail on the preprocessing of the data and on the InceptionV3 model layers could be added for relevance.
Response: Thank you for your comment. We added more details on the preprocessing of data in lines 140-141: "The base images (0 degrees, without rotation) and all the rotated images (90, 180, and 270 degrees) used for training are not used for the testing and validation sets." We also added details on the InceptionV3 model layers in line 226: "For instance, the VGG19 model performed the best for order, InceptionV3 performed the best for family, and MobileNetV2 performed the best for genus. The inceptionV3 that having a total of 42 layers is having advantages of consistent performance from one level to another, which did not perform significantly differently when the taxonomic level was lowered from order to family, in contrast with other models that exhibited significantly lower performance when the level was lower."

Thank you very much for the valuable feedback and comments.

Best regards,
Song-Quan Ong

Attachments
Attachment Submitted filename: Rebuttal letter.docx

https://doi.org/10.1371/journal.pone.0279094.r002
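The preprocessing rule quoted in the response to Reviewer #3 combines rotations of 90, 180 and 270 degrees with training images that are never reused for testing or validation. It can be illustrated with a minimal Python sketch. This is not the authors' code, and several details are assumptions:
- images are represented as small 2D lists of pixel values;
- the split is 80/20;
- held-out images are left unrotated, because the response does not say whether they were;
- the helper names rotate90, rotations and split_then_augment are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical representation: a grayscale image as a 2D list of pixel values.
Image = List[List[int]]
Sample = Tuple[Image, str]  # (image, taxon label)


def rotate90(img: Image) -> Image:
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotations(img: Image) -> List[Image]:
    """Return the base image (0 degrees) plus its 90, 180 and 270 degree rotations."""
    out = [img]
    for _ in range(3):
        out.append(rotate90(out[-1]))
    return out


def split_then_augment(samples: List[Sample], train_frac: float = 0.8,
                       seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Split first, then rotate only the training images.

    Because augmentation happens after the split, neither the base image nor
    any rotated copy of a training image can end up in the held-out
    (testing/validation) set. The 0.8 split fraction is an assumption.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    train, held_out = shuffled[:n_train], shuffled[n_train:]
    train_aug = [(rot, label) for img, label in train for rot in rotations(img)]
    return train_aug, held_out


if __name__ == "__main__":
    data = [([[i, i + 1], [i + 2, i + 3]], "order_A" if i % 2 else "order_B")
            for i in range(10)]
    train_aug, held_out = split_then_augment(data)
    print(len(train_aug), len(held_out))  # 32 2
```

The design choice that matters here is the order: split first, then augment. If the rotations were generated before splitting, a rotated copy of a training image could land in the test set and inflate the reported scores.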
1 Dec 2022 Decision Letter - Vijayalakshmi G V Mahesh, Editor

Next Generation Insect Taxonomic Classification by Comparing Different Deep Learning Algorithms
PONE-D-22-26033R1

Dear Dr. Ong,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Vijayalakshmi G V Mahesh, Ph.D
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #3: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #3: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #3: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #3: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #3: Yes

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Title: Next Generation Insect Taxonomic Classification by Comparing Different Deep Learning Algorithms. The authors have addressed all the comments raised, and the proposed method is novel.

Reviewer #3: All the comments are addressed. The VGG19 model used is an advanced CNN model capable of complex classification tasks. This deep model is showcased for taxonomic rank classification with appropriate classification scores and statistical analysis.

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: RAJKUMAR PALANIAPPAN
Reviewer #3: Yes: ROOPA B S

https://doi.org/10.1371/journal.pone.0279094.r003
Formally Accepted
5 Dec 2022 Acceptance Letter - Vijayalakshmi G V Mahesh, Editor

PONE-D-22-26033R1
Next Generation Insect Taxonomic Classification by Comparing Different Deep Learning Algorithms

Dear Dr. Ong:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Vijayalakshmi G V Mahesh
Academic Editor
PLOS ONE

https://doi.org/10.1371/journal.pone.0279094.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.