Towards automated recipe genre classification using semi-supervised learning

Nazmus Sakib; G. M. Shahariar; Md. Mohsinul Kabir; Md. Kamrul Hasan; Hasan Mahmud

doi:10.1371/journal.pone.0317697

Peer Review History

Original Submission: August 11, 2024
18 Sep 2024 Decision Letter - Zeheng Wang, Editor

PONE-D-24-34319
Towards Automated Recipe Genre Classification using Semi-Supervised Learning
PLOS ONE

Dear Dr. Sakib,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Nov 02 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future.
For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Zeheng Wang
Academic Editor
PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. During your revisions, please note that a simple title correction is required: the text "Springer Nature 2021 LATEX template" must be removed from the title. Please ensure this is updated in the manuscript file and the online submission information.

3. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

4. Please update your submission to use the PLOS LaTeX template. The template and more information on our requirements for LaTeX submissions can be found at http://journals.plos.org/plosone/s/latex.

5. Please provide a complete Data Availability Statement in the submission form, ensuring you include all necessary access information or a reason for why you are unable to make your data freely accessible. If your research concerns only data provided within your submission, please write "All data are in the manuscript and/or supporting information files" as your Data Availability Statement.

6. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

Additional Editor Comments (if provided):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly
Reviewer #2: Partly

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: No
Reviewer #2: No

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors extend the 3A2M Cooking Recipe dataset to 3A2M+ by first creating a larger named entity list and then classifying recipes into genres. They combine the existing NER list with entities found by a maximum entropy classifier and a neural network classifier. They classify genres based on the extended NER list using a variety of machine learning methods and pre-trained language models.
They demonstrate that the extended NER list allows more accurate genre labeling, although this is unsurprising since it includes the original NER list. Assuming the above is true, the methodology, results, and specific conclusions appear consistent overall. The results need more detail and better presentation, in particular all of the results figures. Furthermore, the manuscript requires significant revisions for clarity, as it contradicts itself and is ambiguous at several points.

My specific comments:

The authors need to be more explicit about what has been previously published and what is novel, especially with regard to the 3A2M dataset. In particular, these two quotations seem to contradict each other and require clarification:

Page 5 Paragraph 3: “The 3A2M dataset [9] is based on the RecipeNLG dataset and incorporates all of its data and features. The data attributes include title, directions, NER and genre as labels”

Page 7 Paragraph 2: “The 3A2M dataset [9] contains a vast collection of 2,231,142 culinary recipes, making it the largest publicly available dataset of its kind. One limitation of the dataset is that it lacks specific genre categorization for the recipes.”

What is the problem with the 3A2M genre labels? Also, where do the 3A2M+ dataset genre labels come from? Are they inherited from 3A2M? How accurate were they originally in 3A2M? If your models are more consistent with the human-labelled dataset, did you update the 3A2M+ genre labels?

Furthermore, the original 3A2M dataset had human-labelled data and machine-labelled data. For all experiments, the authors use machine-labelled data to train their network and test against human-labelled data. Is there a limitation arising from the fact that the machine learning models are trained on data generated by previous machine learning models, rather than on the human-labelled dataset alone?

Since the extended NER list includes the original NER list, the authors should state how the original NER list was created.
This will allow the method to be extended to other databases or other genre labels.

Page 18, Figures 3 and 4: The labels in the legend are too long. You can put “Precision-recall curves” in the figure caption, expand on it there for clarity, and simply label each curve by number. If I understand correctly, the 9 classes are the recipe genres, so it may be worth writing those explicitly in the legend. Furthermore, these figures are not discussed in depth in the text. What effect do they have on your results?

Page 19, Figures 5 and 6: Axis labels are missing. I assume accuracy against epoch? If that is the case, I don’t understand why the validation accuracy is so high initially, before training has begun, and then plateaus during training. Nor do I understand why the validation loss gets worse during training. Is this because you're training on machine-labelled data but testing on human-labelled data? It’s also unclear exactly which model this is for. Furthermore, you said there were only 10 epochs of training, but Figure 5 goes to 20 on the x-axis. These figures need clarification in the main text or caption.

Page 25 Limitations, second point: “Domain experts resolved tie situations. If the study was to redo the Extended NER process for the entire 1900K dataset, it would make the dataset more robust.” Does this mean the Extended NER process was not performed for the full 3A2M dataset? Or is this about not generating genre labels based on Extended NER? The sentence needs to be clarified or expanded.

The motivation and conclusion for genres in recipes say they can be used for identifying dietary restrictions or the origin of dishes; however, the specific genres used don’t seem designed for these purposes, and there's no connection to these uses. I understand this is a limitation of the original 3A2M dataset.
It could be possible to connect the results to the current conclusion by, for example, commenting on the recall/ROC of the vegetarian class and the implications for people on vegetarian diets. Alternatively, the authors could propose a way to extend the dataset to different genre labels, which would make the extended NER list look more useful. Otherwise, the conclusion should be tied more closely to the specific uses of the current genre labels, e.g., recipe prediction and menu selection, and should explicitly mention the limitations.

Some typographical errors I noticed:

Page 23 Section 6.5: “the Directionsfeature.” (missing space)

Page 24: “In the Experiment II, by analyzing the title with NER, and title with Extended NER using the DistilBERT model, this experiment was obtained 98% and 99y% accuracy in training respectively” (“99y%” has an extra “y”)

Data availability – the authors selected no. However, they have released the database. For full reproducibility, the human-labelled subset of the database should be identified or separately released. If this is already the case, it should be mentioned.

Reviewer #2: Owing to the lack of sufficient labeled data, it can be challenging to categorize raw recipes found online into appropriate food types. The authors constructed a dataset called the “Classified, Prototyped, and Annotated 2 Million Extended (3A2M+) Cooking Recipes Dataset” and tested a variety of classification methods and tasks on this dataset. The results of this paper are satisfactory, but I have a few concerns.

1. The authors claim limited access in Data Availability, but as far as I know, building the dataset was the main work of this paper. Since this dataset was collected from public data, it is the process and methodology of building it that is innovative, and these need to be described further in the paper.

2. I think it is a large jump to go from traditional machine learning models directly to a pre-trained BERT model.
The authors could have considered including experiments with CNN, GNN [1], RNN [2], and other models, or explained why these methods were not chosen. These methods perform well in both DNA and protein tasks.

[1] Cao B, Wang B, Zhang Q. GCNSA: DNA storage encoding with a graph convolutional network and self-attention. iScience, 2023, 26(3).
[2] Li X, Han P, Chen W, et al. MARPPI: boosting prediction of protein–protein interactions with multi-scale architecture residual network. Briefings in Bioinformatics, 2023, 24(1): bbac524.

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

https://doi.org/10.1371/journal.pone.0317697.r001
Revision 1
7 Nov 2024 Author Response

Thank you to the editor and reviewers for their valuable feedback.

Editor: Thank you for your detailed instructions. We have adhered to the PLOS ONE LaTeX template, formatted author names as required, provided the data availability link, and added links to the code repository within the manuscript. ORCID information is also included in Editorial Manager.

Reviewer 1: Thank you for the constructive comments. We have addressed each point in detail in the response document (Response to Reviewers) and reflected all changes in the manuscript. Typographical errors were corrected, and an image was replaced due to a file name error; our apologies for this oversight. The limitations and conclusion sections were revised following your guidance.

Reviewer 2: Thank you for the valuable comments. We have clarified our model selection process and elaborated on it in detail in the response document (Response to Reviewers). We have also cited the recommended paper as suggested.

Attachments
Attachment
Submitted filename: Response To Reviewers PONE-D-24-34319.pdf

https://doi.org/10.1371/journal.pone.0317697.r002
17 Dec 2024 Decision Letter - Zeheng Wang, Editor

PONE-D-24-34319R1
Towards Automated Recipe Genre Classification using Semi-Supervised Learning
PLOS ONE

Dear Dr. Sakib,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 31 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future.
For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Zeheng Wang
Academic Editor
PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Please double-check to ensure that ALL datasets and code used in this work can be freely accessed by other researchers for reproduction upon acceptance.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: N/A

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #2: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: All comments have been addressed. I think this paper is suitable for publication in PLOS ONE in its current version.
Thanks for the authors' contribution.

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

https://doi.org/10.1371/journal.pone.0317697.r003
Revision 2
1 Jan 2025 Author Response

This submission represents the second revision of our manuscript, and we have made comprehensive updates to address all the suggestions and comments provided by the reviewers and the editor. The reviewers acknowledged that all raised issues were adequately resolved in the first revision. Specifically, we have ensured that all datasets and code used in our study are publicly available, with verified and functional links included in the manuscript. Additionally, the references have been carefully reviewed and cross-checked to confirm that they are accurate and accessible in the public domain. Furthermore, we have revised the image formatting in accordance with the PACE guidelines, ensuring compliance with the journal's standards.

Attachments
Attachment
Submitted filename: Response To Reviewers PONE-D-24-34319 v2.pdf

https://doi.org/10.1371/journal.pone.0317697.r004
3 Jan 2025 Decision Letter - Zeheng Wang, Editor

Towards Automated Recipe Genre Classification using Semi-Supervised Learning
PONE-D-24-34319R2

Dear Dr. Sakib,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Zeheng Wang
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

https://doi.org/10.1371/journal.pone.0317697.r005
Formally Accepted
16 Jan 2025 Acceptance Letter - Zeheng Wang, Editor

PONE-D-24-34319R2

PLOS ONE

Dear Dr. Sakib,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Zeheng Wang
Academic Editor
PLOS ONE

https://doi.org/10.1371/journal.pone.0317697.r006

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.