Peer Review History
Original Submission: July 22, 2025
Dear Dr. Misbah,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The manuscript has been evaluated by two reviewers, and their comments are available below. The reviewers have raised a number of concerns that need attention. In particular, they request additional information on the statistical analysis. Could you please revise the manuscript to carefully address the concerns raised?

Please submit your revised manuscript by Nov 03 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Helen Howard
Staff Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf.

2. Please include a statement in your manuscript text clarifying whether the authors of this study carried out the assessment reported in the 'Human Evaluation' section of your manuscript text.

3. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

4. We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service. The American Journal Experts (AJE) (https://www.aje.com/) is one such service that has extensive experience helping authors meet PLOS guidelines and can provide language editing, translation, manuscript formatting, and figure formatting to ensure your manuscript meets our submission guidelines. Please note that having the manuscript copyedited by AJE or any other editing service does not guarantee selection for peer review or acceptance for publication. Upon resubmission, please provide the following:
- The name of the colleague or the details of the professional service that edited your manuscript
- A copy of your manuscript showing your changes by either highlighting them or using track changes (uploaded as a *supporting information* file)
- A clean copy of the edited manuscript (uploaded as the new *manuscript* file)

5. In the online submission form you indicate that your data is not available for proprietary reasons and have provided a contact point for accessing this data. Please note that your current contact point is a co-author on this manuscript. According to our Data Policy, the contact point must not be an author on the manuscript and must be an institutional contact, ideally not an individual.
Please revise your data statement to a non-author institutional point of contact, such as a data access or ethics committee, and send this to us via return email. Please also include contact information for the third-party organization, and please include the full citation of where the data can be found.

6. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

7. If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes
Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A
Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available? (See the PLOS Data policy.)

Reviewer #1: Yes
Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes
Reviewer #2: Yes

**********

Reviewer #1: The present submission involves multi-turn conversational AI, a challenging field in itself, with additional complications concerning languages with existing resource issues such as Arabic, especially its spoken form(s). It is of special importance that linguistic and socio-cultural aspects of spoken communication are addressed in the present submission, namely the nuances and context of Arabic conversation, where language-specific parameters apply and are not always compatible with standard English and English data. The submitted research paper includes a comprehensive yet detailed outline of related research, directly or indirectly connected to the present approach, methods, and resources for Arabic, including the addressing of current issues and limitations. The goals are clearly defined (achieving a robust, high-quality dataset specifically designed to fine-tune existing causal Arabic LLMs for multi-turn conversational tasks by building a synthetic Arabic dataset using a recently launched Arabic instructional LLM), as is the detailed and well-organized methodology. A significant strength of the paper is the step-by-step analytical presentation of the linguistic (Arabic) data processing according to the proposed approach, contributing to its explainability and transparency. It may be noted that providing extended examples of more complex issues, such as the "new educational initiatives in Morocco" case (Line 310), would further strengthen the paper. The combination of human evaluators and benchmark evaluation is another feature of the proposed approach that is especially sensitive to efficient deployment of Arabic LLMs in real-life situations. In general, the present submission demonstrates a processing approach aiming to resolve a set of specific and challenging issues with a detailed and explanatory methodology and results, all expressed in a clear, well-written text.
Indeed, the approach presented may serve as a basis for further development and research on conversational Arabic, as well as on other languages.

Reviewer #2: The manuscript entitled "Fine-Tuning Arabic Large Language Models for Improved Multi-Turn Dialogue: A Blueprint for Synthetic Data Generation and Benchmarking" addresses an important gap in Arabic natural language processing by proposing a reproducible methodology for generating synthetic multi-turn dialogue datasets and evaluating their effectiveness in fine-tuning Arabic large language models (LLMs). The study is timely, methodologically well-structured, and offers valuable contributions to the field of conversational AI, particularly in low-resource and linguistically complex languages such as Arabic. My comments are provided below with respect to the journal's review criteria.

1. Technical Soundness and Data Support for Conclusions. The study is technically sound. The authors carefully describe their approach to dataset generation, including the selection of instructional LLMs, prompt engineering strategies, and hyperparameter tuning. The methodology for fine-tuning two Arabic-native LLMs (ArabianGPT-08B-V2 and AraGPT2-mega) is well-detailed, and the evaluation framework, which incorporates perplexity, RAVEN, and human judgment, is appropriate and robust. The reported results consistently demonstrate improvements over multilingual baselines, supporting the central claim that synthetic data can effectively enhance Arabic conversational models. The conclusions are aligned with the presented findings and are drawn in a balanced and evidence-based manner. One limitation, however, is the sole reliance on synthetic data without external validation against naturally occurring conversations. Addressing this in future work would further strengthen the study.

2. Statistical Analysis. The statistical framework is generally appropriate and provides meaningful insights into model performance. The use of perplexity and RAVEN ensures a quantitative assessment of fluency and contextual coherence, while human evaluation captures cultural and linguistic nuances that automated metrics may miss. Nevertheless, the analysis would benefit from additional detail. Specifically, reporting inter-rater reliability for human evaluations and providing confidence intervals or statistical significance tests when comparing model scores would enhance the rigor of the findings (a minimal sketch of such an analysis follows this review). While these omissions do not undermine the overall validity of the results, their inclusion would make the analysis more comprehensive and transparent.

3. Data Availability. The manuscript notes that the generated dataset is not publicly available, as it forms part of the author's doctoral research, but may be shared upon reasonable request for academic and non-commercial use. While metadata, prompt templates, and generation parameters are provided, this arrangement does not fully comply with the PLOS Data Policy, which requires unrestricted availability of data underlying published findings. To align with the policy, the authors are strongly encouraged to deposit the dataset, or a representative subset, in a public repository, ensuring that it is accessible to the research community. If restrictions are unavoidable, they should be explicitly justified on clear ethical, legal, or proprietary grounds. Without such measures, reproducibility and transparency are limited.

4. Presentation and Language. The manuscript is written in clear and professional English, with a coherent structure and logical flow across sections. Technical terminology is employed appropriately, making the paper accessible to specialists in natural language processing and AI. However, minor typographical and stylistic inconsistencies are present, such as variable hyphenation ("multi-turn" vs. "multi turn") and occasional redundant phrasing. These are relatively minor issues but should be corrected at the revision stage to improve polish and readability. No substantive language editing is required.

5. Additional Comments. The novelty of the study is well-established. The introduction and literature review provide a solid contextual grounding, highlighting the lack of high-quality Arabic multi-turn conversational datasets and positioning the proposed methodology as a significant step forward. The experimental design is rigorous, and the discussion appropriately situates the results within the broader field. Importantly, the authors provide a technical blueprint that can be adapted for other low-resource languages, enhancing the manuscript's broader applicability.
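To make the reviewer's statistical request concrete, here is a minimal sketch of how inter-rater reliability and its uncertainty could be reported, using scikit-learn's quadratic-weighted Cohen's kappa and a simple bootstrap confidence interval. The rating arrays are illustrative stand-ins, not the study's actual human-evaluation scores.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Illustrative stand-ins: two raters scoring the same items on a 1-5 scale.
rater_a = np.array([4, 5, 3, 4, 2, 5, 4, 3, 4, 5])
rater_b = np.array([4, 4, 3, 5, 2, 5, 3, 3, 4, 4])

# Quadratic-weighted kappa suits ordinal scales: large disagreements cost more.
kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")

# Bootstrap 95% confidence interval by resampling items.
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(rater_a), len(rater_a))
    boot.append(cohen_kappa_score(rater_a[idx], rater_b[idx], weights="quadratic"))
low, high = np.percentile(boot, [2.5, 97.5])

print(f"quadratic-weighted kappa = {kappa:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Reporting this per evaluation criterion, alongside significance tests for model-score comparisons, would address the reviewer's point directly.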
**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Christina Alexandris
Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
Revision 1
Dear Dr. Misbah,

Please submit your revised manuscript by Dec 26 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Mohammad Salah Hassan, Ph.D
Academic Editor
PLOS ONE

Journal Requirements:

If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article's retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Dear Authors,

Your paper is timely and valuable. The synthetic data process is clearly explained, the dataset scale is meaningful, and combining automatic and human evaluations was the right choice. The reviewer has now made a decision, so please make sure to address each of the points they have raised. In addition, I have included several further revisions and recommendations that should help strengthen the manuscript before resubmission.

The data-generation pipeline is explained step by step, including unusual length settings for multi-turn dialogues and the compute/time budget. The fine-tuning setup is transparent (IA3/PEFT, packing, training hours). You also try to look at quality from multiple angles (perplexity, RAVEN, and a small human study). That triangulation is a strength.

What needs tightening (do these first):

- Dataset size math. The manuscript says there are 43,316 conversations with "precisely 20 exchanges" each, and elsewhere mentions "57 million utterances." Those numbers don't line up. Even if "exchange" means a user-bot pair (i.e., 40 turns), the total utterances are nowhere near 57M. Please correct the count everywhere and add a one-sentence derivation so a reviewer can follow the arithmetic (a worked check appears after this list).
- Data availability. The Data Availability Statement points to a Google Drive folder. PLOS strongly prefers stable, citable hosting. Deposit the dataset (and scripts if possible) in a DOI-issuing repository such as Zenodo or OSF, cite the DOI in the DAS, and remove the ad-hoc cloud link. Version the release (e.g., v1.0), add a license, and include checksums.
- Evaluation leakage risk. Your benchmark is a 20% subset of the same synthetic corpus used for training/fine-tuning. That invites style overlap and optimistic scores. Clarify how you prevented duplicates (hashing, embedding similarity, seed control; see the deduplication sketch below), and say explicitly whether the test split was frozen before model selection. If you can, add a small out-of-distribution test (e.g., held-out topics or prompts generated with different seeds) and report ID vs. OOD results.
- RAVEN details. Lock down the metric. Name the sentence-embedding model (exact model and version), state any text normalization (diacritics, punctuation), explain the scaling from cosine similarity to the reported "RAVEN (scaled)," and say how you aggregate from turn level to conversation level (see the sketch below). Using an embedding model you also fine-tuned can bias the metric; pick a fixed external Arabic/multilingual embedder for all systems and document it.
- Perplexity comparability. Perplexity depends on the tokenizer and isn't apples-to-apples across models with different vocabularies. Keep PP for within-model tracking, but add a short caveat and, if you can, report a tokenizer-agnostic figure such as bits-per-byte (see the sketch below) or compute PP with a common reference tokenizer for cross-model comparisons. Also name the exact tokenizers used.
- Human evaluation reliability. Right now the raters are authors, and some weighted kappa values are near zero (even slightly negative). That weakens the claim. Bring in at least two independent raters blind to model identity, run a short calibration pass, and then report per-criterion agreement with confidence intervals (cf. the weighted-kappa sketch after Reviewer #2's review above). Keep the current results, but frame the author-only phase as a limitation.
- Ethics note. Add a one-paragraph ethics statement: internal quality assessment, no personal data collected, no vulnerable populations, no compensation, and institutional review not required under journal policy. That will stop any back-and-forth at production.
- Decoding setup clarity. The generation table mixes sampling and beam search (do_sample=True with num_beams=2). If you really used beam sampling, say so; otherwise separate the setups for clarity (a sketch of an unambiguous configuration appears after this list). Fix the parameter typo in repetition_penalty (it is misspelled in one place). State seeds, library versions, and whether decoding parameters were identical across models.

Secondary edits that help:

- Compute/time in one place. You mention 14 days for data generation, 100 hours for fine-tuning, and 2.5 months for full benchmarking. Summarize this in a small table with hardware, key libraries (PyTorch/Transformers/PEFT versions), CUDA, OS, and random seeds. It instantly boosts trust.
- Terminology. Replace "casual Arabic LLMs" with "causal (Arabic) language models" throughout. Standardize model names (LLaMA 2, Llama 3, GPT-4, Gemini, etc.).
- PEFT specifics. Since you reference IA3, packing, and use_liger, add one sentence per item (what it does, which library/version). Consider linking or archiving the training config files (see the configuration sketch below).
- Baseline fairness. Readers will ask whether LLaMA- or AceGPT-based baselines were fine-tuned on your training split or evaluated as-is. If they weren't fine-tuned, label them clearly as zero-/few-shot baselines or add a fine-tuned variant for a fairer comparison.
- Tables/labels. If a column says "RAVEN (scaled)," state the range (e.g., 0-1). For inter-rater agreement, specify whether κ is quadratic-weighted and include 95% CIs.
- Hyphenation artifacts. Clean the soft-hyphen breaks from PDF export (mul-ti-turn, to-ken, in-stance, etc.).
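A minimal check of the dataset-size arithmetic, assuming "exchange" means one user-bot pair (two utterances), which is the generous reading noted above; the two input figures come directly from the editor's summary.

```python
# Worked check of the dataset-size claim, using the figures quoted above.
conversations = 43_316
exchanges_per_conversation = 20   # "precisely 20 exchanges" each
utterances_per_exchange = 2       # one user turn + one bot turn

total = conversations * exchanges_per_conversation * utterances_per_exchange
print(f"{total:,} utterances")    # 1,732,640 -- far below the claimed 57M
```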
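A minimal sketch of the train/test overlap check described under "Evaluation leakage risk," assuming the conversations are available as plain strings. The exact-duplicate pass uses content hashing; the near-duplicate pass uses embedding cosine similarity via sentence-transformers. The embedder name and the 0.95 threshold are illustrative assumptions, not the authors' settings.

```python
import hashlib
from sentence_transformers import SentenceTransformer

def sha256(text: str) -> str:
    """Stable fingerprint for exact-duplicate detection."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()

def overlap_report(train_texts, test_texts, sim_threshold=0.95):
    # 1) Exact duplicates via hashing.
    train_hashes = {sha256(t) for t in train_texts}
    exact = sum(sha256(t) in train_hashes for t in test_texts)

    # 2) Near-duplicates via embedding cosine similarity.
    #    Model name is illustrative; use one fixed external embedder throughout.
    model = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")
    train_emb = model.encode(train_texts, normalize_embeddings=True)
    test_emb = model.encode(test_texts, normalize_embeddings=True)
    sims = test_emb @ train_emb.T  # cosine similarity (vectors are normalized)
    near = int((sims.max(axis=1) >= sim_threshold).sum())

    return {"exact_duplicates": exact, "near_duplicates": near}
```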
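RAVEN's exact definition lives in the manuscript and is not reproduced here. Purely as a placeholder illustrating the reporting the editor requests (named embedder, explicit scaling, explicit aggregation), this sketch scores each generated turn against its reference by cosine similarity, rescales from [-1, 1] to [0, 1], and averages turn scores to the conversation level; every concrete choice here is an assumption.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Fixed external embedder (illustrative choice), documented once, used for all systems.
_EMBEDDER = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

def scaled_turn_score(generated: str, reference: str) -> float:
    """Cosine similarity rescaled from [-1, 1] to [0, 1]."""
    g, r = _EMBEDDER.encode([generated, reference], normalize_embeddings=True)
    return (float(np.dot(g, r)) + 1.0) / 2.0

def conversation_score(gen_turns, ref_turns) -> float:
    """Aggregate turn-level scores to one conversation-level number (simple mean)."""
    return float(np.mean([scaled_turn_score(g, r) for g, r in zip(gen_turns, ref_turns)]))
```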
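Bits-per-byte normalizes log-loss by UTF-8 byte count instead of token count, so it is comparable across models with different tokenizers. A minimal sketch with Hugging Face transformers, assuming a causal LM and a list of evaluation texts; the model name passed in is whatever checkpoint is being evaluated.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def bits_per_byte(model_name: str, texts: list[str]) -> float:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    total_nll_nats, total_bytes = 0.0, 0
    with torch.no_grad():
        for text in texts:
            ids = tok(text, return_tensors="pt").input_ids
            # .loss is the mean NLL per predicted token (in nats); undo the mean.
            loss = model(ids, labels=ids).loss
            total_nll_nats += loss.item() * (ids.numel() - 1)
            total_bytes += len(text.encode("utf-8"))

    # Convert nats to bits and normalize by bytes rather than tokens.
    return total_nll_nats / math.log(2) / total_bytes
```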
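One way to remove the sampling/beam ambiguity is to keep the two decoding regimes in separate, fully specified configurations and fix the seed before every run. The parameter names below are real transformers generate() arguments (note the correct spelling of repetition_penalty); the specific values are illustrative, not the manuscript's.

```python
from transformers import set_seed

set_seed(42)  # fixes Python/NumPy/Torch seeds for reproducible decoding

# Pure sampling configuration (no beams).
sampling_cfg = dict(
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    repetition_penalty=1.2,   # correct spelling of the flag
    max_new_tokens=256,
)

# Deterministic beam-search configuration (no sampling).
beam_cfg = dict(
    do_sample=False,
    num_beams=2,
    repetition_penalty=1.2,
    max_new_tokens=256,
)

# If both were intentionally combined (do_sample=True, num_beams=2), that is
# "beam-sample" decoding and should be named as such in the paper.
# outputs = model.generate(**inputs, **sampling_cfg)
```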
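For archiving the training configuration, a minimal IA3 setup with the Hugging Face peft library might look like the following. The base-model name and target-module names are placeholders; which modules to target depends on the base model's architecture, and the authors' actual configuration may differ.

```python
from peft import IA3Config, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("base-model-name")  # placeholder

# IA3 learns per-channel scaling vectors on attention and feed-forward
# activations instead of updating the full weight matrices.
ia3_config = IA3Config(
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "v_proj", "down_proj"],  # architecture-dependent
    feedforward_modules=["down_proj"],                 # subset of target_modules
)

model = get_peft_model(base, ia3_config)
model.print_trainable_parameters()  # a tiny fraction of the base parameters
```

Archiving this file (with exact peft/transformers versions) alongside the dataset release addresses the editor's reproducibility point.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #2: All comments have been addressed

**********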
2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? (See the PLOS Data policy.)

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: Yes

**********

Reviewer #2: The data availability statement indicates that the underlying dataset has been deposited in a public repository; however, the provided Google Drive link was found to be non-functional at the time of review, preventing access to the data. Regarding the manuscript's language, it is written in generally intelligible scientific English, but a number of typographical errors, grammatical inconsistencies, and awkward phrasings were noted throughout the text; these should be addressed to enhance clarity and polish. The study itself is commended for its technical soundness and methodological rigor, exemplified by a well-structured and reproducible pipeline for synthetic data generation and model fine-tuning. The conclusions are considered to be robustly supported by the data, which were derived from a large-scale dataset and a comprehensive, multi-faceted evaluation benchmark. Furthermore, the statistical analysis of the human evaluation results has been performed with appropriate rigor, utilizing a complementary suite of metrics to thoroughly assess inter-rater reliability. It is therefore suggested that the manuscript requires minor revisions, primarily to rectify the broken data link and to undertake thorough copyediting to correct language errors.

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: Vijayakumar Selvaraj, Associate Professor, B.S.Abdur Rahman Crescent Institute of Science and Technology.

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

To ensure your figures meet our technical requirements, please review our figure guidelines: https://journals.plos.org/plosone/s/figures. You may also use PLOS's free figure tool, NAAS, to help you prepare publication-quality figures: https://journals.plos.org/plosone/s/figures#loc-tools-for-figure-preparation. NAAS will assess whether your figures meet our technical requirements by comparing each figure against our figure specifications.
Revision 2
Fine-Tuning Arabic Large Language Models for Improved Multi-Turn Dialogue: A Blueprint for Synthetic Data Generation and Benchmarking

PONE-D-25-35904R2

Dear Dr. Misbah,

We're pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you'll receive an e-mail detailing the required amendments. When these have been addressed, you'll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager and clicking the 'Update My Information' link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Mohammad Salah Hassan, Ph.D
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Dear Authors, I am pleased to inform you that your manuscript has been accepted for publication. Thank you for your revisions and for addressing the reviewers' comments. Please submit the final production files through the system as requested. Sincerely

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? (See the PLOS Data policy.)

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: Yes

**********

Reviewer #2: All issues raised in Revision 1 have been fully and rigorously addressed in Revision 2. The authors not only corrected factual and methodological shortcomings but also enhanced reproducibility (Zenodo DOI, detailed hyperparameters, evaluation protocols) and statistical transparency (expanded IRR metrics, external evaluator). The editorial summary explicitly states that the manuscript now requires only minor revisions, primarily limited to copyediting that has already been completed.

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: Vijayakumar Selvaraj

**********
Formally Accepted
PONE-D-25-35904R2

Dear Dr. Misbah,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Mohammad Salah Hassan
Academic Editor
PLOS ONE
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.