Peer Review History

Original Submission: September 26, 2023
Decision Letter - Michal Ptaszynski, Editor

PONE-D-23-31194
An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents
PLOS ONE

Dear Dr. Croissant,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 23 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Michal Ptaszynski, PhD

Academic Editor

PLOS ONE

Journal requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections does not match.

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

4. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please ensure that your ethics statement is included in your manuscript, as the ethics statement entered into the online submission form will not be published alongside your manuscript. 

5. We note that Figure 4 in your submission contains copyrighted images. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figure 4 to publish the content specifically under the CC BY 4.0 license. 

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

6. We notice that your supplementary information is included in the manuscript file. Please remove it and upload it with the file type 'Supporting Information'. Please ensure that each Supporting Information file has a legend listed in the manuscript after the reference list.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper presents An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents and is thus directly related to the theme of this journal. Overall, the paper is organized properly; the concept and future research directions are extensively explained. The paper can be accepted after the following minor changes:

1. The problem and motivation of the paper are not clear in the introduction.

2. The contribution of the paper is not clear and is not given in bullet points.

3. The paper contains a few grammar mistakes, which should be corrected in the final version.

4. Only 67 references are included in the paper, whereas more than 75 are expected; to attract readers, please add a few recent references related to this paper, such as those mentioned below:

Laghari, Asif Ali, Hui He, Kamran Ali Memon, Rashid Ali Laghari, Imtiaz Ali Halepoto, and Asiya Khan. "Quality of experience (QoE) in cloud gaming models: A review." Multiagent and Grid Systems 15, no. 3 (2019): 289-304.

Laghari, Asif Ali, Kamran Ali Memon, Muhammad Bux Soomro, Rashid Ali Laghari, and Vishal Kumar. "Quality of experience (QoE) assessment of games on workstations and mobile." Entertainment Computing 34 (2020): 100362.

Madiha, Hina, LiHui Lei, Asif Ali Laghari, and Sajida Karim. "Quality of experience and quality of service of gaming services in fog computing." In Proceedings of the 2020 4th international conference on management engineering, software engineering and service sciences, pp. 225-228. 2020.

Laghari, Asif Ali, Sana Shahid, Rahul Yadav, Shahid Karim, Awais Khan, Hang Li, and Yin Shoulin. "The state of art and review on video streaming." Journal of High Speed Networks Preprint (2023): 1-26.

Laghari, Asif Ali, Xiaobo Zhang, Zaffar Ahmed Shaikh, Asiya Khan, Vania V. Estrela, and Saadat Izadi. "A review on quality of experience (QoE) in cloud computing." Journal of Reliable Intelligent Environments (2023): 1-15.

Reviewer #2: - Introduction needs to be revised, including the problem identification and research gaps.

- The methodology is not discussed systematically.

- Expand on the critical results in the conclusion and focus on the main developments. Also, state the main contributions in the conclusion.

- Results are not described properly.

- All figures are of low quality; please improve all of them.

- The article needs a review for grammatical errors.

Reviewer #3: This paper provides a study that is of great interest in the gaming field and, increasingly, in the VR field. The comparison of the three levels of experiment gives a significant understanding of future possibilities. However, there are still some parts that could be improved to make this paper clearer and more readable to a larger audience.

1) Are the three studies based on the same experiment, or are they three different experiments? The whole experimental process is not very concise and is spread across the whole paper. I would suggest putting it at the end of chapter 3 before going into details.

2) This paper compares three different types of situations: no memory as a baseline, then memory and appraisal. Since the key of this paper is the appraisal and the memory, a bit more explanation and some diagrams showing the difference between the two are very necessary.

3) The language and wording might also be affecting factors. This might not be part of the key research of this paper, but I believe some explanation or observation should be given of how the results set this factor aside at this stage.

4) Regarding the design of the UI: it is good that the authors provide a screenshot, but no explanation is given with respect to the interface. What the participants see on the screen will definitely have some influence as well. It would be better if the authors took that into account in the discussion.

Reviewer #4: The paper explores using large language models (LLMs) to develop believable and interactive artificial agents that simulate human emotions. Based on psychological appraisal research, the study presents a new chain-of-emotion architecture for emotion simulation in video games, which outperforms standard LLM architectures in user experience and content analysis metrics. Below are my comments.

Comment #1:

The paper should clarify the LLMs used in the study and expand on the testing scenarios. A description of the models' training datasets, limitations, and capabilities would greatly help in assessing the validity of the results. Testing the proposed Chain-of-Emotion architecture across varied gaming genres and contexts is recommended to strengthen the argument of its general applicability.

Comment #2:

The claim that the new architecture outperforms "standard LLM architectures" lacks a clear benchmark. The study should specify the other LLM architectures used for comparison, detailing their design and the metrics on which they were evaluated. This would provide the reader with a clearer understanding of the proposed architecture's relative performance.

Comment #3:

While user-rated metrics provide valuable insights into the user experience, the study should aim to balance these with objective measures where possible. The diversity and representativeness of the user group should be disclosed to ensure the reliability of these subjective metrics.

Comment #4:

The paper could further acknowledge the complexity of human emotions and how this complexity poses a challenge for AI simulation. There could be a discussion about how architecture accounts for or falls short in simulating the full spectrum of human emotional responses.

Comment #5:

It is suggested that the study references a broader range of psychological theories to ensure that the architecture isn't overly dependent on a narrow set of assumptions. The implications of basing LLM outcomes on these observations should be thoroughly discussed.

Comment #6:

The study should provide a clear, transparent methodology that allows other researchers to replicate the work. This includes detailed descriptions of the game scenarios used, the nature of the emotional responses evaluated, and the precise nature of the chain-of-emotion mechanism.

Comment #7:

Additionally, how can the game agent architecture be enhanced to better compete with other LLM frameworks such as LangChain, LlamaIndex, AutoGen, etc., which boast advanced modules like memory, chains, agents, callbacks, security, and integration capabilities?

• While the architecture's focus on emotional intelligence within conversational agents is commendable, it could benefit from incorporating more sophisticated memory modules, similar to those seen in competing frameworks. Enhanced memory capabilities would allow for a more nuanced understanding of context over longer interactions, which is crucial for maintaining coherent and emotionally appropriate responses. Consider adopting or developing memory structures that can handle complex conversational threads without losing the emotional thread of the interaction.

• The game agent architecture has potential, but to truly compete, it should look into creating or integrating more complex chain mechanisms. Chains that can manage sequences of interrelated tasks would provide a significant edge, enabling agents to handle multiturn dialogues with more awareness and anticipation of user needs, thus improving the emotional engagement in conversations.

• Your approach to developing emotional intelligence in game agents is innovative; however, it might be beneficial to include more robust agent management and callback functions that other frameworks offer. These would allow for better event-driven interactions, which can result in more dynamic and responsive emotional behaviors in real-time, leading to a more immersive user experience.

• Security is an increasingly vital concern in AI applications. To enhance the competitive edge of your architecture, it is crucial to integrate state-of-the-art security protocols to ensure user data, especially as it pertains to emotional data, is handled with the utmost care. This would not only increase trust in your system but also align with best practices in responsible AI development.

• Integration capabilities are a standout feature in existing LLM frameworks. To bolster your architecture's marketability, provide clear and streamlined processes for integrating with popular LLMs, databases, and external APIs. Ensuring your system can easily fit within different tech stacks will be key to its adoption.

• Lastly, the game agent architecture should refine its context management system. While handling emotional responses is your architecture's unique selling point, the ability to maintain and leverage context effectively over the course of long interactions is what will truly enhance its practicality and appeal. Better context management can lead to more personalized and accurate emotional interactions, which is paramount for user engagement.

By addressing these comments, the study can significantly improve its scientific rigor, relevance, and potential impact in the field of AI-driven emotional simulation within interactive media.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: krishna kumar mohbey

Reviewer #3: Yes: Sky Lo

Reviewer #4: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Revision 1

Response to Editor Comments

Thank you for the comments and suggestions. We have changed the manuscript accordingly and believe the changes improve the manuscript. Please find attached detailed answers for each comment raised. We fixed formatting issues, styling, inconsistent funding information, and supplementary material placement.

Regarding code sharing: We’re happy to make the Unity game project publicly available. Nevertheless, we believe our main contribution is the high-level architecture described within the paper, which is agnostic to specific code or programming languages. The code used was a specific implementation in a game example, but we wish the architecture to be viewed as independent of this specific implementation.

Regarding Figure 4: This figure is an in-game screenshot created by the authors. All authors grant permission to use the figures under the CC BY 4.0 license. This is not a copyrighted image from a published game; it is purely a screenshot from the experiment used, created by the authors. Please let us know if a content permission form is necessary in this case.

Sincerely,

Maxi Croissant

Reviewer #1

1. Problem of paper and motivation is not clear in introduction

Thank you for your comments. We’ve made some changes to the introduction to better reflect the motivation and make the whole process clearer.

2. Contribution of paper is not clear and not given in bullets

We hope our changes to the conclusion make our contributions clear and easy to understand.

3. The paper contains a few grammar mistakes, which should be corrected in the final version.

Yes – we hope to have addressed these oversights.

4. Only 67 references are included in the paper, whereas more than 75 are expected; to attract readers, please add a few recent references related to this paper, such as those mentioned below:

Laghari, Asif Ali, Hui He, Kamran Ali Memon, Rashid Ali Laghari, Imtiaz Ali Halepoto, and Asiya Khan. "Quality of experience (QoE) in cloud gaming models: A review." Multiagent and Grid Systems 15, no. 3 (2019): 289-304.

Laghari, Asif Ali, Kamran Ali Memon, Muhammad Bux Soomro, Rashid Ali Laghari, and Vishal Kumar. "Quality of experience (QoE) assessment of games on workstations and mobile." Entertainment Computing 34 (2020): 100362.

Madiha, Hina, LiHui Lei, Asif Ali Laghari, and Sajida Karim. "Quality of experience and quality of service of gaming services in fog computing." In Proceedings of the 2020 4th international conference on management engineering, software engineering and service sciences, pp. 225-228. 2020.

Laghari, Asif Ali, Sana Shahid, Rahul Yadav, Shahid Karim, Awais Khan, Hang Li, and Yin Shoulin. "The state of art and review on video streaming." Journal of High Speed Networks Preprint (2023): 1-26.

Laghari, Asif Ali, Xiaobo Zhang, Zaffar Ahmed Shaikh, Asiya Khan, Vania V. Estrela, and Saadat Izadi. "A review on quality of experience (QoE) in cloud computing." Journal of Reliable Intelligent Environments (2023): 1-15.

We’ve added more relevant references, as other reviewers also wished for more context on certain points. We hope this addresses all concerns about missing context.

Reviewer #2

- Introduction needs to be revised, including the problem identification and research gaps.

Thank you for your comments. We’ve revised the introduction to better state our motivation and potential contribution to the field.

- The methodology is not discussed systematically.

We hope our changes to the methods sections of all experiments enhance clarity and replicability (specifically our new description of the individual conditions, with a new Figure 4).

- Expand on the critical results in the conclusion and focus on the main developments. Also, state the main contributions in the conclusion.

Yes – we hope our changes to the conclusion address these concerns.

- Results are not described properly.

We’ve changed how we present the results to clearly show the differences between all conditions in all experiments.

- All figures are of low quality; please improve all of them.

We’ve made some changes to the figures. Regarding quality, these were created using the PLOS ONE tooling, and we hope they are in accordance with the journal styling requirements.

- The article needs a review for grammatical errors.

Yes – we have addressed all errors in the revision.

Reviewer #3

1) Are the three studies based on the same experiment, or are they three different experiments? The whole experimental process is not very concise and is spread across the whole paper. I would suggest putting it at the end of chapter 3 before going into details.

Thank you for your comments. The three studies are three separate experiments. We’ve changed the titles of the sections in the paper to reflect this and added an explanation at the end of Section 3 to better explain the overall process and give it a clearer structure.

2) This paper compares three different types of situations: no memory as a baseline, then memory and appraisal. Since the key of this paper is the appraisal and the memory, a bit more explanation and some diagrams showing the difference between the two are very necessary.

We’ve added an additional figure to demonstrate how the three architectures differ from each other (Fig 4) and added more explanations for the three approaches in section 5.

3) The language and wording might also be affecting factors. This might not be part of the key research of this paper, but I believe some explanation or observation should be given of how the results set this factor aside at this stage.

This is true. Since LLMs work exclusively with language, this is a major factor to consider. This is why Experiments 1 and 2 are highly controlled for language variance: all three architectures are tested with exactly the same prompts. Experiment 3 adds variation because different users give different input, which is addressed through a sample large enough to ensure high power for a medium-sized effect. We test both the controlled scenario and the more open scenario of a real-life implementation to assess the potential of affective LLM agents.

We hope to better explain this in the changes made to the methods sections, specifically Experiment 2.
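The sample-size reasoning for a "medium-sized effect" can be sketched as follows. This is a generic normal-approximation calculation with conventional defaults (alpha = 0.05, power = 0.80, Cohen's d = 0.5); it illustrates the logic only and does not restate the study's actual parameters or sample size.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d=0.5, alpha=0.05, power=0.80):
    """Approximate per-group N for a two-sided two-sample comparison,
    using the normal approximation to the power of a t-test.
    d=0.5 is Cohen's conventional 'medium' effect size."""
    z = NormalDist().inv_cdf
    # Required N per group: 2 * (z_{1-alpha/2} + z_{power})^2 / d^2
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)


print(n_per_group())  # 63 participants per group under these defaults
```

A smaller assumed effect requires a larger sample (e.g. d = 0.3 roughly triples the per-group N), which is why powering for a medium effect is a common compromise in player-experience studies.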

4) Regarding the design of the UI: it is good that the authors provide a screenshot, but no explanation is given with respect to the interface. What the participants see on the screen will definitely have some influence as well. It would be better if the authors took that into account in the discussion.

Thank you – again, this is true. We’ve expanded the methods section of Experiment 3 to better describe the user interface (Section 6.1.1). The screenshots show the entire UI; there is only a text field and a button. We have tried to make this clearer with those changes. How much the UI influences the effects has not been tested; this is a potential topic for future research, which we now mention in the Discussion (Section 7.2).

Reviewer #4

Comment #1:

The paper should clarify the LLMs used in the study and expand on the testing scenarios. A description of the models' training datasets, limitations, and capabilities would greatly help in assessing the validity of the results. Testing the proposed Chain-of-Emotion architecture across varied gaming genres and contexts is recommended to strengthen the argument of its general applicability.

Thank you for your comments. We tried to address the clarification issues through an additional figure (Fig 4) and more explanation of the differences between the architectures (see Section 5.1). We’ve also included more details on the limitations and parameters of the LLM used. We agree that the architecture needs to be tested in further scenarios; here we present three distinct experiments building on each other to validate the architecture as a way to create affective agents. We feel that this reaches the limit of what we can do in one contained study, but we have expanded our discussion (see 7.2 and 7.3) of the need for further gaming-related applications to better understand how broadly applicable the architecture could be.

Comment #2:

The claim that the new architecture outperforms "standard LLM architectures" lacks a clear benchmark. The study should specify the other LLM architectures used for comparison, detailing their design and the metrics on which they were evaluated. This would provide the reader with a clearer understanding of the proposed architecture's relative performance.

This statement refers specifically to the new architecture outperforming both control groups, and it has been slightly adapted to reflect this. It was our aim to clearly describe the methodology and benchmark results for each experiment. We’ve made some changes to the presentation of our methods and results to give more detail on (1) how each architecture functions, and (2) how well it performs in each task, including statistical comparisons wherever possible.

Comment #3:

While user-rated metrics provide valuable insights into the user experience, the study should aim to balance these with objective measures where possible. The diversity and representativeness of the user group should be disclosed to ensure the reliability of these subjective metrics.

We agree. This is why we included three distinct experiments, each focusing on a different aspect of evaluating the proposed architecture. Experiment 1 uses purely objective measures through a validated emotional intelligence instrument, while Experiment 2 uses a mixed-method approach to content analysis. Experiment 3 uses subjective measures to draw conclusions about player experience (PX), which is arguably the most important aspect of game design. We feel that, together, these experiments balance various objective and subjective outcomes to give a broad picture of the area. We describe the demographic data of the participants in Section 6.1.5: ages range from 19 to 47 years, and 33% of participants identified as female. We hope this addresses diversity concerns, although we were not able to fully control for this, as we were limited to the University subject pool.

Comment #4:

The paper could further acknowledge the complexity of human emotions and how this complexity poses a challenge for AI simulation. There could be a discussion about how architecture accounts for or falls short in simulating the full spectrum of human emotional responses.

This is a very good point and quite crucial for this technology. Human emotion is extremely complex and whatever is simulated within our experiments should not be confused with real affective processes. We added more explanation and further references in our discussion of the limitations to address this further.

Comment #5:

It is suggested that the study references a broader range of psychological theories to ensure that the architecture isn't overly dependent on a narrow set of assumptions. The implications of basing LLM outcomes on these observations should be thoroughly discussed.

We hope the changes mentioned in comment #4 also address these concerns. It is very important to us to not simplify emotional processes and to not misrepresent the various theoretical perspectives of emotion research.

Comment #6:

The study should provide a clear, transparent methodology that allows other researchers to replicate the work. This includes detailed descriptions of the game scenarios used, the nature of the emotional responses evaluated, and the precise nature of the chain-of-emotion mechanism.

We agree and we have edited the Method sections to provide more detail.

Comment #7:

Additionally, how can the game agent architecture be enhanced to better compete with other LLM frameworks such as LangChain, LlamaIndex, AutoGen, etc., which boast advanced modules like memory, chains, agents, callbacks, security, and integration capabilities?

Yes – this is very important. Given that the field is rapidly developing, our main aim was to present the simplest possible version of the architecture (i.e., scenarios short enough to be fully stored in context, with no need for chains or retrieval systems). However, we fully believe that this should be tested within more complex LLM frameworks. We hope our changes to the discussion will lead to further research approaching these issues.

• While the architecture's focus on emotional intelligence within conversational agents is commendable, it could benefit from incorporating more sophisticated memory modules, similar to those seen in competing frameworks. Enhanced memory capabilities would allow for a more nuanced understanding of context over longer interactions, which is crucial for maintaining coherent and emotionally appropriate responses. Consider adopting or developing memory structures that can handle complex conversational threads without losing the emotional thread of the interaction.

Yes – specifically for memory, there is a lot of interesting work being done involving custom retrieval modules and dynamic contexts. We hope to further test this in future studies.

• The game agent architecture has potential, but to truly compete, it should look into creating or integrating more complex chain mechanisms. Chains that can manage sequences of interrelated tasks would provide a significant edge, enabling agents to handle multiturn dialogues with more awareness and anticipation of user needs, thus improving the emotional engagement in conversations.

We agree with this point as well. At its core, the Chain-Of-Emotion architecture does implement an appraisal-based chaining mechanism that was tested by itself, but the next logical step would be to integrate this with more complex chains – and even agent tasks.
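To illustrate what we mean by the appraisal-based chaining mechanism, the following minimal sketch shows the two chained model calls (appraise, then respond). The `call_llm` function is a hypothetical stand-in for any chat-completion API, and the prompts are illustrative only, not the exact prompts used in the study:

```python
# Minimal sketch of one appraisal-based chain-of-emotion step.
# NOTE: `call_llm` is a stub standing in for a real language model API;
# the prompt wording is illustrative, not taken from the paper.

def call_llm(prompt: str) -> str:
    # Stub: a real implementation would query a language model here.
    if "appraise" in prompt.lower():
        return "joy"
    return "That's wonderful news, I'm so happy for you!"

def chain_of_emotion_step(agent_memory: list[str], event: str) -> tuple[str, str]:
    """First appraise the event to obtain an emotion label, then
    condition the in-character response on that label (two chained calls)."""
    appraisal_prompt = (
        "Appraise the following event from the agent's perspective "
        f"and name the resulting emotion:\nEvent: {event}\n"
        f"Memory: {' '.join(agent_memory)}"
    )
    emotion = call_llm(appraisal_prompt)

    response_prompt = (
        f"The agent currently feels '{emotion}'. "
        f"Respond in character to: {event}"
    )
    reply = call_llm(response_prompt)

    # Append the appraised emotion to memory so later appraisals see it.
    agent_memory.append(f"felt {emotion} after: {event}")
    return emotion, reply

memory: list[str] = []
emotion, reply = chain_of_emotion_step(memory, "The player shares good news.")
print(emotion)  # -> "joy" with the stubbed model
```

Integrating this step into larger chains would simply mean feeding `emotion` and `reply` into subsequent task-specific calls.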

• Your approach to developing emotional intelligence in game agents is innovative; however, it might be beneficial to include more robust agent management and callback functions that other frameworks offer. These would allow for better event-driven interactions, which can result in more dynamic and responsive emotional behaviors in real-time, leading to a more immersive user experience.

• Security is an increasingly vital concern in AI applications. To enhance the competitive edge of your architecture, it is crucial to integrate state-of-the-art security protocols to ensure user data, especially as it pertains to emotional data, is handled with the utmost care. This would not only increase trust in your system but also align with best practices in responsible AI development.

These two points in particular are very high-level and would provide valuable insights for real-world applications of our proposed architecture. We hope to integrate the approach with such systems in future research.

• Integration capabilities are a standout feature in existing LLM frameworks. To bolster your architecture's marketability, provide clear and streamlined processes for integrating with popular LLMs, databases, and external APIs. Ensuring your system can easily fit within different tech stacks will be key to its adoption.

• Lastly, the game agent architecture should refine its context management system. While handling emotional responses is your architecture's unique selling point, the ability to maintain and leverage context effectively over the course of long interactions is what will truly enhance its practicality and appeal. Better context management can lead to more personalized and accurate emotional interactions, which is paramount for user engagement.

These points are also very relevant. Since receiving this review, OpenAI has introduced the new Assistants API, and common frameworks and stacks are constantly being adapted to the rapid changes in the field. Our system might not yet integrate with many of these tools because our work is, for now, fundamental research only. This has the benefit of being agnostic to these rapid changes, which is important when introducing such techniques, but we hope to integrate with modern tech stacks in the future.

Overall, we agree on all these points and thank you for this thorough write-up. All these mechanisms would have introduced potential sources of variability at this stage of our investigation.

Attachments
Submitted filename: Response to Reviewers.docx
Decision Letter - Michal Ptaszynski, Editor

An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents

PONE-D-23-31194R1

Dear Dr. Croissant,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at http://www.editorialmanager.com/pone/ and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Michal Ptaszynski, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: N/A

Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have done good work; the quality of the paper has improved and it was revised according to the suggestions, so I recommend that this paper be accepted.

Reviewer #2: (No Response)

Reviewer #4: The author has meticulously addressed every one of my comments and concerns, demonstrating a thorough and thoughtful consideration of the feedback provided.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #4: No

**********

Formally Accepted
Acceptance Letter - Michal Ptaszynski, Editor

PONE-D-23-31194R1

PLOS ONE

Dear Dr. Croissant,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Michal Ptaszynski

Academic Editor

PLOS ONE

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.