Maynard Smith revisited: A multi-agent reinforcement learning approach to the coevolution of signalling behaviour

Olivia Macmillan-Scott; Mirco Musolesi

doi:10.1371/journal.pcbi.1013302

Peer Review History

Original Submission

November 28, 2024
22 Jan 2025 Decision Letter - Feng Fu, Editor, Tobias Bollenbach, Editor

PCOMPBIOL-D-24-02076
Maynard Smith revisited: A multi-agent reinforcement learning approach to the coevolution of signalling behaviour
PLOS Computational Biology

Dear Dr. Macmillan-Scott,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days (Mar 24 2025 11:59PM). If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.
* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission.
Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Feng Fu
Academic Editor
PLOS Computational Biology

Tobias Bollenbach
Section Editor
PLOS Computational Biology

Additional Editor Comments:

While both reviewers see the strengths and contributions of the manuscript, they also raise important points. In your revisions, it will be necessary to address their comments for further consideration.

Journal Requirements:

1) Please ensure that the CRediT author contributions listed for every co-author are completed accurately and in full. At this stage, the following authors require contributions: Olivia Macmillan-Scott and Mirco Musolesi. Please ensure that the full contributions of each author are acknowledged in the "Add/Edit/Remove Authors" section of our submission form. The list of CRediT author contributions may be found here: https://journals.plos.org/ploscompbiol/s/authorship#loc-author-contributions

2) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type 'LaTeX Source File' and leave your .pdf version as the item type 'Manuscript'.

3) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines: https://journals.plos.org/ploscompbiol/s/figures

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:

Please note that one of the reviews is uploaded as an attachment.
Reviewer #1: In “Maynard Smith revisited: A multi-agent reinforcement learning approach to the coevolution of signalling behaviour”, the authors study the emergence of honest signalling in the Philip Sidney signalling game using methods from multi-agent reinforcement learning (MARL). Extensive simulations show that proactive pro-sociality, i.e. resource provision without signalling, often emerges as the optimal strategy. This challenges the established view of honest signalling. The paper also argues for MARL's value in studying evolutionary dynamics due to its interpretability and its ability to study emergent behaviour without pre-defined strategies.

The research problem addressed is novel, interesting and crucially important, and is highly relevant to the audience of PLOS Computational Biology. While the Philip Sidney signalling game has been studied using other methods such as evolutionary game theory, the application of MARL has led to novel and interesting observations that are highly relevant to biological contexts. The analysis in the paper is thorough, including an extensive appendix with additional results. There are some minor issues the authors might consider to further improve their work.

1) While the paper argues that MARL can be used to study emergent behaviour without pre-defined strategies, it seems the strategy space is still pre-defined (e.g. in Tables 5 and 6). The difference between MARL and evolutionary game theory methods seems rather to be that in the latter, all the strategies are included in the model and are always analysed, while in the former, it might be the case that not all strategies are learned via MARL. Please clarify this.

2) The paper identifies proactive pro-sociality as the key emergent behaviour. While this is a reasonable conclusion given the simulations, it would be important to consider alternative explanations. For example, what is the role of kin selection here?
Also related to this, there are previous stochastic analyses of the Philip Sidney signalling game which show that diverse equilibrium points are possible, including those that involve providing without signaling; see e.g. "Evolutionary stability of honest signaling in finite populations." 2013 IEEE Congress on Evolutionary Computation. IEEE, 2013, and "Evolution of honest signaling by social punishment." Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation. 2014. This stochastic analysis usually leads to a richer set of possible outcomes (see e.g. "Emergence of cooperation and evolutionary stability in finite populations." Nature 428.6983 (2004): 646-650). It would be useful to compare MARL with this approach.

3) While it is good to extend the strategy sets for both donor and beneficiary in the paper's analysis, this raises the question of whether it would be useful to run the MARL analysis on the original strategy sets (for example, those in the two references above) as a baseline. Otherwise, it would be important to justify why this original setup was not adopted for a MARL analysis, since it might allow a more direct comparison.

4) There are still quite a few typos in the paper. A thorough proofread is needed.

Reviewer #2: Review uploaded as attachment.

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available.
If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: Yes: Martin Smit

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

Figure resubmission:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions.
Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Attachment: Review_of_PLOS_Article.pdf

https://doi.org/10.1371/journal.pcbi.1013302.r001
Revision 1
24 Mar 2025 Author Response

Attachment: Response to Reviewers.docx

https://doi.org/10.1371/journal.pcbi.1013302.r002
30 Apr 2025 Decision Letter - Feng Fu, Editor, Tobias Bollenbach, Editor

PCOMPBIOL-D-24-02076R1
Maynard Smith revisited: A multi-agent reinforcement learning approach to the coevolution of signalling behaviour
PLOS Computational Biology

Dear Dr. Macmillan-Scott,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 30 days (Jun 30 2025 11:59PM). If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.
* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission.
Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Feng Fu
Academic Editor
PLOS Computational Biology

Tobias Bollenbach
Section Editor
PLOS Computational Biology

Additional Editor Comments:

One reviewer still has reservations about the revised manuscript. Please (1) update and document the GitHub code so that it is up to date and runs out of the box; (2) provide a detailed, explicit account of the Q-learning implementation, including the state definition, discount factor and strategy-space limits, and demonstrate results for γ = 0 to rule out potential artifacts.

Journal Requirements:

1) The file inventory includes files for Figures 5a, 5b, 6a and 6b. We would recommend either combining these into single Figure 5.tiff and Figure 6.tiff files with separate internal panels, or renumbering them as individual figures, as we are not able to publish multiple components of a single figure as separate files.

Reviewers' comments:

Reviewer's Responses to Questions

Reviewer #1: The authors have addressed all my comments very well. I really appreciate their great efforts to make everything much clearer. I believe this is a very good contribution to the theoretical literature on the evolution of prosocial and collective behaviours. I am happy to recommend the paper for publication in its present form.

Reviewer #2: Thank you for addressing the points from my previous review. Firstly, I blatantly misunderstood the "conditional strategies" aspect of the paper, and the added sentences about this are much appreciated. I understand how frustrating it can be for a reviewer to completely miss a key aspect of a paper, so thank you for your patience. That said, I still have a number of concerns with the model used in this paper.
Besides the "conditional strategies" misunderstanding, the other overarching point of my first review was about the use of reinforcement learning (RL) in the first place. I thank the authors for clarifying the biological plausibility of RL; I see that, if the focus is individual learning, then it makes sense to model individuals as learning through RL. However, I would like to push back on two things:

1) I am still not happy with the "RL doesn't constrain the strategies that can be learned" argument.

2) I don't understand how the agents can effectively use the discount factor, as I don't see how they can predict the next state given their current state and action.

Regarding 1), put bluntly: I disagree. As soon as the state space is determined, the set of learnable strategies is determined. I'm not just being pedantic about choice of words here. Throughout the introduction of the paper (lines 37 to 114), the authors reiterate that, while previous work studying the PS game exists, in this paper MARL is used so you can study the emergence of decision-making strategies without specifying which ones are learnable. I just don't think this is true. Once you specify the agents' memory length, both agents are learning policies from a now specified strategy space. Please explain whether I am misunderstanding something, or justify your use of this statement beyond the explanations given.

Regarding 2), please explain what the next state is. In line 293 you specify that the state in the state-action pair is "made up of both players' actions in the previous round", hence in equation (2) Q(s', a') refers to s', which is the next state, made up of both players' actions in the current round. How can the signaller know their interaction partner's action in the current round? If they don't know what the next action is, e.g. if s' is just s (which is the case in the version of the code in the linked GitHub repository), then this term is just noise and it makes sense that "for each value of learning rate, a lower discount rate leads to the optimal strategy being learned a higher proportion of the time." (line 363). My worry is that the cyclical/alternating strategies are not an emergent property but just a byproduct of the agents thinking they can predict the future (i.e. what action their opponent will take) and of the various sources of noise in the reward signal. What happens when the discount factor is zero? Are cyclical strategies learned?

I wish I could be more specific in my critiques, but I don't think I understand precisely how Q-learning was used in this paper, and I don't want to critique something I might not understand properly. I tried looking into the code for answers, but I don't think the code is up to date (the last commit was 2 years ago), so I don't want to assume that things were done in a certain way. Please update the code.

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: No: Code is, as far as I can tell, not up-to-date with the current version of the paper.
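Reviewer #2's second point turns on the bootstrapped next-state term in the Q-learning target. As a minimal sketch, assuming the standard tabular max-form update Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)] (the exact variant used in the paper's equation (2) is not reproduced here), the function below shows that with γ = 0 the next-state term drops out, so each Q-value tracks only the immediate reward regardless of which s' the agent assumes. The function name `q_update`, the string state and action labels, and the numbers are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """One tabular Q-learning step (standard max form, assumed for illustration).

    Q       -- mapping from (state, action) pairs to value estimates
    s, a    -- current state and the action taken in it
    r       -- immediate reward received
    s_next  -- the state the agent uses for bootstrapping
    actions -- actions available in s_next
    alpha   -- learning rate
    gamma   -- discount factor
    """
    # Bootstrapped estimate of future value taken from the assumed next state.
    best_next = max(Q[(s_next, b)] for b in actions)
    # With gamma == 0 this target reduces to the immediate reward alone.
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical example: the assumed next state already has a large value estimate.
for gamma in (0.0, 0.9):
    Q = defaultdict(float)
    Q[("s_next", "x")] = 10.0
    new_value = q_update(Q, "s", "a", r=1.0, s_next="s_next",
                         actions=["x", "y"], alpha=0.5, gamma=gamma)
    print(f"gamma={gamma}: Q(s, a) = {new_value}")
```

With γ = 0 the updated value depends only on r (0.5 here), while with γ = 0.9 the value of the assumed next state dominates (5.0). A γ = 0 run therefore removes the next-state term entirely, which is why it can serve as the control the reviewer and editor request.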
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

https://doi.org/10.1371/journal.pcbi.1013302.r003
Revision 2
16 Jun 2025 Author Response

Attachment: Response_to_Reviewers_auresp_2.docx

https://doi.org/10.1371/journal.pcbi.1013302.r004
8 Jul 2025 Decision Letter - Feng Fu, Editor, Tobias Bollenbach, Editor

Dear Miss Macmillan-Scott,

We are pleased to inform you that your manuscript 'Maynard Smith revisited: A multi-agent reinforcement learning approach to the coevolution of signalling behaviour' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Feng Fu
Academic Editor
PLOS Computational Biology

Tobias Bollenbach
Section Editor
PLOS Computational Biology

https://doi.org/10.1371/journal.pcbi.1013302.r005
Formally Accepted
Acceptance Letter - Feng Fu, Editor, Tobias Bollenbach, Editor

PCOMPBIOL-D-24-02076R2
Maynard Smith revisited: A multi-agent reinforcement learning approach to the coevolution of signalling behaviour

Dear Dr Macmillan-Scott,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Lilla Horvath
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

https://doi.org/10.1371/journal.pcbi.1013302.r006

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.