An iterated learning model of language change that mixes supervised and unsupervised learning

Jack Bunyan; Seth Bullock; Conor Houghton

doi:10.1371/journal.pcsy.0000030

Peer Review History

Original Submission May 18, 2024
18 May 2024 Author Response https://doi.org/10.1371/journal.pcsy.0000030.r001
14 Aug 2024 Decision Letter - Keith Burghardt, Editor PCSY-D-24-00076 An iterated learning model of language change that mixes supervised and unsupervised learning. PLOS Complex Systems Dear Dr. Houghton, Thank you for submitting your manuscript to PLOS Complex Systems. After careful consideration, we feel that it has merit but does not fully meet PLOS Complex Systems's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript within 60 days, by Oct 13 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at complexsystems@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcsy/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: * A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. * A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. * An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. We look forward to receiving your revised manuscript. Kind regards, Keith Burghardt, Ph.D. Academic Editor PLOS Complex Systems Journal Requirements: 1. We ask that a manuscript source file is provided at Revision.
Please upload your manuscript file as a .doc, .docx, .rtf or .tex. Additional Editor Comments (if provided): Overall, the reviewers believe that this work is interesting, but that there is significant work needed to improve the flow of the paper: namely, “[t]he text is difficult to follow, and the research question is unclear” (R3), the contributions should be clarified, and assumptions should be justified. Moreover, the methodology needs to be clarified, e.g., the model description and the definitions of key quantities are unclear. Finally, some reviewers requested additional clarity in the figures. Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Does this manuscript meet PLOS Complex Systems’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes -------------------- 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes -------------------- 3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.
Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes -------------------- 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS Complex Systems does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes -------------------- 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The study incorporates unsupervised learning methods into the traditional supervised learning-based Iterated Learning Model (ILM), establishing a semi-supervised learning Iterated Learning Model. The study designs the semi-supervised iterative learning model by drawing on the experience of the human language learning process and obtaining results superior to traditional models. It ingeniously combines the language learning process with the design of the iterative learning model, making the model more practically significant. By comparing the stability, expressiveness, and compositionality of languages formed by traditional ILM models of language change and semi-supervised ILM models of language change, it is found that this semi-supervised ILM model is not only more efficient in simpler language generation problems but also, due to the reduced computational burden, can study more complex expressive compositional languages. In addition, the study also reveals a linear relationship between the dimensionality of a language's meaning-signal space and the size of the optimal transmission bottleneck.
There are several shortcomings and suggestions for improvement: 1 Unclear problem statement in the introduction: The research introduction starts with an analysis of languages with learnability, introduces the current research framework of language change ILM, and the traditional ILM based on obverter, and finally introduces the new unsupervised iterative learning model. However, when introducing the traditional iterative learning model, the existing problems should be placed at the end of this section, and the ideas for solving these problems based on which the new model is built should be placed at the beginning of the introduction of the new model. The current text places the problems of the traditional model in the middle of the introduction, diluting the problem statement. 2 Lack of persuasiveness in research assumptions: An important assumption of this study is that learnable languages have stability, expressiveness, and compositionality. This assumption should be reasonably proposed based on previous research or theory and should be supported by relevant references. 3 Discussion results on language transmission bottleneck and the dimensionality of the language's meaning-signal space are mediocre: The dimensionality of the language's meaning-signal space in the study is manifested as the size of the language signal "word", which is limited to each "word" having only binary values of 0 and 1, and the size of the word does not vary greatly. Within this narrow range, the discussion on the relationship between the language transmission bottleneck and the dimensionality of the language's meaning-signal space is limited in scope and lacks depth. 4 The definitions of key quantities in the study are not clear enough: The definitions of expressiveness, stability, and compositionality of language are at the end of the paper, but their mentions in the main text also appear in the later part of the article. 
It would be appropriate to cite the quantitative definitions of these three linguistic feature quantities earlier. The size of the transmission bottleneck, as an important research object, should be clearly defined and emphasized. 5 Some of the figures in the study are not intuitive: The expressions of the horizontal and vertical coordinates in the figures are unclear. The labels of the horizontal and vertical coordinates should use full names when there is no consensus on the symbols (it would be clearer to use the full names instead of x, c, s in plots). Some figures omit the explanation of the horizontal coordinates, which is not advisable in terms of expression. Reviewer #2: The research presents a promising approach to reducing computational complexity using auto-encoder. However, there are several questions regarding certain aspects of the article that need to be addressed. It would be beneficial to include specific examples of meaning-signal pairs in the article, particularly if they hold practical significance. Alternatively, is the training data generated randomly? In Figure 1B, an example with is provided. However, the length of the signal is $2^3$, which does not correspond to the label indicated in the table. This discrepancy has caused some confusion. What causes generation and n to appear to have a quadratic relationship? What would be the implications of using signals merely to train the auto-encoder rather than meaning-signal pairs? Will the experimental results remain consistent, or is there potential for variation? Reviewer #3: In “An iterated learning model of language change that mixes supervised and unsupervised learning,” the authors introduce an agent-based model of language change. They demonstrate that (1) their model generates more complex and expressive compositional languages than existing models, and (2) there is a linear relationship between the dimensionality of a language and the size of the optimal transmission bottleneck. 
While some sections are well-written, many parts of the manuscript are difficult to follow. The model and results are challenging to grasp, making it hard to assess their implications and significance. Here are my specific concerns: 1) The text is difficult to follow, and the research question is unclear. 1a) The broader motivation behind the work is somewhat buried in the text. The authors should consider making this more explicit (e.g., in the first paragraph or abstract) to better contextualize the relevance of their work. 1b) The research question the authors aim to answer is not clearly stated. The Introduction starts with a general paragraph but does not clearly highlight the gap in the literature that they are addressing. Additionally, the first paragraph of the Introduction would benefit from references to better contextualize their work. 1c) L34: The concept of a “transmission bottleneck” is mentioned but not defined. Understanding this concept is fundamental to understand the paper, it seems, but it’s only described later. 1d) L43-L44: The authors list previous works without contextualizing them, identifying gaps, or explaining how their proposal fits into the existing literature. 2) The authors should consider reorganizing the manuscript to make their contribution clearer. The Results section begins with “The Obverter ILM,” where the authors present results from a previous model. This is somewhat confusing—if the focus is on their model, why start here? The manuscript would be more concise and focused if the first section of results directly addressed their model. A side-by-side comparison of the models (e.g., curves in the same plot) could also be more effective. 2a) In this same section, between L172 and L186, the authors discuss an approach to avoid obversion, which is not their approach. This discussion would be more appropriate in the Discussion section. 
It disrupts the flow of the paper and makes it harder to understand what the authors are proposing. They should focus on what they are doing, rather than what they are not. Additionally, they claim that the so-called one-way ILM does not work, but it’s unclear whether this is their own finding or something shown in previous works. 2b) The authors should also consider using subsections to better organize the manuscript. Currently, the model’s definition is mixed with various analyses and results. Separating these into distinct sections would help readers follow the paper more easily. 3) The model description is not clear, thus it is difficult to assess it; the authors should work on clarifying it. The text mixes the inspiration from children learning language with the technical details of the model, making it hard to understand how the model operates. For example, it’s not clear how the autoencoder is used. A flowchart explaining the model would be helpful, and a pseudocode in the supplementary material would also aid in understanding. 3a) Between L72 and L80, the authors discuss how the ILM achieves stability and high levels of expressivity and compositionality, depending on the transmission bottleneck, stating that when the bottleneck is too small, language doesn’t stabilize, and when it’s too large, language may stabilize on a non-compositional structure. Their proposed model, however, does not follow this pattern. Their results show that a higher transmission bottleneck always leads to stability and high levels of expressivity and compositionality. The authors should discuss this. 3b) The authors should consider conducting a more thorough analysis of the relationship between |B| and |A|. They show that for larger languages, a larger |A| (than |B|, it seems) is necessary for an XCS language. However, how much larger than |B|? This remains unclear, as the results only include one example where |A| = 225 and |B| = 75.
3c) The authors briefly mention that increasing the number of hidden layers helps achieve an XCS language in L267. This point deserves a more detailed analysis, as it is currently underdeveloped in the manuscript. 3d) L271: The phrase “at least less finely tuned in this case” is unclear. The authors should clarify what they mean here. 3e) Why is Fig. 11 in the Discussion? It seems to make more sense in the Results section. Minor points: - L50: Are the arrows reversed? Instead of S - (e) -> M, shouldn't it be M - (e) -> S? - L132: The section title could be more informative to better guide the reader. - Fig. 3: What does the dashed line represent? How is the autoencoder used? - Fig. 5: The caption should specify which model is being referenced. - I miss error bars in Fig. 5A and 5B. For example, why is generation when n equals 6 higher than when n equals 7? This might just be due to large variance, but without error bars, it’s unclear. -------------------- 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public. For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: No -------------------- [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/.
PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. https://doi.org/10.1371/journal.pcsy.0000030.r002
Revision 1
4 Oct 2024 Author Response Attachments Attachment Submitted filename: ILM_resub_response.txt https://doi.org/10.1371/journal.pcsy.0000030.r003
24 Nov 2024 Decision Letter - Keith Burghardt, Editor An iterated learning model of language change that mixes supervised and unsupervised learning. PCSY-D-24-00076R1 Dear Houghton, We are pleased to inform you that your manuscript 'An iterated learning model of language change that mixes supervised and unsupervised learning.' has been provisionally accepted for publication in PLOS Complex Systems. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email from a member of our team. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated. IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact complexsystems@plos.org. Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Complex Systems. Best regards, Luca Maria Aiello Section Editor PLOS Complex Systems Hocine Cherifi Editor-in-Chief PLOS Complex Systems ********************************************************* Thank you for your submission. We have decided to accept your manuscript for publication. Congratulations! 
Reviewer Comments (if any, and for reference): Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed Reviewer #2: All comments have been addressed Reviewer #3: All comments have been addressed ****** 2. Does this manuscript meet PLOS Complex Systems's publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ****** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ****** 4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: (No Response) Reviewer #3: Yes ****** 5. 
Is the manuscript presented in an intelligible fashion and written in standard English? PLOS Complex Systems does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: (No Response) Reviewer #3: Yes ******** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The authors have seriously dealt with the opinions put forward by the previous reviewers, and the article has been greatly improved in terms of quality. I have no new suggestions and agree to accept this article. Reviewer #2: (No Response) Reviewer #3: The authors have addressed each of my comments with attention. I want to acknowledge the effort put into this revision, especially in the updated Introduction, which I found very well done. I have just a few minor style suggestions: - The section titled “Future work: richer languages” should be formatted as a \subsubsection, as it currently appears as section 0.0.1. - “Supporting information” would be better presented as a separate .pdf file. - For improved readability, I recommend using a LaTeX library, such as algorithm2e, for the pseudocode. This library includes line numbers and vertical lines for inner loops and functions, which would improve clarity. Additionally, a brief description of each function called in the pseudocode would be helpful. Also, it is currently unclear why some functions have an exclamation mark (e.g., train!, shuffle!). ******* 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?).
If published, this will include your full peer review and any attached files. Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public. For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: Yes: Marcos Oliveira ******** https://doi.org/10.1371/journal.pcsy.0000030.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.