Computational modeling of choice-induced preference change: A Reinforcement-Learning-based approach

Jianhong Zhu; Junya Hashimoto; Kentaro Katahira; Makoto Hirakawa; Takashi Nakao

doi:10.1371/journal.pone.0244434

Peer Review History

Original Submission: April 7, 2020
11 Jun 2020 Decision Letter - Baogui Xin, Editor PONE-D-20-10026 Computational modeling of Choice-Induced Preference Change: A Reinforcement-Learning-based approach PLOS ONE Dear Dr. Zhu, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. We recommend that it should be revised taking into account the changes requested by the reviewers. Since the requested changes includes Major Revision, the revised manuscript will undergo the next round of review by the same reviewers. Please submit your revised manuscript by Jul 26 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. 
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols We look forward to receiving your revised manuscript. Kind regards, Baogui Xin, Ph.D. Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Partly Reviewer #4: Partly ******** 2. 
Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: I Don't Know Reviewer #2: Yes Reviewer #3: No Reviewer #4: Yes ****** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: No Reviewer #4: Yes ****** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes Reviewer #4: Yes ****** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. 
(Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: General: Zhu et al. present results from a study of decision-making in which they analyze internally guided decision-making within a reinforcement learning framework called choice-based learning (CBL) to describe a phenomenon of internal decision-making called choice-induced preference change (CIPC). At a high level, the theory of estimating the perceived rewards (Q value) starting from an initial arbitrary decision towards a goal of maintaining internal consistency is interesting, and if validated could provide an important consideration in the analysis of decision-making. The specific simulation approach and parameter estimation function (fmincon) are outside of my area of expertise, and I’ll have to defer to experts in this area for specific commentary on the internal validity of the simulations performed. My main concern regarding the approach is that it is not clear to what extent the authors definitively ruled out prior external experience (value estimation) as biasing the initial choices. Other comments are given below.

Major:

-- Although the authors provide a solid explanation for how a CBL model for CIPC can be simulated using novel contour shapes, it is not clear to me how rigorous this approach is for avoiding differential initial preferences. In other words, how do we know that a subject does not have an initial preference for a given shape before the experiment? Is there a way to test this alternative possibility, or are we left to just assume based on intuition?

-- Figure 1 indicates that preference for images is provided using a Likert scale, although the computational models use a softmax function (actually it looks more like a logistic function) to calculate probabilities, and based on the description provided, choices are dichotomized.
Was the ordinal relationship within the Likert scale addressed, and if not, why was this scale used in the experiment rather than a binary decision?

-- Did the authors address autocorrelation within an individual’s selections? It would seem that successive preference choices would be correlated over time, although it is not clear whether any of the models accounts for temporal autocorrelation, as, for example, a hidden Markov model might.

-- Since PLOS ONE is aimed toward a more general readership, the authors might consider widening the scope with more explanation about the applications of this approach within the field of neurology/psychology and behavior.

Minor:

-- Perhaps it reflects my distance from the subject matter, but the initial statement ‘the value of an item is learned through making a series of decisions’ (line 45) seems fairly abstract, and no further information is given about the context in which that value is formed. What ‘item’ is being referred to? A decision tool? An item of value being purchased? As above, some additional context in the Background could be helpful.

-- In most RL applications, the learning rate is a hyperparameter, and thus is assigned outside the modeling process. On line 56, the authors note that the learning rate is estimated as a model parameter. If there is another method the authors are applying to identify a learning rate, it should be provided; otherwise this section should be corrected to indicate that the learning rate is a hyperparameter and must be optimized through manual or automated search (e.g., grid search), not fit like a statistical parameter. If this optimization is all performed by the fmincon function, then more information about the specific algorithm is needed for us less informed readers.

-- Table 2 is referenced before Table 1.

-- How was the range of B of [0, 20] chosen?
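Two of Reviewer #1's points can be illustrated with a minimal Python sketch: that a softmax over two options is algebraically a logistic function of the value difference, and that a learning rate can be estimated from choice data by maximizing a likelihood within bounds. Everything in the sketch is an assumption made for illustration and is not taken from the manuscript: the update rule (the chosen item's value moves toward 1 and the rejected item's toward 0), the starting value of 0.5, all function names, and the coarse grid search, which merely stands in for a bounded optimizer such as MATLAB's fmincon. The reviewer's "B" is read here as the softmax inverse temperature `beta`, bounded to [0, 20].

```python
import math
import random


def softmax_two(q_a, q_b, beta):
    """Softmax probability of picking option a out of {a, b}."""
    e_a, e_b = math.exp(beta * q_a), math.exp(beta * q_b)
    return e_a / (e_a + e_b)


def p_choose_first(q_a, q_b, beta):
    """Same probability written as a logistic function of the value difference."""
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


def update(q, chosen, rejected, alpha):
    """Hypothetical choice-based update: chosen value -> 1, rejected value -> 0."""
    q[chosen] += alpha * (1.0 - q[chosen])
    q[rejected] += alpha * (0.0 - q[rejected])


def simulate(alpha, beta, n_items, n_trials, seed=0):
    """Generate (item_a, item_b, chose_a) trials from the assumed model."""
    rng = random.Random(seed)
    q = [0.5] * n_items  # assumed equal initial values
    trials = []
    for _ in range(n_trials):
        a, b = rng.sample(range(n_items), 2)
        chose_a = rng.random() < p_choose_first(q[a], q[b], beta)
        trials.append((a, b, chose_a))
        update(q, a if chose_a else b, b if chose_a else a, alpha)
    return trials


def neg_log_likelihood(alpha, beta, trials, n_items):
    """Negative log-likelihood of the observed choices under (alpha, beta)."""
    q = [0.5] * n_items
    nll = 0.0
    for a, b, chose_a in trials:
        p_a = p_choose_first(q[a], q[b], beta)
        nll -= math.log(max(p_a if chose_a else 1.0 - p_a, 1e-12))
        update(q, a if chose_a else b, b if chose_a else a, alpha)
    return nll


def grid_fit(trials, n_items, alpha_grid, beta_grid):
    """Return (nll, alpha, beta) with the lowest nll on the grid."""
    return min(
        (neg_log_likelihood(a, b, trials, n_items), a, b)
        for a in alpha_grid
        for b in beta_grid
    )


if __name__ == "__main__":
    data = simulate(alpha=0.3, beta=5.0, n_items=15, n_trials=200, seed=1)
    alphas = [i / 20 for i in range(1, 20)]   # 0.05 .. 0.95
    betas = [float(b) for b in range(0, 21)]  # 0 .. 20
    print(grid_fit(data, 15, alphas, betas))
```

In this sketch the learning rate is a free parameter of a likelihood, which is the sense in which cognitive models "fit" it, rather than a hyperparameter tuned for task performance. A gradient-based bounded optimizer would search the same bounded region continuously instead of on a grid.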
Reviewer #2: The paper examines the validity of the CBL model, which has not previously been confirmed by fitting the model to IDM behavioral data, since differences in initial preferences among the items make it difficult to estimate the model parameters. Hence, the paper presents a set of experiments conducted with novel contour shapes that are supposed to be selected with equal probability. The paper is very well written and clearly explains the main concepts. In addition, the discussion has very clear and concise arguments that support the experimental results. Figure 2 should be rendered at a higher resolution, as it is blurred when enlarged.

Reviewer #3: In this study, the authors suggested the validity of the choice-based learning model in internally guided decision-making (IDM). Please see the attached Word file for my full review of this paper. Many thanks.

Reviewer #4: I am not sure if most researchers in the reward and RL worlds would accept the formulation of IDM. Is it really necessary to build this work on this notion? What if we do not learn the value through a series of decisions, but simply our valuations are noisy estimates and by experience we sharpen them? In that case the reward is still external, but we just have to learn the value of the external reward. That seems more general to me, because 0.8 of 1 dollar also has a utility to me personally that I need to learn: I know the expected value (0.8), but the utility I have to learn by experience.

In fact I find the setup unexpected. So if I understood this correctly, the introduction claims that this does not work with the usual stimuli that have intrinsic values (like pictures or jobs) that are classically used for the rate-choice-rate task? But instead, this task uses arbitrary shapes. And now the logic is that the choice is the supervision signal to adjust the evaluation? So in a way, this is promoting internal consistency?
In the classic way of studying this, the rate-choose-rate procedure is accompanied by a rate-rate-choose control condition that estimates the baseline amount of increased consistency. How is this model accounting for that effect in the baseline condition? Does it need this control condition too? Presumably the parameter estimates should be smaller in this condition because there cannot be a causal link? (Chen and Risen 2010, and many papers that cite it.) If I understand it correctly, in the current study the conditions are just ‘choose-rate’ because it is assumed that all shapes are very close in value. It seems to me a rate-choose control condition is necessary?

I was surprised that the paper ‘Sour grapes and sweet victories: How actions shape preferences’ is not discussed; how the two modelling approaches are related appears to be relevant.

The results in figure 6 are very surprising, are they not? The way I understand it, subjects chose between items 14 times, but their final appraisals (the ‘rate’ part of the classic rate-choose-rate) show no trace of this? Doesn't this mean there is no choice-induced change in the valuation, and it is purely an increased consistency in the choices? Is there a way of looking at this with higher resolution, by plotting the subjects' rating for each of the 15 items against (a) the number of times the item was chosen and (b) the value that the RL model assigned to that stimulus? Would these results mean that in a classic rate-choose-rate setup, we would conclude that ‘it didn't work’ because there is no effect of choice on rate?

The discussion mentions two limitations; I am most concerned about the former. It is not really clear to me how to interpret these results and what conclusions to draw from this model, particularly in contrast to existing models. My confusion is increased by the results in figure 6.
Also, can you add a URL where the data is in the paper? I couldn't actually make the figshare link work. I hope my comments are helpful to make this a bit clearer; it is a nice idea!

******

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: Yes: Takashi Nakano
Reviewer #4: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

https://doi.org/10.1371/journal.pone.0244434.r001
Revision 1
9 Dec 2020 Author Response

We are grateful for the excellent and extremely helpful comments. We addressed all the various issues, and we hope that the manuscript has now been improved considerably. Please let us know if further changes are necessary. We are more than happy to carry out such changes. For responses to specific reviewers' comments, see "Response_To_reviewers". Modifications made to the main text are shown using yellow highlights (see the separate version labeled “Manuscript with changes marked”).

Attachment: Submitted filename: Response_To_reviewers.docx

https://doi.org/10.1371/journal.pone.0244434.r002
10 Dec 2020 Decision Letter - Baogui Xin, Editor Computational modeling of Choice-Induced Preference Change: A Reinforcement-Learning-based approach PONE-D-20-10026R1 Dear Dr. Zhu, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Baogui Xin, Ph.D. Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: https://doi.org/10.1371/journal.pone.0244434.r003
Formally Accepted
14 Dec 2020 Acceptance Letter - Baogui Xin, Editor PONE-D-20-10026R1 Computational modeling of Choice-Induced Preference Change: A Reinforcement-Learning-based approach Dear Dr. Zhu: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Professor Baogui Xin Academic Editor PLOS ONE https://doi.org/10.1371/journal.pone.0244434.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.