Uncertainty-driven regulation of learning and exploration in adolescents: A computational account

Marieke Jepma; Jessica V. Schaaf; Ingmar Visser; Hilde M. Huizenga

doi:10.1371/journal.pcbi.1008276

Peer Review History

Original SubmissionOctober 2, 2019
17 Dec 2019 Decision Letter - Natalia L. Komarova, Editor, Jean Daunizeau, Editor Dear Dr Jepma, Thank you very much for submitting your manuscript 'Uncertainty-driven regulation of learning and exploration in adolescents: A computational account' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by two independent peer reviewers. The reviewers appreciated the attention to an important question, but raised some substantial concerns about the manuscript as it currently stands. While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time. * You will see that both reviewers make similar comments. In particular, they are challenging your model-based analysis. More precisely, they are highlighting the fact that your interpretation of the observed difference in learning rate between the two groups may be overly restrictive. Recall that the Kalman filter can only explain this difference in terms of perceived volatility of the environment (under the assumption that model recovery is accurate). In other terms, for a man with a hammer, everything looks like a nail (figuratively speaking). Now you should explicitly look for alternative explanations of this difference. Practically speaking, this means performing (more) model comparisons, having included in the comparison set models that can a priori explain this difference. Note: you should also perform confusion analyses (to assess the accuracy of model comparisons) and parameter recovery analyses (at least for the winning model). I hope you will find these comments helpful. * Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts. In addition, when you are ready to resubmit, please be prepared to provide the following: (1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors. (2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible to show clearly where changes have been made to their manuscript e.g. by highlighting text. (3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution. Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are: - Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition). - Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video. - Funding information in the 'Financial Disclosure' box in the online system. While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see here. We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us. Sincerely, Jean Daunizeau Associate Editor PLOS Computational Biology Natalia Komarova Deputy Editor PLOS Computational Biology A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately: [LINK] Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: Review for “uncertainty-driven regulation of learning and exploration in adolescence” by Jepma et al. The authors performed an experiment involving adults (N=35) and adolescents (N=25) and two learning tasks: a value estimation task and two-armed bandit. The authors report that the adolescents underperform in both tasks by displaying: i) exaggerate learning rate and ii) exploration. They propose that the behaviour in these is well accounted by a Kalman filter model (instead of a standard RL model) and that their results indicate that adolescents over-estimate the volatility of the environment. I think the paper has the potential to deliver an important message to the cognitive developmental community, however, I believe that several issues have to be addressed in order to support their claims. 1/ the first point concerns the model space. The authors limit model comparison to a standard Rescorla-Wagner model and the Kalman-filter model. Learning in this kind of situations have been studied extensively and a lot of (psychologically sound) models have been proposed and validated. They should be included in the model space. These models include: An asymmetric reinforcement learning model (see Lefebvre et al.2017 and Sharot’s work) for we which we have good reasons to think that there will be differences between adults and adolescents (see Van Den Bos, Cerebral Cortex, 2013). My prediction is that the noise variance parameter will significantly correlate with the difference in learning rates in this model. A Pearce-Hall model (which is a simpler, non Baysian, instantiation of the idea that the learning rate is dynamically modulated). Another important model (even if harder to fit) is the noisy-RL model recently proposed by Findling et al. (nature neuroscience), as this noisy update could explain softmax modulations. 2/ the second point is that the model-comparison results (concerning the old and the new models) should be backed-up by a model recovery analysis, where the authors show that, in simulated datasets, their are capable to retrieve the correct, generative, model. The procedure should be extended to the parameters of the best fitting model to ensure that the conclusions based on parameters comparison are justified. 3/ it is unclear which aspect of the behaviour (especially in the bandit task) IS NOT captured by the standard model. The only model-free results that are displayed are the learning curves (figure 4) and this difference can be perfectly captured by a standard model. Is there more information in the behaviour that can help us discriminate the models? 4/ it would be important to know what is the correlation between the parameters estimated in the estimation and in the learning task. Is there some common ground? Are the two tasks tapping into the same processes? I suspect that they don’t and that different models (and different biases) apply to the two tasks. 5/ it is important to provide the instructions given to the subjects, as it is important to know whether or not the subjects were primed about the frequent changes in contingencies. This would change the interpretation of the results, as it could simply be that adolescents do not follow instructions. 6/ what is the rational to fix the noise variance parameter? How does its inclusion change the results (I guess it favour the Kalman filter because it reduces its number of free parameters)? 7/ how have the authors verified that the two groups were matched in terms of IQ and socio-economic status? Reviewer #2: Jepma and colleagues studied adolescents and adults on an estimation task and a 2-armed bandit task, varying the noise (standard deviation) of outcomes across blocks, and assessing learning rate and confidence on a single trial basis. They show that while evaluation of uncertainty (confidence) and its effects on decreasing learning rate do not differ between the age groups, younger participants asymptoted to a higher learning rate and showed lower choice accuracy. Model fitting suggests that these behavioral findings may be attributed to adolescents' belief that the environment is constantly changing, and therefore new information is more relevant to future prediction and choice than past knowledge. I found this paper very well written, with clear exposition of the computational terms and models, and a compelling set of results. I was especially impressed by the explanations of the Kalman Filter, which are the clearest I have encountered. The experiments are clean and well chosen, and the results described in detail, with well-thought-out statistical analyses. My main worry regards the discrepancy between the behavioral results and the conclusions from the modeling: while the behavioral results suggested no age-related differences in uncertainty and how it changed over time, and in the process of updating learning rates with experience, the results of the modeling were considered as suggesting the opposite. That is, based on the modeling results, it was suggested that adolescents assume more variation in the outcome-generating process over time, which drives their high learning rate and suboptimal choice behavior. First, since the models tested had only one means by which to generate a higher learning rate (the Kalman Filter determines learning rate based on estimated volatility and observation noise), and we know empirically that the asymptotic learning rate was higher for adolescents, I am not sure I buy this explanation of the provenance of the higher learning rate. At the very least it should be clarified that this was by design of the model -- another model, that explained higher asymptotic learning rates differently, might have suggest a different cause. Second, the conclusions from modeling relied on fits of individual parameters. However, parameters of models, especially reinforcement learning models, are often difficult to recover reliably due to interdependence (e.g., a low learning rate and high inverse temperature can lead to the same likelihood as a high learning rate and low inverse temperature). In order to believe conclusions from parameter fits, I would first want to see some testing of the reliability of parameter recovery from the model. this can easily be done by simulating data from a known set of parameters, and then attempting to recover those parameters. Alternatively, or better, in addition, confusion matrixes across models can be plotted (that is, simulate data from each of the models, and fit it with all models to test how often the correct model is recovered). See Collins & Wilson's "Ten Simple Rules for the Computational Modeling of Behavioral Data" (https://psyarxiv.com/46mbn/) for more ideas on how to validate that you can make conclusions from model fits. I had several other issues with the modeling: First, I was not sure what is the deviance information criterion (I have never encountered it before) -- how is this different from BIC? Are the results different if using the more standard BIC? Second, in Table 1 it was not clear that the difference between the DIC values of model 2B and 1B were significant -- what is the significance of a 1.5% difference between them, aggregated over subjects? It would be more compelling to see statistics, for instance, a t test on the difference in scores between the two models, across participants. I would also want to see a measure such as likelihood per trial, which would give some intuition regarding the goodness of fit of the models. This is especially important given the very low inverse temperature fits for the choice task -- such a high degree of randomness suggests a poor overall fit, given that the inverse temperature accounts not only for exploration, but also for all other sources of variance unexplained by the model. In Figure 5, the different Y axes in the adult graphs in A (top two) make it hard to read these graphs. Moreover, for S_1 I was not sure what the units were. Does the amount of initial variance that was fit by the model (in the order of hundreds) make sense? Finally, the modeling results were, for the most part, described without statistical analysis. It is not clear to me that the difference in \\sigma^2 between the two age groups was less pronounced (line 387) and that the increase of the inverse temperature was steeper for older adults (line 400). All of these results would be more convincing with some statistical analysis, rather than just eyeballing the results and describing them. On this last point, statistical analysis was also missing on line 212 -- please compare the learning rates in each age group to the adults if you would like to claim that the regulation of learning rates matures at age 15. Minor comments: - It was not clear to me until getting to the methods whether participants knew about the block structure. It would be good to make that clear upfront. - Figure 3b - why are the bins different for the two age groups? - Can the mediation analysis in Figure 6 be done on the reported uncertainty data as well? - Will the data and code be made publicly available upon publication? ******** Have all data underlying the figures and results presented in the manuscript been provided?** Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information. Reviewer #1: No: Reviewer #2: No: I may have missed it, but it was not clear to me whether data and code were to be made public upon publication ******** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review?** For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No https://doi.org/10.1371/journal.pcbi.1008276.r001
Revision 1
26 Mar 2020 Author Response Attachments Attachment Submitted filename: ReplyToReviewers_Jepma_etal.docx https://doi.org/10.1371/journal.pcbi.1008276.r002
26 Jul 2020 Decision Letter - Natalia L. Komarova, Editor, Jean Daunizeau, Editor Dear Dr. Jepma, Thank you very much for submitting your manuscript "Uncertainty-driven regulation of learning and exploration in adolescents: A computational account" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations. * As you will see, reviewers are rather satisfied with your modifications. I simply want to highlight an important comment that was raised by reviewer #2: namely, that the softmax temperature is NOT a measure of people's tendency to explore. The softmax temperature also (if not mostly?) measures the inability of the model to capture people's choices, which may be driven by processes that are beyond the model's explanatory power (cf. model residuals). I suggest that, each time you interpret the softmax temperature in terms of exploration, you also explicitly recall the other (less interesting, but still important) interpretation. * Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. When you are ready to resubmit, please upload the following: [1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out [2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file). Important additional instructions are given below your reviewer comments. Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments. Sincerely, Jean Daunizeau Associate Editor PLOS Computational Biology Natalia Komarova Deputy Editor PLOS Computational Biology ********************* A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately: [LINK] Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment.** Reviewer #1: I found very intriguing that the model 2B (asymmetric reinforcement-learning model + dynamic softmax) performed best for the adolescents. I could find the information about the obtained values of alpha(+) and alpha(-). As there is some degree of disagreement in the literature (with Gershman 2015 reporting a negativity bias, Lefebvre 2017 reporting a positivity bias and Van Den Bos 2015 reporting both - as a function of the age) it would be very useful and informative for people in the field to report the results of the positive and negative learning rates of the model 2B as a function of the age group. Other than that, I am satisfied by the revisions. Reviewer #2: I thank the authors for their careful attention to all my comments, and I hope these were helpful. I am also extremely sorry for the delay with sending in this re-review — COVID-19 sent us all for a tailspin! In any case, I have only two remaining concerns, and they are mainly expositional: The first concern is the unqualified over-interpretation of the meaning of the softmax inverse temperature parameter, as exemplified in the below sentences: "The degree of choice randomness, or exploration, is often controlled by the inverse-temperature parameter, such that a higher inverse temperature results in a stronger tendency to choose the option with the highest expected value (i.e., less exploration)." "Importantly, all learning models performed better when combined with a dynamic than a constant softmax function, for both age groups, suggesting that both the adults and adolescents did adjust their degree of exploration over time.” It is important to remember that the softmax function is not necessarily the way the brain controls exploration — but rather how we link values in our model to probability of choice. This means that all misspecification of a model (that is, if the model learns values that differ, for any reason, from what is driving the participants’ behavior) will be folded into the softmax inverse temperature. If a model learns values that are not in accord with what is driving choice, the only way the model can fit the data is by “ignoring” these values to a larger extent, by lowering the inverse temperature. So while softmax can account for true exploration, we cannot assume that everything it accounts for is exploration. The first sentence quoted above, in the introduction, conflates what the model does with what the subject does — the degree of exploration of the subject may be controlled through random and/or directed exploration, which are not accurately modeled or estimated by the softmax temperature (see, for example, Wilson, R. C., Geana, A., White, J. M., Ludvig, E. A., & Cohen, J. D. (2014). Humans use directed and random exploration to solve the explore–exploit dilemma. Journal of Experimental Psychology: General, 143(6), 2074, and other related papers from the lab of Robert Wilson). It is important to separate generative aspects of behavior (which we can test, but can’t assume) and what your model does. The second sentence quoted above over-interprets the finding that a dynamic softmax function was a better fit for the data — this could also be due to the model being a worse explanation for the learning phase (the values it learns are different from those subjects learn, hence low inverse temperature) and a better explanation for the asymptotic phase (at that point, the learned values in the model and the learned values of subjects have converged (through different processes, perhaps) to the same values, so the model provides a better fit and the softmax inverse temperature can be increased. Basically, if a model and a human learn in different ways but arrive at the same conclusion (that is, they both learn to play the task), the best fit for a softmax model will be lower inverse temperature earlier on, and higher inverse temperature later on. Unfortunately, this may say nothing about the participants’ exploration strategy. I think it is important to be upfront about this and not misinterpret softmax findings as expressing conclusively something about participants’ strategy. Indeed, it could be that participants are reducing exploration over time, but we cannot know this without testing exploration directly, ideally with a tailored task, or, as you do, with some model-free analysis, for instance, looking at stay probability after a rewarded and unrewarded choice at different phases of the task. Given your model-free analysis, this is mainly an expositional point — I have seen so many papers confuse readers who are not versed with model fitting, to think that softmax = exploration. That is just not true. My second concern is with this addition: "We compared the learning rates in each adolescent age group to the adults, and added the results on p. 11” — the missing statistical analysis here is the interaction between age and learning rate/learning rate reduction. Differences in p-values in of themselves are not always significant, so the most correct way to do this analysis is to test formally for an interaction with age, as you did for your other analyses. Only if the interaction is significant can we interpret the results you now describe in lines 218-231. A few more minor comments: - Although you say in the response letter that you now mention upfront that participants knew when a block changed in the first experiment, I did not find this in the experimental design section (around line 122 and later). Saying that the participants were informed about the task structure in advance does not imply that the block changes were signaled within the task itself. Please add. - It is also not clear from the text or from Figure 1 how many blocks were in the choice task (line 137 suggests more than one — "the means differed 10 or 20 points, in half of the blocks each"), and whether subjects were informed of a block change (that is, are the subjects learning about new options from scratch several times? Or only once?) - in line 304 you label the Kalman Filter “Model 2” but you already have another Model 2. You probably also meant to label the Pearce-Hall model “Model 4”. - line 380 — I still wish there were some way to express the model fit in terms of (on average) how likely is each choice under the best fitting model (and each of the others). I find this very intuitive to understand (as we know that for a choice among two options, 50% is chance etc.). I know this is not trivial for MCMC fitting, but on the other hand, you extract - line 395 "reducing their learning rates over time as a function of their estimation certainty.” — as far as I understand, the models did not model the reported estimation certainty, and did not show that the reduction of learning rate was a function of that. This statement is therefore confusing/misleading. - the figures as uploaded were extremely grainy and rasterized and I could barely read them. ******** Have all data underlying the figures and results presented in the manuscript been provided?** Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information. Reviewer #1: No: I could not find any link where to download the data Reviewer #2: Yes ******** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review?** For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5. Reproducibility: To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-materials-and-methods https://doi.org/10.1371/journal.pcbi.1008276.r003
Revision 2
11 Aug 2020 Author Response Attachments Attachment Submitted filename: ReplyToReviewers.docx https://doi.org/10.1371/journal.pcbi.1008276.r004
20 Aug 2020 Decision Letter - Natalia L. Komarova, Editor, Jean Daunizeau, Editor Dear Dr. Jepma, We are pleased to inform you that your manuscript 'Uncertainty-driven regulation of learning and exploration in adolescents: A computational account' has been provisionally accepted for publication in PLOS Computational Biology. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated. IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS. Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. Best regards, Jean Daunizeau Associate Editor PLOS Computational Biology Natalia Komarova Deputy Editor PLOS Computational Biology *********************************************************** https://doi.org/10.1371/journal.pcbi.1008276.r005
Formally Accepted
25 Sep 2020 Acceptance Letter - Natalia L. Komarova, Editor, Jean Daunizeau, Editor PCOMPBIOL-D-19-01704R2 Uncertainty-driven regulation of learning and exploration in adolescents: A computational account Dear Dr Jepma, I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work! With kind regards, Matt Lyles PLOS Computational Biology \| Carlyle House, Carlyle Road, Cambridge CB4 3DN \| United Kingdom ploscompbiol@plos.org \| Phone +44 (0) 1223-442824 \| ploscompbiol.org \| @PLOSCompBiol https://doi.org/10.1371/journal.pcbi.1008276.r006

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio .