Trial-by-trial learning of successor representations in human behavior

Ari E. Kahn; Dani S. Bassett; Nathaniel D. Daw

doi:10.1371/journal.pcbi.1013696

Peer Review History

Original SubmissionFebruary 10, 2025
13 May 2025 Decision Letter - Lyle J. Graham, Editor, Wolfgang Einhäuser, Editor PCOMPBIOL-D-25-00267 Trial-by-trial learning of successor representations in human behavior PLOS Computational Biology Dear Dr. Kahn, Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript within 60 days Jul 13 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: * A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below. * A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. * An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter We look forward to receiving your revised manuscript. Kind regards, Wolfgang Einhäuser Academic Editor PLOS Computational Biology Lyle Graham Section Editor PLOS Computational Biology Additional Editor Comments : While the reviews are positive in general, they also raise some major concerns, which need to be addressed. Some of the presentation of the Methods is overly concise, an appendix or supplement might help for issues the authors do not want to include in the main text, but which would be needed for replication. I in particular had trouble understanding the human experiments from the paper. I understand that the data is from a previous publication, but at least the group sizes (subjects per group) and the number of trials per subject would be useful and also help understanding whether the statistical tests conducted were appropriate. When I tried to access the OSF repository, I was required to send a personalized access request. While this would have been ok for me, it might compromise anonymity of the reviewers, so it cannot be expected that they consider this material. The authors should consider providing a hidden osf read-only link for review of a revision (and of course make the link public, once the manuscript is accepted) or double check that their repository is public. Journal Requirements: 1) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type u2018LaTeX Source Fileu2019 and leave your .pdf version as the item type u2018Manuscriptu2019. 2) Please provide an Author Summary. This should appear in your manuscript between the Abstract (if applicable) and the Introduction, and should be 150-200 words long. The aim should be to make your findings accessible to a wide audience that includes both scientists and non-scientists. Sample summaries can be found on our website under Submission Guidelines: https://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-parts-of-a-submission 3) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines: https://journals.plos.org/ploscompbiol/s/figures 4) Please ensure that all Figure files have corresponding citations and legends within the manuscript. Currently, Figure 1 in your submission file inventory does not have an in-text citation. Please include the in-text citation of the figure. 5) Thank you for stating "Data and code that support this study are available at https://osf.io/jupbz/." Please note that, though access restrictions are acceptable now, your entire minimal dataset will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If you are unable to adhere to our open data policy, please kindly revise your statement to explain your reasoning and we will seek the editor's input on an exemption. Reviewers' comments: Reviewer's Responses to Questions Reviewer #1: In this manuscript, Kahn et al. present a series of model-based and model-agnostic analyses of previously collected data, in order to test how participants update expectations of upcoming stimulus-action requirements ("states"). They consider three main mechanisms through which participants may learn these expectations, being a "recency" mechanism, a "one-step" prediction mechanism, and a "successor representation" which is able to learn multi-step predictions of upcoming states. The authors define a previously unconsidered model of recency and demonstrate that this is a major driver of human response times, making an important methodological contribution to the modeling of response times in graph learning tasks. They then demonstrate how multistep predictions can still further improve the model, suggesting successor representation-like learning does take place in these tasks. Finally, the authors define two model-agnostic tests, to directly test the influence of eligibility traces and bootstrap updating, two learning mechanisms that may contribute to successor representation learning. They only find evidence for eligibility traces, but not for bootstrapping, questioning whether successor representation learning is based on vector prediction error signals as often considered in theoretical work. However, the authors note the limited amount of data available for the bootstrapping analysis compared to the trace analysis, and do not perform a test that can confirm the absence of an effect (e.g. a Bayes Factor analysis). This work addresses an important research question, and I am particularly fond of the two model-agnostic signatures the authors were able to identify to test traces and bootstrapping. While I think the research question and results are of importance and fit the scope of Plos Computational Biology, there are several minor points that I think can be addressed to improve the strength of the current work, which I will list in order of importance: 1. Perhaps any appendices to this paper were not communicated during this peer review process, but I was missing some proof of concept (a simulation study) that the parameters and models considered in this manuscript are indeed recoverable and distinguishable (cf. Wilson & Collins, 2019; https://doi.org/10.7554/eLife.49547). 2. a. Throughout the manuscript, there are several references to the idea that an SR-TD(lambda = 1) model corresponds to a Hebbian learning rule rather than a vector-prediction-error learning rule. However, this is not mathematically exposed in the main text, and is also not readily apparent to me based on the brief model description provided in the "SR-TD learning model" section of the methods. I think readers may find it very informative if this relationship is made more explicit, and it may inform some of the points that follow. 2. b. Further detailing this, I was wondering if the SR-TD(lambda = 1) model might mimic the "temporal context model" of the episodic memory system (Howard & Kahana, 2002; https://doi.org/10.1006/jmps.2001.1388), in which a trace of previous stimuli is maintained and associated with incoming stimuli. If this is true, the authors might cite this work to better contextualize this special case. 2. c. If SR-TD(lambda = 1) is indeed such a special case, I think it is worth it to consider a variant of SR-TD in which lambda is not a free parameter, but instead fixed to 1, and to see if there is a significant improvement when lambda is allowed to be free. This would at least expose the data as containing variance that goes beyond the simple associative mechanism of the temporal context model. Currently, the authors interpret the fact that their parameter estimates do not yield this extreme value of lambda = 1 as circumstantial evidence of this, but I don't believe this is an effective way to interpret the model fit. Instead, I think it is worth statistically formalizing this point with an explicit model comparison. A lack of significance then would further question the involvement of the vector prediction error for learning multistep predictions. 2. d. In the case of SR-TD(lambda = 1), does the discount factor gamma become redundant (because learning is Hebbian and not error-driven)? 3. Could the authors point out a qualitative difference between a pure recency learning model, and a model that includes multi-step predictions? This seems to be a major point of the manuscript, and such a difference might be identifiable in the modular graph for example by considering transitions within communities as in Figure 5 of Wientjes & Holroyd (2024; https://doi.org/10.1371/journal.pcbi.1011312). 4. In the "combined models" section, the formula for mean response times no longer seems to include the ntrials and lag10 regressors. I missed a motivation for dropping these regressors from the model and wonder if some of the associated variance might not be wrongly assigned to the successor representation regressor now, potentially influencing the estimates of lambda and gamma. 5. As a purely optional and supplementary suggestion, the authors might consider whether their estimation of successor representation-related parameters (lambda, gamma) show any reliability for these participants, for example by correlating estimates of an odd-even split of the trials (contribution to the likelihood) of a single graph learning task, or even considering the parameter values of two different graph learning tasks (for the participants that completed two different graphs). Especially, if estimates of the discount factor gamma show some reliability, this could maybe be taken as evidence that prediction error-driven learning (bootstrapping?) does contribute some variance to these data. 6. Some minor typographical errors: In the methods section "recency learning model", the beta(lag10) is not stylized into a greek letter. In the method section describing the "bootstrap analysis", it is described to identify an enhanced RT on the final step of "SR". I believe this should refer to "ST", being the last transition of the bootstrap sequence, rather than the abbreviation for the Successor Representation (SR; causing me some confusion initially). Reviewer #2: In this study, the authors examine response times in a serial reaction time task in which the stimuli trace out edges of a graph with one of three structures. Of particular interest, the modular graph has a temporal community structure as introduced in work by Anna Schapiro and colleagues. In previous work (e.g., Lynn, Papadopoulos, Kahn, & Bassett, 2020, Nat Physics) it was shown that human RTs are slower across community boundaries compared to within community. This finding is counterintuitive from a certain perspective; although in general RT is predicted by the entropy of the transition, the entropy of transitions within boundary are the same as across boundaries. Note that across-community edges are differentiated by powers of the transition matrix, this paper pursues the hypothesis that the successor representation (SR), which asymptotes as exponentially-weighted powers of the transition matrix, predicts human RTs. The idea is intuitively appealing; rather than ``moving'' along the true transition matrix, human learners ``move'' along the SR. The authors compare the SR to other simple models. By comparing specific sequences they can also evaluate whether the SR is updated by temporal difference learning or by simple aggregation of eligibility trace memories. There are some substantial methodological points (below) that speak to some of the empirical basis for the conclusions about the models, but these are relatively minor. The major result, which is very likely to be sound, is that a recency-biased model carries more variance in RT than the SR model. The observation of a recency effect here does not really get much attention (recency isn't part of standard SR) but it's pretty interesting. There are certainly extremely robust recency effects in (all kinds of) memory experiments with random lists of stimuli. Recency effects among responses are also quite widespread in RT tasks (sequential dependencies in simple choice RT tasks have been well-characterized since at least the mid-1970s). However, recency bias is not universally observed in SRT tasks. For instance Gureckis & Love (2010, Cog Sci) observed an ``anti-recency effect'' in an SRT task when stimuli were less likely than chance to repeat at short lags. Perhaps the recency effect in this dataset reflects some kind of learning about the statistical structure rather than simply decaying activations. Additional control analyses could possibly tease this out from the existing data. Perhaps a recency bias reflects an prior about the statistics of the world (Anderson & Schooler, 1991, Psych Sci; Gershman, et al., 2014, PLoS Comp Bio)? The other interesting result is that, by comparing RTs on specific motifs, the results favor a model in which the SR is constructed by updating with the current memory trace rather than bootstrapping via temporal difference learning. So, what to make of all this in the context of existing modeling on SR? Well, if the eligibility trace is allowed to cue subsequent states in addition to the SR, this would account for the recency effect. In models of laboratory memory, the exponentially-decaying eligibility trace is sometimes referred to as a temporal context vector (Gershman, ... Sederberg, 2012, Neural Comp). And the SR constructed from trace update rather than bootstrapping (and with \gamma=0) is (more or less) the temporal context model. Is this account falsified by the data? Presumably one would need to disentangle the exponential decay over powers of the transition matrix from \gamma and the exponential decay due to \lambda. In the temporal community graph at least, these seem to be perfectly confounded with one another, no? The paper would be strengthened a great deal by making the connection between the recency+trace SR model and temporal context models. This class of models has been extensively studied in laboratory memory tasks (although not any SRT papers that come to mind). Expository point: Methods/results should be fleshed out and, at least high level methods should be embedded. (If there's a supplement/appendix with this information, it didn't make it to this reviewer.) As it is, there are a bunch of results missing that make it difficult to evaluate the paper's empirical contribution and what can really be learned from the modeling. I don't see values of \beta_trial, \beta_{ntrials}, \beta_10 reported in the paper. One can learn a lot about whether the estimates of these parameters change across models (they should not). More broadly, the data should be reported in more detail. How do distributions of parameters across participants look? Do parameters trade off with one another? Minor Methodological points: There are some claims/suggestions that could be explicitly tested to be on solid empirical ground. For instance, ``Interestingly on the average over participants, we estimate a remark- ably consistent degree of temporal discounting across all three graph topologies..'' To make this observation one should test the model that supports this inference. The \gamma horizon seems to be about as big as could be sustained with these graphs, no? To seriously test the hypothesis, perhaps one can fix \gamma to be the same for all three graphs and compare that to the (nested) model with the \gamma's allowed to vary freely? All of the analyses in the paper model the effect of various variables on mean log RT. This is not unreasonable, but relies on an implicit assumption about the form of the RT distributions. Mean log RT is a pretty good proxy for drift rate in an evidence accumulation framework. But in at least some circumstances expectation is carried by non-decision time (Tiganj et al., 2022, JEP:G). And in at least some circumstances, recency is carried by the non-decision time (Bright, et al., 2022, bioRxiv). A change in non-decision time across conditions would not only change mean log RT but also \sigma. Tracking this down may turn out to be intricate but one could ask simple questions in an exploratory way. For instance, examination of RT distributions as a function of recency, or within/across-cluster transitions (etc) could be informative. Another straightforward approach would be to explicitly allow sigma as well as mu to vary for different variables. Those models are nested. As an aside, the shape of the RT distributions associated with the entropy effect (from the 2020 Nat Rev Phys paper) are also very interesting. Expository point related to the neural understanding of SR: The discussion writes: ``Alternatively, neurons representing eligible states might be reactivated after a delay, via history stored in some separate memory system '' True, but there is also very strong evidence for slowly changing neural activation expressed in firing rates. There is pretty good evidence for exponential decay of firing, as predicted by eligibility trace representations (e.g., Danskin et al., 2023, Sci Adv), in a variety of regions in a variety of tasks. However, unlike the standard RL account there appears to be a continuous spectrum of time constants across neurons within the population. Wouldn't Hebbian learning with a multiscale eligibility trace give a multi-scale SR? ******** Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: None ****** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Sven Wientjes Reviewer #2: Yes: Marc William Howard [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] Figure resubmission: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions. Reproducibility:** To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols https://doi.org/10.1371/journal.pcbi.1013696.r001
Revision 1
16 Oct 2025 Author Response Attachments Attachment Submitted filename: Response to Reviewers.pdf https://doi.org/10.1371/journal.pcbi.1013696.r002
3 Nov 2025 Decision Letter - Lyle J. Graham, Editor, Wolfgang Einhäuser, Editor Dear Dr. Kahn, We are pleased to inform you that your manuscript 'Trial-by-trial learning of successor representations in human behavior' has been provisionally accepted for publication in PLOS Computational Biology. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please also use this opportunity to fix the final issue (the possible deletion of the new limitation paragraph, see below). Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated. IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS. Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. Best regards, Wolfgang Einhäuser Academic Editor PLOS Computational Biology Lyle Graham Section Editor PLOS Computational Biology ********************************************************* I agree with reviewer #2 that the new limitations section is not really necessary, but to avoid an additional iteration, I would ask you to delete it (if you agree) when submitting the production material. Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: The authors have satisfactorily addressed all of my previous comments. Appropriate parameter and model recovery analyses are now reported. I am very excited to see the authors tested the reliability of the parameter estimates to the empirical data, and understand the reasoning to not include this in the manuscript. The expanded explanation of the roles of lambda and gamma is very helpful. As someone only superficially familiar with eligibility traces, I previously viewed them mainly as a computational shortcut to speed up learning. The revised text clarified that the inclusion of gamma in the decay also affects the final convergence of the model. I think this point could still be stated a bit more clearly, but the current version already conveys the essential idea well. Overall, the manuscript now makes a strong and valuable contribution to the literature on computational models of sequence learning and to the growing dialogue between reinforcement learning frameworks and models of episodic memory. Reviewer #2: The revision has addressed my concerns. I am happy with the response to my point 6---the RT distributions are pretty intriguing---but don't think the text on page 18 discussing it (starting with ``Limitations'') adds much. I agree that this is a topic for future investigation and would recommend omitting that paragraph. ****** Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: None ****** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes:** Sven Wientjes Reviewer #2: No https://doi.org/10.1371/journal.pcbi.1013696.r003
Formally Accepted
Acceptance Letter - Lyle J. Graham, Editor, Wolfgang Einhäuser, Editor PCOMPBIOL-D-25-00267R1 Trial-by-trial learning of successor representations in human behavior Dear Dr Kahn, I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. For Research, Software, and Methods articles, you will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing. Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work! With kind regards, Anita Estes PLOS Computational Biology \| Carlyle House, Carlyle Road, Cambridge CB4 3DN \| United Kingdom ploscompbiol@plos.org \| Phone +44 (0) 1223-442824 \| ploscompbiol.org \| @PLOSCompBiol https://doi.org/10.1371/journal.pcbi.1013696.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio .