Fig 1.
Artificial neural network (ANN) approach.
A) Traditional methods rely on computing the log-likelihood (LLH) of the data under a given model and optimizing it to derive model parameter estimates. B) The ANN is trained on a large simulated data set to map data sequences onto the parameter values they were simulated from; the trained network can then estimate cognitive model parameters from new data without computing or approximating the likelihood. C) The ANN architecture, inspired by [52], is suited to data with strong inter-trial dependencies: it consists of a recurrent neural network (RNN) followed by a fully connected feed-forward network, whose output contains, for each agent, estimates of the parameter values from which the data were simulated. D) As in parameter estimation, traditional tools for model identification rely on the likelihood to derive model comparison metrics (e.g., AIC, BIC) that are used to determine which model most likely generated the data. E) The ANN, instead, is trained to learn the mapping between data sequences and the cognitive models from which they were simulated. F) The ANN follows the architecture introduced for parameter estimation, with the key difference that the final layer outputs a probability distribution over classes representing the candidate models; the class with the highest probability corresponds to the model the network identifies as most likely to have generated the agent's data.
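The recurrent-plus-dense structure in panels C and F can be illustrated with a small forward pass. The sketch below is not the authors' implementation: it assumes a single-layer GRU (one common formulation of the gated RNN), random untrained weights, and a hypothetical per-trial input encoding (one-hot choice plus binary reward). The same encoder feeds either a linear regression head (one output per cognitive-model parameter, panel C) or a softmax head (one probability per candidate model, panel F).

```python
import numpy as np

def init_params(n_in, n_hidden, n_out, rng):
    """Random, untrained GRU and dense-head weights (illustration only)."""
    s = 0.1
    p = {}
    for gate in ("z", "r", "h"):
        p[f"W_{gate}"] = rng.normal(0, s, (n_hidden, n_in))
        p[f"U_{gate}"] = rng.normal(0, s, (n_hidden, n_hidden))
        p[f"b_{gate}"] = np.zeros(n_hidden)
    p["W_fc"] = rng.normal(0, s, (n_out, n_hidden))
    p["b_fc"] = np.zeros(n_out)
    return p

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_encode(x_seq, p):
    """Run a GRU over a (T, n_in) trial sequence and return the final hidden state."""
    h = np.zeros(p["b_z"].shape[0])
    for x in x_seq:
        z = sigmoid(p["W_z"] @ x + p["U_z"] @ h + p["b_z"])  # update gate
        r = sigmoid(p["W_r"] @ x + p["U_r"] @ h + p["b_r"])  # reset gate
        h_tilde = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h) + p["b_h"])
        h = (1 - z) * h + z * h_tilde  # carries information across trials
    return h

def estimate_parameters(x_seq, p):
    """Regression head (panel C): one linear output per model parameter."""
    return p["W_fc"] @ gru_encode(x_seq, p) + p["b_fc"]

def identify_model(x_seq, p):
    """Classification head (panel F): softmax over candidate models."""
    logits = p["W_fc"] @ gru_encode(x_seq, p) + p["b_fc"]
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Hypothetical agent: 50 trials, input = one-hot choice (2 options) + reward.
rng = np.random.default_rng(0)
choices = rng.integers(0, 2, size=50)
rewards = rng.integers(0, 2, size=50)
x = np.column_stack([np.eye(2)[choices], rewards]).astype(float)

theta_hat = estimate_parameters(x, init_params(3, 32, 2, rng))  # e.g. 2 parameters
probs = identify_model(x, init_params(3, 32, 5, rng))           # e.g. 5 candidates
best_model = int(np.argmax(probs))
```

In practice both heads would be trained on simulated agents (squared-error loss for parameters, cross-entropy for model labels); the untrained weights here only demonstrate the data flow.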
Fig 2.
A) Parameter recovery loss on the held-out test set for the tractable-likelihood models (2P-RL, 4P-RL, BI, S-BI) for each of the tested methods. Loss is quantified as the mean squared error (MSE) between true and estimated parameters. Bars represent the average loss for each parameter across all agents; error bars represent the standard error across agents. B) Parameter recovery for the 4P-RL model using MAP and GRU. ρ values are Spearman correlations between true and estimated parameters. The red line is the unity line (x = y) and the black line is the least-squares regression line. All correlations were significant at p < .001.
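The bar heights and error bars in panel A can be computed as sketched below, assuming the standard error is taken over the per-agent squared errors of each parameter. The arrays and their values are made up for illustration (two hypothetical parameters for four agents).

```python
import numpy as np

def recovery_loss(true_params, est_params):
    """Per-parameter MSE and its standard error across agents.

    true_params, est_params: arrays of shape (n_agents, n_params).
    Returns (mse, sem), each of shape (n_params,).
    """
    sq_err = (np.asarray(est_params) - np.asarray(true_params)) ** 2
    n_agents = sq_err.shape[0]
    mse = sq_err.mean(axis=0)                               # bar height
    sem = sq_err.std(axis=0, ddof=1) / np.sqrt(n_agents)    # error bar
    return mse, sem

# Hypothetical values: 4 agents x 2 parameters.
true = np.array([[0.2, 5.0], [0.4, 3.0], [0.6, 8.0], [0.8, 2.0]])
est = np.array([[0.3, 5.0], [0.4, 4.0], [0.5, 8.0], [0.8, 0.0]])
mse, sem = recovery_loss(true, est)
```

Because MSE is scale-dependent, losses are only directly comparable across parameters if the parameters share a scale (for example, after normalization).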
Fig 3.
A) Parameter recovery loss on the held-out test set for the intractable-likelihood models (RL-LAS, HRL) using ABC and the GRU network. Loss is quantified as the mean squared error (MSE) between true and estimated parameters. Bars represent the average MSE for each parameter across all agents; error bars represent the standard error across agents (S17 Fig shows variability across seeds). B) Parameter recovery for the RL-LAS and HRL models using ABC (green) and the GRU network (yellow). ρ values are Spearman correlations between true and estimated parameters. The red line is the unity line (x = y) and the black line is the least-squares regression line. All correlations were significant at p < .001.
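The scatter-plot statistics in panel B (Spearman ρ and the least-squares line, compared against the unity line) can be reproduced roughly as follows. This is a NumPy-only sketch with made-up data: Spearman ρ is computed as the Pearson correlation of tie-averaged ranks, and the line is fitted with `np.polyfit`. If SciPy is available, `scipy.stats.spearmanr` also returns the p-value.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1, with tied values assigned their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rho(true_vals, est_vals):
    """Spearman correlation = Pearson correlation of the ranks."""
    return np.corrcoef(average_ranks(true_vals), average_ranks(est_vals))[0, 1]

def regression_line(true_vals, est_vals):
    """Least-squares fit est = slope * true + intercept (black line)."""
    slope, intercept = np.polyfit(true_vals, est_vals, 1)
    return slope, intercept

# Hypothetical recovery data.
true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
rho_monotone = spearman_rho(true, [1.1, 1.9, 3.5, 3.9, 5.2])  # perfectly ordered
rho_swapped = spearman_rho(true, [2.0, 1.0, 3.0, 4.0, 5.0])   # one swapped pair
slope, intercept = regression_line(true, 2.0 * true + 1.0)
```

A regression slope near 1 and intercept near 0 means the black line overlaps the red unity line, i.e. estimates are unbiased on average; ρ only measures whether the rank order of agents is preserved.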
Fig 4.
Using evidential learning to evaluate the uncertainty of parameter estimates for A) the 2-parameter RL model (tractable likelihood) and B) the RL model with latent attention states (intractable likelihood). Vertical lines around point estimates illustrate model uncertainty. Only 100 data points are shown for cleaner visualization; Spearman ρ values are computed over all agents in the held-out test data (3k).
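One way evidential learning yields the uncertainty shown here is deep evidential regression, in which the network outputs the four parameters (γ, ν, α, β) of a Normal-Inverse-Gamma distribution for each cognitive-model parameter. The caption does not specify the formulation, so the sketch below assumes that one: it converts the outputs into a point estimate and two uncertainty terms, and gives the negative log-likelihood (the Student-t marginal) typically used for training.

```python
import math

def nig_uncertainty(gamma, nu, alpha, beta):
    """Point estimate and uncertainties from Normal-Inverse-Gamma outputs.

    Requires nu > 0, alpha > 1, beta > 0.
    """
    point = gamma                          # E[mu]
    aleatoric = beta / (alpha - 1)         # E[sigma^2]: noise in the data
    epistemic = beta / (nu * (alpha - 1))  # Var[mu]: uncertainty of the estimate
    return point, aleatoric, epistemic

def nig_nll(y, gamma, nu, alpha, beta):
    """Negative log marginal likelihood of true value y under the NIG evidence."""
    omega = 2 * beta * (1 + nu)
    return (0.5 * math.log(math.pi / nu)
            - alpha * math.log(omega)
            + (alpha + 0.5) * math.log(nu * (y - gamma) ** 2 + omega)
            + math.lgamma(alpha) - math.lgamma(alpha + 0.5))

# Hypothetical network output for one agent's parameter.
point, aleatoric, epistemic = nig_uncertainty(gamma=0.5, nu=2.0, alpha=3.0, beta=1.0)
half_width = math.sqrt(epistemic)  # one option for the vertical error lines
```

Full deep evidential regression also adds an evidence regularizer to the loss; it is omitted here to keep the sketch short.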
Fig 5.
A) Confusion matrices for the likelihood-tractable models on the PRL task, based on 1) the likelihood/AIC metric and 2) ANN identification. The AIC confusion matrix revealed a much higher degree of misclassification (i.e., the true simulated model being incorrectly identified as a different model). B) Confusion matrix for the likelihood-intractable models using the ANN (2P-RL and RL-LAS models were simulated on the PRL task; HRL, BI and S-BI models were simulated on the HRL task).
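The two identification rules compared in panel A, and the confusion matrix built from their choices, can be sketched as follows. AIC = 2k − 2·LLH picks the candidate with the lowest score; the ANN picks the class with the highest softmax probability. The row normalization (each row sums to 1 over the identified models) is an assumption about how the matrices are displayed, and all numbers below are hypothetical.

```python
import numpy as np

def aic(llh, n_params):
    """Akaike information criterion: AIC = 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * llh

def pick_by_aic(llh_per_model, n_params_per_model):
    """Index of the candidate model with the lowest AIC for one agent."""
    scores = [aic(l, k) for l, k in zip(llh_per_model, n_params_per_model)]
    return int(np.argmin(scores))

def pick_by_ann(class_probs):
    """Index of the highest-probability class in the network's output."""
    return int(np.argmax(class_probs))

def confusion_matrix(true_idx, pred_idx, n_models):
    """Rows = true (simulated) model, columns = identified model, row-normalized."""
    cm = np.zeros((n_models, n_models))
    for t, p in zip(true_idx, pred_idx):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)

# Hypothetical agent: 2-parameter model LLH = -100, 4-parameter model LLH = -98.5.
aic_choice = pick_by_aic([-100.0, -98.5], [2, 4])  # AICs 204 vs 205
ann_choice = pick_by_ann([0.1, 0.7, 0.2])
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], n_models=2)
```

A perfect identifier yields the identity matrix; off-diagonal mass in row i shows how often data simulated from model i were attributed to another model.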
Fig 6.
Robustness checks using different training (different line colors) and testing (x-axis) trial sequence lengths.
A) Parameter estimation in both the RL-LAS and HRL models shows that training with a mixture of trial sequence lengths (purple line) yields more robust out-of-sample parameter prediction than training with a fixed trial sequence length. B) Model identification, performed on different combinations of candidate models, was likewise best with mixed trial sequence length training. The number of agents (simulations) used for training was kept constant across all tests (N = 30k agents).
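The mixed-length training condition can be set up roughly as sketched below, keeping the number of simulated agents fixed (30k, as in the caption) while varying how trial sequence lengths are assigned. The candidate lengths (50, 100, 200) and the helper names are hypothetical, not the values used in the study. Sequences of different lengths are zero-padded into one batch with a mask, so a recurrent encoder can ignore padded trials (for example, by reading its hidden state at each sequence's last valid trial).

```python
import numpy as np

def assign_lengths(n_agents, lengths, rng, mixed=True):
    """Trial-sequence length for each simulated agent.

    mixed=True draws each agent's length uniformly from `lengths`;
    mixed=False gives every agent lengths[0]. n_agents is the same either way.
    """
    if mixed:
        return rng.choice(lengths, size=n_agents)
    return np.full(n_agents, lengths[0])

def pad_batch(sequences, n_features):
    """Zero-pad variable-length (T_i, n_features) arrays; return (batch, mask)."""
    t_max = max(len(s) for s in sequences)
    batch = np.zeros((len(sequences), t_max, n_features))
    mask = np.zeros((len(sequences), t_max), dtype=bool)
    for i, s in enumerate(sequences):
        batch[i, :len(s)] = s
        mask[i, :len(s)] = True  # True marks real trials, False marks padding
    return batch, mask

rng = np.random.default_rng(1)
lens = assign_lengths(30_000, [50, 100, 200], rng, mixed=True)
fixed_lens = assign_lengths(30_000, [50, 100, 200], rng, mixed=False)

# Stand-in data for the first 8 agents (3 features per trial).
sequences = [rng.random((int(t), 3)) for t in lens[:8]]
batch, mask = pad_batch(sequences, n_features=3)
```

Holding N constant means the mixed condition sees fewer agents at any single length, so any gain in robustness comes from length variety rather than from more training data.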