Learning Rates Are Not All the Same: The Interpretation of Computational Model Parameters Depends on the Context

Reinforcement learning (RL) has revolutionized the cognitive and brain sciences, explaining behavior from simple conditioning to problem solving, across the life span, and anchored in brain function. However, discrepancies between studies are increasingly apparent, particularly in the developmental literature. To better understand these discrepancies, we investigated to what extent parameters generalize between tasks and models, and capture specific and uniquely interpretable (neuro)cognitive processes. 291 participants aged 8-30 years completed three learning tasks in a single session, and their behavior was fit with state-of-the-art RL models. RL decision noise/exploration parameters generalized well between tasks, decreasing between ages 8-17. Learning rates for negative feedback did not generalize, and learning rates for positive feedback showed intermediate generalizability, depending on task similarity. These findings can explain discrepancies in the existing literature. Future research therefore needs to consider task characteristics carefully when relating findings across studies, and to develop strategies for computationally modeling how context shapes behavior.

Maria K Eckstein, Sarah L Master, Liyu Xia, Ronald E Dahl, Linda Wilbrecht, Anne Gabrielle Eva Collins, Learning Rates Are Not All the Same: The Interpretation of Computational Model Parameters Depends on the Context (May 2021), https://www.biorxiv.org/content/10.1101/2021.05.28.446162v1
doi: https://doi.org/10.1101/2021.05.28.446162
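The abstract above refers to learning rates for positive versus negative feedback and to decision noise/exploration parameters. A minimal sketch of such a model, with a delta-rule update that uses separate learning rates for positive and negative prediction errors and a softmax choice rule; all parameter values here are illustrative, not the fitted values from the paper:

```python
import math
import random

def softmax(q_values, beta):
    """Map action values to choice probabilities; beta is the inverse
    temperature (lower beta = more decision noise / exploration)."""
    exps = [math.exp(beta * q) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def update(q_values, choice, reward, alpha_pos, alpha_neg):
    """Delta-rule update with separate learning rates for positive
    and negative prediction errors (illustrative parameterization)."""
    delta = reward - q_values[choice]
    alpha = alpha_pos if delta >= 0 else alpha_neg
    q_values[choice] += alpha * delta

# Toy simulation: two actions; only action 1 is rewarded.
random.seed(0)
q = [0.0, 0.0]
for _ in range(100):
    probs = softmax(q, beta=3.0)
    choice = 0 if random.random() < probs[0] else 1
    reward = 1.0 if choice == 1 else 0.0
    update(q, choice, reward, alpha_pos=0.4, alpha_neg=0.2)
```

In this parameterization, lowering `beta` produces noisier, more exploratory choices, while `alpha_pos` and `alpha_neg` separately control how strongly positive and negative feedback move the value estimates.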


What do Reinforcement Learning Models Measure? Interpreting Model Parameters in Cognition and Neuroscience

Reinforcement learning (RL) is a concept that has been invaluable to research fields including machine learning, neuroscience, and cognitive science. However, what RL entails partly differs between fields, leading to difficulties when interpreting and translating findings.

This paper lays out these differences and zooms in on cognitive (neuro)science, revealing that we often overinterpret RL modeling results, with severe consequences for future research. Specifically, researchers often assume, implicitly, that model parameters generalize between tasks, models, and participant populations, despite substantial empirical evidence against this assumption. We also often assume that parameters measure specific, unique, and meaningful (neuro)cognitive processes, a concept we call interpretability, for which empirical evidence is also lacking.

We conclude that future computational research needs to pay increased attention to these implicit assumptions when using RL models, and suggest an alternative framework that resolves these issues and allows us to unleash the potential of RL in cognitive (neuro)science.

Maria Eckstein, Linda Wilbrecht, Anne Collins, What do Reinforcement Learning Models Measure? Interpreting Model Parameters in Cognition and Neuroscience (May 2021), https://psyarxiv.com/e7kwx/


The Unique Advantage of Adolescents in Probabilistic Reversal: Reinforcement Learning and Bayesian Inference Provide Adequate and Complementary Models

During adolescence, youth venture out, explore the wider world, and are challenged to learn how to navigate novel and uncertain environments. We investigated whether adolescents are uniquely adapted to this transition, compared to younger children and adults. In a stochastic, volatile reversal learning task with a sample of 291 participants aged 8-30, we found that adolescents 13-15 years old outperformed both younger and older participants. We developed two independent cognitive models, one based on reinforcement learning (RL) and the other on Bayesian inference (BI), and used hierarchical Bayesian model fitting to assess developmental changes in underlying cognitive mechanisms. Choice parameters in both models improved monotonically with age. By contrast, RL update parameters and BI mental-model parameters peaked closest to optimal values in 13-to-15-year-olds. Combining both models using principal component analysis yielded new insights, revealing that three readily interpretable components contributed to the early-to-mid-adolescent performance peak. This research highlights early-to-mid-adolescence as a neurodevelopmental window that may be more optimal for behavioral adjustment in volatile and uncertain environments. It also shows how increasingly detailed insights can be gleaned by invoking different cognitive models.

Maria K. Eckstein, Sarah L. Master, Ronald E. Dahl, Linda Wilbrecht, Anne G.E. Collins, The Unique Advantage of Adolescents in Probabilistic Reversal: Reinforcement Learning and Bayesian Inference Provide Adequate and Complementary Models (Mar. 2021), https://www.biorxiv.org/content/10.1101/2020.07.04.187971v2.full
doi: https://doi.org/10.1101/2020.07.04.187971
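The BI model in the abstract above infers a latent task state that can reverse. A minimal sketch of Bayesian belief updating for a two-armed probabilistic reversal task, assuming hypothetical task parameters (`p_reward` for reward stochasticity, `p_switch` for reversal probability) rather than the paper's fitted values:

```python
def bi_update(p_left, choice, reward, p_reward=0.75, p_switch=0.05):
    """One trial of Bayesian belief updating for a two-armed reversal
    task. p_left is the prior belief that the left arm is currently the
    correct one; choice is "left" or "right"; reward is True/False.
    Parameter values are illustrative, not fitted."""
    # Likelihood of the observed outcome under each hidden state.
    if choice == "left":
        lik_left = p_reward if reward else 1 - p_reward
        lik_right = (1 - p_reward) if reward else p_reward
    else:
        lik_left = (1 - p_reward) if reward else p_reward
        lik_right = p_reward if reward else 1 - p_reward
    # Bayes' rule over the hidden state...
    post = lik_left * p_left / (lik_left * p_left + lik_right * (1 - p_left))
    # ...then account for a possible reversal before the next trial.
    return post * (1 - p_switch) + (1 - post) * p_switch
```

Starting from an uncertain belief of 0.5, a rewarded left choice shifts the belief toward the left arm, while `p_switch` keeps the belief from ever becoming fully certain, reflecting the task's volatility.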


Choice suppression is achieved through opponent but not independent function of the striatal indirect pathway in mice

The dorsomedial striatum (DMS) plays a key role in action selection, but little is known about how direct and indirect pathway spiny projection neurons (dSPNs and iSPNs) contribute to choice suppression in freely moving animals. Here, we used pathway-specific chemogenetic manipulation during a serial choice foraging task to test opposing predictions for iSPN function generated by two theories: 1) the ‘select/suppress’ heuristic which suggests iSPN activity is required to suppress alternate choices and 2) the network-inspired Opponent Actor Learning model (OpAL) which proposes that the weighted difference of dSPN and iSPN activity determines choice. We found that chemogenetic activation, but not inhibition, of iSPNs disrupted learned suppression of nonrewarded choices, consistent with the predictions of the OpAL model. Our findings suggest that iSPNs’ role in stopping and freezing does not extend in a simple fashion to choice suppression. These data may provide insights critical for the successful design of interventions for addiction or other conditions in which suppression of behavior is desirable.

Kristen Delevich, Benjamin Hoshal, Anne GE Collins, Linda Wilbrecht, Choice suppression is achieved through opponent but not independent function of the striatal indirect pathway in mice, bioRxiv, https://www.biorxiv.org/content/10.1101/675850v3
doi: https://doi.org/10.1101/675850
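The OpAL model mentioned in the abstract above proposes that choice is governed by the weighted difference of dSPN-like (Go) and iSPN-like (NoGo) actor weights. A minimal sketch of that choice rule; the weights and parameter values below are illustrative, not those fitted in the paper:

```python
import math

def opal_act(g, n, beta_g=1.0, beta_n=1.0):
    """Net actor value as a weighted difference of a Go weight
    (dSPN-like) and a NoGo weight (iSPN-like), in the spirit of the
    OpAL family of models."""
    return beta_g * g - beta_n * n

def choose_prob(acts, beta=1.0):
    """Softmax over net actor values."""
    exps = [math.exp(beta * a) for a in acts]
    total = sum(exps)
    return [e / total for e in exps]

# Two options with Go weights 1.0 and 0.8, equal NoGo weights:
acts_base = [opal_act(1.0, 0.5), opal_act(0.8, 0.5)]
# Raising the NoGo weight of option 0 (cf. boosting iSPN-like
# activity against it) suppresses that choice:
acts_nogo = [opal_act(1.0, 2.0), opal_act(0.8, 0.5)]
```

The point of the sketch is that choice depends on the difference of the two pathways' weights, not on either weight alone, which is the sense in which the pathways act as opponents rather than independently.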
