The interpretation of computational model parameters depends on the context

Reinforcement Learning (RL) models have revolutionized the cognitive and brain sciences, promising to explain behavior from simple conditioning to complex problem solving, to shed light on developmental and individual differences, and to anchor cognitive processes in specific brain mechanisms. However, the RL literature increasingly reveals contradictory results, which might cast doubt on these claims. We hypothesized that many contradictions arise from two commonly-held assumptions about computational model parameters that are actually often invalid: That parameters generalize between contexts (e.g. tasks, models) and that they capture interpretable (i.e. unique, distinctive) neurocognitive processes. To test this, we asked 291 participants aged 8–30 years to complete three learning tasks in one experimental session, and fitted RL models to each. We found that some parameters (exploration / decision noise) showed significant generalization: they followed similar developmental trajectories, and were reciprocally predictive between tasks. Still, generalization was significantly below the methodological ceiling. Furthermore, other parameters (learning rates, forgetting) did not show evidence of generalization, and sometimes even opposite developmental trajectories. Interpretability was low for all parameters. We conclude that the systematic study of context factors (e.g. reward stochasticity; task volatility) will be necessary to enhance the generalizability and interpretability of computational cognitive models.
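
For context, a minimal sketch of where "exploration / decision noise" parameters of the kind discussed here typically live in an RL model: a softmax choice rule with an inverse temperature beta (lower beta = noisier, more exploratory choices) and a lapse rate epsilon. The function, parameter names, and values below are illustrative assumptions, not the paper's fitted models.

    import numpy as np

    def choice_probabilities(q_values, beta=5.0, epsilon=0.05):
        """Softmax over learned values, blended with uniform lapse noise."""
        q = np.asarray(q_values, dtype=float)
        logits = beta * (q - q.max())                  # subtract max for numerical stability
        p_softmax = np.exp(logits) / np.exp(logits).sum()
        return (1 - epsilon) * p_softmax + epsilon / len(q)

    print(choice_probabilities([0.7, 0.3]))            # roughly [0.86, 0.14]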

Maria Katharina Eckstein, Sarah L Master, Liyu Xia, Ronald E Dahl, Linda Wilbrecht, Anne GE Collins, The interpretation of computational model parameters depends on the context, eLife (2022) 11:e75474. https://doi.org/10.7554/eLife.75474

Activation, but not inhibition, of the indirect pathway disrupts choice rejection in a freely moving, multiple-choice foraging task

The dorsomedial striatum (DMS) plays a key role in action selection, but less is known about how direct and indirect pathway spiny projection neurons (dSPNs and iSPNs, respectively) contribute to choice rejection in freely moving animals. Here, we use pathway-specific chemogenetic manipulation during a serial choice foraging task to test the role of dSPNs and iSPNs in learned choice rejection. We find that chemogenetic activation, but not inhibition, of iSPNs disrupts rejection of nonrewarded choices, contrary to predictions of a simple “select/suppress” heuristic. Our findings suggest that iSPNs’ role in stopping and freezing does not extend in a simple fashion to choice rejection in an ethological, freely moving context. These data may provide insights critical for the successful design of interventions for addiction or other conditions in which it is desirable to strengthen choice rejection.

Kristen Delevich, Benjamin Hoshal, Lexi Z. Zhou, Yuting Zhang, Satya Vedula, Wan Chen Lin, Juliana Chase, Anne G.E. Collins, Linda Wilbrecht, Activation, but not inhibition, of the indirect pathway disrupts choice rejection in a freely moving, multiple-choice foraging task, Cell Reports 40(4):111129 (July 26, 2022). https://doi.org/10.1016/j.celrep.2022.111129

Reinforcement learning and Bayesian inference provide complementary models for the unique advantage of adolescents in stochastic reversal

During adolescence, youth venture out, explore the wider world, and are challenged to learn how to navigate novel and uncertain environments. We investigated how performance changes across adolescent development in a stochastic, volatile reversal-learning task that uniquely taxes the balance of persistence and flexibility. In a sample of 291 participants aged 8–30, we found that in the mid-teen years, adolescents outperformed both younger and older participants. We developed two independent cognitive models, based on Reinforcement learning (RL) and Bayesian inference (BI). The RL parameter for learning from negative outcomes and the BI parameters specifying participants’ mental models were closest to optimal in mid-teen adolescents, suggesting a central role in adolescent cognitive processing. By contrast, persistence and noise parameters improved monotonically with age. We distilled the insights of RL and BI using principal component analysis and found that three shared components interacted to form the adolescent performance peak: adult-like behavioral quality, child-like time scales, and developmentally-unique processing of positive feedback. This research highlights adolescence as a neurodevelopmental window that can create performance advantages in volatile and uncertain environments. It also shows how detailed insights can be gleaned by using cognitive models in new ways.
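
To illustrate what the BI "mental model" parameters refer to, here is a minimal sketch of a Bayesian learner for a symmetric two-option stochastic reversal task: p_reward (believed probability that the correct option pays off) and p_switch (believed per-trial reversal probability) play the role of mental-model parameters. All names and values are illustrative, not the paper's model specification.

    def bi_update(belief, action, reward, p_reward=0.75, p_switch=0.05):
        """belief: P(option 0 is currently correct), updated after one trial."""
        # Outcome likelihood under each hidden state; in this symmetric task,
        # choosing option 0 and getting a reward is evidence for state 0, etc.
        if (action == 0) == bool(reward):
            lik0, lik1 = p_reward, 1 - p_reward
        else:
            lik0, lik1 = 1 - p_reward, p_reward
        posterior = lik0 * belief / (lik0 * belief + lik1 * (1 - belief))
        # Propagate belief through a possible reversal before the next trial
        return (1 - p_switch) * posterior + p_switch * (1 - posterior)

    belief = 0.5
    for action, reward in [(0, 1), (0, 1), (0, 0)]:
        belief = bi_update(belief, action, reward)
    print(belief)   # ~0.64: belief in option 0 stays above 0.5 despite one nonreward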

Maria K. Eckstein, Sarah L. Master, Ronald E. Dahl, Linda Wilbrecht, Anne G.E. Collins, Reinforcement learning and Bayesian inference provide complementary models for the unique advantage of adolescents in stochastic reversal, Developmental Cognitive Neuroscience, Volume 55, 2022, 101106, ISSN 1878-9293, https://doi.org/10.1016/j.dcn.2022.101106, https://www.sciencedirect.com/science/article/pii/S1878929322000494

Modeling changes in probabilistic reinforcement learning during adolescence

In the real world, many relationships between events are uncertain and probabilistic. Uncertainty is also likely to be a more common feature of daily experience for youth because they have less experience to draw from than adults. Some studies suggest probabilistic learning may be inefficient in youths compared to adults, while others suggest it may be more efficient in youths in mid-adolescence. Here we used a probabilistic reinforcement learning task to test how youth age 8-17 (N = 187) and adults age 18-30 (N = 110) learn about stable probabilistic contingencies. Performance increased with age through the early twenties, then stabilized. Using hierarchical Bayesian methods to fit computational reinforcement learning models, we show that all participants’ performance was better explained by models in which negative outcomes had minimal to no impact on learning. The performance increase over age was driven by 1) an increase in learning rate (i.e. a decrease in integration time scale), and 2) a decrease in noisy/exploratory choices. In mid-adolescence (age 13-15), salivary testosterone and learning rate were positively related. We discuss our findings in the context of other studies and hypotheses about adolescent brain development.
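
To make the learning-rate result concrete, here is a toy delta-rule update with separate learning rates for positive and negative outcomes; the finding that negative outcomes had minimal to no impact on learning corresponds to alpha_neg near zero. Names and values are illustrative.

    def update_value(q, reward, alpha_pos=0.4, alpha_neg=0.0):
        """One delta-rule update; the learning rate depends on the sign of the error."""
        prediction_error = reward - q
        alpha = alpha_pos if prediction_error > 0 else alpha_neg
        return q + alpha * prediction_error

    q = 0.5
    for r in [1, 0, 1]:   # with alpha_neg = 0, the nonrewarded trial is simply ignored
        q = update_value(q, r)
    print(q)              # 0.5 -> 0.7 -> 0.7 -> 0.82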

Liyu Xia, Sarah L. Master, Maria K. Eckstein, Beth Baribault, Ronald E. Dahl, Linda Wilbrecht, Anne Gabrielle Eva Collins, Modeling changes in probabilistic reinforcement learning during adolescence, PLOS Computational Biology (July 2021). https://doi.org/10.1371/journal.pcbi.1008524

Learning Rates Are Not All the Same: The Interpretation of Computational Model Parameters Depends on the Context

Reinforcement Learning (RL) has revolutionized the cognitive and brain sciences, explaining behavior from simple conditioning to problem solving, across the life span, and anchored in brain function. However, discrepancies in results are increasingly apparent between studies, particularly in the developmental literature. To better understand these discrepancies, we investigated to what extent parameters generalize between tasks and models, and capture specific and uniquely interpretable (neuro)cognitive processes. 291 participants aged 8-30 years completed three learning tasks in a single session, and their behavior was fit with state-of-the-art RL models. RL decision noise/exploration parameters generalized well between tasks, decreasing between ages 8-17. Learning rates for negative feedback did not generalize, and learning rates for positive feedback showed intermediate generalizability, dependent on task similarity. These findings can explain discrepancies in the existing literature. Future research therefore needs to carefully consider task characteristics when relating findings across studies, and to develop strategies for computationally modeling how context impacts behavior.

Maria K Eckstein, Sarah L Master, Liyu Xia, Ronald E Dahl, Linda Wilbrecht, Anne Gabrielle Eva Collins, Learning Rates Are Not All the Same: The Interpretation of Computational Model Parameters Depends on the Context (May 2021), https://www.biorxiv.org/content/10.1101/2021.05.28.446162v1
doi: https://doi.org/10.1101/2021.05.28.446162

What do Reinforcement Learning Models Measure? Interpreting Model Parameters in Cognition and Neuroscience

Reinforcement learning (RL) is a concept that has been invaluable to research fields including machine learning, neuroscience, and cognitive science. However, what RL entails partly differs between fields, leading to difficulties when interpreting and translating findings.

This paper lays out these differences and zooms in on cognitive (neuro)science, revealing that we often overinterpret RL modeling results, with severe consequences for future research. Specifically, researchers often assume—implicitly—that model parameters generalize between tasks, models, and participant populations, despite overwhelming negative empirical evidence for this assumption. We also often assume that parameters measure specific, unique, and meaningful (neuro)cognitive processes, a concept we call interpretability, for which empirical evidence is also lacking.

We conclude that future computational research needs to pay increased attention to these implicit assumptions when using RL models, and suggest an alternative framework that resolves these issues and allows us to unleash the potential of RL in cognitive (neuro)science.

Maria Eckstein, Linda Wilbrecht, Anne Collins, What do Reinforcement Learning Models Measure? Interpreting Model Parameters in Cognition and Neuroscience (May 2021), https://psyarxiv.com/e7kwx/

The Unique Advantage of Adolescents in Probabilistic Reversal: Reinforcement Learning and Bayesian Inference Provide Adequate and Complementary Models

During adolescence, youth venture out, explore the wider world, and are challenged to learn how to navigate novel and uncertain environments. We investigated whether adolescents are uniquely adapted to this transition, compared to younger children and adults. In a stochastic, volatile reversal learning task with a sample of 291 participants aged 8-30, we found that adolescents 13-15 years old outperformed both younger and older participants. We developed two independent cognitive models, one based on Reinforcement learning (RL) and the other on Bayesian inference (BI), and used hierarchical Bayesian model fitting to assess developmental changes in underlying cognitive mechanisms. Choice parameters in both models improved monotonically. By contrast, RL update parameters and BI mental-model parameters peaked closest to optimal values in 13-to-15-year-olds. Combining both models using principal component analysis yielded new insights, revealing that three readily interpretable components contributed to the early- to mid-adolescent performance peak. This research highlights early- to mid-adolescence as a neurodevelopmental window that may be more optimal for behavioral adjustment in volatile and uncertain environments. It also shows how increasingly detailed insights can be gleaned by invoking different cognitive models.
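
A minimal sketch of the principal component analysis step described above, using random stand-in data; in the paper, the columns would be each participant's fitted RL and BI parameters.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    params = rng.normal(size=(291, 8))           # 291 participants x (RL + BI) parameters

    z = StandardScaler().fit_transform(params)   # z-score each parameter column
    pca = PCA(n_components=3)
    scores = pca.fit_transform(z)                # per-participant scores on shared components
    print(pca.explained_variance_ratio_)         # variance captured by each component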

Maria K. Eckstein, Sarah L. Master, Ronald E. Dahl, Linda Wilbrecht, Anne G.E. Collins, The Unique Advantage of Adolescents in Probabilistic Reversal: Reinforcement Learning and Bayesian Inference Provide Adequate and Complementary Models (Mar. 2021) https://www.biorxiv.org/content/10.1101/2020.07.04.187971v2.full
doi: https://doi.org/10.1101/2020.07.04.187971

Choice suppression is achieved through opponent but not independent function of the striatal indirect pathway in mice

The dorsomedial striatum (DMS) plays a key role in action selection, but little is known about how direct and indirect pathway spiny projection neurons (dSPNs and iSPNs) contribute to choice suppression in freely moving animals. Here, we used pathway-specific chemogenetic manipulation during a serial choice foraging task to test opposing predictions for iSPN function generated by two theories: 1) the ‘select/suppress’ heuristic which suggests iSPN activity is required to suppress alternate choices and 2) the network-inspired Opponent Actor Learning model (OpAL) which proposes that the weighted difference of dSPN and iSPN activity determines choice. We found that chemogenetic activation, but not inhibition, of iSPNs disrupted learned suppression of nonrewarded choices, consistent with the predictions of the OpAL model. Our findings suggest that iSPNs’ role in stopping and freezing does not extend in a simple fashion to choice suppression. These data may provide insights critical for the successful design of interventions for addiction or other conditions in which suppression of behavior is desirable.
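
For readers unfamiliar with OpAL (Collins & Frank, 2014), here is a simplified sketch of its core idea: Go (dSPN-like) and NoGo (iSPN-like) actor weights are updated in opposite directions by the critic's prediction error, and choice is driven by their weighted difference rather than by either pathway alone. Learning rates and gains are illustrative, not the paper's fitted values.

    def opal_update(V, G, N, reward, alpha_c=0.1, alpha_g=0.1, alpha_n=0.1):
        """One critic/actor update for a single action (simplified OpAL)."""
        delta = reward - V            # critic prediction error
        V += alpha_c * delta
        G += alpha_g * G * delta      # Go weight grows with positive errors
        N += alpha_n * N * (-delta)   # NoGo weight grows with negative errors
        return V, G, N

    def action_strength(G, N, beta_g=1.0, beta_n=1.0):
        return beta_g * G - beta_n * N   # weighted difference, not either pathway alone

    V, G, N = 0.5, 1.0, 1.0
    V, G, N = opal_update(V, G, N, reward=0)   # a nonrewarded choice strengthens N
    print(action_strength(G, N))               # negative: the action is now disfavored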

Kristen Delevich, Benjamin Hoshal, Anne GE Collins, Linda Wilbrecht, Choice suppression is achieved through opponent but not independent function of the striatal indirect pathway in mice, bioRxiv, https://www.biorxiv.org/content/10.1101/675850v3
doi: https://doi.org/10.1101/675850

Disentangling the systems contributing to changes in learning during adolescence

Multiple neurocognitive systems contribute simultaneously to learning. For example, dopamine and basal ganglia (BG) systems are thought to support reinforcement learning (RL) by incrementally updating the value of choices, while the prefrontal cortex (PFC) contributes different computations, such as actively maintaining precise information in working memory (WM). It is commonly thought that WM and PFC show more protracted development than RL and BG systems, yet their contributions are rarely assessed in tandem. Here, we used a simple learning task to test how RL and WM contribute to changes in learning across adolescence. We tested 187 subjects ages 8 to 17 and 53 adults (25-30). Participants learned stimulus-action associations from feedback; the learning load was varied to be within or exceed WM capacity. Participants age 8-12 learned slower than participants age 13-17, and were more sensitive to load. We used computational modeling to estimate subjects’ use of WM and RL processes. Surprisingly, we found more protracted changes in RL than WM during development. RL learning rate increased with age until age 18 and WM parameters showed more subtle, gender- and puberty-dependent changes early in adolescence. These results can inform education and intervention strategies based on the developmental science of learning.
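
A minimal sketch of the RL + working memory (RLWM) mixture idea referenced here, in the style of Collins & Frank (2012): choice mixes a fast but capacity-limited WM policy with an incremental RL policy, and WM's contribution shrinks when the learning load (number of stimuli) exceeds capacity K. Parameter names (rho, K) follow common RLWM conventions; the values are illustrative.

    def wm_weight(rho, K, set_size):
        """WM reliance, discounted when load exceeds capacity K."""
        return rho * min(1.0, K / set_size)

    def mixed_choice_prob(p_rl, p_wm, w):
        """Weighted mixture of WM and RL policies for the correct action."""
        return w * p_wm + (1 - w) * p_rl

    w_low = wm_weight(rho=0.9, K=3, set_size=3)    # load within capacity
    w_high = wm_weight(rho=0.9, K=3, set_size=6)   # load exceeds capacity
    print(mixed_choice_prob(0.4, 0.95, w_low))     # WM dominates: ~0.90
    print(mixed_choice_prob(0.4, 0.95, w_high))    # more RL-driven: ~0.65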

Sarah L. Master, Maria K. Eckstein, Neta Gotlieb, Ronald Dahl, Linda Wilbrecht, Anne G.E. Collins, Disentangling the systems contributing to changes in learning during adolescence, Developmental Cognitive Neuroscience (2019), https://doi.org/10.1016/j.dcn.2019.100732, http://www.sciencedirect.com/science/article/pii/S1878929319303196
