Model-Value Self-Consistent Updates and Applications
April 4, 2023, 1:00 pm

Learned models of the environment provide reinforcement learning agents with flexible ways of making predictions about the environment. Models enable planning, i.e., using more computation to improve value functions or policies without requiring additional environment interactions. In this talk, we investigate a way of augmenting model-based RL by additionally encouraging a learned model and value function to be jointly self-consistent. The talk covers ways to use self-consistency updates for both policy evaluation and control (Farquhar et al., 2021), as well as their use as a proxy for epistemic uncertainty in exploration (Filos et al., 2022).
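To make the two uses concrete, here is a minimal sketch in PyTorch. It is not the implementation from either paper: `ValueNet`, `OneStepModel`, and all dimensions are hypothetical, and the model is state-only for simplicity (as if a fixed policy were folded into the dynamics). `self_consistency_loss` penalizes the gap between V(s) and the model-predicted backup r + γV(s'), and `inconsistency_signal` sketches, in the spirit of Filos et al. (2022), the disagreement among value estimates implied by model rollouts of different depths.

```python
# Minimal, illustrative sketch only -- not the implementation from the papers.
# ValueNet, OneStepModel, and all dimensions are hypothetical.
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """State-value function V(s)."""
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, s):
        return self.net(s).squeeze(-1)

class OneStepModel(nn.Module):
    """Deterministic one-step model predicting (next state, reward)."""
    def __init__(self, state_dim):
        super().__init__()
        self.next_state = nn.Linear(state_dim, state_dim)
        self.reward = nn.Linear(state_dim, 1)

    def forward(self, s):
        return self.next_state(s), self.reward(s).squeeze(-1)

def self_consistency_loss(V, model, states, gamma=0.99):
    """Squared gap between V(s) and the model-predicted backup r + gamma*V(s')."""
    s_next, r = model(states)
    backup = r + gamma * V(s_next)
    # Gradients flow into both networks, nudging model and value toward agreement.
    return ((V(states) - backup) ** 2).mean()

def inconsistency_signal(V, model, states, k=3, gamma=0.99):
    """Disagreement (std) among the value estimates implied by 0..k-step model
    rollouts, in the spirit of the uncertainty proxy of Filos et al. (2022)."""
    estimates, discount, ret, s = [V(states)], 1.0, 0.0, states
    for _ in range(k):
        s, r = model(s)
        ret = ret + discount * r
        discount *= gamma
        # n-step estimate: sum of discounted rewards plus bootstrapped tail value.
        estimates.append(ret + discount * V(s))
    return torch.stack(estimates).std(dim=0)

# Toy usage on a batch of random 8-dimensional states.
states = torch.randn(32, 8)
V, model = ValueNet(8), OneStepModel(8)
self_consistency_loss(V, model, states).backward()
uncertainty = inconsistency_signal(V, model, states)  # shape (32,)
```

A stop-gradient (`.detach()`) on either side of the consistency term would turn it into a one-directional regularizer, pulling only the model toward the value function or only the value function toward the model; which side to constrain is one of the design choices in this space.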