Learned models of the environment give reinforcement learning agents flexible ways of making predictions about the environment. Models enable planning, i.e., using more computation to improve value functions or policies without requiring additional environment interactions. In this talk, we investigate a way of augmenting model-based RL by additionally encouraging a learned model and value function to be jointly self-consistent. The talk covers possible ways to use self-consistency updates both for policy evaluation and control (Farquhar et al. 20) and as a proxy for epistemic uncertainty in exploration (Filos et al. 22).
Model-Value Self-Consistent Updates and Applications
April 4, 2023
1:00 pm
Zita Marinho
Zita Marinho is a Research Scientist at DeepMind, where she is currently working on reinforcement learning. She holds a dual PhD/MSc in Robotics from the Robotics Institute and from IST University of Lisbon, as part of the CMU/Portugal program. She received her MSc degree in Physics Engineering from Instituto Superior Técnico, Universidade de Lisboa in 2010. Her research interests lie at the intersection of machine learning algorithms and Natural Language Processing. She is particularly interested in studying how agents can interact and learn more effectively from those interactions. During her PhD, she studied spectral algorithms for sequence prediction and planning. She was jointly advised by Prof. André Martins at Unbabel/IST, Prof. Geoffrey Gordon at the Machine Learning Department/CMU, and Prof. Siddhartha Srinivasa from the University of Washington.

