In this talk I will introduce Recurrent Predictive State Policy (RPSP) networks, which combine moment-based predictive models with deep reinforcement learning architectures. A predictive state serves as an equivalent representation of a belief state, so the policy component of an RPSP-network can be purely reactive, which simplifies training while still allowing optimal behaviour. We demonstrate the efficacy of RPSP-networks under partial observability on a set of robotic control tasks from OpenAI Gym, and show empirically that they perform well compared with memory-preserving networks such as GRUs, as well as with finite-memory models. This work was done in collaboration with Ahmed Hefny at CMU.
Kernel and Moment Based Prediction and Planning
March 6, 2018
1:00 pm