
Abstract

The intersection of expressive, general-purpose function approximators, such as neural networks, with general-purpose model-free reinforcement learning (RL) algorithms holds the promise of automating a wide range of robotic behaviors: reinforcement learning provides the formalism for reasoning about sequential decision making, while large neural networks can process high-dimensional and noisy observations to provide a general representation for any behavior with minimal manual engineering. However, applying model-free RL algorithms with multilayer neural networks (i.e., deep RL) to real-world robotic control problems has proven to be very difficult in practice: the sample complexity of model-free methods tends to be quite high, and training tends to yield high-variance results. In this talk, I will discuss how the maximum entropy principle can lead to a family of new algorithms that are better suited to real-world robotic applications. These algorithms train stochastic policies by combining exploration and exploitation into a single objective; they are more sample efficient, and they work consistently across different initial conditions, tasks, and domains. In the last part of the talk, I will discuss which components are still missing to fully enable deep RL for robotic applications and propose future research directions to fill these gaps.
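For readers unfamiliar with the framework, the maximum entropy objective referenced above augments the expected return with a policy entropy term. A standard formulation (notation assumed here, not quoted from the talk) is

J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],

where \rho_\pi is the state-action distribution induced by the policy \pi, \mathcal{H} denotes entropy, and \alpha is a temperature parameter trading off reward against entropy. Maximizing this single objective rewards high-entropy (exploratory) behavior while still exploiting reward, which is the sense in which exploration and exploitation are combined into one objective.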

Bio

Tuomas Haarnoja works at the intersection of reinforcement learning and robotics. The primary focus of his research is on reinforcement learning problems inspired by real-world robot applications that require good sample efficiency, reliability, safe exploration, and minimal supervision. Tuomas holds a PhD from the University of California, Berkeley, where he was advised by Pieter Abbeel and Sergey Levine. During his doctoral studies, he also spent time as a Student Researcher on the Google Brain team at Google. Tuomas is best known for his work on maximum entropy reinforcement learning, which provides a theoretically grounded framework for learning stochastic policies that are both sample efficient and reliable, and for its applications to robotic manipulation and locomotion.