Bayesian reinforcement learning
My PhD research focuses on Bayesian deep reinforcement learning: the application of probabilistic methods (uncertainty estimates) to improve the performance of deep RL agents.
Reinforcement learning (RL) is the dominant paradigm for learning sequential decision-making from data. In RL we usually start with zero prior knowledge and need to actively explore the environment, collecting our own data to learn to solve the task. We therefore initially have limited data and knowledge, which makes probabilistic methods a natural candidate for coping with the resulting uncertainty.

We have worked on uncertainty methods at two levels: 1) the state-action value function and 2) the transition dynamics functions.

Value/policy uncertainty (paper): RL attempts to learn a policy and/or value function that optimizes some cumulative reward in the environment. Standard RL tracks values as point estimates, with exploration ensured by random perturbations. We have investigated methods that improve exploration by tracking uncertainty about the value function and/or the return distribution. The full algorithm is called the Double Uncertain Value Network.
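To illustrate the general idea of uncertainty-driven exploration (not the Double Uncertain Value Network itself), here is a minimal sketch of Thompson sampling on a hypothetical 3-armed bandit: instead of point estimates, the agent keeps a Gaussian posterior over each action's value and explores by acting greedily on posterior samples. All numbers and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-armed bandit: true mean rewards (unknown to the agent).
true_means = np.array([0.2, 0.5, 0.8])
noise_var = 1.0  # assumed known reward noise

n_actions = 3
# Gaussian posterior over each action's value: start broad (high uncertainty).
post_mean = np.zeros(n_actions)
post_var = np.full(n_actions, 100.0)

for t in range(2000):
    # Thompson sampling: draw one value per action from its posterior and
    # act greedily on the sample -> exploration is driven by uncertainty.
    samples = rng.normal(post_mean, np.sqrt(post_var))
    a = int(np.argmax(samples))
    r = rng.normal(true_means[a], np.sqrt(noise_var))
    # Conjugate Gaussian update shrinks the posterior of the chosen action.
    precision = 1.0 / post_var[a] + 1.0 / noise_var
    post_mean[a] = (post_mean[a] / post_var[a] + r / noise_var) / precision
    post_var[a] = 1.0 / precision

best = int(np.argmax(post_mean))
```

Actions whose posterior is still wide get sampled optimistically now and then, so the agent keeps exploring exactly where its knowledge is weakest; once the posterior concentrates, exploration fades automatically.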

Transition dynamics uncertainty (paper): An important branch of RL focuses on ‘model-based’ methods. These approaches do not only learn a value/policy function, but also approximate the environment's transition dynamics. The potential benefit is increased data efficiency: we can plan ahead in the learned transition model instead of continually sampling new data from the environment. However, the transition model faces two probabilistic challenges as well: it may be uncertain due to limited data, and the environment itself may be truly stochastic. We addressed the second problem with a Conditional Variational Auto-Encoder.
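A tiny numerical sketch of why true stochasticity calls for a generative transition model (the bimodal dynamics below are a made-up toy, not an environment from the paper): a deterministic model trained with mean-squared error collapses to the conditional mean of the next state, which may be a state that never actually occurs, whereas a generative model such as a conditional VAE learns to sample from the full next-state distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stochastic environment: from some state s under action a,
# the next state is +1 or -1 with equal probability (bimodal dynamics).
next_states = rng.choice([-1.0, 1.0], size=10_000)

# A deterministic model trained with MSE converges to the conditional mean,
# here ~0.0 -- a "next state" that the environment never produces.
mse_prediction = next_states.mean()

# A generative model (e.g., a conditional VAE) instead learns to *sample*
# next states; here we emulate its output by resampling the empirical
# distribution, which only ever yields the two valid modes.
samples = rng.choice(next_states, size=1_000)
```

Planning through the averaged prediction would hallucinate impossible trajectories; planning with sampled rollouts respects both modes of the dynamics.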


Previous/other work
Affective computing: The first half year of my PhD focused on ‘Emotion in RL agents’ (which is also an important research line of my supervisor Joost Broekens). We wrote a survey and a research paper.

Planning (& supervised learning): Together with my former research group, I am applying supervised learning methods to Rapidly-exploring Random Trees (RRTs), a successful sampling-based planning algorithm from the robotics community (paper).
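For readers unfamiliar with RRTs, here is a minimal sketch of the plain algorithm in an obstacle-free 2D unit square (all coordinates, step sizes, and the goal bias are illustrative assumptions, unrelated to the paper's setup): repeatedly sample a point, extend the tree a fixed step from its nearest node toward the sample, and stop once the tree reaches the goal region.

```python
import math
import random

random.seed(0)

start, goal = (0.1, 0.1), (0.9, 0.9)
step, goal_tol = 0.05, 0.05  # extension step size and goal tolerance

nodes = [start]
parent = {start: None}

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

for _ in range(5000):
    # 1) Sample a random configuration (with a small goal bias).
    x_rand = goal if random.random() < 0.05 else (random.random(), random.random())
    # 2) Find the nearest node already in the tree.
    x_near = min(nodes, key=lambda n: dist(n, x_rand))
    d = dist(x_near, x_rand)
    if d == 0:
        continue
    # 3) Steer a fixed step from the nearest node toward the sample.
    x_new = (x_near[0] + step * (x_rand[0] - x_near[0]) / d,
             x_near[1] + step * (x_rand[1] - x_near[1]) / d)
    nodes.append(x_new)
    parent[x_new] = x_near
    if dist(x_new, goal) < goal_tol:
        break

# Walk parent pointers back from the last node to recover a path.
path, n = [], nodes[-1]
while n is not None:
    path.append(n)
    n = parent[n]
path.reverse()
```

A real planner would additionally reject samples and extensions that collide with obstacles; the supervised-learning angle is to learn components of this loop (e.g., where to sample) from data rather than hand-designing them.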

Computer Vision: For my master’s thesis I worked on computer vision, in particular activity recognition (paper).

Education: Even longer ago I was still a medical student (paper).