My PhD research focuses on model-based reinforcement learning (RL). I am actively working on probabilistic inference for transition function estimation.
Most RL research has focused on model-free learning, where we directly learn a behavioural policy while sampling from the environment. However, these methods are generally very data-inefficient. More importantly, they do not naturally allow for targeted exploration, nor for transfer.
In model-based reinforcement learning, we first wish to predict the next state of the environment (given the current state and action). One can interpret this as learning to predict how the world will behave (its ‘physics’). There are a variety of potential benefits: 1) it may speed up learning (by incorporating planning-like updates over the model), 2) it may target exploration to less visited regions, and 3) transition dynamics naturally transfer between domains.
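Benefit (1) can be illustrated with a minimal tabular Dyna-Q sketch: real environment steps are interleaved with extra "planning" updates replayed from a learned model of the transitions. The toy chain environment and all constants below are hypothetical, chosen only to make the idea concrete.

```python
import random

# Minimal tabular Dyna-Q sketch on a hypothetical 5-state chain.
# Real Q-learning updates are interleaved with planning updates
# replayed from a learned (here: deterministic) transition model.

N_STATES, ACTIONS = 5, [0, 1]            # action 0: left, action 1: right
GAMMA, ALPHA, N_PLANNING = 0.9, 0.5, 10

def step(s, a):
    """Toy deterministic dynamics: reward 1.0 for reaching the last state."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return (1.0 if s2 == N_STATES - 1 else 0.0), s2

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
model = {}                               # learned model: (s, a) -> (r, s')

random.seed(0)
s = 0
for _ in range(200):
    a = random.choice(ACTIONS)           # purely exploratory behaviour
    r, s2 = step(s, a)
    # (a) Q-learning update from real experience
    Q[s, a] += ALPHA * (r + GAMMA * max(Q[s2, b] for b in ACTIONS) - Q[s, a])
    # (b) store the observed transition in the model
    model[s, a] = (r, s2)
    # (c) planning: extra updates from simulated (model) experience
    for _ in range(N_PLANNING):
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        Q[ps, pa] += ALPHA * (pr + GAMMA * max(Q[ps2, b] for b in ACTIONS) - Q[ps, pa])
    s = 0 if s2 == N_STATES - 1 else s2  # reset after reaching the goal

# The greedy policy should now point right (towards the rewarding state).
print([max(ACTIONS, key=lambda a: Q[s, a]) for s in range(N_STATES - 1)])
```

The planning loop (c) is where the model pays off: each real step is amortised into many value-backup updates, which is one concrete sense in which a model can speed up learning.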
Active challenges and research
However, there are two important (probabilistic) challenges with transition function estimation:
- Stochasticity, which is the genuine randomness in the world's dynamics (especially multi-modality).
- Uncertainty, which is the probability distribution induced over our predictions due to limited data.
Example: The difference may be clear from a simple example. I give you a die, and ask you whether it is fair. Initially, you cannot answer yet, since you have not observed enough data. Then, if you start throwing the die, and all sides come up equally often, you may after some time conclude that it is fair. That is, you have seen enough data to reduce the model uncertainty. However, you can never predict which side will come up on the next throw, as stochasticity is an inherent property of the die (at least given what you can observe).
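The die example can be made concrete in a few lines. The simulation below (a hypothetical illustration, with a uniform Dirichlet prior chosen for convenience) tracks the posterior uncertainty about the die's bias as throws accumulate, alongside the predictive entropy of the next throw:

```python
import math
import random

random.seed(1)
SIDES = 6

def posterior_sd(counts):
    """Posterior std. dev. of P(side 1) under a uniform Dirichlet prior.
    For Dirichlet(alpha): Var[p_i] = a_i (a0 - a_i) / (a0^2 (a0 + 1))."""
    alpha = [c + 1.0 for c in counts]    # Dirichlet(1,...,1) prior + counts
    a0 = sum(alpha)
    var = alpha[0] * (a0 - alpha[0]) / (a0 ** 2 * (a0 + 1))
    return math.sqrt(var)

counts = [0] * SIDES
sds = []
for n in range(1, 10001):
    counts[random.randrange(SIDES)] += 1          # throw a fair die
    if n in (10, 100, 10000):
        sds.append(posterior_sd(counts))          # uncertainty after n throws

# Model *uncertainty* about the die's bias shrinks with data,
# but the *stochasticity* of the next throw does not: the predictive
# entropy of a (near-)fair die stays close to log(6) nats.
p = [c / sum(counts) for c in counts]
entropy = -sum(pi * math.log(pi) for pi in p if pi > 0)
print("posterior sd after 10/100/10000 throws:", [round(s, 4) for s in sds])
print("predictive entropy:", round(entropy, 3), "vs log(6) =", round(math.log(6), 3))
```

The posterior standard deviation falls steadily with more throws, while the predictive entropy stays pinned near its maximum: reducible uncertainty versus irreducible stochasticity.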
Uncertainty is fundamental to statistics and machine learning in general. Outcome stochasticity appears less often in standard supervised learning tasks, where the (conditional) outcome distribution is usually assumed to be unimodal (Gaussian). However, transition dynamics in real-world scenarios are frequently multi-modal.
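A tiny hypothetical simulation shows why the unimodal assumption breaks down: for a bimodal transition, a Gaussian model predicts the conditional mean, which may be a state the system never actually visits.

```python
import random
import statistics

random.seed(2)

# Hypothetical one-step dynamics: from the current state, an object falls
# left (-1) or right (+1) of an obstacle with equal probability,
# so the next-state distribution is bimodal.
next_states = [random.choice([-1.0, 1.0]) for _ in range(1000)]

# A unimodal (Gaussian) model predicts the conditional mean ...
mean_prediction = statistics.mean(next_states)
# ... but every observed next state lies far from that prediction.
gap = min(abs(s - mean_prediction) for s in next_states)
print("mean prediction:", round(mean_prediction, 3))
print("distance to nearest observed next state:", round(gap, 3))
```

The mean lands between the two modes, roughly a unit away from anything that ever happens; a mixture model (or other multi-modal density) avoids this failure.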
Our work focuses on joint estimation (and separation!) of model stochasticity and model uncertainty. We build on work on deep generative models, variational inference, and Bayesian deep learning.
Keywords: model-based reinforcement learning, (Bayesian) deep learning, variational inference, predictive uncertainty.
– Affective computing: The first half year of my PhD focused on computational models of emotion in reinforcement learning agents. This is also an important research direction of my supervisor Joost Broekens. We wrote a survey on this topic, and a research paper as well.
– Education: Even longer ago, I was a medical student (pdf).