My PhD research focuses on model-based reinforcement learning (RL). I am actively working on probabilistic inference for transition function estimation.
Most RL research has focused on model-free learning, where we directly learn a behavioural policy while sampling from the environment. However, these methods are generally highly data-inefficient. More importantly, they do not naturally allow for targeted exploration or for transfer.
In model-based reinforcement learning, we first wish to predict the next state of the environment (given the current state and action). One can interpret this as learning to predict how the world will behave (its ‘physics’). There are several potential benefits: 1) it may speed up learning (by incorporating planning-like updates over the model), 2) it may target exploration to less-visited regions, and 3) transition dynamics naturally transfer between domains.
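As a toy illustration of what "estimating a transition function" means (a hypothetical tabular setting, much simpler than the deep models my research actually uses), the distribution over next states P(s' | s, a) can be estimated directly from counts of observed transitions:

```python
from collections import Counter, defaultdict

class TabularTransitionModel:
    """Count-based estimate of P(s' | s, a) from observed transitions."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, s, a, s_next):
        """Record one observed transition (s, a) -> s_next."""
        self.counts[(s, a)][s_next] += 1

    def predict(self, s, a):
        """Return the empirical next-state distribution for (s, a)."""
        c = self.counts[(s, a)]
        total = sum(c.values())
        return {s_next: n / total for s_next, n in c.items()}

# Hypothetical usage: action 0 in state 'A' led to 'B' three times, 'C' once.
model = TabularTransitionModel()
for s_next in ['B', 'B', 'B', 'C']:
    model.update('A', 0, s_next)
print(model.predict('A', 0))  # {'B': 0.75, 'C': 0.25}
```

In continuous, high-dimensional state spaces this counting approach breaks down, which is where the deep generative models discussed below come in.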
Active challenges and research
However, there are two important (probabilistic) challenges with transition function estimation:
- Stochasticity, which is the true probabilistic nature exhibited by the world (especially multi-modality).
- Uncertainty, which is the probability distribution induced over our predictions due to limited data.
Example: The difference may be clear from a simple example. I give you a die and ask you whether it is fair. Initially, you can't answer yet, since you haven't observed enough data. If you then start throwing the die, and all sides come up equally often, you may after some time conclude that it is indeed fair. That is, you have seen enough data to reduce the model uncertainty. However, you can never predict which side will come up at the next throw, as stochasticity is an inherent property of the die (at least given what you can observe).
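The die example can be sketched in a short simulation (a hypothetical illustration, not part of our method): with more throws, the empirical frequencies converge to 1/6, so the model uncertainty shrinks; yet the best prediction for the next throw remains the full uniform distribution, because the stochasticity never goes away.

```python
import random

random.seed(0)

def empirical_freqs(n_throws):
    """Throw a fair six-sided die n_throws times; return relative frequencies."""
    counts = [0] * 6
    for _ in range(n_throws):
        counts[random.randrange(6)] += 1
    return [c / n_throws for c in counts]

# Model uncertainty shrinks with data: frequencies approach 1/6...
for n in (60, 6000, 600000):
    max_dev = max(abs(f - 1 / 6) for f in empirical_freqs(n))
    print(f"n={n:>6}: max deviation from 1/6 = {max_dev:.4f}")

# ...but stochasticity remains: even knowing the die is fair,
# the next throw is still a uniform draw over six sides.
```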
Uncertainty is fundamental to statistics and machine learning in general. Outcome stochasticity appears less often in standard supervised learning tasks, where the (conditional) outcome distribution is usually assumed to be unimodal (Gaussian). However, transition dynamics in real-world scenarios are frequently multi-modal.
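A small sketch (with made-up numbers, purely for illustration) shows why the unimodal Gaussian assumption fails on multi-modal dynamics: fitting a single Gaussian to a bimodal outcome places the predicted mode at the mean, a value the system essentially never produces.

```python
import random
import statistics

random.seed(1)

# A bimodal "next state": the system jumps to -1 or +1 with equal
# probability, plus a little observation noise.
samples = [random.choice([-1.0, 1.0]) + random.gauss(0, 0.05)
           for _ in range(1000)]

# A unimodal Gaussian fit puts its mode at the mean, near 0...
mu = statistics.mean(samples)
sigma = statistics.stdev(samples)
print(f"Gaussian fit: mean={mu:.3f}, std={sigma:.3f}")

# ...yet almost no observed outcome lies anywhere near that mean.
near_mean = sum(abs(x - mu) < 0.5 for x in samples) / len(samples)
print(f"fraction of samples within 0.5 of the fitted mean: {near_mean:.3f}")
```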
Our work focuses on the joint estimation (and separation!) of model stochasticity and model uncertainty. We build on work in deep generative models, variational inference and Bayesian deep learning.
Keywords: model-based reinforcement learning, (Bayesian) deep learning, variational inference, predictive uncertainty.
– Affective computing: The first half year of my PhD focused on computational models of emotion (affective computing) in RL (Markov Decision Process-based) agents. This is also an important research direction of my supervisor Joost Broekens. Should you be interested, there is an example paper here, and a survey on this topic is currently under review.
– Computer Vision: For my master’s thesis I worked on novelty detection (which comes very close to the uncertainty estimation discussed above) for action recognition (which has no connection to the above). Short and long versions.
– Education: Even longer ago, I was a medical student: click.