Saturday, February 18, 2012

Bayesian Multi-Task Reinforcement Learning

Authors: Alessandro Lazaric, Mohammad Ghavamzadeh 

Conference: ICML, 2010


Summary:
The paper addresses multi-task reinforcement learning (RL) in settings where the number of samples available for a given task is limited. The work assumes that the tasks share a similar structure, and hence that the corresponding value functions (vfs) are sampled from a common prior. This assumption lets the authors learn the vfs jointly, both when all vfs come from the same task class and when they do not. The paper stands out in its use of hierarchical Bayesian models (HBMs) to model the distribution over vfs in both the parametric and non-parametric settings.
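To make the shared-prior assumption concrete, below is a minimal generative sketch of the symmetric parametric case: each task's vf is linear in features, the task weights share a common Gaussian prior, and the prior's parameters are themselves drawn from a normal-inverse-Wishart hyperprior. This is an illustrative analogue rather than the authors' exact model, and all dimensions and hyperparameters here are made up.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)
d, n_tasks, n_samples = 5, 10, 20  # feature dim, tasks, samples per task (hypothetical)

# Normal-inverse-Wishart hyperprior over the shared prior (mu, Sigma).
mu0, kappa0, nu0, Psi0 = np.zeros(d), 1.0, d + 2, np.eye(d)
Sigma = invwishart.rvs(df=nu0, scale=Psi0, random_state=rng)
mu = rng.multivariate_normal(mu0, Sigma / kappa0)

# Each task's value function V_m(x) = phi(x)^T w_m, with w_m ~ N(mu, Sigma).
W = rng.multivariate_normal(mu, Sigma, size=n_tasks)

# Noisy value observations per task (features drawn at random for illustration).
sigma_n = 0.1
for m in range(n_tasks):
    Phi = rng.normal(size=(n_samples, d))          # feature matrix for task m
    y = Phi @ W[m] + sigma_n * rng.normal(size=n_samples)
    # Joint (multi-task) learning would pool {Phi, y} across all tasks to
    # infer (mu, Sigma) and every w_m together, e.g., via EM or Gibbs sampling.
```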
Strengths:
  1. Generative models and inference algorithms are given for both learning settings considered (symmetric and asymmetric).
  2. Modeling of value function similarity by HBM.
  3. Different modes of learning are supported: symmetric parametric and asymmetric non-parametric.
  4. Many of the key machine-learning tools, such as regression, sampling, Bayesian modeling, Expectation-Maximization, and the Dirichlet process, are brought together here, giving the paper sound theoretical grounding (a Dirichlet-process sketch follows this list).
  5. Information is transferred from the joint distribution of vfs to learn the value function of a new task.
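To make strength 4 concrete on the non-parametric side: in the Dirichlet-process model, the number of task classes is not fixed in advance. The following is a generic Chinese restaurant process simulation, not the authors' inference code, and `alpha` is a made-up concentration parameter; it shows how a DP partitions tasks into an unbounded number of classes.

```python
import numpy as np

def crp_assignments(n_tasks, alpha, rng):
    """Sample a task-to-class partition from a Chinese restaurant process."""
    assignments, counts = [0], [1]            # first task starts class 0
    for i in range(1, n_tasks):
        # Existing class k is chosen with prob counts[k] / (i + alpha);
        # a brand-new class is created with prob alpha / (i + alpha).
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)                  # open a new class
        else:
            counts[k] += 1
        assignments.append(int(k))
    return assignments

rng = np.random.default_rng(0)
print(crp_assignments(n_tasks=12, alpha=1.0, rng=rng))
# The number of distinct classes grows with the data, so it need not be
# specified in advance; this is the property the non-parametric HBM exploits.
```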
Weaknesses:
  1. The authors compare three paradigms (STL, MCMTL, and SCMTL) but do not show, on the benchmark problems, how other related techniques perform, despite citing a considerable number of related works. Moreover, since they significantly adapt ideas from the literature, a comparison of BMTL with already published results would have strengthened the evaluation.
  2. The sampling techniques employed for the asymmetric setting are computationally expensive; a discussion of time complexity would have helped.
  3. The paper is largely an amalgamation of established techniques combined in a new way, and its frequent deferrals to earlier papers for important parameters and results make it hard to read. In that sense, the paper is not self-contained.
  4. No clear experimental setup corroborates the claimed ability to handle an unspecified number of classes.
  5. It is surprising that performance dips in all cases as the number of samples increases. While it is good to see that the methods do well with limited samples and a growing number of tasks, the proposed method should be improved to take advantage of large numbers of samples when they are available.
Next steps/Discussion:
  1. Referring to Figure 5(c), it would be good to discuss why MCMTL fails when the number of tasks is small.
  2. Some kind of transfer learning is clearly happening while learning the value function of a newly observed task. It would be interesting to analyze which transfer-learning paradigm this work falls under.
  3. It would be useful to know the types of features usually used to represent vfs in RL, especially for benchmark problems like the inverted pendulum (a common choice is sketched after this list).
  4. Since RL is predominantly used in robotics, it would be good to see a real-world example where the vfs come from the same prior.
  5. How is a simple Gaussian process different from GPTD? (A sketch of the GPTD construction also follows below.)
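On point 3: a feature set commonly used for the inverted pendulum in the value-function approximation literature (e.g., LSPI-style experiments) is a constant term plus a grid of Gaussian radial basis functions over the (angle, angular velocity) state space. A minimal sketch, with illustrative grid positions and bandwidth:

```python
import numpy as np
from itertools import product

# 3x3 grid of RBF centers over (angle, angular velocity); values are illustrative.
centers = np.array(list(product([-np.pi / 4, 0.0, np.pi / 4], [-1.0, 0.0, 1.0])))
bandwidth = 1.0  # hypothetical RBF width

def pendulum_features(state):
    """Map a state (theta, theta_dot) to [1, rbf_1, ..., rbf_9]."""
    sq_dists = np.sum((centers - np.asarray(state)) ** 2, axis=1)
    return np.concatenate(([1.0], np.exp(-sq_dists / (2 * bandwidth ** 2))))

phi = pendulum_features((0.1, -0.5))  # 10-dimensional feature vector
```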
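On point 5: the essential difference is that plain GP regression observes value targets directly, while GPTD (Engel et al.) observes only rewards and ties them to values through the Bellman equation, r_t = V(x_t) - gamma * V(x_{t+1}) + noise. A minimal sketch of the resulting posterior mean, using an arbitrary RBF kernel and toy data (this is the deterministic-transition, white-noise version of GPTD):

```python
import numpy as np

def rbf_kernel(X, Y, ell=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

def gptd_posterior_mean(X, rewards, gamma=0.95, sigma=0.1):
    """GPTD: rewards r = H v + noise, where H encodes the Bellman equation."""
    T = len(rewards)                      # X holds T+1 states along a trajectory
    H = np.zeros((T, T + 1))
    H[np.arange(T), np.arange(T)] = 1.0
    H[np.arange(T), np.arange(T) + 1] = -gamma
    K = rbf_kernel(X, X)
    # Posterior mean of V at the visited states:
    #   E[v | r] = K H^T (H K H^T + sigma^2 I)^{-1} r
    A = H @ K @ H.T + sigma ** 2 * np.eye(T)
    return K @ H.T @ np.linalg.solve(A, rewards)

X = np.linspace(0, 1, 6).reshape(-1, 1)   # 6 states from a toy trajectory
r = np.ones(5)                            # 5 rewards between consecutive states
print(gptd_posterior_mean(X, r))
# A plain GP would instead regress V directly on noisy value observations,
# i.e., H would be the identity matrix.
```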
