Authors: Alessandro Lazaric, Mohammad Ghavamzadeh
Conference: ICML, 2010
Summary:
The paper addresses multi-task reinforcement learning (RL) in a
setting where only a limited number of samples is available for a
given policy in each task. The work assumes that the tasks share a
similar structure, so the corresponding value functions (vfs) are
sampled from a common prior. Under this assumption, the authors learn
the vfs jointly, both when all tasks belong to the same class and when
they do not. The paper stands out from related work in its use of a
hierarchical Bayesian approach to model the distribution over vfs in
both parametric and non-parametric settings.
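To make the "common prior" idea concrete, here is a minimal sketch of joint learning in the single-class case. It is not the paper's algorithm: it assumes linear value functions v(x) = phi(x)^T w, noisy regression targets (e.g. Monte Carlo returns), a known noise variance, and a plain EM loop for the Gaussian prior N(mu, Sigma). All names (`task_posterior`, `fit_shared_prior`, `noise_var`) are hypothetical.

```python
import numpy as np

def task_posterior(Phi, y, mu, Sigma, noise_var):
    """Gaussian posterior over one task's weights under the shared prior N(mu, Sigma)."""
    prior_prec = np.linalg.inv(Sigma)
    post_cov = np.linalg.inv(prior_prec + Phi.T @ Phi / noise_var)
    post_mean = post_cov @ (prior_prec @ mu + Phi.T @ y / noise_var)
    return post_mean, post_cov

def fit_shared_prior(tasks, dim, noise_var=0.1, n_iters=50):
    """EM-style estimate of the shared prior (mu, Sigma) from all tasks jointly."""
    mu, Sigma = np.zeros(dim), np.eye(dim)
    for _ in range(n_iters):
        # E-step: posterior over each task's weights given the current prior.
        posts = [task_posterior(Phi, y, mu, Sigma, noise_var) for Phi, y in tasks]
        means = np.array([m for m, _ in posts])
        # M-step: refit the prior to the per-task posteriors.
        mu = means.mean(axis=0)
        diffs = means - mu
        Sigma = (sum(c for _, c in posts) + diffs.T @ diffs) / len(tasks)
        Sigma += 1e-8 * np.eye(dim)  # small jitter keeps Sigma invertible
    return mu, Sigma

# Synthetic data (an assumption for illustration): 20 tasks, only 3 samples each.
rng = np.random.default_rng(0)
true_mu = np.array([2.0, -1.0])
tasks = []
for _ in range(20):
    w = true_mu + 0.2 * rng.standard_normal(2)  # task weights drawn from the common prior
    Phi = rng.standard_normal((3, 2))           # feature matrix for this task's samples
    y = Phi @ w + np.sqrt(0.1) * rng.standard_normal(3)
    tasks.append((Phi, y))

mu_hat, Sigma_hat = fit_shared_prior(tasks, dim=2)

# Transfer to a new task with a single sample: the learned prior acts as the starting point.
Phi_new = rng.standard_normal((1, 2))
w_new = true_mu + 0.2 * rng.standard_normal(2)
y_new = Phi_new @ w_new + np.sqrt(0.1) * rng.standard_normal(1)
w_post, _ = task_posterior(Phi_new, y_new, mu_hat, Sigma_hat, 0.1)
```

The point of the sketch is that no single task has enough data to pin down its own weights, but pooling the tasks lets the prior be estimated, and that prior then regularizes each task, including a new one.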
Strengths:
- Generative models and inference algorithms are provided for both learning settings considered (symmetric and asymmetric).
- Value function similarity is modeled with a hierarchical Bayesian model (HBM).
- Different modes of learning are covered: symmetric parametric and asymmetric non-parametric learning.
- Almost all the key machine learning areas (regression, sampling, Bayesian modeling, Expectation Maximization, Dirichlet processes, etc.) are touched upon, making it a paper with sound theoretical arguments.
- Information is transferred from the joint distribution of vfs to learn the value function for a new task.
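As a reading aid for the non-parametric setting, the sketch below samples task-to-class assignments from a Chinese restaurant process, the standard sequential view of the partition induced by a Dirichlet process. It assumes the Dirichlet process is what lets the number of classes stay undefined. It is an illustration only, not the paper's sampler, and `crp_assignments` and `alpha` are hypothetical names.

```python
import numpy as np

def crp_assignments(n_tasks, alpha, rng):
    """Sample class labels for n_tasks from a Chinese restaurant process."""
    labels, counts = [], []
    for _ in range(n_tasks):
        # An existing class is chosen in proportion to its size;
        # a brand-new class is chosen in proportion to alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = int(rng.choice(len(weights), p=weights / weights.sum()))
        if k == len(counts):
            counts.append(1)  # open a new class
        else:
            counts[k] += 1    # join an existing class
        labels.append(k)
    return labels, counts

rng = np.random.default_rng(0)
labels, counts = crp_assignments(n_tasks=30, alpha=1.0, rng=rng)
```

The number of classes is not fixed in advance: it grows with the number of tasks at a rate controlled by `alpha`, which is why such a prior suits a setting where the number of task classes is unknown.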
Weaknesses:
- The authors compare three paradigms (STL, MCMTL, and SCMTL) but do not show how other related techniques perform on the benchmark problems, even though they cite a considerable number of related works. Since they adapt many ideas from the literature, they could also have compared BMTL against already published results.
- The sampling techniques used in the asymmetric setting are computationally expensive. A discussion of time complexity would have helped.
- The paper reads as an amalgamation of established techniques arranged in a new combination. It therefore refers frequently to earlier papers for important parameters and results, which makes it hard to read. In that sense, the paper is not self-contained.
- There is no clear experimental setup that corroborates the ability to handle an undefined number of classes.
- It is surprising that performance dips in all cases when the number of samples increases. It is good that the methods do well with limited samples and an increasing number of tasks, but the proposed method should be improved to exploit a large number of samples when they are available.
Next steps/Discussion:
- Referring to Figure 5c, it would be good to have a discussion of why MCMTL fails when the number of tasks is small.
- It is clear that some kind of transfer learning happens while learning the value function of a newly observed task. It would be interesting to analyze which paradigm of transfer learning this paper falls under.
- It would be useful to know which types of features are usually considered for representing vfs in RL, especially for benchmark problems like the inverted pendulum.
- Since RL is predominantly used in robotics, it would be good to know a real-world example where the vfs come from the same prior.
- How does a simple Gaussian process differ from GPTD?