MSc Dissertation Proposal 2017/2018
Prediction of Video Frames to Observe After a Robot Action (id 9281)
Contacts: Dr. Mihai Andries, mandries (at) isr.tecnico.ulisboa.pt, and Prof. José Gaspar
Objectives
Prediction of action effects plays an important role in machine learning research applied to robotics. It allows a robot to reason about the effects of its actions and to plan them accordingly, so as to reach its goals.
Current research in this direction focuses mainly on predicting future video frames from the past video sequence alone, while passively observing the scene [1]. In recent years, however, the field of computer vision has witnessed dramatic improvements in the capacity of algorithms to predict one or several future frames of a video conditioned on actions [2, 3, 4].
This project likewise concerns continuously predicting the future video frames that a robot will observe while it performs a given action on a given object (drawn from a collection of available objects). Experimental validation can be carried out either on publicly available datasets or through additional experiments on our iCub humanoid robotic platform.
The tasks of the Master student would be:
- Review the state of the art in video-frame prediction conditioned on actions.
- Implement and compare state-of-the-art methods, using code in C++, Python, or Matlab.
- Propose ways to improve existing algorithms, e.g. improve the quality of predictions, use more data-efficient methods, or generalise frame prediction by exploiting other modalities (e.g. proprioception).
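To make the core idea of action-conditional prediction concrete, the toy sketch below fits a linear model that predicts the next (flattened) frame from the current frame plus a one-hot action vector, trained by stochastic gradient descent on the squared prediction error. The synthetic "environment" dynamics, the frame and action dimensions, and all names are invented for illustration; the actual project would use deep models such as those in [2, 3, 4].

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all sizes are arbitrary for illustration):
# 8x8 frames flattened to 64-d vectors, 4 discrete actions (one-hot).
FRAME_DIM, ACTION_DIM = 64, 4

# Hypothetical ground-truth dynamics: the next frame is a fixed
# circular shift of the current frame plus an action-dependent pattern.
effects = rng.normal(0.0, 0.5, (ACTION_DIM, FRAME_DIM))

def environment_step(frame, action_id):
    """Return the true next frame for a given frame and action."""
    return np.roll(frame, 1) + effects[action_id]

# Linear action-conditional model: next ~= W_f @ frame + W_a @ action.
W_f = np.zeros((FRAME_DIM, FRAME_DIM))
W_a = np.zeros((FRAME_DIM, ACTION_DIM))
lr = 0.02

def predict(frame, action_onehot):
    """Predict the next frame conditioned on the current frame and action."""
    return W_f @ frame + W_a @ action_onehot

# Train on randomly sampled (frame, action, next-frame) transitions.
for _ in range(5000):
    frame = rng.random(FRAME_DIM)
    a = rng.integers(ACTION_DIM)
    onehot = np.eye(ACTION_DIM)[a]
    err = predict(frame, onehot) - environment_step(frame, a)
    # Gradient of 0.5 * ||err||^2 with respect to each weight matrix.
    W_f -= lr * np.outer(err, frame)
    W_a -= lr * np.outer(err, onehot)

# Evaluate on a held-out frame/action pair.
test_frame = rng.random(FRAME_DIM)
test_action = 2
pred = predict(test_frame, np.eye(ACTION_DIM)[test_action])
mse = np.mean((pred - environment_step(test_frame, test_action)) ** 2)
print(f"held-out MSE: {mse:.5f}")
```

The model recovers the shift and the per-action effect because the toy dynamics are themselves linear; with real camera images the mapping is highly nonlinear, which is why the cited works learn it with convolutional and recurrent networks instead.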
References
[1] Michael Mathieu, Camille Couprie, and Yann LeCun. “Deep multi-scale video prediction beyond mean square error”. In: arXiv preprint arXiv:1511.05440 (2015).
[2] Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard L. Lewis, and Satinder Singh. “Action-Conditional Video Prediction using Deep Networks in Atari Games”. In: Advances in Neural Information Processing Systems 28. Ed. by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett. Curran Associates, Inc., 2015, pp. 2863-2871. url: http://papers.nips.cc/paper/5859-actionconditional-video-prediction-using-deep-networks-in-atari-games.pdf.
[3] Chelsea Finn, Ian Goodfellow, and Sergey Levine. “Unsupervised Learning for Physical Interaction through Video Prediction”. In: Advances in Neural Information Processing Systems 29. Ed. by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett. Curran Associates, Inc., 2016, pp. 64-72. url: http://papers.nips.cc/paper/6161-unsupervised-learning-for-physicalinteraction-through-video-prediction.pdf.
[4] Chelsea Finn and Sergey Levine. “Deep visual foresight for planning robot motion”. In: Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 2786-2793.
Observations
The internship will take place in the Computer Vision lab (VisLab) of the Institute for Systems and Robotics (ISR-Lisboa) of the Instituto Superior Técnico. The internship will be supervised by Dr. Mihai Andries, the postdoctoral researcher working on this project.
To apply or find out more about this project, please contact Dr. Mihai Andries (mandries@isr.tecnico.ulisboa.pt) or Atabak Dehban (adehban@isr.tecnico.ulisboa.pt).
More information:
http://users.isr.tecnico.ulisboa.pt/~jag/msc/msc_2017_2018.html