MSc Dissertation Proposal 2017/2018
Prediction of Video Frames to Observe After a Robot Action (id 9281)
Contacts: Dr. Mihai Andries, mandries (at) isr.tecnico.ulisboa.pt, and Prof. José Gaspar
Objectives
Prediction of action effects plays an important role in machine learning research applied to robotics. It allows a robot to reason about the effects of its actions and to plan them accordingly, so as to reach its goals.
Current research in this direction focuses mainly on predicting future video frames from the past video sequence alone, while passively observing the scene [1]. In recent years, however, the field of computer vision has witnessed dramatic improvements in the capacity of algorithms to predict one or several future frames of a video conditioned on actions [2, 3, 4].
This project likewise concerns continuously predicting the future video frames that a robot will observe while it performs a given action on a given object (drawn from a collection of available objects). Experimental validation can be carried out either on publicly available datasets or through additional experiments on our iCub humanoid robotic platform.
The tasks of the Master student would be:
- Review the state of the art in video-frame prediction conditioned on actions.
- Implement and compare state-of-the-art methods, using code in C++, Python, or Matlab.
- Propose ways to improve existing algorithms, e.g. improve the quality of predictions, use more data-efficient methods, or generalise frame prediction by exploiting other modalities (e.g. proprioception).
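To make the core idea of action-conditional prediction concrete, the toy sketch below fits a linear model that predicts the next (flattened) frame from the current frame plus a one-hot action vector, trained by stochastic gradient descent on the squared prediction error. The synthetic "environment" dynamics, the frame and action dimensions, and all names are invented for illustration; the actual project would use deep models such as those in [2, 3, 4].

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all sizes are arbitrary for illustration):
# 8x8 frames flattened to 64-d vectors, 4 discrete actions (one-hot).
FRAME_DIM, ACTION_DIM = 64, 4

# Hypothetical ground-truth dynamics: the next frame is a fixed
# circular shift of the current frame plus an action-dependent pattern.
effects = rng.normal(0.0, 0.5, (ACTION_DIM, FRAME_DIM))

def environment_step(frame, action_id):
    """Return the true next frame for a given frame and action."""
    return np.roll(frame, 1) + effects[action_id]

# Linear action-conditional model: next ~= W_f @ frame + W_a @ action.
W_f = np.zeros((FRAME_DIM, FRAME_DIM))
W_a = np.zeros((FRAME_DIM, ACTION_DIM))
lr = 0.02

def predict(frame, action_onehot):
    """Predict the next frame conditioned on the current frame and action."""
    return W_f @ frame + W_a @ action_onehot

# Train on randomly sampled (frame, action, next-frame) transitions.
for _ in range(5000):
    frame = rng.random(FRAME_DIM)
    a = rng.integers(ACTION_DIM)
    onehot = np.eye(ACTION_DIM)[a]
    err = predict(frame, onehot) - environment_step(frame, a)
    # Gradient of 0.5 * ||err||^2 with respect to each weight matrix.
    W_f -= lr * np.outer(err, frame)
    W_a -= lr * np.outer(err, onehot)

# Evaluate on a held-out frame/action pair.
test_frame = rng.random(FRAME_DIM)
test_action = 2
pred = predict(test_frame, np.eye(ACTION_DIM)[test_action])
mse = np.mean((pred - environment_step(test_frame, test_action)) ** 2)
print(f"held-out MSE: {mse:.5f}")
```

The model recovers the shift and the per-action effect because the toy dynamics are themselves linear; with real camera images the mapping is highly nonlinear, which is why the cited works learn it with convolutional and recurrent networks instead.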
References
[1] Michael Mathieu, Camille Couprie, and Yann LeCun. “Deep multi-scale video prediction beyond mean square error”. In: arXiv preprint arXiv:1511.05440 (2015).
[2] Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard L. Lewis, and Satinder Singh. “Action-Conditional Video Prediction using Deep Networks in Atari Games”. In: Advances in Neural Information Processing Systems 28. Ed. by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett. Curran Associates, Inc., 2015, pp. 2863-2871. url: http://papers.nips.cc/paper/5859-actionconditional-video-prediction-using-deep-networks-in-atari-games.pdf.
[3] Chelsea Finn, Ian Goodfellow, and Sergey Levine. “Unsupervised Learning for Physical Interaction through Video Prediction”. In: Advances in Neural Information Processing Systems 29. Ed. by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett. Curran Associates, Inc., 2016, pp. 64-72. url: http://papers.nips.cc/paper/6161-unsupervised-learning-for-physicalinteraction-through-video-prediction.pdf.
[4] Chelsea Finn and Sergey Levine. “Deep visual foresight for planning robot motion”. In: Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 2786-2793.
Observations
The internship will take place in the Computer Vision lab (VisLab) of the Institute for Systems and Robotics (ISR-Lisboa) of the Instituto Superior Técnico. The internship will be supervised by Dr. Mihai Andries, the postdoctoral researcher working on this project.
To apply or find out more about this project, please contact Dr. Mihai Andries (mandries@isr.tecnico.ulisboa.pt) or Atabak Dehban (adehban@isr.tecnico.ulisboa.pt).
More information:
http://users.isr.tecnico.ulisboa.pt/~jag/msc/msc_2017_2018.html