PhD position at the Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Motivation
As robots increasingly become part of daily life in our society, there is a pressing need for robots to perceive humans in a more implicit manner in order to make human-robot interaction as natural as possible. For instance, in crowded urban environments, robots must use their own vision (or a network of sensors in the environment) to distinguish humans who require the robot's attention for some cooperative or collaborative purpose from those who simply occupy the same environment for other reasons. Such visual classification, based on gesture recognition, emotion detection, etc., should precede any subsequent direct interaction (e.g., verbal) between the robot and a human, so that the overall human-robot interaction remains natural and uncomplicated for untrained human users.
At the same time, diverse vision-based functionalities in robots are essential to accomplish complex tasks by human-robot or robot-only teams that involve interaction and/or collaboration. Such functionalities range from simpler ones, e.g., detecting, recognizing, and tracking a single object or person with a single static camera, to more complex ones, e.g., tracking a multitude of people in crowded and highly dynamic environments while simultaneously perceiving the emotional responses of the humans with whom the robot is directly interacting. Thanks to networked robot systems (NRS), the presence of multiple mobile sensors (e.g., camera-equipped micro aerial vehicles) or static sensors (e.g., wall- or ceiling-mounted network cameras) provides a strong foundation for tackling such complex functionalities in real-time applications. The focus of this thesis will therefore be on the scalability and real-time applicability of multiple vision-based functionalities in an NRS where human-robot interaction is one of the most essential components.
Keywords
Sensor fusion; cooperative perception; person tracking; detection and tracking from non-inertial frames; face and gesture recognition; stereo-vision systems; motion capture systems; human-robot interaction; multi-robot systems.
Summary of Global Objectives
The expected objectives of this PhD thesis are:
Expected Qualifications and Skills of the Candidate
Selection Procedure
Interested candidates who meet the above-mentioned requirements should send the following documents as soon as possible (all in PDF format) to aahmad@isr.ist.utl.pt
The selected candidate will be expected to enroll in the PhD program at the University of Tübingen at the beginning of September 2014 and will carry out their research work at the Max Planck Institute for Biological Cybernetics, Tübingen. However, prior to PhD enrollment, the candidate will be expected to undertake an additional research internship at the Institute for Systems and Robotics at Instituto Superior Técnico, Lisbon. The internship is foreseen for a period of 3-4 months, starting around May 2014.
Other Information
Homepage of the Max Planck Institute for Biological Cybernetics (MPI-KYB), Tübingen, Germany: http://www.kyb.tuebingen.mpg.de/
Homepage of the Institute for Systems and Robotics, Lisbon, Portugal: http://welcome.isr.ist.utl.pt/home/
Brief Description of Work
In the context of this PhD thesis work, a networked robot system (NRS) will consist of i) a mobile robot with an omni-directional chassis, equipped with vision sensors and simple actuators (arm/gripper); ii) multiple micro aerial vehicles (MAVs); and iii) static sensors fixed within the environment, e.g., network cameras.
Highly Scalable Sensor Fusion
To achieve robust vision-based functionalities through an NRS, one needs to perform optimal sensor fusion. However, as environments scale up in size and feature richness, the amount of visual information that needs to be processed becomes overwhelming, and performing sensor fusion optimally and in real time becomes computationally prohibitive. A good example is a particle filter-based object tracker (an approximately optimal technique): the number of particles it requires to maintain a given accuracy grows exponentially with the dimension of the state space. Nevertheless, there are ways to restrict this growth in computational complexity, e.g., by exploiting dependencies between state variables. In this PhD work, such techniques will be explored to develop highly scalable sensor fusion algorithms.
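To make the scaling issue concrete, below is a minimal sketch (not part of the original announcement) of one predict-update-resample cycle of a bootstrap particle filter tracking a single person's planar position and velocity. The motion model, measurement model, and all noise parameters are illustrative assumptions. Covering a d-dimensional state space at a fixed resolution requires a particle count roughly exponential in d, which is why jointly tracking many people in one filter quickly becomes intractable; exploiting dependencies between state variables, e.g., via Rao-Blackwellization, where some variables are marginalized analytically conditioned on the sampled ones, is one established way to curb this growth.

```python
# Minimal bootstrap particle filter sketch (illustrative assumptions only).
# State per particle: [x, y, vx, vy] under a constant-velocity motion model.
import numpy as np

def particle_filter_step(particles, weights, measurement, dt=0.1,
                         process_noise=0.5, meas_noise=1.0):
    """One predict-update-resample cycle.

    particles:   (N, 4) array of [x, y, vx, vy] hypotheses.
    weights:     (N,) importance weights summing to 1.
    measurement: observed (x, y) position, e.g., from a person detector.
    """
    N = len(particles)
    # Predict: propagate positions by velocity, then add process noise.
    particles[:, 0:2] += particles[:, 2:4] * dt
    particles += np.random.normal(0.0, process_noise, particles.shape)
    # Update: reweight particles by the Gaussian measurement likelihood.
    dists_sq = np.sum((particles[:, 0:2] - measurement) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * dists_sq / meas_noise ** 2)
    weights /= weights.sum()
    # Resample (systematic) to concentrate particles on likely states.
    positions = (np.arange(N) + np.random.rand()) / N
    indices = np.searchsorted(np.cumsum(weights), positions)
    return particles[indices], np.full(N, 1.0 / N)

# Example: 1000 particles initialized around an initial detection at (0, 0).
N = 1000
particles = np.random.normal(0.0, 1.0, (N, 4))
weights = np.full(N, 1.0 / N)
particles, weights = particle_filter_step(particles, weights,
                                          measurement=np.array([0.5, 0.2]))
estimate = np.average(particles[:, 0:2], axis=0, weights=weights)
```

Note that tracking k people jointly would multiply the state dimension by k, so the same accuracy would demand a particle count that grows exponentially in k; this is precisely the bottleneck the thesis targets.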
Implicit-and-Explicit Interaction
Another major focus of this work is to investigate methods for implicit human-robot interaction. Here, implicit interaction refers to embodied communication between humans and robots. A robot's understanding of human body and hand gestures, visual cues, and human emotions, inferred from facial expressions and body posture, are among the forms of embodied communication that would eventually make human-robot interaction more fluid and natural. To this end, state-of-the-art vision-based techniques for body/hand gesture and emotion detection will be investigated. Taking advantage of an NRS will facilitate the detection process; however, innovative algorithms must be developed for fusing the visual information gathered by the various sensors available in the environment.
Explicit interaction between humans and robots, on the other hand, involves activities such as voice-based communication, touch screen-based communication, etc. Humans naturally use both implicit and explicit forms of communication in a general interaction. To this end, visual information will be fused with information obtained through speech (microphones) and touch (touch screens). A hierarchical information fusion architecture will form the backbone of this human-robot interaction method.
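Purely as an illustrative sketch of the kind of hierarchical, decision-level fusion alluded to above (the function name, modality labels, and confidence-weighted voting rule are assumptions for the example, not the architecture to be developed in the thesis): modality-specific modules at the lower level each report an interpretation with a confidence, and a top-level node fuses them.

```python
# Hypothetical decision-level fusion node: vision, speech, and touch modules
# each report an intent label with a confidence score, and the fusion node
# returns the label with the highest accumulated confidence. All names and
# the voting rule below are illustrative assumptions.
from collections import defaultdict

def fuse_modalities(observations):
    """observations: iterable of (modality, label, confidence) tuples."""
    scores = defaultdict(float)
    for _modality, label, confidence in observations:
        scores[label] += confidence
    return max(scores, key=scores.get)

# Example: vision and speech agree, outweighing a weak touch-based cue.
print(fuse_modalities([("vision", "wants_help", 0.7),
                       ("speech", "wants_help", 0.9),
                       ("touch", "browsing", 0.4)]))  # -> wants_help
```

In a full system, each lower-level module would itself be a fusion product (e.g., gesture and expression cues combined into the vision estimate), giving the hierarchy the text describes.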
Case Studies
The algorithms developed during this PhD thesis will be implemented on real robots in the following contexts: i) a domestic service robot (with an omni-directional mobile base) assisting an elderly person at home, where the home environment contains static sensors as well as multiple MAVs with on-board sensors; and ii) a service robot (the same platform as in the first case study) assisting shoppers in a supermarket, where the environment contains several other robots of the same kind, multiple MAVs, and static sensors.