Visual object recognition in digital images generally involves two related tasks: object identification and segmentation. Identification estimates the likelihood that a visual object is present using features extracted from the whole image (top-down classification), while segmentation determines the specific location and pose of the object using more localized image features (bottom-up classification). The usual approach in the literature is a sequential procedure in which identification is followed by segmentation. Recent research on activity in the early visual cortex, however, suggests that the two tasks are tightly coupled in a recurrent feedback/feedforward loop that integrates the top-down and bottom-up classifications.
To formalize this, let T be a random variable representing the top-down classification, B a random variable denoting the bottom-up classification, X the observation (i.e., the image), and H a prior on the image context. The goal is then to determine P(T, B | X, H). In the literature, this joint probability is usually factorized as P(T, B | X, H) = P(T | X, H) P(B | T, X, H), which assumes a direct hierarchical link from T to B. Our goal with this work is to design a hierarchical model with an indirect link between T and B, which means that P(T | B, X, H) = (1/Z_T) f(T, B) f(T, H) and P(B | T, X, H) = (1/Z_B) f(T, B) f(B, X), where Z_T and Z_B are normalizing constants that turn the products into probability distributions, and the f(., .) are potential functions that score how likely the joint values of two random variables are. Inference is then an iterative process that alternates between estimating high-level (object identification) and low-level (object segmentation) beliefs.
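The alternation between the two conditionals P(T | B, X, H) and P(B | T, X, H) can be illustrated with a small sampler. The sketch below is only an illustration under strong assumptions: T and B are taken to be binary, X and H are held fixed so that f(T, H) and f(B, X) reduce to one weight per state, the numeric values in the tables are invented, and Gibbs sampling is used as one possible way to realize the alternating updates (the proposal itself does not commit to a specific sampler).

```python
import random

# Hypothetical potential tables (values invented for illustration only).
# T and B are assumed binary; X and H are fixed observations, so f(T, H)
# and f(B, X) each reduce to one positive weight per state.
F_TB = [[2.0, 0.5],   # f(T=0, B=0), f(T=0, B=1)
        [0.5, 3.0]]   # f(T=1, B=0), f(T=1, B=1)
F_TH = [1.0, 1.5]     # f(T, H) for T = 0, 1
F_BX = [1.0, 2.0]     # f(B, X) for B = 0, 1


def normalize(weights):
    """Divide by the normalizing constant (the role of Z_T or Z_B)."""
    z = sum(weights)
    return [w / z for w in weights]


def cond_t(b):
    """P(T | B=b, X, H) = (1/Z_T) f(T, b) f(T, H)."""
    return normalize([F_TB[t][b] * F_TH[t] for t in range(2)])


def cond_b(t):
    """P(B | T=t, X, H) = (1/Z_B) f(t, B) f(B, X)."""
    return normalize([F_TB[t][b] * F_BX[b] for b in range(2)])


def sample(probs, rng):
    """Draw a state index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def gibbs(n_iter=20000, burn_in=1000, seed=0):
    """Alternate the two conditional updates; return the empirical joint."""
    rng = random.Random(seed)
    t, b = 0, 0
    counts = [[0, 0], [0, 0]]
    for i in range(n_iter):
        t = sample(cond_t(b), rng)   # update high-level belief given B
        b = sample(cond_b(t), rng)   # update low-level belief given T
        if i >= burn_in:
            counts[t][b] += 1
    total = n_iter - burn_in
    return [[c / total for c in row] for row in counts]


def exact_joint():
    """Joint proportional to f(T, B) f(T, H) f(B, X), by enumeration."""
    w = [[F_TB[t][b] * F_TH[t] * F_BX[b] for b in range(2)]
         for t in range(2)]
    z = sum(sum(row) for row in w)
    return [[v / z for v in row] for row in w]


if __name__ == "__main__":
    print("Gibbs estimate:", gibbs())
    print("Exact joint:   ", exact_joint())
```

With only four joint states the exact distribution can be enumerated, which makes it easy to check that the alternating updates converge to a joint proportional to f(T, B) f(T, H) f(B, X). For realistic images the state spaces are far too large for enumeration, which is where sampling-based inference becomes relevant.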
Objectives:
The objective of this work is to introduce a new inference model for visual object identification and segmentation. The model is essentially a Markov random field in which each layer of visual interpretation is linked by undirected edges only to its immediately lower and higher layers, and is therefore conditionally independent of all other layers given these two neighbouring layers.
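One way to write down this layered structure, as a sketch only: the symbols Y_k (the interpretation at layer k) and K (the number of layers) are notation assumed here for illustration, with the boundary conventions Y_0 = X and Y_{K+1} = H chosen so that the two-layer case with Y_1 = B and Y_2 = T matches the conditionals for T and B given earlier.

```latex
% Assumed notation: Y_1, ..., Y_K are the interpretation layers,
% Y_0 = X is the image and Y_{K+1} = H is the context prior.
\begin{align}
P(Y_1, \dots, Y_K \mid X, H)
  &= \frac{1}{Z} \prod_{k=0}^{K} f(Y_k, Y_{k+1}), \\
P(Y_k \mid Y_1, \dots, Y_{k-1}, Y_{k+1}, \dots, Y_K, X, H)
  &= P(Y_k \mid Y_{k-1}, Y_{k+1})
   = \frac{1}{Z_k}\, f(Y_{k-1}, Y_k)\, f(Y_k, Y_{k+1}).
\end{align}
% Two-layer check (K = 2, Y_1 = B, Y_2 = T):
%   P(B | T, X, H) = (1/Z_B) f(X, B) f(B, T)
%   P(T | B, X, H) = (1/Z_T) f(B, T) f(T, H)
```

The second line is the Markov property stated above: every potential that involves Y_k also involves only Y_{k-1} or Y_{k+1}, so all other layers cancel in the normalization.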
Detailed description:
Current state-of-the-art visual classification methods are usually represented by hierarchical models in which each layer of interpretation is linked by directed edges to other layers until the observation layer, represented by the image, is reached [Carneiro07,Torralba03]. Such a model produces a high-level interpretation of the image, followed sequentially by lower-level interpretations conditioned on the results obtained at the higher layers. This can be seen as a pure feedforward model, where lower-level beliefs have no influence on higher-level interpretations. Recurrent feedback/feedforward models have recently been considered by a few groups in computer vision [Tu02,Sing] and machine learning [Hinton06], but there are still open research problems in the procedures for learning the potential functions and in model inference.
The investigation of the problem will comprise the following steps:
1. Literature review in the area of recurrent feedback/feedforward classification models.
2. Study of sequential Monte Carlo inference methods [Doucet00].
3. Implementation of the data-driven Markov chain Monte Carlo image segmentation method of Tu and Zhu [Tu02].
4. Design and implementation of a multi-layer hierarchical recurrent feedback/feedforward classification model.
5. Experiments and comparisons with the model in [Tu02].
References:
[Carneiro07] G. Carneiro, A. Chan, P. Moreno, and N. Vasconcelos. Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3):394-410, 2007.
[Torralba03] A. Torralba. Contextual priming for object detection. International Journal of Computer Vision, 53(2):169-191, 2003.
[Tu02] Z. Tu and S. Zhu. Image segmentation by data-driven Markov chain Monte Carlo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:654-673, 2002.
[Sing] T. S. Lee and D. Mumford. Hierarchical Bayesian inference in the visual cortex. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.2565
[Hinton06] G. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
[Doucet00] A. Doucet, S. Godsill, and C. Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10:197-208, 2000.
Requirements (grades, required courses, etc):
Expected results:
At the end of the work, the students will have enriched their experience in computer vision and machine learning. In particular, the following results are expected:
- Implementation of the data-driven Markov chain Monte Carlo method of Tu and Zhu [Tu02].
- Design and implementation of a new multi-layer hierarchical recurrent feedback/feedforward classification model.
- Thorough experiments and comparisons with current state-of-the-art visual classification models.
Place for conducting the work-proposal: