Current methods for visual object classification and image retrieval rely strongly on local image features [Lowe04]. To be useful for visual classification tasks, local image features have to be both robust to image deformations (such as geometrical transformations and brightness changes) and discriminating. The most successful local image feature extractors have been designed and implemented based purely on intuition, without formal proof that they work in typical visual recognition conditions, although they show quite good recognition results in practice. Two parallel trends address the question of designing an image feature extractor under explicit constraints on robustness to image deformations and on discriminating power. The first trend translates the task into a convex optimization problem, from which either a linear or a non-linear (using the kernel trick) transformation is obtained. The second trend is based on non-convex optimization methods and usually produces non-linear transformations. Compared to the ad-hoc designed local features, the main advantage of these approaches is the possibility of achieving certain explicitly given optimality criteria. In this context, we plan to investigate the following issue: the design of a multi-objective optimization problem that takes into account several aspects of robustness and discriminating constraints, where the output is an image feature extractor that works directly in the image space.
Objectives:
The main goal of this work is the formulation and implementation of a multi-objective optimization problem for extracting local image features to be applied in typical visual classification tasks. The objectives of the optimization include terms for the discriminating power of the feature and for its robustness to geometrical image deformations (e.g., translation, scaling, and rotation) and brightness deformations.
Detailed description:
Designing local image descriptors that are optimal for visual classification tasks involves the optimization of an objective function that takes into account the opposing goals of robustness to image deformations and discriminating power. Robustness to image deformation roughly means that the feature values extracted from an image do not change when deformations are applied to the image. In this context, the image deformations of interest are the following: translation, rotation, scaling, and brightness changes. On the other hand, the discriminating power of a local feature extractor is the degree to which the feature values extracted from a given image are unique to that image. Intuitively, these two goals (robustness and discriminating power) are conflicting. That is, transformations that produce feature values extremely robust to image deformations tend to have quite low discriminating power, and vice versa.
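One possible way to write the two opposing goals down, given purely as an illustrative sketch: the symbols below are assumptions introduced here and are not taken from the cited works. Let f_theta denote a feature extractor with parameters theta, let T be the set of deformations of interest (translation, rotation, scaling, brightness changes), let x and x' be image patches from different scene locations, and let R and D be a robustness cost and a discrimination score.

```latex
\begin{align}
R(\theta) &= \mathbb{E}_{x,\; t \in \mathcal{T}} \left\| f_\theta(t(x)) - f_\theta(x) \right\|^2, \\
D(\theta) &= \mathbb{E}_{x \neq x'} \left\| f_\theta(x) - f_\theta(x') \right\|^2, \\
\min_{\theta} \; & \bigl( R(\theta),\; -D(\theta) \bigr).
\end{align}
```

Under these assumed definitions the conflict is easy to see: a constant map f_theta achieves R = 0 but also D = 0, whereas the identity map on raw pixels keeps patches far apart but changes whenever the patch is deformed.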
The data given to the optimization consists of a large number of corresponding image patches seen from different viewpoints. These corresponding image patches form visual classes, and the optimization produces a feature transform such that the features extracted from images of a single visual class ideally map to the same feature vector, while the features from different visual classes map to different feature vectors.
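The requirement on the feature transform can be measured on labelled patches as in the following minimal sketch. It assumes a linear transform W applied to flattened patches and uses mean squared distances; the function name class_distance_terms and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np


def class_distance_terms(patches, labels, W):
    """Return (within, between): the mean squared feature distance over
    pairs of patches from the same class and from different classes.

    patches: (n, p) array of flattened patches
    labels:  (n,) array of visual class indices
    W:       (d, p) linear feature transform (an assumption of this sketch)
    """
    feats = patches @ W.T                             # (n, d) feature vectors
    diffs = feats[:, None, :] - feats[None, :, :]     # all pairwise differences
    sq_dist = np.sum(diffs ** 2, axis=-1)             # (n, n) squared distances
    same = labels[:, None] == labels[None, :]         # same-class mask
    off_diag = ~np.eye(len(labels), dtype=bool)       # exclude self-pairs
    within = sq_dist[same & off_diag].mean()
    between = sq_dist[~same].mean()
    return float(within), float(between)


# Toy data: the first coordinate encodes the class, the second one varies
# between views of the same class (a stand-in for viewpoint change).
patches = np.array([[0.0, 1.0, 0.0, 0.0],
                    [0.0, -1.0, 0.0, 0.0],
                    [5.0, 2.0, 0.0, 0.0],
                    [5.0, -2.0, 0.0, 0.0]])
labels = np.array([0, 0, 1, 1])

W_stable = np.array([[1.0, 0.0, 0.0, 0.0]])    # keeps the class coordinate
W_unstable = np.array([[0.0, 1.0, 0.0, 0.0]])  # keeps the view coordinate

print(class_distance_terms(patches, labels, W_stable))
print(class_distance_terms(patches, labels, W_unstable))
```

On this toy data the first transform gives a within-class term of 0 and a between-class term of 25, while the second gives 10 and 5, so only the first behaves as the text requires.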
Most of the convex optimization problems present in the literature involve the search for linear transformations [Hoi06, Weinb06, Xing03]. Non-linear transforms are also interesting in this context, and convex optimization problems that obtain them through the kernel trick can be found in the literature [Sugi06]. Moreover, non-linear non-convex optimization has also been given a great deal of attention by the machine learning community [Hinton06]. The first goal of this project is a thorough study of the trade-off between robustness and discriminating power of current feature transforms with similar goals in the literature. The feature transform proposed in [Varma07] will receive particular attention because its goals are similar to ours. The second goal of this project is the introduction of a new feature transform that will be obtained from a multi-objective optimization problem involving several robustness and discriminating constraints.
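Because the objectives conflict, a multi-objective problem generally has no single best solution but a set of non-dominated trade-offs (the Pareto front). The sketch below is only an illustration of that concept, not the method proposed here: it filters hypothetical candidate transforms that have already been scored by two costs to minimize. The function name pareto_front and the example scores are assumptions.

```python
import numpy as np


def pareto_front(costs):
    """Return the indices of non-dominated rows.

    costs: (n, k) array-like; every column is a cost to be minimized.
    Row i is dominated if some other row is no worse in every column
    and strictly better in at least one.
    """
    costs = np.asarray(costs, dtype=float)
    keep = []
    for i, c in enumerate(costs):
        no_worse = np.all(costs <= c, axis=1)
        strictly_better = np.any(costs < c, axis=1)
        if not np.any(no_worse & strictly_better):
            keep.append(i)
    return keep


# Hypothetical candidates scored as (robustness cost, negated discrimination
# score), so that both columns are minimized.
candidates = [[0.0, -25.0],
              [10.0, -5.0],
              [1.0, -30.0],
              [2.0, -20.0]]

print(pareto_front(candidates))
```

Here candidates 0 and 2 survive: neither is better than the other in both objectives, while candidates 1 and 3 are each beaten by candidate 0 on both.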
References:
[Lowe04] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
[Weinb06] K. Weinberger, J. Blitzer, and L. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS, 2006.
[Xing03] E. Xing, A. Ng, M. Jordan, and S. Russell. Distance metric learning with application to clustering with side-information. In NIPS, 2003.
[Hoi06] S. C. H. Hoi, W. Liu, M. R. Lyu, and W.-Y. Ma. Learning distance metrics with contextual constraints for image retrieval. In Proc. Computer Vision and Pattern Recognition, 2006.
[Sugi06] M. Sugiyama. Local Fisher discriminant analysis for supervised dimensionality reduction. In Proc. Int. Conf. on Machine Learning, 2006.
[Hinton06] G. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
[Varma07] M. Varma and D. Ray. Learning the discriminative power-invariance trade-off. In International Conference on Computer Vision, 2007.
Requirements (grades, required courses, etc):
Expected results:
At the end of the work, the students will have enriched their experience in computer vision, machine learning, and optimization. In particular, the following outcomes are expected:
- Thorough experiments with current state-of-the-art optimization methods to find image feature extractors;
- Design and implementation of a new multi-objective optimization algorithm.