Research Interests
|
The design of feature spaces for local image descriptors is an important
research subject in computer vision due to its applicability
in several problems, such as visual classification and image matching.
In order to be useful, these descriptors must present a good trade-off between
discriminating power and robustness to typical image deformations. The feature spaces
of the most useful local descriptors have been manually designed with this goal
in mind, but this manual design often limits
the use of these descriptors for some specific matching and visual classification problems.
Alternatively, there has been a growing interest in producing feature
spaces by an automatic combination of manually designed
feature spaces, or by an automatic selection of feature spaces and
spatial pooling methods, or by the use of distance metric learning methods.
While most of these approaches are applied to specific
matching or classification problems, where the test classes are the
same as the training classes, a few works address the general feature transform
problem, where the training classes differ from
the test classes. The goal of these latter works is the automatic
design of a universal feature space for local descriptor matching,
which is the topic of our work.
In this paper, we propose a new incremental method for
automatically learning feature spaces for local descriptors. The method is based on an ensemble of
non-linear feature extractors trained in relatively
small and random classification
problems with supervised distance metric learning techniques. Results on
two widely used public databases show that our technique produces
competitive results in the field.
(see The Automatic Design of Feature Spaces
for Local Image Descriptors using an Ensemble of Non-linear Feature Extractors,
and A Comparison Study on the Use of an
Ensemble of Feature Extractors for the Automatic Design of Local Image Descriptors).
My CVPR presentation is here (Copyright (C) Gustavo Carneiro. Please acknowledge this paper if you want to use it). Here is the video of the CVPR'10 oral presentation. Please watch this video together with the ppt presentation above, since the presentation available from videolectures is broken in some parts. Matlab code for the toy example: please see the header of public_UFT_toy.m (to change parameters) and run it.
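The ensemble idea above can be illustrated with a minimal Python sketch under loudly stated assumptions: each "relatively small and random classification problem" is taken to be a randomly drawn pair of training classes, and a diagonal-covariance Fisher discriminant followed by a tanh non-linearity stands in for the supervised distance metric learning used in the papers. The function names (`train_extractor`, `train_ensemble`, `transform`) are hypothetical and are not taken from public_UFT_toy.m.

```python
import math
import random


def mean_and_var(samples):
    """Per-dimension mean and (biased) variance of a list of equal-length vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    var = [sum((s[j] - mean[j]) ** 2 for s in samples) / n for j in range(d)]
    return mean, var


def train_extractor(data, class_a, class_b, eps=1e-6):
    """Learn one projection separating two classes (diagonal Fisher criterion).

    Returns (w, b): a direction and a threshold halfway between the projected
    class means. This is a stand-in for the papers' metric learning step.
    """
    ma, va = mean_and_var(data[class_a])
    mb, vb = mean_and_var(data[class_b])
    w = [(mb[j] - ma[j]) / (va[j] + vb[j] + eps) for j in range(len(ma))]
    b = sum(w[j] * (ma[j] + mb[j]) / 2 for j in range(len(w)))
    return w, b


def train_ensemble(data, n_extractors, seed=0):
    """Train one extractor per randomly drawn pair of training classes."""
    rng = random.Random(seed)
    labels = sorted(data)
    return [train_extractor(data, *rng.sample(labels, 2)) for _ in range(n_extractors)]


def transform(ensemble, x):
    """Map a descriptor to the learned space: one tanh response per extractor."""
    return [math.tanh(sum(wj * xj for wj, xj in zip(w, x)) - b) for w, b in ensemble]
```

Descriptors from classes never seen during training can then be compared by Euclidean distance between their `transform` outputs, which corresponds to the setting above where the training classes differ from the test classes.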
|
The problem of automatic tracking and segmentation of the left ventricle (LV) of the heart from ultrasound images can be formulated as computing the expected segmentation at the current time step, given all previous and current observations, under a filtering distribution. This filtering distribution depends on the observation and transition models, and since it is hard to compute the expected value over the whole parameter space of segmentations, one has to resort to Monte Carlo sampling techniques to compute the expected segmentation parameters. Generally, it is straightforward to evaluate probabilities under the filtering distribution but hard to sample from it, so a proposal distribution that is easier to sample from is needed. To be useful, this proposal distribution must be carefully designed to be a reasonable approximation of the filtering distribution. In this paper, we introduce a new LV tracking and segmentation algorithm based on the method described above, where our contributions are focused on new transition and observation models and a new proposal distribution. Our tracking and segmentation algorithm achieves better overall results than the current state-of-the-art LV tracking algorithms on a dataset previously used as a benchmark by those algorithms. (see Multiple Dynamic Models for Tracking the Left Ventricle of the Heart from Ultrasound Data using Particle Filters and Deep Learning Architectures and Robust Left Ventricle Segmentation from Ultrasound Data using Deep Neural Networks and Efficient Search Methods).
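The Monte Carlo step described above can be illustrated with self-normalised importance sampling. The sketch below is a 1-D toy, not the LV tracker: the "filtering distribution" is a Gaussian N(2, 1) that is only evaluated, the proposal is a wider Gaussian N(0, 2^2) that is sampled, and each sample is weighted by the ratio of the two densities. All names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of N(mean, std**2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def expected_value_is(target_density, proposal_sample, proposal_density, n, seed=0):
    """Self-normalised importance sampling estimate of E[x] under the target.

    target_density may be unnormalised, because the weights are normalised
    by their sum before averaging.
    """
    rng = random.Random(seed)
    samples = [proposal_sample(rng) for _ in range(n)]
    weights = [target_density(x) / proposal_density(x) for x in samples]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, samples)) / total


# Toy example: the target ("filtering") density is evaluated but never sampled.
estimate = expected_value_is(
    target_density=lambda x: normal_pdf(x, 2.0, 1.0),
    proposal_sample=lambda rng: rng.gauss(0.0, 2.0),
    proposal_density=lambda x: normal_pdf(x, 0.0, 2.0),
    n=20000,
)
print(round(estimate, 2))  # prints a value close to the true mean 2.0
```

A proposal that covers the target poorly produces a few dominant weights and a noisy estimate, which is why the proposal must be a reasonable approximation of the filtering distribution.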
|
The use of 3-D ultrasound data has several advantages over 2-D ultrasound for fetal biometric measurements, such as a considerable decrease in examination time, the possibility of post-exam data processing by experts, and the ability to produce 2-D views of fetal anatomies in orientations that cannot be obtained in common 2-D ultrasound exams. However, searching for standardized planes and precisely localizing fetal anatomies in ultrasound volumes are hard and time-consuming processes, even for expert physicians and sonographers. The relatively low resolution of ultrasound volumes, the small size of fetal anatomies, and the inter-volume variability in position, orientation, and size make this localization problem even more challenging. In order to make the plane search and fetal anatomy localization problems completely automatic, we introduce a novel principled probabilistic model that combines discriminative and generative classifiers with contextual information and sequential sampling. We implement a system based on this model, where the user queries consist of semantic keywords that represent anatomical structures of interest. Once queried, the system automatically displays standardized planes and produces biometric measurements of the fetal anatomies. Experimental results on a held-out test set show that the automatic measurements are within the inter-user variability of expert users. The system estimates the position, orientation, and size of three different anatomies in less than 10 seconds on a dual-core computer running at 1.7GHz. (see Semantic-based Indexing of Fetal Anatomies From 3-D Ultrasound Data Using Global/Semi-local Context and Sequential Sampling).
|
We present a novel method for the automatic detection and segmentation of (sub-)cortical gray matter structures in 3-D magnetic resonance images of the human brain. Essentially, the method is a top-down segmentation approach based on the recently introduced concept of marginal space learning (MSL). We show that MSL naturally decomposes the parameter space of anatomy shapes along decreasing levels of geometrical abstraction into subspaces of increasing dimensionality by exploiting parameter invariance. At each level of abstraction, i.e., in each subspace, we build strong discriminative models from annotated training data, and use these models to narrow the range of possible solutions until a final shape can be inferred. Contextual information is introduced into the system by representing candidate shape parameters with high-dimensional vectors of 3-D generalized Haar features and steerable features derived from the observed volume intensities. Our system detects and segments 8 (sub-)cortical gray matter structures in T1-weighted 3-D MR brain scans from a variety of different scanners in 13.9 s on average, which is faster than most of the approaches in the literature. To ensure comparability of results and to validate robustness, we evaluate our method on two publicly available gold standard databases consisting of several T1-weighted 3-D brain MR scans from different scanners and sites. The proposed method achieves better accuracy than most state-of-the-art approaches under standardized distance and overlap metrics. (see Fast and Robust 3-D MRI Brain Structure Segmentation).
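The coarse-to-fine search that MSL performs can be sketched as follows. This is a toy under stated assumptions: hand-written scoring functions stand in for the trained discriminative models, and the parameter groups (position, then orientation, then scale) and the number of surviving candidates per stage (`keep`) are illustrative choices, not values from the paper. All names are hypothetical.

```python
import heapq
import itertools


def marginal_space_search(positions, orientations, scales,
                          score_pos, score_pos_ori, score_full, keep=5):
    """Coarse-to-fine search in the spirit of marginal space learning.

    Each stage extends the surviving hypotheses with one more parameter group,
    scores them, and keeps only the best `keep` candidates.
    """
    # Stage 1: position only (the lowest-dimensional marginal space).
    stage1 = heapq.nlargest(keep, positions, key=score_pos)
    # Stage 2: position + orientation, expanded only from the stage-1 survivors.
    stage2 = heapq.nlargest(keep, itertools.product(stage1, orientations),
                            key=lambda h: score_pos_ori(*h))
    # Stage 3: position + orientation + scale.
    stage3 = heapq.nlargest(keep, ((p, o, s) for p, o in stage2 for s in scales),
                            key=lambda h: score_full(*h))
    return stage3[0]


# Hypothetical scores peaking at position (4, 7), orientation 30, scale 1.5.
def score_pos(p):
    return -((p[0] - 4) ** 2 + (p[1] - 7) ** 2)


def score_pos_ori(p, o):
    return score_pos(p) - abs(o - 30) / 15


def score_full(p, o, s):
    return score_pos_ori(p, o) - abs(s - 1.5)


best = marginal_space_search(
    positions=[(x, y) for x in range(10) for y in range(10)],
    orientations=range(0, 180, 15),
    scales=[0.5, 1.0, 1.5, 2.0],
    score_pos=score_pos, score_pos_ori=score_pos_ori, score_full=score_full,
)
print(best)  # ((4, 7), 30, 1.5)
```

In this toy the search scores 100 + 5 x 12 + 5 x 4 = 180 hypotheses instead of the 100 x 12 x 4 = 4800 of an exhaustive search, which is the kind of saving that makes the reported run times possible.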
|
In this paper we present a fully automated approach to the segmentation of pediatric brain tumors in multi-spectral 3-D magnetic resonance images. It is a top-down segmentation approach based on a Markov random field (MRF) model that combines probabilistic boosting trees (PBT) and lower-level segmentation via graph cuts. The PBT algorithm provides a strong discriminative observation model that classifies tumor appearance, while a spatial prior takes into account the pairwise homogeneity in terms of classification labels and multi-spectral voxel intensities. The discriminative model relies not only on observed local intensities but also on surrounding context for detecting candidate regions for pathology. A mathematically sound formulation for integrating the two approaches into a unified statistical framework is given. The proposed method is applied to the challenging task of detection and delineation of pediatric brain tumors. This segmentation task is characterized by a high non-uniformity of both the pathology and the surrounding non-pathologic brain tissue. A quantitative evaluation illustrates the robustness of the proposed method. Despite dealing with more complicated cases of pediatric brain tumors, the results obtained are mostly better than those reported for current state-of-the-art approaches to 3-D MR brain tumor segmentation in adult patients. The entire processing of one multi-spectral data set does not require any user interaction, and takes less time than previously proposed methods. (see A Discriminant Model-Constrained Graph Cuts Approach to Fully Automated Pediatric Brain Tumor Segmentation in 3-D MRI).
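The kind of energy such an MRF minimises can be written as a unary term from the discriminative classifier plus a contrast-sensitive pairwise term. The sketch below is a deliberately reduced version: it labels a 1-D chain of voxels (tumour = 1, background = 0) and, because a chain admits exact minimisation by dynamic programming, it uses that instead of graph cuts on the 3-D grid. The probabilities are assumed to come from a classifier such as the PBT; `segment_chain`, `lam` and `sigma` are hypothetical.

```python
import math


def segment_chain(probs, intensities, lam=1.0, sigma=10.0):
    """Exact minimum-energy binary labelling of a 1-D chain of voxels.

    probs[i]: classifier probability that voxel i is tumour (label 1).
    intensities[i]: voxel intensity used by the pairwise term.
    Energy = sum_i -log P(l_i)
           + lam * sum_i [l_i != l_{i+1}] * exp(-(I_i - I_{i+1})**2 / (2 * sigma**2))
    """
    eps = 1e-9

    def unary(i, label):
        p = probs[i] if label == 1 else 1.0 - probs[i]
        return -math.log(max(p, eps))

    cost = [[unary(0, 0), unary(0, 1)]]  # cost[i][l]: best energy of voxels 0..i with l_i = l
    back = []                            # back[i - 1][l]: best l_{i-1} given l_i = l
    for i in range(1, len(probs)):
        # Label changes are cheap across strong intensity edges, costly inside homogeneous regions.
        w = lam * math.exp(-((intensities[i] - intensities[i - 1]) ** 2) / (2 * sigma ** 2))
        row, ptr = [], []
        for label in (0, 1):
            options = [cost[-1][prev] + (w if prev != label else 0.0) for prev in (0, 1)]
            best_prev = 0 if options[0] <= options[1] else 1
            row.append(options[best_prev] + unary(i, label))
            ptr.append(best_prev)
        cost.append(row)
        back.append(ptr)
    # Backtrack from the cheaper final label.
    labels = [0 if cost[-1][0] <= cost[-1][1] else 1]
    for ptr in reversed(back):
        labels.append(ptr[labels[-1]])
    return labels[::-1]


# An isolated, weakly confident detection in a homogeneous region is smoothed away.
print(segment_chain([0.1, 0.1, 0.6, 0.1, 0.1], [50] * 5, lam=2.0))  # [0, 0, 0, 0, 0]
```

Graph cuts minimise the same kind of binary energy exactly on general grids, which is why they are used in 3-D; the chain case is only convenient for a self-contained example.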
|
Automatic delineation and robust measurement of fetal anatomical structures in 2D ultrasound images is a challenging task due to the complexity of the object appearance, noise, shadows, and the quantity of information to be processed. Previous solutions rely on explicit encoding of prior knowledge and formulate the problem as a perceptual grouping task solved through clustering or variational approaches. These methods are known to be limited by the validity of their underlying assumptions and cannot capture complex structure appearances. We propose a novel system for fast automatic obstetric measurements by directly exploiting a large database of expert-annotated fetal anatomical structures in ultrasound images. Our method learns to distinguish between the appearance of the object of interest and the background by training a discriminative constrained probabilistic boosting tree classifier. This system is able to handle previously unsolved problems in this domain, such as the effective segmentation of fetal abdomens. We show results on fully automatic measurement of head circumference, biparietal diameter, abdominal circumference and femur length. Unparalleled extensive experiments show that our system is, on average, close to the accuracy of experts in terms of segmentation and obstetric measurements. Finally, this system runs in under half a second on a standard dual-core PC. (see Automatic Fetal Measurements in Ultrasound Using Constrained Probabilistic Boosting Tree).
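The abstract does not say how head circumference is computed from the segmentation. One common convention, assumed here, is to fit an ellipse to the head contour and report its perimeter; since the perimeter of an ellipse has no closed form, the sketch uses Ramanujan's first approximation. `head_circumference_mm` and its pixel-spacing argument are hypothetical.

```python
import math


def ellipse_circumference(a, b):
    """Ramanujan's first approximation to the perimeter of an ellipse
    with semi-axes a and b (exact for a circle)."""
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))


def head_circumference_mm(semi_major_px, semi_minor_px, mm_per_px):
    """Hypothetical helper: perimeter in mm of a head ellipse fitted in pixel units."""
    return ellipse_circumference(semi_major_px * mm_per_px, semi_minor_px * mm_per_px)


print(round(ellipse_circumference(10, 10), 4))  # 62.8319, i.e. 2 * pi * 10
```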
![]() |
In recent years there has been growing interest in recognition models using local image features for applications ranging from long-range motion matching to object class recognition systems. Currently, many state-of-the-art approaches have models involving very restrictive priors in terms of the number of local features and their spatial relations. The adoption of such priors in those models is necessary for simplifying both the learning and inference tasks. Also, most of the state-of-the-art learning approaches are semi-supervised batch processes, which considerably reduces their suitability in dynamic environments, where new unannotated images are continuously presented to the learning system. In this work we propose: 1) a new model representation that has a less restrictive prior on the geometry and number of local features, where the geometry of each local feature is influenced by its k closest neighbors and models may contain hundreds of features; and 2) a novel unsupervised on-line learning algorithm that is capable of estimating the model parameters e… (see Sparse Flexible Models of Local Features).
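The neighbourhood structure in contribution 1) can be sketched as a k-nearest-neighbour graph over feature locations, with each feature's geometry expressed relative to its neighbours. How the model places a prior on these offsets is not described here, so `relative_offsets` only computes the quantity such a prior could act on; both function names are hypothetical.

```python
import math


def knn_graph(points, k):
    """For each 2-D feature location, the indices of its k closest
    neighbours (excluding itself), by Euclidean distance."""
    graph = []
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in others[:k]])
    return graph


def relative_offsets(points, graph):
    """Each feature's neighbours expressed as (dx, dy) offsets from the feature."""
    return [[(points[j][0] - points[i][0], points[j][1] - points[i][1]) for j in nbrs]
            for i, nbrs in enumerate(graph)]


pts = [(0, 0), (1, 0), (0, 2), (5, 5)]
print(knn_graph(pts, 2))  # [[1, 2], [0, 2], [0, 1], [2, 1]]
```

Because each feature depends only on its k neighbours, adding features keeps the per-feature cost fixed, which is consistent with models that contain hundreds of features.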
![]() |
We introduce a new method to automatically annotate and retrieve images using a vocabulary of image semantics. The novel contributions include a discriminant formulation of the problem, a multiple instance learning solution that enables the estimation of concept probability distributions without prior image segmentation, and a hierarchical description of the density of each image class that enables very efficient training. Compared to current methods of image annotation and retrieval, the one now proposed has significantly smaller time complexity and better recognition performance. Specifically, its recognition complexity is O(C×R), where C is the number of classes (or image annotations) and R is the number of image regions, while the best results in the literature have complexity O(T×R), where T is the number of training images. Since the number of classes grows substantially more slowly than the number of training images, the proposed method scales better during training and processes test images faster. This is illustrated through comparisons in terms of complexity, time, and recognition performance with current state-of-the-art methods. (see Formulating Semantic Image Annotation as a Supervised Learning Problem and A Database Centric View of Semantic Image Annotation and Retrieval).
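The O(C×R) recognition cost follows from scoring each of the R regions of a test image against each of the C class densities, independently of the number of training images T. In the sketch below, a 1-D Gaussian per class is an assumed stand-in for the hierarchically estimated class densities, and regions are treated as independent given the class; the names are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def annotate(region_features, class_models, top=1):
    """Rank annotation words by the log-likelihood of an image's regions.

    class_models maps each of the C words to the (mean, var) of a 1-D Gaussian.
    Returns the `top` best words and the number of density evaluations, which
    is C * R for R regions and does not depend on the number of training images.
    """
    evaluations = 0
    scores = {}
    for word, (mean, var) in class_models.items():
        total = 0.0
        for x in region_features:  # regions assumed independent given the word
            total += log_gauss(x, mean, var)
            evaluations += 1
        scores[word] = total
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top], evaluations


models = {"sky": (0.0, 1.0), "grass": (5.0, 1.0), "sand": (10.0, 1.0)}
print(annotate([4.8, 5.1, 5.3], models))  # (['grass'], 9)
```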
![]() |
We introduce a new method that characterizes typical local image features (e.g., SIFT [LOWE99], phase features [CARNEIRO02]) in terms of their distinctiveness, detectability, and robustness to image deformations. This characterization makes it possible to classify local image features according to those three properties. Such a classification benefits a recognition system that uses local features in three ways: a) it reduces the recognition time, since fewer features are present in the test image and in the database of model features; b) it improves the recognition accuracy, since only the features most useful for the recognition task are kept in the model database; and c) it increases the scalability of the recognition system, given the smaller number of features per model. A discriminant classifier is trained to select well-behaved feature points. A regression network is then trained to provide quantitative models of the detection distributions for each selected feature point. It is important to note that both the classifier and the regression network use image data alone as their input. Experimental results show that the use of these trained networks not only improves the performance of our recognition system but also significantly reduces the computation time of the recognition process. (see The Distinctiveness, Detectability, and Robustness of Local Image Features and The Quantitative Characterization of the Distinctiveness and Robustness of Local Image Descriptors).
![]() |
A key step for the effective use of local image features (i.e., highly distinctive and robust features) for recognition or image matching is the appropriate grouping of feature matches. Spatial constraints are important in this grouping because, during a recognition process, they reduce the number of hypotheses that must be verified and also the number of false positives present in each of these hypotheses. A common choice for this grouping task is to use the Hough transform on the global spatial transformation parameters of the hypothesized matches. Here, instead, we use semi-local spatial constraints, which allow for a greater range of shape deformations (see Flexible Spatial Models for Grouping Local Image Features). We have also addressed this problem by combining typical local features [Carneiro,CVPR'03; Lowe,ICCV'99] with shape context [Belongie,PAMI'02] (see Pruning Local Feature Correspondences Using Shape Context). A comparison with the Hough transform shows that our methods are more robust to both rigid and non-rigid deformations. Their functionality is demonstrated in an exemplar-based object recognition system that deals well with severe non-rigid deformations. We also show the efficacy of our flexible spatial grouping for long-range motion problems.
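A generic semi-local constraint can be sketched as a neighbourhood-consistency vote: a match survives only if enough of its spatial neighbours in one image are also its neighbours in the other. This is a minimal sketch in the spirit of the semi-local constraints of [Schmid,PAMI'97], not the flexible spatial model of the cited paper; `k`, `min_support` and the function names are hypothetical.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    order = sorted(range(len(points)), key=lambda j: math.dist(points[i], points[j]))
    return set([j for j in order if j != i][:k])


def semi_local_filter(matches, pts_a, pts_b, k=3, min_support=2):
    """Keep a match (ia, ib) only if at least `min_support` of its k nearest
    neighbouring matches in image A are also among its k nearest in image B."""
    a_coords = [pts_a[ia] for ia, _ in matches]
    b_coords = [pts_b[ib] for _, ib in matches]
    kept = []
    for m in range(len(matches)):
        support = len(k_nearest(a_coords, m, k) & k_nearest(b_coords, m, k))
        if support >= min_support:
            kept.append(matches[m])
    return kept


# 3x3 grid translated by (10, 10); the match for point 0 is sent to the wrong corner.
grid = [(x, y) for y in range(3) for x in range(3)]
pts_b = [(x + 10, y + 10) for x, y in grid] + [(12.2, 12.2)]
matches = [(0, 9)] + [(i, i) for i in range(1, 9)]
kept = semi_local_filter(matches, grid, pts_b)
print((0, 9) in kept, len(kept))  # False 8
```

Unlike a Hough vote on one global transformation, each match is judged only against its neighbourhood, which is what lets such constraints tolerate non-rigid deformations.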
![]() |
The extraction of optimal features, in a classification sense, is still quite challenging in the context of large-scale classification problems (such as visual recognition), involving a large number of classes and significant amounts of training data per class. We present an algorithm for feature design that is optimal in the minimum Bayes error sense and that combines the most appealing properties of the two currently dominant strategies: feature extraction (FE) and feature selection (FS). The new algorithm proceeds by interleaving pairs of FS and FE steps, which amount to a sequential search for the most discriminant directions in a collection of two-dimensional subspaces. It combines the fast convergence rate of FS with the ability of FE to uncover optimal features that are not part of the original basis functions, leading, in a small number of iterations, to solutions that are better than those achievable by either FE or FS alone. Because the basic iteration has very low complexity, the new algorithm is scalable in the number of classes of the recognition problem, a property that is currently only available for feature extraction methods that are either sub-optimal or optimal under restrictive assumptions that do not hold for generic recognition. Experimental results show significant improvements over these methods, through either much greater robustness to local minima or significantly faster convergence. (see Minimum Bayes Error Features for Visual Recognition by Sequential Feature Selection and Extraction).
![]() |
Local feature methods suitable for image feature based object recognition and for the estimation of motion and structure are composed of two steps, namely the 'where' and 'what' steps. The 'where' step (e.g., interest point detector) must select image points that are robustly localizable under common image deformations and whose neighborhoods are relatively informative. The 'what' step (e.g., local feature extractor) then provides a representation of the image neighborhood that is semi-invariant to image deformations, but distinctive enough to provide model identification. We present a quantitative evaluation of both the 'where' and the 'what' steps for three recent local feature methods: a) phase-based local features [Carneiro,ECCV'02], b) differential invariants [Schmid,PAMI'97], and c) the scale invariant feature transform (SIFT) [Lowe,ICCV'99]. Moreover, in order to make the phase-based approach more comparable to the other two approaches, we also introduce a new multi-scale interest point detector to be used for its 'where' step. The results show that the phase-based local features lead to better performance than the other two approaches when dealing with common illumination changes, 2D rotation, and sub-pixel translation. On the other hand, the phase-based local features are somewhat more sensitive to scale and large shear changes than the other two methods. Finally, we demonstrate the viability of the phase-based local features in a simple object recognition system. (see Multi-scale Phase-based Local Features and Local Phase-based Features).
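The filters used in [Carneiro,ECCV'02] are not given here; the following 1-D sketch, assuming a Gabor quadrature pair with hypothetical parameters, shows why local phase is insensitive to the illumination change I -> aI + b with a > 0: removing the DC component of both kernels cancels the offset b, and the gain a scales both filter responses equally, leaving their angle unchanged.

```python
import math


def quadrature_pair(wavelength=8.0, sigma=4.0, half_width=12):
    """Even (cosine) and odd (sine) Gabor kernels; the even one is made zero-mean."""
    ks = range(-half_width, half_width + 1)
    g = [math.exp(-k * k / (2 * sigma ** 2)) for k in ks]
    even = [gi * math.cos(2 * math.pi * k / wavelength) for gi, k in zip(g, ks)]
    odd = [gi * math.sin(2 * math.pi * k / wavelength) for gi, k in zip(g, ks)]
    mean_even = sum(even) / len(even)
    even = [e - mean_even for e in even]  # zero DC, so intensity offsets cancel
    return even, odd  # the odd kernel is already zero-mean by antisymmetry


def local_phase(signal, x, even, odd):
    """Phase of the quadrature-filter response centred on sample x."""
    h = len(even) // 2
    window = signal[x - h: x + h + 1]
    e = sum(w * s for w, s in zip(even, window))
    o = sum(w * s for w, s in zip(odd, window))
    return math.atan2(o, e)


even, odd = quadrature_pair()
signal = [math.sin(0.5 * i) + 0.3 * math.cos(0.2 * i) for i in range(60)]
relit = [2.0 * s + 5.0 for s in signal]  # contrast gain 2, brightness offset 5
p1, p2 = local_phase(signal, 30, even, odd), local_phase(relit, 30, even, odd)
d = (p1 - p2 + math.pi) % (2 * math.pi) - math.pi  # wrapped phase difference
print(abs(d) < 1e-9)  # True
```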
![]() |
Independent representations have recently attracted significant attention from the biological vision and cognitive science communities. It has been argued that properties such as sparseness and independence play a major role in visual perception, and it has been shown that imposing such properties on visual representations gives rise to receptive fields similar to those found in human vision. We present a study of the impact of feature independence on the performance of visual recognition architectures. The contributions of this study are both theoretical and empirical, and support two main conclusions. The first is that the intrinsic complexity of the recognition problem (Bayes error) is higher for independent representations. The increase can be significant, close to 10% in the databases we considered. The second is that criteria commonly used in independent component analysis are not sufficient to eliminate all the dependencies that impact recognition. In fact, "independent components" can be less independent than previous representations, such as principal components or wavelet bases. (see What is the Role of Independence for Visual Recognition).
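Bayes error, used above as the measure of the intrinsic complexity of a recognition problem, is the error rate of the optimal classifier: for two equiprobable classes with densities p0 and p1 it is the integral of min(p0(x)/2, p1(x)/2). The sketch below computes it numerically and checks it against the closed form for two 1-D Gaussians with a common standard deviation sigma, which is Phi(-|mu1 - mu0| / (2 sigma)). It only illustrates the quantity and does not reproduce the study's experiments; the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, std):
    """Density of N(mean, std**2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def bayes_error_numeric(pdf0, pdf1, lo, hi, steps=100_000):
    """Midpoint-rule integral of min(0.5 * pdf0, 0.5 * pdf1) over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += 0.5 * min(pdf0(x), pdf1(x))
    return total * dx


def bayes_error_gaussians(mu0, mu1, sigma):
    """Closed form for equal priors and a shared std: Phi(-|mu1 - mu0| / (2 sigma))."""
    d = abs(mu1 - mu0) / (2 * sigma)
    return 0.5 * math.erfc(d / math.sqrt(2))


numeric = bayes_error_numeric(lambda x: normal_pdf(x, 0.0, 1.0),
                              lambda x: normal_pdf(x, 2.0, 1.0), -10.0, 12.0)
closed = bayes_error_gaussians(0.0, 2.0, 1.0)
print(round(closed, 4))  # 0.1587
```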
![]() |
We propose a MUlti-level Fusion Architecture (MUFA) for controlling the navigation of a telecommanded Autonomous Guided Vehicle (AGV). The architecture combines ideas derived from the fundamental concepts of sensor fusion and distributed intelligence. The focus of the work is the development of an intelligent navigation system for a tricycle AGV with the ability to move autonomously within an office environment, follow instructions issued by client stations connected to the office network, and react appropriately to different situations found in the real world. The modules that make up the MUFA architecture are discussed and results of some simulation experiments are presented (see Internet Request Server Architecture for Telecommanding the CONTROLAB AGV through Real Time Data and Image and CONTROLAB MUFA: A Multilevel Fusion Architecture for Intelligent Navigation of a Telerobot).