Image and Video Processing
This is a graduate-level course on computer image analysis, included in the
CMU-Portugal dual PhD program.
Instructor
Pedro M. Q. Aguiar. Office: ISR / IST,
North tower, 7.24. Contact: aguiar at isr dot ist dot utl dot pt
Description
The course introduces students to a wide class of problems in the field
of image and video processing, with emphasis on representation, content
analysis and understanding. The course is structured in three major sections: (i) image-domain processing, (ii) image and video
segmentation and analysis, and (iii) video processing and multiple views. The
first section covers 2D signal processing, including image representation (Fourier
and Wavelet transforms), linear and nonlinear filtering techniques, image
enhancement, and image restoration. The section on image and video segmentation
and analysis focuses on obtaining higher-level representations through boundary
estimation (contour and region-based methods), motion detection and
segmentation, and probabilistic methods for recognition and clustering. The
last section, video processing and multiple views, concentrates on spatially
distributed video analysis, including projective geometry and 3D
reconstruction. Upon completion, students should be able to sketch a global
picture of the field and to use general signal processing methodologies to
tackle problems in scenarios with multiple (moving) video cameras.
Course topics:
o Digital image fundamentals. The origins of
digital image processing. Light and the electromagnetic spectrum. Imaging
examples. Elements of visual perception. Image sensing, acquisition, sampling,
and quantization. Spatial relationships and basic image operations.
o Intensity transformations and spatial filtering. Intensity
transformations: negative, log, power-law (gamma), stretching, slicing.
Histogram processing: equalization, matching, local enhancement. Spatial
correlation and convolution. Smoothing filters: linear and order-statistic.
Sharpening filters: gradient and Laplacian.
o Fourier representation. Review of the
sampling theorem and Fourier transform of sampled functions. Discrete Fourier
transform (DFT). 2D Fourier transform. 2D sampling and image aliasing. 2D DFT.
Frequency domain filters for smoothing and sharpening. Selective filters (bandreject, bandpass, notch).
o Wavelets and multiresolution. Image pyramids.
Iterated filter banks for subband coding and the
wavelet transform. Multiresolution expansions:
scaling and wavelet functions. Wavelet series. Discrete wavelet transform
(DWT). Haar basis examples. Time-frequency tilings. 2D DWT. Wavelet packets.
o Image restoration. Model for image
degradation / restoration. Noise models. Linear position-invariant degradations.
Estimation of the degradation function. Inverse filtering. Wiener filtering.
Constrained LS filtering. Geometric mean filter.
o Image and video segmentation. Point, line, and
edge detection: isolated points, lines, edges, edge linking and boundaries, Hough
transform. Thresholding: optimal, multiple
thresholds, variable threshold. Region-based segmentation: growing, split and
merge. Motion segmentation: detection, estimation using multiresolution.
o Visual representation and description.
Internal/external representation/description of regions,
invariance/discrimination. Boundary following. Boundary (shape) representation:
chain codes, polygonal approximation, signatures, boundary segments, skeletons.
Fourier descriptors. Region descriptors: moment invariants, texture
description.
o Linear models for recognition. Approaches to
classification problems. Linear discriminants: LS,
Fisher’s, perceptron. Probabilistic generative
models and the maximum likelihood (ML) solution. Basis functions. Probabilistic
discriminative models: logistic regression and iterative reweighted LS.
o Sparse kernel machines. Kernel methods:
kernels as inner products in a feature space, kernel trick. Dual
representations. Constructing kernels. Maximum margin classifiers. Support
vector machines. Overlapping class distributions and soft margin. Relation to
logistic regression.
o Clustering and mixture models. K-means
clustering. Clustering in image segmentation. Mixtures of Gaussians. ML
solution. Expectation-Maximization (EM) algorithm.
o Projective geometry. Projective
plane: homogeneous representation of points and lines, intersection of lines,
ideal points and line at infinity, conics. Projectivities:
transformations of lines and conics. A hierarchy of transformations: isometries, similarities, affine transformations, and their
invariants. The projective geometry of 1D and the cross ratio. Decomposition of
a projective transformation and the recovery of affine and metric properties
from images. Projective space: homogeneous representation of points and planes.
Representation of lines: null-space and span representation, Plücker coordinates. Plane at infinity. Projective space
transformations.
o Camera model and single-view geometry. The pinhole
camera in homogeneous coordinates. The camera matrix. Internal and external
camera parameters. Finite projective cameras and cameras at infinity. Affine
cameras. Computation of the camera matrix. Linear algorithms vs. minimization
of the geometric error. Radial distortion. Images of planes and lines. Images with
the same camera center and homographies.
Applications: synthetic views and panoramic mosaicing.
Vanishing points and camera orientation.
o Epipolar geometry and stereo-based 3D reconstruction. Two views and epipolar geometry. The fundamental matrix. Fundamental
matrices arising from special motions. Retrieving the camera matrices:
projective ambiguity and canonical cameras. 3D reconstruction of cameras and
structure. Reconstruction ambiguity. Projective reconstruction. Stratified
reconstruction: affine, metric. Direct reconstruction.
o N-view 3D reconstruction methods. Projective
reconstruction. Bundle adjustment. Affine reconstruction. Matrix factorization
method. Non-rigid factorization. Projective factorization. Projective
reconstruction using planes. Reconstruction from sequences.
Lectures
Thursdays,
9:00-10:30, room V1.17, and Thursdays, 10:30-12:00, room C22, IST.
Pre-requisites
Background in
linear algebra, probability, and signals and systems.
Grading
Grading is 30% on
the HWs and 70% on the final exam.
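As a worked illustration of the weighting above, here is a minimal sketch in Python. It assumes that homework and exam scores are on the same scale and that the HW component is a single average score; neither is stated here. The function name `final_grade` and the sample scores are hypothetical.

```python
def final_grade(hw_average: float, exam: float) -> float:
    """Weighted course grade: 30% homework, 70% final exam.

    Assumes both scores use the same scale (an assumption, not stated
    in the course rules).
    """
    return 0.3 * hw_average + 0.7 * exam


# Hypothetical scores: homework average 16, exam 14.
print(round(final_grade(16.0, 14.0), 2))
```

With these sample scores the exam dominates the result: a two-point difference on the exam moves the final grade by 1.4 points, while the same difference on the homework average moves it by 0.6.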
Homework
HWs (and their due dates) are indicated in the schedule below. Students
should send each HW by e-mail to the instructor as a PDF file named with their name and the HW number, e.g., PedroAguiarHW1.pdf. Students should
show all their work on the HW pages and justify all their answers
(results that are not explained or justified may count less,
even if they are correct).
Readings
o [GW]
“Digital Image Processing”, R. Gonzalez and R. Woods, Prentice
Hall, 3rd Ed., 2008.
o [HZ]
“Multiple View Geometry”, R. Hartley and A. Zisserman,
Prentice Hall, 2nd Ed., 2004.
o [B]
“Pattern Recognition and Machine Learning”, C. Bishop, Springer,
2006.
o [A] “Multiresolution image alignment – lecture
notes”, P. Aguiar, IST, 2008.
Additional
readings (research papers or sections from other books) will be posted here or
handed out during the semester.
Last modified:
Nov. 26, 2009.
Schedule
(tentative)
Lecture, date | Topic | Readings | HWs (due dates) |
#1, Sep. 22 | Course presentation | | |
#2, Sep. 24 | Digital image fundamentals | [GW], ch. 1, 2 | |
#3, Oct. 1 | Intensity transformations and spatial filtering | [GW], ch. 3 | |
#4, Oct. 1 | Fourier representation | [GW], ch. 4 | HW #2: [GW] 3.11 (remember: monotonically increasing transformation!), 3.14, 3.27, 4.19, 4.29 |
#5, Oct. 8 | Wavelets and multiresolution | [GW], ch. 7 | |
#6, Oct. 8 | Wavelets and multiresolution | [GW], ch. 7 | HW #3: [GW] 7.16 |
#7, Oct. 22 | Image restoration | [GW], ch. 5 | |
#8, Oct. 22 | Image restoration | [GW], ch. 5 | |
#9, Oct. 29 | Image and video segmentation | [GW], ch. 10 | |
#10, Oct. 29 | Image and video segmentation | [A] | HW #5: [GW] 10.30, motion estimation |
#11, Nov. 5 | Visual representation and description | [GW], ch. 11 | |
# | | HW #6: [GW] 11.12 | |
#12, Nov. 19 | Linear models for recognition | [B], ch. 4 | |
#13, Nov. 19 | Sparse kernel machines | [B], ch. 6 | HW #7: [B] 4.2, 4.3, 4.14 |
#14, Nov. 26 | Sparse kernel machines | [B], ch. 7 | |
#15, Nov. 26 | Mixture models and EM | [B], ch. 9 | HW #8: choose three from [B] 6.11, 7.4, 7.5, 9.1, 9.6 |
#16, Dec. 3 | Projective geometry | [HZ], ch. 2 | |
#17, Dec. 3 | Projective geometry | [HZ], ch. 3 | HW #9: [HZ] 2.10.2 (v), 3.8.2 (i) |
#18, Dec. 10 | Camera model and single-view geometry | [HZ], ch. 6 | |
#19, Dec. 10 | Camera model and single-view geometry | [HZ], ch. 7 | HW #10: [HZ] 6.5.2 (iii), 7.5.2 (iii) d) e) |
#20, Dec. 17 | Camera model and single-view geometry | [HZ], ch. 8 | |
#21, Dec. 17 | Epipolar geometry and stereo-based 3D reconstruction | [HZ], ch. 9 | HW #11: stereo, [HZ] 9.7.2 (i) |
#22, Jan. 7 | Epipolar geometry and stereo-based 3D reconstruction | [HZ], ch. 10 | |
#23, Jan. 7 | N-view 3D reconstruction methods | [HZ], ch. 18 | |
Jan. 11 | Exam | | |