CMU-PT Image and Video Processing

Image and Video Processing

This is a graduate-level course on computer image analysis, included in the CMU-Portugal dual PhD program.

Instructor

Pedro M. Q. Aguiar. Office: ISR / IST, North tower, 7.24. Contact: aguiar at isr dot ist dot utl dot pt

Description

The course introduces students to a wide class of problems in the field of image and video processing, with emphasis on representation, content analysis, and understanding. The course is structured in three major sections: (i) image-domain processing, (ii) image and video segmentation and analysis, and (iii) video processing and multiple views. The first section covers 2D signal processing, including image representation (Fourier and wavelet transforms), linear and nonlinear filtering techniques, image enhancement, and image restoration. The section on image and video segmentation and analysis focuses on obtaining higher-level representations through boundary estimation (contour and region-based methods), motion detection and segmentation, and probabilistic methods for recognition and clustering. The last section, video processing and multiple views, concentrates on spatially distributed video analysis, including projective geometry and 3D reconstruction. Upon completion, students should be able to sketch a global picture of the field and to use general signal processing methodologies to tackle problems in scenarios with multiple (moving) video cameras.

Course topics:

o Digital image fundamentals. The origins of digital image processing. Light and the electromagnetic spectrum. Imaging examples. Elements of visual perception. Image sensing, acquisition, sampling, and quantization. Spatial relationships and basic image operations.

o Intensity transformations and spatial filtering. Intensity transformations: negative, log, power-law (gamma), stretching, slicing. Histogram processing: equalization, matching, local enhancement. Spatial correlation and convolution. Smoothing filters: linear and order-statistic. Sharpening filters: gradient and Laplacian.

o Fourier representation. Review of the sampling theorem and Fourier transform of sampled functions. Discrete Fourier transform (DFT). 2D Fourier transform. 2D sampling and image aliasing. 2D DFT. Frequency domain filters for smoothing and sharpening. Selective filters (bandreject, bandpass, notch).

o Wavelets and multiresolution. Image pyramids. Iterated filter banks for subband coding and the wavelet transform. Multiresolution expansions: scaling and wavelet functions. Wavelet series. Discrete wavelet transform (DWT). Haar basis examples. Time-frequency tilings. 2D DWT. Wavelet packets.

o Image restoration. Model for image degradation / restoration. Noise models. Linear position-invariant degradations. Estimation of the degradation function. Inverse filtering. Wiener filtering. Constrained LS filtering. Geometric mean filter.

o Image and video segmentation. Point, line, and edge detection: isolated points, lines, edges, edge linking and boundaries, Hough transform. Thresholding: optimal, multiple thresholds, variable threshold. Region-based segmentation: growing, split and merge. Motion segmentation: detection, estimation using multiresolution.

o Visual representation and description. Internal/external representation/description of regions, invariance/discrimination. Boundary following. Boundary (shape) representation: chain codes, polygonal approximation, signatures, boundary segments, skeletons. Fourier descriptors. Region descriptors: moment invariants, texture description.

o Linear models for recognition. Approaches to classification problems. Linear discriminants: LS, Fisher’s, perceptron. Probabilistic generative models and the maximum likelihood (ML) solution. Basis functions. Probabilistic discriminative models: logistic regression and iterative reweighted LS.

o Sparse kernel machines. Kernel methods: kernels as inner products in a feature space, kernel trick. Dual representations. Constructing kernels. Maximum margin classifiers. Support vector machines. Overlapping class distributions and soft margin. Relation to logistic regression.

o Clustering and mixture models. K-means clustering. Clustering in image segmentation. Mixtures of Gaussians. ML solution. Expectation-Maximization (EM) algorithm.

o Projective geometry. Projective plane: homogeneous representation of points and lines, intersection of lines, ideal points and line at infinity, conics. Projectivities: transformations of lines and conics. A hierarchy of transformations: isometries, similarities, affine transformations, and their invariants. The projective geometry of 1D and the cross ratio. Decomposition of a projective transformation and the recovery of affine and metric properties from images. Projective space: homogeneous representation of points and planes. Representation of lines: null-space and span representation, Plücker coordinates. Plane at infinity. Projective space transformations.

o Camera model and single-view geometry. The pinhole camera in homogeneous coordinates. The camera matrix. Internal and external camera parameters. Finite projective cameras and cameras at infinity. Affine cameras. Computation of the camera matrix. Linear algorithms vs. minimization of the geometric error. Radial distortion. Images of planes and lines. Images with the same camera center and homographies. Applications: synthetic views and panoramic mosaicing. Vanishing points and camera orientation.

o Epipolar geometry and stereo-based 3D reconstruction. Two views and epipolar geometry. The fundamental matrix. Fundamental matrices arising from special motions. Retrieving the camera matrices: projective ambiguity and canonical cameras. 3D reconstruction of cameras and structure. Reconstruction ambiguity. Projective reconstruction. Stratified reconstruction: affine, metric. Direct reconstruction.

o N-view 3D reconstruction methods. Projective reconstruction. Bundle adjustment. Affine reconstruction. Matrix factorization method. Non-rigid factorization. Projective factorization. Projective reconstruction using planes. Reconstruction from sequences.
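
As a small illustration of one topic listed above (the homogeneous representation of points and lines in the projective plane), here is a minimal Python sketch. It is not taken from the course materials; the helper names `cross`, `line_through`, `intersection`, and `is_ideal` are hypothetical and chosen only for this example. It relies on the standard facts that the line through two homogeneous points, and the point where two homogeneous lines meet, are both given by a cross product, and that parallel lines meet at an ideal point (third coordinate zero).

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line l = p x q passing through homogeneous points p and q."""
    return cross(p, q)


def intersection(l, m):
    """Homogeneous point x = l x m where lines l and m meet."""
    return cross(l, m)


def is_ideal(x):
    """True if x is an ideal point (a point on the line at infinity)."""
    return x[2] == 0


# A line (a, b, c) contains the point (x, y, 1) when a*x + b*y + c = 0.
line_x1 = (1, 0, -1)  # x = 1
line_y2 = (0, 1, -2)  # y = 2
line_x2 = (1, 0, -2)  # x = 2, parallel to line_x1

point = intersection(line_x1, line_y2)     # finite point (1, 2) in homogeneous form
at_infinity = intersection(line_x1, line_x2)  # parallel lines meet at an ideal point
diagonal = line_through((0, 0, 1), (1, 1, 1))  # the line y = x
```

Integer coordinates are used here so the results are exact; with floating-point data one would typically normalize by the third coordinate and compare against a tolerance instead of testing for an exact zero.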

Lectures

Thursdays, 9:00-10:30, room V1.17, and Thursdays, 10:30-12:00, room C22, IST.

Pre-requisites

Background in linear algebra, probability, and signals and systems.

Grading

The HWs count for 30% of the grade and the final exam for 70%.

Homework

HWs (and their due dates) are indicated in the schedule below. Students should send each HW by e-mail to the instructor as a PDF file named with their name and the HW number, e.g., PedroAguiarHW1.pdf. Students should show all their work on the HW pages and make sure they justify all the answers (results that are not explained or justified may count less, even if they are correct).

Readings

o [GW] “Digital Image Processing”, R. Gonzalez and R. Woods, Prentice Hall, 3rd Ed., 2008.

o [HZ] “Multiple View Geometry”, R. Hartley and A. Zisserman, Prentice Hall, 2nd Ed., 2004.

o [B] “Pattern Recognition and Machine Learning”, C. Bishop, Springer, 2006.

o [A] “Multiresolution image alignment – lecture notes”, P. Aguiar, IST, 2008. [link]

Additional readings (research papers or sections from other books) will be posted here or handed out during the semester.

Last modified: Nov. 26, 2009.

Schedule (tentative)

Lecture, date	Topic	Readings	HWs (due dates)
#1, Sep. 22	Course presentation		HW #1: [GW] 2.5, 2.15
#2, Sep. 24	Digital image fundamentals	[GW], ch. 1, 2
#3, Oct. 1	Intensity transformations and spatial filtering	[GW], ch. 3
#4, Oct. 1	Fourier representation	[GW], ch. 4	HW #2: [GW] 3.11 (remember: monotonically increasing transformation!), 3.14, 3.27, 4.19, 4.29
#5, Oct. 8	Wavelets and multiresolution	[GW], ch. 7
#6, Oct. 8	Wavelets and multiresolution	[GW], ch. 7	HW #3: [GW] 7.16
#7, Oct. 22	Image restoration	[GW], ch. 5	HW #3: [GW] 7.16
#8, Oct. 22	Image restoration	[GW], ch. 5	HW #4: [GW] 5.21, 5.22, 5.23
#9, Oct. 29	Image and video segmentation	[GW], ch. 10	HW #4: [GW] 5.21, 5.22, 5.23
#10, Oct. 29	Image and video segmentation	[A]	HW #5: [GW] 10.30, motion estimation
#11, Nov. 5	Visual representation and description	[GW], ch. 11	HW #5: [GW] 10.30, motion estimation
#			HW #6: [GW] 11.12
#12, Nov. 19	Linear models for recognition	[B], ch. 4	HW #6: [GW] 11.12
#13, Nov. 19	Sparse kernel machines	[B], ch. 6	HW #7: [B] 4.2, 4.3, 4.14
#14, Nov. 26	Sparse kernel machines	[B], ch. 7	HW #7: [B] 4.2, 4.3, 4.14
#15, Nov. 26	Mixture models and EM	[B], ch. 9	HW #8: choose three from [B] 6.11, 7.4, 7.5, 9.1, 9.6
#16, Dec. 3	Projective geometry	[HZ], ch. 2	HW #8: choose three from [B] 6.11, 7.4, 7.5, 9.1, 9.6
#17, Dec. 3	Projective geometry	[HZ], ch. 3	HW #9: [HZ] 2.10.2 (v), 3.8.2 (i)
#18, Dec. 10	Camera model and single-view geometry	[HZ], ch. 6	HW #9: [HZ] 2.10.2 (v), 3.8.2 (i)
#19, Dec. 10	Camera model and single-view geometry	[HZ], ch. 7	HW #10: [HZ] 6.5.2 (iii), 7.5.2 (iii) d) e)
#20, Dec. 17	Camera model and single-view geometry	[HZ], ch. 8	HW #10: [HZ] 6.5.2 (iii), 7.5.2 (iii) d) e)
#21, Dec. 17	Epipolar geometry and stereo-based 3D reconstruction	[HZ], ch. 9	HW #11: stereo, [HZ] 9.7.2 (i)
#22, Jan. 7	Epipolar geometry and stereo-based 3D reconstruction	[HZ], ch. 10	HW #11: stereo, [HZ] 9.7.2 (i)
#23, Jan. 7	N-view 3D reconstruction methods	[HZ], ch. 18
Jan. 11	Exam