Non-linear Stereo-Motion

Non-linear optimization for stereo factorization

The goal is to estimate the left and right camera matrices, the configuration weights and the shape parameters such that the distance between the measured image points and the estimated image points is minimized. We minimize this reprojection error by introducing a meaningful geometric cost function based on the non-rigid model parameters:

$\arg \min_{R_i^L,R_i^R,l_{ik},S_k} \sum_{i,j} \left( \left\| x_{ij}^L - \hat{x}_{ij}^L \right\|^2 + \left\| x_{ij}^R - \hat{x}_{ij}^R \right\|^2 \right) =$
$\arg \min_{R_i^L,R_i^R,l_{ik},S_k} \sum_{i,j} \left( \left\| x_{ij}^L - R_i^L \left( \sum_k l_{ik} S_{kj} \right) \right\|^2 + \left\| x_{ij}^R - R_i^R \left( \sum_k l_{ik} S_{kj} \right) \right\|^2 \right)$
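
As an illustration of the cost function above, the following NumPy sketch evaluates the summed left and right reprojection error for given parameters. The array layout, the function name `stereo_reprojection_cost` and the use of 2x3 camera matrices are assumptions made for this example, not details taken from the original implementation.

```python
import numpy as np


def stereo_reprojection_cost(x_left, x_right, R_left, R_right, weights, basis):
    """Sum of squared left and right reprojection residuals.

    Assumed (hypothetical) array layout:
      x_left, x_right : (F, 2, P) measured image points per frame
      R_left, R_right : (F, 2, 3) camera matrices per frame
      weights         : (F, K)    configuration weights l_ik
      basis           : (K, 3, P) basis shapes S_k
    """
    # Shape in frame i: sum_k l_ik * S_k, giving an (F, 3, P) array.
    shapes = np.einsum("ik,kdp->idp", weights, basis)
    # Batched matrix product projects each frame's shape to (F, 2, P).
    res_left = x_left - R_left @ shapes
    res_right = x_right - R_right @ shapes
    return float(np.sum(res_left ** 2) + np.sum(res_right ** 2))


# Toy data: 4 frames, 2 basis shapes, 5 points.
rng = np.random.default_rng(0)
F, K, P = 4, 2, 5
basis = rng.normal(size=(K, 3, P))
weights = rng.normal(size=(F, K))
R_left = rng.normal(size=(F, 2, 3))   # arbitrary 2x3 matrices, not true rotations
R_right = rng.normal(size=(F, 2, 3))

# Noise-free measurements generated from the model itself.
shapes = np.einsum("ik,kdp->idp", weights, basis)
x_left = R_left @ shapes
x_right = R_right @ shapes

exact_cost = stereo_reprojection_cost(x_left, x_right, R_left, R_right, weights, basis)
```

Because the toy measurements are produced by the model itself, `exact_cost` vanishes; moving a measured point away from its prediction raises the cost by the squared displacement.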

The non-linear optimization of the cost function was carried out using a Levenberg-Marquardt minimization scheme modified to take advantage of the sparse block structure of the matrices involved. The initialization is given by the linear stereo factorization method presented in:

• A. Del Bue and L. Agapito, "Non-rigid 3D shape recovery using stereo factorization," Asian Conference of Computer Vision, vol. 1, pp. 25-30, 2004.
@Article{DelBue:Agapito:2004,   author = {A. {Del Bue} and L. Agapito},   title = {Non-rigid {3D} shape recovery using stereo factorization},   journal = {Asian Conference of Computer Vision},   year = {2004},   month = {January},   volume = {1},   address = {Jeju, South Korea},   pages = {25-30} }
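
For readers unfamiliar with the Levenberg-Marquardt scheme mentioned above, here is a minimal dense sketch of the damped Gauss-Newton iteration. It is not the authors' implementation: it uses a finite-difference Jacobian, does not exploit the sparse block structure, and the function names and the toy exponential-fit problem are hypothetical choices for illustration.

```python
import numpy as np


def numeric_jacobian(residual_fn, params, eps=1e-6):
    """Forward-difference Jacobian of residual_fn at params."""
    r0 = residual_fn(params)
    J = np.empty((r0.size, params.size))
    for n in range(params.size):
        step = np.zeros_like(params)
        step[n] = eps
        J[:, n] = (residual_fn(params + step) - r0) / eps
    return J


def levenberg_marquardt(residual_fn, params0, max_iters=100, damping=1e-3):
    """Minimize sum(residual_fn(p)**2) with a basic dense LM loop."""
    params = np.asarray(params0, dtype=float).copy()
    r = residual_fn(params)
    cost = float(r @ r)
    for _ in range(max_iters):
        J = numeric_jacobian(residual_fn, params)
        gradient = J.T @ r
        if np.linalg.norm(gradient) < 1e-10:
            break
        # Damped normal equations: (J^T J + damping * I) delta = -J^T r
        delta = np.linalg.solve(J.T @ J + damping * np.eye(params.size), -gradient)
        r_trial = residual_fn(params + delta)
        cost_trial = float(r_trial @ r_trial)
        if cost_trial < cost:
            # Accept the step and trust the Gauss-Newton model more.
            params, r, cost = params + delta, r_trial, cost_trial
            damping *= 0.1
        else:
            # Reject the step and move towards gradient descent.
            damping *= 10.0
    return params, cost


# Toy problem (assumption, unrelated to the face data): fit a * exp(b * t).
t = np.linspace(0.0, 1.0, 20)
observed = 2.0 * np.exp(-1.5 * t)


def residuals(p):
    return p[0] * np.exp(p[1] * t) - observed


start = np.array([1.0, 0.0])
estimate, final_cost = levenberg_marquardt(residuals, start)
```

In the stereo problem, each residual depends only on the cameras and weights of its own frame and on the coordinates of its own point, so most Jacobian entries are zero. A sparse solver would replace the dense `np.linalg.solve` call in a practical implementation.
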
Non-linear optimization results for a stereo sequence in which the subject smiles. The video (MPEG-1, 677KB) shows the 3D reconstruction for each frame alongside the original sequence. The shape depth and mouth deformation are recovered accurately.
To compare the optimization results, we plot the camera rotation angles before and after the non-linear refinement; the refined estimates vary more smoothly across frames. The figure on the left shows the motion parameters and configuration weights estimated by the initial stereo factorization method, together with the improved results after the non-linear optimization.
We generated a stereo sequence using a synthetic face model. The model starts with a closed mouth; the deformation starts at frame 1, reaches its maximum at frame 60 and ends at frame 120 of the sequence. The synthetic face undergoes no rigid motion. Reconstructions are shown in the following videos for the front (MPEG-1, 915KB) and side (MPEG-1, 1067KB) views.
The parameters recovered by the non-linear optimization are compared with those extracted by the stereo algorithm presented in [1]. The estimated values are very similar to the ground truth; in particular, the camera rotation parameters are very close to those imposed in the experiment. Notice that the first mode of deformation (in red) reliably captures the mouth opening and closing.
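
The deformation schedule of the synthetic sequence (start at frame 1, maximum at frame 60, end at frame 120) could be generated along the following lines. The text does not state the profile between these frames, so the piecewise-linear ramp, the normalization of the weight to the range 0 to 1 and the function name `deformation_schedule` are assumptions.

```python
import numpy as np


def deformation_schedule(num_frames=120, peak_frame=60):
    """Hypothetical weight profile for the mouth-opening mode.

    Rises linearly from 0 at frame 1 to 1 at peak_frame, then falls
    linearly back to 0 at num_frames (assumed shape of the profile).
    """
    frames = np.arange(1, num_frames + 1)
    weights = np.interp(frames, [1, peak_frame, num_frames], [0.0, 1.0, 0.0])
    return frames, weights


frames, mode_weights = deformation_schedule()
```

A ground-truth profile of this kind is what the recovered configuration weight of the first deformation mode would be compared against.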

References on non-linear Stereo-Motion

• A. Del Bue and L. Agapito, "Stereo non-rigid factorization," International Journal of Computer Vision, vol. 66, iss. 2, pp. 193-207, 2006.
@Article{DelBue:Agapito:IJCV2006,   author = {A. {Del Bue} and L. Agapito},   title = {Stereo non-rigid factorization},   journal = {International Journal of Computer Vision},   year = {2006},   volume = {66},   number = {2},   pages = {193--207},   month = {February} }
• A. Del Bue and L. Agapito, "Non-rigid Stereo-Motion," in Scene Reconstruction, Pose Estimation and Tracking, R. Stolkin, Ed., 2007.
@InCollection{DelBue:Agapito:2007,   author = "A. {Del Bue} and L. Agapito", title = "Non-rigid Stereo-Motion", booktitle = "Scene Reconstruction, Pose Estimation and Tracking", year = "2007", editor = "Rustam Stolkin" }