Archive for December, 2010

Outdoor tracking using panoramic image

December 22nd, 2010

I have made this experiment in 2 days:

First of all, I must admit that this is more a “proof-of-concept” rather than a prototype… But the goal was to illustrate a concept needed for my job. I love this kind of challenge! Building something like this in 2 days was only possible thanks to great open-source library:


I’m using a panoramic image as reference. For each frame of the video I’m extracting Sift feature using SiftGPU and matching them with those of the reference image. Then I’m computing the homography between the 2 images using Ransac homography estimator (OpenCV cvFindHomography).


The performance are low due to complexity of the Sift detection and matching and that I’m applying the homography using cvWarpPerspective.

Sift extraction: 28ms 1228 features
Sift matching: 17ms using SiftGPU
Ransac Homography estimation: 2ms 89 inliers of 208 matches
Homography application: 36ms done on the CPU with OpenCV
Global: 12fps

I’m working on another version using Fast (or Agast) as feature detector and Brief as descriptor. This should lead to a significant speed-up and may eventually run on a mobile… Using the GPU vertex and pixel shader instead of the CPU to apply the homography should also gives a nice speed-up.

I’m also aware that it is not correct to apply an homography on a cylindric panoramic image (especially if you don’t undistort the input video frame too ;) )


Structure from motion projects

December 20th, 2010

I’ve introduced my tracking algorithm in the previous post. One of the issue I have is that the point cloud generated by my SFMToolkit (using Bundler) is not always accurate. This is a list of structure from motion projects alternative I’m interested in:

Building Rome in a Day:

Project home is using Bundler (GPL)

Building Rome on a Cloudless Day:

Project home | Source code (Non-profit license, I’ve ported their source to windows)


Project home (I’ve contacted them without response but they said that they were going to release the source code: check at 28:50)

Samantha Bundler


Website – Microsoft closed-source SFM application: check out my PhotoSynthToolkit

PhotoSynth Bundler

ETH-V3D Structure-and-Motion software:

Project home with source code (GPL, I’ve partially ported it to windows)

Simple Sparse Bundle Adjustment:

Project home with source code (LGPL, I’ve ported it to windows)

A multi-stage linear approach to structure from motion:

Project home | paper

Results from the paper of LinearSFM (Microsoft Research)

This list is not exhaustive, I’ve seen other projects (Efficient Large Scale Multi-View Stereo for Ultra High Resolution Image Sets: not sure how it is related to ETH-V3D Structure-and-Motion software)


Augmented Reality outdoor tracking becoming reality

December 13th, 2010

My interest in structure from motion was primary motivated by the capability of creating a point cloud that can be used as a reference for tracking reference. The video below is more a proof-of-concept than a prototype but this is an overview of my outdoor tracking algorithm for Augmented Reality:


In a pre-processing step I’ve built a sparse point cloud of the place using my SFMToolkit. Each vertex of the point cloud has several 2D Sift features correspondences. I’ve only kept one Sift descriptor per vertex (mean of the descriptors) and put all descriptors in an index using Flann.

For each frame of the video to be augmented, I’ve extracted Sift feature with SiftGPU and then matched them using Flann 2-nearest neighbor search and a distance ratio threshold. The Flann matching is done in parallel with boost::threadpool. The matches computed contains a lot of outliers. So I have implemented a Ransac pose estimator using EPnP that permits to filter bad 2d/3d correspondences.


My implementation is slow (due to my implementation of Ransac EPnP that could be improved).

Sift first octave: -1
Sift extraction: 49ms 2917 features
Sift matching: 57ms (parallel matching using Flann)
Ransac EPnP: 110ms 121 inliers of 208 matches
Global: 4.6fps (9.4fps without pose estimation)

Sift first octave: 0
Sift extraction: 32ms 707 features
Sift matching: 15ms (parallel matching using Flann)
Ransac EPnP: 144ms 62 inliers of 93 matches
Global: 5.2fps (21.2fps without pose estimation)

The slowness is not a so big issue because it doesn’t need to run at 30fps. Indeed the goal of my prototype is to have absolute pose with this tracking system each second and relative pose using inertial system available on mobile device (or using KLT tracking).


  • Performance (faster is better ;-) )
  • Point cloud reference is not always accurate (Bundler fault)

In another post I’ll introduce alternative to Bundler: faster and more accurate.