Outdoor tracking using a panoramic image

December 22nd, 2010 by Henri

I made this experiment in two days:

First of all, I must admit that this is more a “proof of concept” than a prototype… But the goal was to illustrate a concept needed for my job. I love this kind of challenge! Building something like this in two days was only possible thanks to great open-source libraries: SiftGPU and OpenCV.


I’m using a panoramic image as the reference. For each frame of the video I extract SIFT features using SiftGPU and match them against those of the reference image. Then I compute the homography between the two images with a RANSAC homography estimator (OpenCV’s cvFindHomography).
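For illustration, here is a minimal NumPy sketch of the core least-squares step that a homography estimator like cvFindHomography runs on each candidate inlier set (the Direct Linear Transform); this is my own illustrative code, not the OpenCV implementation, and it omits the RANSAC sampling loop:

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst (N >= 4 points)
    with the Direct Linear Transform: each correspondence contributes
    two linear equations, and H is the null space of the stacked
    system, found via SVD."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]          # fix the overall scale ambiguity

def project(H, pts):
    """Apply H to 2-D points: homogeneous multiply, then divide by w."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

In the real pipeline, RANSAC repeatedly draws minimal 4-point samples of the SIFT matches, runs a step like this, and keeps the H with the most inliers.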


The performance is low due to the complexity of the SIFT detection and matching, and because I apply the homography with cvWarpPerspective on the CPU.

SIFT extraction: 28 ms (1228 features)
SIFT matching: 17 ms (using SiftGPU)
RANSAC homography estimation: 2 ms (89 inliers out of 208 matches)
Homography application: 36 ms (done on the CPU with OpenCV)
Overall: 12 fps
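To make that last cost concrete, the per-pixel work of a perspective warp can be sketched as an inverse warp in NumPy (nearest-neighbour sampling only; this is an illustration of the idea, not OpenCV’s cvWarpPerspective, which also interpolates):

```python
import numpy as np

def warp_perspective(img, H, out_shape):
    """Nearest-neighbour inverse warp: for every output pixel, map its
    coordinates back through H^-1 and sample the source image. This
    per-pixel computation is exactly the kind of work a GPU pixel
    shader would do per fragment."""
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    ones = np.ones_like(xs)
    coords = np.stack([xs, ys, ones]).reshape(3, -1).astype(float)
    src = np.linalg.inv(H) @ coords           # back-project into the source
    src_x = np.round(src[0] / src[2]).astype(int)
    src_y = np.round(src[1] / src[2]).astype(int)
    inside = (src_x >= 0) & (src_x < img.shape[1]) & \
             (src_y >= 0) & (src_y < img.shape[0])
    out = np.zeros(out_shape, dtype=img.dtype)
    out.reshape(-1)[inside] = img[src_y[inside], src_x[inside]]
    return out
```

Since every output pixel is independent, this maps very naturally onto GPU shaders, which is why moving this step off the CPU should help so much.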

I’m working on another version using FAST (or AGAST) as the feature detector and BRIEF as the descriptor. This should lead to a significant speed-up and may eventually run on a mobile device… Applying the homography with GPU vertex and pixel shaders instead of the CPU should also give a nice speed-up.
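As a rough sketch of why BRIEF is so cheap, here is a minimal BRIEF-like binary descriptor with Hamming-distance matching (the random pair pattern, patch size, and bit count here are my own illustrative choices, not the published BRIEF pattern):

```python
import numpy as np

rng = np.random.default_rng(0)
# BRIEF draws a fixed random pattern of pixel-pair offsets once,
# then reuses it for every keypoint.
PATCH = 16
PAIRS = rng.integers(-PATCH // 2, PATCH // 2, size=(256, 2, 2))

def brief_descriptor(img, x, y):
    """256-bit BRIEF-like descriptor around keypoint (x, y): for each
    pre-drawn pair of offsets, test whether one pixel is darker than
    the other. Only comparisons, no gradients or histograms."""
    bits = np.empty(256, dtype=np.uint8)
    for i, ((dx1, dy1), (dx2, dy2)) in enumerate(PAIRS):
        bits[i] = img[y + dy1, x + dx1] < img[y + dy2, x + dx2]
    return bits

def hamming(d1, d2):
    """Matching cost for binary descriptors: count of differing bits
    (a single XOR + popcount when the bits are packed into words)."""
    return int(np.count_nonzero(d1 != d2))
```

Compared with matching 128-dimensional floating-point SIFT vectors, comparing 256-bit strings with XOR/popcount is why BRIEF matching is plausible on a mobile CPU.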

I’m also aware that it is not strictly correct to apply a homography to a cylindrical panoramic image (especially if you don’t undistort the input video frames either ;) )



  1. Pierre says:

    If you use AGAST/FAST you will not be scale invariant anymore… So you must be near the place where the panorama was taken for things to match. But that could be sufficient!

    I’m curious to see what BRIEF can give in such an application. And matching against many panoramas, for sure, to see whether the system could scale to a large database!

  2. admin says:

    @Pierre: if you build a pyramid of the input frame and run AGAST/FAST on each level, you can be scale invariant. But this is not needed for this type of application. As you said, the user needs to be placed near the center of the reference panorama, and the augmented object needs to be far from the user (no parallax, otherwise the simple homography applied here won’t work). Rotation invariance was more problematic, but since the mobile has an accelerometer you can compensate for the rotation of the input frame before computing the BRIEF descriptors, so you should be rotation invariant. The project I’m working on will use GPS to select the reference panorama, so I don’t really need heavy BRIEF matching across different panoramas. But BRIEF is indeed a very lightweight descriptor that is both very fast to compute and to match!

  3. Alessandro says:

    Very impressive! I’m also playing with OpenCV, but it took me a lot more time to do a lot less =(.

  4. Reena says:

    Hi, I was wondering if I could refer to your code, because I too am trying to extract features from .avi video files using SiftGPU, but I have some problems with the arguments while processing.

  5. Reena says:

    I wanted to know how you gave the input to sift->RunSift().
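The pyramid approach described in the admin’s reply (comment 2) can be sketched like this (a simple 2×2 box-filter pyramid in NumPy; a real implementation might use Gaussian smoothing instead, and the level count here is an arbitrary illustrative choice):

```python
import numpy as np

def build_pyramid(img, levels=4):
    """Build an image pyramid by repeated 2x2 box filtering and
    subsampling. Running a non-scale-invariant detector such as
    FAST/AGAST on every level recovers coarse scale invariance."""
    pyramid = [img.astype(float)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        # crop to even dimensions so 2x2 blocks tile exactly
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2
        p = prev[:h, :w]
        # average each 2x2 block, keeping one pixel per block
        down = (p[0::2, 0::2] + p[1::2, 0::2] +
                p[0::2, 1::2] + p[1::2, 1::2]) / 4.0
        pyramid.append(down)
    return pyramid
```

The detector then runs independently on each level, and a feature found at level k corresponds to a structure roughly 2^k times larger in the original frame.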
