Augmented Reality outdoor tracking becoming reality

December 13th, 2010 by Henri

My interest in structure from motion was primarily motivated by the ability to create a point cloud that can serve as a reference for tracking. The video below is more a proof of concept than a prototype, but it gives an overview of my outdoor tracking algorithm for Augmented Reality:


In a pre-processing step, I built a sparse point cloud of the place using my SFMToolkit. Each vertex of the point cloud has several corresponding 2D Sift features. I kept only one Sift descriptor per vertex (the mean of its descriptors) and put all of these descriptors into a Flann index.
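The sketch below shows the per-vertex averaging step. It assumes the observations have already been grouped by vertex id as plain Python lists. The post does not describe the actual SFMToolkit data layout, and `mean_descriptor_per_vertex` is a hypothetical name. Building the Flann index over the resulting vectors is not shown.

```python
from typing import Dict, List, Sequence

Descriptor = List[float]


def mean_descriptor_per_vertex(
    observations: Dict[int, Sequence[Descriptor]],
) -> Dict[int, Descriptor]:
    """Collapse every vertex's 2D SIFT observations into one mean descriptor."""
    reference: Dict[int, Descriptor] = {}
    for vertex_id, descriptors in observations.items():
        if not descriptors:
            continue  # no 2D observation: nothing to match this vertex against
        dim = len(descriptors[0])
        count = len(descriptors)
        # Component-wise mean over all views that observed this vertex.
        reference[vertex_id] = [
            sum(d[k] for d in descriptors) / count for k in range(dim)
        ]
    return reference


# Toy 4-D descriptors (real SIFT descriptors have 128 components).
obs = {0: [[0, 2, 4, 6], [2, 2, 0, 2]], 1: [[1, 1, 1, 1]], 2: []}
print(mean_descriptor_per_vertex(obs))  # {0: [1.0, 2.0, 2.0, 4.0], 1: [1.0, 1.0, 1.0, 1.0]}
```

Averaging gives one entry per vertex in the index, which shrinks it. The trade-off is that a single mean may match less well than the individual views did.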

For each frame of the video to be augmented, I extract Sift features with SiftGPU and then match them using a Flann 2-nearest-neighbor search with a distance ratio threshold. The Flann matching runs in parallel with boost::threadpool. The resulting matches contain many outliers, so I implemented a Ransac pose estimator using EPnP that filters out bad 2D/3D correspondences.
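The 2-nearest-neighbor distance ratio test can be sketched as follows. This is an illustration only, with three substitutions. First, it swaps Flann's approximate search for an exact brute-force scan. Second, it uses a ratio of 0.8, the value suggested by Lowe; the post does not give its threshold. Third, it uses Python's `ThreadPoolExecutor` in place of boost::threadpool. In CPython the threads only illustrate the per-descriptor parallel structure and will not speed up this pure-Python loop. The function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ratio_test_match(
    query: Sequence[float], reference: Sequence[Sequence[float]], ratio: float = 0.8
) -> Optional[int]:
    """Return the nearest reference index if it is clearly better than the 2nd one."""
    best, second = math.inf, math.inf
    best_idx: Optional[int] = None
    for idx, ref in enumerate(reference):
        d = squared_distance(query, ref)
        if d < best:
            second, best, best_idx = best, d, idx
        elif d < second:
            second = d
    # d1 < ratio * d2 on distances is d1^2 < ratio^2 * d2^2 on squared distances.
    if best_idx is not None and best < ratio * ratio * second:
        return best_idx
    return None


def match_frame(
    frame_descriptors: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]],
    ratio: float = 0.8,
    workers: int = 4,
) -> List[Tuple[int, int]]:
    """Match every frame descriptor independently; keep only ratio-test survivors."""
    def match_one(i: int) -> Tuple[int, Optional[int]]:
        return i, ratio_test_match(frame_descriptors[i], reference, ratio)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(match_one, range(len(frame_descriptors))))
    return [(i, j) for i, j in results if j is not None]


reference = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
frame = [[1.0, 0.0], [5.0, 0.0], [0.0, 9.0]]
print(match_frame(frame, reference))  # [(0, 0), (2, 2)]: [5, 0] is ambiguous
```

The ratio test drops ambiguous matches, but it cannot remove consistent-looking wrong ones. Those remaining outliers are what the Ransac EPnP stage has to filter.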


My implementation is slow, mainly because of my Ransac EPnP implementation, which could be improved.

Sift first octave: -1
Sift extraction: 49ms 2917 features
Sift matching: 57ms (parallel matching using Flann)
Ransac EPnP: 110ms 121 inliers of 208 matches
Global: 4.6fps (9.4fps without pose estimation)

Sift first octave: 0
Sift extraction: 32ms 707 features
Sift matching: 15ms (parallel matching using Flann)
Ransac EPnP: 144ms 62 inliers of 93 matches
Global: 5.2fps (21.2fps without pose estimation)
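The frame rates above can be cross-checked with a little arithmetic, assuming the stages run sequentially for each frame. The sketch also evaluates the standard RANSAC bound N = log(1 - p) / log(1 - w^s) for the observed inlier ratios w. Two values in it are assumptions, since the post does not say how its Ransac loop chooses its iteration count. The sample size s = 4 is the minimum number of correspondences EPnP accepts, and the confidence p = 0.99 is an illustrative choice.

```python
import math
from typing import Sequence


def fps(stage_ms: Sequence[float]) -> float:
    """Frame rate if the per-frame stages run one after the other."""
    return 1000.0 / sum(stage_ms)


def ransac_iterations(
    inliers: int, matches: int, sample_size: int = 4, confidence: float = 0.99
) -> int:
    """Iterations needed to draw at least one all-inlier sample with the given confidence."""
    w = inliers / matches
    p_good_sample = w ** sample_size
    if p_good_sample <= 0.0:
        raise ValueError("no inliers: RANSAC cannot succeed")
    if p_good_sample >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))


# First octave -1: extraction 49 ms, matching 57 ms, Ransac EPnP 110 ms
print(round(fps([49, 57, 110]), 1), round(fps([49, 57]), 1))  # 4.6 9.4
# First octave 0: extraction 32 ms, matching 15 ms, Ransac EPnP 144 ms
print(round(fps([32, 15, 144]), 1), round(fps([32, 15]), 1))  # 5.2 21.3
print(ransac_iterations(121, 208), ransac_iterations(62, 93))  # 38 21
```

The summed stage times reproduce the reported frame rates to within rounding (21.3 versus the reported 21.2fps).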

The slowness is not a big issue because this step doesn’t need to run at 30fps. The goal of my prototype is to obtain an absolute pose from this tracking system once per second, and a relative pose from the inertial sensors available on mobile devices (or from KLT tracking) in between.
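The absolute/relative split can be sketched as a tracker that works in two ways. Between fixes, it chains relative updates onto the last absolute pose. Whenever a new fix arrives, it snaps to that fix. To keep the example short, it simplifies the real problem. Poses are 2D (x, y, heading) rather than full 6-DoF camera poses, the reset is a hard replacement rather than a filtered blend, and `Pose2D` and `HybridTracker` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pose2D:
    x: float
    y: float
    theta: float  # heading in radians

    def compose(self, delta: "Pose2D") -> "Pose2D":
        """Apply a relative motion expressed in this pose's local frame."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose2D(
            self.x + c * delta.x - s * delta.y,
            self.y + s * delta.x + c * delta.y,
            self.theta + delta.theta,
        )


class HybridTracker:
    """Absolute fixes (e.g. ~1 per second) reset drift; relative updates fill the gaps."""

    def __init__(self) -> None:
        self.pose: Optional[Pose2D] = None

    def on_absolute(self, pose: Pose2D) -> None:
        self.pose = pose  # discard whatever drift the relative updates accumulated

    def on_relative(self, delta: Pose2D) -> Optional[Pose2D]:
        if self.pose is None:
            return None  # no absolute fix yet: no world coordinates to anchor to
        self.pose = self.pose.compose(delta)
        return self.pose


tracker = HybridTracker()
tracker.on_absolute(Pose2D(0.0, 0.0, 0.0))      # e.g. a fix from the point-cloud tracker
print(tracker.on_relative(Pose2D(0.5, 0.0, 0.0)))  # e.g. one frame of inertial/KLT motion
```

Because of the hard reset, any accumulated drift shows up as a visible jump once per second. A filter that blends the two sources would hide it, at the cost of more complexity.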


  • Performance (faster is better ;-) )
  • The reference point cloud is not always accurate (Bundler’s fault)

In another post I’ll introduce an alternative to Bundler that is faster and more accurate.



  1. Pierre says:

    Nice usage of tech ! Congrats
    But the main problem with such an approach is running it on low-end hardware such as a phone, which cannot handle so many descriptor comparisons in a fraction of a second.

  2. Cesar.Lopez says:

    So did you manage to integrate the Insight3d camera calibration code with your own (I’m guessing SAMANTHA-based) Bundler alternative?

  3. admin says:

    @Pierre: this is indeed a big issue. But my idea was to compute the absolute pose with this algorithm on the server side and compute the relative pose using KLT (or inertial data) on the mobile device. I introduced this idea in my remote augmented reality prototype.

    @Cesar.Lopez: No, this prototype still uses Bundler (I’ll post a list of the alternatives I’m working on in another post).

  4. MarkAlexander says:

    I’m impressed with the rate at which you crank out new things ;) We are using an adaptation of PTAM as a tracker. In PTAM there are no descriptors as in Sift/Surf, but “warped” patches, which makes it fast and lets features be detected at larger viewing angles. Also, the bundle adjuster does not run every frame but every 30th frame or so. However, the accuracy is not very good. It’s great for augmented reality though ;) Keep up the good work!

  5. Pierre says:

    I have an interesting article for you!
    Wide Area Localization on Mobile Phones, ISMAR 2009.

  6. Pierre says:

    The following video will interest you:

    It shows matches in yellow too, as you have done!

  7. admin says:

    @MarkAlexander: Thanks for the encouragement! I’m wondering whether you are using your PTAM adaptation on a mobile device, because almost all mobile devices have only one ARM core, and a dual-core machine is better suited to running bundle adjustment in another thread…
    @Pierre: Thanks for the article! (I had already seen it but never took the time to read it completely.) Regarding the yellow matches in the video, this is not a coincidence ;-). When I completed this prototype, I was looking for options to compress the point cloud used for tracking. These included keeping only one Sift descriptor per vertex, scoring the descriptors used as inliers during pose estimation on a training video and removing unused ones, and manually cleaning the point cloud, … That is how I came across this video on Arnold Irschara’s homepage. At that time the video was only available as an AVI download, which is why I didn’t embed the YouTube video. BTW, their solution is very different: they compress the point cloud using mean-shift and use exact GPU matching with synthetic views. My solution instead uses approximate matching with FLANN on a multi-core CPU over the whole point cloud.
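The idea of scoring descriptors used as inliers and removing unused ones could look like the following minimal sketch. It assumes the Ransac EPnP inlier vertex ids have been recorded for every frame of a training video. The function name `prune_by_inlier_votes` and the `min_votes` threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def prune_by_inlier_votes(
    reference: Dict[int, List[float]],
    inliers_per_frame: Iterable[Iterable[int]],
    min_votes: int = 1,
) -> Dict[int, List[float]]:
    """Keep only vertices that were pose-estimation inliers in enough training frames."""
    votes: Counter = Counter()
    for frame_inliers in inliers_per_frame:
        votes.update(set(frame_inliers))  # count each vertex at most once per frame
    return {v: d for v, d in reference.items() if votes[v] >= min_votes}


reference = {0: [0.0], 1: [1.0], 2: [2.0]}
print(prune_by_inlier_votes(reference, [[0, 1], [0], [0, 0]], min_votes=2))  # {0: [0.0]}
```

A higher `min_votes` gives a smaller index. The risk is discarding vertices that the training video simply never saw from a useful angle.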

  8. Ngoc Vu says:

    Hi Henri,

    How would you evaluate the matching performance if we replaced FLANN with a GPU-based KNN in your implementation?

  9. Lex says:

    Hi Henri,

    Very impressive, great result!
    The ISMAR 2011 paper ‘Real-Time Self-Localization from Panoramic Images on Mobile Devices’ by Clemens Arth will interest you as well. He uses a panoramic tracker to locate enough feature points for a 6DOF pose estimation on a mobile phone (with a small FOV). The idea is to pick up the tracking from there using another system, e.g. SLAM, PTAM, optical flow, etc.

    But you have managed to do it without a panorama!

    One question: what are the (approximate) specs of the device used to create the movie in this post?


  10. Henri says:

    @Ngoc Vu: I haven’t used GPU KNN because all Sift features would need to stay in GPU memory (which is fine for small areas but not for larger ones).
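A back-of-the-envelope estimate shows why keeping every Sift descriptor resident in GPU memory does not scale. The descriptor counts below are hypothetical. A SIFT descriptor has 128 components, stored either as 4-byte floats or, in compact form, as 1-byte values. For scale, a GeForce GTX 470 has 1280 MB of memory.

```python
SIFT_DIM = 128  # components per SIFT descriptor


def descriptor_memory_mib(num_descriptors: int, bytes_per_component: int = 4) -> float:
    """Raw storage for the descriptors alone (no index, no keypoint positions)."""
    return num_descriptors * SIFT_DIM * bytes_per_component / 2**20


for n in (100_000, 1_000_000, 10_000_000):  # hypothetical reconstruction sizes
    print(n, round(descriptor_memory_mib(n), 1), round(descriptor_memory_mib(n, 1), 1))
```

At one million float descriptors the raw data alone is about 488 MiB. Ten million would not fit at all, even before any working buffers are counted.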

    @Lex: Actually, I’ve already read this paper: it is indeed very interesting to me (and IMO much more complex than my proof of concept). As explained in this post (and in my answer above), you don’t need to run this algorithm in real time. It is only needed to bootstrap (or update) a SLAM algorithm in real-world coordinates. So the idea behind my prototype was to run the global pose estimator on the server side (in my case a Core i7 with a GeForce GTX 470). But I’ve also implemented a prototype based on panoramic images, intended to run on an iPad. It uses a reference panoramic image for bootstrapping (detection based on Surf) and a Kalman filter on sensor information (tracking). I implemented the prototype on a low-end device, a Samsung UMPC Q1, and my previous company is supposed to port it to the iPad…
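The reply only says that the panoramic prototype uses a Kalman filter on sensor information, without describing its state or models. The sketch below shows one predict/update cycle of that technique under simplifying assumptions. The state is a single scalar, for example a compass heading, and the process model is a random walk. The noise values and readings are hypothetical. A real tracker would filter a multi-dimensional orientation state and handle angle wrap-around.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    x: float  # state estimate, e.g. heading in degrees (wrap-around ignored)
    p: float  # variance of the estimate
    q: float  # process noise added per step (random-walk model)
    r: float  # measurement noise variance of the sensor

    def predict(self) -> None:
        # Random-walk model: the state is expected to stay put, uncertainty grows.
        self.p += self.q

    def update(self, z: float) -> float:
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (z - self.x)      # move the estimate toward the measurement
        self.p *= 1.0 - k               # the measurement reduces uncertainty
        return self.x


kf = ScalarKalman(x=0.0, p=100.0, q=0.01, r=4.0)
for z in (10.3, 9.8, 10.1, 9.9, 10.2):  # hypothetical noisy compass readings
    kf.predict()
    print(round(kf.update(z), 2))
```

The ratio of `q` to `r` sets the trade-off. A larger `q` follows fast motion more closely, while a larger `r` smooths out more sensor noise.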
