Visual-Experiments.com » augmented reality

News about OpenSynther

Henri — Mon, 09 May 2011 08:24:30 +0000

I’ve worked a lot on OpenSynther lately: OpenSynther is the name of my structure-from-motion solution. This new version is a major rewrite of the previous version which was using Surf with both GPU and multi-core CPU matching. The new version is using SiftGPU and Flann to achieve linear matching complexity of unstructured input as described in Samantha paper. You can find more information about OpenSynther features on it dedicated page (including source code).

OpenSynther has been designed as a library (OpenSyntherLib) which has already proven to be useful for several programs written by myself:

OpenSynther: work in progress… used by my augmented reality demo
PhotoSynth2CMVS: this allow to use CMVS with PhotoSynthToolkit
BundlerMatcher: this is the matching solution used by SFMToolkit

Outdoor augmented reality demo using OpenSynther

I’ve improved my first attempt of outdoor augmented reality: I’m now relying on PhotoSynth capability of creating a point cloud of the scene instead of Bundler. Then I’m doing some processing with OpenSynther and here is what what you get:

You can also take a look at the 3 others youtube videos showing this tracking in action around this church: MVI_6380.avi, MVI_6381.avi, MVI_6382.avi.

PhotoSynth2CMVS

This is not ready yet, I still have some stuff to fix before releasing it. But I’m already producing a valid “bundle.out” file compatible with CMVS processing from PhotoSynth. I’ve processed the V3D dataset with PhotoSynth2CMVS and sent the bundle.out file to Olafur Haraldsson who has managed to create the corresponding 36 million vertices point cloud using CMVS and PMVS2:

The V3D dataset was created by Christopher Zach.

BundlerMatcher

The new unstructured linear matching is really fast as you can see on the above chart compared to PhotoSynth. But the quality of the generated point cloud is not as good as PhotoSynth.

This benchmark was computed on a Core i7 with an Nvidia 470 GTX. I’ve also compared the quality of the matching methods implemented in OpenSynther (linear VS quadratic). I’ve used Bundler as a comparator with a dataset of 245 pictures:

	Linear	Quadratic
Nb pictures registered	193	243
Time spent to register 193 pictures	33min	1h43min

On the one hand, both the matching and the bundle adjustment are faster with linear matching but on the other hand, having only 193 out of 245 pictures registered is not acceptable. I have some idea on how to improve the linear matching pictures registering ratio but this is not implemented yet (this is why PhotoSynth2CMVS is not released for now).

Future

I’ve been playing with LDAHash last week and I’d like to support this in OpenSynther to improve matching speed and accuracy. It would also help to reduce the memory used by OpenSynther (by a factor 16: 128 floats -> 256bits per feature). I’m also wondering if the Cuda knn implementation could speed-up the matching (if applicable)? I ‘d also like to restore the previous Surf version of OpenSynther which was really fun to implement. Adding a sequential bundle adjustment (as in bundler) would be really interesting too…

Off-topic

I’ve made some modifications to my blog: switched to WordPress 3.x, activated page caching, added social sharing buttons and added my LinkedIn account next to the donate button…

2010 visual experiments

Henri — Fri, 07 Jan 2011 16:01:17 +0000

Happy new year everyone!

2010 was a year full of visual experiments for me, I hope that you like what you see on this blog. In this post I’m making a little overview of all visual experiments created by me during this year. This is an opportunity to catch-up something you’ve missed! I’d like also to thanks some person that have been helping me too:

Olafur Haraldsson: for creating the photogrammetry forum
Josh Harle: for his videos tutorials and his nice blog
You: for reading this

Visual experiments created in 2010:

During this year I have added some features to Ogre3D:

ArToolKitPlus: augmented reality marker-based system
Cuda: for beginner only (at least advanced user could grab some useful code)
OpenCL: for beginner only (at least advanced user could grab some useful code)
Html5 Canvas: implementation based on skia for graphics and V8 for javascript scripting
Kinect: this is a very hacky solution, I’ll improve it later

I also have learned GPGPU programming by myself while coding a partial GPUSurf implementation based on Nico Cornelis paper. But this implementation is not complete and I’m willing to rewrite it with a GPGPU framework based on OpenGL and CG only (not Ogre3D). With such a framework writing Sift/Surf detector should be easier and more efficient.

I have created some visual experiments related to Augmented Reality:

My outdoor 3D tracking algorithm for augmented reality needs an accurate point cloud: this is why I’m interested in structure from motion and I’ve created two SfM toolkit:

SFMToolkit (SiftGPU -> Bundler -> CMVS -> PMVS2)
PhotoSynthToolkit (PhotoSynth -> PMVS2)

Posts published in 2010:

2010/12/22: Outdoor tracking using panoramic image
2010/12/20: Structure from motion projects
2010/12/13: Augmented Reality outdoor tracking becoming reality
2010/11/20: Kinect experiment with Ogre3D
2010/11/19: PhotoSynthToolkit results
2010/11/09: PhotoSynth Toolkit updated
2010/11/05: Structure From Motion Toolkit released
2010/09/27: My 5 years old Quiksee competitor
2010/09/23: PMVS2 x64 and videos tutorials
2010/09/08: Introducing OpenSynther
2010/08/22: Dense point cloud created with PhotoSynth and PMVS2
2010/08/19: My PhotoSynth ToolKit
2010/07/12: Pose Estimation using SfM point cloud
2010/07/11: Remote Augmented Reality Prototype
2010/07/08: Structure From Motion Experiment
2010/06/25: GPU-Surf video demo
2010/06/23: GPUSurf and Ogre::GPGPU
2010/06/20: Ogre::Canvas, a 2D API for Ogre3D
2010/05/09: Ogre::OpenCL and Ogre::Canvas
2010/04/26: Cuda integration with Ogre3D
2010/04/09: Multitouch prototype done using Awesomium and Ogre3D
2010/03/05: ArToolKitPlus integration with Ogre3D
2010/02/20: Hello World !

Outdoor tracking using panoramic image

Henri — Wed, 22 Dec 2010 13:10:29 +0000

I have made this experiment in 2 days:

First of all, I must admit that this is more a “proof-of-concept” rather than a prototype… But the goal was to illustrate a concept needed for my job. I love this kind of challenge! Building something like this in 2 days was only possible thanks to great open-source library:

Ogre3D (MIT)
OpenCV (BSD)
SiftGPU (non-profit license)

Analysis

I’m using a panoramic image as reference. For each frame of the video I’m extracting Sift feature using SiftGPU and matching them with those of the reference image. Then I’m computing the homography between the 2 images using Ransac homography estimator (OpenCV cvFindHomography).

Performance

The performance are low due to complexity of the Sift detection and matching and that I’m applying the homography using cvWarpPerspective.

Sift extraction:	28ms	1228 features
Sift matching:	17ms	using SiftGPU
Ransac Homography estimation:	2ms	89 inliers of 208 matches
Homography application:	36ms	done on the CPU with OpenCV
Global: 12fps

I’m working on another version using Fast (or Agast) as feature detector and Brief as descriptor. This should lead to a significant speed-up and may eventually run on a mobile… Using the GPU vertex and pixel shader instead of the CPU to apply the homography should also gives a nice speed-up.

I’m also aware that it is not correct to apply an homography on a cylindric panoramic image (especially if you don’t undistort the input video frame too )

Augmented Reality outdoor tracking becoming reality

Henri — Mon, 13 Dec 2010 10:08:11 +0000

My interest in structure from motion was primary motivated by the capability of creating a point cloud that can be used as a reference for tracking reference. The video below is more a proof-of-concept than a prototype but this is an overview of my outdoor tracking algorithm for Augmented Reality:

Analysis

In a pre-processing step I’ve built a sparse point cloud of the place using my SFMToolkit. Each vertex of the point cloud has several 2D Sift features correspondences. I’ve only kept one Sift descriptor per vertex (mean of the descriptors) and put all descriptors in an index using Flann.

For each frame of the video to be augmented, I’ve extracted Sift feature with SiftGPU and then matched them using Flann 2-nearest neighbor search and a distance ratio threshold. The Flann matching is done in parallel with boost::threadpool. The matches computed contains a lot of outliers. So I have implemented a Ransac pose estimator using EPnP that permits to filter bad 2d/3d correspondences.

Performance

My implementation is slow (due to my implementation of Ransac EPnP that could be improved).

Sift first octave: -1
Sift extraction:	49ms	2917 features
Sift matching:	57ms	(parallel matching using Flann)
Ransac EPnP:	110ms	121 inliers of 208 matches
Global: 4.6fps (9.4fps without pose estimation)

Sift first octave: 0
Sift extraction:	32ms	707 features
Sift matching:	15ms	(parallel matching using Flann)
Ransac EPnP:	144ms	62 inliers of 93 matches
Global: 5.2fps (21.2fps without pose estimation)

The slowness is not a so big issue because it doesn’t need to run at 30fps. Indeed the goal of my prototype is to have absolute pose with this tracking system each second and relative pose using inertial system available on mobile device (or using KLT tracking).

Issue

Performance (faster is better )
Point cloud reference is not always accurate (Bundler fault)

In another post I’ll introduce alternative to Bundler: faster and more accurate.

Remote Augmented Reality Prototype

Henri — Sun, 11 Jul 2010 17:30:03 +0000

I have created a new augmented reality prototype (5 days experiments). It is using a client/server approach based on Boost.Asio. The first assumption of this prototype is that you’ve got a mobile client not so powerful and a powerful server with a decent GPU.

So the idea is simple: the client uploads a video frame and the server does the pose estimation and send back the augmented rendering to the client. My first prototype is using ArToolKitPlus in almost real-time (15fps) but I’m also working on a markerless version that would be less interactive (< 1fps). The mobile client was an UMPC (Samsung Q1).

Thanks to Boost.Asio I’ve been able to produce a strong client/server very quickly. Then I have created two implementations of PoseEstimator :

class PoseEstimator
{
	public:
		bool computePose(const Ogre::PixelBox& videoFrame);
		Ogre::Vector3 getPosition() const;
		Ogre::Quaternion getOrientation() const;
}

ArToolKitPoseEstimator (using ArToolKitPlus to get pose estimation)
SfMPoseEstimator (using EPnP and a point cloud generated with Bundler -Structure from Motion tool- to get pose estimation)

ArToolKitPoseEstimator

There is nothing fancy about this pose estimator, I’ve just implemented this one as proof of concept and to check my server performance. In fact, ArToolKit pose estimation is not expensive and can run in real-time on a mobile.

SfMPoseEstimator

I’ll just introduce the concept of this pose estimator in this post. So the idea is simple, in augmented reality fake rolex you generally know the object you are looking at because you want to augment it. The idea was to create a point cloud of the object you want to augment (using Structure from Motion) and keep the link between the 3D points and theirs 2D descriptors. Thus when you take a shot of the scene you can compare the 2D descriptors of your shot with those of the point cloud and so create 2D/3D correspondence. Then the pose estimation can be estimated by solving the Perspective-n-Point camera calibration problem (using EPnP for example).

Performance

The server is very basic, it doesn’t handle client queuing yet (1 client = 1 thread), but it already does the off-screen rendering and send back the texture in raw RGB.

The version using ArToolKit is only Replica Handbag running at 15fps because I had trouble with the jpeg compression so I turn it off. So this version is only bandwidth limited. I didn’t investigate this issue that much because I know that the SfMPoseEstimator is going to be limited by the matching step. Furthermore I’m not sure that it’s a good idea to send highly compressed image to the server (compression artifact can add extra features).

My SfMPoseEstimator is also working but it’s very expensive (~1s using the GPU) and it’s not always accurate due to some flaws of my original implementation. I’ll explain how it works in my following post.