I’ve spent two days implementing a web service in C++. I lost one day trying to use WebSockets (not a good idea, as Firefox and Google Chrome were implementing different versions of the protocol). I also lost a lot of time with the beautiful drag-and-drop API… who is responsible for this mess? Anyway, I’ve managed to create this cool demo:
The principle is very simple: the user drags and drops an unknown picture onto the webpage. The picture is resized on the client side using a canvas and then sent to the web service with a classic Ajax POST call, as a base64-encoded JPEG. The pose estimator web service then tries to compute the pose and sends back an XML response. Finally, the pose is applied to the Three.js camera used in my PhotoSynth viewer.
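On the server side, the web service has to turn the posted string back into JPEG bytes before it can decode the image. Here is a minimal C++ sketch of that step. It assumes the client posts the string returned by canvas.toDataURL("image/jpeg"), which starts with a "data:image/jpeg;base64," prefix. The function names decodeDataUrl and base64Value are hypothetical, and the real service may well have used a library decoder instead.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Map one base64 character to its 6-bit value, or -1 if it is not in the alphabet.
static int base64Value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Strip an optional "data:...;base64," prefix (as produced by canvas.toDataURL)
// and decode the remaining base64 text into raw JPEG bytes.
std::vector<std::uint8_t> decodeDataUrl(const std::string& payload) {
    std::string::size_type start = 0;
    const std::string marker = ";base64,";
    if (payload.compare(0, 5, "data:") == 0) {
        const auto pos = payload.find(marker);
        if (pos == std::string::npos)
            throw std::invalid_argument("not a base64 data URL");
        start = pos + marker.size();
    }

    std::vector<std::uint8_t> bytes;
    std::uint32_t buffer = 0;  // accumulated bits (only the low bits matter)
    int bits = 0;              // number of pending bits in buffer
    for (auto i = start; i < payload.size(); ++i) {
        const char c = payload[i];
        if (c == '=') break;                 // padding: no more data
        const int v = base64Value(c);
        if (v < 0) continue;                 // skip line breaks and other non-alphabet characters
        buffer = (buffer << 6) | static_cast<std::uint32_t>(v);
        bits += 6;
        if (bits >= 8) {                     // a full byte is available
            bits -= 8;
            bytes.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return bytes;
}
```

If the string is sent as application/x-www-form-urlencoded without escaping, the '+' characters become spaces on the server, so encoding it with encodeURIComponent on the client avoids a silently corrupted JPEG.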
I have cheated a little, as I used frames from a video with known intrinsic parameters. But I have two options to handle truly unknown pictures:
- If the JPEG has Exif data, I can use it to find the focal length (using a CCD database and a JS Exif parser)
- Otherwise, I am planning to implement this in C++
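The Exif option boils down to one conversion: once the sensor width is known, a focal length in millimetres becomes a focal length in pixels, f_px = f_mm × image width (px) / sensor width (mm). Below is a minimal C++ sketch, assuming the sensor width has already been looked up by camera model in the CCD database (the lookup is not shown) and that the longer image side corresponds to the sensor width. The function names are hypothetical. The pixel width must be that of the image actually sent to the server, i.e. after the client-side resize.

```cpp
#include <algorithm>
#include <stdexcept>

// Convert an Exif focal length (mm) to a focal length in pixels.
// sensorWidthMm: width of the CCD/CMOS sensor in mm (assumed to come from a CCD database).
// Assumes the longer image side maps to the sensor width, so portrait shots work too.
double focalLengthPixels(double focalMm, double sensorWidthMm,
                         int imageWidthPx, int imageHeightPx) {
    if (focalMm <= 0.0 || sensorWidthMm <= 0.0)
        throw std::invalid_argument("focal length and sensor width must be positive");
    const int longSidePx = std::max(imageWidthPx, imageHeightPx);
    return focalMm * longSidePx / sensorWidthMm;
}

// Fallback when Exif only provides FocalLengthIn35mmFilm:
// a 35 mm film frame is 36 mm wide.
double focalLengthPixelsFrom35mm(double focal35mm, int imageWidthPx, int imageHeightPx) {
    return focalLengthPixels(focal35mm, 36.0, imageWidthPx, imageHeightPx);
}
```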
I hope you enjoy this demo as much as I do, and that Google and Microsoft are working in that direction too (especially Microsoft with Read/Write World…)
Wow, it looks cool! Are you planning to extend the point cloud with the added photos?
How do you estimate the right pose? I believe you only have the point cloud downloaded from Photosynth and you need to find the relation between the photo and the point cloud, or do you have something else?
@David: I wish I had the servers needed to make this demo live. But I’m neither Microsoft nor Google! This is why I really hope that Microsoft is working in that direction with Read/Write World…
The pose is computed using OpenSynther: I’m not using the Photosynth point cloud, only the camera positions, to compute my own SIFT-based point cloud. Then I estimate the pose with a RANSAC EPnP solution using 2D/3D correspondences. You’ll find more information in my augmented reality post, which describes the same tracking algorithm, except that in this demo it runs on a server.
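The RANSAC half of a RANSAC EPnP estimator can be sketched independently of the minimal solver. Each candidate pose, computed by EPnP from a random sample of 2D/3D correspondences (EPnP needs at least 4), is scored by counting the correspondences that reproject within a pixel threshold, and the number of iterations adapts to the observed inlier ratio. The C++ sketch below assumes an OpenCV-style pose (x_cam = R·X + t) and a distortion-free pinhole camera. It does not include EPnP itself, all names are hypothetical, and it is not OpenSynther's code. With OpenCV, cv::solvePnPRansac with the SOLVEPNP_EPNP flag wraps the whole loop.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Point2 { double x, y; };
struct Point3 { double x, y, z; };

// Pose in OpenCV convention: x_cam = R * X_world + t, with R stored row-major.
struct Pose {
    std::array<double, 9> R;
    std::array<double, 3> t;
};

// Pinhole intrinsics; lens distortion is assumed to be zero.
struct Intrinsics { double fx, fy, cx, cy; };

// Squared reprojection error (px^2) of one correspondence, or +infinity
// if the 3D point lies behind the camera.
double reprojectionError2(const Pose& p, const Intrinsics& K,
                          const Point3& X, const Point2& u) {
    const double xc = p.R[0] * X.x + p.R[1] * X.y + p.R[2] * X.z + p.t[0];
    const double yc = p.R[3] * X.x + p.R[4] * X.y + p.R[5] * X.z + p.t[1];
    const double zc = p.R[6] * X.x + p.R[7] * X.y + p.R[8] * X.z + p.t[2];
    if (zc <= 0.0) return std::numeric_limits<double>::infinity();
    const double du = K.fx * xc / zc + K.cx - u.x;
    const double dv = K.fy * yc / zc + K.cy - u.y;
    return du * du + dv * dv;
}

// Score a candidate pose: number of correspondences reprojecting within thresholdPx.
std::size_t countInliers(const Pose& p, const Intrinsics& K,
                         const std::vector<Point3>& X, const std::vector<Point2>& u,
                         double thresholdPx) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < X.size() && i < u.size(); ++i)
        if (reprojectionError2(p, K, X[i], u[i]) < thresholdPx * thresholdPx) ++n;
    return n;
}

// Iterations needed to draw at least one all-inlier sample of sampleSize
// correspondences with probability `confidence`, given inlier ratio w:
// N = log(1 - confidence) / log(1 - w^sampleSize), capped at maxIters.
int ransacIterations(double confidence, double inlierRatio, int sampleSize, int maxIters) {
    if (inlierRatio <= 0.0) return maxIters;
    if (inlierRatio >= 1.0) return 1;
    const double pAllInliers = std::pow(inlierRatio, sampleSize);
    if (pAllInliers <= 0.0) return maxIters;  // underflow guard
    const double n = std::log(1.0 - confidence) / std::log(1.0 - pAllInliers);
    return n < maxIters ? static_cast<int>(std::ceil(n)) : maxIters;
}

// Camera centre in world coordinates: C = -R^T t.
Point3 cameraCenter(const Pose& p) {
    return { -(p.R[0] * p.t[0] + p.R[3] * p.t[1] + p.R[6] * p.t[2]),
             -(p.R[1] * p.t[0] + p.R[4] * p.t[1] + p.R[7] * p.t[2]),
             -(p.R[2] * p.t[0] + p.R[5] * p.t[1] + p.R[8] * p.t[2]) };
}
```

For example, with a 50% inlier ratio, a sample size of 4 and 99% confidence, ransacIterations gives 72 iterations. To drive a Three.js camera, cameraCenter gives its position, but Three.js cameras look down −z with y up (OpenCV cameras look down +z with y down), so the rotation still needs its y and z axes flipped.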
You are probably aware of Willow Garage’s support for OpenCV (computer vision) and PCL (the Point Cloud Library) in C++. There are also Python bindings for OpenCV, and NVIDIA CUDA acceleration for parts of OpenCV. http://opencv.willowgarage.com/wiki/ http://pointclouds.org/
Whoops. You have an OpenCV link on your demo page. Disregard my previous comment.
Google, Microsoft and others are expanding their use of machine learning algorithms in the computer vision domain. They use segmentation to break the image into pieces and then use Bayesian inference for recognition. See, for example, Prof. Fei-Fei Li’s presentations on machine learning applied to computer vision:
http://people.csail.mit.edu/torralba/shortCourseRLOC/index.html
http://videolectures.net/mlas06_li_gmvoo/
Have a look at the http://www.iqengines.com/ web service. You get 1000 free queries to play with, and you can even submit your own training data.