<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>Visual-Experiments.com &#187; sift</title>
	<atom:link href="http://www.visual-experiments.com/tag/sift/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.visual-experiments.com</link>
	<description>ASTRE Henri experiments with Ogre3D and web stuff</description>
	<lastBuildDate>Mon, 16 Jan 2017 18:59:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
		<item>
		<title>Outdoor tracking using panoramic image</title>
		<link>http://www.visual-experiments.com/2010/12/22/outdoor-tracking-using-panoramic-image/</link>
		<comments>http://www.visual-experiments.com/2010/12/22/outdoor-tracking-using-panoramic-image/#comments</comments>
		<pubDate>Wed, 22 Dec 2010 13:10:29 +0000</pubDate>
		<dc:creator>Henri</dc:creator>
				<category><![CDATA[augmented reality]]></category>
		<category><![CDATA[ogre3d]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[sift]]></category>
		<guid isPermaLink="false">http://www.visual-experiments.com/?p=1167</guid>
		<description><![CDATA[I have made this experiment in 2 days: First of all, I must admit that this is more of a &#8220;proof of concept&#8221; than a prototype&#8230; But the goal was to illustrate a concept needed for my job. I love this kind of challenge! Building something like this in 2 days was only possible thanks to great [...]]]></description>
			<content:encoded><![CDATA[<p>I have made this experiment in 2 days:</p>
<p><object width="560" height="340"><param name="movie" value="http://www.youtube.com/v/ZmbP022QXpk?fs=1&amp;hl=en_US"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/ZmbP022QXpk?fs=1&amp;hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="560" height="340"></embed></object></p>
<p>First of all, I must admit that this is more of a &#8220;proof of concept&#8221; than a prototype&#8230; But the goal was to illustrate a concept needed for my job. I love this kind of challenge! Building something like this in 2 days was only possible thanks to great open-source libraries:</p>
<ul style="margin-left: 20px;">
<li><a href="http://www.ogre3d.org/">Ogre3D</a> (MIT)</li>
<li><a href="http://opencv.willowgarage.com/wiki/">OpenCV</a> (BSD)</li>
<li><a href="http://www.cs.unc.edu/~ccwu/siftgpu/">SiftGPU</a> (non-profit license)</li>
</ul>
<h3>Analysis</h3>
<p>I&#8217;m using a panoramic image as reference. For each frame of the video I&#8217;m extracting Sift features using SiftGPU and matching them with those of the reference image. Then I&#8217;m computing the homography between the 2 images with OpenCV&#8217;s Ransac-based homography estimator (cvFindHomography).</p>
<h3>Performance</h3>
<p>Performance is low due to the complexity of Sift detection and matching, and because I&#8217;m applying the homography on the CPU with cvWarpPerspective.</p>
<style type="text/css">
table.result {
color: black;
border: 1px solid black;
}
table.result td {
text-align: left;
padding: 1px;
}
</style>
<table class="result">
<tr>
<td>Sift extraction:</td>
<td>28ms</td>
<td>1228 features</td>
</tr>
<tr>
<td>Sift matching:</td>
<td>17ms</td>
<td>using SiftGPU</td>
</tr>
<tr>
<td>Ransac Homography estimation:</td>
<td>2ms</td>
<td>89 inliers of 208 matches</td>
</tr>
<tr>
<td>Homography application:</td>
<td>36ms</td>
<td>done on the CPU with OpenCV</td>
</tr>
<tr>
<td colspan="3">Global: 12fps</td>
</tr>
</table>
<div style="height: 20px;">&nbsp;</div>
<p>I&#8217;m working on another version using <a href="http://svr-www.eng.cam.ac.uk/~er258/work/fast.html">Fast</a> (or <a href="http://www6.in.tum.de/Main/ResearchAgast">Agast</a>) as feature detector and <a href="http://cvlab.epfl.ch/software/brief/index.php">Brief</a> as descriptor. This should lead to a significant speed-up and may eventually run on a mobile device&#8230; Using the GPU vertex and pixel shaders instead of the CPU to apply the homography should also give a nice speed-up.</p>
<p>I&#8217;m also aware that it is not correct to apply a homography to a cylindrical panoramic image (especially if you don&#8217;t undistort the input video frames too <img src='http://www.visual-experiments.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> )</p>
<p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save#url=http%3A%2F%2Fwww.visual-experiments.com%2F2010%2F12%2F22%2Foutdoor-tracking-using-panoramic-image%2F&amp;title=Outdoor%20tracking%20using%20panoramic%20image"><img src="http://www.visual-experiments.com/blog/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://www.visual-experiments.com/2010/12/22/outdoor-tracking-using-panoramic-image/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Augmented Reality outdoor tracking becoming reality</title>
		<link>http://www.visual-experiments.com/2010/12/13/augmented-reality-outdoor-tracking-becoming-reality/</link>
		<comments>http://www.visual-experiments.com/2010/12/13/augmented-reality-outdoor-tracking-becoming-reality/#comments</comments>
		<pubDate>Mon, 13 Dec 2010 10:08:11 +0000</pubDate>
		<dc:creator>Henri</dc:creator>
				<category><![CDATA[ogre3d]]></category>
		<category><![CDATA[photogrammetry]]></category>
		<category><![CDATA[photosynth]]></category>
		<category><![CDATA[augmented reality]]></category>
		<category><![CDATA[bundler]]></category>
		<category><![CDATA[sift]]></category>
		<category><![CDATA[tracking]]></category>
		<guid isPermaLink="false">http://www.visual-experiments.com/?p=909</guid>
		<description><![CDATA[My interest in structure from motion was primarily motivated by the capability of creating a point cloud that can be used as a tracking reference. The video below is more of a proof of concept than a prototype, but it gives an overview of my outdoor tracking algorithm for Augmented Reality: Analysis In a pre-processing step [...]]]></description>
			<content:encoded><![CDATA[<p>My interest in structure from motion was primarily motivated by the capability of creating a point cloud that can be used as a tracking reference. The video below is more of a proof of concept than a prototype, but it gives an overview of my <strong>outdoor tracking algorithm for Augmented Reality</strong>:</p>
<p><object width="560" height="340"><param name="movie" value="http://www.youtube.com/v/DdVz4xQJPC0?fs=1&amp;hl=en_US"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/DdVz4xQJPC0?fs=1&amp;hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="560" height="340"></embed></object></p>
<h3>Analysis</h3>
<p>In a pre-processing step I&#8217;ve built a sparse point cloud of the place using my <a href="http://www.visual-experiments.com/2010/11/05/structure-from-motion-toolkit-released/">SFMToolkit</a>. Each vertex of the point cloud has several 2D Sift feature correspondences. I&#8217;ve kept only one Sift descriptor per vertex (the mean of its descriptors) and put all descriptors in an index using <a href="http://www.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN">Flann</a>.</p>
<p>For each frame of the video to be augmented, I&#8217;ve extracted Sift features with <a href="http://www.cs.unc.edu/~ccwu/siftgpu/">SiftGPU</a> and then matched them using a Flann 2-nearest-neighbor search and a distance ratio threshold. The <a href="http://www.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN">Flann</a> matching is done in parallel with <a href="http://threadpool.sourceforge.net/">boost::threadpool</a>. The computed matches contain a lot of outliers, so I have implemented a <a href="http://en.wikipedia.org/wiki/RANSAC">Ransac</a> pose estimator using <a href="http://cvlab.epfl.ch/software/EPnP/">EPnP</a> that filters out bad 2D/3D correspondences.</p>
<h3>Performance</h3>
<p>My implementation is slow, mostly because of my Ransac EPnP implementation, which could be improved.</p>
<style type="text/css">
table.result {
color: black;
border: 1px solid black;
}
table.result td {
text-align: left;
padding: 1px;
}
</style>
<table class="result">
<tr>
<td colspan="3">Sift first octave: -1</td>
</tr>
<tr>
<td>Sift extraction:</td>
<td>49ms</td>
<td>2917 features</td>
</tr>
<tr>
<td>Sift matching:</td>
<td>57ms</td>
<td>(parallel matching using Flann)</td>
</tr>
<tr>
<td>Ransac EPnP:</td>
<td>110ms</td>
<td>121 inliers of 208 matches</td>
</tr>
<tr>
<td colspan="3">Global: 4.6fps (9.4fps without pose estimation)</td>
</tr>
</table>
<p></p>
<table class="result">
<tr>
<td colspan="3">Sift first octave: 0</td>
</tr>
<tr>
<td>Sift extraction:</td>
<td>32ms</td>
<td>707 features</td>
</tr>
<tr>
<td>Sift matching:</td>
<td>15ms</td>
<td>(parallel matching using Flann)</td>
</tr>
<tr>
<td>Ransac EPnP:</td>
<td>144ms</td>
<td>62 inliers of 93 matches</td>
</tr>
<tr>
<td colspan="3">Global: 5.2fps (21.2fps without pose estimation)</td>
</tr>
</table>
<p>The slowness is not such a big issue because the system doesn&#8217;t need to run at 30fps: the goal of my prototype is to get an absolute pose from this tracking system every second and relative poses from the inertial sensors available on mobile devices (or from KLT tracking).</p>
<h3>Issues</h3>
<ul style="margin-left: 20px;">
<li>Performance (faster is better <img src='http://www.visual-experiments.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> )</li>
<li>Point cloud reference is not always accurate (<a href="http://phototour.cs.washington.edu/bundler/">Bundler</a>&#8217;s fault)</li>
</ul>
<p>In another post I&#8217;ll introduce alternatives to Bundler that are faster and more accurate.</p>
<p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save#url=http%3A%2F%2Fwww.visual-experiments.com%2F2010%2F12%2F13%2Faugmented-reality-outdoor-tracking-becoming-reality%2F&amp;title=Augmented%20Reality%20outdoor%20tracking%20becoming%20reality"><img src="http://www.visual-experiments.com/blog/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://www.visual-experiments.com/2010/12/13/augmented-reality-outdoor-tracking-becoming-reality/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Pose Estimation using SfM point cloud</title>
		<link>http://www.visual-experiments.com/2010/07/12/pose-estimation-using-sfm-point-cloud/</link>
		<comments>http://www.visual-experiments.com/2010/07/12/pose-estimation-using-sfm-point-cloud/#comments</comments>
		<pubDate>Mon, 12 Jul 2010 08:42:14 +0000</pubDate>
		<dc:creator>Henri</dc:creator>
				<category><![CDATA[augmented reality]]></category>
		<category><![CDATA[ogre3d]]></category>
		<category><![CDATA[bundler]]></category>
		<category><![CDATA[gpusurf]]></category>
		<category><![CDATA[pose estimation]]></category>
		<category><![CDATA[sift]]></category>
		<category><![CDATA[structure from motion]]></category>
		<guid isPermaLink="false">http://www.visual-experiments.com/?p=600</guid>
		<description><![CDATA[The idea of this pose estimator is based on PTAM (Parallel Tracking and Mapping). PTAM is capable of tracking in an unknown environment thanks to the mapping done in parallel. But in fact if you want to augment reality, it&#8217;s generally because you already know what you are looking at. So, being able to have [...]]]></description>
			<content:encoded><![CDATA[<p>The idea of this pose estimator is based on <a href="http://www.robots.ox.ac.uk/~gk/PTAM/">PTAM</a> <em>(Parallel Tracking and Mapping)</em>. PTAM is capable of tracking in an unknown environment thanks to the mapping done in parallel. But if you want to augment reality, it&#8217;s generally because you already know what you are looking at. So tracking that works in an unknown environment is not always needed. My idea was simple: <strong>instead of doing the mapping in parallel, why not use SfM in a pre-processing step?</strong></p>
<table>
<tbody style="background-color: white">
<tr>
<td colspan="2"><img src="http://www.visual-experiments.com/blog/wp-content/uploads/2010/07/sfm.pose_.estimation.png" alt="" title="sfm.pose.estimation" width="571" height="258" class="alignnone size-full wp-image-621" /></td>
</tr>
<tr>
<td>input: point cloud + camera shot</td>
<td>output: position and orientation of the camera</td>
</tr>
</tbody>
</table>
<div style="height: 10px"></div>
<p>So my outdoor tracking algorithm will eventually work like this:</p>
<ul style="margin-left: 20px">
<li>pre-processing step
<ul style="margin-left: 20px">
<li>generate a point cloud of the outdoor scene you want to track using Bundler</li>
<li>create a binary file with a descriptor <em>(Sift/Surf)</em> per vertex of the point cloud</li>
</ul>
</li>
<li>in real-time, for each frame N:
<ul style="margin-left: 20px">
<li>extract features using <a href="http://mi.eng.cam.ac.uk/~er258/work/fast.html">FAST</a></li>
<li>match features against frame N-1 using 2D patches</li>
<li>compute <strong>&#8220;relative pose&#8221;</strong> between frame N and N-1</li>
</ul>
</li>
<li>in almost real-time, for each &#8220;key frame&#8221;:
<ul style="margin-left: 20px">
<li>extract features and descriptors</li>
<li>match descriptors against those of the point cloud</li>
<li>generate 2D/3D correspondences from the matches</li>
<li>compute <strong>&#8220;absolute pose&#8221;</strong> using PnP solver <em>(<a href="http://cvlab.epfl.ch/software/EPnP/">EPnP</a>)</em></li>
</ul>
</li>
</ul>
<p>The tricky part is that the absolute pose computation can span several &#8220;relative pose&#8221; estimations. So once you&#8217;ve got the absolute pose, you&#8217;ll have to compensate for the delay by accumulating the relative poses computed in the meantime&#8230;</p>
<p>This is what I&#8217;ve got so far:</p>
<ul style="margin-left: 20px">
<li><strong>pre-processing step:</strong> binary file generated using SiftGPU (planning to move to my GPUSurf implementation) and Bundler (planning to move to <a href="http://insight3d.sourceforge.net/">Insight3D</a> or to implement it myself using <a href="http://www.ics.forth.gr/~lourakis/sba/index.html">sba</a>)</li>
<li><strong>relative pose:</strong> I don&#8217;t have an implementation of the relative pose estimator yet</li>
<li><strong>absolute pose:</strong> it&#8217;s basically working but needs some improvements:
<ul style="margin-left: 20px">
<li>switch feature extraction/matching from Sift to Surf</li>
<li>remove unused descriptors to speed up the matching step (by scoring descriptors used as inliers against training data)</li>
<li>use another PnP solver (or add Ransac to handle outliers and get more accurate results)</li>
</ul>
</li>
</ul>
<p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save#url=http%3A%2F%2Fwww.visual-experiments.com%2F2010%2F07%2F12%2Fpose-estimation-using-sfm-point-cloud%2F&amp;title=Pose%20Estimation%20using%20SfM%20point%20cloud"><img src="http://www.visual-experiments.com/blog/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://www.visual-experiments.com/2010/07/12/pose-estimation-using-sfm-point-cloud/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Remote Augmented Reality Prototype</title>
		<link>http://www.visual-experiments.com/2010/07/11/remote-augmented-reality-prototype/</link>
		<comments>http://www.visual-experiments.com/2010/07/11/remote-augmented-reality-prototype/#comments</comments>
		<pubDate>Sun, 11 Jul 2010 17:30:03 +0000</pubDate>
		<dc:creator>Henri</dc:creator>
				<category><![CDATA[ogre3d]]></category>
		<category><![CDATA[artoolkit]]></category>
		<category><![CDATA[augmented reality]]></category>
		<category><![CDATA[boost]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[sift]]></category>
		<guid isPermaLink="false">http://www.visual-experiments.com/?p=514</guid>
		<description><![CDATA[I have created a new augmented reality prototype (a 5-day experiment). It is using a client/server approach based on Boost.Asio. The first assumption of this prototype is that you&#8217;ve got a not-so-powerful mobile client and a powerful server with a decent GPU. So the idea is simple: the client uploads a video frame [...]]]></description>
			<content:encoded><![CDATA[<p>I have created a new augmented reality prototype (a 5-day experiment). It is using a client/server approach based on <a href="http://think-async.com/">Boost.Asio</a>. The first assumption of this prototype is that you&#8217;ve got a not-so-powerful mobile client and a powerful server with a decent GPU.<br />
<img src="http://www.visual-experiments.com/blog/wp-content/uploads/2010/07/remoteArToolKit.png" alt="" title="remoteArToolKit" width="467" height="205" class="alignnone size-full wp-image-528" /></p>
<table>
<tbody style="background-color: white; color: #4D4D4D; text-align: left; vertical-align: top;">
<tr>
<td>So the idea is simple: the client uploads a video frame, and the server does the pose estimation and sends back the augmented rendering to the client. My first prototype uses ArToolKitPlus in almost real-time (15fps), but I&#8217;m also working on a markerless version that would be less interactive (< 1fps). The mobile client was a UMPC (Samsung Q1).</td>
<td><img src="http://www.visual-experiments.com/blog/wp-content/uploads/2010/07/samsung.q1.jpg" alt="" title="samsung.q1" width="150" height="135" class="alignnone size-full wp-image-583" /></td>
</tr>
</tbody>
</table>
<p>Thanks to Boost.Asio I&#8217;ve been able to build a solid client/server very quickly. I then created two implementations of PoseEstimator:</p>
<pre class="brush: cpp; title: ;">
class PoseEstimator
{
	public:
		virtual ~PoseEstimator() {}

		// Estimate the camera pose from the current video frame
		virtual bool computePose(const Ogre::PixelBox&amp; videoFrame) = 0;

		virtual Ogre::Vector3 getPosition() const = 0;
		virtual Ogre::Quaternion getOrientation() const = 0;
};
</pre>
<ul style="margin-left: 20px">
<li>ArToolKitPoseEstimator <em>(using <a href="http://studierstube.icg.tu-graz.ac.at/handheld_ar/artoolkitplus.php">ArToolKitPlus</a> to get pose estimation)</em></li>
<li>SfMPoseEstimator <em>(using <a href="http://cvlab.epfl.ch/software/EPnP/">EPnP</a> and a point cloud generated with <a href="http://phototour.cs.washington.edu/bundler/">Bundler</a>, a Structure from Motion tool, to get pose estimation)</em></li>
</ul>
<h3>ArToolKitPoseEstimator</h3>
<p>There is nothing fancy about this pose estimator; I&#8217;ve implemented it as a proof of concept and to check my server performance. In fact, ArToolKit pose estimation is not expensive and can run in real-time on a mobile device.</p>
<h3>SfMPoseEstimator</h3>
<p>I&#8217;ll just introduce the concept of this pose estimator in this post. The idea is simple: in augmented reality you generally know the object you are looking at, because you want to augment it. The idea was to create a point cloud of the object you want to augment (using Structure from Motion) and keep the link between the 3D points and their 2D descriptors. Thus, when you take a shot of the scene, you can compare the 2D descriptors of your shot with those of the point cloud and so create 2D/3D correspondences. Then the pose can be estimated by solving the Perspective-n-Point camera calibration problem (using <a href="http://cvlab.epfl.ch/software/EPnP/index.php">EPnP</a> for example).</p>
<h3>Performance</h3>
<p>The server is very basic; it doesn&#8217;t handle client queuing yet (1 client = 1 thread), but it already does the off-screen rendering and sends back the texture in raw RGB.</p>
<p>The version using ArToolKit is only running at 15fps because I had trouble with the jpeg compression, so I turned it off. This version is therefore only bandwidth-limited. I didn&#8217;t investigate this issue much because I know that the SfMPoseEstimator is going to be limited by the matching step. Furthermore, I&#8217;m not sure that it&#8217;s a good idea to send highly compressed images to the server (compression artifacts can add extra features).</p>
<p>My SfMPoseEstimator is also working, but it&#8217;s very expensive (~1s using the GPU) and it&#8217;s not always accurate due to some flaws in my original implementation. I&#8217;ll explain how it works in a following post.</p>
<p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save#url=http%3A%2F%2Fwww.visual-experiments.com%2F2010%2F07%2F11%2Fremote-augmented-reality-prototype%2F&amp;title=Remote%20Augmented%20Reality%20Prototype"><img src="http://www.visual-experiments.com/blog/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://www.visual-experiments.com/2010/07/11/remote-augmented-reality-prototype/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
