Thursday, November 1, 2007

Champion of RoboCup2007@Home

Good news: the Autonomous Robot Group of Shanghai Jiao Tong University won the championship of RoboCup2007@Home--China Open!

Equipped with many sensors, including:
  • a front USB camera
  • an omni-directional camera
  • a SICK laser scanner
  • odometers
our robot@home is quite smart and powerful, able to perform various tasks in a natural home environment, such as:
  • recognizing many different objects;
  • tracking people according to special color patterns;
  • self-positioning with vision technology;
  • measuring distance and avoiding obstacles.
I developed the whole vision system, whose functions include:
  • object recognition
  • color tracking with CamShift (see the sketch below)
  • 3D positioning with the omni-directional vision system
Experiments show that the whole system works quite well and helps the robot localize itself quickly and accurately.
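As a concrete illustration of the CamShift color-tracking item above, here is a minimal sketch of CamShift tracking with OpenCV. It is not our actual robot code; the camera index, initial window, and histogram parameters are assumptions.

#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap(0);                  // front USB camera (assumed index 0)
    if (!cap.isOpened()) return -1;

    cv::Mat frame, hsv, hue, backproj, hist;
    cv::Rect track_window(200, 150, 80, 120); // initial guess of the target region
    int histSize = 30;
    float hranges[] = {0, 180};
    const float* ranges[] = {hranges};

    // Build a hue histogram of the target's color pattern from the first frame.
    cap >> frame;
    cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
    cv::extractChannel(hsv, hue, 0);
    cv::Mat roi(hue, track_window);
    cv::calcHist(&roi, 1, 0, cv::Mat(), hist, 1, &histSize, ranges);
    cv::normalize(hist, hist, 0, 255, cv::NORM_MINMAX);

    // Track the colored target frame by frame with CamShift.
    while (cap.read(frame)) {
        cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
        cv::extractChannel(hsv, hue, 0);
        cv::calcBackProject(&hue, 1, 0, hist, backproj, ranges);
        cv::RotatedRect box = cv::CamShift(backproj, track_window,
            cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1));
        cv::ellipse(frame, box, cv::Scalar(0, 255, 0), 2);
        cv::imshow("tracking", frame);
        if (cv::waitKey(30) == 27) break;     // ESC to quit
    }
    return 0;
}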

Saturday, October 27, 2007

Farewell to TOEFL and GRE

Finally done with the exams~~~
It works out to exactly two exams in one month, TOEFL & GRE, so my life feels a bit more complete. Looking back, there is some sense of accomplishment; it seems I do have some perseverance and patience after all, heh.
I originally had no intention of taking these exams, but back then a senior classmate gave me a "lecture": don't just drift along; once you decide to do something, you can naturally get it done.
Thinking about it now, that is indeed true: only with something to pursue is there motivation. Isn't life just one long exercise in pushing yourself?

A little tired, but with a sense of self-satisfaction ^_^

Monday, September 3, 2007

SIFT Implementation

SIFT is perhaps the most popular invariant local feature detector at present, and it has been applied successfully on many occasions, such as object recognition, image matching and registration, SFM & 3D reconstruction, vSLAM, and image panoramas.
Previously, I developed several versions of the SIFT algorithm, as well as Lowe's object recognition system described in his paper (IJCV04). One of those versions, named Harris-SIFT, works quite well and fast in my previous vision system--RTL (Real-time Tracking and Localization). In fact, it is a simplified SIFT that retains the virtues of the original algorithm.
Recently, I found someone mention that Andrea Vedaldi (UCLA) had developed a good version of SIFT, named siftpp. I looked deep into it and found its performance quite similar to that of Lowe's demo. More importantly, that code strictly follows the steps in Lowe's paper, so it is easy to read and a good way to study Lowe's algorithm. However, it runs slowly and has some bugs. I'd like to optimize the code so that it can be applied on other occasions.
Other code I referred to includes Rob Hess's SIFT and the iLab Neuromorphic Vision C++ Toolkit. The former is implemented in C and uses OpenCV. In fact, his implementation is, in some sense, more correct than siftpp. However, the code is not very efficient, due to some redundant processing and a not-well-organized feature structure. Furthermore, I personally favor C++ over C, because C++ provides many advanced features that are powerful, useful, and convenient for writing robust code, especially for algorithms and libraries. The latter is an excellent vision library built around a neuromorphic model of human vision. It is still under development and a CVS version is available. I prefer the structure of this library, and it is implemented with modern C++ techniques, such as template metaprogramming and several design patterns. A SIFT-based object recognition framework is nested in the library, which can be a good reference.
Those are the main sources of my references, and they are quite enough. Recently I spent three days rewriting the SIFT algorithm, and it is now finished. Its main properties are as follows:

1. Correct and Accurate
The algorithm routines of the code strictly conform to Lowe's paper, and the results are quite similar to those of Lowe's demo.
In fact, the feature location, scale, orientation, and descriptor are all approximately the same as those from Lowe's demo. For example, below are two sets of sample features extracted with my code and Lowe's demo respectively. There are only small differences between them:

##### my code: (the x and y positions are printed in the opposite order to Lowe's)
183.749 279.852 16.3574 4.86277
0 27 53 26 39 37 5 0 16 19 19 18 44 32 23 35 115 72 8 1
2 3 14 52 30 13 3 22 26 17 7 2 27 112 115 14 4 1 1 1
115 47 22 12 6 2 27 99 70 28 37 7 1 2 53 102 24 20 22 44
92 26 3 20 57 30 7 1 2 18 102 29 115 79 47 28 1 3 12 33
15 81 115 78 4 5 10 9 7 101 115 39 36 11 1 2 8 0 1 3
4 86 115 18 14 10 9 10 7 43 86 37 3 13 15 36 56 37 42 3
14 50 55 31 12 2 2 10

##### Lowe's demo:
279.57 183.61 16.34 -1.453
0 20 44 24 43 43 6 0 20 23 20 16 40 32 22 36 116 77 8 1
1 2 11 46 23 12 3 25 29 16 7 1 22 105 116 16 5 0 0 1
116 45 21 11 7 2 27 100 73 28 38 8 0 1 49 100 22 19 24 42
96 31 3 18 56 36 10 1 1 12 91 27 116 88 45 28 0 1 11 30
15 75 116 86 4 4 10 8 6 99 116 40 31 11 0 1 9 0 0 1
2 74 116 20 17 11 9 9 5 41 85 40 2 11 13 33 55 36 47 4
14 46 55 30 15 2 1 10
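For reference, each listing above is one record per feature: position, scale, and orientation on the first line, followed by 128 integer descriptor values. Below is a minimal sketch of a reader for that layout; the file name and the exact interpretation of the fields are my assumptions.

#include <array>
#include <fstream>
#include <iostream>
#include <vector>

struct SiftFeature {
    float x, y;                 // keypoint position (note the x/y order differs between the two dumps)
    float scale, orientation;   // scale in pixels, orientation in radians
    std::array<int, 128> desc;  // 4x4x8 gradient-orientation histogram, quantized
};

// Read one feature record in the textual format shown above.
bool readFeature(std::istream& in, SiftFeature& f)
{
    if (!(in >> f.x >> f.y >> f.scale >> f.orientation)) return false;
    for (int& v : f.desc)
        if (!(in >> v)) return false;
    return true;
}

int main()
{
    std::ifstream in("features.txt");   // assumed dump file, one record per feature
    std::vector<SiftFeature> feats;
    SiftFeature f;
    while (readFeature(in, f)) feats.push_back(f);
    std::cout << feats.size() << " features read" << std::endl;
    return 0;
}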

Sample feature images: (left - my implementation, right - Lowe's demo)
2. Fast
I have put a lot of effort into optimizing the code. Needless to say, SIFT is very time-consuming, due to the series of Gaussian blurs needed to construct the scale space and the many nested loops needed to generate descriptors. Convolutions and floating-point operations such as exp, floor, sin, and cos are known to be expensive. Thus it is hardly possible for SIFT to attain real-time performance on current CPUs (the fastest implementation I know of--a commercial product--can run at 10 fps on a 320x240 video stream, quite marvelous!). Some people have tried to speed up SIFT with the GPU and achieved good results. I think this is a good idea. However, first I'd like to implement a correct, good, and fast version by myself. In my opinion, two approaches can improve speed--simplification and code optimization--both of which I have tried, with moderately satisfying results.
First I'd like to mention Harris-SIFT, my own modification. It reduces the complexity of SIFT a lot while maintaining its good quality. It now runs at 4~5 fps on a 320x240 video stream and offers good object recognition performance. Demos can be seen in my previous blog posts.
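The general idea can be sketched with OpenCV primitives: keep only keypoints that have a strong Harris corner response, so far fewer descriptors need to be computed. This is only an illustration of the concept, not my actual Harris-SIFT code; the input file and thresholds are assumptions.

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::Mat img = cv::imread("scene.png", cv::IMREAD_GRAYSCALE);  // assumed test image
    if (img.empty()) return -1;

    // Standard SIFT keypoint detection.
    cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
    std::vector<cv::KeyPoint> keypoints;
    sift->detect(img, keypoints);

    // Harris response map; discard keypoints that do not lie on corner-like structures.
    cv::Mat harris;
    cv::cornerHarris(img, harris, 3, 3, 0.04);
    double minv, maxv;
    cv::minMaxLoc(harris, &minv, &maxv);

    std::vector<cv::KeyPoint> kept;
    for (const auto& kp : keypoints) {
        cv::Point p(cvRound(kp.pt.x), cvRound(kp.pt.y));
        if (harris.at<float>(p) > 0.01 * maxv)                     // threshold is an assumption
            kept.push_back(kp);
    }

    // Descriptors are computed only for the surviving keypoints.
    cv::Mat descriptors;
    sift->compute(img, kept, descriptors);

    std::cout << keypoints.size() << " -> " << kept.size()
              << " keypoints after Harris filtering" << std::endl;
    return 0;
}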
Second, the new implementation is sped up mainly through code optimization. Many techniques are used, such as fast approximations of exp and floor, loop vectorization, parallel computing, OpenMP, SIMD... Besides these, I also used some excellent tools, such as Intel's C++ Compiler, IPP, and VTune. For optimization skills and tricks, I recommend the excellent book The Software Optimization Cookbook from Intel Press.
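One of the "fast exp" tricks mentioned above can be illustrated with Schraudolph's well-known IEEE-754 approximation. This is a sketch of the general technique, not necessarily the exact routine in my code; its error of a few percent is acceptable for Gaussian weighting of descriptor samples but not for general numerics.

#include <cstdint>
#include <cstring>
#include <cmath>
#include <cstdio>

inline double fast_exp(double x)
{
    // Build the bit pattern of exp(x) directly: 2^20/ln(2) scales x into the
    // exponent field; the additive constant biases the result to reduce error.
    int64_t bits = static_cast<int64_t>(1512775.3951951856 * x + 1072632447.0) << 32;
    double y;
    std::memcpy(&y, &bits, sizeof(y));
    return y;
}

int main()
{
    for (double x = -4.0; x <= 0.0; x += 0.5)
        std::printf("x=%5.2f  std::exp=%.5f  fast_exp=%.5f\n", x, std::exp(x), fast_exp(x));
    return 0;
}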
Below are some of the results (HT P4 2.8 GHz, 1 GB RAM, WinXP SP2). Harris-SIFT and My IMP are my two implementations.

Algorithm      Image Size   Feat. num   Time (s)
Lowe's Demo    640x480      4160        ~4
Harris-SIFT    640x480      2647        1.124
My IMP         640x480      3041        1.4
Siftpp         640x480      3667        5.25
siftFeat       640x480      3604        3.793


Algorithm      Image Size   Feat. num   Time (s)
Lowe's Demo    320x240      1126        ~1
Harris-SIFT    320x240      838         0.323
My IMP         320x240      809         0.366
Siftpp         320x240      977         1.21
siftFeat       320x240      987         1.032

Thursday, August 16, 2007

RTL--Real-Time Tracking and Localization Based on Object Recognition

In a previous article, I mentioned that I had developed a real-time localization system with a single USB camera, featuring real-time tracking and localization and robust object recognition. Video demos can be found here.
Recently, I've advanced that work and developed the RTL system (Real-time Tracking and Localization based on Object Recognition, or Recognition-Tracking-Localization), which integrates recognition, real-time tracking, and localization. The features of the system are as follows:
  • Accurate and fast recognition
  • Active tracking
  • 3D pose estimation for coplanar objects (see the sketch after these lists)
  • Real-time performance
  • Re-localization and no accumulating error
  • Multi-object RTL
There are also some limitations to the localization:
  • Purely based on visual landmarks
  • Only works with coplanar visual landmarks
  • Distances must be measured when the landmarks are captured
  • Occasional mis-tracking, and thus false localization
TODO list:
  • Improve the localization algorithm, considering SFM, inverse depth, etc.
  • An OpenGL-based GUI like that of MonoSLAM
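To make the coplanar pose-estimation item above concrete, here is a minimal sketch (not the RTL code itself) of recovering the camera pose from a planar landmark with OpenCV's solvePnP, given 2D-3D correspondences from the matcher. The landmark size, camera intrinsics, and matched points below are illustrative assumptions.

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // Four corners of a planar landmark, in landmark coordinates (meters), Z = 0.
    std::vector<cv::Point3f> object_pts = {
        {0.0f, 0.0f, 0.0f}, {0.20f, 0.0f, 0.0f},
        {0.20f, 0.15f, 0.0f}, {0.0f, 0.15f, 0.0f}};

    // Corresponding pixel locations reported by the matcher/tracker (assumed values).
    std::vector<cv::Point2f> image_pts = {
        {245.f, 180.f}, {402.f, 176.f}, {410.f, 298.f}, {251.f, 305.f}};

    // Assumed pinhole intrinsics for a 640x480 camera; no lens distortion.
    cv::Mat K = (cv::Mat_<double>(3, 3) << 600, 0, 320,
                                           0, 600, 240,
                                           0, 0, 1);
    cv::Mat rvec, tvec;
    cv::solvePnP(object_pts, image_pts, K, cv::Mat(), rvec, tvec);

    // Camera position in landmark coordinates: C = -R^T * t.
    cv::Mat R;
    cv::Rodrigues(rvec, R);
    cv::Mat cam_pos = -R.t() * tvec;
    std::cout << "camera position (landmark frame): " << cam_pos.t() << std::endl;
    return 0;
}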

At present, visual localization and mapping is a very active research topic. Davison's MonoSLAM, in some sense, provides a new approach to vSLAM by combining tracking, EKF, a sparse map, and active vision to achieve real-time performance and localization accuracy. R. O. Castle et al. extended MonoSLAM by incorporating object recognition, enabling it to re-localize and to eliminate the accumulating error of the tracking system. My RTL, inspired by their work, focuses on real-time active tracking and localization rather than SLAM. So I chose KLT and a SIFT-based recognition system, instead of EKF, sparse maps, and active vision, for fast tracking and robust object recognition. Below is a comparison between RTL and Castle's recent result (BMVC 2007):


                   RTL            MonoSLAM
Image Size         640x480        640x480
SIFT Features      500            500
Extracting Time    250 ms         700 ms
Matching Time      30 ms          100 ms
Tracking Points    50 (average)   20
Tracking Time      12.5 ms        10 ms
Object Models      14             16
Database Capacity  33,010         32,000


RTL demo running on a P4 2.8 GHz dual-core CPU, 1 GB memory, WinXP
Image size: 640x480, video frame rate: 20 fps
Note: location information is not shown in the video demo the way it is in MonoSLAM, since it may take me some time and effort to develop such a GUI with OpenGL. Maybe I'll do it later.
The 14 visual landmarks used in the video demo:
Monday, July 16, 2007

Proxy for GPGPU

To reach www.gpgpu.org through an HTTP proxy, follow the instructions below:

Create a proxy auto-config file, such as proxy.inf, containing the following lines:

function FindProxyForURL(url, host) {
    if (dnsDomainIs(host, ".blogspot.com")) {
        return "PROXY 72.14.219.190:80";
    }
    if (dnsDomainIs(host, "www.gpgpu.org")) {
        return "PROXY 66.98.238.8:3128";
    }
    return "DIRECT"; // all other hosts connect directly
}

For Firefox, open Tools -> Options -> Advanced -> Network -> Settings -> Automatic proxy configuration URL, and enter:
file:///path:/proxy.inf

That's all, enjoy!

Friday, July 13, 2007

A Special Day

Happy birthday to myself !

Wednesday, July 4, 2007

Real-Time 3D Localization

After a month's hard work, I've implemented the fundamental framework of a real-time 3D localization system, which combines object recognition, feature tracking, and 3D pose estimation. The system grabs a video stream from a single USB camera, locates visual landmarks present in the current scene, and calculates the 3D pose along the camera's trajectory from those landmarks. One of the fantastic characteristics of the system is its combination of real-time performance and tolerable positioning accuracy.
As far as I know, the closest comparable real-time localization system is Davison's MonoSLAM, introduced in previous articles on this blog, which is based on a smooth motion model and active vision. It is great work and integrates many good ideas, such as active search based on the information matrix (derived from the EKF covariance matrix using Shannon's information theory), inverse depth, and dimensionless scene recovery based on SFM. In fact, I've been inspired by Davison and have found an alternative route to real-time 3D localization, or tracking and recognition.
However, the current system relies on several assumptions and conditions. Much work has to be done before it is ready for practical application.
Experimental results will be given in a few days, after I clean up the code and fix some bugs.

Below are two video demos of this tracking and localization system, running on a P4 2.8 GHz dual-core CPU, WinXP.
I. Real-time tracking based on object recognition - I
This video demo illustrates the robustness of the recognition system, which is invariant to affine transformations, scale changes, and partial occlusion.

II. Real-time tracking based on object recognition - II
This video demo illustrates the robustness and real-time performance of the SIFT-based tracking system with a single USB camera. First, the recognition system finds the target of interest; then the tracking system tracks the SIFT features and locates the target in the scene. Targets lost during the tracking phase can be re-found and re-located by the recognition system.
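The tracking phase described above can be sketched with pyramidal Lucas-Kanade (KLT) optical flow, the tracker mentioned in the RTL post above. The outline below only illustrates the idea, not the actual system code; in the real system the seed points would come from matched SIFT features, and the camera index and parameters here are assumptions.

#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap(0);                        // single USB camera (assumed index 0)
    if (!cap.isOpened()) return -1;

    cv::Mat frame, gray, prev_gray;
    std::vector<cv::Point2f> prev_pts, next_pts;

    // Seed points to track; here simple corners stand in for matched SIFT features.
    cap >> frame;
    cv::cvtColor(frame, prev_gray, cv::COLOR_BGR2GRAY);
    cv::goodFeaturesToTrack(prev_gray, prev_pts, 100, 0.01, 10.0);

    while (cap.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);

        std::vector<unsigned char> status;
        std::vector<float> err;
        cv::calcOpticalFlowPyrLK(prev_gray, gray, prev_pts, next_pts, status, err,
                                 cv::Size(21, 21), 3);

        // Keep only the points that were tracked successfully.
        std::vector<cv::Point2f> kept;
        for (size_t i = 0; i < next_pts.size(); ++i) {
            if (!status[i]) continue;
            kept.push_back(next_pts[i]);
            cv::circle(frame, next_pts[i], 3, cv::Scalar(0, 255, 0), -1);
        }

        cv::imshow("KLT tracking", frame);
        // Stop when most points are lost; the real system would re-run recognition here.
        if (cv::waitKey(30) == 27 || kept.size() < 10) break;

        prev_gray = gray.clone();
        prev_pts = kept;
    }
    return 0;
}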