Thursday, August 16, 2007

RTL: Real-Time Tracking and Localization Based on Object Recognition

In a previous article, I mentioned that I had developed a real-time localization system using a single USB camera, featuring real-time tracking and localization and robust object recognition. Video demos can be found here.
Recently, I extended that work into the RTL system (Real-time Tracking and Localization based on Object Recognition, or Recognition-Tracking-Localization), which integrates recognition, real-time tracking and localization. Its features are as follows:
  • Accurate and fast recognition
  • Active tracking
  • 3D pose estimation for coplanar objects
  • Real-time performance
  • Re-localization with no accumulated error
  • Multi-object RTL
Localization also has some limitations:
  • Purely based on visual landmarks
  • Only for coplanar visual landmarks
  • The distance to each landmark must be measured when the landmark is captured
  • Occasional mis-tracking, which causes false localization
TODO list:
  • Improve the localization algorithm, e.g. using SfM, inverse depth parametrization, etc.
  • An OpenGL-based GUI like that of MonoSLAM
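
A single camera cannot recover metric scale on its own. That is why each landmark's distance has to be measured when it is captured. The post does not describe how RTL actually computes pose, so the following is only a minimal sketch under a simplifying assumption. It assumes the planar landmark faces the camera roughly head-on (fronto-parallel) both at capture time and later. Under the pinhole model, the landmark's image size then scales as 1/Z, so the measured capture distance turns the observed scale change into a metric distance. The function names are hypothetical. Full 3D pose for a coplanar landmark would normally come from decomposing a plane-to-image homography, not from this scale-only estimate.

```python
import math
from itertools import combinations
from statistics import median

def scale_ratio(model_pts, frame_pts):
    """Median ratio of pairwise point distances, frame / model.

    model_pts[k] and frame_pts[k] are the pixel positions of the same matched
    feature in the stored landmark image and in the current frame."""
    ratios = []
    for i, j in combinations(range(len(model_pts)), 2):
        d_model = math.dist(model_pts[i], model_pts[j])
        if d_model > 1e-9:  # skip coincident model points
            ratios.append(math.dist(frame_pts[i], frame_pts[j]) / d_model)
    if not ratios:
        raise ValueError("need at least two distinct matched points")
    return median(ratios)  # the median tolerates a few wrong matches

def distance_to_landmark(model_pts, frame_pts, capture_distance):
    """Fronto-parallel pinhole approximation (an assumption, not RTL's method):
    image size ~ 1/Z, so Z_now = Z_capture / s."""
    return capture_distance / scale_ratio(model_pts, frame_pts)

# Hypothetical landmark captured at a measured 2.0 m that now appears half as large.
model = [(0, 0), (100, 0), (100, 100), (0, 100)]
frame = [(10, 20), (60, 20), (60, 70), (10, 70)]
print(distance_to_landmark(model, frame, 2.0))  # prints 4.0
```

Using the median of many pairwise ratios, rather than a single pair, keeps one bad correspondence from corrupting the estimate.
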

At present, visual localization and mapping is a very active research topic. Davison's MonoSLAM, in some sense, provides a new approach to vSLAM by combining tracking, an EKF, a sparse map and active vision to achieve real-time performance and accurate localization. R. O. Castle et al. extended MonoSLAM with object recognition, enabling it to re-localize itself and eliminate the accumulated error of the tracking system. My RTL, inspired by their work, instead focuses on real-time active tracking and localization rather than SLAM. I therefore chose KLT tracking and a SIFT-based recognition system, instead of an EKF, a sparse map and active vision, to obtain fast tracking and robust object recognition. Below is a comparison between RTL and Castle et al.'s recent results (BMVC 2007):
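
KLT tracking is built on the Lucas-Kanade step. Inside a small window, the displacement d that makes the new frame match the old one comes from solving a 2x2 linear system built from image gradients, repeated until the update is small. Below is a minimal single-level sketch in pure Python. It assumes grayscale images stored as lists of rows and a feature far enough from the border. The names lk_track, bilinear and blob are hypothetical. A practical tracker such as OpenCV's pyramidal Lucas-Kanade (cv2.calcOpticalFlowPyrLK) adds image pyramids, bounds checks and feature selection.

```python
import math

def bilinear(img, x, y):
    """Sample img (a list of rows) at fractional (x, y) by bilinear interpolation."""
    x0, y0 = math.floor(x), math.floor(y)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x0 + 1]
    bottom = (1 - ax) * img[y0 + 1][x0] + ax * img[y0 + 1][x0 + 1]
    return (1 - ay) * top + ay * bottom

def lk_track(prev, curr, x, y, r=7, iters=20, eps=1e-4):
    """Iterative Lucas-Kanade for one (2r+1)x(2r+1) window centred at integer (x, y).
    Returns (dx, dy) such that curr(p + d) ~= prev(p) inside the window."""
    # Spatial gradients of the previous frame (central differences) and the 2x2 matrix G.
    grads = []
    gxx = gxy = gyy = 0.0
    for j in range(y - r, y + r + 1):
        for i in range(x - r, x + r + 1):
            ix = (prev[j][i + 1] - prev[j][i - 1]) / 2.0
            iy = (prev[j + 1][i] - prev[j - 1][i]) / 2.0
            grads.append((i, j, ix, iy))
            gxx += ix * ix
            gxy += ix * iy
            gyy += iy * iy
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-12:
        raise ValueError("window lacks 2D texture (aperture problem)")
    dx = dy = 0.0
    for _ in range(iters):
        # Mismatch vector b = sum of gradient * (prev - curr warped by the current d).
        bx = by = 0.0
        for i, j, ix, iy in grads:
            diff = prev[j][i] - bilinear(curr, i + dx, j + dy)
            bx += ix * diff
            by += iy * diff
        # Solve G * eta = b with the closed-form 2x2 inverse, then update d.
        ex = (gyy * bx - gxy * by) / det
        ey = (gxx * by - gxy * bx) / det
        dx += ex
        dy += ey
        if ex * ex + ey * ey < eps * eps:
            break
    return dx, dy

def blob(w, h, cx, cy, sigma=4.0):
    """Synthetic test image: a Gaussian spot centred at (cx, cy)."""
    return [[math.exp(-((i - cx) ** 2 + (j - cy) ** 2) / (2 * sigma ** 2))
             for i in range(w)] for j in range(h)]

prev = blob(32, 32, 16, 16)
curr = blob(32, 32, 17.5, 15.3)  # the spot moved by (+1.5, -0.7)
print(lk_track(prev, curr, 16, 16))
```

The gradient matrix G is computed once from the previous frame and reused in every iteration, which keeps each refinement step cheap. The ValueError branch corresponds to a window with no texture, where the displacement cannot be determined.
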

Metric | RTL | MonoSLAM
Image Size | 640x480 | 640x480
SIFT Features | 500 | 500
Extraction Time | 250 ms | 700 ms
Matching Time | 30 ms | 100 ms
Tracking Points | 50 (average) | 20
Tracking Time | 12.5 ms | 10 ms
Object Models | 14 | 16
Database Capacity | 33,010 | 32,000
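
The table gives matching times, but the post does not say how SIFT descriptors are matched against the 33,010-feature database. A common choice is Lowe's nearest-neighbour distance-ratio test followed by voting per object model. The sketch below shows that idea with brute-force search, using toy 4-dimensional descriptors in place of 128-dimensional SIFT vectors. The 0.8 ratio, the min_votes threshold and the object names are assumptions. A database of this size would usually be searched with an approximate method, such as a k-d tree with best-bin-first, rather than brute force.

```python
import math
from collections import Counter

def match_ratio_test(query, database, ratio=0.8):
    """Brute-force nearest-neighbour matching with Lowe's distance-ratio test.

    query: list of descriptors (sequences of floats).
    database: list of (object_id, descriptor) pairs.
    Returns (query_index, object_id) pairs for the accepted matches."""
    matches = []
    for qi, q in enumerate(query):
        best = second = math.inf
        best_obj = None
        for obj_id, d in database:
            dist = math.dist(q, d)
            if dist < best:
                second, best, best_obj = best, dist, obj_id
            elif dist < second:
                second = dist
        # Keep the match only if the nearest neighbour is clearly closer than the runner-up.
        if best_obj is not None and best < ratio * second:
            matches.append((qi, best_obj))
    return matches

def recognize_objects(query, database, ratio=0.8, min_votes=3):
    """Vote accepted matches by object model; return models with enough votes."""
    votes = Counter(obj for _, obj in match_ratio_test(query, database, ratio))
    return sorted(obj for obj, n in votes.items() if n >= min_votes)

# Hypothetical toy database: two object models with 4-D "descriptors".
db = [("poster", (0, 0, 0, 1)), ("poster", (0, 0, 1, 0)), ("poster", (0, 1, 0, 0)),
      ("book", (1, 0, 0, 0)), ("book", (1, 1, 0, 0))]
query = [(0, 0, 0, 0.9), (0, 0, 0.9, 0), (0, 0.9, 0, 0),
         (1, 0.5, 0, 0)]  # the last one is equidistant from two book features
print(match_ratio_test(query, db))
print(recognize_objects(query, db))
```

The last query descriptor is rejected because its two nearest neighbours are equally close. This is exactly the kind of ambiguous match the ratio test is meant to filter out before voting.
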

RTL demo running on a P4 2.8 GHz dual-core CPU with 1 GB of memory, Windows XP
Image size: 640x480; video frame rate: 20 fps
Note: unlike the MonoSLAM demo, the video does not display location information, since developing such an OpenGL GUI will take some time and effort. I may do it later.
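
The timings in the comparison table can be set against the demo's 20 fps frame rate. Assuming they were measured on the demo machine, one frame period is 1000 / 20 = 50 ms. SIFT extraction alone (250 ms) therefore spans five frame periods, while tracking (12.5 ms) uses a quarter of one. Recognition cannot run on every frame at this rate, so fast tracking has to carry the real-time part. The small helper below (a hypothetical name) just does that arithmetic.

```python
def frame_periods(stage_ms, fps=20):
    """Express a processing time as a number of video frame periods."""
    frame_ms = 1000.0 / fps  # 50 ms at 20 fps
    return stage_ms / frame_ms

# RTL timings taken from the comparison table, in milliseconds.
timings = {"SIFT extraction": 250, "SIFT matching": 30, "KLT tracking": 12.5}
for stage, ms in timings.items():
    print(f"{stage}: {ms} ms = {frame_periods(ms):.2f} frame periods")
```
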
The 14 visual landmarks used in the video demo: