Recently, I extended my previous work and developed the RTL system (Real-time Tracking and Localization based on Object Recognition, or Recognition-Tracking-Localization), which combines recognition, real-time tracking, and localization. The features of the system are as follows:
- Accurate and fast recognition
- Active tracking
- 3D pose estimation for coplanar objects
- Real-time performance
- Re-localization and no accumulating error
- Multi-object RTL
- Purely based on visual landmarks
Current limitations:
- Works only with coplanar visual landmarks
- The distance to a landmark must be measured when its image is captured
- Occasional mis-tracking, which leads to false localization
Future work:
- Improve the localization algorithm, considering SFM, inverse depth, etc.
- A GUI based on OpenGL, like that of MonoSLAM
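For a coplanar landmark, one common way to obtain a 3D pose is to decompose the homography between the landmark plane and the image. The following is a minimal sketch of that general technique, not the RTL code itself. It assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy` are hypothetical parameter names) and a homography `H` that maps metric plane coordinates to pixels, which is one reason a landmark's real-world scale has to be known in advance.

```python
import math

def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def pose_from_homography(H, fx, fy, cx, cy):
    """Recover rotation R and translation t of a planar landmark.

    Assumes H maps plane points (X, Y, 1), in metric units, to pixels
    (u, v, 1) up to an unknown scale, i.e. H ~ K [r1 r2 t].
    """
    # Closed-form inverse of K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    K_inv = [[1.0 / fx, 0.0, -cx / fx],
             [0.0, 1.0 / fy, -cy / fy],
             [0.0, 0.0, 1.0]]
    a1 = mat_vec(K_inv, [H[i][0] for i in range(3)])
    a2 = mat_vec(K_inv, [H[i][1] for i in range(3)])
    a3 = mat_vec(K_inv, [H[i][2] for i in range(3)])

    # r1 and r2 must be unit vectors, which fixes the scale of H.
    scale = 2.0 / (norm(a1) + norm(a2))
    # Pick the sign that puts the landmark in front of the camera (t_z > 0).
    if a3[2] * scale < 0:
        scale = -scale

    r1 = [scale * x for x in a1]
    r2 = [scale * x for x in a2]
    t = [scale * x for x in a3]
    r3 = cross(r1, r2)  # third axis completes the right-handed frame

    # Columns of R are r1, r2, r3. With a noisy H, R should additionally
    # be re-orthonormalized (e.g. via SVD); that step is omitted here.
    R = [[r1[i], r2[i], r3[i]] for i in range(3)]
    return R, t
```

With a noise-free homography the function returns the exact pose; with real feature matches the result is only approximate and is usually refined afterwards.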
At present, visual localization and mapping is a very active research topic. Davison's MonoSLAM, in some sense, provides a new approach to vSLAM by combining tracking, an EKF, a sparse map, and active vision to achieve real-time performance and localization accuracy. R. O. Castle et al. extended MonoSLAM by incorporating object recognition, enabling it to re-localize itself and to eliminate the accumulated error of the tracking system. My RTL, inspired by their work, focuses instead on real-time active tracking and localization rather than on SLAM. I therefore chose KLT tracking and a SIFT-based recognition system, instead of an EKF, a sparse map, and active vision, for fast tracking and robust object recognition. Below is a comparison between RTL and Castle's recent results (BMVC 2007):
| | RTL | MonoSLAM |
|---|---|---|
| Image Size | 640x480 | 640x480 |
| SIFT Features | 500 | 500 |
| Extracting Time | 250 ms | 700 ms |
| Matching Time | 30 ms | 100 ms |
| Tracking Points | 50 (average) | 20 |
| Tracking Time | 12.5 ms | 10 ms |
| Object Models | 14 | 16 |
| Database Capacity | 33,010 | 32,000 |
The RTL demo runs on a P4 2.8 GHz dual-core CPU with 1 GB of memory under Windows XP.
Image size: 640x480; video frame rate: 20 fps
Note: unlike the MonoSLAM demo, the video does not display location information, since developing such a GUI with OpenGL would take some time and effort. I may add it later.
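The RTL timings in the table can be put against the 20 fps frame rate with a little arithmetic. The sketch below assumes that the table's figures apply to the demo machine and that extraction and matching run one after the other; neither is stated explicitly. Under those assumptions, tracking fits comfortably inside one frame, while a full recognition pass spans several frames and so cannot be repeated on every frame.

```python
import math

# Figures taken from the RTL column of the table and the demo settings.
FPS = 20
EXTRACT_MS = 250.0   # SIFT extraction time
MATCH_MS = 30.0      # matching time
TRACK_MS = 12.5      # tracking time per frame

frame_budget_ms = 1000.0 / FPS              # time available per frame
recognition_ms = EXTRACT_MS + MATCH_MS      # one full recognition pass (assumed sequential)
frames_per_recognition = math.ceil(recognition_ms / frame_budget_ms)
tracking_load = TRACK_MS / frame_budget_ms  # fraction of a frame spent tracking

print(f"frame budget: {frame_budget_ms} ms")
print(f"recognition pass: {recognition_ms} ms (~{frames_per_recognition} frames)")
print(f"tracking load: {tracking_load:.0%} of a frame")
```

This gives a 50 ms frame budget, a 280 ms recognition pass (about 6 frames), and a tracking load of 25% of a frame.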
The 14 visual landmarks used in the video demo: