Previously, I've developped some versions of SIFT algorithm, as well as Lowe's object recognition system mentioned in the paper(IJCV04). One of the version, named Harris-SIFT, works quite well and fast in my previous vision system--RTL(Real-time Tracking and Localization). In fact, it's a simplified SIFT and retains virtues of the original algorithm.
Recently, I've foud someone mentioned that Andrea Vedaldi(UCLA) had developped a good version of SIFT, named siftpp. I looked deep into it and found its performance quite like that of Lowe's demo. More important, that code strictly follows steps in Lowe's paper, thus is easy to read or study Lowe's algorithm. However, it runs slow and has some bugs. I'd like to opimize the code so that it can be applied in other occasions.
Other codes I refered, including Rob Hess's sift and iLab Neuromorphic Vision C++ Toolkit. The former is implemented in c and uses OpenCV. In fact, his implementation, in some sense, is more correct than siftpp. However, the code is not very efficient due to some redundant processes, and the not-well-organized feature structure. Further more, personally I favor C++ than C, for the reason that C++ provides many advanced features, powerful, useful and convenient for us to write robust codes, esepcially true for algorithm or libraries. The later is an excellent vision library designed for modeling human's neuron vision model. It's still under developping and CVS verion is available. I prefer the structure of this library and it is implemented with moden C++ language, such as meta-template, and some kind of design pattens. A SIFT-based object recognition framework is nested in the library, which can be a good reference.
So now, above are the main sources of my references. That's quite enough. Recently, I spent 3 days to rewrite the SIFT algorithm, and it is finished now. Main properties as follows :
1. Correct and Accurate
Algorithm routines of the code strictly conforms to Lowe's paper, and the result is quite similar to Lowe's demo.
In fact, both the feature location, scale, orientation, and the descritptor, are approximately same as that from Lowe's demo. For example, below are two sets of sample features extracted with my code and Lowe's demo respectively. There are only small differences between them :
##### my code : (x and y's position are different as that of Lowe's)
183.749 279.852 16.3574 4.86277
0 27 53 26 39 37 5 0 16 19 19 18 44 32 23 35 115 72 8 1
2 3 14 52 30 13 3 22 26 17 7 2 27 112 115 14 4 1 1 1
115 47 22 12 6 2 27 99 70 28 37 7 1 2 53 102 24 20 22 44
92 26 3 20 57 30 7 1 2 18 102 29 115 79 47 28 1 3 12 33
15 81 115 78 4 5 10 9 7 101 115 39 36 11 1 2 8 0 1 3
4 86 115 18 14 10 9 10 7 43 86 37 3 13 15 36 56 37 42 3
14 50 55 31 12 2 2 10
##### Lowe's demo :
279.57 183.61 16.34 -1.453
0 20 44 24 43 43 6 0 20 23 20 16 40 32 22 36 116 77 8 1
1 2 11 46 23 12 3 25 29 16 7 1 22 105 116 16 5 0 0 1
116 45 21 11 7 2 27 100 73 28 38 8 0 1 49 100 22 19 24 42
96 31 3 18 56 36 10 1 1 12 91 27 116 88 45 28 0 1 11 30
15 75 116 86 4 4 10 8 6 99 116 40 31 11 0 1 9 0 0 1
2 74 116 20 17 11 9 9 5 41 85 40 2 11 13 33 55 36 47 4
14 46 55 30 15 2 1 10
sample feature images : (left- my imp. right - Lowe's demo)
2. Fast
I've done a lot efforts to optimize the code. Needless to say, SIFT is very time-consuming, due to a serial times of Gauss blur when constructing scale space, and multiple times of loops when genrating descriptors. Convolution and complex arithmatic operations dealing with float-point datas, such as exp, floor, sin, cos, are known to be time-consuming. Thus it is surely impossible for SIFT to attain real-time performance on current CPU(The fast implementation I know--a comerical produce--can run 10fps for 320x240 video stream, qute marvelous!). Someone tries to quicken SIFT with the help of GPU and has achieved good results. I think this is a good idea. However, first I'd like to implement a correct, good, as well as fast version by myself. In my opinion, two ways are possible for speed--simplification and code optimization, which I have all tried and obtained moderate satisfying results.
First I'd like to mention Harris-SIFT, a modification by me. It reduces the complexity of SIFT a lot while maintaining its good quality. Now it can run 4~5fps with 320x240 video stream and offers good performance on object recognition. Demos can be seen in my previous blogs.
Second, the new implementation, speed up mainly though code optimization. A lot of methods are introduced, such as fast algorithm for exp, floor; loop vectorization; parallel computing; OpenMP, SIMD ... Besides, I also used some excellent tools, such as Intel's C++ Compiler, IPP, Vtune. For the optimization skills and tricks, I recommend the excellent book--The Software Optimization Cookbook, Intel Press.
Below are some of the results : (HT P4 2.8G, 1G Mem, Winxp, sp2)
(Harris-SIFT and My IMP are two of my implementations)
Algorithm | Image Size | Feat. num | Time(s) |
Lowe’s Demo | 640x480 | 4160 | ~4 |
Harris-SIFT | 640x480 | 2647 | 1.124 |
My IMP | 640x480 | 3041 | 1.4 |
Siftpp | 640x480 | 3667 | 5.25 |
siftFeat | 640x480 | 3604 | 3.793 |
Algorithm | Image Size | Feat. num | Time(s) |
Lowe’s Demo | 320x240 | 1126 | ~1 |
Harris-SIFT | 320x240 | 838 | 0.323 |
My IMP | 320x240 | 809 | 0.366 |
Siftpp | 320x240 | 977 | 1.21 |
siftFeat | 320x240 | 987 | 1.032 |