Thursday, November 1, 2007

Champion of RoboCup2007@Home

Good News: the Autonomous Robot Group of Shanghai Jiao Tong University has won the championship of RoboCup2007@Home China Open!

Equipped with a rich set of sensors, including:
  • a front USB camera
  • an omni-directional camera
  • a SICK laser scanner
  • odometers
our robot@home is quite smart and powerful, able to perform various tasks in a natural home environment, such as:
  • recognizing many different objects;
  • tracking people according to special color patterns;
  • self-positioning with vision technology;
  • measuring distance and avoiding obstacles.
I developed the whole vision system, whose functions include:
  • object recognition
  • color tracking with CamShift
  • 3D positioning in the omnivision system
Experiments show that the whole system works quite well and helps the robot localize itself quickly and accurately.

Saturday, October 27, 2007

Farewell to TOEFL and GRE

Finally done with the exams~~~
Counting it up, I took two exams within exactly one month, TOEFL & GRE, so my life is a bit more complete now. Looking back, there is some sense of achievement; it seems I do have a bit of perseverance and patience, heh.
I originally had no intention of taking these exams, but then a senior classmate "lectured" me: don't just drift along; once you decide to do something, you can naturally get it done.
Thinking about it now, she was right: only with a pursuit is there motivation. Isn't life just a process of pushing yourself around?

A little tired, however, a sense of self-satisfaction ^_^

Monday, September 3, 2007

SIFT Implementation

SIFT is perhaps the most popular invariant local feature detector at present, and has been applied successfully on many occasions, such as object recognition, image matching and registration, SFM & 3D reconstruction, vSLAM, image panoramas, etc.
Previously, I developed several versions of the SIFT algorithm, as well as Lowe's object recognition system described in his paper (IJCV04). One of these versions, named Harris-SIFT, works quite well and fast in my previous vision system, RTL (Real-time Tracking and Localization). In fact, it is a simplified SIFT that retains the virtues of the original algorithm.
Recently, I found someone mention that Andrea Vedaldi (UCLA) had developed a good version of SIFT, named siftpp. I looked deep into it and found its performance quite similar to that of Lowe's demo. More importantly, the code strictly follows the steps in Lowe's paper, and is thus easy to read and a good way to study Lowe's algorithm. However, it runs slowly and has some bugs. I'd like to optimize the code so that it can be applied on other occasions.
Other code I referred to includes Rob Hess's sift and the iLab Neuromorphic Vision C++ Toolkit. The former is implemented in C and uses OpenCV. In fact, his implementation is, in some sense, more correct than siftpp. However, the code is not very efficient, due to some redundant processing and a not-well-organized feature structure. Furthermore, I personally favor C++ over C, because C++ provides many advanced features that are powerful, useful and convenient for writing robust code, especially for algorithms and libraries. The latter is an excellent vision library designed for modeling the human neural vision system. It is still under development and a CVS version is available. I prefer the structure of this library; it is implemented with modern C++ techniques, such as template metaprogramming and several design patterns. A SIFT-based object recognition framework is nested in the library, which can serve as a good reference.
Those are the main sources of my references, and they are quite enough. Recently I spent three days rewriting the SIFT algorithm, and it is finished now. Its main properties are as follows:

1. Correct and Accurate
The algorithm routines in the code strictly conform to Lowe's paper, and the results are quite similar to Lowe's demo.
In fact, the feature locations, scales, orientations and descriptors are all approximately the same as those from Lowe's demo. For example, below are two corresponding features extracted with my code and with Lowe's demo respectively; there are only small differences between them:

##### my code (note: the x and y coordinates are printed in swapped order relative to Lowe's):
183.749 279.852 16.3574 4.86277
0 27 53 26 39 37 5 0 16 19 19 18 44 32 23 35 115 72 8 1
2 3 14 52 30 13 3 22 26 17 7 2 27 112 115 14 4 1 1 1
115 47 22 12 6 2 27 99 70 28 37 7 1 2 53 102 24 20 22 44
92 26 3 20 57 30 7 1 2 18 102 29 115 79 47 28 1 3 12 33
15 81 115 78 4 5 10 9 7 101 115 39 36 11 1 2 8 0 1 3
4 86 115 18 14 10 9 10 7 43 86 37 3 13 15 36 56 37 42 3
14 50 55 31 12 2 2 10

##### Lowe's demo :
279.57 183.61 16.34 -1.453
0 20 44 24 43 43 6 0 20 23 20 16 40 32 22 36 116 77 8 1
1 2 11 46 23 12 3 25 29 16 7 1 22 105 116 16 5 0 0 1
116 45 21 11 7 2 27 100 73 28 38 8 0 1 49 100 22 19 24 42
96 31 3 18 56 36 10 1 1 12 91 27 116 88 45 28 0 1 11 30
15 75 116 86 4 4 10 8 6 99 116 40 31 11 0 1 9 0 0 1
2 74 116 20 17 11 9 9 5 41 85 40 2 11 13 33 55 36 47 4
14 46 55 30 15 2 1 10

[sample feature images: left, my implementation; right, Lowe's demo]
2. Fast
I've put a lot of effort into optimizing the code. Needless to say, SIFT is very time-consuming, due to the series of Gaussian blurs when constructing the scale space and the many loops when generating descriptors. Convolution and costly floating-point operations such as exp, floor, sin and cos are known to be time-consuming. Thus it is surely impossible for SIFT to attain real-time performance on current CPUs (the fastest implementation I know of, a commercial product, can run at 10 fps on a 320x240 video stream, quite marvelous!). Some people have tried to speed up SIFT with the help of the GPU and have achieved good results. I think this is a good idea. However, first I'd like to implement a correct, good, as well as fast version by myself. In my opinion, two approaches to speed are possible, simplification and code optimization, both of which I have tried with moderately satisfying results.
First I'd like to mention Harris-SIFT, my own modification. It reduces the complexity of SIFT a lot while maintaining its good quality. It now runs at 4~5 fps on a 320x240 video stream and offers good object recognition performance. Demos can be seen in my previous blog posts.
Second, the new implementation is sped up mainly through code optimization. Many methods are used, such as fast algorithms for exp and floor, loop vectorization, parallel computing, OpenMP, SIMD, and so on. Besides, I also used some excellent tools, such as Intel's C++ Compiler, IPP and VTune. For optimization skills and tricks, I recommend the excellent book The Software Optimization Cookbook (Intel Press).
Below are some of the results (HT P4 2.8GHz, 1GB RAM, WinXP SP2; Harris-SIFT and My IMP are my two implementations):

Algorithm      Image Size   Feat. num   Time(s)
Lowe's Demo    640x480      4160        ~4
Harris-SIFT    640x480      2647        1.124
My IMP         640x480      3041        1.4
Siftpp         640x480      3667        5.25
siftFeat       640x480      3604        3.793

Algorithm      Image Size   Feat. num   Time(s)
Lowe's Demo    320x240      1126        ~1
Harris-SIFT    320x240      838         0.323
My IMP         320x240      809         0.366
Siftpp         320x240      977         1.21
siftFeat       320x240      987         1.032

Thursday, August 16, 2007

RTL--Real Time Tracking and Localization Based on Object Recognition

In a previous article, I mentioned that I had developed a real-time localization system with a single USB camera, featuring real-time tracking and localization and robust object recognition. Video demos can be found here.
Recently, I've advanced the previous work and developed the RTL system (Real-time Tracking and Localization based on Object Recognition, or Recognition-Tracking-Localization), which incorporates recognition, real-time tracking and localization. Features of the system are as follows:
  • Accurate and fast recognition
  • Active tracking
  • 3D pose estimation for coplanar objects
  • Real-time performance
  • Re-localization and no accumulating error
  • Multi-object RTL
There are also some limitations to the localization:
  • Purely based on visual landmarks
  • Only for coplanar visual landmarks
  • Distance should be measured when taking landmarks
  • Occasional mis-tracking, thus false localization
TODO list :
  • Improve the localization algorithm, considering SFM, inverse depth, etc
  • GUI based on OpenGL as that of MonoSLAM

At present, visual localization and mapping is a very active research topic. Davison's MonoSLAM, in some sense, provides a new approach to vSLAM by combining tracking, EKF, a sparse map and active vision to achieve real-time performance and localization accuracy. R O Castle et al. extended MonoSLAM by incorporating object recognition, enabling it to re-localize itself and eliminate the accumulated error of the tracking system. My RTL, inspired by their work, focuses instead on real-time active tracking and localization, not SLAM. So I chose KLT and a SIFT-based recognition system, instead of EKF, a sparse map and active vision, for fast tracking and robust object recognition. Below is a comparison between RTL and Castle's recent result (BMVC 2007):


                    RTL            MonoSLAM
Image Size          640x480        640x480
SIFT Features       500            500
Extracting Time     250ms          700ms
Matching Time       30ms           100ms
Tracking Points     50 (average)   20
Tracking Time       12.5ms         10ms
Object Models       14             16
Database Capacity   33,010         32,000


The RTL demo runs on a P4 2.8GHz dual-core CPU, 1GB memory, WinXP.
Image size: 640x480, video frame rate: 20 fps.
Note: location information is not shown in the video demo as it is in MonoSLAM, since it may take me some time and effort to develop such a GUI with OpenGL. Maybe I'll do it later.
[the 14 visual landmarks used in the video demo]

Monday, July 16, 2007

proxy for GPGPU

To reach www.gpgpu.org through an HTTP proxy, follow the instructions below:

Create a proxy auto-config file, such as proxy.inf, with the following contents:

function FindProxyForURL(url, host) {
    if (dnsDomainIs(host, ".blogspot.com")) {
        return "PROXY 72.14.219.190:80";
    }
    if (dnsDomainIs(host, "www.gpgpu.org")) {
        return "PROXY 66.98.238.8:3128";
    }
}

For Firefox, open Tools -> Options -> Advanced -> Network -> Settings -> Automatic proxy configuration URL, and input:
file:///path:/proxy.inf

That's all, enjoy!

Friday, July 13, 2007

A Special Day

Happy birthday to myself !

Wednesday, July 4, 2007

Real-Time 3D Localization

After a month of hard work, I've implemented a fundamental framework for a real-time 3D localization system, which combines object recognition, feature tracking and 3D pose estimation. The system grabs a video stream from a single USB camera, locates visual landmarks if present in the current scene, and calculates the 3D pose of the camera's trajectory based on those landmarks. One of its most attractive characteristics is its real-time performance together with tolerable positioning accuracy.
As far as I know, another similar real-time localization system is Davison's MonoSLAM, introduced in previous articles on my blog, which is based on a smooth motion model and active vision. It is a great work and integrates many good ideas, such as active search based on the information matrix (deduced from the covariance matrix of the EKF using Shannon's information theory), inverse depth, and dimensionless recovery of the scene based on SFM. In fact, I was inspired by Davison and found an alternative approach to real-time 3D localization, or tracking and recognition.
However, the current system is constrained by some assumptions and conditions. Much work has to be done before it is ready for practical application.
Experimental results will be given in a few days, after I clean up the code and fix some bugs.

Below are two video demos of this tracking and localization system, running on a P4 2.8GHz dual-core CPU, WinXP.
I. Real-time tracking based on object recognition-I
This video demo illustrates the robustness of the recognition system, which is invariant to affine transformations and scale and tolerates partial occlusion.


II. Real-time tracking based on object recognition-II
This video demo illustrates the robustness and real-time performance of the SIFT-based tracking system with a single USB camera. First, the recognition system finds the target of interest; then the tracking system tracks the SIFT features and locates the target in the scene. Targets lost during the tracking phase can be re-found and re-located by the recognition system.

Sunday, June 24, 2007

dodo: Personal Views on Face Recognition Methods (repost)

I've recently become interested in face recognition. I've always found this direction appealing, but I haven't had time to study pattern classification and machine learning in depth, so I only know it roughly.
Here I repost a fellow netizen's personal survey of face recognition methods, for easy reference.
Recently Mr Mao recommended a good face recognition survey: Face Recognition: A Literature Survey, ACM 2005. I am about to read it.


dodo: Personal Views on Face Recognition Methods

Seeing j.liu's post on face recognition prompted me to write this one. I have no other intention than to toss out a brick to attract jade and make the discussion more complete; I hope j.liu and those who replied to his post don't mind. If interested, you are welcome to continue the discussion. In the following,

TPAMI = the journal IEEE Transactions on PAMI
PAMI = the two fields of pattern analysis and machine intelligence

1) PCA, LDA and related methods

Eigenfaces and Fisherfaces are undoubtedly milestone works in face recognition. As methods, PCA and LDA were not new, but these papers were the first to apply them to face recognition in a fully explicit way. I say "fully explicit" because, judging by publication dates, neither paper was the first to use these methods in PAMI-related classification and recognition. This gives us a small hint: a new method focused on solving one concrete problem may have greater impact, even though the method itself is general.

Among current face recognition methods, these two are also the most followed. The reason, besides their effectiveness, is that simplicity is their greatest feature. Surveying PAMI history, the methods that have withstood the test of time are, besides being effective, generally of one of two kinds: 1) simple (PCA, LDA, K-Means, Normalized Cuts, etc.); 2) complex, but solving a general and genuinely hard problem (e.g. the Lucas-Kanade algorithm, which deeply influenced AAM and the 3D morphable model). So if your method is something most people could do, try to make it simple and clear. This is the Ockham's Razor principle so revered in the West (personally, I rather dislike the term). One point I want to stress: "simple" here does not mean the underlying theory is simple. Normalized Cuts is simple as a method, but its theory is not; the Gauss-Bonnet theorem in differential geometry has a very simple form but rich content.

Let me add two more sentences here. Among the many paper-publishing methodologies in China, one saying that has been handed down is: make the system as simple as possible and the theory as complex as possible. Admittedly the saying has its point, but using it to educate later generations would lead them astray.

Later, many similar new methods appeared; among those published in TPAMI, the representative ones are HE Xiaofei's LPP and YAN Shuicheng's MFA. For comments on these two, see knato's reply in j.liu's thread. Here are my personal views. First, the appearance of these two methods has its significance. LPP is a linearization of Laplacian Eigenmaps from manifold learning, and so naturally encouraged trying other manifold learning methods on recognition problems: it offered a new line of attack, and gave manifold learning, then entering its winter, a new place to be useful. Neither is a first-class idea, but the potential influence is there. YANG Jian's later UDP in TPAMI is a product of the LPP call. LPP is unsupervised, so the probability that it outperforms LDA in recognition is extremely small. MFA is a supervised discriminant method based on local data relations. It has two nearest-neighbor-count parameters to tune, and these parameters are a double-edged sword: tuned well, MFA beats LDA; tuned badly, it does not. This makes MFA more complex to use than LDA, and if its advantage over LDA is limited while it is much harder to use, it will eventually be abandoned by history.
Also, as j.liu said in his post, these methods are somewhat opportunistic. For example, in the experiments of those two papers, the PCA dimension of Fisherfaces (PCA+LDA) is set to c-1. Although this follows the original paper, everyone who has worked in this area knows that if the number of PCA principal components is taken too large, the performance of PCA+LDA drops significantly; WANG Xiaogang's Random Sampling LDA (IJCV) gives a clear graphical illustration of this. So the experimental comparisons in those papers are not very credible.

LPP, UDP and MFA were all published with Chinese first authors (at least at the time of publication). Personally I think they have some value, but they will be passing clouds in the development of PAMI research, unable to rival PCA and LDA.

2) The singularity problem in LDA

As is well known, LDA solves the generalized eigenvalue problem Sb*u = Alpha*Sw*u, so in practice one runs into the singularity problem: the matrix Sw is not invertible. The papers solving this problem in face recognition are "as vast as the sea", which also shows how influential LDA is. Within this class of methods there are different styles.

o. PCA dimensionality reduction
Fisherfaces first reduces dimensionality with PCA and then applies LDA, which is now the standard treatment of this problem. There is some irony here. Belhumeur says in his paper: PCA actually smears the classes together. If PCA smears the classes together and destroys class structure, why use it for dimensionality reduction at all? Moreover, it has been shown that even when Sw is invertible, using PCA features still enhances LDA's face recognition performance. This can only mean that the role of PCA, or of PCA features, is not what Belhumeur and those who follow his account claim. PCA is simple, but people should understand it correctly; I may discuss this later if there is a chance.

a. RDA
The most influential and practical method to date is still RDA, based on the regularization idea. Actually this problem was noticed not only in face recognition; it was solved long ago in statistics. RDA was published in the Journal of the American Statistical Association in 1989, showing how far back it goes. Adding a perturbation term to Sw is also the simplest way to solve the problem.

b. Subspace projection
This is where most of the papers are. Using knato-style combinatorics, let image(Sw) and null(Sw) denote the column (image) space and null space of Sw; then we can easily list the following combinations (to emphasize: this is not offered as a paper-publishing methodology, just a vivid description!). Project the samples onto: aa. image(Sb); bb. null(Sw); cc. image(Sw); dd. image(Sw)+null(Sw); ee. image(Sb)+null(Sw), in parallel or in series; ff. image(St)+null(Sw).
Each of these combinations represents more than one paper; I won't enumerate them here. You can also throw in random sampling techniques, more than doubling the count. You can apply the same techniques to KPCA and KLDA (KFA), doubling again. Further still, you can mix ICA, LBP, Gabor features and the like into the above subspaces, and so on without end. The person who has taken this to the extreme is YANG Jian in China. TANG Xiaoou of the Chinese University of Hong Kong and his former student WANG Xiaogang have also done related work, but each of their ideas stands on its own; they are not paper-flooders. YANG Jian's work can be summarized by his KPCA plus LDA paper in TPAMI; although he has published a flood of papers, as far as subspace methods go, that paper, together with his long paper in the domestic journal Acta Automatica Sinica, does contain substance. If you want to work in this area, they are worth reading as fairly comprehensive summaries. TANG Xiaoou's representative (pioneering) works on subspaces are dual-space LDA, random sampling (and bagging) LDA, and unified subspaces. (His students have continued this line since; I won't enumerate.)

I suggest that students who want to work in this area learn TANG's and YANG's work thoroughly, take the strengths of each, and thereby get a relatively quick and complete grasp of it.

c. QR decomposition
People with a strong matrix and numerical background can do this in better style. Cheong Hee Park and YE Jieping are undoubtedly the experts here. Look at their papers in TPAMI, JMLR, and SIAM J. Matrix Anal. & Appl. and you will see.

d. Correlation
If Sw is invertible, then Sb*u = Alpha*Sw*u can be transformed into inv(Sw)*Sb*u = Alpha*u, and one can examine the correlation between the subspaces of Sw and Sb. The representative work here is Aleix M. Martinez's long TPAMI paper.

e. Quotient to difference: change u'*Sb*u/(u'*Sw*u) into u'*(Sb-Sw)*u.

3) Methods based on local image structure

The widely recognized methods in this class are Gabor and LBP, plus the potentially useful SIFT and differential features.
The early, influential representative work applying Gabor is EBGM. Gabor is also the most common means of extracting visual features for recognition.
Countless people doubt LBP's performance because of its extreme simplicity, but interestingly there is Ahonen's recent short paper in TPAMI, which simply applies LBP to face recognition without any new improvement; this shows the reviewers' and editors' approval and encouragement of this class of methods. For unsupervised feature extraction, LBP has clear advantages, but it definitely does not reach the level shown in that paper. In the paper, LBP and especially weighted LBP perform very well, which is related to the face crop used on the FERET database: they crop faces with CSU's elliptical mask. With a square mask, the improvement from weighted LBP is very limited. In particular, tested on FRGC Version 2, LBP is definitely not as good as typical supervised recognition methods. This also gives us a small hint: merely adding weights boosts recognition performance greatly; what does that tell us?

Besides, I cannot agree with j.liu's claim that LBP's insensitivity to image block size is a beautiful lie. First, there is some sensitivity; that must be admitted. But LBP does have a block size with stable performance, and it is not, as people assume, a matter of the histogram having to satisfy certain statistics. Choosing this block size is much easier than choosing the optimal number of PCA components. Of course, these are minor issues.

Some people in China combine Gabor and LBP. This is worth exploring, of course, but I personally think one should not spend too much energy on combining these two; similar forms could equally well be used to pursue other ideas.

4) Sparse representation

NMF and NTF both belong to the sparse representation methods; both have been applied to face recognition, with very limited effect. NTF in particular is a typical case of mathematically beautiful theory with reluctant practical results.

That said, sparse representation (coding) is a very interesting and promising approach. There are many forms of sparse representation; the key is how you use it and what problem you solve with it. We may have a chance to discuss this again later.

5) Tensor methods

Tensors have been very unsuccessful in face recognition, at least so far. The most typical example is M. Alex O. Vasilescu's tensorfaces at ECCV'02. Their analysis of the problem and its correspondence with tensors is seamless and well reasoned, and the mathematical implementation is equally simple, but since the method was published almost nobody has followed it. In my view, the reason is that it complicates an inherently simple problem, and, most importantly, the complication brings no corresponding benefit.

Alex's use of tensors is to flatten a high-way tensor. This is a common way of handling tensors, whose advantage is making them easy to process and compute with; two-way tensorfaces is just the Eigenfaces we know. But although both are tensor methods, this kind is essentially different from Amnon Shashua's NTF. NTF is tensor thinking in its pure form, but it is too complex to implement, and its theory is more complex than Alex's tensors, so nobody touches it. Still, undeniably, both are mathematically beautiful methods. If you want to learn tensors without boredom, I recommend these three papers (two by Shashua).

6) Parametric models
Parametric models are applied in many ways, e.g. HMM and GMM. Both are general-purpose modeling methods, so their applications are miscellaneous, and most of their uses in face recognition are adapted from speech recognition; I won't go into detail. Interested readers can see H. Othman's paper in PAMI and Conrad Sanderson's in PR.

Among these, the simplest is Baback Moghaddam's Probabilistic Subspaces paper in TPAMI, which was also the reference source for WANG Xiaogang's unified subspaces.

7) 3D models

The representative work is Volker Blanz's TPAMI paper. Personally, I am not at all optimistic about it.

8) Personal perspectives

a. Subspace-based methods are unlikely to be of much use in practical applications.

b. Methods that look for local image structure are more promising. EBGM, LBP, SIFT and the like give us many useful hints. On this point I agree with j.liu.

c. Generalizing face recognition methods to general classification and statistical problems is another major contribution derived from face recognition.

d. Because of our special domestic research environment, people generally prefer quick and easy work, so a considerable number of face recognition papers carry Chinese names. Under certain pressures this is understandable, but I still hope that, when conditions allow, our compatriots will not focus on publishing papers, but will pay more attention to solving the problems themselves and push theory forward as much as possible. We definitely have the ability. Look: two best student paper awards at NIPS'06 went to Chinese students studying abroad, and more good news came from CVPR'07: the best student paper went to a Tsinghua student. These are all signs. We are in a high-spirited, promising era. Given my own level and standing I have no qualification to say these things; this is just a personal wish and appeal, and equally an encouragement to myself.


The above is only dodo's humble opinion. Given my limited learning, omissions and biased views are unavoidable; please criticize and correct them promptly, so that others are not misled. Thanks.


dodo: Face Recognition II

This article continues "dodo: Personal Views on Face Recognition Methods". I did not title it "... II" only because prfans displays short titles; an eye-catching short title makes it easier to tell the two apart.

This post mainly discusses issues that the previous one did not cover or only touched on in passing. As before, it discusses methods as methods.


1. Kernel methods

a. KPCA and related methods

The trend of kernels sweeping the PAMI field is still strengthening. The reason is simple: the vast majority of problems can be connected to a kernel. In face recognition, the influence of KPCA and KFA falls far short of PCA and LDA. In terms of application domains, KPCA is also applied far less widely than PCA. YANG Jian's KPCA plus LDA in PAMI is the typical paper combining subspaces and kernels. For generic dimensionality reduction, KPCA does perform better than PCA, especially when the feature space is not an ordinary Euclidean space. Hence applying LDA in the KPCA-transformed space naturally works better than in the PCA-transformed space. But for dimensionality reduction, KPCA has a serious drawback: it cannot produce an explicitly representable subspace, whereas PCA yields a set of orthogonal basis vectors to represent with. Of course, this stems from the essential nature of kernel methods, and it limits their range of application. A simple example: people have used PCA to reduce the dimension of SIFT features, i.e. PCA-SIFT, but nobody has used KPCA+SIFT. In principle KPCA is better suited to reducing SIFT's dimension, but in practice the reducer must be learned in advance: PCA can learn a subspace beforehand from a large set of natural images; KPCA cannot. There are out-of-sample methods, but they have an obvious drawback: if the training set is large, the KPCA kernel matrix is huge and very inconvenient to use; if it is small, the result is poor. This is in fact a common ailment of this class of kernel methods (though not of all of them).

b. Regression

Regression is also a common classification method; CVPR'07 has a kernel ridge regression paper. The principle of classifying by regression is simple, but it differs clearly from traditional LDA-like methods. Ridge regression seeks a transform that brings the samples, in the transformed space, as close as possible to their labels; the learned transform then characterizes, as well as possible in the least-squares sense, the class structure of the sample space. Typically one places an l2-norm constraint on the transform function (a vector or matrix in the discrete case), grandly called guaranteeing the smoothness of the function (more on this below). This yields a fairly elegant closed-form solution. Actually, no kernelization is even needed: regression itself connects directly to kernels, because under certain constraints the resulting transform matrix can be viewed as a kernel matrix (YE Jieping's metric learning at CVPR'07 uses a similar idea). This closely resembles ranking with the graph Laplacian: if the inverse of the Laplacian (or a simple variant of it) is positive definite, it can be treated as a kernel matrix, and then the methods and ideas directly tied to kernels come into play, especially in learning, in great variety.

The entire technical content of kernelizing ridge regression lies in computational tricks. Since mapping samples into a Hilbert space is only a virtual representation, becoming a concrete expression only where inner products appear, kernel methods place rather high demands on computational tricks. But such tricks usually concern problems long since solved in statistics and matrix theory, so most of the work is just using them well.

This kind of "borrowing a shell for rebirth" is very important in much theoretical research. We want to reach our goal, but the thing is not directly expressible; so we place it in a suitable space, compute according to that space's basic principles, and finally arrive at an expressible form, derived according to our idea. Once such a thing is done, its quality is not low.

2. Regularization

Although the heading says regularization, I actually want to talk about optimization objectives and optimization constraints. If you look at ICML'07, CVPR'07 and the forthcoming ICCV'07, you will find 2007 an unusual year: the dimensionality reduction field is in some chaos. Or rather, it has never been calm since 1997; it's all Fisherfaces' fault :)

Remember the combinations knato boldly listed in his reply? If not, go review them, because I am going to use them. Take knato's combinations and add the regression-style objective ||Y - W'X||^2, and you can cover the ideas of all of this year's related papers. Then, if you like, you can derive many more. Once the optimization objective is fixed, what differs is the solution method. Take this viewpoint and reread this year's papers, and they will be clear at a glance.

Thus the chaotic course of linear dimensionality reduction has passed a small turning point: from combinations of subspaces to combinations of optimization objectives and constraints. Subspaces dominated roughly 1998-2005 (and of course haven't vanished); the latter reached a small climax this year. Add the strategy of how algorithms are applied, and you get a three-way standoff amid the turmoil. The latter two especially keep cropping up and are still gaining strength. The representative figures are TANG Xiaoou, YANG Jian, YE Jieping, HE Xiaofei and YAN Shuicheng. The main driver of this shift is the application of nonlinear methods, especially the linearized application of kernel and manifold learning methods. LPP was a big stimulus here; in my opinion, that is its greatest significance.

If you can view these things from a height (within limits), then when facing graduation or going-abroad pressure, you can "survey three thousand rivers and drink but one gourdful" to ease the pressure, and drink good water at that. (Once again, solemnly: this is not the point of this post.)

3. Computational tricks commonly used in subspace methods
a.
On this topic, Stan Z. Li edited a nice little book, which can be found via http://www.face-rec.org/
Still, I want to mention the regularities. The things involved are the column (range) space, the null space, and the generalized inverse, all related to QR decomposition, SVD or GSVD. Whenever you meet these things, reaching for those tools is never wrong. If interested, see YE Jieping's and Haesun Park's subspace papers; they all follow one pattern.

b. Orthogonalization
Judging from the published papers, for the generalized eigenvalue problem, solving for a mutually orthogonal basis works much better than a B-orthogonal one. The representative works are CAI Deng's orthogonal LPP and YE Jieping's orthogonal LDA.

CAI Deng published an orthogonal LPP in TIP using exactly the method published in TPAMI in 1988, identical in principle. YE Jieping instead simultaneously diagonalizes three matrices. Different styles, each with its strengths; personally I lean toward CAI Deng's approach.


4. Tensor revisited

In the previous post I discussed tensor methods, mainly tensorfaces and NTF. A few more words here.

The person recently investing the most effort in tensors is YAN Shuicheng; recent issues of TPAMI and TIP and CVPR'07 all carry related papers of his. Personally I think this helps publish papers but does not help much with solving problems. Still, it is a good thing for spreading tensor ideas and methods, and I favor the exploration.
Also, HE Xiaofei and CAI Deng have done tensor subspace work. Strictly speaking, they only borrow the tensor concept; it is not a genuine tensor method, and can be grouped with 2D PCA and 2D LDA.

Actually, the earliest work in this direction was done by a senior fellow student of YANG Jian, in PR in the 1990s; YANG Jian later carried it forward, the first results being a short paper each in PR and TPAMI (2DPCA).
The first to present this kind of thing in tensor form was the CV giant Amnon Shashua, in a CVPR'01 paper worth a look. But a giant is a giant: when he heard about NMF, NTF sprang forth at once (ICML'05). The change in between is a qualitative leap: very many people could produce the former kind of method; very few could produce the latter. That is something worth learning from.
(B.T.W., Amnon is not only academically formidable; his wife is a Miss Israel. Again, a model worth learning from, especially for those of us buried in research all day.)

What I want to stress here is that we cannot completely dismiss simple things; methods that get on track, or that are genuine and deep, often grow out of them step by step. So let's not lightly dismiss what HE Xiaofei and CAI Deng have done here.


5. Miscellaneous

On kernel methods I will stop at a mention. In the previous thread someone suggested discussing SVM and Boosting; if anyone is interested, please go ahead.

Someone also said I omitted Bayesian faces in the previous post; that is actually the Probabilistic Subspaces method I mentioned under parametric models. Have a look if interested.


Closing words

Surveying the disputes troubling the PAMI field: although we cannot "leap out of the three realms and stand outside the five elements", we can at least see the field's situation more clearly. If you can view these things from a height, you may recognize where the directions you believe in lie, find your own goals more accurately with fewer detours, or position yourself better.

I write this to help people getting to know this field understand it completely and accurately and avoid detours, and, for those already well versed in it, to spark a topic for discussion. In the previous post, many read but few replied; that is not a good sign. Everyone is welcome to speak up in healthy discussion, which benefits us all. Don't worry about being a newcomer; the newer you are, the more you need to speak.

As the saying goes, "troubled times produce heroes". The PAMI field today is just such a moment in need of heroes, the "promising era" I spoke of in part I. I hope that next time it is Chinese names that turn the tide.

All the above is one person's opinion; criticism, corrections and active discussion are welcome.

Tuesday, June 19, 2007

Doubt about ETS

I think nobody will doubt that ETS has wasted a lot of our Chinese students' precious time. Due to the limited number of TOEFL seats in mainland China, all the seats were taken within the first few days after ETS opened web-based registration. Since then, of course, many unfortunate students who are eager to take the exams have had to try their luck, constantly checking ETS's website for seats and information. I am just one of them~
This noon, ETS announced additional TOEFL seats for the second half of the year. It was good news. However, we were so worried about missing the chance again that all of us rushed to grab the seats and nearly brought down ETS's servers, which in consequence made registration a rather tough and tedious business, for we were frequently notified that "The server is disconnected".
Finally, after nearly an hour of effort, I was fortunate enough to finish the registration, and was rather tired of the whole thing. Behind me, many other students were still at it. God bless them!
The whole thing seems to me just like a bad joke, and I strongly doubt ETS's competence on this issue. I concede that some people played a shameful role (those who grabbed seats early and scalped them at high prices later!). But if ETS had foreseen this scenario and taken some measures, for example adding more seats earlier, it would have saved our Chinese students a great deal of time and effort!

Thursday, June 14, 2007

Rainy June

Rainy June, rainy in the sky, also rainy in my heart.

video format conversion

These two days, I developed a library to convert the YUV format to RGB, and vice versa. The core algorithm is based on ccvt's code. But I found the interfaces provided by ccvt insufficient, and they also contain bugs. So I sat down to analyze the YUV formats (420p, 420i, yuyv), program and debug. Finally a satisfying library is available which is quite easy to use.
The algorithm itself is very simple, but debugging is not so easy. You have to design special routines to expose the errors. The problem is that programmers always presume some "facts" or "certainties" which are totally wrong in essence. And sometimes the code's behavior depends on the dataset fed to the program. All this adds to the difficulty of testing. I've spent nearly 3/4 of the time on testing and debugging~~

Thursday, June 7, 2007

tips on efficient program

How to write elegant, efficient programs is nearly every programmer's desire and dream. But how to make your program run faster or more efficiently is generally beyond our ken. On some occasions an expert's hunch leads him to tune some code, and it turns out to hit the point! But with the increasing complexity of modern compilers, it is much harder for us to "guess" the bottleneck of a program without the help of profiling tools. And though we can rely on these profiling tools, they only tell us, through piles of data, where the bottlenecks may lie; we still have to figure out the underlying reason ourselves. If you can do so, then you can be satisfied with yourself and say, "Hey, look, I know the whole thing now!"

I'll give an interesting example below to test your ability at code optimization. The original code came from Mr Mao. Thanks for the code!

#define NN 5000000

unsigned char *idata = new unsigned char[NN];
unsigned char *odata = new unsigned char[NN];

int N = NN, a = 1, b = 1;          // a, b: map parameters (values assumed for illustration)
int N1 = (int)(sqrtf((float)N));
int total = N1 * N1;
int *Table = new int[total];
int j = 0;
for (int y = 0; y < N1; y++)
    for (int x = 0; x < N1; x++)
    {
        int x1 = ((a * b + 1) * x + a * y) % N1;
        int y1 = (b * x + y) % N1;
        Table[j++] = y1 * N1 + x1;
    }

/* code one */
for (j = 0; j < total; j++)
{
    int tp = Table[j];
    odata[tp] = idata[j];
}

for (j = 0; j < total; j++)
{
    int tp = Table[j];
    idata[tp] = odata[N - 1 - j];
}

/* code two */
for (j = 0; j < total; j++)
{
    int tp = Table[j];
    odata[j] = idata[tp];
}

for (j = 0; j < total; j++)
{
    int tp = Table[j];
    idata[j] = odata[N - 1 - tp];
}

So, can you tell the difference between the two pieces of code? Which one is more efficient, and why?

Tuesday, May 29, 2007

Testing Davison's MonoSLAM(Updated!)

Two months ago, I tried Davison's MonoSLAM, which is very interesting. For research convenience, I successfully ported the project to both Linux (Ubuntu 6.10 & 7.04, gcc-4.1.2) and Windows (WinXP, VS2005) with USB camera support. Similar work has been done by Bob Mottram, who translated the project into C#.

As some other researchers may be interested in Davison's MonoSLAM and, like me, want to run the demo under both Windows and Linux with USB camera support, I've written down some of the key tips from my porting experience here.

1.USB camera support
Davison's original demo only supports 1394 cameras under Linux, but I didn't have one. So first I added USB camera support in order to run the demo.

1.1 Under linux -- V4L2 Support
The following work was done under Ubuntu 6.10 & 7.04 (gcc-4.1.2), and you may have to change it for other Linux versions.

Download the v4l2 driver matching your kernel version and install it properly. Try spcaview (available on the same website) to test the USB camera. A list of supported USB cameras can be found there, which includes almost all the popular models.

Spcaview is a small but powerful tool for controlling USB cameras under Linux, and it provides a good encapsulation of the V4L2 APIs. I extracted some of its source code and built it into a library (libspcav4l) for easy use in other applications. My ported project uses this library too.

The VW34 library used by MonoSLAM provides an abstraction over the image source, whether file sequences or a video capture device. Its general mechanism looks like this:
1) A looper runs continuously in the background to read image files or capture video data from the capture device. Once it gets the image data, it notifies the upper (image processing) layer to fetch it.
2) The upper layer responds to the notification from the lower loop and processes the image data in a callback function registered during the initialization phase.

One drawback of this mechanism is that the looper runs continuously and takes up a lot of CPU. In fact, the approach adopted by Windows' WDM driver is much more reasonable and elegant: whenever video data arrives, WDM notifies the upper layer to fetch it. That is an interrupt-driven approach, not a polling one.

The related code can be found in the VW34/VWGLOW/Interface folder. Referring to VW34/VWFirewire, I added an interface named VWUSBCam, which calls functions in libspcav4l, and hooked the VWUSBCam interface into MonoSLAMGlow/monoslamglow*. Now the USB camera works for MonoSLAM!


1.2 Under Windows -- DirectShow Support
DShow provides a routine to manipulate the USB camera easily. As under Linux, I added USB camera support with DShow to the VWUSBCam interface.


2. Porting to Windows
It's usually troublesome to port a Linux application to Windows, especially when the application uses many Linux libraries. Fortunately, Oxford's VW34 is designed to be platform independent, and that saved me a lot of time. monoslamglow and scenelib are pure C++, which is also cross-platform.

STL can be seen everywhere in Davison's MonoSLAM. This is good, for STL is efficient and elegant. But some fatal bugs lurk in the code due to misuse of STL, for example vector::erase. Scott Meyers' Effective STL is really a good reference for writing both effective and robust code with STL.

As VC6 does not fully support templates, I chose VS2005 to compile the projects. Some additional libraries should be prepared first:

glut-3.7.6 : OpenGL wrapper
wxWidgets-2.6.4 : wxWidgets support

Both libraries are platform independent and freely available. I compiled and built them into static libraries.

Other libraries used in MonoSLAM:

glow_104
scenelib
vw34

All the libraries are compiled into static libraries. The compiler may complain about unresolved links or other errors due to project settings or syntax problems; just follow the prompts and fix them one by one.
One thing worth mentioning: VW34 wraps VNL, which is part of the VXL library, and after the sources compile successfully, plenty of "multiple symbols" errors may jump out while linking. I checked VXL's FAQ and found the solution, which is to change the project setting in VS2005 as follows:
Project->Settings->C/C++ Tab->Category: Code Generation->Use run-time library: Multithreaded DLL

Now I have a copy of MonoSLAM running on Windows with USB camera support. Next I fixed some treacherous bugs, mainly due to misuse of STL.

3. Fixing bugs
Here I list several of the most treacherous bugs in MonoSLAM. The amazing thing is that these bugs keep quiet under Linux, due to differences between the gcc and VS2005 compilers, as well as between SGI's and Microsoft's STL implementations.
But in my view they are real bugs, and Linux just "hides" them! So I'll list them and give the solutions respectively.

3.1 Multiple Inheritance
The first problem encountered was that once the main GUI window was redrawn, the application crashed.
It was very curious, and might have been caused by bugs in wxWidgets, OpenGL, glut or glow. At first I had no idea ~~

As usual, I debugged the code and looked deep into the whole paint mechanism of monoslamglow. As in other GUI applications, the mechanism is message driven and calls a callback function to redraw the window. After carefully tracking all the variables, I found that the handle of the callback object changed during the process, which crashed the application!

The inheritance tree looks like this:

MonoSLAMGlow
  <-- GlowWindow
  <-- VWEvents::EventHandler
  <-- GlowPushButtonReceiver
  <-- GlowCheckBoxReceiver

Multiple inheritance is complicated, especially when virtual inheritance is involved. You can get an idea of how complicated it is from Lippman's Inside the C++ Object Model.

All four base classes have virtual functions. The handle to the callback object was invalidated when converting to VWEvents::EventHandler polymorphically. Lippman pointed out that C++ polymorphism is itself quite involved: it requires adjusting pointer addresses, and the actual implementation is compiler dependent.
I found that the adjusted pointer to the derived class had an address offset compared with its actual address. So I adjusted the inheritance order of the base classes, that is, put VWEvents::EventHandler in first place, and the GUI application worked fine!

3.2. STL pitfalls: vector::erase
Scott Meyers points out some dark corners of the C++ language and pitfalls of the STL in Effective STL, and it so happened that Davison's MonoSLAM was caught by one of them: vector::erase!

After erasing an element from a vector, all iterators previously pointing into the vector are invalidated.

The rule above is emphasized again and again by Meyers. It is simple, yet it is easy to fall into this pitfall, as MonoSLAM did.

At first, MonoSLAM crashed with the error "incompatible iterator!" after running for a short while. A curious error, isn't it? Experience told me that something might be wrong with an STL container such as vector, map, set or list. But locating the error was rather difficult, because STL containers appear everywhere in the code. After careful debugging and scrutiny of the code, I found the following:

for(vector<...>::iterator feat = feature_init_info_vector.begin();
    feat != feature_init_info_vector.end(); feat++)
{
    if(feat->...) {}
    else {
        delete_partially_initialised_feature(feat); // calls vector::erase inside
    }
    //...
}

Similar code can be found in other places.

It looks perfectly fine from a logical point of view: go through all the elements in the vector and delete the unqualified ones.
But it is wrong, according to the rule above!

Once feature_init_info_vector erases an element, the elements behind it are moved forward and every iterator held from before the call is invalidated. So the feat iterator is invalid after the call to delete_partially_initialised_feature; in other words, feat still points at the old, stale position, not into the vector's current contents. That's why the application complained about an "incompatible iterator"!

The solution is very easy, following Meyers' advice: have the function return an iterator into feature_init_info_vector after the erase and assign it to feat. OK, that's done!

4. Conclusion
That is my experience porting MonoSLAM to Windows with USB camera support. Currently it works well under both Linux and Windows. There may be other hidden bugs; I'll fix them as they emerge.

Monday, May 28, 2007

book recipes

Comments on some of the books read in my spare time.

1. A First Course in Probability
Ross, Sheldon M.

An excellent book on the fundamental concepts of probability, with many interesting examples that gave me good exercises in logical reasoning.

2. Cg -- a tutorial on real-time programming

A tutorial on GPU programming.
During my research in CV, I've found many researchers turning to GPU programming for real-time performance of their algorithms, such as GPU-KLT and GPU-SIFT. The main idea is to shift the heavy computational burden to the GPU of a modern graphics card, which is well suited to parallel computing with its special processing units, the pixel and vertex shaders.

3. GPU Gems -- Programming Techniques, Tips, and Tricks for Real-Time Graphics

A good book of GPU programming tips and experience.
However, it involves a lot of computer graphics knowledge, e.g. lighting and shading, and I am a little confused by these concepts. If I have time, I'll read some books on computer graphics too.

4. Introduction to Algorithms (MIT)

As its title says, this book gives a good and thorough introduction to algorithms. Really helpful!

Bits of CV Research

Since starting graduate school I have been circling around computer vision, looking at this and trying that, and I find CV genuinely interesting: object recognition, matching and tracking, and so on. The field feels very broad to me, and it is gradually tending to merge with AI so as to better imitate the human visual process. There are many big names in this area; looking at their work sometimes feels almost unbelievable, as they have more or less reproduced something like the human visual or thought process on a computer, and I can only gaze up at them ~~

This post will record some of my personal experiences in CV research; it is more or less a research log.

Some Notes on Using and Configuring Ubuntu

For over a year I have been working and studying under Ubuntu. The system keeps being updated, with a release issued every half year, and it is really quite usable. Here I record some methods for using and configuring Ubuntu, as a convenient reference for myself and others. This post will be updated continually, mainly covering the configuration of multimedia, the programming environment, and desktop effects under Ubuntu.

1. Bleeding-edge ffmpeg on Ubuntu Feisty

First, get your dependencies:

sudo apt-get build-dep ffmpeg
sudo apt-get install liblame-dev libfaad2-dev \
libfaac-dev libxvidcore4-dev liba52-0.7.4 \
liba52-0.7.4-dev libx264-dev checkinstall \
build-essential subversion

Next, grab the ffmpeg source: svn checkout -r 8998 svn://svn.mplayerhq.hu/ffmpeg/trunk ffmpeg

If you’re feeling adventurous, you can try the very latest code by omitting the -r 8998 part of that line. Revision 8998 is the latest at the time of writing, and worked for me. Now you can configure and build ffmpeg. This takes a little while:

cd ffmpeg
./configure --enable-gpl --enable-pp --enable-libvorbis \
--enable-libogg --enable-liba52 --enable-libdts \
--enable-dc1394 --enable-libgsm --disable-debug \
--enable-libmp3lame --enable-libfaad --enable-libfaac \
--enable-xvid --enable-pthreads --enable-x264
make

Finally, install it with checkinstall, which gives you the option to edit some parameters (I set the name to ffmpeg and the version to 3:0.svn20070511):

sudo checkinstall

Sunday, May 20, 2007

Aloof and proud above the world, with whom would you seclude yourself? The flowers fall, the beauty dies, and neither knows the other!

Chen Xiaoxu has passed away;
Sister Lin is gone...

She created the Daiyu in my heart; or rather, she was Daiyu incarnate.
Whenever I think of Daiyu, the image of Chen Xiaoxu rises before my eyes:

"Her mist-like brows seem knitted yet not knitted; her eyes, brimming with feeling, seem joyful yet not joyful. Sorrow gathers in her dimples; illness lends grace to her frail form. At rest she is like a delicate flower mirrored in water; in motion, like a slender willow swaying in the wind. Her heart has one more chamber of wit than Bi Gan's; her frailty surpasses Xi Shi's by three parts."

Here is the Song of Burying Flowers, in memory of the one who buried them. May Xiaoxu rest in peace...

Song of Burying Flowers

Flowers fade and fly, flowers fill the sky; their red fades, their fragrance dies, and who pities them?
Soft gossamer drifts over the spring pavilion; falling catkins brush lightly against the embroidered curtain.
The maiden in her chamber grieves for the waning of spring, a heart full of sorrow with nowhere to pour it;
Hoe in hand she steps out through the embroidered door: how can she bear to tread the fallen petals as she comes and goes?
Willow floss and elm pods keep their own fragrance, caring not that peach and plum blossoms fly;
Peach and plum will bloom again next year, but who will be in the chamber next year?
By the third month the scented nests are built, yet the swallows on the beam are heartless!
Next year, though the flowers bloom again to be pecked, the person will be gone, the beam empty, the nest fallen.
Three hundred and sixty days a year, the wind's knife and the frost's sword press hard;
How long can brightness and beauty last? Once blown adrift, they are hard to find.
Blooming flowers are easy to see, fallen ones hard to find; before the steps the burier of flowers is stricken with grief;
Alone, leaning on the hoe, she secretly sheds tears that spatter the bare branches like traces of blood.
The cuckoo falls silent as dusk comes on; shouldering the hoe she returns and shuts the heavy doors;
The green lamp lights the wall as she first falls asleep; cold rain beats the window, her quilt not yet warm.
What is it that so redoubles my sorrow? Half pity for spring, half vexation at it:
Pity that it comes so suddenly, vexation that it leaves so soon; it arrives without a word and departs unheard.
Last night outside the courtyard a mournful song arose: was it the soul of flowers, or the soul of birds?
Flower souls and bird souls are alike hard to keep; the birds fall silent and the flowers blush in shame;
I wish I had wings beneath my arms, to fly with the flowers to the end of the sky.
At the end of the sky, where is the fragrant mound?
Better to gather the fair bones in a silken bag, and cover that beauty with a handful of clean earth;
Pure she came and pure she shall go, rather than sink into some foul ditch.
You die today and I come to bury you; who can tell the day when I myself shall die?
They laugh at me for burying flowers today; when I am buried, who will it be?
Look, as spring wanes and the flowers fall one by one: that is the very time when beauty grows old and dies;
One morning spring is spent and beauty old; the flowers fall, the person gone, and neither knows the other!

Friday, April 20, 2007

If only life were as it was at first meeting

Mulan Hua: in imitation of an ancient song of parting

Nalan Xingde (Qing dynasty)

If only life were as it was at first meeting, why should the autumn wind grieve over a painted fan?
Lightly the heart of an old friend changes, yet he claims that hearts of old friends change easily.
Words ended at Mount Li in the middle of the night; the night rain and the bells brought no complaint after all.
How can he compare with the fickle man in brocade? Paired wings and entwined branches were the vow of that day.

I came across this poem on the Shuiyuan BBS, and only then learned the source of the line "if only life were as it was at first meeting."
I had often read the line before, quoted in classmates' signatures, without ever thinking it over; savoring it now, I finally taste its meaning.
Perhaps life is simply like that: astonishment at first meeting, then with time it turns plain, or even wearisome.

Below is someone else's commentary:

If only life were as it was at first meeting: this line truly leaves one speechless.
Tragedies often begin without omen,
yet the seed was buried at the moment of first meeting.
In a pitch-dark world, a hoarse huqin begins to sound;

First meeting at Longzhong: a feather fan and silk kerchief, the empire divided in three amid easy talk;
at the next meeting, the orphan entrusted at Baidi, six campaigns from Qishan, the hero dead before his ambition was fulfilled;
first meeting at the West Chamber: romance not to be bought with a thousand gold, the fair one arriving as moon shadows shift among the flowers;
at the next meeting, cast aside, where is he now? She regrets urging her husband to go seek rank and title;
first meeting at Liangshan: the stars of heaven and earth sweeping all before them,
that heroic spectacle setting the blood racing,
until the next meeting: one passes on hearing the tide, another departs upon a letter,
the funeral banners stirring at Liao'erwa, with a faint sound of ghosts weeping;
first meeting in the golden house: a hundred charms, vows to hold her forever,
and when beauty fades and love cools, she learns at last that a thousand gold cannot buy the Changmen ode;
warm spring water in the Huaqing pool, midnight whispers in the Hall of Lasting Life:
who could foresee the ending, yards of white silk before Mawei slope, a pinch of yellow earth;
so all the names, all the stories, are written on water;
the waves and ripples seemed earth-shattering at the time,
yet in the rolling Yangtze they are but one small spray,
flowing past, leaving no trace in the end;
the high spirits of youth, the first stirrings and dreams,
slowly worn away in the soak of time;
the warmth of instant kinship, the vows sworn by mountain and sea,
leaving only a dim silhouette;
the golden-armored war god draped in sunset clouds,
frozen in the story as an eternal memory;
even if one could guess the splendid beginning,
who has ever seen the ending that was fixed long ago?

First meeting, dazzling; then a sudden look back:
having crossed the sea, the world has already changed.

Thursday, April 19, 2007

Libraries for Solving Linear Systems

To solve a linear system Ax = b, the common approaches are direct methods (generally a matrix factorization of A) and iterative methods. Popular numerical platforms such as Matlab, Octave and numpy all support solving linear systems. Matlab performs best and supports the most methods: the direct solve x = A\b chooses the optimal algorithm according to the properties of A (e.g. symmetric, sparse), and for iterative solves you can first apply an incomplete factorization to accelerate convergence.

Below is a summary of the open-source numerical libraries I know of for solving linear systems; they are generally written in Fortran or C/C++.

  • BLAS (Basic Linear Algebra Subprograms)

    BLAS is the core of many numerical software libraries. It is usually implemented in Fortran 77 and supports basic matrix and vector operations; C/C++ wrappers also exist.
    The best-performing BLAS implementation is said to be GotoBLAS.

  • Sparse BLAS

    The sparse-matrix version of BLAS. The ones I have seen are the NIST version, in C++, and a Fortran version on netlib.

  • LAPACK (Linear Algebra PACKage)

    Dedicated to linear algebra: solving Ax = b, matrix factorization and inversion, eigenvalues, singular values, and so on. The Fortran version at www.netlib.org/lapack/ is very widely used, but I have not yet seen a version for sparse matrices.

  • ScaLAPACK

    Solves linear problems in parallel on distributed-memory machines, i.e. the parallel version of LAPACK.

  • SuiteSparse

    SuiteSparse is Prof. Tim Davis's package of sparse matrix solvers, covering sparse factorizations. It is mostly written in C and provides a Matlab interface; the prototypes of many of Matlab's sparse matrix functions can be found in it. Its performance is outstanding. His homepage is strongly recommended!

  • TNT (Template Numerical Toolkit)

    A C++ numerical library with ambitious goals (it is said it will eventually integrate LAPACK++, SparseLib++, IML++ and MV++), but it is still very thin: it basically just defines some data structures, has essentially no capability for solving linear systems, and no sparse matrix operations.

  • SparseLib++

    A C++ sparse matrix library, roughly equivalent to Sparse BLAS plus matrix preconditioners; it can be combined with IML++ for iterative solution of linear systems.

  • IML++ (Iterative Methods Library)

    C++. Supports iterative solution of dense or sparse linear systems, including:
    * Richardson Iteration
    * Chebyshev Iteration
    * Conjugate Gradient (CG)
    * Conjugate Gradient Squared (CGS)
    * BiConjugate Gradient (BiCG)
    * BiConjugate Gradient Stabilized (BiCGSTAB)
    * Generalized Minimum Residual (GMRES)
    * Quasi-Minimal Residual Without Lookahead (QMR)
    It can be used together with SparseLib++.

  • ITL (The Iterative Template Library)
    The ITL currently includes the following methods:

    • Conjugate Gradient (cg)
    • Conjugate Gradient Squared (cgs)
    • BiConjugate Gradient (bicg)
    • BiConjugate Gradient Stabilized (bicgstab)
    • Chebyshev Iteration (cheby)
    • Richardson Iteration (richardson)
    • Generalized Conjugate Residual (gcr)
    • Generalized Minimal Residual (gmres)
    • Quasi-Minimal Residual Without Lookahead (qmr)
    • Transpose Free Quasi-Minimal Residual Without Lookahead (tfqmr)

    The ITL currently includes the following preconditioners:

    • Incomplete Cholesky (cholesky)
    • Incomplete LU without fill-in (ILU)
    • Incomplete LU with n fill-in and with threshold (ILUT)
    • SSOR
  • SuperLU

    Direct solution of large, sparse, nonsymmetric linear systems, performing an LU factorization by Gaussian elimination. Written in C.

  • SPOOLES

    C. Can solve both symmetric and nonsymmetric linear problems.

  • MTL

    A matrix class library based on C++ templates. Technically flashy, and its performance is said to be decent, but it supports too few operations: for linear systems only LU factorization and a few iterative methods, and no sparse matrices.

  • Blitz++

    Even flashier technically, but with very little support for linear systems.

  • GSL

    The GNU Scientific Library. Very orthodox, but its support for sparse matrices is zero and its performance is not outstanding.

  • Others

    MKL (Intel Math Kernel Library) is very powerful, including optimized BLAS and LAPACK as well as a sparse direct solver; unfortunately it is closed-source, which makes it inconvenient to use.
    Here
    is a very comprehensive summary of sparse direct solver software.

I've tested some of the libraries, with the following results:

test data  : sparse symmetric positive-definite matrix
dimension  : 3000x3000
density    : 0.3
non-zeros  : 1284213
condition  : 2.7015e+04

opencv  : CV_LU    45.7s (linux)
mkl     : pardiso   3.17s (win)
matlab  : \         2.57s (win)
CHOLMOD : cholmod   2.48s (linux)

And I've found that Linux (Ubuntu 6.10 on my workstation) is far more flexible than Windows for programming and scientific research!

Wednesday, April 11, 2007

Recipes of Books

I just finished reading two books of the Exceptional C++ series today, and they were worth the two weeks spent, really!
I'll focus on algorithms in the next stage, and an algorithms text will be my next book.
Reading, thinking, and then practising: all of great interest!

Thursday, April 5, 2007

SIFT-Based Object Recognition

The SIFT features proposed by David Lowe have excellent stability and invariance, and SIFT has been confirmed as one of the best invariant local feature extraction algorithms available. I spent most of last year studying the algorithm, along with the object recognition framework Lowe proposed in "Distinctive image features from scale-invariant keypoints" (IJCV 2004). By last December I already had fairly good recognition results, and I proposed some ideas and improvements of my own that also worked well; recently I optimized the algorithm further and can now recognize at 2~3 fps.
SIFT is computationally expensive: the repeated convolutions are costly, and since many keypoints are produced, generating the feature vectors also takes a long time. The later matching and recognition stages take time as well. For example, the BBF algorithm based on a kd-tree speeds up the search by several orders of magnitude over exhaustive comparison, but with a large database and many input keypoints it is still time-consuming. The hash-based Hough transform needs both memory (each feature votes into 16 bins) and computation, such as solving the least-squares affine transform. Real-time performance is therefore really hard to reach. The fastest system I have seen runs at about 10 fps, but that is a commercial product developed with David Lowe's personal involvement, so it is hardly comparable.
The SIFT pipeline is also rather complex. There is some reference code on the web of varying quality; it makes a good reference, but writing everything from scratch is still hard, and I don't yet have that level of algorithmic maturity. The kd-tree is mature and can be used off the shelf; the generalized hash Hough, however, took some effort. Lowe's paper is quite vague about it, and I found no reference code, so I had to think it through while writing. The principle of the Hough transform is simple; what puzzled me was how hash and Hough are combined. After reading a pile of material on Hough transform applications and combining it with hashing principles, it took me three days to write the code, using the STL hashmap (the STL is quite convenient; later I wrote a hashtable in C myself to implement the hash Hough, and the results were somewhat worse, probably because my hash function was not uniform enough and the data layout was poorly arranged). The results are quite good, and I admire the effectiveness of Lowe's algorithm and framework.
Below are some screenshots; the SIFT-based object recognition has been made into a module and integrated into the demo.
The fourth image is the landmark; the first three are screenshots of online recognition. The main window on the left shows the recognition image, the small window on the left shows the video frame, the data window at bottom center shows the recognition results, and the vertical strip on the right is the landmark image database.
Panoramic Image Transform

Another small program written last December: unwrapping panoramic (omnidirectional) images. It is a very simple algorithm and took only two days. At first I thought about it in overly complex terms, e.g. modeling the mirror equation (a quadric paraboloid) and the imaging model; the recovered image quality would of course be better that way, but it is much more trouble. In the end I used the simple approach of projecting the circle onto a cylinder. Let (x0, y0) be the center of the omnidirectional image, r and R its inner and outer radii, and let p(x, y) be a pixel of the w x h panoramic image that maps back to p1(x1, y1) in the omnidirectional image:

rho_m = (R + r) / 2      // cylinder radius, so the panorama width is w = 2*pi*rho_m
h = R - r
theta = x / rho_m
x1 = x0 + (r + y) * sin(theta)
y1 = y0 + (r + y) * cos(theta)

The image unwrapped by the formulas above is out of proportion: the upper part is stretched and the lower part compressed, due to the nonlinear mapping of the mirror. So I fitted a simple quadratic function to stretch the lower half of the image nonlinearly (note: it has nothing to do with the mirror equation, so I won't disclose the formula here :). Below are some experimental results:

[Screenshot: the omnidirectional source image]

[Screenshot: nearest-neighbor, without stretching]

[Screenshot: bilinear interpolation, with stretching]

As you can see, the proportions of the stretched result are quite close to those of the real environment; the effect is quite good.
I also ran a real-time video unwrapping experiment under Linux, expanding 494x464 color frames into 864x244 color frames with bilinear interpolation. It achieves good real-time performance; the core code is written quite efficiently.

Digit Number Recognition

Last December I wrote some code to recognize the digits on instrument panels. The functionality is fairly simple; here is a summary of the process.
I implemented the recognition both with template matching and with a neural network; the template matching approach borrows an idea from Prof. Mao.
template match:
image[in] => rgb2gray -> segment(otsu) -> find digits(flood) -> normalize -> NN or correlation => recognized digits[out]

neural network:
image[in] => rgb2gray -> segment(otsu) -> binarize -> find digits(projection) -> normalize ROI -> skeletonize -> generate features(top, bottom, right, left, width, height) -> normalize features -> BP network(3 layers) => recognized digits[out]

Training proceeds similarly to recognition.
The template method is simple, and in practice I found its recognition accuracy quite high as well. In fact, for a fixed camera position and digits of fixed shape, the template does not differ much from the actually extracted digit, so the method works well and fast, quite suitable for instrument digit recognition.
The neural network method is more complex. I referred to a handwritten digit recognition algorithm; it also recognizes acceptably, but the ANN parameters are hard to tune, and I finally settled on a 3-layer 96-192-10 BP network. I hit some problems tuning it: the RMS error often failed to converge to a small value after many iterations. After carefully reading the ANN chapter of Pattern Classification, I understood some data-handling requirements, such as normalizing the training data and choosing the sigmoid function well; after putting them into practice, the results did improve. With supervised learning on ten samples (0~9) followed by online recognition, it does not match the template method, but it is still decent. I suspect that training with more samples would improve it (to be verified by experiment).

Small projects like this are quite fun: it took about two weeks in total, good hands-on practice. I am fairly satisfied with the C++ code too: a factory pattern, XML configuration file support, cross-platform (Linux, Windows). On Windows I made a demo: a digital instrument simulator, plus a USB camera video capture and online recognition demo (the inspiration also came from Prof. Mao, after seeing their circuit experiments :). Below are the demo screenshots:

Some notes: the left image is the digit generator, simulating LED digits, which can be generated randomly on a timer; the right one is the video capture and online real-time recognition demo. Before recognizing you must manually select the digit region; once the position is fixed, it recognizes in real time.

Wednesday, April 4, 2007

Testing Tesseract-1.0.3

I won't repeat the news about Tesseract; Google it and you will get plenty of results. Back in last December I tried tesseract-1.0.2 for my digit recognition work; at the end of February Google released 1.0.3 on SourceForge, fixing some memory leak bugs and making quite a few improvements. At least it now compiles with far fewer errors (VC2005 is rather strict about some syntax, e.g. it flags some type conversion errors).
1.0.3 uses WinMain, but still calls main indirectly to perform the recognition, i.e. it works from the console. I found that inconvenient, so today I spent some time adding clipboard support on top of the existing code so text can be recognized directly. Some color conversion was needed, because Tesseract currently supports only grayscale images and recognizes only English letters and digits. Below is a screenshot:

Our generalized Rough algorithm uses edge inlors
matron lu dehne a mapping [rot-n the orientation nfan edge paint to a reference point of the shape. Tht: reference point may be though! of as the origin of a loualeo-ordinate system for the shape Then there is an easy way ofcompuling a measure which rates how well points in the image are likely to be ortgins of the specified shape. Figure X shows a few graphic examples of the information used by the generalized Rough
transform. Lines indicate gradient directions A feature of the transform is that it will work even when the boundary is disconnected due to noise or occlusions.This is generally not true [or other strategia whichtrack edge segments
The original algonthrn by Houghm did not use

The recognition accuracy is still quite high.
There is no official article about Tesseract yet; I believe Google will write one up and publish it before long. I have no time to study it now either: first I should finish reading Duda's Pattern Classification.

Furthermore, perhaps I've found an interesting application of Tesseract: a Visual Reader on our PC. In more detail, it would read web pages, papers, books, etc. aloud for us. This looks clearly possible with Tesseract and MS's Speech SDK, plus some additional image processing, for example selecting the text region and deciding when to perform OCR on the current screen content. I've already got a picture of this demo in mind; just some programming work remains :)
Life should be easy for us slothful men ~~

Tuesday, April 3, 2007

Testing Davison's MonoSLAM

Half a month ago I downloaded the MonoSLAM source code from Davison's homepage. It compiled and ran fine on Ubuntu 6.10 (gcc 4.1.2), and recently I ported it to the Windows platform, mainly with future compatibility in mind. Here I record some notes and experience.

1. support usb camera under linux
Davison's code only supports 1394 cameras, so under Linux I first added USB camera support. The work is actually quite simple: referring to the code in VW34/VWFirewire and VW34/VWGLOW/Interface/*, I wrote a VWUSBCam interface that calls the v4l2 API (supported by the 2.6 kernel), borrowing in turn from the code in spcaview, and then modified the makefile. It took only about a day; the troublesome part was editing the makefile, since what automake generates always looks overly complicated and is tedious to modify by hand.
The sequence mechanism works like this: at the bottom there is a loop doing IO, continually reading file or camera data; once data arrives, a message is sent to the upper layer, which calls the callback registered for that message and processes the data there. Such a polling loop has a drawback: CPU usage stays high. Especially when reading video, it keeps "spinning" even when there is no data. This is unlike MS DirectShow, where the underlying WDM driver automatically sends a message to notify the upper layer when it detects a data signal, i.e. interrupt-style operation. For v4l2 programming under Linux, however, I have not yet seen interrupt-style video capture supported.

2. port to windows
Porting is usually troublesome work, and the platform differences between Linux and Windows are large. Fortunately Oxford's VW34 is cross-platform and provides a VC7 project, and monoslamglow and scenelib are completely platform-independent and implemented entirely in C++. Work organized like that is very nice; it is the style I prefer, and some programs I wrote earlier are handled the same way.
Still, there were problems during the port. I used VC2005, probably the best compiler and IDE at present, with very good support for the C++ standard (after all, Lippman led its design), which let the large amount of template code in monoslam compile smoothly.
First, glut and glow, some extension toolkits for OpenGL, must be built; I used glut-3.7.6 and glow_104 and hit no problems.
Next comes wxWidgets. The current version is 2.8.3; on my first attempt I got some settings wrong and it failed, so I switched to 2.6.4, which showed the same problem. After analyzing the errors together with the readme, I understood the cause: some macros must be set to turn certain options on or off. Once it compiled I was too lazy to move back to the higher version; 2.6.4 is good enough.
Now VW34 can be built directly with its VC7 solution, after setting the paths to the libraries above. It compiles, but later, when building MonoSLAMGlow, unresolved link errors appear, clearly because some modules are missing from the library. Going back to VW34 I found the reasons: some .cpp files had not been added to the solution and so were never compiled into the lib, and a few file-reading templates were never instantiated; adding them by hand fixes it (simple as that sounds, finding the errors was not easy, and at first I did not suspect these problems ~~). Here gcc and VC2005 seem to differ considerably; gcc appears not to follow the standard.
Finally, monoslamglow and scenelib. These two have no ready-made project; monoslamglow is just a thin layer of calls into the scenelib API handling the GUI, while the main vSLAM work is done inside scenelib. I put both modules into a single project, fixed some header paths and gcc/VC2005 language differences, and it finally compiled, but then out jumped many multiple-symbols errors, all from VW34, as if VW34 had already defined some standard C++ library entities such as basic_string and vector. A careful look found nothing of the sort; quite strange. A Google search turned up the answer in the documentation of VXL (a good computer vision library; VNL inside VW34 is one of its modules): inconsistent compiler code-generation settings. The solution:

Project->Settings->C/C++ Tab->Category: Code Generation->Use run-time library: Multithreaded DLL

With that it compiled, and the port was basically done.

3. fix bugs
The porting above is only part of the work. Reading Davison's provided test sequence on Windows, the program soon failed, with a rather baffling error: the wxWidgets-based window crashed as soon as it was redrawn. A wxWidgets problem? If so it would be real trouble; I couldn't tell whether the bug was in wxWidgets, in OpenGL, or in the glut/glow layers built on it, and I had no energy to study tens of megabytes of code ~~
I built a debug version anyway and did some preliminary debugging, finding that the problem occurred only on window repaint. As in ordinary GUI programs, repainting is message-based, with messages hooked to callback functions. While debugging I found that the object pointer used to invoke the callback had become invalid; that was the cause of the failure! Going deep into the underlying VW34 code and tracing the program, I understood its scheme: a window subclass is derived from an EventHandler class with virtual functions; the EventHandler keeps a handle to the subclass, and on callback the subclass's repaint function is invoked polymorphically through that handle. Everything looks reasonable and makes full use of C++ polymorphism, yet the EventHandler handle became invalid on repaint. Incomprehensible. Fortunately I had carefully read Inside the C++ Object Model a month or so earlier, learned how C++ handles its object model underneath, and got a fairly clear picture of the tangled relationships inside virtual inheritance. So it quickly occurred to me that the base-class pointer probably became invalid because the offset adjustment during object conversion went wrong. That part is the compiler's job; we cannot intervene. And it is another striking difference between gcc and VC2005! A look at the declaration of the MonoSLAMGlow class: multiple inheritance! Multiple inheritance from four classes! Fearsome multiple inheritance! And EventHandler sits in second place! All these classes have virtual functions, so obviously there are several virtual function tables, and pointer adjustment becomes complicated. In theory EventHandler should be designed as an abstract class and used as a public virtual base, which would avoid the problem. I took a simpler fix: make EventHandler the first base class. Since it is not a virtual base, its address is then the address of MonoSLAMGlow itself (that is how VC2005 lays it out), so no pointer adjustment is needed: fast, convenient, and correct! For this one problem I really must thank Lippman; no wonder Hou Jie so strongly recommends his book.
Right after that came the second error: the program crashed after running for a while. Watching a debug build, a vector failed, apparently through an invalid pointer. This error was very hard to locate, since the whole project uses vector everywhere. When I first hit the problem a while ago it seemed intractable, and I happened to be busy with other things, so I set it aside. Last night I picked it up again and tentatively decided the problem was inside monoslamglow; today it took a whole day of debugging to pin it down. A truly obscure, shadowy bug! Of course my own carelessness is partly to blame; then again, unless you have personally suffered through this kind of bug, you would hardly think of it or find it quickly, even though I had long known its cause and cure ~~ I suggest every programmer who uses the STL read Scott Meyers' Effective STL carefully. Alas, I have already read it twice ~~
The bug is that vector::erase invalidates all iterators into the vector! A very simple rule, repeated several times in Effective STL. For example, the program contains code like this:
for(vector<...>::iterator feat = feature_init_info_vector.begin();
    feat != feature_init_info_vector.end(); feat++)
{
    if(feat->...) {}
    else {
        delete_partially_initialised_feature(feat); // calls vector::erase
    }
}
It looks fine: delete_partially_initialised_feature erases a copy of feat, and feat has already been updated. Readers of Effective STL might even say this is a trick Meyers recommends, with the postfix ++ returning the old value. Not so: after the erase, the vector's storage has been rearranged and all previous iterators are invalid, so the updated feat is actually invalid, and the next for-loop comparison crashes with the message "incompatible iterator!", an error I could not make sense of at first.
The fix is simple: change the prototype of delete_partially_initialised_feature to return an iterator, then write

feat = delete_partially_initialised_feature(feat)

Note: this error occurs in three places in the program; each needs the same fix.
gcc does not fault this problem either. gcc uses SGI's STL; I don't know how it is handled underneath, but by rights it should not pass. One more thing worth noting: the C++ and STL skills of Davison (or possibly someone else on the code) are not that great; this kind of mistake should not have been made, heh!

update 1 : 2007-04-12
Yet another bug of the same kind: incompatible iterator!
In Scene_single.cpp, line 301, delete_feature erases an item from the feature vector and invalidates all the iterators kept previously in the context, so in the next loop the "incompatible iterator" error jumps out while the iterators are compared.
solution:
I substituted the original prototype of delete_feature,

bool delete_feature()

with

std::vector<...>::iterator delete_feature()

In fact, the latter conforms to the STL convention, as with vector::erase, list::erase, etc.

update 2 : 2007-04-19
Added USB camera support on both Linux and Windows!
It's quite interesting to run the demo in a real environment.

4. conclusion
Porting and fixing this program was a lot of fun and good for my programming skill. Above all it showed that the classics and the advice set down by the masters are genuinely useful; the classic books must be read!
gcc is interesting too: the errors found above pass completely unnoticed under it. Even if it is non-standard in places, it is genuinely pleasant to use; I still rather like gcc.
Having written this much, I think I can write a letter to Davison now: report the bugs to him and ask him to read some classic books on the STL :)

5. forwards
Next I'll add a USB camera interface for MonoSLAM on Windows, and then real experiments can begin. That should be easy, since I'm quite familiar with camera capture. That's all for today.
I still prefer writing programs under Linux: the tools are rich and fun, e.g. emacs and eclipse, though not quite as convenient as on Windows, because VS2005 is simply too powerful! Still, support open source, support Linux!
我还是比较喜欢在linux下写程序,工具比较丰富,也比较好玩,如emacs,eclipse之类的,只是不如windows中方便,因为vs2005的功能实在太强大了!不过支持开源、支持linux!