2007年6月7日星期四

tips on efficient program

How to write elegent, efficient program is, nearly, all the programmers' desires and dreams. But how to make your program run faster or more efficiently, generally is beyond our ken. On some occasions, it's the very fact that an expert's hunch leads him to tune some codes which turns out to hit the point! But with the increasing complexity in modern compilers, it's much harder for us to "guess" the bottleneck of the program, without the help of profiling tools. Though we can rely on these profiling tools, however, they only tell us where the bottlenecks may lie by lots of datas, yet we have to figure out the reason in the background. If you can do so, then you must be satisfied with yourself, and say, "hei, look, I know the whole thing now!"

I'll give an interesting example below to test your ability on code optimization. The original code came from Mr Mao. Thanks him for the code !

#define NN 5000000

idata = new unsigned char [NN];
odata = new unsigned char [NN];

N1 = (int)(sqrtf( N ));
total = N1 * N1;
Table = new int [total];
for ( y = 0; y < N1; y++ )
for ( x = 0; x < N1; x++ )
{
x1 = ((a*b+1)*x+a*y)%N1;
y1 = (b*x+y)%N1;
Table[j++] = y1*N1+x1;
}

/* code one */
for ( j = 0; j < total; j++ )
{
tp = Table[j];
odata[tp] = idata[j];
}

for ( j = 0; j < total; j++ )
{
tp = Table[j];
idata[tp] = odata[N-1-j];
}

/*code two */
for ( j = 0; j < total; j++ )
{
tp = Table[j];
odata[j] = idata[tp];
}

for ( j = 0; j < total; j++ )
{
tp = Table[j];
idata[j] = odata[N-1-tp];
}

then can you tell me the difference between the two pieces of codes? which one is more efficient, and why?

2 条评论:

匿名 说...

I guess the second is more effecient.Would you like to tell me the right choice and the reason?

xuning 说...

Yes, you're right!
In fact, I am not quite sure about my explanation. I just did some tests and got the conclusion below. Maybe some other facts should be considered.

In my view, it concers with the CPU's cache management algorithm adopted by the OS. Cache is faster for data exchange than RAM, and on most platforms, for efficiency, when requiring data on the RAM, OS will first load a continuous block of datas(usually 4K bytes) from RAM into cache, and CPU later will directly operate on these data in cache. So when dealing with continuous memory blocks,CPU can perform thousands of operations before the next reading of RAM, thus, saving a lot time. But, in the contrary, if the required data scats on different blocks, CPU has to read/write RAM for each operation, which is quite time-consuming! I also noticed the difference between the complexity of reading and writing RAM by some simple tests. Writing needs more CPU instructions than reading.
Now let's look at this example, the buffer is quite large(4.7M bytes) and locates on at least 1000 pages. Code one reads datas continuously and writes back randomly while code two just in reverse order. As I have pointed out above, code one requires more writing operations than code two, and thus is more time consuming.