AMD64 string function optimizations
The time has come to consider whether some of our string operations are less than optimal and need some optimization. NetBSD seems to have done a lot of good work to improve their MD (machine-dependent) layer, and we should take these improvements if they are proven to help.
David O’Brien has pointed out that we should be careful here, however, since some micro-optimizations may hurt performance in larger, real-world scenarios. With this concern in mind, I will run some more benchmarks and ask -arch@ for input before considering committing the patchset. So far, the NetBSD implementation of swab(3) has measured about 25% slower than the GCC-generated code. I will look into the generated assembly to see whether I can help close that gap.
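For readers who want to try this kind of comparison themselves, here is a minimal sketch of a micro-benchmark harness. It is not the benchmark I actually ran: the names swab_ref and time_swab are hypothetical, the plain C loop only stands in for "the code GCC generates", and the empty asm barrier assumes GCC or Clang.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/types.h>  /* ssize_t */
#include <time.h>       /* clock_gettime() */
#include <unistd.h>     /* swab(), if you want to time the libc version */

/* Hypothetical reference version: plain C, left to the compiler to optimize. */
static void swab_ref(const void *from, void *to, ssize_t len)
{
    const unsigned char *src = from;
    unsigned char *dst = to;
    ssize_t i;

    /* Swap each adjacent byte pair; an odd trailing byte is not touched. */
    for (i = 0; i + 1 < len; i += 2) {
        unsigned char a = src[i], b = src[i + 1];
        dst[i] = b;
        dst[i + 1] = a;
    }
}

typedef void (*swab_fn)(const void *, void *, ssize_t);

/* Run fn over the buffer iters times and return the elapsed seconds. */
static double time_swab(swab_fn fn, const void *src, void *dst,
                        ssize_t len, int iters)
{
    struct timespec t0, t1;
    int i;

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iters; i++) {
        fn(src, dst, len);
        /* GCC/Clang extension: keeps the compiler from dropping the stores. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_swab(swab_ref, src, dst, len, iters) and time_swab(swab, src, dst, len, iters) on the same buffers gives two numbers to compare. A tight loop like this only measures the hot-cache case, which is exactly the sort of micro-benchmark that can be misleading for real workloads, so it is worth varying the buffer size and alignment as well.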
Others have given a lot of important help too. One great thing to see is that NetBSD has taken some of our improvements into their -HEAD, and this cooperation will make both projects stronger.