
Loop unrolling, per se, is not all that hard. But in combination with software pipelining and/or manual vectorization it can be a challenge to get right. A template library could conceivably help. I generally use preprocessor macros in such situations, but that quickly gets very ugly and very hard to maintain.

However, the performance that can be gained by using such a library is largely a temporary thing. Compilers keep getting better and, in most cases, simply saying what you mean will allow a good compiler to make the necessary optimizations. If your compiler isn't that smart today, chances are it will be tomorrow. So I wouldn't invest a whole lot of work in writing such a library, nor in converting code to use it.

Of course, even temporary wins have some value. If work does go ahead on this I will watch with interest and contribute when I can. On real code with real data I've achieved 60-70% speedups by manually unrolling, pipelining, and vectorizing critical loops. The biggest problems with this approach, the factors that limit how much it gets done, are:

1) It takes a rocket scientist to make the code transformations safely, and it takes a lot of low-level insight to know where the transformations are likely to help.

2) The performance gains are very sensitive to the microarchitecture of the execution platform, so the effort is hard to justify for commercial software (which runs on a variety of platforms without recompilation).

3) The transformed code becomes essentially maintenance-proof.

4) To evaluate the effectiveness of a proposed transformation you really want to test a fairly large number of alternative transformations. Each one is very tedious to do: degree of unrolling vs. alternative interleavings of pipeline stages vs. data representation alternatives vs. ... You can't test them all.
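To make the "degree of unrolling" knob in point 4 concrete, here is a minimal sketch of how a template parameter could stand in for hand-edited copies of a loop. The name unrolled_sum is hypothetical and not part of Boost or any proposed library, and whether the compiler actually flattens the fixed-count inner loop is an assumption that has to be checked per compiler and per target.

```cpp
#include <cstddef>

// Hypothetical helper, for illustration only: sums n doubles with the loop
// body replicated Unroll times, using one independent accumulator per replica
// so the additions do not form a single dependency chain.
template <std::size_t Unroll>
double unrolled_sum(const double* data, std::size_t n) {
    static_assert(Unroll > 0, "unroll factor must be positive");

    double acc[Unroll] = {};  // partial sums, all initialized to zero
    std::size_t i = 0;

    // Main loop: consumes Unroll elements per iteration. The inner loop has a
    // compile-time trip count, so an optimizing compiler can flatten it.
    for (; i + Unroll <= n; i += Unroll) {
        for (std::size_t k = 0; k < Unroll; ++k) {
            acc[k] += data[i + k];
        }
    }

    // Remainder loop: handles the last n % Unroll elements.
    double total = 0.0;
    for (; i < n; ++i) {
        total += data[i];
    }

    // Combine the partial sums.
    for (std::size_t k = 0; k < Unroll; ++k) {
        total += acc[k];
    }
    return total;
}
```

With this shape, unrolled_sum<2>, unrolled_sum<4> and unrolled_sum<8> all come from one source text, so trying another unroll factor is a one-character change rather than a rewrite. Note that the multiple accumulators change the order of the floating-point additions, so results can differ in the last bits from a plain left-to-right sum.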
A template library that successfully abstracted some of the mechanics of these transformations would, at a minimum, make it feasible to evaluate more alternatives on more microarchitectures and reduce the maintenance penalty of implementing the transformations. If it also made it feasible to deploy multiple runtime-selected implementations of a function from one source text that would be very interesting.

-swn

-----Original Message-----
From: Vladimir Prus [mailto:vladimir@codesourcery.com]
Sent: Sunday, June 29, 2008 1:15 AM
To: boost@lists.boost.org
Subject: Re: [boost] Anyone interested in a generic loop (and loopunrolling) library ?

Hui Li wrote:
Loops, in particular loop-unrolling, can be made generic, easy to write, and accessible to everybody. With the help of Boost.Lambda (and a few other boost libraries), we can easily construct arbitrary loops that are unrolled at compile-time.
Can you clarify exactly what you're trying to achieve? I don't think that loops, in general, are hard to write and not accessible to everybody. And speaking about loop unrolling -- the whole point of that is performance -- did you measure the performance of code using your approach against traditional code with suitable optimizations?

- Volodya