
As part of the Boost.SIMD effort, another library was developed, Boost.Dispatch, which relies heavily on overload resolution.

Before submitting this library for review, however, I did some compile-time benchmarking, and it's not looking very good. I'm therefore asking the Boost community 1) if this is expected, 2) if there is a workaround to this problem, and 3) if compilers should be made to deal with this better.

Boost.Dispatch essentially overloads a single free function, with, as arguments, a tag identifying the function followed by tags identifying properties/categories of the arguments (in an attempt to make a transparent tag dispatching system).

The scripts from the link below [1] (the shell script is POSIX/GCC-only but should be easily adaptable) perform two tests:

- Resolving a call with 1,000 function tags, overloaded 10 times each, everything implemented by overloading the same function as described above,
- Resolving a call with 1,000 function tags, overloaded 10 times each, but with each function tag using a different function.

It takes 30s in the first case, 160ms in the second one.

Unfortunately, the second solution is very restrictive and doesn't play very nice with generic programming. C++ apparently lacks a mechanism to define overloadable free functions with template parameters in their name.

Back to the benchmark: making the number of functions vary shows an apparent exponential increase in compile time. It's reasonably fast for 100 functions, but not at all for 1,000. On the other hand, changing the depth of the inheritance tree used for the category tags doesn't seem to affect it much.

Any suggestions?

[1] http://dl.dropbox.com/u/26902324/dispatching_bench.tar.gz
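To give an idea of what the generated test.cpp exercises, here is a heavily simplified sketch of the first configuration (the names below are stand-ins, not the actual ones the scripts emit):

// Simplified illustration of the benchmarked pattern: one free function,
// overloaded on a function tag plus tags for the argument categories.
struct f0_ {};                  // tags naming functions (the scripts generate ~1,000)
struct f1_ {};

struct scalar_ {};              // tags describing argument categories
struct simd_   {};

// Overloads of the single entry point, selected purely by the tag types;
// the benchmark has ~10 of these per function tag.
inline int dispatching(f0_, scalar_, scalar_) { return 0; }
inline int dispatching(f0_, simd_,   simd_)   { return 1; }
inline int dispatching(f1_, scalar_, scalar_) { return 2; }

int main()
{
    // Every call site names the same function, so overload resolution has to
    // weigh the entire set (~10,000 candidates) each time.
    return dispatching(f0_(), scalar_(), scalar_());
}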

On 21/09/11 19:20, Mathias Gaunard wrote:
As part of the Boost.SIMD effort, another library was developed, Boost.Dispatch, which relies heavily on overload resolution.
Before submitting this library for review, however, I did some compile-time benchmarking, and it's not looking very good. I'm therefore asking the Boost community 1) if this is expected, 2) if there is a workaround to this problem, and 3) if compilers should be made to deal with this better.
Boost.Dispatch essentially overloads a single free function, with, as arguments, a tag identifying the function followed by tags identifying properties/categories of the arguments (in an attempt to make a transparent tag dispatching system).
The scripts from the link below [1] (the shell script is POSIX/GCC-only but should be easily adaptable) perform two tests:
- Resolving a call with 1,000 function tags, overloaded 10 times each, everything implemented by overloading the same function as described above,
- Resolving a call with 1,000 function tags, overloaded 10 times each, but with each function tag using a different function.
It takes 30s in the first case, 160ms in the second one.
clang's a bit faster than g++ here, but not much; I suspect a newer g++ would match it:

$ time clang++ -fsyntax-only test.cpp
real    0m30.909s
user    0m29.974s
sys     0m0.088s

$ time g++ -fsyntax-only test.cpp
real    0m53.748s
user    0m53.017s
sys     0m0.140s

$ clang++ --version
clang version 3.0 (trunk 134846)
Target: x86_64-unknown-linux-gnu
Thread model: posix

$ g++ --version
g++ (Gentoo 4.4.5 p1.0, pie-0.4.5) 4.4.5
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Unfortunately, the second solution is very restrictive and doesn't play very nice with generic programming.
C++ apparently lacks a mechanism to define overloadable free functions with template parameters in their name.
I'm not really sure what your constraints are, but here are a couple of ideas (not benchmarked).

What about using functors, so the call would be something like

dispatching<f0_>()(h49_<int>(), h49_<int>(), h49_<int>(), h49_<int>(), h49_<int>());

with the f parameter choosing a template specialization, and then the other parameters picking an overload of operator() on it?

Or, if you must use functions, you could have two layers. The 'outer' layer would be called as in your test.cpp:

dispatching(f0_(), h49_<int>(), h49_<int>(), h49_<int>(), h49_<int>(), h49_<int>());

and would be overloaded only on the first argument, forwarding the rest to one of a number of 'inner' functions (one for each f). So, one such overload would be:

template<typename... A>
void dispatching(f0_, A... a)
{
    dispatching_f0_(a...);
}

(adapt in the obvious way to get a C++03 version)

John Bytheway
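To make the two-layer idea concrete as a self-contained fragment, here is a minimal sketch along those lines (the tag and inner-function names are placeholders for the benchmark's generated ones, and the int return values exist only so the example has something to return):

#include <utility>

struct f0_ {};
struct f1_ {};
struct scalar_ {};

// 'Inner' functions: one distinct name per function tag, so each call is
// resolved against only that tag's handful of overloads.
inline int dispatching_f0_(scalar_, scalar_) { return 0; }
inline int dispatching_f1_(scalar_, scalar_) { return 1; }

// 'Outer' layer: overloaded only on the first argument, forwarding the rest.
template<typename... A>
int dispatching(f0_, A&&... a) { return dispatching_f0_(std::forward<A>(a)...); }

template<typename... A>
int dispatching(f1_, A&&... a) { return dispatching_f1_(std::forward<A>(a)...); }

int main()
{
    return dispatching(f0_(), scalar_(), scalar_());   // forwards to dispatching_f0_
}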

On 22/09/2011 00:42, John Bytheway wrote:
What about using functors; so the call would be something like
dispatching<f0_>()(h49_<int>(), h49_<int>(), h49_<int>(), h49_<int>(), h49_<int>());
with the f parameter choosing a template specialization, and then the other parameters picking an overload of operator() on it?
The problem is that you cannot overload dispatching<f0_>::operator() once the class has been defined.
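Spelled out as a small sketch (all names hypothetical, not Boost.Dispatch's), the functor route closes the overload set as follows:

struct f0_ {};
struct scalar_ {};

template<class Tag> struct dispatching_;      // primary template, specialized per function tag

template<> struct dispatching_<f0_>
{
    // The complete set of operator() overloads has to appear here; the set
    // is closed once this specialization has been defined.
    int operator()(scalar_, scalar_) const { return 0; }
};

// A new argument category introduced later, e.g. in another header...
struct simd_ {};

// ...cannot add an operator()(simd_, simd_) overload to dispatching_<f0_>
// without editing the specialization above, whereas a free-function overload
// set stays open for extension from any translation unit:
inline int dispatching(f0_, simd_, simd_) { return 1; }

int main()
{
    return dispatching_<f0_>()(scalar_(), scalar_());
}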

On 21.09.2011 20:20, Mathias Gaunard wrote:
As part of the Boost.SIMD effort, another library was developed, Boost.Dispatch, which relies heavily on overload resolution.
Back to the benchmark, making the number of functions vary shows an apparent exponential increase in compile time. It's reasonably fast for 100 functions, but not at all for 1,000.

Overload resolution is supposed to be linear in the number of overloads. If you can show a graph that shows super-linear increase in compile time, you should file a bug with the respective compilers.

In general, all algorithms in a compiler should be linear, or worst case n*log(n). Any quadratic or worse algorithm is pretty much a bug.

Sebastian

On 22/09/2011 11:25, Sebastian Redl wrote:
On 21.09.2011 20:20, Mathias Gaunard wrote:
As part of the Boost.SIMD effort, another library was developed, Boost.Dispatch, which relies heavily on overload resolution.
Back to the benchmark, making the number of functions vary shows an apparent exponential increase in compile time. It's reasonably fast for 100 functions, but not at all for 1,000.
Overload resolution is supposed to be linear in the number of overloads. If you can show a graph that shows super-linear increase in compile time, you should file a bug with the respective compilers.
Will do. A quick Excel regression shows a 0.98 fit to a+b^X ...

on Thu Sep 22 2011, Sebastian Redl <sebastian.redl-AT-getdesigned.at> wrote:
Overload resolution is supposed to be linear in the number of overloads.
According to whom?
If you can show a graph that shows super-linear increase in compile time, you should file a bug with the respective compilers.
Probably.
In general, all algorithms in a compiler should be linear, or worst case n*log(n). Any quadratic or worse algorithm is pretty much a bug.
I'd like to think so, too, but I'm not sure all implementors would agree with you.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Dave Abrahams <dave@boostpro.com> writes:
In general, all algorithms in a compiler should be linear, or worst case n*log(n). Any quadratic or worse algorithm is pretty much a bug.
I'd like to think so, too, but I'm not sure all implementors would agree with you.
The problem complexities are not linear or n*log(n) so why should you expect the algorithms to be? The heuristic algorithms to achieve acceptable results could be limited to n*log(n) but like everything else, it's a tradeoff. Dependence analysis, for example, can get very expensive in some situations if you want reasonable accuracy.

Customers always say they don't care about compile time as long as the code is fast. Until, of course, they do. :)

-Dave

On 22.09.2011 23:40, David A. Greene wrote:
Dave Abrahams<dave@boostpro.com> writes:
In general, all algorithms in a compiler should be linear, or worst case n*log(n). Any quadratic or worse algorithm is pretty much a bug.

I'd like to think so, too, but I'm not sure all implementors would agree with you.

The problem complexities are not linear or n*log(n) so why should you expect the algorithms to be? The heuristic algorithms to achieve acceptable results could be limited to n*log(n) but like everything else, it's a tradeoff. Dependence analysis, for example, can get very expensive in some situations if you want reasonable accuracy.
I was thinking only of the frontend, not the optimizers or code generation. Sebastian

On 22/09/2011 23:40, David A. Greene wrote:
Dave Abrahams<dave@boostpro.com> writes:
In general, all algorithms in a compiler should be linear, or worst case n*log(n). Any quadratic or worse algorithm is pretty much a bug.
I'd like to think so, too, but I'm not sure all implementors would agree with you.
The problem complexities are not linear or n*log(n) so why should you expect the algorithms to be?
I'm pretty sure Sebastian was talking about front-ends. Nothing in the front-end should be worse than n*log(n).

Mathias Gaunard <mathias.gaunard@ens-lyon.org> writes:
In general, all algorithms in a compiler should be linear, or worst case n*log(n). Any quadratic or worse algorithm is pretty much a bug.
I'd like to think so, too, but I'm not sure all implementors would agree with you.
The problem complexities are not linear or n*log(n) so why should you expect the algorithms to be?
I'm pretty sure Sebastian was talking about front-ends. Nothing in the front-end should be worse than n*log(n).
Ok, that I can agree with. We do need to be precise in what we say. :) -Dave

On 22.09.2011 17:57, Dave Abrahams wrote:
on Thu Sep 22 2011, Sebastian Redl<sebastian.redl-AT-getdesigned.at> wrote:
Overload resolution is supposed to be linear in the number of overloads. According to whom? The C++ standard has a footnote that outlines a linear algorithm for overload resolution. Clang follows this algorithm, and I suspect pretty much every other compiler does as well. Therefore, if resolution is superlinear, it's a bug. In general, all algorithms in a compiler should be linear, or worst case n*log(n). Any quadratic or worse algorithm is pretty much a bug. I'd like to think so, too, but I'm not sure all implementors would agree with you. I can't speak for any other compilers, but I'm pretty sure Ted and Doug would agree with me about the main compilation pass of Clang. We make exceptions for emitting errors, and of course for the static analyzer, whose pass-sensitive checks are inherently exponential.
Sebastian

on Fri Sep 23 2011, Sebastian Redl <sebastian.redl-AT-getdesigned.at> wrote:
On 22.09.2011 17:57, Dave Abrahams wrote:
on Thu Sep 22 2011, Sebastian Redl<sebastian.redl-AT-getdesigned.at> wrote:
Overload resolution is supposed to be linear in the number of overloads.

According to whom?

The C++ standard has a footnote that outlines a linear algorithm for overload resolution. Clang follows this algorithm, and I suspect pretty much every other compiler does as well. Therefore, if resolution is superlinear, it's a bug.
I know this is a technicality, but... The fact that an efficient algorithm for X exists does not mean every author of code that does X has promised to use that algorithm. It also doesn't guarantee that factors not considered in the description of the algorithm (name mangling, generation of debug info, etc.) won't make it inefficient in practice.
In general, all algorithms in a compiler should be linear, or worst case n*log(n). Any quadratic or worse algorithm is pretty much a bug.

I'd like to think so, too, but I'm not sure all implementors would agree with you.
I can't speak for any other compilers, but I'm pretty sure Ted and Doug would agree with me about the main compilation pass of Clang. We make exceptions for emitting errors, and of course for the static analyzer, whose path-sensitive checks are inherently exponential.
I'm sure that even with Clang one can demonstrate metaprograms whose compilation in principle ought to be linear but are slower than that in practice.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

On 24.09.2011, at 01:30, Dave Abrahams wrote:
on Fri Sep 23 2011, Sebastian Redl <sebastian.redl-AT-getdesigned.at> wrote:
On 22.09.2011 17:57, Dave Abrahams wrote:
on Thu Sep 22 2011, Sebastian Redl<sebastian.redl-AT-getdesigned.at> wrote:
Overload resolution is supposed to be linear in the number of overloads.

According to whom?

The C++ standard has a footnote that outlines a linear algorithm for overload resolution. Clang follows this algorithm, and I suspect pretty much every other compiler does as well. Therefore, if resolution is superlinear, it's a bug.
I know this is a technicality, but...
The fact that an efficient algorithm for X exists does not mean every author of code that does X has promised to use that algorithm. It also doesn't guarantee that factors not considered in the description of the algorithm (name mangling, generation of debug info, etc.) won't make it inefficient in practice.
True, although I would still believe that this generally means there's room for improvement. All general guidelines aside though, if Clang is superlinear in overload resolution, I want to know about it.
In general, all algorithms in a compiler should be linear, or worst case n*log(n). Any quadratic or worse algorithm is pretty much a bug.

I'd like to think so, too, but I'm not sure all implementors would agree with you.
I can't speak for any other compilers, but I'm pretty sure Ted and Doug would agree with me about the main compilation pass of Clang. We make exceptions for emitting errors, and of course for the static analyzer, whose path-sensitive checks are inherently exponential.
I'm sure that even with Clang one can demonstrate metaprograms whose compilation in principle ought to be linear but are slower than that in practice.
And I would love to see such metaprograms and figure out why that is. Sebastian

On 22/09/2011 11:25, Sebastian Redl wrote:
On 21.09.2011 20:20, Mathias Gaunard wrote:
As part of the Boost.SIMD effort, another library was developed, Boost.Dispatch, which relies heavily on overload resolution.
Back to the benchmark, making the number of functions vary shows an apparent exponential increase in compile time. It's reasonably fast for 100 functions, but not at all for 1,000.

Overload resolution is supposed to be linear in the number of overloads.
O(1) or O(log n) even seems possible for certain cases, but it doesn't seem compilers optimize for them.

Consider, for N unique, different and unrelated types Ti, N overloads of the form

void f(Ti);

If I call f(T0()), it shouldn't need to test the N-1 other overloads. And that kind of logic should apply to cull the set of viable overloads when any of the arguments matches exactly.

If now I have, for each Ti,

template<class T> void f(Ti, foo0<T>);
template<class T> void f(Ti, foo1<T>);
template<class T> void f(Ti, foo2<T>);
template<class T> void f(Ti, foo3<T>);

and I call f(T0(), something), I should first reduce the set of possible overloads to 4, then find out which is the best one out of these.
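Written out as a compilable fragment (placeholder names again), the shape of overload set where that kind of culling would pay off looks like this:

struct T0 {}; struct T1 {};     // unrelated, unique types

template<class T> struct foo0 {};
template<class T> struct foo1 {};

// Overloads differ in their first, exactly-matched parameter.
template<class T> void f(T0, foo0<T>) {}
template<class T> void f(T0, foo1<T>) {}
template<class T> void f(T1, foo0<T>) {}
template<class T> void f(T1, foo1<T>) {}

int main()
{
    // Only the two T0 overloads can possibly be chosen here, yet a compiler
    // that doesn't cull on the exact first-argument match still considers
    // all of them (all ~10,000 in the benchmark's case).
    f(T0(), foo0<int>());
}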

on Wed Sep 21 2011, Mathias Gaunard <mathias.gaunard-AT-ens-lyon.org> wrote:
As part of the Boost.SIMD effort, another library was developed, Boost.Dispatch, which relies heavily on overload resolution.
Before submitting this library for review, however, I did some compile-time benchmarking, and it's not looking very good. I'm therefore asking the Boost community if 1) this is expected, 2) if there is a workaround to this problem and 3) if compilers should be made to deal with this better.
1) In a sense, yes. Matt Calabrese and Zach Laine demonstrated that at least there's no reason to expect metaprogramming with overload resolution to be faster than metaprogramming with direct class template instantiation. Google "instantiations must go".

2) None that I know of.

3) Always ;-)

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
participants (6)

- Dave Abrahams
- greened@obbligato.org
- Joel Falcou
- John Bytheway
- Mathias Gaunard
- Sebastian Redl