[named_params] timing trivia and some comments

Been looking at the very cute named_params in the sandbox. Getting around to trying to do some stuff with it.

Noticed something interesting w.r.t. vc7.1 optimization:

    BOOST_NAMED_PARAMS_FUN(double, power, 0, 2, power_keywords)
    {
        double b = p[base | 10];
        double e = p[exponent | 1];
        return pow(b, e);
    }

and

    double pow_wrap(double b, double e) { return pow(b, e); }

are the same speed, around 25 nanoseconds on my machine, when called with variable parameters (now = pow_wrap(t.elapsed(), t.elapsed() / (rand() * t.elapsed()));) to defeat the chance of variable elimination and constant propagation.

But interestingly,

    BOOST_NAMED_PARAMS_FUN(double, power, 0, 2, power_keywords)
    {
        return pow(p[base | 10], p[exponent | 1]);
    }

is more than five times slower at 144 nanoseconds. Which is the opposite of what I would have expected... Anyway, I found it interesting :-)

It seems the take-home message for me is that it is possible to use the named_params library with no, zippo, not a bit of abstraction overhead, even for a very simple function wrap.

I'm deeply impressed. Well done Dave and Daniel.

It would be nice if the macro could somehow encapsulate the keyword definition so that this boilerplate might be eliminated:

    struct base_t;
    struct exponent_t;

    namespace
    {
        boost::keyword<base_t> base;
        boost::keyword<exponent_t> exponent;
    }

    struct power_keywords : boost::keywords<base_t, exponent_t> {};

The pre-processor trickery to do this is beyond me, I'm afraid.

I am interested in being able to iterate over the parameters, extracting keyword types, argument types and argument values. Convenient string names would be good too, but I can always use typeid on the keyword types.

Why would I want to do this? I would like to use this approach as a way of inserting an intermediary function. Specifically, I would like to call f<direct>(x,y) and have the direct representation called, or f<marshal, destination>(x,y) and have an intermediary serialize the params into a block and send it off on a transport, where a representation along the lines of f<marshal, source>(x,y) would accept the block in some infrastructure somewhere else. f<queue_for_processing_in_a_thread_pool>(x,y) fits this model too.

Any thoughts?

Regards,

Matt Hurd.
__________________________________________

First case results:

----------------------------------------------------------------------------
looper invasive timing estimate simple_named_params
----------------------------------------------------------------------------
median time = 24.88038277512962 nanoseconds
90% range size = 9.926167350636332e-015 nanoseconds
widest range size (max - min) = 10.8665071770335 microseconds
minimum time = 24.88038277511962 nanoseconds
maximum time = 10.89138755980862 microseconds
50% range = (24.88038277512962 nanoseconds, 24.88038277512962 nanoseconds)
50% range size = 0 nanoseconds

----------------------------------------------------------------------------
looper invasive timing estimate simple_wrap
----------------------------------------------------------------------------
median time = 24.88038277512962 nanoseconds
90% range size = 9.926167350636332e-015 nanoseconds
widest range size (max - min) = 10.51913875598087 microseconds
minimum time = 24.88038277511962 nanoseconds
maximum time = 10.54401913875599 microseconds
50% range = (24.88038277512962 nanoseconds, 24.88038277512962 nanoseconds)
50% range size = 0 nanoseconds

Second case:

----------------------------------------------------------------------------
looper invasive timing estimate simple_named_params
----------------------------------------------------------------------------
median time = 144.4976076555124 nanoseconds
90% range size = 0 nanoseconds
widest range size (max - min) = 1.#INF seconds
minimum time = 144.4976076555077 nanoseconds
maximum time = 1.#INF seconds
50% range = (144.4976076555124 nanoseconds, 144.4976076555124 nanoseconds)
50% range size = 0 nanoseconds

----------------------------------------------------------------------------
looper invasive timing estimate simple_wrap
----------------------------------------------------------------------------
median time = 24.88038277512962 nanoseconds
90% range size = 3.308722450212111e-015 nanoseconds
widest range size (max - min) = 1.#INF seconds
minimum time = 24.88038277512295 nanoseconds
maximum time = 1.#INF seconds
50% range = (24.88038277512962 nanoseconds, 24.88038277512962 nanoseconds)
50% range size = 0 nanoseconds
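For reference, a timing loop along these lines reproduces the shape of the experiment. The original "looper" harness isn't shown in the post, so everything below is a reconstruction under that assumption, using the classic boost::timer:

    // Hypothetical reconstruction of the measurement, not the original
    // "looper" harness. Requires <boost/timer.hpp> (the classic Boost.Timer).
    #include <cstdlib>
    #include <cmath>
    #include <iostream>
    #include <boost/timer.hpp>

    double pow_wrap(double b, double e) { return std::pow(b, e); }

    int main()
    {
        const long reps = 10000000;
        boost::timer t;      // source of the variable arguments
        boost::timer total;  // measures the whole loop
        double now = 0;
        for (long i = 0; i < reps; ++i)
        {
            // variable parameters defeat constant propagation
            now = pow_wrap(t.elapsed(), t.elapsed() / (rand() * t.elapsed()));
        }
        std::cout << total.elapsed() / reps * 1e9
                  << " ns per call (last result " << now << ")\n";
    }

Note that a loop like this also times the t.elapsed() and rand() calls themselves, so it only bounds the per-call cost rather than isolating it.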

On Behalf Of Matthew Hurd
Subject: [boost] [named_params] timing trivia and some comments
Been looking at the very cute named_params in the sandbox. Getting around to trying to do some stuff with it.
Noticed something interesting w.r.t. vc7.1 optimization:
    BOOST_NAMED_PARAMS_FUN(double, power, 0, 2, power_keywords)
    {
        double b = p[base | 10];
        double e = p[exponent | 1];
        return pow(b, e);
    }
double pow_wrap(double b, double e) { return pow(b,e); }
are the same speed, around 25 nanoseconds on my machine, when called with variable parameters (now = pow_wrap(t.elapsed(), t.elapsed() / (rand() * t.elapsed()));) to defeat the chance of variable elimination and constant propagation.
But interestingly,
    BOOST_NAMED_PARAMS_FUN(double, power, 0, 2, power_keywords)
    {
        return pow(p[base | 10], p[exponent | 1]);
    }
is more than five times slower at 144 nanoseconds.
FWIW, calling

    power(exponent = t.elapsed() / (rand() * t.elapsed()), base = t.elapsed());

that is, with the parameters in the reverse of their natural order, takes nearly twice as long with this configuration. So there seems to be a small penalty for being out of order with this example, compiler, hardware and OS. Not sure it means anything much, though.

Regards,

Matt Hurd.

________________________________________

----------------------------------------------------------------------------
looper invasive timing estimate simple_named_params
----------------------------------------------------------------------------
median time = 52.15311004785687 nanoseconds
90% range size = 0.4784688995215372 nanoseconds
widest range size (max - min) = 10.32153110047848 microseconds
minimum time = 46.41148325358853 nanoseconds
maximum time = 10.36794258373207 microseconds
50% range = (52.15311004785686 nanoseconds, 52.15311004785688 nanoseconds)
50% range size = 3.308722450212111e-014 nanoseconds

----------------------------------------------------------------------------
looper invasive timing estimate simple_wrap
----------------------------------------------------------------------------
median time = 24.88038277512962 nanoseconds
90% range size = 3.308722450212111e-015 nanoseconds
widest range size (max - min) = 10.3464114832536 microseconds
minimum time = 24.88038277511962 nanoseconds
maximum time = 10.37129186602872 microseconds
50% range = (24.88038277512962 nanoseconds, 24.88038277512962 nanoseconds)
50% range size = 0 nanoseconds

"Matthew Hurd" <matt@finray.net> writes:
Been looking at the very cute named_params in the sandbox. Getting around to trying to do some stuff with it.
Noticed something interesting w.r.t. vc7.1 optimization:
    BOOST_NAMED_PARAMS_FUN(double, power, 0, 2, power_keywords)
    {
        double b = p[base | 10];
        double e = p[exponent | 1];
        return pow(b, e);
    }
double pow_wrap(double b, double e) { return pow(b,e); }
are the same speed, around 25 nanoseconds on my machine, when called with variable parameters (now = pow_wrap(t.elapsed(), t.elapsed() / (rand() * t.elapsed()));) to defeat the chance of variable elimination and constant propagation.
But interestingly,
    BOOST_NAMED_PARAMS_FUN(double, power, 0, 2, power_keywords)
    {
        return pow(p[base | 10], p[exponent | 1]);
    }
is more than five times slower at 144 nanoseconds.
Which is the opposite of what I would have expected... Anyway, I found it interesting :-)
It seems the take-home message for me is that it is possible to use the named_params library with no, zippo, not a bit of abstraction overhead, even for a very simple function wrap.
I'm deeply impressed. Well done Dave and Daniel.
You can thank the good people at Microsoft for that. I don't think we paid any special attention to avoiding abstraction penalty, other than doing the obvious things (e.g. pass classes by reference).
It would be nice if the macro could somehow encapsulate the keyword definition so that this might be eliminated:

    struct base_t;
    struct exponent_t;

    namespace
    {
        boost::keyword<base_t> base;
        boost::keyword<exponent_t> exponent;
    }

    struct power_keywords : boost::keywords<base_t, exponent_t> {};
Good point... but we anticipate the keywords for a given library will probably be re-used in several function interfaces, so we can't rightly do it all with a single macro invocation; you might need to write several lines like the last one, for different functions. We can do something like

    BOOST_NAMED_PARAMS_KEYWORD_DECL((base)(exponent))

to generate the keyword declarations, and then something like

    BOOST_NAMED_PARAMS_KEYWORD_SET(power, (base)(exponent))

for each line like the last one.
The pre-processor trickery to do this is beyond me I'm afraid.
It's not hard, if you're willing to spend the time poring through the PP lib docs. I just worry a little about ending up with a library interface that hides everything behind macros.
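For the curious, here is a minimal sketch of how such macros might be built on Boost.PP's sequence facilities. The two outer macro names follow the suggestion above; the helper macros and bodies are purely illustrative, not part of the sandbox library:

    // Illustrative only: these expansions assume the keyword<> and
    // keywords<> templates from the sandbox named_params library.
    #include <boost/preprocessor/cat.hpp>
    #include <boost/preprocessor/seq/for_each.hpp>
    #include <boost/preprocessor/seq/transform.hpp>
    #include <boost/preprocessor/seq/enum.hpp>

    // For each keyword k, emit its tag struct and a keyword object.
    #define KEYWORD_DECL_EACH(r, data, k)           \
        struct BOOST_PP_CAT(k, _t);                 \
        namespace                                   \
        {                                           \
            boost::keyword<BOOST_PP_CAT(k, _t)> k;  \
        }

    #define BOOST_NAMED_PARAMS_KEYWORD_DECL(seq) \
        BOOST_PP_SEQ_FOR_EACH(KEYWORD_DECL_EACH, ~, seq)

    // Map each keyword k to its tag type k_t.
    #define KEYWORD_TAG(s, data, k) BOOST_PP_CAT(k, _t)

    #define BOOST_NAMED_PARAMS_KEYWORD_SET(name, seq)           \
        struct BOOST_PP_CAT(name, _keywords)                    \
            : boost::keywords<                                  \
                  BOOST_PP_SEQ_ENUM(                            \
                      BOOST_PP_SEQ_TRANSFORM(KEYWORD_TAG, ~, seq))> \
        {};

    // Usage, generating exactly the boilerplate quoted earlier:
    //   BOOST_NAMED_PARAMS_KEYWORD_DECL((base)(exponent))
    //   BOOST_NAMED_PARAMS_KEYWORD_SET(power, (base)(exponent))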
I am interested in being able to iterate over the parameters, extracting keyword types, argument types and argument values.
Since the parameters can have heterogeneous types, there's no way to iterate over them at runtime. It would, however, be possible to "recurse" over the items with something that looks like:

    for_each(params, some_templated_function_object)
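A self-contained sketch of that kind of recursion, using a hand-rolled cons list as a stand-in for the library's real argument pack (arg_list, the tag structs, and print_param are invented for the example):

    #include <iostream>
    #include <typeinfo>

    struct nil {};

    // Cons list of (keyword tag, value) pairs.
    template <class Tag, class Value, class Rest = nil>
    struct arg_list
    {
        arg_list(Value v, Rest r) : value(v), rest(r) {}
        Value value;
        Rest rest;
    };

    // Base case: an empty list, nothing to visit.
    template <class F> void for_each(nil const&, F) {}

    // Recursive case: visit the head pair, then recurse on the tail.
    template <class Tag, class Value, class Rest, class F>
    void for_each(arg_list<Tag, Value, Rest> const& args, F f)
    {
        f(Tag(), args.value);
        for_each(args.rest, f);
    }

    struct base_t {};
    struct exponent_t {};

    // Example visitor: print each keyword tag's type name and its value.
    struct print_param
    {
        template <class Tag, class V>
        void operator()(Tag, V const& v) const
        {
            std::cout << typeid(Tag).name() << " = " << v << "\n";
        }
    };

    int main()
    {
        arg_list<exponent_t, double> tail(2.0, nil());
        arg_list<base_t, double, arg_list<exponent_t, double> > args(10.0, tail);
        for_each(args, print_param());
    }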
Convenient string names would be good too, but I can always use typeid on the keyword types.
Why would I want to do this? I would like to use this approach as a way of inserting an intermediary function. Specifically, I would like to call f<direct>(x,y) and have the direct representation called, or f<marshal, destination>(x,y) and have an intermediary serialize the params into a block and send it off on a transport, where a representation along the lines of f<marshal, source>(x,y) would accept the block in some infrastructure somewhere else. f<queue_for_processing_in_a_thread_pool>(x,y) fits this model too.
Any thoughts?
I guess my first thought is: "Whaa??? What does any of the above have to do with a named parameters library?"

And then I think: "OK, he wants something that mates the serialization library from Robert Ramey with the new tuples (fusion) from Joel de Guzman."

I can begin to vaguely see a reason to slap a named parameters interface on top of the whole thing, but it seems like you could do that as an afterthought. Am I missing something? I must be.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
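For what it's worth, a hedged sketch of how a for_each visitor like the one above could feed the serialization library to produce a marshaled block; serialize_param is an invented name and the wiring is an assumption, not anyone's proposed design:

    // Illustrative visitor: streams each argument value into a
    // Boost.Serialization text archive (link against boost_serialization).
    #include <iostream>
    #include <sstream>
    #include <boost/archive/text_oarchive.hpp>

    struct serialize_param
    {
        explicit serialize_param(boost::archive::text_oarchive& a) : oa(a) {}

        // Called once per (keyword tag, value) pair by a for_each recursion.
        template <class Tag, class Value>
        void operator()(Tag, Value const& v) const { oa << v; }

        boost::archive::text_oarchive& oa;
    };

    int main()
    {
        std::ostringstream block;
        boost::archive::text_oarchive oa(block);
        serialize_param visit(oa);
        visit(0, 10.0);  // in practice for_each(args, visit) drives these calls
        visit(0, 2.0);
        std::cout << block.str() << "\n";  // the block to send on a transport
    }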

David Abrahams wrote:
"Matthew Hurd" <matt@finray.net> writes:
Why would I want to do this? I would like to use this approach as a way of inserting an intermediary function. Specifically, I would like to call f<direct>(x,y) and have the direct representation called, or f<marshal, destination>(x,y) and have an intermediary serialize the params into a block and send it off on a transport, where a representation along the lines of f<marshal, source>(x,y) would accept the block in some infrastructure somewhere else. f<queue_for_processing_in_a_thread_pool>(x,y) fits this model too.
Any thoughts?
I guess my first thought is: "Whaa??? What does any of the above have to do with a named parameters library?"
And then I think: "OK, he wants something that mates the serialization library from Robert Ramey with the new tuples (fusion) from Joel de Guzman".
Conceptually, I think that he wants to serialize a boost::function<>. My advice is "don't bother". I use shared_ptr<Command> for serializable polymorphic functions.

BTW, it is not necessary to serialize/marshal the function object in order to pass it to a thread (pool); only IPC needs marshaling.
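A minimal sketch of the shared_ptr<Command> approach described here, with an invented PowerCommand for illustration; capturing the arguments by value is what lets the object cross threads in-process without any marshaling:

    // Illustrative Command pattern: the names (Command, PowerCommand)
    // are assumptions for the example, not from any Boost library.
    #include <cmath>
    #include <iostream>
    #include <queue>
    #include <boost/shared_ptr.hpp>

    struct Command
    {
        virtual ~Command() {}
        virtual void execute() = 0;
    };

    // Captures its arguments by value, so it can be queued for a worker
    // thread in the same process with no serialization at all.
    struct PowerCommand : Command
    {
        PowerCommand(double b, double e) : base(b), exponent(e) {}
        void execute() { std::cout << std::pow(base, exponent) << "\n"; }
        double base, exponent;
    };

    typedef boost::shared_ptr<Command> command_ptr;

    int main()
    {
        std::queue<command_ptr> work;  // stands in for a thread pool's queue
        work.push(command_ptr(new PowerCommand(10, 2)));
        work.front()->execute();       // a worker thread would do this
        work.pop();
    }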
participants (3)
- David Abrahams
- Matthew Hurd
- Peter Dimov