Determining if an object is faster to pass by value or const reference

In general no one would write a function that takes an int by const reference, because doing so doesn't make sense from a performance standpoint. However, we often pass these trivial types to functions generically written to take const T&. It seems like type traits would make it simple to overload based on this convention, but I don't see a type trait that fits this use case. The has_trivial_copy trait seems promising, but it can return true for large POD objects that you wouldn't want to copy. The is_scalar trait also seems promising, but it returns false for lightweight user-defined types and would be suboptimal for them. Using has_trivial_copy together with some sizeof limit seems promising but a bit hand-wavy. Is there a trait designed for this use case that I'm overlooking? Thanks, Michael Marcin
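For illustration, the heuristic described above could be sketched like this, using the standard std::is_trivially_copyable trait as a stand-in for has_trivial_copy; the two-pointer size cutoff is an arbitrary assumption, not a recommendation:

```cpp
#include <type_traits>

// Sketch: pass small, trivially copyable types by value and everything
// else by const reference. The cutoff of two pointers is an assumption.
template <typename T>
struct param_type {
    typedef typename std::conditional<
        std::is_trivially_copyable<T>::value &&
            sizeof(T) <= 2 * sizeof(void*),
        T,          // cheap to copy: pass by value
        const T&    // large or non-trivial: pass by const reference
    >::type type;
};

struct big_pod { char data[256]; };  // trivially copyable, but too big to copy
```

With this sketch, param_type<int>::type is int, while param_type<big_pod>::type is const big_pod&, which is exactly the "trait plus sizeof limit" combination the message calls hand-wavy.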

Michael Marcin wrote:
The documentation makes no mention of allowing extension for user-defined types. In fact, it has a table which clearly states that param_type is a const reference for all user-defined types. In a specific case, I was working on a 3D renderer for a mobile project where we used a templated fixed-point class. There were some low-level functions that worked on a typename Real and used pass by const reference, because cheap copying was not a requirement of our Real concept. This project happened to use only this fixed-point type, which was a wrapper over an int. We branched the code and changed the functions to pass by value, which resulted in a measurable performance increase. I would like to take classes like this and specialize param_type for them, but the documentation for this library doesn't seem to allow for that. Thanks, Michael Marcin

Michael Marcin wrote:
I would like to take classes like this and specialized param_type for them but the documentation for this library doesn't seem to allow for that.
That addition would be most welcome. As it is now, the trait is a bit naive. I think the trait should at least use pass by value if both (a) the size of the type is smaller than sizeof(long double), and (b) the type is a POD. For certain architectures it might be beneficial to pass larger types by value; on others, perhaps only smaller ones. Now, if that change were added, it would also provide the customization you need, because you would simply customize is_pod. And that has other benefits of its own. John, is the user allowed to specialize is_pod? Thanks -Thorsten
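The rule proposed here can be written down as a short sketch; std::is_pod stands in for boost::is_pod, and, as noted, the size cutoff is architecture-dependent:

```cpp
#include <type_traits>

// Sketch of the proposed rule: pass by value only when the type is a
// POD and smaller than long double; otherwise pass by const reference.
template <typename T>
struct suggested_param_type {
    static const bool by_value =
        std::is_pod<T>::value && sizeof(T) < sizeof(long double);
    typedef typename std::conditional<by_value, T, const T&>::type type;
};
```

Because the test bottoms out in is_pod, specializing is_pod for a user-defined type would flip it to pass-by-value, which is the customization point being suggested.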

Michael Marcin wrote:
There's nothing to stop you from adding a full or partial specialisation of call_traits to your header:

struct my_value_type{ ... };

namespace boost{

template<>
struct call_traits<my_value_type>
{
   typedef my_value_type value_type;
   typedef my_value_type& reference;
   typedef my_value_type const& const_reference;
   typedef my_value_type param_type;
};

} // namespace

Then once your functions are call_traits aware, it's easy to tweak behaviour by adding specialisations as required. Does this help? John.
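To make the "call_traits aware" part concrete, here is a self-contained sketch with a minimal stand-in for boost::call_traits (so it compiles without Boost); the type and function names are illustrative only:

```cpp
// Minimal stand-in for boost::call_traits: user-defined types go by
// const reference unless a specialization says otherwise.
template <typename T>
struct call_traits { typedef const T& param_type; };

struct my_value_type { int v; };

// The specialization suggested above: pass my_value_type by value.
template <>
struct call_traits<my_value_type> { typedef my_value_type param_type; };

// A call_traits-aware function: the parameter type is chosen by the
// trait, so a specialization changes how arguments are passed without
// touching the function itself.
template <typename T>
int value_of(typename call_traits<T>::param_type x) { return x.v; }
```

Note that T appears in a non-deduced context here, so the call must name it explicitly, e.g. value_of<my_value_type>(obj).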

John Maddock wrote:
Yes, this likely solves my problem. I hope the documentation can be updated to mention that specialization for user-defined types is supported. Thanks, Michael Marcin

While I agree with it from a convenience-oriented point of view, I don't think using a reference is any slower than passing the POD by value. My point is that it wouldn't make a difference when sizeof(const T&) == sizeof(T). Does anyone know of some kind of benchmark showing that passing an int by const ref is actually slower than by value? Thanks, Philippe

Philippe Vaucher wrote:
When we wrote that utility, yes, we performed some benchmarks that indicated that pass-by-value is faster. The main difference is that the function body avoids any aliasing issues: the compiler knows it can cache a const value in a register, but not if the object is referred to by reference. However, two things have happened since then. Hardware has improved to the point where loads can be *almost* as fast as caching in a register, especially if both pipelining and good processor cache hits are involved. Also, compilers are getting cleverer at performing the analysis required to avoid the aliasing issue, so <shrug> your mileage may vary. HTH, John.
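The aliasing point can be made concrete with a hypothetical pair of functions (the names and loop are mine, purely for illustration):

```cpp
// With a const reference parameter, a store through 'out' might change
// what 'n' refers to, so a conservative compiler must reload n on each
// iteration. With a by-value parameter, 'n' is a local copy and can
// live in a register for the whole loop.
int sum_ref(const int& n, int* out) {
    int total = 0;
    for (int i = 0; i < n; ++i)   // n may alias *out: reload every pass
        *out = total += i;
    return total;
}

int sum_val(int n, int* out) {
    int total = 0;
    for (int i = 0; i < n; ++i)   // n cannot alias *out: registerable
        *out = total += i;
    return total;
}
```

Both functions compute the same result when the arguments don't alias; the difference is only in what the optimizer is allowed to assume, which is exactly the effect the benchmarks above were measuring.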
participants (5)
- Alex MDC
- John Maddock
- Michael Marcin
- Philippe Vaucher
- Thorsten Ottosen