[optional] generates unnecessary code for trivial types

When decompiling my code I noticed a bunch of unnecessary code caused by boost::optional.

1) Destruction

    typedef boost::optional<int> optional_int;
    void deconstruct_boost_optional(optional_int& o){ o.~optional_int(); }

One would expect this to do nothing. Instead gcc 4.6.0 with -O3 generates:

    if(m_initialized){
        // do nothing
        m_initialized = false;
    }

    00000000 <deconstruct_boost_optional(boost::optional<int>&)>:
       0: 8b 44 24 04    mov    0x4(%esp),%eax
       4: 80 38 00       cmpb   $0x0,(%eax)
       7: 74 03          je     c <deconstruct_boost_optional(boost::optional<int>&)+0xc>
       9: c6 00 00       movb   $0x0,(%eax)
       c: f3 c3          repz ret

This one could be easily fixed by removing the bit that sets m_initialized to false, since we're destructing anyway.

2) Assignment also generates these problems:

    void assign_boost_optional(optional_int& o){ o=13; }

Here there's a semantic issue: we have to decide whether to use the copy constructor or operator=. This is also wasteful for POD types or any type which has_trivial_copy<>.

3) Even more expensive is if we want to copy an optional<int>

    void assign_boost_optional(optional_int& a,optional_int& b){ a=b; }

    00000000 <assign_boost_optional(boost::optional<int>&, boost::optional<int>&)>:
       0: 8b 44 24 04    mov    0x4(%esp),%eax
       4: 8b 54 24 08    mov    0x8(%esp),%edx
       8: 80 38 00       cmpb   $0x0,(%eax)
       b: 74 0b          je     18 <assign_boost_optional(boost::optional<int>&, boost::optional<int>&)+0x18>
       d: 80 3a 00       cmpb   $0x0,(%edx)
      10: 75 16          jne    28 <assign_boost_optional(boost::optional<int>&, boost::optional<int>&)+0x28>
      12: c6 00 00       movb   $0x0,(%eax)
      15: c3             ret
      16: 66 90          xchg   %ax,%ax
      18: 80 3a 00       cmpb   $0x0,(%edx)
      1b: 74 09          je     26 <assign_boost_optional(boost::optional<int>&, boost::optional<int>&)+0x26>
      1d: 8b 52 04       mov    0x4(%edx),%edx
      20: c6 00 01       movb   $0x1,(%eax)
      23: 89 50 04       mov    %edx,0x4(%eax)
      26: f3 c3          repz ret
      28: 8b 52 04       mov    0x4(%edx),%edx
      2b: 89 50 04       mov    %edx,0x4(%eax)
      2e: c3             ret

Three possible branches! Theoretically a single 64-bit copy would do the job.

I'm tempted to say: it would be best if, for any T, has_trivial_copy< optional<T> > iff has_trivial_copy<T>. It might make sense to make an exception for huge T, where copying an unused T is more expensive than the branching.

4) has_trivial_destructor<T> should imply has_trivial_destructor< optional<T> >, but this is hard to implement without specialization of optional. Checking has_trivial_destructor might take care of the complexity of optional<T&> since has_trivial_destructor< T& >.

I'd be willing to fix #1. The other issues need some discussion.

Chris

On Jan 25, 2012, at 12:28 PM, Hite, Christopher wrote:
When decompiling my code I noticed a bunch of unnecessary code caused by boost::optional.
I happen to have been looking at the source and generated code for boost::optional recently myself, so jumping in here with a few comments.
1) Destruction

    typedef boost::optional<int> optional_int;
    void deconstruct_boost_optional(optional_int& o){ o.~optional_int(); }

One would expect this to do nothing. Instead gcc 4.6.0 with -O3 generates:

    if(m_initialized){
        // do nothing
        m_initialized = false;
    }

    00000000 <deconstruct_boost_optional(boost::optional<int>&)>:
       0: 8b 44 24 04    mov    0x4(%esp),%eax
       4: 80 38 00       cmpb   $0x0,(%eax)
       7: 74 03          je     c <deconstruct_boost_optional(boost::optional<int>&)+0xc>
       9: c6 00 00       movb   $0x0,(%eax)
       c: f3 c3          repz ret

This one could be easily fixed by removing the bit that sets m_initialized to false, since we're destructing anyway.
This sounds right to me. Note that eliminating the assignment of m_initialized would (in this case of a trivial destructor for T) make the entire clause controlled by the conditional be empty after optimization, allowing the compiler to optimize away the conditional too. What's going on here is that the destructor is calling the destroy() helper function, which does more work than the destructor actually needs, specifically setting m_initialized to false. Other callers of destroy() do need that assignment.
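The split between the destructor and destroy() described above can be sketched as follows. This is only an illustration under stated assumptions: dtor_sketch_optional and its members are hypothetical names, and the class is a simplified stand-in, not Boost's implementation. The destructor runs T's destructor but leaves the flag alone, while reset() (playing the role of a destroy() caller that still needs the flag cleared) keeps the store.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for boost::optional<T>.
template <class T>
class dtor_sketch_optional {
public:
    dtor_sketch_optional() : m_dummy(), m_initialized(false) {}
    explicit dtor_sketch_optional(const T& v) : m_value(v), m_initialized(true) {}

    // The object is going away, so the flag is not written here.
    // For a trivially destructible T the body is empty after optimization.
    ~dtor_sketch_optional() {
        if (m_initialized) m_value.~T();
    }

    // Callers that keep using the object still need the flag cleared.
    void reset() {
        if (m_initialized) {
            m_value.~T();
            m_initialized = false;
        }
    }

    bool is_initialized() const { return m_initialized; }
    const T& get() const { return m_value; }

    // Copying is left out of this sketch.
    dtor_sketch_optional(const dtor_sketch_optional&) = delete;
    dtor_sketch_optional& operator=(const dtor_sketch_optional&) = delete;

private:
    union { char m_dummy; T m_value; };  // storage for an optional T
    bool m_initialized;
};
```

Whether a given compiler then removes the remaining conditional for trivial T would have to be confirmed by inspecting the generated code.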
3) Even more expensive is if we want to copy an optional<int>

    void assign_boost_optional(optional_int& a,optional_int& b){ a=b; }

    00000000 <assign_boost_optional(boost::optional<int>&, boost::optional<int>&)>:
       0: 8b 44 24 04    mov    0x4(%esp),%eax
       4: 8b 54 24 08    mov    0x8(%esp),%edx
       8: 80 38 00       cmpb   $0x0,(%eax)
       b: 74 0b          je     18 <assign_boost_optional(boost::optional<int>&, boost::optional<int>&)+0x18>
       d: 80 3a 00       cmpb   $0x0,(%edx)
      10: 75 16          jne    28 <assign_boost_optional(boost::optional<int>&, boost::optional<int>&)+0x28>
      12: c6 00 00       movb   $0x0,(%eax)
      15: c3             ret
      16: 66 90          xchg   %ax,%ax
      18: 80 3a 00       cmpb   $0x0,(%edx)
      1b: 74 09          je     26 <assign_boost_optional(boost::optional<int>&, boost::optional<int>&)+0x26>
      1d: 8b 52 04       mov    0x4(%edx),%edx
      20: c6 00 01       movb   $0x1,(%eax)
      23: 89 50 04       mov    %edx,0x4(%eax)
      26: f3 c3          repz ret
      28: 8b 52 04       mov    0x4(%edx),%edx
      2b: 89 50 04       mov    %edx,0x4(%eax)
      2e: c3             ret

Three possible branches! Theoretically a single 64-bit copy would do the job. I'm tempted to say: it would be best if, for any T, has_trivial_copy< optional<T> > iff has_trivial_copy<T>. It might make sense to make an exception for huge T, where copying an unused T is more expensive than the branching.
I think the generated code gets somewhat simplified once issue (1) is addressed.

I think it would be a mistake to just blindly copy the value of b when b.m_initialized is false, if for no other reason than doing so will lead to endless user complaints about compiler and valgrind warnings. Also, invoking undefined behavior can result in the compiler doing very nasty and unexpected things, even in the absence of runtime issues from reading an "uninitialized" location. Consider the possibility that the compiler can prove that the optional being copied from is uninitialized, and so can conclude that the read of its value is undefined behavior. Probably the *best* one can hope for in such a situation is a compiler warning, and many far worse results are possible.

While I think this shouldn't be necessary from a theoretical standpoint, in a practical sense it might make the optimizer's job a little easier (and so increase the chances of getting the code you are looking for) to change the assign(optional) member functions that presently look something like

    if (is_initialized())
        if (rhs.is_initialized()) assign_value(...)
        else destroy()
    else if (rhs.is_initialized()) construct(...)

to instead be something like

    if (rhs.is_initialized())
        if (is_initialized()) assign_value(...)
        else construct(...)
    else destroy()

or

    if ( ! rhs.is_initialized()) destroy()
    else if (is_initialized()) assign_value(...)
    else construct(...)
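The last of the orderings suggested above, testing the right-hand side first, might look like this in compilable form. It is a sketch only: assign_sketch_optional, construct() and destroy() are hypothetical names in a simplified stand-in class, and nothing here is claimed to produce fewer branches on any particular compiler.

```cpp
#include <cassert>
#include <new>

// Hypothetical, simplified stand-in for boost::optional<T>.
template <class T>
class assign_sketch_optional {
public:
    assign_sketch_optional() : m_dummy(), m_initialized(false) {}
    explicit assign_sketch_optional(const T& v) : m_value(v), m_initialized(true) {}
    assign_sketch_optional(const assign_sketch_optional& rhs)
        : m_dummy(), m_initialized(false) {
        if (rhs.m_initialized) construct(rhs.m_value);
    }
    ~assign_sketch_optional() { if (m_initialized) m_value.~T(); }

    // The state of rhs is tested first; the value of rhs is only read
    // when rhs actually holds one.
    assign_sketch_optional& operator=(const assign_sketch_optional& rhs) {
        if (!rhs.m_initialized) destroy();
        else if (m_initialized) m_value = rhs.m_value;
        else construct(rhs.m_value);
        return *this;
    }

    bool is_initialized() const { return m_initialized; }
    const T& get() const { return m_value; }

private:
    void construct(const T& v) {
        ::new (static_cast<void*>(&m_value)) T(v);
        m_initialized = true;
    }
    void destroy() {
        if (m_initialized) { m_value.~T(); m_initialized = false; }
    }

    union { char m_dummy; T m_value; };  // storage for an optional T
    bool m_initialized;
};
```

All four combinations of source and destination state are still handled; only the order of the tests changes.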

I don't personally think that the style of programming that optional is intended for is suitable for high performance/performance critical situations in the first place. Pass by reference and return a bool for a conditional return value. Pass the bool and the object separately for a conditional argument. Pass or return a pointer and check if it is null. Yes, my advice really is to not use optional if you want performance. Even if we did everything you can think of to make optional fast you are still better off designing your interfaces in such a way that you don't need it if your goal is performance. That copy that you are counting branches in is probably unnecessary in the first place.

Safety, on the other hand, is also important. All this looking at assembly code generated by optional smacks of premature optimization. If you agree with the idea that optional is valuable because of safety considerations, then write your application using optional without worrying much about performance, and get the functionality right. Then measure your performance and optimize the places where it matters by stripping out usage of optional or whatever else is slowing you down, so you get safety most of the time (with most of the benefit) and performance where you actually need it. Life is about tradeoffs. Optional will never be perfect.

I find that it is quite easy to write safe C++ interfaces without using optional, so I see no reason why you can't design code that is both safe and fast without it.

I know the author of optional and you haven't convinced me that we should bother him.

Regards, Luke

On Jan 25, 2012, at 6:20 PM, Simonson, Lucanus J wrote:
I don't personally think that the style of programming that optional is intended for is suitable for high performance/performance critical situations in the first place. Pass by reference and return a bool for a conditional return value. Pass the bool and the object separately for a conditional argument. Pass or return a pointer and check if it is null. Yes, my advice really is to not use optional if you want performance.
All of the offered suggestions require the caller to construct an initial object that can be passed (by reference / pointer) to the callee for replacement. That may be either inefficient (object is expensive to construct) or impossible (caller doesn't have access to an appropriate constructor).
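The constraint described above can be illustrated with a small sketch. It assumes C++17 std::optional as a stand-in for boost::optional, and the connection type and both lookup functions are hypothetical names invented for the example. The out-parameter version only works if the caller can already build a connection; the optional-returning version does not need that.

```cpp
#include <cassert>
#include <optional>

// Hypothetical type with no default constructor.
struct connection {
    explicit connection(int p) : port(p) {}
    int port;
};

// "Pass by reference and return a bool": the caller must construct a
// connection up front, even though it may never be filled in.
bool find_connection_out(int port, connection& out) {
    if (port <= 0) return false;
    out = connection(port);
    return true;
}

// Returning an optional: the caller needs no access to any constructor.
std::optional<connection> find_connection(int port) {
    if (port <= 0) return std::nullopt;
    return connection(port);
}
```

A caller of find_connection_out has to invent a placeholder such as connection(1) before the call, which is exactly the step that may be expensive or unavailable.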

Le 26/01/12 00:20, Simonson, Lucanus J a écrit :
I don't personally think that the style of programming that optional is intended for is suitable for high performance/performance critical situations in the first place. Pass by reference and return a bool for a conditional return value. Pass the bool and the object separately for a conditional argument. Pass or return a pointer and check if it is null. Yes, my advice really is to not use optional if you want performance. Even if we did everything you can think of to make optional fast you are still better off designing your interfaces in such a way that you don't need it if your goal is performance. That copy that you are counting branches in is probably unnecessary in the first place. Safety, on the other hand, is also important. All this looking at assembly code generated by optional smacks of premature optimization. If you agree with the idea that optional is valuable because of safety considerations then write your application using optional and not worrying much about performance and get the functionality right then measure your performance and optimize the places where it matters by stripping out usage of optional or whatever else is slowing you down so you get safety most of the time (with most of the benefit) and performance where you actually need it. Life is about tradeoffs. Optional will never be perfect.
I find that it is quite easy to write safe C++ interfaces without using optional, so I see no reason why you can't design code that is both safe and fast without it.
I know the author of optional and you haven't convinced me that we should bother him.
Hi, the user cannot always redesign an interface that uses optional<T>, as they may not own it (for example, when using third-party libraries). I'm sure the author/maintainer of optional would adopt some patches if they are proven to improve performance for some specific cases.

Best, Vicente

On 26.1.2012. 0:20, Simonson, Lucanus J wrote:
I don't personally think that the style of programming that optional is intended for is suitable for high performance/performance critical situations in the first place. Pass by reference and return a bool for a conditional return value. Pass the bool and the object separately for a conditional argument. Pass or return a pointer and check if it is null. Yes, my advice really is to not use optional if you want performance. Even if we did everything you can think of to make optional fast you are still better off designing your interfaces in such a way that you don't need it if your goal is performance. That copy that you are counting branches in is probably unnecessary in the first place. Safety, on the other hand, is also important. All this looking at assembly code generated by optional smacks of premature optimization. If you agree with the idea that optional is valuable because of safety considerations then write your application using optional and not worrying much about performance and get the functionality right then measure your performance and optimize the places where it matters by stripping out usage of optional or whatever else is slowing you down so you get safety most of the time (with most of the benefit) and performance where you actually need it. Life is about tradeoffs. Optional will never be perfect.
I find that it is quite easy to write safe C++ interfaces without using optional, so I see no reason why you can't design code that is both safe and fast without it.
I see no reason why we can't have safe _and_ fast _and_ optional? The rationale you gave is just typical premature pessimization apologetics that also somehow assumes that C++ is "safe and slow" and that you have to go "bare metal C" to have performance. Luckily that's just plain incorrect, to put it mildly. Sadly, that rationale nonetheless also too often gives us such bloatware as std::streams, lexical_cast or boost::filesystem... When you design such a generic library, how can there be "premature optimization"?

--
"What Huxley teaches is that in the age of advanced technology, spiritual devastation is more likely to come from an enemy with a smiling face than from one whose countenance exudes suspicion and hate." Neil Postman

I see no reason why we can't have safe _and_ fast _and_ optional?
I'm actually glad to see you putting effort into making that happen. The effort required is the only reason. Performance wasn't the reason I don't use optional, but for those who do use it I'm sure it will be valuable.
The rationale you gave is just typical premature pessimization apologetics that also somehow assumes that C++ is "safe and slow" and that you have to go "bare metal C" to have performance. Luckily that's just plain incorrect, to put it mildly. Sadly, that rationale nonetheless also too often gives us such bloatware as std::streams, lexical_cast or boost::filesystem...
I said safe and fast C++ without optional, which isn't the same thing as "bare metal C". Bare metal C wouldn't qualify as safe. I'm as annoyed by the "C is faster than C++, ergo I never learned C++" guys as you are. Optional was implemented to be safe and slow because it was targeting safe and slow use cases. For POD types and anything that has a default constructor a std::pair<bool, T> seems fine to me. Regards, Luke

On Friday, January 27, 2012 18:57:30 Simonson, Lucanus J wrote:
Optional was implemented to be safe and slow because it was targeting safe and slow use cases. For POD types and anything that has a default constructor a std::pair<bool, T> seems fine to me.
I'm failing to see why optional should be slow. I use it extensively, POD types included, and I don't consider pair<bool, T> as a valid replacement. I'll be glad if it gets optimized for POD types, why not?

On 27 January 2012 12:57, Simonson, Lucanus J <lucanus.j.simonson@intel.com> wrote:
Optional was implemented to be safe and slow because it was targeting safe and slow use cases.
You are saying it is *deliberately* slow??
For POD types and anything that has a default constructor a std::pair<bool, T> seems fine to me.
I don't want to write a different style of code depending on whether or not a type T is default constructible. I can't easily pass things like this to templates because I have to write special cases all over the place. Optional models my intent. I want to take advantage of RVO. Some things that are default constructible are still very expensive to construct (such as std::deque under gcc).

When you initialize class members, do you use member initializer lists for default constructible types or do you just throw a bunch of assignments in the body of the constructor? If the former, why do you do it, given that default construction followed by assignment "seems fine" to you?

--
Nevin ":-)" Liber <mailto:nevin@eviloverlord.com> (847) 691-1404

on Wed Jan 25 2012, "Simonson, Lucanus J" <lucanus.j.simonson-AT-intel.com> wrote:
I don't personally think that the style of programming that optional is intended for is suitable for high performance/performance critical situations in the first place.
Why not? It seems like a great candidate for common compiler optimizations.
Pass by reference and return a bool for a conditional return value. Pass the bool and the object separately for a conditional argument. Pass or return a pointer and check if it is null. Yes, my advice really is to not use optional if you want performance.
Why?
Even if we did everything you can think of to make optional fast you are still better off designing your interfaces in such a way that you don't need it if your goal is performance.
Why do you say that? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

From: Dave Abrahams
I don't personally think that the style of programming that optional is intended for is suitable for high performance/performance critical situations in the first place.
Why not? It seems like a great candidate for common compiler optimizations.
To some extent it depends what style of programming optional is intended for. What I had in mind was the highly object oriented defensive programming style that emphasizes safety often at the expense of performance in vogue around the time Java came out.
Pass by reference and return a bool for a conditional return value. Pass the bool and the object separately for a conditional argument. Pass or return a pointer and check if it is null. Yes, my advice really is to not use optional if you want performance.
Why?
I like pass by reference and return a bool over returning an optional for performance because we allocate memory for the result of the function outside of the function call and there is no transfer of ownership of the result. Even with move semantics, you have just changed an unnecessary copy into a cheaper unnecessary move.
Even if we did everything you can think of to make optional fast you are still better off designing your interfaces in such a way that you don't need it if your goal is performance.
Why do you say that?
I don't trust the compiler to always inline what I want it to if it is busy inlining the optional function calls. The compiler heuristics for inlining can get overloaded and confused as the number of nested inline functions grows. There are no inline function calls to check the bool return value of a function or use the reference passed to the function. I believe that getting the ownership of the data at the right place in the code for performance is preferable to transferring ownership, even with move. It also helps the compiler optimize to be given less code that looks more like what you want the compiler to produce at the end so that it has less opportunity to fail to give you what you wanted. We can imagine an arbitrarily good compiler that always does what we intend, but a compiler that generates a branch for "if(m_initialized) m_initialized = false" is clearly not the ideal compiler we imagine.

I did come around to supporting optimization of optional; it might as well be as good a trade-off between safety and performance as we can make it. I don't use optional myself because I prefer alternative syntax for reasons of simplicity, convenience, fewer dependencies, etc., and not even for performance reasons.

Regards, Luke

On Tue, Jan 31, 2012 at 7:49 AM, Simonson, Lucanus J < lucanus.j.simonson@intel.com> wrote:
From: Dave Abrahams
I don't personally think that the style of programming that optional is intended for is suitable for high performance/performance critical situations in the first place.
Why not? It seems like a great candidate for common compiler optimizations.
To some extent it depends what style of programming optional is intended for. What I had in mind was the highly object oriented defensive programming style that emphasizes safety often at the expense of performance in vogue around the time Java came out.
But if we can maintain the same level of safety, while at the same time increasing efficiency, doesn't that benefit everyone?

On Jan 30, 2012, at 3:49 PM, Simonson, Lucanus J wrote:
I like pass by reference and return a bool over returning an optional for performance because we allocate memory for the result of the function outside of the function call and there is no transfer of ownership of the result.
Personally, I like returning values rather than modifying arguments. But more importantly, the caller might not even be able to construct that object to be passed by reference, due to lack of access to an appropriate combination of constructor and initialization arguments, such as when the class has no default constructor.
Even with move semantics, you have just changed an unnecessary copy into a cheaper unnecessary move.
If one cares about performance and one's compiler is not capable of doing RVO for optionals, perhaps one should be looking for a better compiler, and not just for better handling of optionals.

on Mon Jan 30 2012, Kim Barrett <kab.conundrums-AT-verizon.net> wrote:
On Jan 30, 2012, at 3:49 PM, Simonson, Lucanus J wrote:
I like pass by reference and return a bool over returning an optional for performance because we allocate memory for the result of the function outside of the function call and there is no transfer of ownership of the result.
Personally, I like returning values rather than modifying arguments. But more importantly, the caller might not even be able to construct that object to be passed by reference, due to lack of access to an appropriate combination of constructor and initialization arguments, such as when the class has no default constructor.
Even with move semantics, you have just changed an unnecessary copy into a cheaper unnecessary move.
If one cares about performance and one's compiler is not capable of doing RVO for optionals, perhaps one should be looking for a better compiler, and not just for better handling of optionals.
IIRC, RVO is now mandated where it's possible, so the whole move argument is kind of moot.

--
Dave Abrahams BoostPro Computing http://www.boostpro.com

Thanks for your feedback.
I think the generated code gets somewhat simplified once issue (1) is addressed.
It would help, but I think it won't get rid of all the branches. Your refactoring might help more.
I think it would be a mistake to just blindly copy the value of b when b.m_initialized is false, if for no other reason than doing so will lead to endless user complaints about compiler and valgrind warnings. Also, invoking undefined behavior can result in the compiler doing very nasty and unexpected things, even in the absence of runtime issues from reading an "uninitialized" location. Consider the possibility that the compiler can prove that the optional being copied from is uninitialized, and so can conclude that the read of its value is undefined behavior. Probably the *best* one can hope for in such a situation is a compiler warning, and many far worse results are possible.
Consider the completely legal code below:

    struct cheap_optional_int{
        cheap_optional_int() : m_initialized() {} // don't init m_data
        bool m_initialized;
        int m_data;
    };
    void assign_boost_cheap_optional_int(cheap_optional_int& a,cheap_optional_int& b){
        a=b; // default impl
    }

The compiler generates nothing but 32-bit moves from the source to the destination. This is completely fine for valgrind. It only complains if a branch is taken based on uninitialized data.

    00000000 <assign_boost_cheap_optional_int(cheap_optional_int&, cheap_optional_int&)>:
       0: 53             push   %ebx
       1: 8b 44 24 0c    mov    0xc(%esp),%eax
       5: 8b 58 04       mov    0x4(%eax),%ebx
       8: 8b 08          mov    (%eax),%ecx
       a: 8b 44 24 08    mov    0x8(%esp),%eax
       e: 89 08          mov    %ecx,(%eax)
      10: 89 58 04       mov    %ebx,0x4(%eax)
      13: 5b             pop    %ebx
      14: c3             ret

Sorry the assembler is so poorly formatted after it's mailed.

The cool thing is that cheap_optional_int has_trivial_destructor and has_trivial_copy because we haven't overridden the defaults. Unfortunately overriding the default ctor/dtor always breaks these, even if the code could be optimized out. It may not even be possible for a compiler to solve.

Chris
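The trait behaviour claimed for cheap_optional_int can be checked at compile time. The sketch below assumes the C++11 <type_traits> equivalents (std::is_trivially_copyable, std::is_trivially_destructible) in place of the Boost traits named in the message, and with_empty_dtor is a hypothetical type added only to show the effect of a user-provided destructor.

```cpp
#include <cassert>
#include <type_traits>

struct cheap_optional_int {
    cheap_optional_int() : m_initialized() {}  // m_data deliberately not initialized
    bool m_initialized;
    int  m_data;
};

// The compiler-generated copy operations and destructor are trivial.
static_assert(std::is_trivially_copyable<cheap_optional_int>::value,
              "copy is a plain member-wise copy");
static_assert(std::is_trivially_destructible<cheap_optional_int>::value,
              "no destructor code is needed");

// Hypothetical variant: a user-provided destructor, even an empty one,
// makes the type non-trivial as far as the traits are concerned.
struct with_empty_dtor {
    ~with_empty_dtor() {}
    bool m_initialized;
    int  m_data;
};

static_assert(!std::is_trivially_destructible<with_empty_dtor>::value,
              "user-provided destructor is never trivial");
static_assert(!std::is_trivially_copyable<with_empty_dtor>::value,
              "trivially copyable requires a trivial destructor");
```

The second pair of assertions is the point made about overriding the defaults: the traits look at how the members are declared, not at what the optimizer could later remove.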

I don't personally think that the style of programming that optional is intended for is suitable for high performance/performance critical situations in the first place.
You may be right, but you're talking about different use cases. I've got protocol decoders/encoders, so I want a friendly high-level representation of messages that I can hand off between modules. Imagine a struct with an optional substruct.

Valid alternatives: a pointer to the substruct. Even if I can put the second structure on the stack, this might mean fewer cache hits. The extra size also grows from a bool to a pointer.

Another option sometimes possible is a nullable value. FAST-FIX's nullable integer, for example, increments all non-negative values and uses 0 to represent a null.

Another option is to use a presence map at the top of a structure with one bit (or byte) per optional field. That might help with alignment.
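The nullable-integer idea mentioned above can be sketched as follows. This follows only the one-sentence description in the message (non-negative values are incremented, 0 means null); it is not taken from the FAST specification, and nullable_int with its helper functions are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wire representation: 0 is reserved for "no value".
struct nullable_int {
    std::int64_t wire;
};

inline nullable_int encode_null() { return nullable_int{0}; }

// Non-negative values are shifted up by one so that 0 stays free;
// negative values are stored unchanged. A wider wire type avoids
// overflow when the largest 32-bit value is incremented.
inline nullable_int encode(std::int32_t v) {
    std::int64_t w = v;
    return nullable_int{v >= 0 ? w + 1 : w};
}

inline bool has_value(nullable_int n) { return n.wire != 0; }

// Only meaningful when has_value(n) is true.
inline std::int32_t decode(nullable_int n) {
    return static_cast<std::int32_t>(n.wire > 0 ? n.wire - 1 : n.wire);
}
```

Compared with optional<int>, no separate flag is stored, at the cost of a decrement on every read of a non-negative value.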
I find that it is quite easy to write safe C++ interfaces without using optional...
Yes I used optional because I knew it would do things correctly.
you haven't convinced me
Just focus on #1 first. Not writing to m_initialized in the destructor would benefit all use cases of optional. It can't be the solution to just not use boost every time there's a performance issue.

On 27.1.2012. 11:32, Domagoj Saric wrote:
a) the lifetime management bool was changed into a properly typed pointer (this actually takes the same amount of space while it provides a no-op get_ptr() member function as well as easier debugging as the contents of optional can now clearly be seen through the pointer, as opposed to gibberish in an opaque storage array)
I'd support this only if it were configurable. It takes more space for small or non-word-aligned data. It might be more expensive on some systems to calculate the address and store it.
I did think about defaulting to int_fast8_t for the bool if its alignment >= alignment of T. On my x86 system it's still 1 byte though. On some systems it might help. It would also break has_trivial_copy. If someone was naughty and memcopied them, the new version would lead to a very hard to find bug. As for the debugger, the new C++ allows a union to contain a class, so a placeholder implementation using such a union would show the data in the debugger.
d) skips redundant/dead stores of marking itself as uninitialised [including but not limited to, in its destructor (if it has one)]
e) streamlined internal assign paths to help the compiler avoid unnecessary branching
Sounds like what I'm after.
f) added direct_create() and direct_destroy() member functions that allow the user to bypass the internal lifetime management (they only assert correct usage) in situations where the user's own external logic already implicitly knows the state of the optional
Sounds good. I also wanted these.
g) optional now declares and defines a destructor only if the contained type has a non-trivial destructor (this prevents the compiler from detecting false EH states and thus generating bogus EH code)
Yes, that's what I want.
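One way item (g) could be realised is to move the storage into a base class that is specialised on whether T is trivially destructible, so that a destructor is declared only in the non-trivial case. The sketch below is an assumption about how this might be done, not the code under discussion; optional_storage and cond_dtor_optional are hypothetical names, and std::is_trivially_destructible stands in for the Boost trait.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <type_traits>

// Trivially destructible T: no destructor is declared at all.
template <class T, bool Trivial = std::is_trivially_destructible<T>::value>
struct optional_storage {
    optional_storage() : m_dummy(), m_initialized(false) {}
    union { char m_dummy; T m_value; };
    bool m_initialized;
};

// Non-trivial T: the destructor has to run T's destructor.
template <class T>
struct optional_storage<T, false> {
    optional_storage() : m_dummy(), m_initialized(false) {}
    ~optional_storage() { if (m_initialized) m_value.~T(); }
    union { char m_dummy; T m_value; };
    bool m_initialized;
};

// Hypothetical optional-like class; it declares no destructor itself,
// so it inherits triviality from the selected base.
template <class T>
class cond_dtor_optional : private optional_storage<T> {
public:
    cond_dtor_optional() {}
    explicit cond_dtor_optional(const T& v) {
        ::new (static_cast<void*>(&this->m_value)) T(v);
        this->m_initialized = true;
    }
    bool is_initialized() const { return this->m_initialized; }
    const T& get() const { return this->m_value; }
};

static_assert(std::is_trivially_destructible<cond_dtor_optional<int> >::value,
              "no destructor code for trivial T");
static_assert(!std::is_trivially_destructible<cond_dtor_optional<std::string> >::value,
              "std::string still gets destroyed");
```

This is also the property asked for in point 4 of the original message: triviality of T's destructor carries over to the wrapper.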
h) optional marks itself as uninitialised _before_ calling the contained object's destructor (this makes it a little more robust in race conditions; it is of course not a complete solution for such scenarios, those require external "help" and/or (m)-reference counting to be implemented)
Seems to contradict (g). I'd support something like that only if it can be configured out. Maybe there's some case completely out of optional's scope where you use atomic ops.
If you factor out the aligned storage you can build something else that does ref-counting or a thread safe state machine or whatever.
i) extracted the "placeholder" functionality into a standalone class (basically what would be left of optional<> if the lifetime management "bool" member and logic was removed) so that it can be reused (e.g. for singleton like classes, or when more complex custom lifetime management is required)
I 100% agree with this. I think there should be one placeholder implementation. I think boost::function should use it as well. I think it may be useful to users.
k) the lifetime management pointer is now stored after the actual contained object (this helps in avoiding more complex/offset addressing when accessing optionals through pointers w/o checking whether they are initialised)
Seems weird. If the front of T is more likely to be used (and old char buffer), your pointer may wind up in a different cache line.
o) avoid branching in assignment and copy construction of optionals that hold PODs smaller than N * sizeof( void * ) where N is some small number
Again it would be cool if the user had control over this.
I'm going to have to check out your code.

So the big thing I take away from all this is that it would be really nice if some things were configurable. How do we do that without breaking code? Changing the signature to optional<T,Properties=optional_traits<T> > might break code that uses boost::optional as a template template parameter. You could just refer to optional_traits inside and force the user to specialize it for his T, but that could create violations of the one definition rule.

Also, is it OK for optional to depend on enable_if/SFINAE and type traits?

Chris

Hite, Christopher wrote:
So the big thing I take away from all this is that it would be really nice if some things were configurable. How do we do that without breaking code?
Quite possibly, you'll need to introduce a new type that provides the configurability you want, while hardcoding backward compatible choices for the existing optional. _____ Rob Stewart robert.stewart@sig.com Software Engineer using std::disclaimer; Dev Tools & Components Susquehanna International Group, LLP http://www.sig.com

On 1.2.2012. 17:53, Hite, Christopher wrote:
On 27.1.2012. 11:32, Domagoj Saric wrote:
a) the lifetime management bool was changed into a properly typed pointer (this actually takes the same amount of space while it provides a no-op get_ptr() member function as well as easier debugging as the contents of optional can now clearly be seen through the pointer, as opposed to gibberish in an opaque storage array)
I'd support this only if it were configurable. It takes more space for small or non-word-aligned data.
True, I was planning on automatically deciding between bool and pointer based on sizeof( T ) after adding lifetime management policy support (m)...
It might be more expensive on some systems to calculate the address and store it.
How? You have to fetch the address either way...
It would also break has_trivial_copy. If someone was naughty and memcopied them, the new version would lead to a very hard to find bug.
True, didn't think about trivial copy until Sebastian outlined the pass-POD-in-register requirements of the AMD x64 ABI. WRT to this it boils down to whether you want a no-op get_ptr() or your platform and compiler actually support passing PODs in registers _and_ most of the types you store in optionals actually satisfy the compiler/ABI requirements for that _and_ you mostly pass and return those optionals by value...
As for the debugger, the new C++ allows a union to contain a class. So a placeholder implementation using such a union would show the data in the debugger.
But the pointer approach would also work with "real world" compilers ;)
h) optional marks itself as uninitialised _before_ calling the contained object's destructor (this makes it a little more robust in race conditions; it is of course not a complete solution for such scenarios, those require external "help" and/or (m)-reference counting to be implemented)
Seems to contradict (g). I'd support something like that only if it can be configured out. Maybe there's some case completely out of optional's scope where you use atomic ops.
It doesn't (contradict (g)), this applies only to situations where you actually have to mark the optional as empty (such as when reset() is called).
If you factor out the aligned storage you can build something else that does ref-counting or a thread safe state machine or whatever.
With (m) I'd rather (in some distant future:) add a refcounting policy to optional (or some future underlying more generic class) so that users don't have to reimplement this...
k) the lifetime management pointer is now stored after the actual contained object (this helps in avoiding more complex/offset addressing when accessing optionals through pointers w/o checking whether they are initialised)
Seems weird. If the front of T is more likely to be used (and old char buffer), your pointer may wind up in a different cache line.
Well yes, as I said this benefits only the cases where the pointer/bool is not accessed (when an optional is accessed through a pointer/reference). IOW in 99.9% of real world cases the point is quite moot but it did make sense at a particular stage of a project I'm working on (when you have dozens of hundreds of template generated functions you can actually measure savings in code size when you do even such micromanagement). It no longer matters for me but the layout of optional2 is still like that (currently) purely because it turned out like that (in the current stage of development) so I wrote point (k) nonetheless just for the feedback ;)
So the big thing I take away from all this it would be really nice if some things were configurable. How do we do that without breaking code?
Changing the signature to optional<T,Properties=optional_traits<T> >, might break code that uses boost::optional as a template template parameter.
Judging for example from the rationale for the lack of smart_ptr configurability or from the feedback I got for my improved boost::function proposal, it would be very difficult for this type of configurability for optional to get accepted. I was rather planning on making the best of optional with automatic/self configuration based on properties of T and then later (in a galaxy far far away:) propose an underlying library ("smart resource" or something like that), that would separate the lifetime management and storage concerns in a maximally configurable manner, on top of which the traditional optional and smart_ptr could be built...
-- "What Huxley teaches is that in the age of advanced technology, spiritual devastation is more likely to come from an enemy with a smiling face than from one whose countenance exudes suspicion and hate." Neil Postman

On 2.2.2012. 16:18, Domagoj Saric wrote:
On 1.2.2012. 17:53, Hite, Christopher wrote:
On 27.1.2012. 11:32, Domagoj Saric wrote:
k) the lifetime management pointer is now stored after the actual contained object (this helps in avoiding more complex/offset addressing when accessing optionals through pointers w/o checking whether they are initialised)
Seems weird. If the front of T is more likely to be used (and old char buffer), your pointer may wind up in a different cache line.
Well yes, as I said this benefits only the cases where the pointer/bool is not accessed (when an optional is accessed through a pointer/reference). IOW in 99.9% of real world cases the point is quite moot but it did make sense at a particular stage of a project I'm working on (when you have dozens of hundreds of template generated functions you can actually measure savings in code size when you do even such micromanagement). It no longer matters for me but the layout of optional2 is still like that (currently) purely because it turned out like that (in the current stage of development) so I wrote point (k) nonetheless just for the feedback ;)
Actually I forgot a, personally, much more important reason why placing the contained object at the beginning/same address as optional itself was more desirable. My optional use cases generally fall into two categories: optionals of fundamental types (bools, ints and floats) and small PODs, and optionals of nontrivial GUI objects. The latter case usually looks like this (a compile-time generated Model-View-Controller design where the "controller" is "short circuited" for simplicity and efficiency):

template <typename T> class Model { optional<View<Model>> optionalGUI_; };

Without a "controller", View<Model> needs to access its Model instance and instead of storing a Model pointer it can simply deduce its address from its own address (knowing that Views only ever exist as members of Models). When View is inside an optional it first needs to calculate the address of optional<View<Model>> from its own address and then the address of the Model parent from the optional address. And the crux of the problem is: to calculate the address of the optional it needs to know the layout of optional... (Incidentally the current/original optional allowed for an ugly way to calculate the offset of the contained object by using a helper class that derives from optional_base...)
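The address deduction described above can be sketched as follows. This is only an illustration under stated assumptions: value_first_slot is a hypothetical stand-in for an optional that keeps the contained object at offset 0 (it is not the sandbox optional2), View and Model are simplified non-template types, and the pointer arithmetic relies on offsetof over a standard-layout Model, which is common practice rather than something the standard fully blesses.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for an optional whose payload lives at offset 0.
template <typename T>
struct value_first_slot {
    alignas(T) unsigned char storage[sizeof(T)];  // contained object first
    bool engaged;                                 // lifetime flag stored after it

    value_first_slot() : engaged(false) {}
    ~value_first_slot() { if (engaged) get()->~T(); }

    // Constructs the contained object in place and returns its address.
    T* emplace() {
        T* p = ::new (static_cast<void*>(storage)) T();
        engaged = true;
        return p;
    }
    T* get() { return reinterpret_cast<T*>(storage); }
};

struct Model;

struct View {
    int clicks;
    Model& model();  // deduces the owning Model from this View's own address
};

struct Model {
    int state;
    value_first_slot<View> gui;  // Views only ever live inside a Model
};

Model& View::model() {
    // The slot starts at the same address as its payload, so the View does
    // not need to know anything about the slot's internal layout.
    char* slot_address = reinterpret_cast<char*>(this);
    return *reinterpret_cast<Model*>(slot_address - offsetof(Model, gui));
}
```

If the flag were stored before the payload, View::model() would additionally have to subtract the payload's offset inside the slot, which is exactly the layout knowledge the message says it wants to avoid.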

Sorry for not coming back quicker. I've been sick.
I did some experimenting in my own codebase with an "array_vector" which acts like vector in that it constructs things when they're added, but like boost::array uses a fixed size array.
I tested the techniques I would use to improve optional. So I think I can deliver this very small set of goals cleanly:
1) ~optional doesn't set m_initialized.
2) has_trivial_destructor<T> implies has_trivial_destructor<optional<T> >
3) has_trivial_copy<T> and has_trivial_assign<T> imply the same for optional<T> unless sizeof(T) exceeds some constant max_trivial_copy_Size, which can also be overridden.
4) I'll define an optional_traits<T> with defaults and an optional_with_traits<T,Traits=optional_traits<T> > which can be used to make optionals which override features and from which optional<T> will derive. That's the best compromise if I can't change the signature of optional (Is Robert Stewart right?). I think we should use the traits technique for any new libraries.
Thanks Sebastian Redl and Domagoj Saric for pointing out that (2) and (3) may help some compilers put cheap optionals in registers.
Shall I continue? Should I make a branch or do it in trunk?
Chris
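Goals (1) and (2) can be illustrated with a minimal sketch. It assumes C++11 and uses std::is_trivially_destructible in place of Boost's has_trivial_destructor; opt_storage, opt_base and tiny_optional are hypothetical names, not the actual Boost.Optional internals, and copying is simply disabled to keep the example short.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <type_traits>

// Raw storage plus the "initialized" flag, shared by both variants.
template <class T>
struct opt_storage {
    alignas(T) unsigned char buf[sizeof(T)];
    bool initialized;
    opt_storage() : initialized(false) {}
    T* ptr() { return reinterpret_cast<T*>(buf); }
};

// General case: a destructor is required, but it never writes the flag,
// because the whole object is going away anyway (goal 1).
template <class T, bool Trivial = std::is_trivially_destructible<T>::value>
struct opt_base : opt_storage<T> {
    ~opt_base() {
        if (this->initialized) this->ptr()->~T();
    }
};

// Trivial case: no destructor is declared at all (goal 2).
template <class T>
struct opt_base<T, true> : opt_storage<T> {};

template <class T>
class tiny_optional : opt_base<T> {
public:
    tiny_optional() {}
    explicit tiny_optional(const T& v) {
        ::new (static_cast<void*>(this->buf)) T(v);
        this->initialized = true;
    }
    // Copying is left out of this sketch.
    tiny_optional(const tiny_optional&) = delete;
    tiny_optional& operator=(const tiny_optional&) = delete;

    bool has_value() const { return this->initialized; }
    T& get() { return *this->ptr(); }
};

static_assert(std::is_trivially_destructible<tiny_optional<int> >::value,
              "trivial T gives a trivially destructible optional");
static_assert(!std::is_trivially_destructible<tiny_optional<std::string> >::value,
              "non-trivial T still gets a real destructor");
```

Selecting the base class by a bool template parameter keeps the public signature at a single template parameter, which is the constraint discussed in this thread.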

On Tuesday, February 07, 2012 22:36:56 Hite, Christopher wrote:
Sorry for not coming back quicker. I've been sick.
I did some experimenting in my own codebase with a "array_vector" which acts like vector constructs things when they're added, but like boost::array uses a fixed size array.
I tested the techniques I would use to improve optional. So I think I can deliver this very small set of goals cleanly:
1) ~optional doesn't set m_initialized.
2) has_trivial_destructor<T> implies has_trivial_destructor<optional<T> >
3) has_trivial_copy<T> and has_trivial_assign<T> imply the same for optional<T> unless sizeof(T) exceeds some constant max_trivial_copy_Size, which can also be overridden.
4) I'll define a optional_traits<T> with defaults and an optional_with_traits<T,Traits=optional_traits<T> > which can be used to make optionals which override features and from which optional<T> will derive. That's the best compromise if I can't change the signature of optional (Is Robert Stewart right?). I think we should use the traits technique for any new libraries.
Do I understand it correctly that optional_with_traits is an advanced replacement for optional? If so, will the good old optional be optimized? I think, it is possible to optimize the current optional without changing its signature if we specialize optional_detail::optional_base on the types or traits we're interested in. BTW, I would really like to see optional< T& > optimized to store T* internally.
Shall I continue? Should I make branch or do it in trunk?
I think, a branch or sandbox is a good start.

On 7.2.2012. 22:36, Hite, Christopher wrote:
I tested the techniques I would use to improve optional. So I think I can deliver this very small set of goals cleanly:
1) ~optional doesn't set m_initialized.
2) has_trivial_destructor<T> implies has_trivial_destructor<optional<T> >
3) has_trivial_copy<T> and has_trivial_assign<T> imply the same for optional<T> unless sizeof(T) exceeds some constant max_trivial_copy_Size, which can also be overridden.
4) I'll define a optional_traits<T> with defaults and an optional_with_traits<T,Traits=optional_traits<T> > which can be used to make optionals which override features and from which optional<T> will derive. That's the best compromise if I can't change the signature of optional (Is Robert Stewart right?). I think we should use the traits technique for any new libraries.
Thanks Sebastian Redl and Domagoj Saric for pointing out that (2) and (3) may help some compilers put cheap optionals in registers.
Shall I continue? Should I make branch or do it in trunk?
The optional in sandbox (that passes regression tests) already does 1 and 2 (among many other things) so doing it from scratch again would be reinventing the wheel.
ad 3) I would agree to such a compromise: that a bool be used for small PODs (so that they get trivial copy and assign) and a pointer for everything else (so that these get a no-op get_ptr() and nice debugging)... [In my version PODs always/implicitly get "nice debugging" regardless of the lifetime management implementation (bool/pointer/...).]
ad 4) As said before, even though my personal prima facie stance is always "the more configurability the better", it is highly unlikely (from reasons previously given) that changing optional's signature would pass. Given that, the best workaround IMO for such "ancient"/"written in stone" constructs that suffer from the "Joe Sixpack" approach, i.e. they are good enough for 90% use cases, is to:
- create a separate configurable construct and use it as an implementation detail of the original construct that maximally auto-configures based on T (improving the "good enough percentage" to "98%")
- provide global configuration (that overrides auto-configuration) for the original construct (improving the "good enough percentage" to "99.8%")
...and the remaining "0.2%" can use the new construct directly...
So far this corresponds to your optional_with_traits approach except that I don't think that providing global configuration by overriding/specializing the default traits is the correct approach. As you noted, this can violate the ODR and AFAIK users are not used that changing a _type_ can violate the ODR and change the behaviour of another type. I'd rather use macros for that (e.g. #define BOOST_OPTIONAL_MAX_BRANCHLESS_COPY_SIZE 4 * sizeof( void * )) because programmers are already used/"trained" to be careful with macros WRT to the ODR _and_ because there already exist tools/compilers which can detect macro ODR violations at link time (e.g. MSVC10)...
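A rough sketch of the macro-based global configuration suggested above, assuming C++11. The macro name is the one floated in this message and is not an existing Boost macro; copy_branchless and pod_optional are hypothetical, and the payload is stored directly (so T must be default constructible) purely to keep the dispatch visible.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Global knob: must be defined identically in every translation unit.
#ifndef BOOST_OPTIONAL_MAX_BRANCHLESS_COPY_SIZE
#define BOOST_OPTIONAL_MAX_BRANCHLESS_COPY_SIZE (4 * sizeof(void*))
#endif

// Auto-configuration from properties of T, capped by the global knob.
template <class T>
struct copy_branchless
    : std::integral_constant<bool,
          std::is_trivially_copyable<T>::value &&
          sizeof(T) <= BOOST_OPTIONAL_MAX_BRANCHLESS_COPY_SIZE> {};

// Small trivially copyable payload: rely on the implicit (trivial) copy,
// which copies value and flag together without testing the flag.
template <class T, bool Branchless = copy_branchless<T>::value>
struct pod_optional {
    T value;
    bool engaged;
};

// Everything else: test the flag and copy the payload only when needed.
template <class T>
struct pod_optional<T, false> {
    T value;
    bool engaged;
    pod_optional& operator=(const pod_optional& rhs) {
        engaged = rhs.engaged;
        if (rhs.engaged) value = rhs.value;
        return *this;
    }
};

struct big { char bytes[256]; };  // larger than any plausible threshold

static_assert(copy_branchless<int>::value, "int is copied without branching");
static_assert(!copy_branchless<big>::value, "big payloads keep the branch");
static_assert(std::is_trivially_copy_assignable<pod_optional<int> >::value,
              "small case stays trivially assignable");
static_assert(!std::is_trivially_copy_assignable<pod_optional<big> >::value,
              "large case uses the hand-written assignment");
```

Because the macro changes the layout-independent behaviour of every pod_optional instantiation, all translation units have to agree on its value, which is the ODR concern raised in the message.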

----- Original Message -----
From: Domagoj Saric <domagoj.saric@littleendian.com> To: boost@lists.boost.org Cc: Sent: Thursday, February 9, 2012 7:45 AM Subject: Re: [boost] [optional] generates unnessesary code for trivial types
On 7.2.2012. 22:36, Hite, Christopher wrote:
I tested the techniques I would use to improve optional. So I think I can deliver this very small set of goals cleanly:
1) ~optional doesn't set m_initialized.
2) has_trivial_destructor<T> implies has_trivial_destructor<optional<T> >
3) has_trivial_copy<T> and has_trivial_assign<T> imply the same for optional<T> unless sizeof(T) exceeds some constant max_trivial_copy_Size, which can also be overridden.
4) I'll define a optional_traits<T> with defaults and an optional_with_traits<T,Traits=optional_traits<T> > which can be used to make optionals which override features and from which optional<T> will derive. That's the best compromise if I can't change the signature of optional (Is Robert Stewart right?). I think we should use the traits technique for any new libraries.
Thanks Sebastian Redl and Domagoj Saric for pointing out that (2) and (3) may help some compilers put cheap optionals in registers.
Shall I continue? Should I make branch or do it in trunk?
The optional in sandbox (that passes regression tests) already does 1 and 2 (among many other things) so doing it from scratch again would be reinventing the wheel.
ad 3) I would agree to such a compromise: that a bool be used for small PODs (so that they get trivial copy and assign) and a pointer for everything else (so that these get a no-op get_ptr() and nice debugging)... [In my version PODs always/implicitly get "nice debugging" regardless of the lifetime management implementation (bool/pointer/...).]
ad 4) As said before, even though my personal prima facie stance is always "the more configurability the better", it is highly unlikely (from reasons previously given) that changing optional's signature would pass. Given that, the best workaround IMO for such "ancient"/"written in stone" constructs that suffer from the "Joe Sixpack" approach, i.e. they are good enough for 90% use cases, is to: - create a separate configurable construct and use it as an implementation detail of the original construct that maximally auto-configures based on T (improving the "good enough percentage" to "98%") - provide global configuration (that overrides auto-configuration) for the original construct (improving the "good enough percentage" to "99.8%") ...and the remaining "0.2%" can use the new construct directly...
So far this corresponds to your optional_with_traits approach except that I don't think that providing global configuration by overriding/specializing the default traits is the correct approach. As you noted, this can violate the ODR and AFAIK users are not used that changing a _type_ can violate the ODR and change the behaviour of another type. I'd rather use macros for that (e.g. #define BOOST_OPTIONAL_MAX_BRANCHLESS_COPY_SIZE 4 * sizeof( void * )) because programmers are already used/"trained" to be careful with macros WRT to the ODR _and_ because there already exist tools/compilers which can detect macro ODR violations at link time (e.g. MSVC10)...
Actually, you could just take the optional_traits as the first parameter. So you define optional<T> or optional<optional_traits<my_traits<T> > >. Then optional would be specialized for optional_traits that will get the user-defined traits.

"paul Fultz" wrote in newsgroup message:1328802527.4759.YahooMailNeo@web112602.mail.gq1.yahoo.com...
Actually, you could just take the optional_traits as the first parameter. So you define optional<T> or optional<optional_traits<my_traits<T> > >. Then optional would be specialized for optional_traits that will get the user-defined traits.
(possibly a bit of work to still get the special trivial destructor and assignment functionality in the specialization, but) Clever ;)

----- Original Message -----
From: Domagoj Saric <dsaritz@gmail.com> To: boost@lists.boost.org Cc: Sent: Thursday, February 9, 2012 2:09 PM Subject: Re: [boost] [optional] generates unnessesary code for trivial types
"paul Fultz" wrote in newsgroup message:1328802527.4759.YahooMailNeo@web112602.mail.gq1.yahoo.com...
Actually, you could just take the optional_traits as the first parameter. So you define optional<T> or optional<optional_traits<my_traits<T> > >. Then optional would be specialized for optional_traits that will get the user-defined traits.
(possibly a bit of work to still get the special trivial destructor and assignment functionality in the specialization, but) Clever ;)
Actually, you could use an optional_impl class that always uses traits. And then when the user is not passing in their own traits you would pass in default_traits. Something like this:

template<class T>
class optional : public optional_impl<default_traits<T> >
{
    // Forward constructors and operators
};

template<class Trait>
class optional<optional_traits<Trait> > : public optional_impl<Trait>
{
    // Forward constructors and operators
};

Then the assign operator would forward to an assign method in the base class. Of course, this would mean that if T is trivially assignable, optional<T> would not be trivially assignable. Was that one of your goals of the original design?
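Filled in just enough to compile, the idea above might look like the sketch below. Everything beyond the class names used in the message is an assumption made for illustration: the members of default_traits, the max_trivial_copy_size knob, the assign/has_value/get interface, and storing the value directly (so it must be default constructible) are all hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Default traits, used when the user passes a plain T (members hypothetical).
template <class T>
struct default_traits {
    typedef T value_type;
    enum { max_trivial_copy_size = 4 * sizeof(void*) };
};

// Tag wrapper that marks "the argument is a traits class, not a value type".
template <class Traits>
struct optional_traits {};

// The implementation always works in terms of traits.
template <class Traits>
class optional_impl {
public:
    typedef typename Traits::value_type value_type;
    enum { max_trivial_copy_size = Traits::max_trivial_copy_size };

    optional_impl() : engaged_(false), value_() {}
    void assign(const value_type& v) { value_ = v; engaged_ = true; }
    bool has_value() const { return engaged_; }
    const value_type& get() const { return value_; }

private:
    bool engaged_;
    value_type value_;  // simplification: stored directly, not in raw storage
};

// optional<T>: forwards to the implementation with default traits.
template <class T>
class optional : public optional_impl<default_traits<T> > {
public:
    optional() {}
    optional& operator=(const T& v) { this->assign(v); return *this; }
};

// optional<optional_traits<Traits>>: forwards with user-supplied traits.
template <class Traits>
class optional<optional_traits<Traits> > : public optional_impl<Traits> {
public:
    optional() {}
    optional& operator=(const typename Traits::value_type& v) {
        this->assign(v);
        return *this;
    }
};

// Example user traits (hypothetical).
struct my_int_traits {
    typedef int value_type;
    enum { max_trivial_copy_size = 64 };
};
typedef optional<optional_traits<my_int_traits> > tuned_optional_int;

static_assert(std::is_same<optional<int>::value_type, int>::value,
              "plain optional<int> holds an int");
static_assert(std::is_same<tuned_optional_int::value_type, int>::value,
              "the traits-tagged form also holds an int, not the tag type");
```

One visible consequence of this sketch is that optional<optional_traits<X> > never stores an optional_traits<X> object, so that one family of types is treated differently from every other T.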

on Thu Feb 09 2012, paul Fultz <pfultz2-AT-yahoo.com> wrote:
Actually, you could just take the optional_traits as the first parameter. So you define optional<T> or optional<optional_traits<my_traits<T> > >. Then optional would be specialized for optional_traits that will get the user-defined traits.
Be aware that tricks like this start to break down in generic contexts, as optional_traits<T> specializations are treated differently by optional from all other types. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

"Domagoj Saric" wrote in newsgroup message:jh0f5v$ljr$1@dough.gmane.org...
So far this corresponds to your optional_with_traits approach except that I don't think that providing global configuration by overriding/specializing the default traits is the correct approach. As you noted, this can violate the ODR and AFAIK users are not used that changing a _type_ can violate the ODR and change the behaviour of another type.
Or I might just be babbling :) That's what traits are for (when per type as opposed to per instantiation configuration is enough/desired)...

On 25.1.2012. 18:28, Hite, Christopher wrote:
When decompiling my code I noticed a bunch of unnessesary code caused by boost::optional.
Hi,
I've recently created an improved internal version of boost::optional to help workaround two issues:
- suboptimal codegen
- concurrent access.
You can now find this version under https://svn.boost.org/svn/boost/sandbox/optional.
So far the following has been done:
a) the lifetime management bool was changed into a properly typed pointer (this actually takes the same amount of space while it provides a no-op get_ptr() member function as well as easier debugging as the contents of optional can now clearly be seen through the pointer, as opposed to gibberish in an opaque storage array)
b) added another conditional constructor that accepts an in-place factory
c) uses the safe bool idiom implementation from Boost.Range (which generates better code on pre MSVC10 compilers)
d) skips redundant/dead stores of marking itself as uninitialised [including but not limited to, in its destructor (if it has one)]
e) streamlined internal assign paths to help the compiler avoid unnecessary branching
f) added direct_create() and direct_destroy() member functions that allow the user to bypass the internal lifetime management (they only assert correct usage) in situations where the user's own external logic already implicitly knows the state of the optional
g) optional now declares and defines a destructor only if the contained type has a non-trivial destructor (this prevents the compiler from detecting false EH states and thus generating bogus EH code)
h) optional marks itself as uninitialised _before_ calling the contained object's destructor (this makes it a little more robust in race conditions; it is of course not a complete solution for such scenarios, those require external "help" and/or (m)-reference counting to be implemented)
i) extracted the "placeholder" functionality into a standalone class (basically what would be left of optional<> if the lifetime management "bool" member and logic was removed) so that it can be reused (e.g. for singleton like classes, or when more complex custom lifetime management is required)
j) added compiler specific "aids" to workaround situations when the compiler is unable to detect that placement new will never return a nullptr (and then generates bogus branching) - IOW "optional<int> optional_number( 3 );" no longer generates a branch before storing "3" (yes "LOL":)
k) the lifetime management pointer is now stored after the actual contained object (this helps in avoiding more complex/offset addressing when accessing optionals through pointers w/o checking whether they are initialised)
l) removed support for antediluvian compilers (MSVC6, BCB5)
todo:
m) lifetime management policy: bool, pointer, reference count (+ a more generic abstraction/interop with smart_ptr)...
n) zero size overhead for optional references (requires (m))
o) avoid branching in assignment and copy construction of optionals that hold PODs smaller than N * sizeof( void * ) where N is some small number
- temporarily renamed to optional2 to avoid collision with the original optional
- passes all optional unit tests (after being renamed back to optional) with MSVC10 SP1 and Apple Clang 3.0 (from Xcode 4.2.1)
Hope it helps ;)
ps. AFAICT the only real obstacle in having really nice codegen with boost::optional<a_fundamental_type> is lack of proper ABI/compiler support for passing and returning small structs in registers...

On 27 January 2012 10:32, Domagoj Saric <domagoj.saric@littleendian.com> wrote:
Hi, I've recently created an improved internal version of boost::optional to help workaround two issues: - suboptimal codegen - concurrent access.
Your changes sound interesting! (I'm not as sure about the "concurrent access" stuff, but only because I haven't given it much thought yet.) Regards, -- Nevin ":-)" Liber <mailto:nevin@eviloverlord.com> (847) 691-1404

On Fri, Jan 27, 2012 at 5:32 PM, Domagoj Saric <domagoj.saric@littleendian.com> wrote:
a) the lifetime management bool was changed into a properly typed pointer (this actually takes the same amount of space while it provides a no-op
AFAIK bool and pointer aren't the same size. How can it still take the same amount of space? Olaf

On Sat, Jan 28, 2012 at 11:34 PM, Olaf van der Spek <ml@vdspek.org> wrote:
On Fri, Jan 27, 2012 at 5:32 PM, Domagoj Saric <domagoj.saric@littleendian.com> wrote:
a) the lifetime management bool was changed into a properly typed pointer (this actually takes the same amount of space while it provides a no-op
AFAIK bool and pointer aren't the same size. How can it still take the same amount of space?
Olaf
My guess would be that the compiler promotes the size of a bool to that of the native word size of the machine because the ease and speed of aligned memory access outweigh the 'size savings' (as typically your object is going to need to occupy an entire register, word on the stack, etc -- except when in an array, but that's actually another reason you want to have the size of the object promoted, as again, unaligned memory access is slow). Afaik though, in code it will typically still be treated as if it were e.g. 1 byte on x86 (using the AL register instead of EAX), and simply ignoring the high portion of the register. Note: Not a compiler/optimization/cpu/etc expert. This is just my amateur 'guess'. If you are really curious, just compile a test and disassemble it with GDB/WinDbg/IDA/etc, testing the codegen for various scenarios and optimization flags.

On 28.1.2012. 13:34, Olaf van der Spek wrote:
On Fri, Jan 27, 2012 at 5:32 PM, Domagoj Saric <domagoj.saric@littleendian.com> wrote:
a) the lifetime management bool was changed into a properly typed pointer (this actually takes the same amount of space while it provides a no-op
AFAIK bool and pointer aren't the same size. How can it still take the same amount of space?
A "language lawyer" might be more precise but, for optional<T>:
- if the bool member is before the T member the compiler has to add alignment_of<T>::value - sizeof( bool ) bytes of padding after the bool so that the T member would be properly aligned
- if the bool member is after the T member the compiler has to add the same amount of padding after the bool member to satisfy the requirement that there are no "holes" between individual (properly aligned) instances of optional<T> in arrays of optional<T>...
IOW, my statement in (a) does not for example hold for chars or shorts...
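The padding argument above can be checked with a small sketch. The three structs are hypothetical layouts, not Boost.Optional's real members, and the asserted sizes hold on typical ABIs where sizeof(bool) is 1; they are not guaranteed by the standard.

```cpp
#include <cassert>
#include <cstddef>

// Three candidate layouts for an optional-like wrapper (hypothetical).
template <class T> struct flag_before { bool engaged; T value; };
template <class T> struct flag_after  { T value; bool engaged; };
template <class T> struct ptr_after   { T value; T* engaged; };

// Bytes of padding the compiler adds when a bool flag accompanies a T.
template <class T>
constexpr std::size_t padding_bytes() {
    return sizeof(flag_before<T>) - sizeof(T) - sizeof(bool);
}

// On typical ABIs the padding matches the formula from the message:
// alignment_of<T>::value - sizeof(bool).
static_assert(padding_bytes<int>() == alignof(int) - sizeof(bool),
              "padding after the flag fills up to T's alignment");

// Flag before or after the payload costs the same.
static_assert(sizeof(flag_before<int>) == sizeof(flag_after<int>),
              "position of the flag does not change the size");

// For pointer-sized payloads a pointer is as cheap as a bool...
static_assert(sizeof(flag_before<void*>) == sizeof(ptr_after<void*>),
              "bool plus padding occupies a full pointer-sized slot");

// ...but for chars the pointer makes the wrapper noticeably bigger.
static_assert(sizeof(flag_before<char>) < sizeof(ptr_after<char>),
              "pointer-based flag is larger for small payloads");
```

This matches the caveat in the message: the "same amount of space" claim in (a) depends on the payload's alignment being at least that of a pointer.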

I support this work. Optional should be optimal :-)
on Fri Jan 27 2012, Domagoj Saric <domagoj.saric-AT-littleendian.com> wrote:
a) the lifetime management bool was changed into a properly typed pointer (this actually takes the same amount of space while it provides a no-op get_ptr() member function as well as easier debugging as the contents of optional can now clearly be seen through the pointer, as opposed to gibberish in an opaque storage array)
Seems to me this potentially makes optional<char> much bigger. No?
b) added another conditional constructor that accepts an in-place factory c) uses the safe bool idiom implementation from Boost.Range (which generates better code on pre MSVC10 compilers) d) skips redundant/dead stores of marking itself as uninitialised [including but limited to, in its destructor (if it has one)] e) streamlined internal assign paths to help the compiler avoid unnecessary branching f) added direct_create() and direct_destroy() member functions that allow the user to bypass the internal lifetime management (they only assert correct usage) in situations where the user's own external logic already implicitly knows the state of the optional g) optional now declares and defines a destructor only if the contained type has a non-trivial destructor (this prevents the compiler from detecting false EH states and thus generating bogus EH code) h) optional marks itself as uninitialised _before_ calling the contained object's destructor (this makes it a little more robust in race conditions;
I generally disagree with this sort of defensive programming. Won't it just mask bugs?
it is of course not a complete solution for such scenarios, those require external "help" and/or (m)-reference counting to be implemented) i) extracted the "placeholder" functionality into a standalone class (basically what would be left of optional<> if the lifetime management "bool" member and logic was removed) so that it can be reused (e.g. for singleton like classes, or when more complex custom lifetime management is required) j) added compiler specific "aids" to workaround situations when the compiler is unable to detect that placement new will never return a nullptr (and then generates bogus branching) - IOW "optional<int> optional_number( 3 );" no longer generates a branch before storing "3" (yes "LOL":) k) the lifetime management pointer is now stored after the actual contained object (this helps in avoiding more complex/offset addressing when accessing optionals through pointers w/o checking whether they are initialised) l) removed support for antediluvian compilers (MSVC6, BCB5)
todo:
m) lifetime management policy: bool, pointer, reference count (+ a more generic abstraction/interop with smart_ptr)...
n) zero size overhead for optional references (requires (m))
o) avoid branching in assignment and copy construction of optionals that hold PODs smaller than N * sizeof( void * ) where N is some small number
- temporarily renamed to optional2 to avoid collision with the original optional - passes all optional unit tests (after being renamed back to optional) with MSVC10 SP1 and Apple Clang 3.0 (from Xcode 4.2.1)
Hope it helps ;)
ps. AFAICT the only real obstacle in having really nice codegen with boost::optional<a_fundamental_type> is lack of proper ABI/compiler support for passing and returning small structs in registers...
Please tell me that at least *some* C++ compiler does that nowadays...? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On 30.01.2012, at 19:58, Dave Abrahams wrote:
I support this work. Optional should be optimal :-)
on Fri Jan 27 2012, Domagoj Saric <domagoj.saric-AT-littleendian.com> wrote:
ps. AFAICT the only real obstacle in having really nice codegen with boost::optional<a_fundamental_type> is lack of proper ABI/compiler support for passing and returning small structs in registers...
Please tell me that at least *some* C++ compiler does that nowadays...?
All Linux compilers on x64 platforms follow the AMD64 ABI, possibly with minor variations/bugs. This ABI specifies that classes are passed in registers if
- they are trivially copyable and destructible (optional should be specialized for types that fulfill these criteria to ensure this),
- they have no virtual functions or bases,
- they are smaller than 2 qwords (4 qwords if all members are float, double, or SSE types), and
- they don't contain any weird stuff, like 80-bit long doubles or unaligned fields.
The Mac ABI for x64 is very close, though I don't know the differences.
The Win64 ABI is far less nice about registers. It passes the first four arguments in registers, and spills everything else onto the stack. It does not pack multiple values into a register. If a value is larger than 8 bytes, it is not split across registers. The ABI description says that "aggregates" can be passed in registers, but it doesn't elaborate on whether this refers to the C++ definition of aggregates (unlikely!) or whatever else the definition is. It sounds pretty useless.
I'm not aware of any x86-32 calling convention that passes classes of any kind in registers.
Sebastian
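The criteria above can be approximated at compile time. The following is only a heuristic sketch: the trait name `likely_passed_in_registers` and the three example structs are made up for illustration, the 16-byte limit is my reading of "two qwords" for integer-class members, and the checks for virtual bases, long double members and unaligned fields are left out.

```cpp
#include <type_traits>

// Hypothetical heuristic, not an exact model of the AMD64 classification rules.
template <typename T>
struct likely_passed_in_registers
    : std::integral_constant<bool,
          std::is_trivially_copyable<T>::value &&
          std::is_trivially_destructible<T>::value &&
          !std::is_polymorphic<T>::value &&
          sizeof(T) <= 16> {};

// Roughly the layout of an optional<int> with a bool flag.
struct int_with_flag {
    int value;
    bool initialized;
};

// Same members, but a user-provided destructor makes it non-trivial.
struct int_with_flag_and_dtor {
    int value;
    bool initialized;
    ~int_with_flag_and_dtor() {}
};

// Trivial, but far too large for two registers.
struct too_big {
    char bytes[64];
};
```

This is the reason for the suggestion that optional be specialized for trivial types: a user-provided destructor or copy constructor in optional itself is enough to fail the first criterion, whatever T is.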

On 30.01.2012, at 21:00, Sebastian Redl wrote:
On 30.01.2012, at 19:58, Dave Abrahams wrote:
I support this work. Optional should be optimal :-)
on Fri Jan 27 2012, Domagoj Saric <domagoj.saric-AT-littleendian.com> wrote:
ps. AFAICT the only real obstacle in having really nice codegen with boost::optional<a_fundamental_type> is lack of proper ABI/compiler support for passing and returning small structs in registers...
Please tell me that at least *some* C++ compiler does that nowadays...?
All Linux compilers on x64 platforms follow the AMD64 ABI, possibly with minor variations/bugs. This ABI specifies that classes are passed in registers if
- they are trivially copyable and destructible (optional should be specialized for types that fulfill these criteria to ensure this),
- they have no virtual functions or bases,
- they are smaller than 2 qwords (4 qwords if all members are float, double, or SSE types), and
- they don't contain any weird stuff, like 80-bit long doubles or unaligned fields.
The Mac ABI for x64 is very close, though I don't know the differences.
The Win64 ABI is far less nice about registers. It passes the first four arguments in registers, and spills everything else onto the stack. It does not pack multiple values into a register. If a value is larger than 8 bytes, it is not split across registers. The ABI description says that "aggregates" can be passed in registers, but it doesn't elaborate on whether this refers to the C++ definition of aggregates (unlikely!) or whatever else the definition is. It sounds pretty useless.
I'm not aware of any x86-32 calling convention that passes classes of any kind in registers.
Correcting myself: the Common C++ ABI for x86-32 actually specifies that trivially copyable and destructible classes are treated just like simple values for parameter passing, so they can be passed and returned in registers. Of course, the far smaller register file of x86-32 makes that still not very useful.
Sebastian

On 30.1.2012. 21:30, Sebastian Redl wrote:
All Linux compilers on x64 platforms follow the AMD64 ABI, possibly with minor variations/bugs. This ABI specifies that classes are passed in registers if
- they are trivially copyable and destructible (optional should be specialized for types that fulfill these criteria to ensure this),
- they have no virtual functions or bases,
- they are smaller than 2 qwords (4 qwords if all members are float, double, or SSE types), and
- they don't contain any weird stuff, like 80-bit long doubles or unaligned fields.
The Mac ABI for x64 is very close, though I don't know the differences.
Thanks for the summary (didn't know there was a separate OS X x64 ABI).
The Win64 ABI is far less nice about registers. It passes the first four arguments in registers, and spills everything else onto the stack. It does not pack multiple values into a register. If a value is larger than 8 bytes, it is not split across registers. The ABI description says that "aggregates" can be passed in registers, but it doesn't elaborate on whether this refers to the C++ definition of aggregates (unlikely!) or whatever else the definition is. It sounds pretty useless.
Right, the Windows/MSVC x64 ABI is a major !?wth!?...I just can't think of a reason why they had to invest resources into making their own ABI that is so complicated and so inferior to the AMD proposed one (e.g. you can't pass an SSE vector through an XMM register??).
Correcting myself: the Common C++ ABI for x86-32 actually specifies that trivially copyable and destructible classes are treated just like simple values for parameter passing, so they can be passed and returned in registers. Of course, the far smaller register file of x86-32 makes that still not very useful.
Unfortunately I have never seen MSVC pass or return any struct through registers, even though it has interprocedural optimization and link-time code generation capabilities, so it can "invent" (as the documentation claims) its own calling conventions for non-exported functions. I don't know whether any other x86 compiler is able to do so...
In any case, the problem is that there is no even nearly portable/standard/widespread way (pragma, decl specifier...) to tell the compiler to return small PODs in registers, especially not just for a particular function and/or POD type. GCC has -freg-struct-return, but that seems nearly useless because it applies to the whole binary and so it requires the OS to be built with that option.
--
"What Huxley teaches is that in the age of advanced technology, spiritual devastation is more likely to come from an enemy with a smiling face than from one whose countenance exudes suspicion and hate."
Neil Postman

On Mon, Jan 30, 2012 at 9:00 PM, Sebastian Redl <sebastian.redl@getdesigned.at> wrote:
All Linux compilers on x64 platforms follow the AMD64 ABI, possibly with minor variations/bugs. This ABI specifies that classes are passed in registers if
Does that also apply to functions that aren't exported? I'd assume the compiler is free to do whatever it wants in that case.
Olaf

On 30.1.2012. 19:58, Dave Abrahams wrote:
I support this work. Optional should be optimal :-)
Or, optimal should not be optional (everything should be optimal :D).
a) the lifetime management bool was changed into a properly typed pointer (this actually takes the same amount of space while it provides a no-op get_ptr() member function as well as easier debugging as the contents of optional can now clearly be seen through the pointer, as opposed to gibberish in an opaque storage array)
Seems to me this potentially makes optional<char> much bigger. No?
True (see my answer to Olaf and Christofer).
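To make the size concern concrete, here is a small sketch comparing the two layouts. Both struct names are hypothetical and the members only mimic the idea of a bool flag versus a typed pointer as the lifetime marker; they are not the actual optional internals.

```cpp
// Hypothetical layouts; only the size relationship is of interest.
template <typename T>
struct bool_flag_layout {
    alignas(T) unsigned char storage[sizeof(T)];
    bool initialized;  // lifetime marker as a flag
};

template <typename T>
struct pointer_layout {
    alignas(T) unsigned char storage[sizeof(T)];
    T* object;  // lifetime marker as a typed pointer (null when empty)
};

// For char the pointer and its alignment padding dominate the size.
static_assert(sizeof(pointer_layout<char>) > sizeof(bool_flag_layout<char>),
              "a pointer marker makes an optional<char>-like type bigger");
```

On typical 64-bit targets the first layout is 2 bytes for char and the second is 16, while for pointer-sized or larger T the bool flag is usually padded up to the same size as the pointer, which is presumably the case the "same amount of space" remark in (a) had in mind.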
h) optional marks itself as uninitialised _before_ calling the contained object's destructor (this makes it a little more robust in race conditions;
I generally disagree with this sort of defensive programming. Won't it just mask bugs?
I generally disagree too, but only in cases where there is actual "defensive programming", i.e. handling of invalid/buggy usage. The typical example is code that asserts that a pointer is not null and then handles the case where it is. There is none of that here. Imagine writing optional from scratch, at one point you would have to decide the same thing, when to mark the optional as empty - before or after calling the destructor. Either way you choose won't make a difference (semantic or performance wise) for correct code. Incorrect code will crash less. Isn't that a good thing (considering there is no actual handling of incorrect code)?
Considering that "there is no bug free software" (one wonders about laser brain surgery robots :), wouldn't it be better to "a priori crash less" and add separate sanity checks for invalid concurrent access in order to catch bugs (obviously this is more work and I don't know if any Boost component actually does anything like this)?
Perhaps there is no "right" answer to this question and it's more a matter of preference, so consider the above as "my 2 cents"...
ps. and yes, I forgot the buzz: (p) rvalue references support :)
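The ordering under discussion can be sketched as follows. `tiny_optional`, `probe` and `destructor_sees_empty_optional` are hypothetical names used only for this illustration; the sketch shows the single-threaded effect (the contained object's destructor already observes an empty optional) and makes no claim about real data races, which would still need atomics or locking.

```cpp
#include <cassert>
#include <new>

// Hypothetical minimal optional; copying is disabled to keep the sketch short.
template <typename T>
class tiny_optional {
public:
    tiny_optional() : initialized_(false) {}
    explicit tiny_optional(const T& value) : initialized_(false) {
        ::new (static_cast<void*>(storage_)) T(value);
        initialized_ = true;
    }
    tiny_optional(const tiny_optional&) = delete;
    tiny_optional& operator=(const tiny_optional&) = delete;
    ~tiny_optional() { reset(); }

    void reset() {
        if (initialized_) {
            initialized_ = false;                  // mark as empty first ...
            reinterpret_cast<T*>(storage_)->~T();  // ... then destroy the object
        }
    }

    bool has_value() const { return initialized_; }
    T& get() {
        assert(initialized_);
        return *reinterpret_cast<T*>(storage_);
    }

private:
    alignas(T) unsigned char storage_[sizeof(T)];
    bool initialized_;
};

// Records what its owning optional looks like while the destructor runs.
struct probe {
    const tiny_optional<probe>* owner;
    bool* seen_empty;
    ~probe() {
        if (owner) *seen_empty = !owner->has_value();
    }
};

inline bool destructor_sees_empty_optional() {
    bool seen_empty = false;
    tiny_optional<probe> o(probe{nullptr, &seen_empty});
    o.get().owner = &o;  // let the contained object look back at its owner
    o.reset();
    return seen_empty;
}
```

With the two statements in `reset()` swapped, the probe would instead see an optional that still claims to hold a value whose destructor is already running.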

On Thu, Feb 2, 2012 at 4:20 PM, Domagoj Saric <domagoj.saric@littleendian.com> wrote:
There is none of that here. Imagine writing optional from scratch, at one point you would have to decide the same thing, when to mark the optional as empty - before or after calling the destructor. Either way you choose won't make a difference (semantic or performance wise) for correct code. Incorrect code will crash less. Isn't that a good thing (considering there is no actual handling of incorrect code)?
Isn't this about the destructor of optional? Marking it as empty seems unneeded there.
Olaf
participants (14)
- Andrey Semashev
- Dave Abrahams
- Domagoj Saric
- Domagoj Saric
- Hite, Christopher
- Joshua Boyce
- Kim Barrett
- Nevin Liber
- Olaf van der Spek
- paul Fultz
- Sebastian Redl
- Simonson, Lucanus J
- Stewart, Robert
- Vicente J. Botet Escriba