[UUID] PODness Revisited

On IRC today, Dodheim made an excellent point: a uuid should be a POD, because that way it interacts better with Boost.MPI and Boost.Interprocess.

MPI seems like an absolutely perfect situation in which to use UUIDs. However, going through serialization can be more than 10 times slower than using datatypes[1]. For something to be a datatype, it needs to be a POD[2], and thanks to the portable binary format, uuids can always use the BOOST_IS_BITWISE_SERIALIZABLE[3][4] trait since a UUID's bit format is independent of endianness and word size.

[1] http://www.boost.org/doc/libs/1_37_0/doc/html/mpi/performance.html
[2] http://www.boost.org/doc/libs/1_37_0/doc/html/boost/mpi/is_mpi_datatype.html
[3] http://www.boost.org/doc/libs/1_37_0/doc/html/mpi/tutorial.html#mpi.homogene...
[4] http://www.boost.org/doc/libs/1_37_0/libs/serialization/doc/traits.html#temp...

Of course, you don't want to send UUIDs by themselves. This is an even bigger point, since if uuid isn't a POD, then sending composite types that include UUIDs pays the price of going through serialization too, even when they could have taken the cheap path otherwise.

Similarly, making uuid a POD allows it to be placed in Boost.Interprocess shared memory without the cost of going through serialization. For example, "A message queue just copies raw bytes between processes and does not send objects"[5], which requires a POD without member pointers to be safe.

[5] http://www.boost.org/doc/libs/1_37_0/doc/html/interprocess/synchronization_m...

I think these are two good examples that POD is the quintessential form for a value type, not something "clearly dragged in due to backward compatibility with C"[6]. In addition, note that even Boost.Proto, one of the fancier libraries in Boost, uses aggregates extensively[7].

[6] http://permalink.gmane.org/gmane.comp.lib.boost.devel/184223
[7] http://www.boost.org/doc/libs/1_37_0/doc/html/proto/appendices.html#boost_pr...

It's also trivial for someone to build a non-POD out of the POD should they desire:

    namespace vladimir
    {
        struct uuid : boost::uuid
        {
            uuid() : boost::uuid(boost::uuids::native_generator()()) {}
            uuid(boost::uuid id) : boost::uuid(id) {}
        };
    }

~ Scott
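To make the traits Scott references concrete, here is a minimal sketch assuming a hypothetical POD layout named pod_uuid (not Andy's proposed class) and using the BOOST_IS_BITWISE_SERIALIZABLE and BOOST_IS_MPI_DATATYPE macros from the linked documentation:

    #include <cstddef>
    #include <boost/mpi/datatype.hpp>                          // BOOST_IS_MPI_DATATYPE
    #include <boost/serialization/is_bitwise_serializable.hpp> // BOOST_IS_BITWISE_SERIALIZABLE

    // Hypothetical POD layout: no constructors, no invariant, just 16 octets.
    struct pod_uuid
    {
        unsigned char data[16];

        template<class Archive>
        void serialize(Archive& ar, const unsigned int /*version*/)
        {
            for (std::size_t i = 0; i != sizeof(data); ++i)
                ar & data[i];   // fixed-size array of a primitive type
        }
    };

    // The raw bits are portable (endian/word-size independent), so archives that
    // understand bitwise serialization can copy the object in one go...
    BOOST_IS_BITWISE_SERIALIZABLE(pod_uuid)
    // ...and Boost.MPI can describe and send it as a plain MPI datatype.
    BOOST_IS_MPI_DATATYPE(pod_uuid)

Note that a composite type containing pod_uuid would still need its own is_mpi_datatype marking to take the fast path; the sketch only shows the uuid end of it.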

Scott,

IMHO you raise a mighty point very much worth looking into. At least I surely will be poring over your references, dusting off and looking at those ol' PODs (which I admit I discarded long ago) from an entirely different angle.
It's also trivial for someone to build a non-POD out of the pod should they desire:
I've been thinking along these lines also. For that reason I am leaning toward resigning myself to the nil-generating default constructor, as it seems to be the majority view and seems to considerably simplify the implementation.

Best, V.

P.S. And you named a namespace after me. Isn't it nice? ;-)

on Wed Dec 24 2008, "Vladimir Batov" <batov-AT-people.net.au> wrote:
Scott,
IMHO you raise a mighty point very much worth looking into. At least I surely will be poring over your references, dusting off and looking at those ol' PODs (which I admit I discarded long ago) from an entirely different angle.
It's also trivial for someone to build a non-POD out of the pod should they desire:
I've been thinking along these lines also. For that reason I am leaning toward resigning myself to the nil-generating default constructor, as it seems to be the majority view and seems to considerably simplify the implementation.
Unfortunately I think any nontrivial constructor at all makes a class a non-POD. So you may have to accept the uninitialized state in order to interact well with MPI. You might consider whether you need a low-level representation class as well as a higher-level wrapper with stronger invariants.

-- Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Unfortunately I think any nontrivial constructor at all makes a class a non-POD. So you may have to accept the uninitialized state in order to interact well with MPI. You might consider whether you need a low-level representation class as well as a higher-level wrapper with stronger invariants.
I am very unsure of that POD... and the "uninitialized state"... Yuk. Are we sure it is needed to "interact well with MPI"? Reading about MPI, I see that for "normal" classes it deploys Boost.Serialization. With that I think we can make uuid serialization pretty close (efficiency-wise) to no serialization (if we use the binary archive and simply stuff our internal data buffer into the archive). And Boost.Serialization certainly has no such "uninitialized state ... to interact well" requirement. Even better, the load() side of Boost.Serialization is easily tailored for classes without a default constructor (I do it all the time. Love the library).

Best, V.
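For reference, a minimal sketch of the approach Vladimir describes here (stuffing the internal buffer into a binary archive); the uuid class is hypothetical and the details are illustrative only:

    #include <cstring>
    #include <sstream>
    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/serialization/access.hpp>
    #include <boost/serialization/binary_object.hpp>

    // Hypothetical non-POD uuid with a default constructor and private data.
    class uuid
    {
    public:
        uuid() { std::memset(data_, 0, sizeof data_); }   // nil by default

    private:
        friend class boost::serialization::access;

        // "Stuff our internal data buffer into the archive" as one opaque blob;
        // with a binary archive this amounts to a 16-byte write.
        // (Tracking and versioning could additionally be turned off via the
        // BOOST_CLASS_TRACKING / BOOST_CLASS_IMPLEMENTATION traits.)
        template<class Archive>
        void serialize(Archive& ar, const unsigned int /*version*/)
        {
            ar & boost::serialization::make_binary_object(data_, sizeof data_);
        }

        unsigned char data_[16];
    };

    int main()
    {
        std::ostringstream buffer;
        boost::archive::binary_oarchive archive(buffer);
        uuid id;
        archive << id;   // still serialization, but with very little per-object work
    }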

on Wed Dec 24 2008, "Vladimir Batov" <batov-AT-people.net.au> wrote:
Unfortunately I think any nontrivial constructor at all makes a class a non-POD. So you may have to accept the uninitialized state in order to interact well with MPI. You might consider whether you need a low-level representation class as well as a higher-level wrapper with stronger invariants.
I am very unsure of that POD... and the "uninitialized state"... Yuk. Are we sure it is needed to "interact well with MPI"? Reading about MPI, I see that for "normal" classes it deploys Boost.Serialization.
The actual keys are described here:

http://www.boost.org/doc/libs/1_37_0/doc/html/boost/mpi/is_mpi_datatype.html

and here:

http://www.boost.org/doc/libs/1_37_0/doc/html/mpi/tutorial.html#mpi.serializ...

So, AFAICT, it doesn't require actual POD-ness.

-- Dave Abrahams
BoostPro Computing
http://www.boostpro.com

OK, I thought some more about those PODs and I admittedly do not see any magic in that Poor Old Dude. Scott, you are clearly excited about them, so pls help me out here. That's what *I* see (caveat: I admit not knowing much about Boost.MPI and Boost.Interprocess requirements and expectations).

1. Boost.MPI efficiency does not seem to rely on PODness. Rather it seems to be due to serialization (or rather the ability to bypass it). I feel that Andy's UUID class is well-suited to add no-serialization transmission functionality when needed (without the rest of us paying for that). That is, conceptually Andy's full-fledged UUID does not have that limitation. If it is an MPI implementation-specific restriction/limitation, I'd expect we'd look at addressing it in MPI rather than shaping other classes to match it. In fact, after glancing over the docs I felt I could do it by specializing is_mpi_datatype or is_mpi_builtin_datatype. I might easily have gotten it wrong though.

2. Scott, you correctly mention that most often we "don't want to send UUIDs by themselves". The thing is that chances of that bigger class being a POD are diminishing dramatically (if not already infinitely close to 0).

3. As for deployment of an object in shared memory, it does not have to be a POD either. Any non-virtual object can be placed in and accessed in shared memory as-is (this statement is certainly an oversimplification but I won't be listing all the ifs and buts). Even virtual classes can be "massaged" (vtbl-pointer adjusted/restored before usage) in shared memory (quite a hassle though). I was able to find something of that kind that I did in my previous life (http://adtmag.com/joop/carticle.aspx?ID=849).

Best, V.
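For what it's worth, the specialization Vladimir guesses at in point 1 would look roughly like this, following the gps_position example in the Boost.MPI tutorial; unlike the POD sketch earlier, this keeps the data private and the constructor user-provided (the uuid class shown is hypothetical):

    #include <boost/mpi/datatype.hpp>
    #include <boost/serialization/access.hpp>

    // Hypothetical full-fledged (non-POD) uuid: private data, default constructor.
    class uuid
    {
    public:
        uuid() : data_() {}   // nil by default

    private:
        friend class boost::serialization::access;

        template<class Archive>
        void serialize(Archive& ar, const unsigned int /*version*/)
        {
            for (unsigned i = 0; i != 16; ++i)
                ar & data_[i];
        }

        unsigned char data_[16];
    };

    namespace boost { namespace mpi {

    // Promise Boost.MPI that uuid has a fixed layout it can describe as an
    // MPI datatype, so transfers bypass the generic serialization path.
    template<>
    struct is_mpi_datatype<uuid> : mpl::true_ {};

    } } // namespace boost::mpi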

Vladimir Batov writes:
OK, I thought some more about those PODs and I admittedly do not see any magic in that Poor Old Dude. Scott, you are clearly excited about them, so pls help me out here.
While you may not see the "magic" in POD types, I can't fathom what exactly you have against them either. Do you find them more confusing or harder to use? Do you find static initialization syntax aesthetically offensive? Is it your (no offense, but extremely misguided, IMO) lingering impression that POD types are a legacy of C that should be ignored whenever possible? A list of examples in favor of making UUID a POD type was presented, and you've argued against those examples without actually saying what you think the drawback is.
That's what *I* see (caveat: I admit not knowing much about Boost.MPI and Boost.Interprocess requirements and expectations).
Again, no offense intended, but I find it a bit discomfiting that the person arguing most vocally on this issue would make this admission. Just because you don't have personal knowledge of a use case where UUID being a POD type would be greatly beneficial doesn't mean such a use case doesn't exist.
1. Boost.MPI efficiency does not seem to rely on PODness. Rather it seems to be due to serialization (or rather the ability to bypass it).
This isn't technically correct, I think; in MPI's case (though not Interprocess'), the type must be serializable regardless, but the ideal efficiency scenario comes from specializing both boost::mpi::is_mpi_datatype and boost::serialization::is_bitwise_serializable. Note that the documentation for these traits ([1] and [2], respectively) both specifically mention POD types -- this is no coincidence. [1] http://www.boost.org/doc/libs/1_37_0/doc/html/boost/mpi/is_mpi_datatype.html [2] http://www.boost.org/doc/libs/1_37_0/libs/serialization/doc/traits.html
I feel that Andy's UUID class is well-suited to add no-serialization transmission functionality when needed (without the rest of us paying for that). That is, conceptually Andy's full-fledged UUID does not have that limitation.
While this is generally correct, I think you're missing the larger point. In modern C++, types intentionally created as POD types are often (not always) done so to absolutely maximize the efficiency of copying that type. The existence of the is_pod type trait in boost.type_traits/TR1/C++0x reinforces this -- e.g. in many implementations, std::copy will use memcpy to perform an ideally efficient copy when is_fundamental<T>::value || is_pod<T>::value. Additionally, a POD type's synthesized copy constructor is generally merely a memcpy.

In order to specialize is_bitwise_serializable for a given type, tracking [3] must be disabled for that type. In practice this means one should avoid using pointers to that type inside of other serializable types. UUID's "PODness" would allow the type itself to communicate to users that it is truly a value type, and is meant to be used as such (i.e., always passed/held by value); one wouldn't typically pass or hold a pointer to an int because it's so cheap to copy, so one (hopefully) wouldn't be inclined to do that for another type that, for all intents and purposes, presents itself as a primitive type.

[3] http://www.boost.org/doc/libs/1_37_0/libs/serialization/doc/special.html#obj...
If it is an MPI implementation-specific restriction/limitation, I'd expect we'd look at addressing it in MPI rather than shaping other classes to match it.
This is an unreasonable thought process, IMO. If a type has a good use case with another library (in this case, UUID with Serialization/MPI/Interprocess), it's up to the type to conform to the library in an ideal fashion, not the other way around. E.g., lexical_cast and serialization don't go out of their way to work with every other type in Boost, but many types in Boost have serialization and lexical_cast support.
2. Scott, you correctly mention that most often we "don't want to send UUIDs by themselves". The thing is that chances of that bigger class being a POD are diminishing dramatically (if not already infinitely close to 0).
This is extremely off base, and points back to your lack of knowledge regarding MPI, I think. When writing an app/library/algorithm intended for use in a high-performance parallel context, one goes out of their way to use POD types extensively, for the sake of performance. Yes, the fact that MPI works with boost.serialization is nice, but when performance is critical, memcpy'able types are key; and this is C++, after all -- of course performance is critical, otherwise why bother? ;-) So the more external (in relation to the user's code) libraries that work out of the box, the better; I think to argue that a type such as UUID (which is a low-level, fundamental value type, and specifically *very* likely to be used in an inter-process context) should *not* automatically work in an ideal fashion in this scenario, one must have an *extremely* convincing argument, IMO. And so far, I haven't seen one presented. ;-)
3. As for deployment of an object in shared memory, it does not have to be a POD either.
Please take another look at the specific link Scott provided ([4]); boost::interprocess::message_queue only copies raw bytes between processes, so for non-POD types generally that requires that an object be binary serialized before sending. However, for a POD type, binary serialization is a completely redundant process (read: a complete waste of CPU cycles); one can just send the bytes of the object directly, and as an added bonus, avoid becoming dependent on the somewhat heavy serialization library altogether.

Again, the fact that this might be possible even if UUID were not a POD type is somewhat irrelevant, IMO -- as important as it is to be able to use a type ideally and efficiently, it is equally as important that a type _communicate to the user_ its ideal usage. Personally, I would feel as though I was playing with fire (and in a literal sense, would probably be invoking UB) by taking the raw bytes underlying a non-POD type and using them with message_queue, to the extent that I wouldn't bother, as much as it might vex me; when a type is a POD type, I *know* immediately that it's safe -- no worries!

[4] http://www.boost.org/doc/libs/1_37_0/doc/html/interprocess/synchronization_m...

I want to touch on a few other points as well, were UUID to be a POD type:

1. The default constructor behavior/existence debate would be put to rest. ;-)

2. The efficiency of lexical_cast would be better than *any* default constructor behavior, regardless of which one was ultimately decided upon.

3. One would typically initialize a UUID in exactly the same manner as though it had no default constructor, which most people seem to find an acceptable option.

4. Initializing a nil UUID would become more succinct. Contrast 'uuid id(uuid::nil());' and 'uuid id = {};', or 'id(uuid::nil())' and 'id()' in a constructor initialization list. Assuming any level of familiarity with aggregates, the latter are much more concise, IMO. (And C++0x will certainly introduce that familiarity if one doesn't have it already.)

5. Static initialization has been greatly underrated so far in this discussion. My first use case for a Boost UUID library would be to replace some homegrown COM/XPCOM encapsulation code. In dealing with COM/XPCOM, it is *extremely* common to have hardcoded UUIDs, and *many* of them. Trivial work though it may be, spending application/library startup time initializing hundreds/thousands of UUIDs when they could be statically initialized is senseless.

6. Regarding the potential for uninitialized state: I personally view UUID as a fundamental, borderline primitive type (others will almost certainly disagree); uninitialized state is generally understood and accepted for actual primitive types, so why should it be such a scary concept for UUID?

7. Lastly, to reiterate: this is C++. Every type, every library, every algorithm should be written with performance and efficiency as primary considerations. There are demonstrable use cases where UUID can work more efficiently as a POD type, but no convincing arguments have been presented in favor of non-PODness. And as Scott and David mentioned, UUID could be trivially wrapped in a non-POD if one were so inclined, and they'd even get to choose their own default constructor behavior. ;-)

Apologies for the lengthy post, but the threads regarding UUID thus far have also been quite lengthy and as I haven't responded yet, I had a few strong opinions stored up. ;-)
Best, V.
Regards, Adam Merz (Dodheim in #boost)
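As a concrete illustration of Adam's points 4 and 5, assuming a hypothetical aggregate layout (a single 16-octet array); the hard-coded value is the well-known IUnknown identifier written in RFC 4122 octet order, used here purely as an example:

    // Hypothetical aggregate uuid: 16 octets, no constructors, no invariant.
    struct uuid
    {
        unsigned char data[16];
    };

    // Hardcoded COM-style identifiers become static initialization:
    // the bytes go straight into the binary and no startup code runs.
    static const uuid iid_iunknown =
        { { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
            0xC0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x46 } };

    // The nil UUID is just value-initialization of the aggregate.
    uuid nil_id = {};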

Adam, Wow, that was one passionate reply. Was it something that I said? ;-)
While you may not see the "magic" in POD types, I can't fathom what exactly you have against them either.
Well, I most certainly do not have anything against anything. It's nothing personal. If my emails came across as such, my humble apologies.
Do you find them more confusing or harder to use? Do you find static initialization syntax aesthetically offensive? Is it your (no offense, but extremely misguided, IMO) lingering impression that POD types are a legacy of C that should be ignored whenever possible? A list of examples in favor of making UUID a POD type was presented, and you've argued against those examples without actually saying what you think the drawback is.
That's quite an emotionally charged list you compiled. It did not have to be such. I certainly do not find aggregates confusing or anything of that sort. In fact, I was very happy with them for about 10 years while coding in C before I switched to C++ in the early '90s. PODs have come from C and, therefore, they *are* a legacy of C and are not called Plain Old Data for nothing. In C++, though, I do find aggregates limiting. With regard to uuid it'd mean no user-provided constructors, no guaranteed invariant, no private or protected non-static data members. And that is fundamental (my view of course) to C++ -- "it is important and fundamental to have constructors acquire resources and establish a simple invariant" (Stroustrup E.3.5). Then, "One of the most important aims of a design is to provide interfaces that can remain stable in the face of changes" (Stroustrup 23.4.3.5). PODs do restrict interfaces and are wide open implementation-wise. That opens the door for mis-use and complicates long-term maintainability. So, unless PODs provide some killer feature in return (that cannot be achieved otherwise), I do not see the point of paying that price.
That's what *I* see (caveat: I admit not knowing much about Boost.MPI and Boost.Interprocess requirements and expectations).
Again, no offense intended, but I find it a bit discomfiting that the person arguing most vocally on this issue would make this admission. Just because you don't have personal knowledge of a use case where UUID being a POD type would be greatly beneficial doesn't mean such a use case doesn't exist.
First, you are right about "most vocally". I too had that growing concern that there was somewhat too much of me lately on the list. Apologies. In my defence I might say I do not usually do that. My weak point is that once I get onto something, I tend to follow it through to completion (well, some might consider that to be a good thing). Point taken though; I'll try answering your email (hopefully to your satisfaction) and will then tone it down.

Secondly, I personally do not see anything wrong with the admission -- I use some libs extensively, some occasionally, and do not use some at all. I suspect it is quite typical. Stating your knowledge IMO clears up a lot of possible and unnecessary confusion and many other emotions.

Thirdly, I am not sure I said "such a use case doesn't exist", did I? If I did, I probably did not mean that. :-) What I am questioning though is the "greatly beneficial" part. I am glad to see that part is already obvious to you. I hope it's not just a hunch and you have hard data to back it up.
1. Boost.MPI efficiency does not seem to rely on PODness. Rather it seems to be due to serialization (or rather the ability to bypass it).
This isn't technically correct, I think; in MPI's case (though not Interprocess'), the type must be serializable regardless, but the ideal efficiency scenario comes from specializing both boost::mpi::is_mpi_datatype and boost::serialization::is_bitwise_serializable. Note that the documentation for these traits ([1] and [2], respectively) both specifically mention POD types -- this is no coincidence.
[1] http://www.boost.org/doc/libs/1_37_0/doc/html/boost/mpi/is_mpi_datatype.html [2] http://www.boost.org/doc/libs/1_37_0/libs/serialization/doc/traits.html
Yes, my wording was somewhat crude. I presume you have a lot of practical experience with MPI and that you can say with authority that PODness is a must for MPI's efficiency. Would you mind providing some experimental data that you have observed? My knowledge of MPI is from reading docs (I probably should stop making these discomforting admissions). There I got the impression that serializable non-aggregate classes could be made efficient too.
... I think you're missing the larger point. In modern C++, types intentionally created as POD types are often (not always) done so to absolutely maximize the efficiency of copying that type.
I do not understand "PODness to absolutely maximize the efficiency of copying" as I believe

    class NonAggregateFoo { ... int int_; };

is copied as efficiently as a raw 'int'. And NonAggregateFoo bunch[] can be memcopied as well as PODFoo bunch[] (I am not advocating that but simply stating the fact). And I do not expect the respective

    template<class Archive> void serialize(Archive& ar, unsigned int) { ar & int_; }

to be that slow (with appropriately chosen Archive). Again, here you might well know more than I do. Tell me then.
The existence of the is_pod type trait in boost.type_traits/TR1/C++0x reinforces this -- e.g. in many implementations, std::copy will use memcpy to perform an ideally efficient copy when is_fundamental<T>::value || is_pod<T>::value. Additionally, a POD type's synthesized copy constructor is generally merely a memcpy.
Understood. It does not make copying of non-aggregates inefficient though. Non-automatic 'yes', inefficient 'no'.
...
If it is an MPI implementation-specific restriction/limitation, I'd expect we'd look at addressing it in MPI rather than shaping other classes to match it.
This is an unreasonable thought process, IMO. If a type has a good use case with another library (in this case, UUID with Serialization/MPI/Interprocess), it's up to the type to conform to the library in an ideal fashion, not the other way around. E.g., lexical_cast and serialization don't go out of their way to work with every other type in Boost, but many types in Boost have serialization and lexical_cast support.
Well, again my initial wording was somewhat crude. I still stand by its meaning though. A general-purpose library should be accommodating/considerate rather than imposing. And from what I read about MPI that's the approach taken there. As for lexical_cast, it is the same -- it imposes the requirement of op>>, op<<, the def. cnstr. However, instead of rejecting non-conformant classes, it leaves the door open and accommodates those via specialization and at least as efficiently. Boost.Serialization? Same. In fact, they *do* "go out of their way to work" with as many types as possible. I think I can talk about Boost.Serialization with a little bit of confidence (as I've been using it quite extensively). I know that the library tries so remarkably hard to keep everyone happy -- optimization? yes; no-default constructors? no problem; separate load/save logic? bring it on; intrusive/non-intrusive serialization? piece of cake... the list is long.
2. Scott, you correctly mention that most often we "don't want to send UUIDs by themselves". The thing is that chances of that bigger class being a POD are diminishing dramatically (if not already infinitely close to 0).
This is extremely off base, and points back to your lack of knowledge regarding MPI, I think.
Uhm, what exactly is extremely off-base here? And what does MPI have to do with it? The bigger a class, the smaller the chance it can conform to the limitations of POD. I am currently "serving time" in the railway industry and dealing with Trains, TrackCircuits, Signals, Stations, (damn long list). All use uuids and are used in inter-process inter-machine communications. I cannot imagine those classes to be PODs.
When writing an app/library/algorithm intended for use in a high-performance parallel context, one goes out of their way to use POD types extensively, for the sake of performance. Yes, the fact that MPI works with boost.serialization is nice, but when performance is critical, memcpy'able types are key;
First, I am under the impression that non-aggregate non-virtual objects are as memcopyable (with the usual caveats) as PODs are. Second, I feel boost.serialization still can be optimized for performance. See http://www.boost.org/doc/libs/1_37_0/doc/html/mpi/tutorial.html#mpi.serializ.... Plus binary archives (or your custom archives) can carry a very limited overhead. Still, I do not know much about MPI (Oops, I did it again! ;-)).
... I think to argue that a type such as UUID (which is a low-level, fundamental value type, and specifically *very* likely to be used in an inter-process context) should *not* automatically work in an ideal fashion in this scenario, one must have an *extremely* convincing argument, IMO. And so far, I haven't seen one presented. ;-)
As for the inter-process context: if it is on the same machine (in shared memory), then there is no exclusive PODness quality that allows objects to be stored/accessed in shared memory -- non-aggregate non-virtual objects are as good for that as PODs. If it is over the network, then I suspect we have many more things to worry about efficiency- and data consistency/integrity-wise: say, network latency, synchronization, node dropouts (a long list). As for "an *extremely* convincing argument", then I somehow haven't seen one either so that I'd say "indeed, non-aggregates cannot do that, POD is the king". But I might not know something you do (gosh, it's turning into some "disturbing" confession now ;-)) but that's OK, right?
3. As for deployment of an object in shared memory, it does not have to be a POD either.
Please take another look at the specific link Scott provided ([4]); boost::interprocess::message_queue only copies raw bytes between processes, so for non-POD types generally that requires that an object be binary serialized before sending. However, for a POD type, binary serialization is a completely redundant process (read: a complete waste of CPU cycles); one can just send the bytes of the object directly, and as an added bonus, avoid becoming dependent on the somewhat heavy serialization library altogether.
Yes, I hear you. I just do not know how big a deal that is. I can only argue this point with any conviction after I try optimized binary serialization vs. memcopy. If you have tried, then I'd love to hear about it. If you have not, then I am still unsure of the *real* tangible benefits of PODness.
Again, the fact that this might be possible even if UUID were not a POD type is somewhat irrelevant,
I disagree. It is relevant to me and surely many others working on higher abstraction levels. POD comes with conditions. I need to know if I want to pay that price. Therefore, I never buy into theoretical efficiency debates -- I write stuff, I profile the stuff, I fix the actual (not imagined) bottlenecks.
... I want to touch on a few other points as well, were UUID to be a POD type:
1. The default constructor behavior/existence debate would be put to rest. ;-)
Well, at the expense of an initially invalid state? I think I'd rather agree to the nil-behavior of uuid. Again, "it is important and fundamental to have constructors acquire resources and establish a simple invariant" (Stroustrup E.3.5).
2. The efficiency of lexical_cast would be better than *any* default constructor behavior, regardless of which one was ultimately decided upon.
I think you are referring to the non-initialized instance in the default lexical_cast<uuid>(string). It might or might not be correct though -- writing to and reading from those streams might have a real impact compared with initialization or no initialization. I have not profiled that though.
... 4. Initializing a nil UUID would become more succinct. Contrast 'uuid id(uuid::nil());' and 'uuid id = {};', or 'id(uuid::nil())' and 'id()' in a constructor initialization list. Assuming any level of familiarity with aggregates, the latter are much more concise, IMO. (And C++0x will certainly introduce that familiarity if one doesn't have it already.)
Here comes Vladimir disagreeing again (and not because he is not familiar with or afraid of aggregates). It is because I feel that "uuid id = {0};" exposes too much implementation detail and assumes the user knows that the invalid uuid is all zeros. If, say, tomorrow the Standard changes the value of nil, all my code becomes invalid. It might not be the case with uuid. However, it is the principle/coding habit I am talking about.
5. Static initialization has been greatly underrated so far in this discussion. My first use case for a Boost UUID library would be to replace some homegrown COM/XPCOM encapsulation code. In dealing with COM/XPCOM, it is *extremely* common to have hardcoded UUIDs, and *many* of them. Trivial work though it may be, spending application/library startup time initializing hundreds/thousands of UUIDs when they could be statically initialized is senseless.
I believe you'll be able to do that if we do

    class uuid
    {
        template<class Range> uuid(Range range);
    };

Then you'll be able to feed your hard-coded initialization data to uuid.
6. Regarding the potential for uninitialized state: I personally view UUID as a fundamental, borderline primitive type (others will almost certainly disagree); uninitialized state is generally understood and accepted for actual primitive types, so why should it be such a scary concept for UUID?
It's certainly not scary. It's just not in the C++ spirit (see the quotes at the top of the email), and everyone knows what primitive types are. I do not think people expect other types to behave that way.
7. Lastly, to reiterate: this is C++. Every type, every library, every algorithm should be written with performance and efficiency as primary considerations.
I do not think C++ was designed "with performance and efficiency as primary considerations". And I do not think applications "should be written with performance and efficiency as primary considerations". Don't get up in arms -- those considerations are important. I object to the "primary" part. I do not think I even need to debate this -- Knuth, Stroustrup and many others have done that.
... There are demonstrable use cases where UUID can work more efficiently as a POD type,
Call me thick but I did not see those convincing use-cases showing PODs considerably more efficient than non-aggregates. Easier? Yes. *Seemingly* more efficient? Yes. How much more efficient? I dunno if that is palpably real.
but no convincing arguments have been presented in favor of non-PODness.
Oh, c'mon. How 'bout reading "The C++ Progr. Lang." and the "Evolution of C++" books? Discussions there do not revolve around aggregates.

Best, V.
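As a concrete reference point for the message_queue part of this exchange, here is a minimal sketch using Boost.Interprocess's documented message_queue interface and a hypothetical POD layout (error handling omitted):

    #include <cstddef>
    #include <boost/interprocess/ipc/message_queue.hpp>

    // Hypothetical POD layout with no member pointers.
    struct pod_uuid
    {
        unsigned char data[16];
    };

    int main()
    {
        using namespace boost::interprocess;

        message_queue::remove("uuid_queue");
        message_queue queue(create_only, "uuid_queue",
                            64,                 // max number of messages
                            sizeof(pod_uuid));  // max message size

        // The queue copies raw bytes, so a POD can be sent as-is --
        // no serialization pass on either side.
        pod_uuid id = {};
        queue.send(&id, sizeof id, 0 /* priority */);

        pod_uuid received;
        std::size_t received_size = 0;
        unsigned int priority = 0;
        queue.receive(&received, sizeof received, received_size, priority);

        message_queue::remove("uuid_queue");
        return 0;
    }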

On Thu, Dec 25, 2008 at 23:08, Vladimir Batov <batov@people.net.au> wrote:
That's quite an emotionally charged list you compiled. It did not have to be such. I certainly do not find aggregates confusing or anything of that sort. In fact, I was very happy with them for about 10 years while coding in C before I switched to C++ in the early '90s. PODs have come from C and, therefore, they *are* a legacy of C and are not called Plain Old Data for nothing. In C++, though, I do find aggregates limiting. With regard to uuid it'd mean no user-provided constructors, no guaranteed invariant, no private or protected non-static data members. And that is fundamental (my view of course) to C++ -- "it is important and fundamental to have constructors acquire resources and establish a simple invariant" (Stroustrup E.3.5). Then, "One of the most important aims of a design is to provide interfaces that can remain stable in the face of changes" (Stroustrup 23.4.3.5). PODs do restrict interfaces and are wide open implementation-wise. That opens the door for mis-use and complicates long-term maintainability. So, unless PODs provide some killer feature in return (that cannot be achieved otherwise), I do not see the point of paying that price.
UUIDs require no resource acquisition in constructors. UUIDs do not maintain any invariant on their content. UUIDs have a standard binary format.

As for "mis-use", you have already confessed to such things as vtable-pointer twiddling, which is an excellent illustration that making something drastically non-POD doesn't prevent mis-use. Maintainability depends far more on sane choices by programmers than on draconian attempts by library writers. If a coder still wants to write an algorithm that breaks if shared_ptr<T>'s "unspecified-bool-type" changes (for example), that's not the fault of the library writer. As another example, std::tr1::array has its internal storage marked "Exposition only"; it seems like the same technique ought to be enough here. So what "price" is left?

And I think you're reading far too much into the name with "are not called Plain Old Data for nothing". Just like templates weren't introduced for Template Meta-Programming, PODs have had surprising applications. In Boost.Phoenix (to become Lambda v3, iirc), for example, _1 is a POD -- by deliberate design choice -- despite the fact that a lambda placeholder is anything but "Plain Old Data".
I do not understand "PODness to absolutely maximize the efficiency of copying" as I believe
class NonAggregateFoo { ... int int_; };
is copied as efficiently as a raw 'int'. And NonAggregateFoo bunch[] can be memcopied as well as PODFoo bunch[] (I am not advocating that but simply stating the fact).
Your "fact" is wrong. 3.9/3: "For any POD type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the value of obj1 is copied into obj2, using the memcpy library function, obj2 shall subsequently hold the same value as obj1." That applies to PODs only, and the other representation guarantees in [basic.types] also apply only to PODs. So using a "memcopied" NonAggregateFoo bunch[] invokes undefined behaviour, preventing its use in a standards-conforming library.

It's because of these rules that types can be used directly from, for example, Boost.Interprocess shared memory, and they only apply to PODs. No matter how efficient the serialization becomes, it will always have overhead over no serialization.
And I do not expect the respective
template<class Archive> void serialize(Archive& ar, unsigned int) { ar & int_; }
to be that slow (with appropriately chosen Archive). Again, here you might well know more than I do. Tell me then.
I don't know either; I can just quote the docs: "Some simple classes could be serialized just by directly copying all bits of the class. This is, in particular, the case for POD data types containing no pointer members, and which are neither versioned nor tracked. Some archives, such as non-portable binary archives, can make use of this information to substantially speed up serialization."[1]

Though this does make me wonder whether serialization should have a concept of portably bitwise-serializable classes. Relatedly, the v13 uuid_serialize.hpp header is mostly commented out...

[1] http://www.boost.org/doc/libs/1_37_0/libs/serialization/doc/traits.html#temp...
This is extremely off base, and points back to your lack of knowledge regarding MPI, I think.
Uhm, what exactly is extremely off-base here? And what does MPI have to do with it? The bigger a class, the smaller the chance it can conform to the limitations of POD. I am currently "serving time" in the railway industry and dealing with Trains, TrackCircuits, Signals, Stations, (damn long list). All use uuids and are used in inter-process inter-machine communications. I cannot imagine those classes to be PODs.
MPI's history is (as far as I can tell) high-performance Fortran on supercomputers. Something written specifically for such a machine will go out of its way to use the most efficient implementation possible, even if it sacrifices a small amount of expressiveness. Boost.MPI was explicitly written to match the speed of the underlying MPI library, using the same guideline. A matrix is a perfect example of a POD that can become VERY large, and is exactly the kind of thing that such programs will often deal with.
First, I am under the impression that non-aggregate non-virtual objects are as memcopyable (with the usual caveats) as PODs are. Second, I feel boost.serialization still can be optimized for performance. See http://www.boost.org/doc/libs/1_37_0/doc/html/mpi/tutorial.html#mpi.serializ.... Plus binary archives (or your custom archives) can carry a very limited overhead. Still, I do not know much about MPI (Oops, I did it again! ;-)).
3.9 says otherwise, as mentioned.
As for "an *extremely* convincing argument", then I somehow haven't seen one either so that I'd say "indeed, non-aggregates cannot do that, POD is the king". But I might not know something you do (gosh, it's turning into some "disturbing" confession now ;-)) but that's OK, right?
No, I'd say that debating PODness without really knowing what they do isn't OK.
Again, the fact that this might be possible even if UUID were not a POD type is somewhat irrelevant,
I disagree. It is relevant to me and surely many others working on higher abstraction levels. POD comes with conditions. I need to know if I want to pay that price. Therefore, I never buy into theoretical efficiency debates -- I write stuff, I profile the stuff, I fix the actual (not imagined) bottlenecks.
Invariants can be non-intrusively added afterwards (trivially, in this case); it's impossible to non-intrusively remove them. Libraries should thus err on the side of efficiency, in my opinion.
Well, at the expense of an initially invalid state? I think I'd rather agree to the nil-behavior of uuid. Again, "it is important and fundamental to have constructors acquire resources and establish a simple invariant" (Stroustrup E.3.5).
I'm not convinced that uninitialized definitions are such a hardship. A quotation from an exception safety appendix is of questionable value, when a POD UUID trivially offers no-throw exception safety, as (to reiterate) UUIDs require no acquisition of resources nor any invariants on their contents. (As any possible bit pattern can be read in from a correctly-formatted user input, there is no invariant. Composite classes may hold their own invariants on the uuids, just like they do with the fundamental value types.)
Here comes Vladimir disagreeing again (and not because he is not familiar with or afraid of aggregates). It is because I feel that "uuid id = {0};" exposes too much implementation detail and assumes the user knows that the invalid uuid is all zeros. If, say, tomorrow the Standard changes the value of nil, all my code becomes invalid. It might not be the case with uuid. However, it is the principle/coding habit I am talking about.
It's not "the principle/coding habit" that's up for discussion, though. (So this pings my http://en.wikipedia.org/wiki/Fallacy_of_division meter.)

As for implementation detail, I quote the standard: "In the absence of explicit application or presentation protocol specification to the contrary, a UUID is encoded as a 128-bit object, as follows: The fields are encoded as 16 octets, [...]" (RFC 4122, 4.1.2). Any other implementation is surprising. The Nil UUID is defined as the one where all 16 octets are 0, which is exactly what {} and {0} say. The option to be explicit (uuid id = uuids::nil();) is always there.

If the standard were to change the value of nil, all your data would also become "invalid" (since you might have the hypothetical new Nil UUID in there somewhere), so you're in trouble anyway. I'd posit that any change to the standard as gratuitously breaking as that would just lead to the updated standard being ignored by everyone, though.

Any claim based on programmers not knowing what they're doing is void, as that assertion inevitably leads to the statement that they shouldn't be programming C++ (or any Turing-complete language) at all, which precludes any argument that assumes they are programming.
5. Static initialization has been greatly underrated so far in this discussion. My first use case for a Boost UUID library would be to replace some homegrown COM/XPCOM encapsulation code. In dealing with COM/XPCOM, it is *extremely* common to have hardcoded UUIDs, and *many* of them. Trivial work though it may be, spending application/library startup time initializing hundreds/thousands of UUIDs when they could be statically initialized is senseless.
I believe you'll be able to do that if we do
class uuid { template<class Range> uuid(Range range); };
Then you'll be able to feed your hard-coded initialization data to uuid.
Which will then require code to run to initialize the UUID, and twice the storage, since it's being kept twice. I'd have no interest in requiring RAM for UUIDs in my embedded system when I could have put a POD uuid into ROM.
I do not think C++ was designed "with performance and efficiency as primary considerations". And I do not think applications "should be written with performance and efficiency as primary considerations". Don't get up in arms -- those considerations are important. I object to the "primary" part. I do not think I even need to debate this -- Knuth, Stroustrup and many others have done that.
I've read Stroustrup's book about what goals were in mind when C++ was designed (and how those goals played out), and my recollection is the complete opposite of what you are claiming. C++ is designed so that, as much as possible, you "don't pay for what you don't use" and to be "as fast as C" so that it could "raise the level of abstraction" in "systems programming" (quotations paraphrased from memory and from the extended preface at [2]). Those sound exactly like "with performance and efficiency as primary considerations" to me. I don't have D&E at hand, though; Does someone have the exact list of goals? [2] http://www.research.att.com/~bs/dne.html
Call me thick but I did not see those convincing use-cases showing PODs considerably more efficient than non-aggregates. Easier? Yes. *Seemingly* more efficient? Yes. How much more efficient? I dunno if that is palpably real.
16 bytes of RAM saved by aggregate-initialization matters in, say, a distributed sensor network where you get a whole 2KB of RAM per chip, and not doing the copy means less time out of sleep mode, improving battery life. I'll claim that the line dividing "premature optimization" from "premature pessimization" is in a different spot in library development, compared to application development.
but no convincing arguments have been presented in favor of non-PODness.
Oh, c'mon. How 'bout reading "The C++ Progr. Lang." and the "Evolution of C++" books? Discussions there do not revolve around aggregates.
To me, that sounds like, "the POD design is not 'modern' enough". ~ Scott

class NonAggregateFoo { ... int int_; };
is copied as efficiently as a raw 'int'. And NonAggregateFoo bunch[] can be memcopied as well as PODFoo bunch[] (I am not advocating that but simply stating the fact).
Your "fact" is wrong.
3.9/3: "For any POD type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the value of obj1 is copied into obj2, using the memcpy library function, obj2 shall subsequently hold the same value as obj1."
That applies to PODs only, and the other representation guarantees in [basic.types] also apply only to PODs.
So using a "memcopied" NonAggregateFoo bunch[] invokes undefined behaviour, preventing its use in a standards-conforming library.
Actually my version of the standard draft (dated 2008-10-04) reads a bit differently:

"3.9/3. For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the value of obj1 is copied into obj2, using the std::memcpy library function, obj2 shall subsequently hold the same value as obj1."

My statements above were with regard to Andy's non-POD uuid (although I am guilty of making my statement sound too generic). Andy's class seems to be "trivially copyable" (see section 9/6 for the definition). If so, then my "fact" is right and efficient copying of such uuids is covered by the Standard -- well, a draft at the moment.

As for the rest, if the consensus is to go with POD, I am fine with it.

Best, V.

In fact, all that POD/non-POD "fight" seems to be a storm in a teacup. The Standard (draft dated 2008-10-04) revised and refined definitions of POD. Now POD is a "trivial class/type" (section 9). I feel though that all the perks Adam and Scott have been mentioning are not restricted to the trivial types (PODs) but extend to "trivially copyable types" as well, which Andy's uuid seems to be the perfect example of.

Best, V.
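Vladimir's distinction can be checked mechanically; a minimal sketch using a C++11-era <type_traits> (the two classes are hypothetical stand-ins, not Andy's code):

    #include <type_traits>

    struct pod_uuid                        // aggregate: a POD / trivial type
    {
        unsigned char data[16];
    };

    struct wrapped_uuid                    // user-provided default constructor
    {
        wrapped_uuid() : data() {}         // nil by default
        unsigned char data[16];
    };

    static_assert(std::is_pod<pod_uuid>::value,
                  "no constructors, no invariant: a POD");
    static_assert(!std::is_pod<wrapped_uuid>::value &&
                  std::is_trivially_copyable<wrapped_uuid>::value,
                  "not a POD, yet still memcpy-able under the trivially-copyable rules");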

On Fri, Dec 26, 2008 at 06:00, Vladimir Batov <batov@people.net.au> wrote:
In fact, all that POD/non-POD "fight" seems to be a storm in a teacup. The Standard (draft dated 2008-10-04) revised and refined definitions of POD. Now POD is a "trivial class/type" (section 9). I feel though that all the perks Adam and Scott have been mentioning are not restricted to the trivial types (PODs) but extend to "trivially copyable types" as well, which Andy's uuid seems to be the perfect example of.
Fair enough.

    #ifndef BOOST_NO_CPP0X
        constexpr uuid() : data_() {}
    #endif

That said, std::array in the draft is still an aggregate, not a "trivial class/type", which makes me think there's still a reason to use aggregates. Would construction from an initializer list be sufficient to meet the requirements for constant initialization in [basic.start.init]?
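A rough sketch of what that constexpr constructor buys, written in today's C++11 terms rather than the C++0x draft the thread refers to (names hypothetical; whether an initializer list alone also satisfies [basic.start.init] is exactly the open question above):

    #include <type_traits>

    struct uuid
    {
        unsigned char data_[16];
        constexpr uuid() : data_() {}   // value-initializes data_ to all zeros (nil)
    };

    // Constant-initialized: the zero bytes can live in the image and no startup
    // code runs, even though the user-provided constructor means uuid is no
    // longer a POD under the C++03 definition.
    constexpr uuid nil_id{};

    static_assert(std::is_trivially_copyable<uuid>::value,
                  "still bitwise-copyable despite the constexpr constructor");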

Scott,

I feel the discussion highlighted the fact that our design priorities and usage patterns seem quite irreconcilable within one class. More so, the discussion seems to have introduced new requirements -- like old-standard-compliant low-level copying, the ability to create an uninitialized uuid instance, and efficient and economic initialization from hard-coded uuids (mentioned by Adam) -- that might not have been addressed in Andy's implementation yet.

Therefore, I feel that your and Dave's suggestion of "a low-level representation class as well as a higher-level wrapper with stronger invariants" might be the best approach to cater for those widely differing deployments. At present it'll probably be Andy's call if/how he decides to incorporate your suggestions and whether to package it all as two classes or to discard one or the other. Ideally, I'd probably like to see two classes -- something like boost::aggregate_uuid and a boost::uuid wrapper -- as I suspect that wrapper to be the most common deployment choice. If the decision is made to go with only one aggregate-type uuid, then I'd like us to make sure the documentation spells out how to "objectify" (what'd be a better term?) that aggregate type.

Best, V.

P.S. As a side note, I am glad we had the discussion and I appreciate you kept it going -- it forced me to have a fresh look at aggregates from a completely different perspective. Due to the specifics of my work I am unlikely to deploy any of that in the short term. However, it surely enriched me. Thank you.
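One possible shape of that two-class split, purely as a sketch -- the names aggregate_uuid and uuid follow Vladimir's suggestion and nothing here is Andy's actual interface:

    // Low-level representation: an aggregate, statically initializable,
    // bitwise-copyable, suitable for MPI datatypes and shared memory.
    struct aggregate_uuid
    {
        unsigned char data[16];
    };

    // Higher-level wrapper with a stronger invariant: never uninitialized.
    class uuid
    {
    public:
        uuid() : repr_() {}                                   // nil by default
        explicit uuid(const aggregate_uuid& r) : repr_(r) {}  // adopt raw bytes

        const aggregate_uuid& repr() const { return repr_; }  // hand back the
                                                              // POD for low-level use
    private:
        aggregate_uuid repr_;
    };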

Vladimir Batov wrote:
In fact, all that POD/non-POD "fight" seems to be a storm in a teacup. The Standard (draft dated 2008-10-04) revised and refined definitions of POD. Now POD is a "trivial class/type" (section 9).
Yes, but as of today I think we still need to consider C++03 our target standard. Nobody even has a C++0x compiler yet.

-- David Abrahams
BoostPro Computing
http://www.boostpro.com

on Thu Dec 25 2008, Adam Merz <adammerz-AT-hotmail.com> wrote:
Vladimir Batov writes:
1. Boost.MPI efficiency does not seem to rely on PODness. Rather it seems to be due to serialization (or rather the ability to bypass it).
This isn't technically correct, I think; in MPI's case (though not Interprocess'), the type must be serializable regardless, but the ideal efficiency scenario comes from specializing both boost::mpi::is_mpi_datatype and boost::serialization::is_bitwise_serializable. Note that the documentation for these traits ([1] and [2], respectively) both specifically mention POD types -- this is no coincidence.
[1] http://www.boost.org/doc/libs/1_37_0/doc/html/boost/mpi/is_mpi_datatype.html [2] http://www.boost.org/doc/libs/1_37_0/libs/serialization/doc/traits.html
Vladimir is technically correct in fact. It just so happens that PODs are *guaranteed* to satisfy both the necessary qualifications. In practice, many non-PODs will also do that, but of course that's not guaranteed to be true, portably, for most non-POD types.

-- Dave Abrahams
BoostPro Computing
http://www.boostpro.com
participants (4)

- Adam Merz
- David Abrahams
- Scott McMurray
- Vladimir Batov