
Adam,

Wow, that was one passionate reply. Was it something that I said? ;-)
While you may not see the "magic" in POD types, I can't fathom what exactly you have against them either.
Well, I most certainly do not have anything against anything. It's nothing personal. If my emails came across that way, my humble apologies.
Do you find them more confusing or harder to use? Do you find static initialization syntax aesthetically offensive? Is it your (no offense, but extremely misguided, IMO) lingering impression that POD types are a legacy of C that should be ignored whenever possible? A list of examples in favor of making UUID a POD type was presented, and you've argued against those examples without actually saying what you think the drawback is.
That's quite an emotionally charged list you compiled. It did not have to be. I certainly do not find aggregates confusing or anything of that sort. In fact, I was very happy with them for about 10 years while coding in C, before I switched to C++ in the early '90s. PODs came from C and, therefore, they *are* a legacy of C; they are not called Plain Old Data for nothing. In C++, though, I do find aggregates limiting. For uuid, being an aggregate would mean no user-provided constructors, no guaranteed invariant, and no private or protected non-static data members. And those are fundamental (my view, of course) to C++ -- "it is important and fundamental to have constructors acquire resources and establish a simple invariant" (Stroustrup E.3.5). Then, "One of the most important aims of a design is to provide interfaces that can remain stable in the face of changes" (Stroustrup 23.4.3.5). PODs do restrict interfaces and are wide open implementation-wise. That opens the door to misuse and complicates long-term maintainability. So, unless PODs provide some killer feature in return (one that cannot be achieved otherwise), I do not see the point of paying that price.
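To make the trade-off concrete, here is a minimal C++17 sketch contrasting the two designs. The names 'pod_uuid' and 'uuid' and their members are hypothetical illustrations, not the Boost.Uuid interface under discussion. The aggregate lets every client read and overwrite its bytes; the class keeps its representation private, and its constructors decide what state a new object starts in (here, default construction always yields nil).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical aggregate (POD) uuid: public data and no user-provided
// constructors, so any client can read or overwrite the raw bytes.
struct pod_uuid {
    unsigned char data[16];
};

// Hypothetical encapsulated uuid: the representation is private, so it can
// change without breaking clients, and the constructors control the initial state.
class uuid {
public:
    uuid() : data_{} {}  // value-initializes all 16 bytes to zero (nil)

    explicit uuid(const std::array<unsigned char, 16>& bytes) {
        for (std::size_t i = 0; i < bytes.size(); ++i) data_[i] = bytes[i];
    }

    bool is_nil() const {
        for (unsigned char b : data_)
            if (b != 0) return false;
        return true;
    }

private:
    unsigned char data_[16];
};

// The language itself reports the aggregate/non-aggregate split.
static_assert(std::is_aggregate_v<pod_uuid>, "pod_uuid is an aggregate");
static_assert(!std::is_aggregate_v<uuid>, "uuid is not an aggregate");

inline void usage_example() {
    pod_uuid raw = {};   // value-initialized: all zeros
    raw.data[0] = 0xFF;  // nothing stops a client from poking at the bytes
    uuid id;             // default construction always yields nil
    assert(id.is_nil() && raw.data[0] == 0xFF);
}
```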
That's what *I* see (caveat: I admit not knowing much about Boost.MPI and Boost.Interprocess requirements and expectations).
Again, no offense intended, but I find it a bit discomfiting that the person arguing most vocally on this issue would make this admission. Just because you don't have personal knowledge of a use case where UUID being a POD type would be greatly beneficial doesn't mean such a use case doesn't exist.
First, you are right about "most vocally". I too have had a growing concern that there has been somewhat too much of me on the list lately. Apologies. In my defence, I might say I do not usually do that. My weak point is that once I get onto something, I tend to follow it through to completion (well, some might consider that a good thing). Point taken, though; I'll answer your email (hopefully to your satisfaction) and then tone it down. Secondly, I personally do not see anything wrong with the admission -- I use some libs extensively, some occasionally and some not at all. I suspect that is quite typical. Stating the limits of one's knowledge, IMO, clears up a lot of possible and unnecessary confusion (and many other emotions). Thirdly, I am not sure I said "such a use case doesn't exist", did I? If I did, I probably did not mean that. :-) What I am questioning, though, is the "greatly beneficial" part. I am glad to see that part is already obvious to you. I hope it's not just a hunch and that you have hard data to back it up.
1. Boost.MPI efficiency does not seem to rely on PODness. Rather, it seems to be due to serialization (or rather, the ability to bypass it).
This isn't technically correct, I think; in MPI's case (though not Interprocess'), the type must be serializable regardless, but the ideal efficiency scenario comes from specializing both boost::mpi::is_mpi_datatype and boost::serialization::is_bitwise_serializable. Note that the documentation for these traits ([1] and [2], respectively) specifically mentions POD types -- this is no coincidence.
[1] http://www.boost.org/doc/libs/1_37_0/doc/html/boost/mpi/is_mpi_datatype.html
[2] http://www.boost.org/doc/libs/1_37_0/libs/serialization/doc/traits.html
Yes, my wording was somewhat crude. I presume you have a lot of practical experience with MPI and can say with authority that PODness is a must for MPI's efficiency. Would you mind sharing some experimental data you have observed? My knowledge of MPI comes from reading the docs (I probably should stop making these discomfiting admissions). From them I got the impression that serializable non-aggregate classes could be made efficient too.
... I think you're missing the larger point. In modern C++, types are often (not always) intentionally created as POD types to absolutely maximize the efficiency of copying them.
I do not understand "PODness to absolutely maximize the efficiency of copying", as I believe class NonAggregateFoo { ... int int_; }; is copied as efficiently as a raw 'int'. And NonAggregateFoo bunch[] can be memcpy'd just as well as PODFoo bunch[] (I am not advocating that, simply stating the fact). And I do not expect the respective template<class Archive> void serialize(Archive& ar, const unsigned int) { ar & int_; } to be that slow (with an appropriately chosen Archive). Again, here you might well know more than I do. If so, tell me.
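A minimal C++17 sketch of that claim. The text elides the rest of NonAggregateFoo, so the constructors and the value() accessor below are assumptions. What permits memcpy is trivial copyability, not aggregate-ness, and the compiler can check it. The serialize member follows the Boost.Serialization intrusive signature (archive taken by reference); it is only instantiated when used with an archive, so the sketch compiles without Boost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Non-aggregate: user-provided constructors and a private data member,
// but no virtual functions and no user-provided copy operations or destructor.
class NonAggregateFoo {
public:
    NonAggregateFoo() : int_(0) {}
    explicit NonAggregateFoo(int value) : int_(value) {}
    int value() const { return int_; }

    // Boost.Serialization-style intrusive hook (archive by reference).
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) { ar & int_; }

private:
    int int_;
};

static_assert(!std::is_aggregate_v<NonAggregateFoo>, "not an aggregate...");
static_assert(std::is_trivially_copyable_v<NonAggregateFoo>, "...yet memcpy-able");

// Copies a whole array with one memcpy, exactly as one would for PODFoo[].
template <std::size_t N>
void copy_bunch(const NonAggregateFoo (&src)[N], NonAggregateFoo (&dst)[N]) {
    std::memcpy(dst, src, sizeof src);
}

inline void usage_example() {
    NonAggregateFoo src[3] = {NonAggregateFoo(1), NonAggregateFoo(2), NonAggregateFoo(3)};
    NonAggregateFoo dst[3];
    copy_bunch(src, dst);
    assert(dst[0].value() == 1 && dst[2].value() == 3);
}
```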
The existence of the is_pod type trait in boost.type_traits/TR1/C++0x reinforces this -- e.g. in many implementations, std::copy will use memcpy to perform an ideally efficient copy when is_fundamental<T>::value || is_pod<T>::value. Additionally, a POD type's synthesized copy constructor is generally merely a memcpy.
Understood. It does not make copying of non-aggregates inefficient, though. Not automatic? Yes. Inefficient? No.
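The kind of dispatch described above can be sketched in C++17 as follows. copy_range is a hypothetical helper, not std::copy itself (real library implementations differ in detail). It keys on std::is_trivially_copyable rather than on PODness, so a non-aggregate such as the NonAggregateFoo described above (private int, user-provided constructor, no virtual functions) would take the memcpy path too. For what it's worth, C++20 deprecates std::is_pod in favor of the finer-grained traits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical helper: one memcpy for trivially copyable element types,
// element-wise copy assignment for everything else.
template <class T>
T* copy_range(const T* first, const T* last, T* out) {
    const auto n = static_cast<std::size_t>(last - first);
    if constexpr (std::is_trivially_copyable_v<T>) {
        if (n != 0) std::memcpy(out, first, n * sizeof(T));
    } else {
        for (std::size_t i = 0; i < n; ++i) out[i] = first[i];
    }
    return out + n;
}

inline void usage_example() {
    int a[3] = {1, 2, 3};
    int b[3] = {};
    assert(copy_range(a, a + 3, b) == b + 3 && b[2] == 3);  // memcpy branch

    std::string s[2] = {"uuid", "pod"};
    std::string t[2];
    copy_range(s, s + 2, t);                                 // loop branch
    assert(t[0] == "uuid" && t[1] == "pod");
}
```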
...
If it is an MPI implementation-specific restriction/limitation, I'd expect we'd look at addressing it in MPI rather than shaping other classes to match it.
This is an unreasonable thought process, IMO. If a type has a good use case with another library (in this case, UUID with Serialization/MPI/Interprocess), it's up to the type to conform to the library in an ideal fashion, not the other way around. E.g., lexical_cast and serialization don't go out of their way to work with every other type in Boost, but many types in Boost have serialization and lexical_cast support.
Well, again, my initial wording was somewhat crude. I still stand by its meaning, though. A general-purpose library should be accommodating/considerate rather than imposing. And from what I read about MPI, that's the approach taken there. As for lexical_cast, it is the same -- it imposes the requirements of op>>, op<< and a default constructor. However, instead of rejecting non-conformant classes, it leaves the door open and accommodates them via specialization, at least as efficiently. Boost.Serialization? Same. In fact, they *do* "go out of their way to work" with as many types as possible. I think I can talk about Boost.Serialization with a little bit of confidence (as I've been using it quite extensively). I know that the library tries remarkably hard to keep everyone happy -- optimization? yes; no default constructor? no problem; separate load/save logic? bring it on; intrusive/non-intrusive serialization? piece of cake... the list is long.
2. Scott, you correctly mention that most often we "don't want to send UUIDs by themselves". The thing is that the chances of that bigger class being a POD diminish dramatically (if they are not already infinitely close to 0).
This is extremely off base, and points back to your lack of knowledge regarding MPI, I think.
Uhm, what exactly is extremely off base here? And what does MPI have to do with it? The bigger a class, the smaller the chance it can conform to the limitations of POD. I am currently "serving time" in the railway industry, dealing with Trains, TrackCircuits, Signals, Stations (a damn long list). All use uuids and are used in inter-process, inter-machine communications. I cannot imagine those classes being PODs.
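A quick illustration of why the enclosing class matters, using a hypothetical Train with made-up members (C++17): even if its uuid member were a POD, a single std::string or std::vector member makes Train non-trivially copyable, so it has to be copied (or serialized) member by member anyway.

```cpp
#include <string>
#include <type_traits>
#include <vector>

struct pod_uuid {  // hypothetical POD uuid
    unsigned char data[16];
};

// Hypothetical domain class. Its uuid member is a POD, but std::string and
// std::vector own heap memory, so copying a Train must run their copy
// constructors; copying its bytes instead is undefined behavior.
struct Train {
    pod_uuid id;
    std::string name;
    std::vector<pod_uuid> car_ids;  // ids of the attached cars
};

static_assert(std::is_trivially_copyable_v<pod_uuid>, "the member is memcpy-able");
static_assert(!std::is_trivially_copyable_v<Train>, "the enclosing class is not");
```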
When writing an app/library/algorithm intended for use in a high-performance parallel context, one goes out of their way to use POD types extensively, for the sake of performance. Yes, the fact that MPI works with boost.serialization is nice, but when performance is critical, memcpy'able types are key;
First, I am under the impression that non-aggregate non-virtual objects are as memcpy-able (with the usual caveats) as PODs are. Second, I feel boost.serialization can still be optimized for performance. See http://www.boost.org/doc/libs/1_37_0/doc/html/mpi/tutorial.html#mpi.serializ.... Plus, binary archives (or your custom archives) can carry very limited overhead. Still, I do not know much about MPI (Oops, I did it again! ;-)).
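The "usual caveats" can be checked by the compiler. In this C++17 sketch (all class names hypothetical), private data and user-provided constructors leave a class trivially copyable, while a virtual function or a user-provided copy constructor takes that away, and with it the permission to copy objects with memcpy.

```cpp
#include <type_traits>

// Non-aggregate with private data and a user-provided constructor:
// still trivially copyable, so memcpy is fine.
class WithPrivateData {
public:
    explicit WithPrivateData(int v) : v_(v) {}
    int value() const { return v_; }
private:
    int v_;
};

// A virtual function makes the class polymorphic (typically adding a hidden
// vtable pointer), so its copy constructor is non-trivial.
class WithVirtual {
public:
    virtual int value() const { return v_; }
private:
    int v_ = 0;
};

// A user-provided copy constructor is non-trivial by definition.
class WithUserCopy {
public:
    WithUserCopy() = default;
    WithUserCopy(const WithUserCopy& other) : v_(other.v_) {}
    int value() const { return v_; }
private:
    int v_ = 0;
};

static_assert(std::is_trivially_copyable_v<WithPrivateData>, "memcpy-able");
static_assert(!std::is_trivially_copyable_v<WithVirtual>, "not memcpy-able");
static_assert(!std::is_trivially_copyable_v<WithUserCopy>, "not memcpy-able");
```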
... I think to argue that a type such as UUID (which is a low-level, fundamental value type, and specifically *very* likely to be used in an inter-process context) should *not* automatically work in an ideal fashion in this scenario, one must have an *extremely* convincing argument, IMO. And so far, I haven't seen one presented. ;-)
As for the inter-process context: if it is on the same machine (in shared memory), there is no exclusive PODness quality that allows objects to be stored/accessed in shared memory -- non-aggregate non-virtual objects are as good for that as PODs. If it is over the network, then I suspect we have many more things to worry about, efficiency- and data consistency/integrity-wise: network latency, synchronization, node dropouts (a long list). As for "an *extremely* convincing argument", I somehow haven't seen one either, not one that would make me say "indeed, non-aggregates cannot do that, POD is the king". But I might not know something you do (gosh, it's turning into some "disturbing" confession now ;-)), and that's OK, right?
3. As for deployment of an object in shared memory, it does not have to be a POD either.
Please take another look at the specific link Scott provided ([4]); boost::interprocess::message_queue only copies raw bytes between processes, so for non-POD types that generally requires the object to be binary serialized before sending. However, for a POD type, binary serialization is a completely redundant process (read: a complete waste of CPU cycles); one can just send the bytes of the object directly and, as an added bonus, avoid becoming dependent on the somewhat heavy serialization library altogether.
Yes, I hear you. I just do not know how big a deal that is. I can only argue this point with any conviction after I try optimized binary serialization vs. memcpy. If you have tried it, I'd love to hear the results. If you have not, then I am still unsure of the *real*, tangible benefits of PODness.
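What sending "the bytes of the object directly" amounts to can be sketched in C++17 with a std::vector<unsigned char> standing in for a raw-byte queue such as boost::interprocess::message_queue. send_raw and receive_raw are hypothetical helpers, not the Interprocess API. The static_assert spells out the actual precondition, trivial copyability, and the sketch assumes both ends share the same object layout (same machine, same compiler settings).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Stand-in for a raw-byte transport such as a message queue.
using byte_buffer = std::vector<unsigned char>;

// Hypothetical send: append the object's bytes unchanged (no serialization).
template <class T>
void send_raw(byte_buffer& channel, const T& value) {
    static_assert(std::is_trivially_copyable_v<T>, "raw transfer needs a trivially copyable type");
    const auto* bytes = reinterpret_cast<const unsigned char*>(&value);
    channel.insert(channel.end(), bytes, bytes + sizeof(T));
}

// Hypothetical receive: copy the leading bytes into a new object, then consume them.
template <class T>
T receive_raw(byte_buffer& channel) {
    static_assert(std::is_trivially_copyable_v<T>, "raw transfer needs a trivially copyable type");
    assert(channel.size() >= sizeof(T));
    T value{};
    std::memcpy(&value, channel.data(), sizeof(T));
    channel.erase(channel.begin(), channel.begin() + static_cast<std::ptrdiff_t>(sizeof(T)));
    return value;
}

struct pod_uuid {  // hypothetical POD uuid
    unsigned char data[16];
};

inline void usage_example() {
    byte_buffer channel;
    const pod_uuid sent = {{0x12, 0x34, 0x56}};
    send_raw(channel, sent);
    const pod_uuid received = receive_raw<pod_uuid>(channel);
    assert(std::memcmp(&sent, &received, sizeof sent) == 0 && channel.empty());
}
```

Whether skipping serialization this way matters in practice is exactly the profiling question raised above; the sketch only shows what the two steps involve.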
Again, the fact that this might be possible even if UUID were not a POD type is somewhat irrelevant,
I disagree. It is relevant to me and surely many others working on higher abstraction levels. POD comes with conditions. I need to know if I want to pay that price. Therefore, I never buy into theoretical efficiency debates -- I write stuff, I profile the stuff, I fix the actual (not imagined) bottlenecks.
... I want to touch on a few other points as well, were UUID to be a POD type:
1. The default constructor behavior/existence debate would be put to rest. ;-)
Well, at the expense of an initially invalid state, with no invariant established? I think I'd rather agree to the nil behavior of uuid. Again, "it is important and fundamental to have constructors acquire resources and establish a simple invariant" (Stroustrup E.3.5).
2. The efficiency of lexical_cast would be better than *any* default constructor behavior, regardless of which one was ultimately decided upon.
I think you are referring to the uninitialized instance in the default lexical_cast<uuid>(string). That might or might not be correct, though -- writing to and reading from those streams might matter far more than initialization or no initialization. I have not profiled that, though.
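For reference, a stream-based conversion boils down to something like the hypothetical simplified_lexical_cast below (a C++17 sketch, not Boost.LexicalCast's actual implementation, which is far more optimized and throws its own exception type). The target is default-constructed and then overwritten by operator>>, so any default-constructor cost sits next to the cost of constructing the stream and parsing; only profiling can say which dominates.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical, heavily simplified stand-in for lexical_cast<Target>(text).
template <class Target>
Target simplified_lexical_cast(const std::string& text) {
    std::istringstream stream(text);  // cost 1: build the stream
    Target result;                    // cost 2: default-construct the target
    // cost 3: parse; everything except trailing whitespace must be consumed
    if (!(stream >> result) || !(stream >> std::ws).eof())
        throw std::invalid_argument("bad lexical cast: " + text);
    return result;
}

inline void usage_example() {
    assert(simplified_lexical_cast<int>("42") == 42);
    bool threw = false;
    try {
        simplified_lexical_cast<int>("4x2");
    } catch (const std::invalid_argument&) {
        threw = true;
    }
    assert(threw);
}
```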
... 4. Initializing a nil UUID would become more succinct. Contrast 'uuid id(uuid::nil());' and 'uuid id = {};', or 'id(uuid::nil())' and 'id()' in a constructor initialization list. Assuming any level of familiarity with aggregates, the latter are much more concise, IMO. (And C++0x will certainly introduce that familiarity if one doesn't have it already.)
Here comes Vladimir disagreeing again (and not because he is unfamiliar with or afraid of aggregates). It is because I feel that "uuid id = {0};" exposes too much implementation detail and assumes the user knows that the nil uuid is all zeros. If, say, tomorrow the Standard changes the value of nil, all my code becomes invalid. That might never happen with uuid. However, it is the principle/coding habit I am talking about.
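A minimal C++17 sketch of the encapsulated alternative. 'uuid::nil()' is the name used in the discussion; is_nil(), the comparison operators and the track_circuit client class are hypothetical. Client code asks for nil by name, only the class knows that nil happens to be all zeros, and default construction yields nil, so 'id_()' in an initialization list stays as short as the aggregate form.

```cpp
#include <array>
#include <cassert>

// Hypothetical encapsulated uuid: clients name "nil" instead of spelling out
// its representation.
class uuid {
public:
    uuid() : bytes_{} {}  // the single definition of nil: all bytes zero
    explicit uuid(const std::array<unsigned char, 16>& bytes) : bytes_(bytes) {}

    static uuid nil() { return uuid(); }  // named access to the nil value
    bool is_nil() const { return bytes_ == nil().bytes_; }

    friend bool operator==(const uuid& a, const uuid& b) { return a.bytes_ == b.bytes_; }
    friend bool operator!=(const uuid& a, const uuid& b) { return !(a == b); }

private:
    std::array<unsigned char, 16> bytes_;
};

// Hypothetical client: 'id_()' in the initialization list gives nil.
class track_circuit {
public:
    track_circuit() : id_() {}
    const uuid& id() const { return id_; }
private:
    uuid id_;
};

inline void usage_example() {
    track_circuit tc;
    assert(tc.id().is_nil() && tc.id() == uuid::nil());
}
```

If nil's representation ever changed, only the default constructor would need editing; client code written against nil() and is_nil() would be unaffected.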
5. Static initialization has been greatly underrated so far in this discussion. My first use case for a Boost UUID library would be to replace some homegrown COM/XPCOM encapsulation code. In dealing with COM/XPCOM, it is *extremely* common to have hardcoded UUIDs, and *many* of them. Trivial work though it may be, spending application/library startup time initializing hundreds/thousands of UUIDs when they could be statically initialized is senseless.
I believe you'll be able to do that if we add a constructor like class uuid { template<class Range> uuid(Range range); }; Then you'll be able to feed your hard-coded initialization data to uuid.
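One possible shape for that constructor, as a C++17 sketch only. It takes the range by const reference rather than by value, and the exactly-16-elements check (throwing std::invalid_argument) is an assumption, not part of the suggestion. As written, a namespace-scope uuid built this way is filled in during dynamic initialization at startup. An aggregate with a braced initializer, or (from C++11 on) a class with a constexpr constructor, can be constant-initialized instead.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <stdexcept>

class uuid {
public:
    // Sketch of the suggested constructor: copies exactly 16 elements from
    // any range that std::begin/std::end accept (arrays, std::vector, ...).
    template <class Range>
    explicit uuid(const Range& range) {
        std::size_t i = 0;
        for (auto it = std::begin(range); it != std::end(range); ++it) {
            if (i == 16) throw std::invalid_argument("uuid: more than 16 elements");
            data_[i++] = static_cast<unsigned char>(*it);
        }
        if (i != 16) throw std::invalid_argument("uuid: fewer than 16 elements");
    }

    unsigned char operator[](std::size_t i) const { return data_[i]; }

private:
    unsigned char data_[16];
};

// Hard-coded identifier bytes (arbitrary example values). The array itself is
// constant-initialized; some_iid is filled in at startup (dynamic initialization).
const unsigned char some_iid_bytes[16] = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
                                          0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, 0x10};
const uuid some_iid(some_iid_bytes);

inline void usage_example() {
    assert(some_iid[0] == 0x01 && some_iid[15] == 0x10);
}
```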
6. Regarding the potential for uninitialized state: I personally view UUID as a fundamental, borderline primitive type (others will almost certainly disagree); uninitialized state is generally understood and accepted for actual primitive types, so why should it be such a scary concept for UUID?
It's certainly not scary. It's just not in the C++ spirit (see the quotes at the top of this email), and everyone knows how primitive types behave. I do not think people expect other types to behave that way.
7. Lastly, to reiterate: this is C++. Every type, every library, every algorithm should be written with performance and efficiency as primary considerations.
I do not think C++ was designed "with performance and efficiency as primary considerations". And I do not think applications "should be written with performance and efficiency as primary considerations". Don't get up in arms -- those considerations are important. I object to the "primary" part. I do not think I even need to debate this -- Knuth, Stroustrup and many others have done that.
... There are demonstrable use cases where UUID can work more efficiently as a POD type,
Call me thick, but I did not see convincing use cases showing PODs to be considerably more efficient than non-aggregates. Easier? Yes. *Seemingly* more efficient? Yes. How much more efficient? I dunno; I am not sure the difference is palpably real.
but no convincing arguments have been presented in favor of non-PODness.
Oh, c'mon. How 'bout reading "The C++ Programming Language" and "The Design and Evolution of C++"? Discussions there do not revolve around aggregates.

Best,
V.