Re: [boost] [UUID] PODness Revisited

25 Dec 2008

      Vladimir Batov writes:
...
> OK, I thought some more about those PODs and I admittedly do not see any
> magic in that Poor Old Dude. Scott, you are clearly excited about them, so
> pls help me out here.
While you may not see the "magic" in POD types, I can't fathom what exactly
you have against them either. Do you find them more confusing or harder to
use? Do you find static initialization syntax aesthetically offensive? Is it
your (no offense, but extremely misguided, IMO) lingering impression that POD
types are a legacy of C that should be ignored whenever possible? A list of
examples in favor of making UUID a POD type was presented, and you've argued
against those examples without actually saying what you think the drawback is.
...
> That's what *I* see (caveat: I admit not knowing much about Boost.MPI and
> Boost.Interprocess requirements and expectations).
Again, no offense intended, but I find it a bit discomfiting that the person
arguing most vocally on this issue would make this admission. Just because you
don't have personal knowledge of a use case where UUID being a POD type would
be greatly beneficial doesn't mean such a use case doesn't exist.
...
> 1. Boost.MPI efficiency does not seem to rely on PODness. Rather it seem to
> be due to serialization (or rather ability to bypass it).
This isn't technically correct, I think; in MPI's case (though not
Interprocess'), the type must be serializable regardless, but the ideal
efficiency scenario comes from specializing both boost::mpi::is_mpi_datatype
and boost::serialization::is_bitwise_serializable. Note that the documentation
for both traits ([1] and [2], respectively) specifically mentions POD types --
this is no coincidence.

[1] http://www.boost.org/doc/libs/1_37_0/doc/html/boost/mpi/is_mpi_datatype.html
[2] http://www.boost.org/doc/libs/1_37_0/libs/serialization/doc/traits.html
...
> I feel, that Andy's UUID class is well-suited to add no-serialization
> transmission functionality when needed (without the rest of us paying for
> that). That is, conceptually Andy's full-fledged UUID does not have that
> limitation.
While this is generally correct, I think you're missing the larger point. In
modern C++, a type is often (not always) deliberately made a POD type in order
to maximize the efficiency of copying it. The existence of the is_pod type
trait in boost.type_traits/TR1/C++0x reinforces this -- e.g. in many
implementations, std::copy will use memcpy to perform an ideally efficient
copy when is_fundamental<T>::value || is_pod<T>::value. Additionally, a POD
type's synthesized copy constructor is generally nothing more than a memcpy.
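
As a rough sketch of the kind of dispatch described above (not any particular
standard library's implementation), the following picks memcpy when a trait
says the type is bitwise-copyable. The names pod_uuid and copy_elements are
hypothetical, and the sketch tests std::is_trivially_copyable (C++11) instead
of is_pod, since that is the narrower property memcpy actually needs:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical 16-byte POD-style UUID, for illustration only;
// this is not the interface of the library under review.
struct pod_uuid {
    unsigned char data[16];
};

static_assert(std::is_trivially_copyable<pod_uuid>::value,
              "pod_uuid can be copied with memcpy");
static_assert(std::is_standard_layout<pod_uuid>::value,
              "pod_uuid has a C-compatible layout");

// Generic path: copy one element at a time via assignment.
template <typename T>
void copy_impl(const T* src, std::size_t n, T* dst, std::false_type) {
    for (std::size_t i = 0; i != n; ++i) {
        dst[i] = src[i];
    }
}

// Fast path: the trait guarantees a bitwise copy is valid.
template <typename T>
void copy_impl(const T* src, std::size_t n, T* dst, std::true_type) {
    if (n != 0) {
        std::memcpy(dst, src, n * sizeof(T));
    }
}

// Selects the fast path at compile time (source and destination
// ranges are assumed not to overlap).
template <typename T>
void copy_elements(const T* src, std::size_t n, T* dst) {
    copy_impl(src, n, dst, std::is_trivially_copyable<T>());
}
```

Calling copy_elements on an array of pod_uuid takes the memcpy overload; a
type with a user-provided copy constructor would fall back to the loop.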

In order to specialize is_bitwise_serializable for a given type, tracking [3]
must be disabled for that type. In practice this means one should avoid using
pointers to that type inside of other serializable types. UUID's "PODness"
would allow the type itself to communicate to users that it is truly a value
type, and is meant to be used as such (i.e., always passed/held by value); one
wouldn't typically pass or hold a pointer to an int because it's so cheap to
copy, so one (hopefully) wouldn't be inclined to do that for another type that,
for all intents and purposes, presents itself as a primitive type.

[3] http://www.boost.org/doc/libs/1_37_0/libs/serialization/doc/special.html#obj...
...
> If it is a MPI implementation-specific restriction/limitation, I'd expect
> we'd look at addressing it in MPI rather than shaping other classes to match
> it.
This is an unreasonable thought process, IMO. If a type has a good use case
with another library (in this case, UUID with Serialization/MPI/Interprocess),
it's up to the type to conform to the library in an ideal fashion, not the
other way around. E.g., lexical_cast and serialization don't go out of their
way to work with every other type in Boost, but many types in boost have
serialization and lexical_cast support.
...
> 2. Scott, you correctly mention that most often we "don't want to send UUIDs
> by themselves". The thing is that chances of that bigger class being a POD
> are diminishing dramatically (if not already infinitely close to 0).
This is extremely off base, and points back to your lack of knowledge
regarding MPI, I think. When writing an app/library/algorithm intended for use
in a high-performance parallel context, one goes out of their way to use POD
types extensively, for the sake of performance. Yes, the fact that MPI works
with boost.serialization is nice, but when performance is critical,
memcpy'able types are key; and this is C++, after all -- of course performance
is critical, otherwise why bother? ;-) So the more external (in relation to
the user's code) libraries that work out of the box, the better; I think to
argue that a type such as UUID (which is a low-level, fundamental value type,
and specifically *very* likely to be used in an inter-process context) should
*not* automatically work in an ideal fashion in this scenario, one must have
an *extremely* convincing argument, IMO. And so far, I haven't seen one
presented. ;-)
...
> 3. As for deployment of an object in shared memory, it does not have to be a
> POD either.
Please take another look at the specific link Scott provided ([4]);
boost::interprocess::message_queue only copies raw bytes between processes, so
for non-POD types generally that requires that an object be binary serialized
before sending. However, for a POD type, binary serialization is a completely
redundant process (read: a complete waste of CPU cycles); one can just send
the bytes of the object directly, and as an added bonus, avoid becoming
dependent on the somewhat heavy serialization library altogether.

Again, the fact that this might be possible even if UUID were not a POD type
is somewhat irrelevant, IMO -- as important as it is to be able to use a type
ideally and efficiently, it is equally important that a type _communicate
to the user_ its ideal usage. Personally, I would feel as though I were playing
with fire (and in a literal sense, would probably be invoking UB) by taking
the raw bytes underlying a non-POD type and using them with message_queue, to
the extent that I wouldn't bother, as much as it might vex me; when a type is
a POD type, I *know* immediately that it's safe -- no worries!
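
To make the "just send the bytes" point concrete, here is a minimal sketch
that uses a std::vector<unsigned char> as a stand-in for the raw buffer a
byte-oriented queue would carry. The names pod_uuid, to_bytes, and from_bytes
are hypothetical, and nothing here touches Boost.Interprocess itself; the
static_asserts spell out the assumption that makes the byte copy safe:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical 16-byte POD-style UUID, for illustration only.
struct pod_uuid {
    unsigned char data[16];
};

// Copy an object's bytes into a buffer; rejected at compile time
// for types where a bitwise copy would not be valid.
template <typename T>
std::vector<unsigned char> to_bytes(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only bitwise-copyable types may be sent as raw bytes");
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    return std::vector<unsigned char>(p, p + sizeof(T));
}

// Rebuild an object from bytes produced by to_bytes on the same platform.
template <typename T>
T from_bytes(const std::vector<unsigned char>& bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only bitwise-copyable types may be received as raw bytes");
    assert(bytes.size() == sizeof(T));
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}
```

No serialization layer is involved on either side: the sender hands over a
pointer and a size, and the receiver copies the same number of bytes back
into an object.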

[4] http://www.boost.org/doc/libs/1_37_0/doc/html/interprocess/synchronization_m...

I want to touch on a few other points as well, were UUID to be a POD type:

1. The default constructor behavior/existence debate would be put to rest. ;-)

2. The efficiency of lexical_cast would be better than *any* default
   constructor behavior, regardless of which one was ultimately decided upon.

3. One would typically initialize a UUID in exactly the same manner as though
   it had no default constructor, which most people seem to find an acceptable
   option.

4. Initializing a nil UUID would become more succinct. Contrast
   'uuid id(uuid::nil());' and 'uuid id = {};', or 'id(uuid::nil())' and
   'id()' in a constructor initialization list. Assuming any level of
   familiarity with aggregates, the latter are much more concise, IMO. (And
   C++0x will certainly introduce that familiarity if one doesn't have it
   already.)

5. Static initialization has been greatly underrated so far in this
   discussion. My first use case for a Boost UUID library would be to replace
   some homegrown COM/XPCOM encapsulation code. In dealing with COM/XPCOM, it
   is *extremely* common to have hardcoded UUIDs, and *many* of them. Trivial
   work though it may be, spending application/library startup time
   initializing hundreds/thousands of UUIDs when they could be statically
   initialized is senseless.

6. Regarding the potential for uninitialized state: I personally view UUID as
   a fundamental, borderline primitive type (others will almost certainly
   disagree); uninitialized state is generally understood and accepted for
   actual primitive types, so why should it be such a scary concept for UUID?

7. Lastly, to reiterate: this is C++. Every type, every library, every
   algorithm should be written with performance and efficiency as primary
   considerations. There are demonstrable use cases where UUID can work more
   efficiently as a POD type, but no convincing arguments have been presented
   in favor of non-PODness. And as Scott and David mentioned, UUID could be
   trivially wrapped in a non-POD if one were so inclined, and they'd even get
   to choose their own default constructor behavior. ;-)
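
Points 4 and 5 can be sketched as follows, assuming a hypothetical aggregate
named pod_uuid (not the interface of the library under review). The hardcoded
value is the DNS namespace UUID from RFC 4122, used purely as sample data;
component and is_nil are likewise made up for the example:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical aggregate UUID: no constructors, one public array.
struct pod_uuid {
    unsigned char data[16];
};

// Point 5: a hardcoded UUID written as a brace-initialized constant.
// Every initializer is a constant expression, so an implementation
// can place the bytes directly in the program image.
const pod_uuid ns_dns = {{
    0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
    0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8
}};

// Point 4: 'id()' in a constructor initialization list
// value-initializes the member, which zeroes all sixteen bytes.
struct component {
    pod_uuid id;
    component() : id() {}
};

// Helper for the example: true when every byte is zero.
inline bool is_nil(const pod_uuid& u) {
    for (std::size_t i = 0; i != sizeof u.data; ++i) {
        if (u.data[i] != 0) {
            return false;
        }
    }
    return true;
}
```

A local nil UUID is then simply 'pod_uuid id = {};', for which is_nil(id)
holds.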

Apologies for the lengthy post, but the threads regarding UUID thus far have
also been quite lengthy, and as I hadn't responded until now, I had a few
strong opinions stored up. ;-)
...
> Best,
> V.
Regards,
Adam Merz (Dodheim in #boost)