Re: [boost] [UUID] PODness Revisited

26 Dec 2008

      Adam,

Wow, that was one passionate reply. Was it something that I said? ;-)
...
While you may not see the "magic" in POD types, I can't fathom what exactly
you have against them either.
Well, I most certainly do not have anything against anything. It's nothing 
personal. If my emails came across as such, my humble apologies.
...
Do you find them more confusing or harder to use? Do you find static
initialization syntax aesthetically offensive? Is it your (no offense, but
extremely misguided, IMO) lingering impression that POD types are a legacy
of C that should be ignored whenever possible? A list of examples in favor
of making UUID a POD type was presented, and you've argued against those
examples without actually saying what you think the drawback is.
That's quite an emotionally charged list you compiled. It did not have to 
be. I certainly do not find aggregates confusing or anything of that sort. 
In fact, I was very happy with them for about 10 years while coding in C 
before I switched to C++ in the early '90s. PODs came from C and, therefore, 
they *are* a legacy of C and are not called Plain Old Data for nothing. In 
C++, though, I do find aggregates limiting. For uuid it would mean no 
user-provided constructors, no guaranteed invariant, no private or protected 
non-static data members. And that is fundamental (my view of course) to 
C++ -- "it is important and fundamental to have constructors acquire 
resources and establish a simple invariant" (Stroustrup E.3.5). Then, "One 
of the most important aims of a design is to provide interfaces that can 
remain stable in the face of changes" (Stroustrup 23.4.3.5). PODs restrict 
interfaces and are wide open implementation-wise. That opens the door to 
misuse and complicates long-term maintainability. So, unless PODs provide 
some killer feature in return (that cannot be achieved otherwise), I do not 
see the point of paying that price.
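To make the trade-off concrete, here is a minimal sketch. It is not code from 
the thread or from the proposed library: pod_uuid and class_uuid are 
hypothetical names, and the traits are the C++11 std:: ones rather than the 
Boost.TypeTraits/TR1 versions available at the time. It shows what the 
aggregate gives up (a guaranteed initial state) and what the class keeps 
(the same size and layout).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical aggregate flavour: every byte is public, no constructors.
struct pod_uuid
{
    unsigned char data[16];
};

// Hypothetical non-aggregate flavour: private storage, and the default
// constructor establishes the "nil" invariant.
class class_uuid
{
public:
    class_uuid() { std::fill(data_, data_ + 16, static_cast<unsigned char>(0)); }

    bool is_nil() const
    {
        for (std::size_t i = 0; i < 16; ++i)
            if (data_[i] != 0) return false;
        return true;
    }

private:
    unsigned char data_[16];
};

// The aggregate is trivial and standard-layout, i.e. a POD.
static_assert(std::is_trivial<pod_uuid>::value, "trivial");
static_assert(std::is_standard_layout<pod_uuid>::value, "standard layout");

// The class stops being trivial (user-provided default constructor),
// but it keeps the same layout and footprint.
static_assert(!std::is_trivial<class_uuid>::value, "not trivial");
static_assert(std::is_standard_layout<class_uuid>::value, "standard layout");
static_assert(sizeof(class_uuid) == sizeof(pod_uuid), "same footprint");

// A default-constructed class_uuid always starts in a known state;
// a default-initialized local pod_uuid would hold indeterminate bytes.
bool default_constructed_is_nil()
{
    class_uuid id;
    return id.is_nil();
}
```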
...
...
That's what *I* see (caveat: I admit not knowing much about Boost.MPI and
Boost.Interprocess requirements and expectations).
Again, no offense intended, but I find it a bit discomfiting that the person
arguing most vocally on this issue would make this admission. Just because
you don't have personal knowledge of a use case where UUID being a POD type
would be greatly beneficial doesn't mean such a use case doesn't exist.
First, you are right about "most vocally". I too had that growing concern 
that there was somewhat too much of me lately on the list. Apologies. In my 
defence I might say I do not usually do that. My weak point is that once I 
get onto something, I tend to follow it through to completion (well, some 
might consider that to be a good thing). Point taken though. I'll try 
answering your email (hopefully to your satisfaction) and will then tone it 
down.

Secondly, I personally do not see anything wrong with the admission -- I use 
some libs extensively, some occasionally and do not use some at all. I 
suspect it is quite typical. Stating your knowledge IMO clears up a lot of 
possible and unnecessary confusion and many other emotions.

Thirdly, I am not sure I said "such a use case doesn't exist", did I? If I 
did, I probably did not mean that. :-) What I am questioning though is the 
"greatly beneficial" part. I am glad to see that part is already obvious to 
you. I hope it's not just a hunch and you have hard data to back it up.
...
...
1. Boost.MPI efficiency does not seem to rely on PODness. Rather it seems to
be due to serialization (or rather the ability to bypass it).
This isn't technically correct, I think; in MPI's case (though not
Interprocess'), the type must be serializable regardless, but the ideal
efficiency scenario comes from specializing both boost::mpi::is_mpi_datatype
and boost::serialization::is_bitwise_serializable. Note that the
documentation for these traits ([1] and [2], respectively) both specifically
mention POD types -- this is no coincidence.
[1] http://www.boost.org/doc/libs/1_37_0/doc/html/boost/mpi/is_mpi_datatype.html
[2] http://www.boost.org/doc/libs/1_37_0/libs/serialization/doc/traits.html
Yes, my wording was somewhat crude. I presume you have a lot of practical 
experience with MPI and you can say with authority that PODness is a must 
for MPI's efficiency. Would you mind providing some experimental data that 
you observed? My knowledge of MPI is from reading the docs (I probably should 
stop making these discomforting admissions). From them I got the impression 
that serializable non-aggregate classes could be made efficient too.
...
... I think you're missing the larger point. In modern C++, types
intentionally created as POD types are often (not always) done so to
absolutely maximize the efficiency of copying that type.
I do not understand "PODness to absolutely maximize the efficiency of 
copying" as I believe

class NonAggregateFoo { ... int int_; };

is copied as efficiently as a raw 'int'. And NonAggregateFoo bunch[] can be 
memcopied as well as PODFoo bunch[] (I am not advocating that but simply 
stating the fact). And I do not expect the respective

    template<class Archive>
    void
    serialize(Archive& ar, unsigned int /*version*/)
    {
        // The archive is taken by reference: archives are not copyable.
        ar & int_;
    }

to be that slow (with appropriately chosen Archive). Again, here you might 
well know more than I do. Tell me then.
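The claim about NonAggregateFoo can be checked mechanically. The sketch below 
is an illustration under assumptions, not code from the thread: the body of 
NonAggregateFoo is filled in hypothetically (the original elides it), and it 
relies on the C++11 notion of "trivially copyable", which is what formally 
licenses memcpy for a class with a user-provided constructor and a private 
member. Under the C++03 rules in force in 2008 only POD types had that formal 
guarantee, even though compilers behaved the same way in practice.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the NonAggregateFoo mentioned above.
class NonAggregateFoo
{
public:
    explicit NonAggregateFoo(int v = 0) : int_(v) {}
    int value() const { return int_; }

private:
    int int_;
};

// Not a POD (the default constructor is user-provided), yet still
// trivially copyable, so byte-wise copying is well defined.
static_assert(!std::is_trivial<NonAggregateFoo>::value, "not a POD");
static_assert(std::is_trivially_copyable<NonAggregateFoo>::value,
              "memcpy is allowed");
// Holds on mainstream ABIs: the wrapper adds nothing to a raw int.
static_assert(sizeof(NonAggregateFoo) == sizeof(int), "no size overhead");

bool memcpy_round_trip()
{
    NonAggregateFoo src[3] = { NonAggregateFoo(1), NonAggregateFoo(2),
                               NonAggregateFoo(3) };
    NonAggregateFoo dst[3];
    // The void* cast only silences compilers that warn about raw memory
    // functions being applied to class types.
    std::memcpy(static_cast<void*>(dst), src, sizeof src);
    return dst[0].value() == 1 && dst[1].value() == 2 && dst[2].value() == 3;
}
```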
...
The existence of the is_pod type trait in boost.type_traits/TR1/C++0x
reinforces this -- e.g. in many implementations, std::copy will use memcpy
to perform an ideally efficient copy when
is_fundamental<T>::value || is_pod<T>::value. Additionally, a POD type's
synthesized copy constructor is generally merely a memcpy.
Understood. It does not make copying of non-aggregates inefficient though. 
Non-automatic 'yes', inefficient 'no'.
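"Non-automatic" here can be read as "needs an explicit opt-in". The following 
is a sketch of that idea under stated assumptions: is_bitwise_copyable, 
copy_items and tagged_id are hypothetical names invented for the 
illustration. They mimic the general shape of opt-in traits such as 
boost::serialization::is_bitwise_serializable, but this is not how any 
particular std::copy or Boost implementation is written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical trait. By default only POD-like types qualify, which
// approximates the automatic is_pod-based detection described above.
template<class T>
struct is_bitwise_copyable
    : std::integral_constant<bool,
          std::is_trivial<T>::value && std::is_standard_layout<T>::value>
{};

// Fast path: one memcpy for the whole range. Returns true to report it.
template<class T>
bool copy_impl(const T* src, std::size_t n, T* dst, std::true_type)
{
    std::memcpy(static_cast<void*>(dst), src, n * sizeof(T));
    return true;
}

// Generic path: element-wise assignment. Returns false to report it.
template<class T>
bool copy_impl(const T* src, std::size_t n, T* dst, std::false_type)
{
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
    return false;
}

template<class T>
bool copy_items(const T* src, std::size_t n, T* dst)
{
    return copy_impl(src, n, dst, typename is_bitwise_copyable<T>::type());
}

// A non-aggregate: not detected automatically by the default trait.
class tagged_id
{
public:
    explicit tagged_id(unsigned v = 0) : v_(v) {}
    unsigned value() const { return v_; }

private:
    unsigned v_;
};

// The explicit opt-in. The static_assert keeps the promise honest.
template<>
struct is_bitwise_copyable<tagged_id> : std::true_type
{
    static_assert(std::is_trivially_copyable<tagged_id>::value,
                  "opt-in requires a trivially copyable type");
};

bool opted_in_type_uses_memcpy()
{
    tagged_id src[2] = { tagged_id(7), tagged_id(9) };
    tagged_id dst[2];
    const bool fast = copy_items(src, 2, dst);
    return fast && dst[0].value() == 7 && dst[1].value() == 9;
}
```

The cost of the non-aggregate is the one-off specialization, not the copy 
itself.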
...
...
...
If it is an MPI implementation-specific restriction/limitation, I'd expect
we'd look at addressing it in MPI rather than shaping other classes to match
it.
This is an unreasonable thought process, IMO. If a type has a good use case
with another library (in this case, UUID with
Serialization/MPI/Interprocess), it's up to the type to conform to the
library in an ideal fashion, not the other way around. E.g., lexical_cast
and serialization don't go out of their way to work with every other type in
Boost, but many types in boost have serialization and lexical_cast support.
Well, again my initial wording was somewhat crude. I still stand by its 
meaning though. A general-purpose library should be 
accommodating/considerate rather than imposing. And from what I read about 
MPI that's the approach taken there. As for lexical_cast, it is the same -- 
it imposes the requirement of operator>>, operator<< and a default 
constructor. However, instead of rejecting non-conformant classes, it leaves 
the door open and accommodates those via specialization, and at least as 
efficiently. 
Boost.Serialization? Same. In fact, they *do* "go out of their way to work" 
with as many types as possible. I think I can talk about Boost.Serialization 
with a little bit of confidence (as I've been using it quite extensively). I 
know that the library tries so remarkably hard to keep everyone happy --  
optimization? yes; no-default constructors? no problem; separate load/save 
logic? bring it on; intrusive/non-intrusive serialization? piece of cake... 
the list is long.
...
...
2. Scott, you correctly mention that most often we "don't want to send UUIDs
by themselves". The thing is that chances of that bigger class being a POD
are diminishing dramatically (if not already infinitely close to 0).
This is extremely off base, and points back to your lack of knowledge
regarding MPI, I think.
Uhm, what exactly is extremely off-base here? And what does MPI have to do 
with it? The bigger a class, the smaller the chance it can conform to the 
limitations of POD. I am currently "serving time" in the railway industry 
and dealing with Trains, TrackCircuits, Signals, Stations, (damn long list). 
All use uuids and are used in inter-process inter-machine communications. I 
cannot imagine those classes to be PODs.
...
When writing an app/library/algorithm intended for use in a high-performance
parallel context, one goes out of their way to use POD types extensively,
for the sake of performance. Yes, the fact that MPI works with
boost.serialization is nice, but when performance is critical, memcpy'able
types are key;
First, I am under the impression that non-aggregate non-virtual objects are 
as memcopyable (with the usual caveats) as PODs are. Second, I feel 
boost.serialization can still be optimized for performance. See 
http://www.boost.org/doc/libs/1_37_0/doc/html/mpi/tutorial.html#mpi.serializ.... 
Plus binary archives (or your custom archives) can carry a very limited 
overhead. Still, I do not know much about MPI (Oops, I did it again! ;-)).
...
... I think to argue that a type such as UUID (which is a low-level,
fundamental value type, and specifically *very* likely to be used in an
inter-process context) should *not* automatically work in an ideal fashion
in this scenario, one must have an *extremely* convincing argument, IMO. And
so far, I haven't seen one presented. ;-)
As for the inter-process context, if it is on the same machine (in shared 
memory), then PODness is not some exclusive quality that allows objects to 
be stored/accessed in shared memory -- non-aggregate non-virtual objects 
are as good for that as PODs. If it is over the network, then I suspect 
we have many more things to worry about efficiency- and data 
consistency/integrity-wise. Say, network latency, synchronization, node 
dropouts, (a long list).

As for "an *extremely* convincing argument", then I somehow haven't seen one 
either so that I'd say "indeed, non-aggregates cannot do that, POD is the 
king". But I might not know something you do (gosh, it's turning into some 
"disturbing" confession now ;-)) but that's OK, right?
...
...
3. As for deployment of an object in shared memory, it does not have to be a
POD either.
Please take another look at the specific link Scott provided ([4]);
boost::interprocess::message_queue only copies raw bytes between processes,
so for non-POD types generally that requires that an object be binary
serialized before sending. However, for a POD type, binary serialization is
a completely redundant process (read: a complete waste of CPU cycles); one
can just send the bytes of the object directly, and as an added bonus, avoid
becoming dependent on the somewhat heavy serialization library altogether.
Yes, I hear you. I just do not know how big a deal that is. I can only argue 
this point with any conviction after I try optimized binary serialization 
vs. memcopy. If you tried, then I'd love to hear that. If you did not, then 
I am still unsure of the *real* tangible benefits of PODness.
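For reference, "sending the bytes of the object directly" does not by itself 
require an aggregate. The sketch below makes that point under assumptions: 
byte_queue is a hypothetical in-process stand-in for a raw-byte transport 
(it is not boost::interprocess::message_queue and does not cross process 
boundaries), and uuid_like is a hypothetical non-aggregate. It says nothing 
about relative speed; that would still need the measurement asked for above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <type_traits>
#include <vector>

// Hypothetical non-aggregate UUID: constructors and private storage.
class uuid_like
{
public:
    uuid_like() { std::memset(data_, 0, sizeof data_); }
    explicit uuid_like(unsigned char fill)
    {
        std::memset(data_, fill, sizeof data_);
    }

    bool operator==(const uuid_like& rhs) const
    {
        return std::memcmp(data_, rhs.data_, sizeof data_) == 0;
    }

private:
    unsigned char data_[16];
};

// Trivially copyable, so its object representation may be shipped as-is.
static_assert(std::is_trivially_copyable<uuid_like>::value,
              "raw bytes can be sent without a serialization step");

// Hypothetical stand-in for a transport that only moves raw bytes.
class byte_queue
{
public:
    void send(const void* buf, std::size_t n)
    {
        const unsigned char* p = static_cast<const unsigned char*>(buf);
        messages_.push_back(std::vector<unsigned char>(p, p + n));
    }

    // Copies the oldest message into buf; fails on empty queue or size
    // mismatch.
    bool receive(void* buf, std::size_t n)
    {
        if (messages_.empty() || messages_.front().size() != n) return false;
        std::memcpy(buf, messages_.front().data(), n);
        messages_.pop_front();
        return true;
    }

private:
    std::deque<std::vector<unsigned char> > messages_;
};

bool round_trip()
{
    byte_queue q;
    const uuid_like sent(0xAB);
    q.send(&sent, sizeof sent);

    uuid_like received;  // starts as nil, then overwritten byte-wise
    return q.receive(&received, sizeof received) && received == sent;
}
```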
...
Again, the fact that this might be possible even if UUID were not a POD type
is somewhat irrelevant,
I disagree. It is relevant to me and surely many others working on higher 
abstraction levels. POD comes with conditions. I need to know if I want to 
pay that price. Therefore, I never buy into theoretical efficiency 
debates -- I write stuff, I profile the stuff, I fix the actual (not 
imagined) bottlenecks.
...
...
I want to touch on a few other points as well, were UUID to be a POD type:
1. The default constructor behavior/existence debate would be put to rest.
;-)
Well, at the expense of an initial invalid state? I think I'd rather agree 
to the nil-behavior of uuid. Again, "it is important and fundamental 
to have constructors acquire resources and establish a simple invariant" 
(Stroustrup E.3.5).
...
2. The efficiency of lexical_cast would be better than *any* default
  constructor behavior, regardless of which one was ultimately decided upon.
I think you are referring to the non-initialized instance in the default 
lexical_cast<uuid>(string). It might or might not be correct though -- 
writing to and reading from those streams might be what has the real impact, 
rather than initialization or no initialization. Not profiled that though.
...
...
4. Initializing a nil UUID would become more succinct. Contrast
  'uuid id(uuid::nil());' and 'uuid id = {};', or 'id(uuid::nil())' and
  'id()' in a constructor initialization list. Assuming any level of
  familiarity with aggregates, the latter are much more concise, IMO. (And
  C++0x will certainly introduce that familiarity if one doesn't have it
  already.)
Here comes Vladimir disagreeing again (and not because he is not familiar 
with or afraid of aggregates). It is because I feel that "uuid id = {0};" 
exposes too much implementation detail and assumes the user knows that the 
invalid uuid is all zeros. If, say, tomorrow the Standard changes the value 
of nil, all my code becomes invalid. It might not be the case with uuid. 
However, it is the principle/coding habit I am talking about.
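The principle can be sketched as follows. This is a hypothetical 
illustration, not the proposed Boost.UUID interface: uuid_sketch and 
nil_byte are invented names. Calling code asks for "nil" by name, and the 
byte pattern that encodes nil lives in exactly one private place.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical class: callers never spell out what "nil" looks like.
class uuid_sketch
{
public:
    // Default construction establishes the nil state.
    uuid_sketch() { std::fill(data_, data_ + 16, nil_byte()); }

    static uuid_sketch nil() { return uuid_sketch(); }

    bool is_nil() const
    {
        const uuid_sketch n = nil();
        return std::equal(data_, data_ + 16, n.data_);
    }

private:
    // The only place that knows the representation of nil.
    static unsigned char nil_byte() { return 0; }

    unsigned char data_[16];
};

bool caller_code_is_representation_free()
{
    // Compare with 'uuid id = {0};', which hard-codes "all zeros"
    // into every call site.
    uuid_sketch id(uuid_sketch::nil());
    return id.is_nil();
}
```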
...
5. Static initialization has been greatly underrated so far in this
  discussion. My first use case for a Boost UUID library would be to replace
  some homegrown COM/XPCOM encapsulation code. In dealing with COM/XPCOM, it
  is *extremely* common to have hardcoded UUIDs, and *many* of them. Trivial
  work though it may be, spending application/library startup time
  initializing hundreds/thousands of UUIDs when they could be statically
  initialized is senseless.
I believe you'll be able to do that if we do

class uuid
{
public:
    template<class Range> uuid(Range range);
};

Then you'll be able to feed your hard-coded initialization data to uuid.
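A slightly fuller sketch of that range constructor, under assumptions: 
uuid_sketch is a hypothetical name, the constructor takes the range by const 
reference so that built-in arrays work, the size check is my addition, and 
the byte values are made up. Note that an object built this way is 
initialized by running the constructor at program startup, not by static 
initialization, so this answers the "hard-coded data" part of the use case 
rather than the startup-time part.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <stdexcept>

class uuid_sketch
{
public:
    // Accepts anything std::begin/std::end understand: built-in arrays,
    // std::vector, std::array, ...
    template<class Range>
    explicit uuid_sketch(const Range& range)
    {
        if (std::distance(std::begin(range), std::end(range)) != 16)
            throw std::invalid_argument("a UUID needs exactly 16 bytes");
        std::copy(std::begin(range), std::end(range), data_);
    }

    unsigned char operator[](std::size_t i) const { return data_[i]; }

private:
    unsigned char data_[16];
};

// Made-up bytes standing in for a hard-coded COM/XPCOM interface id.
static const unsigned char k_iid_bytes[16] = {
    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
    0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
};

// Constructed once, before main() runs.
static const uuid_sketch k_iid(k_iid_bytes);

bool hard_coded_bytes_arrive_intact()
{
    return k_iid[0] == 0x00 && k_iid[15] == 0x0f;
}
```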
...
6. Regarding the potential for uninitialized state: I personally view UUID
  as a fundamental, borderline primitive type (others will almost certainly
  disagree); uninitialized state is generally understood and accepted for
  actual primitive types, so why should it be such a scary concept for UUID?
It's certainly not scary. It's just not in C++ spirit (see quotes at the top 
of the email) and everyone knows what primitive types are. I do not think 
people expect other types to behave that way.
...
7. Lastly, to reiterate: this is C++. Every type, every library, every
  algorithm should be written with performance and efficiency as primary
  considerations.
I do not think C++ was designed "with performance and efficiency as primary 
considerations". And I do not think applications "should be written with 
performance and efficiency as primary considerations". Don't get up in 
arms -- those considerations are important. I object to the "primary" part. 
I do not think I even need to debate this -- Knuth, Stroustrup and many 
others have done that.
...
... There are demonstrable use cases where UUID can work more
  efficiently as a POD type,
Call me thick but I did not see those convincing use-cases showing PODs 
considerably more efficient than non-aggregates. Easier? Yes. *Seemingly* 
more efficient? Yes. How much more efficient? I dunno if that is palpably 
real.
...
but no convincing arguments have been presented
  in favor of non-PODness.
Oh, c'mon. How 'bout reading "The C++ Programming Language" and "The Design 
and Evolution of C++"? Discussions there do not revolve around aggregates.

Best,
V.

Vladimir Batov