Re: [Boost-users] Filesystem, serialization, character encoding and portable software

18 Aug 2008

      Robert Ramey wrote:
...
Daniel Krügler wrote:
...
Robert Ramey wrote:
...
The wide character XML archives use UTF-8.
The narrow character XML archives use the currently set locale.
Robert Ramey
Sorry for asking offhand:
What is the reasoning behind this different behaviour?
I assumed that most programs built with narrow characters used
the locale concept to deal with this.
Wide character systems lend themselves to UTF coding so I
used that for wide char archives.  In order to do this, I used
Ron Garcia's UTF code conversion facet for streams.
It would be quite easy to generate UTF coding for narrow
character archives.  Just do the following:
a) Build the UTF code conversion facet for narrow character
input (it's templated on character type).
b) When the stream is opened, attach this facet to the stream.
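
Steps a) and b) could be sketched roughly as follows. This is only an
illustration of the mechanism, not the facet from the serialization
library: latin1_to_utf8_out, write_utf8_file and read_file_bytes are
hypothetical names, the facet handles output only, and it assumes the
narrow characters in the program hold ISO-8859-1 code points.

```cpp
#include <cassert>   // assert, for the usage check described below
#include <cstddef>   // std::ptrdiff_t
#include <cwchar>    // std::mbstate_t
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical output-only facet (an assumption for this sketch):
// treats each narrow char as an ISO-8859-1 code point and writes it
// to the external (file) side as UTF-8.
class latin1_to_utf8_out : public std::codecvt<char, char, std::mbstate_t> {
protected:
    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override { return 0; }    // variable width
    int do_max_length() const noexcept override { return 2; }  // at most 2 bytes per char

    result do_out(std::mbstate_t&,
                  const char* from, const char* from_end, const char*& from_next,
                  char* to, char* to_end, char*& to_next) const override {
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            const unsigned char c = static_cast<unsigned char>(*from_next);
            const std::ptrdiff_t needed = (c < 0x80) ? 1 : 2;
            if (to_end - to_next < needed) return partial;  // output buffer full
            if (c < 0x80) {
                *to_next++ = static_cast<char>(c);          // ASCII passes through
            } else {
                *to_next++ = static_cast<char>(0xC0 | (c >> 6));
                *to_next++ = static_cast<char>(0x80 | (c & 0x3F));
            }
            ++from_next;
        }
        return ok;
    }
};

// Step b): attach the facet to the stream, here before the file is opened.
void write_utf8_file(const std::string& path, const std::string& latin1_text) {
    std::ofstream out;
    out.imbue(std::locale(std::locale::classic(), new latin1_to_utf8_out));
    out.open(path, std::ios::binary);
    out << latin1_text;
}

// Reads the raw bytes back with the default (non-converting) facet.
std::string read_file_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

For example, calling write_utf8_file with the Latin-1 string "caf\xE9"
and then read_file_bytes on the same path should give back the UTF-8
bytes "caf\xC3\xA9". A narrow archive constructed on a stream prepared
this way would inherit the same conversion, since the archive just
writes through the stream.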
Note that the output char format is not really a property of the
serialization library, but rather an artifact of the way it has been
used.  That is, the serialization library depends on the standard
stream library for this property.
Thanks for your thorough explanation, Robert. There remains a
slight bad feeling in my stomach (apologies for the possibly
inappropriate metaphor): Many programs are written to be
compilable (and executable) in either narrow character or
wide character mode. The difference in the serialization
library described above unfortunately seems to mean that
those two builds could not share the same persisted
serialization product, right? Or, to put it differently:
If the programmer decides to switch from one character mode
to the other (a typical use case, I think), (s)he has to take
care of the extra steps needed to keep the serialization I/O
compatible. This is especially cumbersome because the more
typical switch is from narrow character to wide character
mode. In that case the serialization has already caused harm:
the old code produced locale-dependent output, while the newer
code is free of this locale dependency but now has the problem
of interpreting existing serialization output.

Have I understood this effect correctly?

Thanks,

- Daniel