
On Mon, Jul 18, 2005 at 09:48:11AM -0700, Robert Ramey wrote:
Jonathan Wakely wrote:
On Mon, Jul 18, 2005 at 08:11:31AM -0700, Robert Ramey wrote:
Hmm, I've twiddled with the set of allowable characters from time to time on sort of an ad hoc basis. For some reason it never occurred to me to actually try and find the definitive source for this. So I suppose there are a couple
Assuming you're referring to XML, it's here: http://www.w3.org/TR/REC-xml
of pending fine points here:
a) the exact rules for what characters are legal in which part of a tag name. This might not be all that obvious given that the xml can be coded in wide characters and then converted to utf-8. Also, the narrow character version is coded with the current locale, so that's another story.
A character is a character, how it is encoded is irrelevant.
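As a rough sketch of what the spec's Name production allows (this only covers the ASCII subset; the full rule also permits a large set of non-ASCII letters, combining characters and extenders, so treat it as an approximation rather than the actual rule, and the function name is just something I made up):

    #include <cctype>
    #include <string>

    // ASCII-only approximation of the Name production from the spec linked
    // above: the first character must be a letter, '_' or ':', and later
    // characters may also be digits, '.' or '-'.
    bool is_plausible_ascii_name(const std::string & name)
    {
        if (name.empty())
            return false;
        const unsigned char first = static_cast<unsigned char>(name[0]);
        if (!(std::isalpha(first) || first == '_' || first == ':'))
            return false;
        for (std::string::size_type i = 1; i < name.size(); ++i) {
            const unsigned char c = static_cast<unsigned char>(name[i]);
            if (!(std::isalnum(c) || c == '_' || c == ':'
                  || c == '.' || c == '-'))
                return false;
        }
        return true;
    }

For anything outside ASCII you really need to consult the character tables in the spec itself.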
Thanks for the link.
That's not obvious to me - especially when one is using a locale-specific character set. Maybe XML requires that all characters be utf-16 (or 32) or some such thing, but as a practical matter lots of people are still using locale-specific types for strings. So it's not obvious what the implications are of including a '\0' as part of a text string in an XML archive. This is one of those things that seemed simple when I started but turned up a lot of small "gotchas" as time went on.
I agree that's a harder problem than just "can character X be used in an element name" :-) The '\0' character is not valid anywhere in XML, in any encoding. I don't know the reasoning, but it means you have to use some kind of alternative representation for data that could contain NULs. If you're talking about text strings with embedded NULs, then you might define an entity that can stand in for the NUL, so you can expand it back to NUL when you recreate the string from the XML archive, or put all strings that might contain NULs in an element like <hex> and hex-encode the bytes. There might be other solutions too, but I've not used them.

jon
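A minimal sketch of the hex-encoding idea (the <hex> wrapper and the helper names are just illustrative, not anything an existing archive defines; a real archive would of course have to fit this into its own escaping and loading machinery):

    #include <cassert>
    #include <string>

    // Encode each byte of a raw string as two lowercase hex digits, so the
    // result contains only characters that are safe inside an XML element.
    std::string to_hex(const std::string & raw)
    {
        static const char digits[] = "0123456789abcdef";
        std::string out;
        out.reserve(raw.size() * 2);
        for (std::string::size_type i = 0; i < raw.size(); ++i) {
            const unsigned char c = static_cast<unsigned char>(raw[i]);
            out += digits[c >> 4];
            out += digits[c & 0x0f];
        }
        return out;
    }

    // Reverse the encoding when reading the archive back in.
    std::string from_hex(const std::string & hex)
    {
        std::string out;
        out.reserve(hex.size() / 2);
        for (std::string::size_type i = 0; i + 1 < hex.size(); i += 2) {
            unsigned value = 0;
            for (int j = 0; j < 2; ++j) {
                const char c = hex[i + j];
                value = value * 16 + (c >= 'a' ? c - 'a' + 10 : c - '0');
            }
            out += static_cast<char>(value);
        }
        return out;
    }

    int main()
    {
        std::string s("ab\0cd", 5);   // a string with an embedded NUL
        // This is what would end up in the archive instead of the raw bytes:
        std::string element = "<hex>" + to_hex(s) + "</hex>";
        (void)element;
        assert(from_hex(to_hex(s)) == s);   // round-trips the NUL intact
        return 0;
    }

It doubles the size of the data, but it sidesteps the question of which characters are legal entirely, which is why I'd lean that way for anything that isn't known to be plain text.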