
Just a small correction: the representation would be "&#0;" (without the quotes but _with_ the semicolon).
\LG
"Robert Ramey" <ramey@rrsd.com> wrote in message news:dbm8l9$4ud$1@sea.gmane.org...
This is a great email. It illustrates why I tend to drag my feet on things like this. This is not going to be addressed right away so feel free to investigate and discuss it.
FWIW I personally would like option 1 - use &#0; anyway - basically because it would preserve the idea that an xml_archive can do anything any other archive can do and doesn't ripple XML-ness back into the library or user programs. But even this is not so trivial. It's not clear to me whether it should apply to all non-printable characters. This then raises the issue of what is non-printable in a UTF context. Then it makes me wonder what the "encoding" attribute in XML is for in a UTF file. This is a perfect example of how something that seems simple at first glance turns into a really time-consuming issue.
I've never warmed up to XML myself. I learned enough of the details to implement xml_?archive but I still never learned to like it. The only thing I've found it useful for is checking that load/save functions match. The xml_archive classes check that the end tag is found in the right place and in fact matches the start tag so any difference in the save / load functions throws an exception. So if I have an obscure problem I test using xml_archive.
Other than the above, the only utility I can see for the xml_?archive is as some sort of bridge to the "outside world". That's why I set aside the original string representation - as a sequence of numbers - in favor of the current one - a text string. The mismatch between what std::string does and xml text data does is the source of the problem.
I would hope that some smart person can find the sentence, in the paragraph, on the page, in the chapter of the relevant document which can deal with this in some sort of conforming way.
Good Luck
Robert Ramey
Eelis van der Weegen wrote:
Jonathan Wakely wrote:
It's not a valid entity; using it means your XML is not well-formed. It doesn't matter whether you say &#0; or &#x0; (the decimal and hexadecimal forms are exactly equivalent, but 0 is still not a valid numerical entity).
Yes, in XML 1.1, the null character is a special case by itself; ordinary nonprintable characters can be embedded as numerical character references, but the null character cannot (see the "Legal Character" well-formedness constraint for production 66).
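To make the distinction concrete, here is a minimal sketch of the Char production from the XML 1.0 and XML 1.1 recommendations, which is what the "Legal Character" constraint checks a character reference against. The function names is_xml10_char and is_xml11_char are made up for illustration and are not part of Boost.Serialization.

```cpp
#include <cstdint>

// Hypothetical helper: true if the code point matches the XML 1.0 Char
// production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
constexpr bool is_xml10_char(std::uint32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Hypothetical helper: true if the code point matches the XML 1.1 Char
// production: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// The range starts at #x1, so the null character is excluded here too.
constexpr bool is_xml11_char(std::uint32_t cp) {
    return (cp >= 0x1 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

Under both versions 0 fails the check, while a control character such as 0x1 is rejected by the 1.0 rule and accepted by the 1.1 rule.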
As long as you can read the same data back and restore the same sequence of bytes it doesn't really matter.
I strongly agree with Robert that further processing of generated XML archives by external tools is one of the main strengths of XML archives and should be the main concern when evaluating our options when it comes to dealing with this problem. That said, I see the following options:
1. Use &#0; anyway.
I've googled around a bit and found that &#0;'s being generated by one tool in a toolchain and rejected by the next is a reasonably common problem, so I don't really like this option.
2. Encode it using some escape sequence: <foo>bar\0bas</foo>
This would introduce an extra grammar layer that software used for further processing must parse.
3. Encode it using a dedicated element: <foo>bar<serialization:null/>bas</foo>
This seems like a reasonable way to encode null characters, but wouldn't work in attribute values.
4. Encode strings containing null characters using binary encodings such as those defined by XML Schema's data types:
http://www.w3.org/TR/xmlschema-2/#base64Binary http://www.w3.org/TR/xmlschema-2/#hexBinary
This would require some additional flag that indicates whether a string is encoded textually or binary (unless of course all strings are encoded this way, but then we'd lose the human-readability of strings in XML archives).
5. Disallow serialization of std::(w)strings that contain null characters to XML archives.
This is my personal favorite. XML's normal character data is simply inherently textual and not suited to storing binary data containing null characters. We shouldn't try to hack around this. Doing so would only make things complicated in further external processing. If users insist on storing binary fragments in their XML archives they can always resort to vector<char> (by the way, the binary encodings I mentioned above might be very nice for storing things like vector<char> efficiently).
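For what it's worth, options 4 and 5 could be sketched roughly as follows. This is only an illustration under my own assumptions: require_no_nulls, to_hex_binary and from_hex_binary are hypothetical names, not anything in the serialization library, and the hex form follows the hexBinary lexical representation (two hex digits per octet) from the XML Schema document linked above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Option 5 (hypothetical helper): refuse strings that contain a null
// character instead of trying to represent them as XML character data.
inline void require_no_nulls(const std::string& s) {
    if (s.find('\0') != std::string::npos)
        throw std::invalid_argument("string contains a null character");
}

// Option 4 (hypothetical helper): encode arbitrary bytes as hexBinary,
// i.e. two hexadecimal digits per byte.
inline std::string to_hex_binary(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char c : bytes) {
        out.push_back(digits[c >> 4]);
        out.push_back(digits[c & 0x0F]);
    }
    return out;
}

// Value of one hex digit, or -1 if the character is not a hex digit.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Inverse of to_hex_binary; throws on odd length or non-hex input.
inline std::string from_hex_binary(const std::string& hex) {
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("hexBinary text has odd length");
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        const int hi = hex_value(hex[i]);
        const int lo = hex_value(hex[i + 1]);
        if (hi < 0 || lo < 0)
            throw std::invalid_argument("invalid hexBinary digit");
        out.push_back(static_cast<char>((hi << 4) | lo));
    }
    return out;
}
```

With this, the seven bytes of "bar\0bas" would be written as 62617200626173 and read back unchanged, at the cost of the text no longer being human-readable, which is the trade-off mentioned under option 4.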
Eelis
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost