
Jonathan Wakely wrote:
It's not a valid entity; using it means your XML is not well-formed. It doesn't matter whether you say &#0; or &#x0; (the decimal and hexadecimal forms are exactly equivalent, but 0 is still not a valid numerical entity).
Yes. In XML 1.1 the null character is a special case of its own: ordinary non-printable characters can be embedded as numeric character references, but the null character cannot (see the "Legal Character" well-formedness constraint on production 66).
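To make the rule concrete, here is a minimal C++ sketch of the Char production as the XML 1.0 and XML 1.1 specifications define it. The function names are hypothetical, not part of Boost.Serialization or any XML library. A character reference is only legal if the code point it refers to passes the check; note that #x1 passes in XML 1.1 but not in XML 1.0, while #x0 fails in both.

```cpp
#include <cstdint>

// Hypothetical helper: the XML 1.0 Char production [2]
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
bool is_xml10_char(std::uint32_t c) {
    return c == 0x9 || c == 0xA || c == 0xD ||
           (c >= 0x20 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD) ||
           (c >= 0x10000 && c <= 0x10FFFF);
}

// Hypothetical helper: the XML 1.1 Char production [2]
//   [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// Control characters such as #x1 are admitted (XML 1.1 expects them to be
// written as character references), but #x0 is still excluded.
bool is_xml11_char(std::uint32_t c) {
    return (c >= 0x1 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD) ||
           (c >= 0x10000 && c <= 0x10FFFF);
}
```

An archive writer could run each code point through the check for the XML version it declares and refuse (or specially encode) anything that fails, which is exactly where a null character ends up.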
As long as you can read the same data back and restore the same sequence of bytes it doesn't really matter.
I strongly agree with Robert that further processing of generated XML archives by external tools is one of the main strengths of XML archives, and it should be the main concern when we evaluate our options for dealing with this problem. That said, I see the following options:

1. Use &#0; anyway. I've googled around a bit and found that &#0; being generated by one tool in a toolchain and rejected by the next is a reasonably common problem, so I don't really like this option.

2. Encode it using some escape sequence:

   <foo>bar\0bas</foo>

   This would introduce an extra grammar layer that software used for further processing must parse (and literal backslashes would then need escaping as well).

3. Encode it using a dedicated element:

   <foo>bar<serialization:null/>bas</foo>

   This seems like a reasonable way to encode null characters, but it wouldn't work in attribute values.

4. Encode strings containing null characters using binary encodings such as those defined by XML Schema's data types:

   http://www.w3.org/TR/xmlschema-2/#base64Binary
   http://www.w3.org/TR/xmlschema-2/#hexBinary

   This would require an additional flag indicating whether a string is encoded as text or in binary (unless, of course, all strings are encoded this way, but then we'd lose the human-readability of strings in XML archives).

5. Disallow serialization of std::(w)strings that contain null characters to XML archives. This is my personal favorite. XML's normal character data is inherently textual and not suited to storing binary data containing null characters. We shouldn't try to hack around this; doing so would only complicate further external processing. If users insist on storing binary fragments in their XML archives, they can always resort to vector<char> (by the way, the binary encodings I mentioned above might be very nice for storing things like vector<char> efficiently).

Eelis
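For illustration, option 5 combined with the hexBinary encoding suggested for vector<char> might look like the C++ sketch below. It assumes plain std::string/std::wstring input; check_no_null, to_hex_binary and from_hex_binary are hypothetical helpers, not existing Boost.Serialization API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (option 5): refuse to write a string containing a
// null character, since it cannot appear in XML character data.
template <class CharT>
void check_no_null(const std::basic_string<CharT>& s) {
    if (s.find(CharT(0)) != std::basic_string<CharT>::npos)
        throw std::invalid_argument(
            "string contains a null character; not representable as XML text");
}

// Hypothetical helper: encode raw bytes (e.g. a vector<char>) in the
// xs:hexBinary lexical form, two uppercase hex digits per byte.
std::string to_hex_binary(const std::vector<char>& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (char c : bytes) {
        unsigned char b = static_cast<unsigned char>(c);
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}

// Hypothetical inverse of to_hex_binary; xs:hexBinary permits both upper-
// and lowercase digits, so both are accepted here.
std::vector<char> from_hex_binary(const std::string& hex) {
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("hexBinary length must be even");
    auto value = [](char h) -> int {
        if (h >= '0' && h <= '9') return h - '0';
        if (h >= 'A' && h <= 'F') return h - 'A' + 10;
        if (h >= 'a' && h <= 'f') return h - 'a' + 10;
        throw std::invalid_argument("invalid hexBinary digit");
    };
    std::vector<char> out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2)
        out.push_back(static_cast<char>((value(hex[i]) << 4) | value(hex[i + 1])));
    return out;
}
```

Under this sketch an XML archive would call check_no_null before writing any string, keeping strings human-readable, and reserve the hex encoding for types such as vector<char> that are explicitly binary, so no per-string "text or binary" flag is needed.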