
Jonathan Wakely wrote:
So stop using XML. If you're not going to write well-formed XML (which means no or or etc.) then why bother writing XML? XML is verbose, inefficient and has a number of complicated details. Its main advantage is interoperability and the availability of compatible tools. If you produce non-well-formed XML then you can't use any existing tools, so you've invented your own markup language with most of the drawbacks of XML and none of the advantages!
I agree - that's why I don't use it.
IMHO what you should do is produce well-formed XML.
That's what we're trying to do.
I would hope that some smart person can find the sentence, in the paragraph, on the page, in the chapter of the relevant document which can deal with this in some sort of conforming way.
Either:
1) Store all strings in a hexadecimal or base64 representation. This allows any arbitrary sequence of bytes to be mapped to a portable subset of ASCII characters.
That's the way the first version worked - a lot of people were unhappy with it.
2) Store strings normally, unless they contain invalid characters, in which case put the string in a <hex> or <base64> element and use hex/base64 to store the string.
A worthy suggestion.
The advantage of 1) is consistency. The advantage of 2) is human readability for most strings - only unrepresentable ones are not human readable.
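Option 2) could be sketched roughly as below. This is only an illustration under stated assumptions, not how any existing archive works: the element names <string> and <hex> and all function names are hypothetical, and "representable" is taken conservatively to mean tab, LF and printable ASCII. CR is treated as unrepresentable because XML parsers normalize line endings, so it would not survive a round trip.

```cpp
#include <string>

// Hypothetical helper: true if every byte can be stored directly in a text node.
// Conservative assumption: only tab, LF and printable ASCII count as safe.
// CR is excluded because XML end-of-line handling would turn it into LF.
bool is_plain_xml_text(const std::string& s) {
    for (unsigned char c : s) {
        const bool ok = c == 0x09 || c == 0x0A || (c >= 0x20 && c <= 0x7E);
        if (!ok) return false;
    }
    return true;
}

// Hex-encode arbitrary bytes, two lowercase digits per byte.
std::string to_hex(const std::string& s) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(s.size() * 2);
    for (unsigned char c : s) {
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}

// Escape the characters that would otherwise be read as markup.
std::string escape_markup(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}

// Readable element when possible, <hex> fallback otherwise.
std::string write_string_element(const std::string& s) {
    if (is_plain_xml_text(s))
        return "<string>" + escape_markup(s) + "</string>";
    return "<hex>" + to_hex(s) + "</hex>";
}
```

With this sketch an ordinary string stays readable in the file, while a string holding an embedded NUL or other unrepresentable byte falls back to the <hex> form. A reader would have to check which element it found in order to decode it.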
Agreed. The fundamental problem is that a std::basic_string can hold data that cannot be represented in an XML string.
Do you turn all strings to UTF-8?
Currently it works like this: a) std::string values are written to the XML file using the current stream locale. Actually I use a "null" codecvt facet to work around the fact that the standard facet molests the input/output string. b) std::wstring values are converted to UTF-8 using a stream codecvt facet. The library would permit any codecvt facet to be used. (Hmm - this might be the place to permit the user to insert his own decision about how to deal with this problem. The more I think about this - the more I like it)
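For readers unfamiliar with what the wide-to-UTF-8 step in b) amounts to, here is a hand-written stand-in. It is not the codecvt facet the library uses; the function names are hypothetical, and it takes std::u32string rather than std::wstring so that each element is assumed to be one full Unicode code point regardless of how wide wchar_t is on the platform.

```cpp
#include <string>

// Append the UTF-8 encoding of one code point to out.
// Assumption of this sketch: surrogates and values above U+10FFFF
// are replaced by U+FFFD instead of being reported as errors.
void append_utf8(std::string& out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a sequence of code points to a UTF-8 byte string.
std::string to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) append_utf8(out, cp);
    return out;
}
```

A facet-based design keeps this conversion out of the serialization code itself, which is what makes it a natural place for a user to plug in a different policy.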
I think there is a strong argument for not doing anything encoding-related to strings, just store the bytes exactly as they are, unless that would produce an invalid XML doc, in which case use hex or base64. Otherwise you impose a semantic meaning on the bytes in a std::string that may not be present, namely "this string contains text data that can be stored in an XML text node". C++ allows ANY bytes in a std::string and does not require those bytes to form a valid UTF-8 string, or a valid ASCII string, or any other restriction.
We're in agreement here as well. I very much want to maintain the independence of the archive from the serialization. This means that the serialization of data is not in any way dependent on the type of archive to be used.

Robert Ramey