
On Fri, Apr 15, 2011 at 05:14:03PM -0400, Frank Mori Hess wrote:
On Friday, April 15, 2011, Tijmen van Voorthuijsen wrote:
Recently I ran into the problem that the boost::serialization library could not handle XML files which contain the three UTF-8 BOM (Byte Order Mark) bytes.
I propose to enhance the xml_warchive and text_warchive for reading with support of the BOM bytes. Example:

This logic seems wrong. Just because the first byte is 0xef doesn't mean it's necessarily a BOM.
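
To make the concern concrete, here is a minimal sketch of a check that treats the input as BOM-prefixed only when all three bytes (EF BB BF) are present, rather than testing the first byte alone. The helper name skip_utf8_bom is hypothetical and is not part of Boost.Serialization. It works on a narrow std::istream and assumes the stream is seekable, so that a failed match can be rewound.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Boost.Serialization.
// Consumes a leading UTF-8 BOM (EF BB BF) only if all three bytes match.
// Assumption: the stream is seekable, so a failed match can be undone.
inline bool skip_utf8_bom(std::istream& in)
{
    const std::istream::pos_type start = in.tellg();

    char buf[3] = {0, 0, 0};
    in.read(buf, 3);

    const bool matched =
        in.gcount() == 3 &&
        static_cast<unsigned char>(buf[0]) == 0xEF &&
        static_cast<unsigned char>(buf[1]) == 0xBB &&
        static_cast<unsigned char>(buf[2]) == 0xBF;

    if (!matched) {
        in.clear();       // a short read sets eofbit and failbit
        in.seekg(start);  // rewind so the caller sees the original bytes
    }
    return matched;
}
```

Called once before handing the stream to a parser, this leaves the read position just past the BOM when one is found and unchanged otherwise.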
If it's supposed to be well-formed XML, there's nothing in the mandatory 'prolog' production that can have the value 0xef as the first octet. An XML document in UTF-16 MUST have a BOM, and MAY have a BOM in UTF-8. Unless indicated externally (MIME, other framing), an XML processor MUST be able to handle the presence of BOMs, and MUST be able to process the UTF-8 and UTF-16 families of encodings.

Of course, I may have misread the specification (XML 1.0 5e); feel free to show a well-formed counter-example.

--
Lars Viklund | zao@acc.umu.se