Re: [boost] [serialization] Add UTF-8 BOM support to xml_warchive

15 Apr 2011

On Fri, Apr 15, 2011 at 05:14:03PM -0400, Frank Mori Hess wrote:
> ...
> On Friday, April 15, 2011, Tijmen van Voorthuijsen wrote:
> > ...
> > Recently I ran into the problem that the boost::serialization library
> > could not handle XML files which contain the three UTF-8 BOM (Byte Order
> > Mark) bytes.
> > I propose to enhance the xml_warchive and text_warchive for reading with
> > support of the BOM bytes. Example:
> This logic seems wrong. Just because the first byte is 0xef doesn't mean
> it's necessarily a BOM.
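If it's supposed to be well-formed XML, there's nothing in the mandatory
'prolog' production that can have the value 0xef as the first octet. A
document has to start with '<' or whitespace, and in UTF-8 those are all
single ASCII bytes, so a leading 0xef can only be the start of a BOM.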
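A minimal C++ sketch of a check that satisfies both objections: it consumes a leading UTF-8 BOM only when all three bytes EF BB BF are present, and otherwise rewinds so the parser sees the original input. The name `skip_utf8_bom` is hypothetical (not part of Boost.Serialization), and the sketch assumes a seekable narrow stream, such as an `std::ifstream` opened in binary mode or an `std::istringstream`, read before any wide-character conversion.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not a Boost.Serialization API.
// Consumes a UTF-8 BOM (EF BB BF) only if all three bytes are present at the
// current position; otherwise restores the original position.
// Assumes the stream is seekable (tellg/seekg work).
bool skip_utf8_bom(std::istream& is)
{
    static const unsigned char bom[3] = {0xEF, 0xBB, 0xBF};
    const std::istream::pos_type start = is.tellg();

    char buf[3] = {};
    is.read(buf, 3);
    if (is.gcount() == 3 &&
        static_cast<unsigned char>(buf[0]) == bom[0] &&
        static_cast<unsigned char>(buf[1]) == bom[1] &&
        static_cast<unsigned char>(buf[2]) == bom[2]) {
        return true;  // full BOM seen and consumed
    }

    // Not a BOM (or fewer than 3 bytes available): clear eof/fail bits set
    // by a short read and rewind so the parser sees the untouched input.
    is.clear();
    is.seekg(start);
    return false;
}
```

Checking the full three-byte sequence costs nothing extra, and for well-formed XML it gives the same answer as testing the first octet alone.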

An XML document encoded in UTF-16 MUST begin with a BOM, and one encoded
in UTF-8 MAY begin with one. Unless the encoding is indicated externally
(MIME, other framing), an XML processor MUST be able to handle the presence
of a BOM, and MUST be able to process the UTF-8 and UTF-16 families of
encodings.
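The encoding rules above can be sketched as a small classifier over the first bytes of the input. The enum and function names are hypothetical, not a Boost API. The sketch only looks for the UTF-8 and UTF-16 BOMs that XML 1.0 requires processors to handle, ignores UTF-32, and leaves the no-BOM case to the caller (encoding declaration, or the UTF-8 default).

```cpp
#include <cstddef>

// Hypothetical names for illustration only.
enum class bom_encoding { none, utf8, utf16_be, utf16_le };

// Classifies the leading bytes `p[0..len)` by their byte order mark and
// stores the BOM length (0, 2 or 3) in `bom_len`.
// Note: a UTF-32LE BOM (FF FE 00 00) is reported as utf16_le here, since
// UTF-32 is outside what XML 1.0 requires.
bom_encoding detect_bom(const unsigned char* p, std::size_t len,
                        std::size_t& bom_len)
{
    if (len >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) {
        bom_len = 3;
        return bom_encoding::utf8;
    }
    if (len >= 2 && p[0] == 0xFE && p[1] == 0xFF) {
        bom_len = 2;
        return bom_encoding::utf16_be;
    }
    if (len >= 2 && p[0] == 0xFF && p[1] == 0xFE) {
        bom_len = 2;
        return bom_encoding::utf16_le;
    }
    // No BOM: the caller consults the encoding declaration; with neither a
    // BOM nor a declaration, XML 1.0 requires the entity to be UTF-8.
    bom_len = 0;
    return bom_encoding::none;
}
```

Returning the BOM length lets the caller skip exactly those bytes before handing the rest to the decoder for the detected encoding.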

Of course, I may have misread the specification (XML 1.0 5e), feel free
to show a well-formed counter-example.

-- 
Lars Viklund | zao@acc.umu.se