Re: [boost] [serialization] Add UTF-8 BOM support to xml_warchive

15 Apr 2011


On Friday, April 15, 2011, Tijmen van Voorthuijsen wrote:
...
Recently I ran into the problem that the boost::serialization library
could not handle XML files which contain the three UTF-8 BOM (Byte Order
Mark) bytes.
...
I propose to enhance the xml_warchive and text_warchive for reading with
support of the BOM bytes. Example:
namespace
{
    const wchar_t    g_cchUtf8Bom1   = 0xEF;
    const wchar_t    g_cchUtf8Bom2   = 0xBB;
    const wchar_t    g_cchUtf8Bom3   = 0xBF;
}
void CheckAndCorrectUtf8Bom(std::wifstream* pifs)
{
    _ASSERT_POINTER(pifs);
    wchar_t  chUtf8Bom1   = 0;
    wchar_t  chUtf8Bom2   = 0;
    wchar_t  chUtf8Bom3   = 0;
    chUtf8Bom1 = pifs->peek();
    if (chUtf8Bom1 == g_cchUtf8Bom1)
    {
        *pifs >> chUtf8Bom1;
        _ASSERT(chUtf8Bom1 == g_cchUtf8Bom1);
        *pifs >> chUtf8Bom2;
        _ASSERT(chUtf8Bom2 == g_cchUtf8Bom2);
        *pifs >> chUtf8Bom3;
        _ASSERT(chUtf8Bom3 == g_cchUtf8Bom3);
This logic seems wrong.  Just because the first byte is 0xef doesn't mean 
it's necessarily a BOM.
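
To make the objection concrete, here is a minimal sketch of a check that
consumes the bytes only when all three match, and otherwise rewinds so the
stream is left untouched. The name skip_utf8_bom is hypothetical and not part
of Boost.Serialization. The sketch assumes a seekable narrow (byte) stream;
on a wide stream with a UTF-8 converting facet, the BOM would normally
surface as the single code point U+FEFF rather than as three separate values.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not a Boost API): skip a leading UTF-8 BOM.
// Returns true only if the next three bytes are exactly EF BB BF, in which
// case they are consumed. In every other case the read position is restored.
// Assumption: the stream is seekable (e.g. a file or string stream).
bool skip_utf8_bom(std::istream& is)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

    const std::istream::pos_type start = is.tellg();

    char buf[3] = { 0, 0, 0 };
    is.read(buf, 3);

    // Compare as unsigned char, since plain char may be signed.
    const bool matched =
        is.gcount() == 3 &&
        static_cast<unsigned char>(buf[0]) == bom[0] &&
        static_cast<unsigned char>(buf[1]) == bom[1] &&
        static_cast<unsigned char>(buf[2]) == bom[2];

    if (!matched) {
        is.clear();       // a short read sets eofbit and failbit
        is.seekg(start);  // rewind so the caller sees the original data
    }
    return matched;
}
```

With this shape, input that merely begins with 0xEF (for example some other
three-byte UTF-8 sequence) is passed through unchanged instead of tripping an
assertion.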

Frank Mori Hess