[serialization] Add UTF-8 BOM support to xml_warchive

15 Apr 2011

Hi,

Recently I ran into the problem that the boost::serialization library could not handle XML files which contain the three UTF-8 BOM (Byte Order Mark) bytes. The serialization library creates XML files without the BOM bytes but when saving such files in an external Windows program, for example XML Notepad, these bytes are automatically added. Thereafter the XML file cannot be read anymore by the boost::serialization library.

According to Wikipedia (http://en.wikipedia.org/wiki/Byte_order_mark), the UTF-8 BOM is optional, so creating XML files without the BOM bytes is all right.
However, reading should be extended so that it handles both kinds of files, with and without the BOM.

I propose extending xml_warchive and text_warchive so that they accept the BOM bytes when reading.
Example:

namespace
{
    const wchar_t    g_cchUtf8Bom1   = 0xEF;
    const wchar_t    g_cchUtf8Bom2   = 0xBB;
    const wchar_t    g_cchUtf8Bom3   = 0xBF;
}

void CheckAndCorrectUtf8Bom(std::wifstream* pifs)
{
    _ASSERT_POINTER(pifs);

    wchar_t  chUtf8Bom1   = 0;
    wchar_t  chUtf8Bom2   = 0;
    wchar_t  chUtf8Bom3   = 0;

    chUtf8Bom1 = pifs->peek();
    if (chUtf8Bom1 == g_cchUtf8Bom1)
    {
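        // Use unformatted get() rather than operator>>, which skips
        // whitespace by default and could consume bytes that are not
        // part of the BOM.
        pifs->get(chUtf8Bom1);
        _ASSERT(chUtf8Bom1 == g_cchUtf8Bom1);
        pifs->get(chUtf8Bom2);
        _ASSERT(chUtf8Bom2 == g_cchUtf8Bom2);
        pifs->get(chUtf8Bom3);
        _ASSERT(chUtf8Bom3 == g_cchUtf8Bom3);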
    }
    else
    {
        // Reset to start of the stream
        pifs->seekg(0, std::ios_base::beg);
    }
}
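
For comparison, the same check can also be done on a narrow, byte-oriented stream, before any wide-character conversion takes place. The following is only a sketch under assumptions: SkipUtf8Bom and SkipUtf8BomExample are hypothetical names that are not part of boost::serialization, and the stream is assumed to be seekable (a file or string stream). Unlike the function above, it restores the read position when only part of the BOM matches.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of boost::serialization.
// Skips a leading UTF-8 BOM (bytes EF BB BF) on a byte-oriented stream.
// Returns true if a BOM was consumed; otherwise restores the read position.
// Assumption: the stream is seekable (a file or string stream).
inline bool SkipUtf8Bom(std::istream& is)
{
    const std::istream::pos_type start = is.tellg();

    char buf[3] = { 0, 0, 0 };
    is.read(buf, 3);   // unformatted read, does not skip whitespace

    const bool hasBom = is.gcount() == 3
        && static_cast<unsigned char>(buf[0]) == 0xEF
        && static_cast<unsigned char>(buf[1]) == 0xBB
        && static_cast<unsigned char>(buf[2]) == 0xBF;

    if (!hasBom)
    {
        is.clear();        // a short read sets eofbit/failbit
        is.seekg(start);   // rewind so the caller sees the original data
    }
    return hasBom;
}

// Small usage check for the helper above.
inline void SkipUtf8BomExample()
{
    std::istringstream withBom(std::string("\xEF\xBB\xBF") + "<a/>");
    assert(SkipUtf8Bom(withBom));
    assert(withBom.peek() == '<');

    std::istringstream plain("<a/>");
    assert(!SkipUtf8Bom(plain));
    assert(plain.peek() == '<');
}
```

Such a helper would be called on the file stream before it is handed to the archive, so the archive itself never sees the BOM bytes.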

Kind regards,
Tijmen van Voorthuijsen

-----------------------------------------------------------------------------------
T. van Voorthuijsen
Senior System Engineer

Noldus Information Technology bv
Nieuwe Kanaal 5
P.O. Box 268
6700 AG Wageningen
The Netherlands

Phone: +31-(0)317-473300
Fax: +31-(0)317-424496
E-mail:   T.van.Voorthuijsen@Noldus.nl
Web:      www.noldus.com
