
On Friday, April 15, 2011, Tijmen van Voorthuijsen wrote:
Recently I ran into the problem that the boost::serialization library could not handle XML files which contain the three UTF-8 BOM (Byte Order Mark) bytes.
I propose enhancing xml_warchive and text_warchive so that they accept a leading BOM when reading. Example:
namespace {
    const wchar_t g_cchUtf8Bom1 = 0xEF;
    const wchar_t g_cchUtf8Bom2 = 0xBB;
    const wchar_t g_cchUtf8Bom3 = 0xBF;
}

void CheckAndCorrectUtf8Bom(std::wifstream* pifs)
{
    _ASSERT_POINTER(pifs);

    wchar_t chUtf8Bom1 = 0;
    wchar_t chUtf8Bom2 = 0;
    wchar_t chUtf8Bom3 = 0;

    chUtf8Bom1 = pifs->peek();
    if (chUtf8Bom1 == g_cchUtf8Bom1) {
        *pifs >> chUtf8Bom1;
        _ASSERT(chUtf8Bom1 == g_cchUtf8Bom1);
        *pifs >> chUtf8Bom2;
        _ASSERT(chUtf8Bom2 == g_cchUtf8Bom2);
        *pifs >> chUtf8Bom3;
        _ASSERT(chUtf8Bom3 == g_cchUtf8Bom3);
    }
}
This logic seems wrong. Just because the first byte is 0xef doesn't mean it's necessarily a BOM.
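
One way to address this is to look at all three bytes before consuming anything, and to rewind if they are not exactly EF BB BF. The following is only a rough sketch under some assumptions: skip_utf8_bom is a hypothetical helper, not part of Boost.Serialization, and it works on a seekable narrow byte stream rather than on the std::wifstream used above, so that no locale conversion gets between the code and the raw bytes.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Boost.Serialization.
// Consumes a leading UTF-8 BOM only when all three bytes EF BB BF are present.
// On any mismatch (or a short read) the stream is rewound to where it started.
// Assumption: `in` is a seekable narrow stream, e.g. a std::ifstream opened
// with std::ios::binary, or a std::istringstream.
inline bool skip_utf8_bom(std::istream& in)
{
    static const unsigned char bom[3] = {0xEF, 0xBB, 0xBF};

    const std::istream::pos_type start = in.tellg();
    char buf[3] = {0, 0, 0};
    in.read(buf, 3);

    const bool is_bom = in.gcount() == 3
        && static_cast<unsigned char>(buf[0]) == bom[0]
        && static_cast<unsigned char>(buf[1]) == bom[1]
        && static_cast<unsigned char>(buf[2]) == bom[2];

    if (!is_bom) {
        in.clear();       // a short read sets eofbit and failbit
        in.seekg(start);  // give back whatever was read
    }
    return is_bom;
}

// Usage sketch: the first input carries a real BOM, the second merely
// starts with the byte 0xEF and must be left untouched.
inline void skip_utf8_bom_demo()
{
    std::istringstream with_bom(std::string("\xEF\xBB\xBF") + "<xml/>");
    assert(skip_utf8_bom(with_bom));
    assert(with_bom.peek() == '<');

    std::istringstream no_bom(std::string("\xEF") + "abc");
    assert(!skip_utf8_bom(no_bom));
    assert(no_bom.peek() == 0xEF);
}
```

With this shape the caller would strip the BOM from the underlying byte stream first and only then hand the stream to the archive. Whether that fits the wide archives depends on how their codecvt facet treats the BOM, which this sketch does not attempt to cover.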