Re: [Boost-users] How does boost.serialization do with BOM in text/xml files

what is BOM?
Probably "Byte Order Mark", see http://en.wikipedia.org/wiki/Byte-order_mark
Yes, that's what I meant.
I was testing demo_xml_load.cpp and demo_xml_save.cpp from the Boost.Serialization examples. After simply opening demo_save.xml (produced by demo_xml_save.exe) in XML Copy Editor (http://xml-copy-editor.sourceforge.net/) and saving it back, demo_xml_load.exe would crash. I compared the two files with WinMerge, which reported them as identical.
By studying the hex view, I later found the cause: the 3-byte UTF-8 BOM had been inserted at the beginning of the file. It does not change the data, and in many cases text editors ignore it.
I think Boost.Serialization should also handle this for all text files, including XML.
Tom

This is news to me. The wide-character text/XML archives use UTF-8. They do this by creating a stream with the utf8_codecvt_facet. I used this facet, it worked great, and I moved on. So you're way ahead of me on this. This would probably be easy to address in the xml_iarchive code or perhaps in the xml_grammar, but, as I said, I don't know anything about it.
Robert Ramey
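Until the library handles this itself, one possible workaround on the caller's side is to consume the BOM before the archive ever sees the stream. The following is only a minimal sketch under stated assumptions: skip_utf8_bom and content_after_bom are hypothetical helper names, not part of Boost.Serialization, and the stream is assumed to be seekable (a file or string stream), since the helper rewinds when no BOM is found.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper (not a Boost API): consumes a leading UTF-8 BOM
// (bytes EF BB BF) if one is present and returns true. Otherwise the
// stream is rewound to where it started and false is returned.
// Assumption: the stream supports tellg/seekg.
bool skip_utf8_bom(std::istream& in) {
    static const char bom[3] = {'\xEF', '\xBB', '\xBF'};
    const std::istream::pos_type start = in.tellg();

    char buf[3] = {0, 0, 0};
    in.read(buf, 3);
    if (in.gcount() == 3 && std::equal(buf, buf + 3, bom)) {
        return true;  // BOM consumed; stream now points at the real content
    }

    // No BOM (or fewer than 3 bytes available): clear any eof/fail state
    // set by the short read and go back to the starting position.
    in.clear();
    in.seekg(start);
    return false;
}

// Small illustration: returns everything in `text` that follows an
// optional leading BOM.
std::string content_after_bom(const std::string& text) {
    std::istringstream in(text);
    skip_utf8_bom(in);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

In the demo, the same call would presumably go right after opening the std::ifstream and before constructing boost::archive::xml_iarchive on it; that combination has not been tested here.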
participants (2)
- Robert Ramey
- Tan, Tom (Shanghai)