[serialization][documentation] character encoding oddity

Hi,
http://boost.org/libs/serialization/doc/contents.html shows three strange characters at the top of the page (ï»¿).
Apparently, those characters have the values EF BB BF in hexadecimal notation. Interpreted as a UTF-8 sequence, they encode the Unicode code point U+FEFF. This code point is used as a marker for the endianness in encodings other than UTF-8. I doubt such a marker would be legal in or would make sense for a UTF-8 encoded document. Also, the document's content-type meta-tag claims the document uses ISO-8859-1.
I suspect there's a problem in the toolchain used to create the document (otherwise, I would have fixed it in the repository).
Regards,
m
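As a quick check of the byte values discussed above, here is a minimal Python sketch using only the standard library. It decodes the three bytes once as UTF-8 and once as ISO-8859-1 (the charset the meta tag declares). The variable names are illustrative. Decoded as UTF-8, the bytes give the single code point U+FEFF; U+FFFE is its byte-swapped counterpart.

```python
import unicodedata

raw = bytes([0xEF, 0xBB, 0xBF])  # the three bytes at the top of the page

# Read as UTF-8, the three bytes form a single code point.
as_utf8 = raw.decode("utf-8")  # the plain "utf-8" codec keeps a leading BOM
codepoint = f"U+{ord(as_utf8):04X}"
bom_name = unicodedata.name(as_utf8)

# Read as ISO-8859-1, every byte is a separate visible character,
# which is why a browser honouring the meta tag shows three odd glyphs.
as_latin1 = raw.decode("iso-8859-1")
latin1_names = [unicodedata.name(ch) for ch in as_latin1]

print(codepoint, bom_name)
for glyph_name in latin1_names:
    print(glyph_name)
```

The second reading is the one that matches "three strange characters" on a page labelled ISO-8859-1.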

Martin Wille wrote:
Hi,
http://boost.org/libs/serialization/doc/contents.html shows three strange characters at the top of the page (ï»¿).
Apparently, those characters have the values EF BB BF in hexadecimal notation. Interpreted as a UTF-8 sequence, they encode the Unicode code point U+FEFF. This code point is used as a marker for the endianness in encodings other than UTF-8. I doubt such a marker would be legal in or would make sense for a UTF-8 encoded document.
I believe that the BOM (byte order mark) is legal in all Unicode encodings, including UTF-8.

Peter Dimov wrote:
Martin Wille wrote:
[...]
I doubt such a marker would be legal
in or would make sense for a UTF-8 encoded document.
I believe that the BOM (byte order mark) is legal in all Unicode encodings, including UTF-8.
Apparently, it is legal. However, it doesn't carry any useful information and not all software is able to deal with the BOM in UTF-8. See http://www.unicode.org/faq/utf_bom.html#29
Regards,
m

Martin Wille wrote: <snip>
Apparently, it is legal. However, it doesn't carry any useful information and not all software is able to deal with the BOM in UTF-8.
I believe it is intended to signal that the text is UTF encoded, rather than, say, ISO-8859-15. And most of the software that can't deal with BOMs isn't too happy with UTF anyway. In Firefox, choosing UTF-8 encoding explicitly makes it disappear. It should figure it out automatically, but obviously the meta tag is discouraging it.
-- don't quote this
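The idea of treating the BOM as a UTF-8 signature can be sketched in a few lines of Python. This is only an illustration, not how Firefox or any other browser actually decides. `pick_encoding` is a hypothetical helper, and the sample page bytes are made up. `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the standard library.

```python
import codecs


def pick_encoding(data: bytes, declared: str) -> str:
    """Hypothetical helper: let a UTF-8 BOM override the declared charset."""
    if data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" decodes UTF-8 and drops a leading BOM if present.
        return "utf-8-sig"
    return declared


# Made-up page content: a BOM followed by ASCII markup.
page = codecs.BOM_UTF8 + b"<html>"

# Trusting the declared charset yields three stray characters up front.
naive = page.decode("iso-8859-1")

# Honouring the signature yields clean text.
text = page.decode(pick_encoding(page, "iso-8859-1"))

print(len(naive), len(text))
```

With the signature honoured, the decoded text starts directly at the markup; without it, the three extra characters end up in the page.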

In article <00b901c5f137$be02d880$6401a8c0@pdimov2>, "Peter Dimov" <pdimov@mmltd.net> wrote:
Martin Wille wrote:
http://boost.org/libs/serialization/doc/contents.html shows three strange characters at the top of the page (ï»¿).
I believe that the BOM (byte order mark) is legal in all Unicode encodings, including UTF-8.
It is. Ben -- I changed my name: <http://periodic-kingdom.org/People/NameChange.php>

Martin Wille wrote:
Hi,
http://boost.org/libs/serialization/doc/contents.html shows three strange characters at the top of the page (ï»¿).
Apparently, those characters have the values EF BB BF in hexadecimal notation. Interpreted as a UTF-8 sequence, they encode the Unicode code point U+FEFF. This code point is used as a marker for the endianness in encodings other than UTF-8. I doubt such a marker would be legal in or would make sense for a UTF-8 encoded document. Also, the document's content-type meta-tag claims the document uses ISO-8859-1.
I suspect there's a problem in the toolchain used to create the document (otherwise, I would have fixed it in the repository).
FYI, the toolchain used is the Windows Notepad editor, with the file saved as UTF-8.
Robert Ramey
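If the BOM comes from the editor's "save as UTF-8" step, one possible workaround is to strip it after saving. The Python sketch below is an assumption-laden illustration, not the fix actually applied to the Boost documentation. `strip_utf8_bom` is a hypothetical helper, and the demo file is created in a temporary directory rather than touching any real document.

```python
import codecs
import tempfile
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Hypothetical helper: remove a leading UTF-8 BOM in place.

    Returns True if the file was rewritten, False if it had no BOM.
    """
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


# Demo on a throwaway file that mimics an editor-saved page.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "contents.html"
    demo.write_bytes(codecs.BOM_UTF8 + b"<html></html>")
    changed = strip_utf8_bom(demo)       # first pass removes the BOM
    changed_again = strip_utf8_bom(demo)  # second pass finds nothing to do
    cleaned = demo.read_bytes()

print(changed, changed_again, cleaned)
```

Note that stripping the BOM only removes the stray characters. The page's meta tag would still need to match the encoding the file is really stored in.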
participants (6)
- Ben Artin
- Martin Wille
- Peter Dimov
- Robert Ramey
- Simon Buchan
- Vladimir Prus