On 14/08/15 23:47, Tom Kent wrote:
Recently there was a thread that ended up changing the boost guidelines so that Unicode characters are now allowed in C++ source files. http://lists.boost.org/Archives/boost/2015/06/223822.php
However, in the 1.59 release, there was a filename that had Unicode characters in it: libs\preprocessor\doc\Appendix A An Introduction to Preprocessor Metaprogramming.html. URL-encoded (percent-encoded), it actually looks like: Appendix%20A%20%C2%A0%20An%20Introduction. Note the %C2%A0 sequence (hex C2 A0, octal 302 240; Windows displays: Â ).
That is the UTF-8 encoding of U+00A0 NO-BREAK SPACE. You're interpreting those two bytes as Windows-1252, hence the gibberish.
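To make the difference concrete, here is a small Python sketch (not something from the Boost tree, just an illustration) that decodes the same two bytes both ways and then percent-decodes the filename fragment quoted above. The variable names are mine, and the fragment is the shortened one from the original message, not the full filename.

```python
import unicodedata
from urllib.parse import unquote

# The two bytes that appear as %C2%A0 in the percent-encoded name.
raw = b"\xc2\xa0"

# Correct interpretation: one code point, U+00A0.
as_utf8 = raw.decode("utf-8")

# Wrong interpretation: each byte is treated as a separate Windows-1252
# character, which yields two code points.
as_cp1252 = raw.decode("cp1252")

utf8_names = [unicodedata.name(ch) for ch in as_utf8]
cp1252_names = [unicodedata.name(ch) for ch in as_cp1252]
print(utf8_names)
print(cp1252_names)

# Same comparison on the (shortened) filename fragment from the message.
fragment = "Appendix%20A%20%C2%A0%20An%20Introduction"
name_utf8 = unquote(fragment)                       # unquote defaults to UTF-8
name_cp1252 = unquote(fragment, encoding="cp1252")  # the misreading
print(ascii(name_utf8))
print(ascii(name_cp1252))
```

The Windows-1252 reading produces an extra "Â" in front of the no-break space, which is the stray character showing up on the Windows side.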
Since this seems like a mistake, I've created a pull request for this in preprocessor. However, it raises the question:
Should we support Unicode code points for filenames in the Boost distribution?
Not for code obviously, but for files that are automatically generated based on the content of other files, like documentation, I don't see a problem.