Unicode characters in filenames

Recently there was a thread that ended up changing the Boost guidelines so that Unicode characters are now allowed in C++ source files: http://lists.boost.org/Archives/boost/2015/06/223822.php

However, the 1.59 release contains a filename with Unicode characters in it: libs\preprocessor\doc\Appendix A An Introduction to Preprocessor Metaprogramming.html. Percent-encoded, it actually looks like: Appendix%20A%20%C2%A0%20An%20Introduction. Note the %C2%A0 sequence (hex C2A0, octal 302 240, Windows displays: Â ). Since this seems like a mistake, I've created a pull request for it in preprocessor.

However, it raises the question: should we support Unicode code points in filenames in the Boost distribution? I would like the answer to be 'no', as there are still lots of tools out there that don't correctly handle Unicode filenames. However, it is worth bringing up the discussion. Is there a reason we would want Unicode filenames? I would guess that tests use them (especially the filesystem tests), but I would also expect those tests to generate the files on the fly, so that they aren't part of what is distributed.

Thoughts?

Tom Kent

On 14/08/15 23:47, Tom Kent wrote:
This is UTF-8 for U+00A0 NO-BREAK SPACE. You're wrongly interpreting that data as Windows-1252, hence the gibberish.
Not for code obviously, but for files that are automatically generated based on the content of other files, like documentation, I don't see a problem.
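
To make the point about the two readings of %C2%A0 concrete, here is a minimal Python sketch. It assumes only the percent-encoded fragment quoted in the original message; the variable names are illustrative and not taken from any Boost tooling.

```python
from urllib.parse import unquote_to_bytes

# Percent-encoded fragment quoted earlier in the thread.
encoded = "Appendix%20A%20%C2%A0%20An%20Introduction"
raw = unquote_to_bytes(encoded)  # raw bytes, including 0xC2 0xA0

as_utf8 = raw.decode("utf-8")     # the intended reading
as_cp1252 = raw.decode("cp1252")  # the Windows-1252 misreading

# UTF-8: the two bytes form a single U+00A0 NO-BREAK SPACE.
print([hex(ord(c)) for c in as_utf8 if ord(c) > 0x7F])    # ['0xa0']
# Windows-1252: each byte becomes its own character, U+00C2 then U+00A0.
print([hex(ord(c)) for c in as_cp1252 if ord(c) > 0x7F])  # ['0xc2', '0xa0']
```

The extra U+00C2 in the second reading is the stray character that shows up when the name is displayed with the wrong code page.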

On 8/17/2015 3:21 PM, Mathias Gaunard wrote:
What happened is that I grabbed the file, Appendix A to the Boost MPL book as an HTML page, which used to be hosted at Boostpro, from the Internet Archive's Wayback Machine. Then I massaged it a bit to remove all the Wayback Machine cruft, but I think it already had the Unicode in it when I grabbed it. After that I copied the title from that page to links in two other preprocessor pages, thus propagating the Unicode. The GUI HTML editor I was using never flagged the Unicode as anything unusual, so I really didn't see it. I have applied Tom Kent's PR to 'develop' and will no doubt merge it to 'master' fairly shortly. The preprocessor docs were all written by Paul directly as HTML and it would be too much work at this point to change them to Quickbook, although I love the latter.

I've looked at this and I don't think it would take that much time for me to convert to Quickbook - but I'm not sure of the benefit apart from having a familiar look'n'feel - unless we wanted to change things significantly. It looks good and very comprehensive. Paul --- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 (0) 1539 561830

On 18 August 2015 at 23:56, Edward Diener wrote:
The file still has lots of ASCII spaces in the name:

    doc/Appendix A - An Introduction to Preprocessor Metaprogramming.html

This breaks Fedora packaging because we do:

    find $docdir ... | xargs install -p -m 644 -t $docpath

and xargs splits on spaces. I know I can fix it with -print0 and -0, but for the sake of KISS could it be just appendix.html, or intro.html or something?
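
Both concerns raised in this thread (whitespace and non-ASCII characters in distributed filenames) could be checked mechanically before a release. The following is a rough Python sketch of such a check, not an existing Boost or Fedora tool; the function name problem_names and the demo file names are hypothetical.

```python
import os
import tempfile


def problem_names(root):
    """Yield (path, reasons) for names with whitespace or non-ASCII characters."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            reasons = []
            # str.isspace() is also true for U+00A0 NO-BREAK SPACE.
            if any(ch.isspace() for ch in name):
                reasons.append("whitespace")
            if not name.isascii():
                reasons.append("non-ASCII")
            if reasons:
                yield os.path.join(dirpath, name), reasons


# Small demo in a temporary directory (hypothetical file names).
with tempfile.TemporaryDirectory() as demo_root:
    for demo_name in ("intro.html", "Appendix A - An Introduction.html"):
        open(os.path.join(demo_root, demo_name), "w").close()
    flagged = sorted(
        (os.path.basename(path), reasons)
        for path, reasons in problem_names(demo_root)
    )

print(flagged)  # [('Appendix A - An Introduction.html', ['whitespace'])]
```

A name such as intro.html passes untouched, while the current appendix name is reported because of its spaces, which is exactly what a whitespace-splitting xargs pipeline trips over.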

participants (5)
- Edward Diener
- Jonathan Wakely
- Mathias Gaunard
- Paul A. Bristow
- Tom Kent