[boost] [General] Treat narrow strings as UTF-8 (compilation flag)

5 Jul 2011

Hello all,

About half a year ago there was a long discussion titled "Always treat
std::strings as UTF-8". The only objection to the proposal was that an
instant switch to assuming UTF-8 by default would give surprising results
to those who are unaware of the convention (or who prefer legacy encodings
to UTF-8). This concern applies almost exclusively to Windows developers.
However, many projects and organizations have already switched to UTF-8,
even for Windows programming. The company I work for is one of them.

Nowadays:
=========

All the libraries that accept narrow strings assume the system encoding by
default:
* filesystem::path - can be configured through the static imbue() function.
* system_error_category (Windows error descriptions), interprocess (object
names)... more? - don't support Unicode at all. They use the narrow API on
Windows.
* program_options - assumes UTF-8 for internal data (good!), but uses the
system encoding for paths (parse_config_file) and for environment variables
(bad...).
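
To make concrete what Unicode support would involve for the libraries
above: a narrow string that is treated as UTF-8 has to be widened to UTF-16
before it can be passed to the wide Windows API. Below is a minimal,
portable sketch of that step. The helper name utf8_to_utf16 is hypothetical
(it is not a Boost function); on Windows, MultiByteToWideChar with CP_UTF8
does the same job.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of Boost: decode UTF-8 into UTF-16 code units.
// Rejects bad lead bytes, truncated sequences, bad continuation bytes,
// overlong forms, surrogate code points, and values above U+10FFFF.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and the first payload bits.
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        assert(len >= 1 && len <= 4);

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Smallest code point that legitimately needs a sequence of `len` bytes.
        static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong or out-of-range code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points outside the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

For example, utf8_to_utf16("\xC3\xA9") yields the single code unit U+00E9,
while the same two bytes interpreted in a legacy single-byte code page would
become two unrelated characters.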

Note that a per-library hook such as path::imbue() is a painful solution for
two reasons:
* Any global state initialization is problematic in dynamically linked,
multi-threaded systems (like the one I'm maintaining now). In such cases a
compile-time configuration is more attractive.
* I really don't want to have such a function in each boost library (though
this could be solved by having a single global boost::imbue).

Proposal:
=========

Add a compile-time configuration flag that causes boost to treat all narrow
strings as UTF-8. The flag will be off by default.
For example, in filesystem it's a matter of setting `codepage` to CP_UTF8 in
just two places.
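
A rough sketch of how such a switch might look, under stated assumptions:
the macro name EXAMPLE_NARROW_IS_UTF8 and the function narrow_codepage()
are made up for illustration and do not exist in boost. The two constants
repeat the numeric values of the Windows macros CP_ACP and CP_UTF8 so that
the sketch compiles without <windows.h>.

```cpp
// Hypothetical flag: off unless the build defines it,
// e.g. with -DEXAMPLE_NARROW_IS_UTF8 on the compiler command line.

namespace sketch {

// Same numeric values as the Windows macros CP_ACP and CP_UTF8.
constexpr unsigned int cp_acp  = 0;      // system default ANSI code page
constexpr unsigned int cp_utf8 = 65001;  // UTF-8

// The code page a library would hand to its narrow <-> wide conversions.
// Chosen at compile time, so there is no global state to initialize.
constexpr unsigned int narrow_codepage() {
#if defined(EXAMPLE_NARROW_IS_UTF8)
    return cp_utf8;
#else
    return cp_acp;
#endif
}

} // namespace sketch
```

Because the choice is baked in at compile time, every module built with the
same flag agrees on the encoding without any runtime setup call.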

Rationale:
==========

Those who are ready to move to the UTF-8 future can do so simply by
setting a compilation flag.
Those who don't care about Unicode correctness are not affected by the
addition. There won't be any complaints to boost, like: "Hey! I use boost
with these libraries and it doesn't work. Your encoding is wrong!".

-- 
Yakov Galka