
Hello again,

My previous mail was ignored by the community, and I would like to know why. In case it wasn't clear: I want to hear your opinion on the topic. If you disagree, I would like to know the reason. If there are problems with the proposal, perhaps we can fix them and arrive at a solution acceptable to all. If you agree in principle but just don't have the resources for this work, I am willing to do it (or part of it) myself. I just don't want to waste my time on something that is certain to be rejected.

Thank you in advance,
-- Yakov Galka

On Tue, Jul 5, 2011 at 19:25, Yakov Galka <ybungalobill@gmail.com> wrote:
Hello All,
About half a year ago there was a long discussion titled "Always treat std::strings as UTF-8". The only objection to the proposal was that an instant switch to assuming UTF-8 by default would give surprising results to those who are unaware of the convention (or who prefer legacy encodings over UTF-8). This affects Windows developers almost exclusively. However, many projects and organizations have already switched to UTF-8, even for Windows programming. The company I work for is one of them.
Nowadays:
==========
All the libraries that accept narrow strings assume the system encoding by default.

* filesystem::path: can be configured through the static imbue() function.
* system_error_category (Windows error descriptions), interprocess (object names), and possibly more: these don't support Unicode at all. They use the narrow API on Windows.
* program_options: assumes UTF-8 for internal data (good!), but uses the system encoding for paths (parse_config_file) and for environment variables (bad...).
Note that path::imbue(), for example, is a painful solution for two reasons:

* Any global state initialization is problematic in dynamically linked, multi-threaded systems (like the one I'm maintaining now). In such cases a compile-time configuration is more attractive.
* I really don't want to have such a function in each Boost library (although this could be solved by having a single global boost::imbue).
Proposal:
========
Add a compile-time configuration flag that causes boost to treat all narrow strings as UTF-8. The flag will be off by default. For example, in filesystem it's a matter of setting `codepage` to CP_UTF8 in just two places.
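To illustrate the idea, here is a minimal sketch of how such a flag could select the code page at compile time. The macro name EXAMPLE_NARROW_IS_UTF8 and the helper narrow_codepage() are hypothetical names made up for this example; they are not existing Boost identifiers. The two constants mirror the values of CP_ACP and CP_UTF8 from <windows.h> so that the sketch compiles on any platform.

```cpp
// Sketch only: EXAMPLE_NARROW_IS_UTF8 and narrow_codepage() are
// hypothetical names, not part of Boost.
//
// Uncomment (or pass -DEXAMPLE_NARROW_IS_UTF8) to opt in to UTF-8.
// The flag is off by default, as in the proposal.
// #define EXAMPLE_NARROW_IS_UTF8

namespace example {

// Windows code page identifiers, same values as in <windows.h>.
constexpr unsigned cp_acp  = 0;      // CP_ACP: the system ANSI code page
constexpr unsigned cp_utf8 = 65001;  // CP_UTF8

// Code page a library would use when converting narrow strings to
// wide strings, e.g. as the CodePage argument of MultiByteToWideChar.
constexpr unsigned narrow_codepage()
{
#ifdef EXAMPLE_NARROW_IS_UTF8
    return cp_utf8;
#else
    return cp_acp;
#endif
}

} // namespace example
```

Since the choice is made by the preprocessor, there is no global state to initialize at run time, which is the point of preferring a flag over imbue().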
Rationale:
==========
* Those who are ready to move to the UTF-8 future can do so by simply setting a compilation flag.
* Those who don't care about Unicode correctness are not affected by the addition.
* There won't be any complaints to Boost like: "Hey! I use boost with these libraries and it doesn't work. Your encoding is wrong!".
-- Yakov Galka