
Hi,

On Thu, Jan 13, 2011 at 8:21 PM, Artyom <artyomtnk@yahoo.com> wrote:
Hello All,
I have wanted to talk about this for a loooooong time, but never got around to it.
-------------------------------------------------
Proposal Summary:
===================

- We need to treat std::string and char const * as UTF-8 strings on Windows and drop support for the so-called ANSI API.
- Optional but recommended: deprecate wide strings as an unportable API.
Fully agree. Two years ago I would very probably have been advocating some kind of TCHAR/wxChar/QChar/whatever-like character type switching, but since then I've spent a lot of time developing portable GUI applications and found out the hard way that it is better to dump all the ANSI CPXXXX / UTF-XY encodings, stick to UTF-8, and defer the conversion to whatever the native API uses until you make the actual call.

a) UTF-16 is OK in principle, but many implementations are not:
http://stackoverflow.com/questions/1049947/should-utf-16-be-considered-harmf...
b) UTF-32 is basically a waste of memory for most localizations.
[snip]
Suggestion:
===========

Char Strings
------------
- Under POSIX platform:
Treat them as byte sequences in the current locale; by default assume that they are UTF-8, because:
a) The default locale on most OSs is a UTF-8 locale.
b) The POSIX API does not care about encodings.
Even if the locale is not UTF-8 you still can do anything right as
- Under Windows platform:
a) Treat them as UTF-8 strings and convert them to UTF-16 just before accessing system services.
b) Never use the ANSI API; always use the Wide API. UTF-16 is the default internal encoding anyway.
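To make the Windows rule above concrete (keep UTF-8 internally, convert to UTF-16 only at the call boundary), here is a minimal sketch of such a conversion. The function name utf8_to_utf16 is hypothetical and is not taken from Boost or from any library discussed in this thread; it returns a std::u16string so that it compiles on any platform, and it simply throws on malformed input.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (an assumption for illustration, not an existing API):
// decode a UTF-8 byte string into UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes;
    // anything below that is an overlong encoding.
    static const std::uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;

        // The lead byte determines the sequence length and the first payload bits.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte has the form 10xxxxxx and adds 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong forms, surrogate code points and values past U+10FFFF.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits wide, the same logic could fill a std::wstring whose c_str() is handed to a wide API function; the system's own MultiByteToWideChar with CP_UTF8 is the other obvious way to do the conversion there.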
Wide String:
------------
- Deprecate them, unless you have something tied to Windows system API.
+1, IMO having two APIs that are not seamlessly interchangeable in the code (at least with the macro trickery) is useless.

[snip]
What problem would this solve for us?
=====================================
1. All standard APIs support Unicode naturally, as they are supposed to.
- Want to open boost::filesystem::fstream?
- Want to pass parameters to another process?
- Want to display a message?
- Want to read XML or JSON?
It all works with Unicode by default because:
a) It is Unicode by default on Unix.
b) They are mapped to the wide API on Windows.
2. A portable program no longer needs to worry about setting standard locale facets, etc.
The program becomes much more portable.
3. Fewer bugs related to Unicode handling.
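As an illustration of point 1 (a narrow-string API that is mapped to the wide API on Windows and passed through unchanged elsewhere), a file-opening wrapper might look roughly like the sketch below. The name fopen_utf8 is hypothetical and not an existing Boost or standard function; the Windows branch assumes the Win32 MultiByteToWideChar call and the Microsoft CRT's _wfopen, and error handling is reduced to returning a null pointer.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper (an assumption for illustration): open a file whose
// name is given as UTF-8, regardless of platform.
std::FILE* fopen_utf8(const std::string& path, const std::string& mode)
{
#ifdef _WIN32
    // Convert UTF-8 to UTF-16 right at the call site, then use the wide CRT call.
    auto widen = [](const std::string& s) -> std::wstring {
        if (s.empty()) return std::wstring();
        const int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                          s.data(), static_cast<int>(s.size()),
                                          nullptr, 0);
        if (n <= 0) return std::wstring();  // invalid UTF-8
        std::wstring w(static_cast<std::size_t>(n), L'\0');
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            s.data(), static_cast<int>(s.size()), &w[0], n);
        return w;
    };
    const std::wstring wpath = widen(path);
    const std::wstring wmode = widen(mode);
    if (wpath.empty() || wmode.empty()) return nullptr;
    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    // POSIX does not interpret the bytes of a file name, so UTF-8 passes through.
    return std::fopen(path.c_str(), mode.c_str());
#endif
}
```

The caller only ever sees std::string, which is the point of the proposal: the encoding difference is confined to this one function.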
Artyom
+1, but from my experience it is easier said than done. My knowledge of Unicode and UTF-8 is little more than superficial and I haven't done a lot of char-by-char manipulation, but to do what you are proposing we need at least some straightforward (and efficient) way to convert the native strings to the required encoding at the call site. I'm not trying to nitpick anyone's implementation of a Unicode library here, but having to instantiate ~10 transcoding-related classes just to call ShellExecuteW is not my idea of straightforward. :)

[snip]

BR, Matus