
On Fri, May 16, 2008 at 11:48:30AM +0200, Matus Chochlik wrote:
On Fri, May 16, 2008 at 11:22 AM, Jens Seidel <jensseidel@users.sf.net> wrote:
Stupid question: Do you really use the UTF-16 Unicode encoding on Linux? I know about some classical Asian 16-bit encodings, but these days UTF-8 (which is compatible with char*) is used everywhere on Linux ...
... but couldn't this issue be solved by defining a portable equivalent of the TCHAR type, which is used consistently by WINAPI and whose real char type is switched at compile time by means of the "UNICODE" PP symbol?
No, I don't think so. First, besides the type you also have to support initialisation of and access to the type. As far as I know (I have never really used wchar_t) it is:

  const char *text = "Hi world";
  const wchar_t *text = L"Hi world";

How do you want to know whether you need the "L" if you just have a new type? What about functions/methods which do not exist for both types? You would always have to write

  #ifdef ...
  #else ...
  #endif

Can't even UTF-8 data be stored in wchar_t (first byte is always zero)?

I think supporting both types together with different encodings in one program is just asking for trouble. Use a fixed encoding and one of char or wchar_t across your whole program and you simplify your code a lot. With wrappers which convert your data, e.g. from UTF-16 wchar_t to UTF-8 char after calling string functions on Win*, you may have a slowdown but also a compatible program.
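To make the conversion-wrapper idea concrete, here is a minimal sketch of a UTF-16 to UTF-8 converter. It is only an illustration under stated assumptions: the function name utf16_to_utf8 is hypothetical, it assumes a C++11 compiler, and it takes std::u16string rather than wchar_t because wchar_t holds UTF-16 code units only on Windows. Unpaired surrogates are replaced with U+FFFD, which is one possible policy among several.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from Boost or WINAPI): convert UTF-16 code units
// to a UTF-8 byte string. Unpaired surrogates become U+FFFD.
inline std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: needs a following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;  // consume the low surrogate as well
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // stray low surrogate
        }
        // Encode the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

A wrapper around a wide-character API would call such a function on the result, so the rest of the program only ever sees UTF-8 char data.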
TCHAR is wchar_t or char depending on whether UNICODE is or isn't defined.
UNICODE is a very bad name! The size of the type (char, wchar_t) could depend on the encoding (UTF-8, UTF-16, ...), not the character set (Unicode)!
Boost library functions would use this *boost-char-type* (whatever its name would be) instead of char or wchar_t, where applicable.
On Windows this allows the same WIN32 "functions" to be used with both character types and allows an application (when coded properly) to be compiled with either character type without having to touch the code.
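For illustration, a TCHAR-style switch could be sketched roughly as follows. The Windows headers pair TCHAR with a TEXT() / _T() macro that adds the L prefix to literals when needed; the sketch imitates that pairing. All names here (MY_USE_WIDE, my_char, MY_TEXT, my_string, greeting) are hypothetical and are not taken from Boost or WINAPI.

```cpp
#include <cassert>
#include <string>

// Hypothetical switch, playing the role of WINAPI's UNICODE symbol.
#ifdef MY_USE_WIDE
typedef wchar_t my_char;
#define MY_TEXT(s) L##s   // paste the L prefix onto the literal
#else
typedef char my_char;
#define MY_TEXT(s) s      // leave the literal narrow
#endif

// std::basic_string works with either character type.
typedef std::basic_string<my_char> my_string;

// Written once; compiles unchanged for char and for wchar_t.
inline my_string greeting() {
    return MY_TEXT("Hi world");
}
```

Every literal has to be wrapped in the macro by hand, and the sketch only selects the character type. It says nothing about which encoding the characters are in.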
I'm sorely missing something like this in the C++ standard or at least in Boost and I think I'm not the only one.
Please use a proper string class instead, one which is aware of its encoding and converts it only when needed. This really avoids any such problems and is portable. See e.g. Qt's QString class: http://doc.trolltech.com/4.4/qstring.html

Jens