Re: [boost] File System - unicode support

16 May 2008

On Fri, May 16, 2008 at 11:48:30AM +0200, Matus Chochlik wrote:
...
On Fri, May 16, 2008 at 11:22 AM, Jens Seidel <jensseidel@users.sf.net> wrote:
...
Stupid question: Do you really use the UTF-16 Unicode encoding
on Linux? I know about some classic Asian 16-bit encodings, but
these days UTF-8 (which is compatible with char*) is used
everywhere on Linux ...
...
... but couldn't this issue be solved by defining a portable
equivalent of the TCHAR type, which WINAPI uses consistently
and whose real character type is switched at compile time
by means of the "UNICODE" preprocessor symbol?
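For illustration, a minimal sketch of what such a type could look
like, assuming the UNICODE switch mentioned above. The names
boost_char, boost_string and BOOST_TEXT are hypothetical (Boost
defines nothing like them); on Windows the corresponding pieces are
TCHAR and the TEXT() macro. Note that a literal macro is needed as
well, so that the "L" prefix follows the type.

```cpp
#include <string>

// Hypothetical portable character type, switched at compile time by
// the UNICODE preprocessor symbol, in the spirit of WINAPI's TCHAR.
// These names are illustrative only; they are not part of Boost.
#ifdef UNICODE
typedef wchar_t boost_char;      // wide build
#define BOOST_TEXT(s) L##s       // pastes the L prefix onto the literal
#else
typedef char boost_char;         // narrow build
#define BOOST_TEXT(s) s          // leaves the literal unchanged
#endif

typedef std::basic_string<boost_char> boost_string;

// Code written against boost_char compiles unchanged in both builds.
inline boost_string greeting() {
    return boost_string(BOOST_TEXT("Hi world"));
}
```

Every literal must go through the macro, and any function that
exists for only one of the two types still needs its own #ifdef.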
No, I don't think so. First, besides the type you also have to
support initialisation of and access to the type. As far as I know
(I have never really used wchar_t) it is:
const char *text = "Hi world"; versus
const wchar_t *text = L"Hi world";

How would you know whether you need the "L" prefix if all you have
is a new type?

What about functions/methods which do not exist for both types?
You would always have to write #ifdef ... #else ... #endif.

Couldn't even UTF-8 data be stored in wchar_t (with the first byte
always zero)? I think supporting both types, with different encodings,
in one program is just asking for trouble. Use one fixed encoding and
one of char or wchar_t across your whole program, and your code
becomes much simpler. Together with wrappers which convert your data
(e.g. from UTF-16 wchar_t to UTF-8 char) after calling string
functions on Win*, you may get a slowdown, but also a compatible
program.
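A minimal sketch of such a wrapper, converting UTF-16 to UTF-8 by
hand. It uses char16_t rather than wchar_t so that it behaves the same
on every platform (wchar_t is 16 bits on Windows but usually 32 bits
on Linux). The function name utf16_to_utf8 is hypothetical; a real
Win32 program would more likely call WideCharToMultiByte with CP_UTF8.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-16 text (what wchar_t holds on
// Windows) to UTF-8, combining surrogate pairs into one code point.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;                                       // consumed the low surrogate
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        // Emit 1 to 4 UTF-8 bytes depending on the code point's range.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The program then keeps UTF-8 char data everywhere and pays the
conversion cost only at the Win* API boundary.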
...
TCHAR is wchar_t or char depending on whether UNICODE is or
UNICODE is a very bad name! The size of the type (char, wchar_t)
depends on the encoding (UTF-8, UTF-16, ...), not on the character
set (Unicode)!
...
isn't defined. Boost library functions would use this
*boost-char-type* (whatever its name would be)
instead of char or wchar_t, where applicable.
On Windows this allows the same WIN32 "functions" to be used
with both character types, and allows an application
(when coded properly) to be compiled with either character type
without having to mess with the code.
I sorely miss something like this in the C++ standard, or
at least in Boost, and I think I'm not the only one.
Please use a proper string class instead, one which is aware
of its encoding and converts on demand. This really avoids all
these problems and is portable. See e.g. Qt's QString class:
http://doc.trolltech.com/4.4/qstring.html
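The idea can be sketched as a small class that keeps one internal
representation and converts only at the boundaries. This is not
QString's API (QString stores UTF-16 internally); the class
unicode_text and its members are hypothetical names chosen for
illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical minimal encoding-aware string: it stores Unicode code
// points and converts only when a caller asks for a specific encoding.
class unicode_text {
public:
    // Latin-1 (ISO 8859-1) maps each byte directly to U+0000..U+00FF.
    static unicode_text from_latin1(const std::string& bytes) {
        unicode_text t;
        for (unsigned char c : bytes)
            t.cps_.push_back(static_cast<char32_t>(c));
        return t;
    }

    // Takes code points directly (UTF-32).
    static unicode_text from_utf32(const std::u32string& s) {
        unicode_text t;
        t.cps_ = s;
        return t;
    }

    // Number of characters (code points), independent of any encoding.
    std::size_t length() const { return cps_.size(); }

    // Encodes the stored code points as UTF-8 on demand.
    std::string to_utf8() const {
        std::string out;
        for (char32_t cp : cps_) {
            if (cp < 0x80) {
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }

private:
    std::u32string cps_;  // one element per Unicode code point
};
```

For example, from_latin1("caf\xE9") has length() 4, while its
to_utf8() result is 5 bytes long: the character count does not depend
on the output encoding.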

Jens