> I'd like to run a test here to be sure; could you please supply an example
> of <internationalized chars> that is causing the problem?
A filename containing a beta character “â” causes
GetFileAttributesA to fail to find the file specified on my system. It’s
possible that with a different codepage this would not be a problem.
> I'm not at all surprised you are having problems; it isn't at all clear to
> me that it is possible to do reliable processing on Windows using the A
> (narrow) variants if Unicode characters are present.
In fact, after experimenting with the alleged MBCS support on Windows, I've
discovered that it really is skin deep. I’ve not been able to get any char*
function to take a UTF-8 string, even those that claim to be affected by the
current multi-byte codepage. Anyone who has managed this – please feel
free to correct me! I’ve drilled down into the Rtl routines which seem to
deal with multi-byte characters, but there just doesn’t seem to be a way to
affect the bits that matter.
> I've got the internationalized revision of the Filesystem library running
> on Windows; it seems to be handling wide characters with ease.
I’ve done the same thing myself (I guess more or less the same
thing). I initially attempted it without changing the API by converting the
wide result to UTF-8, mostly to help keep a common code base with our other
platforms (Mac, Sparc, Linux, etc. - some of which are wide-character challenged),
before realizing the full extent of the MBCS issues outlined above. After all – if
I can’t construct a std::fstream with a UTF-8 argument…
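
For anyone following along, the "convert the wide result to UTF-8" step can be sketched roughly as below. This is only an illustration, not the code from my patched library: the name wide_to_utf8 is hypothetical, and it assumes wchar_t holds UTF-16 code units where it is 16 bits wide (Windows) and UTF-32 code points elsewhere.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Boost.Filesystem): encode a wide string
// as UTF-8. Assumption: 16-bit wchar_t means UTF-16, otherwise UTF-32.
std::string wide_to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);

        // On UTF-16 platforms, merge a surrogate pair into one code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF
            && i + 1 < in.size()) {
            std::uint32_t lo = static_cast<std::uint32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }

        // Emit one to four bytes depending on the code point's magnitude.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting std::string is fine for storage and for platforms whose narrow APIs are UTF-8 clean, but on Windows it still cannot be handed to the A functions, which is exactly the problem described above.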
> One fix for your problem is to switch to the internationalized version and
> use wchar_t based paths. That will use the W Windows variants, and will
> also work well on POSIX systems with internationalized file or directory
> names.
I’d be happy to try that version (and swap it for my own) – where can
I find it? My own hacked version is switchable, even for Windows, so that Win95
is supported without installing the Unicode layer (sigh).
> Another possibility is that we can work on the narrow functions to make
> them function better in the face of internationalized names. But I'd need a
> lot of help on that to develop test cases and strategies to handle them.
Since the underlying ANSI API variants do not support UTF-8, I just can’t
see how this example could ever be portable in the presence of Unicode without
wide chars:
std::ofstream file( boost::filesystem::current_path().native_file_string().c_str() );
It will finally call down to the ANSI function CreateFileA,
which just won’t be able to deal with it.
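
To illustrate what avoiding CreateFileA would involve, here is a rough sketch that decodes a UTF-8 path to a wide string and, on Windows only, hands it to CreateFileW. The names utf8_to_wide and create_file_utf8 are hypothetical, the decoder assumes well-formed UTF-8 and does not diagnose bad input, and none of this is taken from the Filesystem library itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: decode UTF-8 into a wide string.
// Assumptions: input is well-formed; 16-bit wchar_t means UTF-16 output.
std::wstring utf8_to_wide(const std::string& in)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        ++i;
        // Fold in the low six bits of each continuation byte.
        for (; extra > 0 && i < in.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);

        if (sizeof(wchar_t) == 2 && cp >= 0x10000) {
            // Split into a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}

#ifdef _WIN32
// Create a file named by a UTF-8 path through the wide API, so the
// active code page never gets a chance to mangle the name.
HANDLE create_file_utf8(const std::string& utf8_path)
{
    const std::wstring wide = utf8_to_wide(utf8_path);
    return ::CreateFileW(wide.c_str(), GENERIC_WRITE, 0, NULL,
                         CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
}
#endif
```

This only helps code that calls the Win32 API directly; a standard std::ofstream constructor still takes a narrow string, so the one-liner above would need a wide-aware stream or a wrapper of this kind.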
Then there is the performance penalty of ANSI<->Unicode conversion
under Windows NT.
Thanks for your time (and your work!)
Mark
P.S. One other minor suggestion whilst I’m in the filesystem area
(and I really don’t want to (re)open a can of worms here) is to be able
to choose the default path check at compile time.