
Hi, I've been stuck for the past month working on a Win32 i18n project that seems like it will never end. I don't have much background in this area, but I can answer a question or two.
First is that if you have a single path that stores Unicode, then
exists(path("foo"))
will perform a char -> wchar_t conversion inside the path constructor, and that conversion might not be exactly the same as the one the OS would have performed. One issue is that the program might not have initialized the global locale with locale(""). Another is that the conversion performed by the OS might differ from that of locale("").
So, one thing I know is that Windows 9x and NT-class systems behave differently in this respect. You're probably aware that NT-class systems traffic in wchar_t* encoded in UCS-2 internally, and that 9x-class systems deal with char* encoded in the system's ANSI codepage. Additionally, on Win2k/XP you have the ability to set a thread's ANSI codepage separately from the system's ANSI codepage.

So, I'm not 100% positive about this, but I believe an example of where locale() will differ from what Windows wants is the following case:

- The OS is Win2k/XP, which stores strings as UCS-2,
- The system and thread codepages differ,
- You initialize a path("foo"), requiring a conversion up to UCS-2.

I think in this case, locale() won't give you what you want. I'm no expert on this, though, so it's worth checking.
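To make the locale dependence concrete, here is a rough, portable sketch. It is not how path does it, as far as I know. widen_with is a name I made up: it widens char to wchar_t through the ctype facet of whatever locale you hand it, so the result depends on whether the program ever installed locale("") as the global locale. It assumes a single-byte narrow encoding; a real conversion would need codecvt or the OS's own routines.

```cpp
#include <locale>
#include <string>

// Hypothetical helper, not part of any path library: widen a narrow string
// one char at a time using the ctype<wchar_t> facet of the given locale.
// Assumption: the narrow encoding is single-byte. Multibyte input would
// need a codecvt facet (or an OS conversion routine) instead.
std::wstring widen_with(const std::string& narrow, const std::locale& loc)
{
    const std::ctype<wchar_t>& facet =
        std::use_facet<std::ctype<wchar_t> >(loc);

    std::wstring wide(narrow.size(), L'\0');
    if (!narrow.empty()) {
        // Range form of widen: converts [begin, end) into the output buffer.
        facet.widen(narrow.data(), narrow.data() + narrow.size(), &wide[0]);
    }
    return wide;
}
```

Calling widen_with("foo", std::locale()) uses the current global locale, which is the "C" locale unless the program has called std::locale::global(std::locale("")). For plain ASCII such as "foo" every locale should agree; any differences show up for bytes above 0x7F, which is where the system and thread codepages can disagree.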
path p("a"), p2(L"b"); p /= p2; // must do conversion, might not do what's desired
I think this is important to get right. Having path and wpath distinct from each other, and forcing explicit conversion, seems like exposing a choice to users in the interface that's entirely orthogonal to filesystem manipulation. My apologies if this has already been discussed ad nauseam, but it seems to me like the "do the right thing" string conversions should be encapsulated in a different library.
Also, I note that there's no conversion from basic_path<char> to basic_path<wchar_t> or vice versa, as far as I can tell. To recall my argument for conversion: say I have a library which exposes paths in its interface; should I use path or wpath in it?
What seems to be common practice on Windows is something like this:

typedef std::basic_string<TCHAR> tstring;

where TCHAR is a macro which expands either to "char" or "wchar_t" depending on whether _UNICODE is defined. This tends to be clumsy, in my opinion. I fear the same practice would be adopted for basic_path<>.

Cheers,
dr
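P.S. For anyone who hasn't run into the TCHAR idiom, here is a self-contained sketch of what it amounts to. tchar_t and T_LIT are stand-ins I made up so that this compiles without the Windows headers; on Windows the real names are TCHAR and _T from <tchar.h>.

```cpp
#include <string>

// Hypothetical stand-ins for TCHAR and _T() from <tchar.h>, so the sketch
// builds on any platform. The character type is chosen at compile time.
#ifdef _UNICODE
typedef wchar_t tchar_t;
#define T_LIT(x) L##x   // pastes an L prefix onto the literal: "foo" -> L"foo"
#else
typedef char tchar_t;
#define T_LIT(x) x      // leaves the literal narrow
#endif

typedef std::basic_string<tchar_t> tstring;

// Every literal has to go through the macro to match the chosen type.
inline tstring default_name()
{
    return tstring(T_LIT("foo"));
}
```

The clumsiness is that every literal and every string-taking function has to be wrapped or duplicated, and a library built one way can't easily be linked against a client built the other way.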