On 16.08.2022 at 05:15, Gavin Lambert via Boost wrote:
On 16/08/2022 11:53, Vinnie Falco wrote:
My experiences with std::filesystem and boost::filesystem have been nothing but negative. I think that the decision to make the character type different on Windows was a mistake. The need for locales and imbuements and global state and... really, it is just giving me a big headache.
Using wchar_t on Windows is actually the least painful option. (And you don't have to worry about locales and imbuements etc if you never try to convert to not-wchar_t.)
For correct behaviour, you *must* only use the W variants of the native API methods, or wchar_t methods of standard library functions.
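To make the "W variants" point concrete, here is a minimal sketch of a hypothetical `open_c_file` helper (not part of any library) that opens a C `FILE*` from a `std::filesystem::path`. It assumes the Microsoft CRT's `_wfopen` on Windows and plain `std::fopen` elsewhere. In both cases `path::c_str()` already yields the native character type, so the name is never converted to a narrow string.

```cpp
#include <cstdio>
#include <filesystem>

// Hypothetical helper: open a C FILE* named by a std::filesystem::path.
// path::c_str() returns the native character type (wchar_t on Windows,
// char on POSIX), so the name never goes through the ANSI code page.
std::FILE* open_c_file(const std::filesystem::path& p, bool for_writing)
{
#ifdef _WIN32
    // _wfopen is the Microsoft CRT's wide-character counterpart of fopen.
    return _wfopen(p.c_str(), for_writing ? L"wb" : L"rb");
#else
    // POSIX paths are byte strings; hand them to fopen unchanged.
    return std::fopen(p.c_str(), for_writing ? "wb" : "rb");
#endif
}
```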
Inevitably, on Windows everything in the standard library that accepts 'char' parameters assumes that they are encoded in the ANSI code page, not UTF-8. This can't be "fixed" without breaking all the legacy apps.
In practice, this means that unless you can absolutely guarantee that your paths only contain pure ASCII (and the instant you accept a path or filename from the user, you lose), it is *never* safe to use any of the non-wide library methods.
You *can* (and many do) store paths in other libraries and in the application in 'char'-encoded-as-UTF-8, but then you have to remember every single time you hit the standard library or direct WinAPI boundaries to convert your strings to wide before passing them across, or hilarity will ensue (without even a convenient compiler error).
Storing paths as wchar_t in the first place avoids both the cost of converting back and forth and the potential corruption (often overlooked, unless you regularly test with Unicode paths) caused by accidentally forgetting a conversion.
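For code that nevertheless keeps UTF-8 in plain `char` strings, one way to make the boundary conversion harder to forget is to route every such string through a single function that builds a `std::filesystem::path`. The sketch below is hypothetical (`path_from_utf8` is not a library function). It relies on the standard rule that `char8_t` sources (C++20) and `std::filesystem::u8path` (C++17, deprecated in C++20) are decoded as UTF-8, so on Windows the resulting path holds UTF-16 regardless of the ANSI code page.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: the single place where application strings that are
// promised to hold UTF-8 (not the ANSI code page) become paths.
std::filesystem::path path_from_utf8(const std::string& utf8)
{
#if defined(__cpp_char8_t)
    // C++20: a char8_t source is always decoded as UTF-8.
    return std::filesystem::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path decodes its argument as UTF-8.
    return std::filesystem::u8path(utf8);
#endif
}
```

Funnelling the conversion through one function does not remove its cost. It only makes a forgotten conversion easier to spot in review.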
(where is the signature of fopen that accepts a filesystem::path?)
Why are you using fopen in C++ in the first place?
Filesystem does provide 'path' overloads for fstreams, which you should have been using instead anyway.
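A minimal sketch of what that looks like: since C++17, `std::ofstream` and `std::ifstream` have constructors taking a `std::filesystem::path`. On a conforming implementation the file name is therefore passed in its native form (wchar_t on Windows) rather than converted to `char`. The `write_text` and `read_text` names are hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helpers: the stream constructors that take a
// std::filesystem::path use the path's native form, so non-ASCII
// file names survive without an ANSI code page round trip.
bool write_text(const std::filesystem::path& p, const std::string& text)
{
    std::ofstream out(p, std::ios::binary);
    if (!out)
        return false;
    out << text;
    return static_cast<bool>(out);
}

std::string read_text(const std::filesystem::path& p)
{
    std::ifstream in(p, std::ios::binary);
    // Read the whole file; an unopened stream yields an empty string.
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```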
It should be UTF-8 only and use Plain Old char (even on Windows). It should be completely portable, except that it requires that directories are possible and that the filesystem isn't weird (I don't really care about compatibility with grandpa's EPROMs that can hold 9-bit flat files).
In theory, the standard library (and other wrapper libraries around the WinAPI, including Filesystem) could start doing more sane things by using the C++20 'char8_t'/'u8string' types to disambiguate between UTF-8 encoded paths and legacy idkwtf-'char'-encoded paths. But this will take a very long time to percolate through the ecosystem, especially as there are a bunch of people who hate the very idea of it. And it doesn't solve the conversion performance angle.
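`std::filesystem::path` already applies this idea on the output side. In C++20, `path::u8string()` returns `std::u8string`, so the type itself records that the bytes are UTF-8, whereas in C++17 the same call returns a plain `std::string`. Below is a sketch of a hypothetical `utf8_of` helper that yields UTF-8 in a `std::string` (e.g. for logging) under either standard. On Windows it still performs the UTF-16 to UTF-8 conversion, which is the performance cost mentioned above.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: return a path's name as UTF-8 bytes in a std::string.
std::string utf8_of(const std::filesystem::path& p)
{
#if defined(__cpp_char8_t)
    // C++20: u8string() returns std::u8string (char8_t), unambiguously UTF-8.
    const std::u8string s = p.u8string();
    return std::string(s.begin(), s.end());
#else
    // C++17: u8string() returns std::string holding UTF-8.
    return p.u8string();
#endif
}
```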
(Hopefully, Windows will eventually provide char8_t entry points and APIs, which will make it easier to interoperate with not-Windows.)
Although as Emil has already pointed out, it's valid in not-Windows to have arbitrary not-UTF-8 byte sequences in paths, so you can get into trouble in that direction as well.
That's another reason for using wchar_t in Windows and char in not-Windows: no conversions happen at all (at least where values are accepted natively from the OS), which has maximal compatibility for otherwise-invalid byte sequences that nevertheless exist.
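The "no conversions" property can be checked directly. With the mainstream implementations, `std::filesystem::path::value_type` is `wchar_t` on Windows and `char` on POSIX systems. Constructing a path from its own `string_type` and reading it back through `native()` involves no encoding step, so even byte sequences that are not valid UTF-8 come back unchanged. A minimal sketch; `round_trip_native` is a hypothetical name.

```cpp
#include <filesystem>
#include <type_traits>

namespace fs = std::filesystem;

// With the MSVC STL, libstdc++ and libc++, the native character type
// matches what the OS file APIs take.
#ifdef _WIN32
static_assert(std::is_same_v<fs::path::value_type, wchar_t>);
#else
static_assert(std::is_same_v<fs::path::value_type, char>);
#endif

// Hypothetical helper: the argument already has the native character type,
// so neither the constructor nor native() performs an encoding conversion.
fs::path::string_type round_trip_native(const fs::path::string_type& raw)
{
    const fs::path p(raw);
    return p.native();
}
```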
Amen, brother, you speak wisely! To stay sane on Windows, I want to add the following: ensure that *both* the wide and the narrow execution character encodings are Unicode (i.e. UTF-16 for wchar_t, which is the default, and UTF-8 for char), build with _UNICODE defined, and link with an application manifest containing <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>. This guarantees consistent semantics throughout the *whole* execution of the program on reasonably recent versions of Windows (Windows 10 version 1903 and later). And lastly, represent paths with std/boost filesystem paths and use APIs that know how to deal with them *correctly*.
Similar advice applies to POSIX systems. UTF-8 everywhere is just a recommendation, not a guarantee.
Dani
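The execution-encoding and code-page requirements above can be enforced at build time and checked at run time. The sketch below uses hypothetical names. Its `static_assert`s reject a build whose narrow string literals are not UTF-8 (with MSVC, the `/utf-8` compiler option sets both the source and execution character sets to UTF-8) or whose wide literals are not UTF-16/UTF-32. On Windows, it compares `GetACP()` with `CP_UTF8` to see whether the process actually runs with the UTF-8 ANSI code page that the `activeCodePage` manifest entry requests.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// U+00E9 (e with acute accent) must be encoded as the two bytes C3 A9.
constexpr char narrow_e_acute[] = "\u00E9";
static_assert(sizeof(narrow_e_acute) == 3 &&
              static_cast<unsigned char>(narrow_e_acute[0]) == 0xC3 &&
              static_cast<unsigned char>(narrow_e_acute[1]) == 0xA9,
              "narrow execution encoding is not UTF-8 (MSVC: compile with /utf-8)");

// U+1F600 is a surrogate pair in UTF-16 and a single unit in UTF-32.
constexpr wchar_t wide_smiley[] = L"\U0001F600";
static_assert(sizeof(wchar_t) == 2
                  ? (wide_smiley[0] == 0xD83D && wide_smiley[1] == 0xDE00)
                  : wide_smiley[0] == 0x1F600,
              "wide execution encoding is not UTF-16/UTF-32");

#ifdef _WIN32
// Hypothetical check: true when the process ANSI code page is UTF-8, as the
// activeCodePage manifest setting requests (Windows 10 version 1903+).
// Older Windows versions ignore the setting and report the legacy code page.
inline bool process_ansi_code_page_is_utf8()
{
    return GetACP() == CP_UTF8;
}
#endif
```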