On 4/12/2019 05:25, Yakov Galka wrote:
On Mon, Nov 11, 2019 at 4:19 AM Alexander Grund wrote:
I raised this issue many years ago. In fact boost filesystem v2 was better in this respect, because it followed the established convention of having a templated basic_path<char>, thus not committing to a specific char type. Alas, v2 was deprecated and v3 was lobbied into WG21 for standardization. It was an unprecedented case of introducing a "char on some platforms, wchar_t on others" interface into the standard, which is a bad decision from a portability standpoint.
While I agree in principle, the simple fact is that performing string transcoding on filesystem paths is a Very Bad Idea™, since both Windows and Linux treat them as opaque sequences of code units and do not validate their encoding -- but Windows' native encoding is UTF-16 and Linux's is (mostly) UTF-8. So, while unfortunate, v3 made the correct choice. Paths have to be kept in their original encoding between the point where they enter the program (command line, file, or UI) and the file API call that uses them. Otherwise you can get weird errors when transcoding produces a different byte sequence that looks identical when rendered but doesn't match the name stored on the filesystem. Transcoding is only safe when you're going to do something with the string other than pass it to a file API.
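To make the "appears identical but doesn't match" failure concrete, here is a minimal sketch of one way it can happen. It assumes a conversion step that also changes the Unicode normalization form, which not every transcoder does. The function names are made up for illustration; only the UTF-8 byte values are facts.

```cpp
#include <cassert>
#include <string>

// "café.txt" with é as the single precomposed code point U+00E9
// (UTF-8 bytes C3 A9).
inline std::string precomposed_name() { return "caf\xC3\xA9.txt"; }

// The same visible name with é written as 'e' followed by the combining
// acute accent U+0301 (UTF-8 bytes CC 81).
inline std::string decomposed_name() { return "cafe\xCC\x81.txt"; }

// A filesystem that compares names as opaque bytes does exactly this.
inline bool same_file_name(const std::string& a, const std::string& b) {
    return a == b;
}

// Hypothetical check: both strings render the same, yet differ bytewise.
inline void illustrate_mismatch() {
    assert(precomposed_name().size() == 9);
    assert(decomposed_name().size() == 10);
    assert(!same_file_name(precomposed_name(), decomposed_name()));
}
```

If a name read from a directory listing is converted and comes back in the other form, opening it by that converted name fails on a filesystem that compares raw bytes, even though both strings look the same on screen.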
While we are at it, I would like to say that boost filesystem should never have introduced a path class in the first place. filesystem::path is just a glorified string with no extra invariants. Any string -> path conversion copies the data, even if it's already in the right encoding, even on operating systems that don't need any conversions at all. There goes your "don't pay for what you don't use" principle. Most can agree that C++'s spirit is to separate containers from algorithms. A proper design would introduce path manipulation functions that work on any string type, and let users use std::string or even char[] for storage.
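For reference, the "algorithms on any string type" design could look roughly like the sketch below. It assumes C++17 for std::basic_string_view; filename_of is a hypothetical name, not part of Boost or the standard, and the separator is hard-coded for the narrow overload purely to keep the example short.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical free function: returns a view of everything after the last
// separator, or the whole input if there is none. Nothing is copied.
template <class CharT>
constexpr std::basic_string_view<CharT>
filename_of(std::basic_string_view<CharT> path, CharT sep) {
    const auto pos = path.rfind(sep);
    return pos == std::basic_string_view<CharT>::npos ? path
                                                      : path.substr(pos + 1);
}

// Convenience overload for narrow strings; accepts std::string, char[] and
// string literals through the implicit conversion to std::string_view.
inline std::string_view filename_of(std::string_view path) {
    return filename_of<char>(path, '/');
}
```

A call such as filename_of(storage) on a std::string or a char array returns a view into the caller's own storage, so the caller decides where the characters live and no conversion takes place.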
While copying is unfortunate, these things are rarely on a performance-critical path, and the benefits of having consistent compose/decompose operations on paths vastly outweigh that, in my opinion. Combined with the need to maintain native encoding for paths, separated algorithms don't seem particularly useful -- just less convenient to use.
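As a rough illustration of the compose/decompose operations I mean, here is a small sketch using std::filesystem::path from C++17, which was derived from Boost.Filesystem v3 and shares these member names with boost::filesystem::path. The Parts struct and the two helper functions are invented for the example.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Compose: operator/ inserts the platform's preferred separator.
inline fs::path compose(const fs::path& dir, const fs::path& name) {
    return dir / name;
}

// Hypothetical holder for the pieces of a decomposed path.
struct Parts {
    fs::path parent;
    fs::path stem;
    fs::path extension;
};

// Decompose: each piece stays a path, so it remains in the native encoding.
inline Parts decompose(const fs::path& p) {
    return Parts{p.parent_path(), p.stem(), p.extension()};
}
```

For example, compose("logs", "build.txt") gives a path whose generic_string() is "logs/build.txt", and decompose() on it yields "logs", "build" and ".txt". At no point does user code need to know whether the underlying storage is char or wchar_t.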