Gavin Lambert wrote:
On 8/01/2020 14:43, Peter Dimov wrote:
Yes, concatenating two character sequences can result in technically invalid WTF-8. But that's not an issue unique to Windows. You can do the same on any non-Windows platform. It's still not clear how this prevents a `path` class from storing ~WTF-8 on Windows, or exposing a char-based API that ~WTF-8 decodes when passing to Windows, and encodes on the reverse trip.
It could. And if you're only round-tripping it to file APIs and doing nothing else, then you can probably get away with that.
But there's probably other code that wants to manipulate the path (swapping extensions, passing it to some UI, truncating the filename to 10 characters, etc.). Now there are more parts of the system that need to know you have data in not-legal-WTF-8 format, and how to deal with that.
No, there aren't any (new) problems with that. That is, there aren't problems you wouldn't have otherwise, on other platforms. Vanilla POSIX can have any NTBS (null-terminated byte string) at all as a path/file name; macOS has UTF-8 NFD paths/file names. Any code you have that tries to truncate the filename to 10 characters (for whatever definition of character) is already broken. This is simply not an operation that can be done portably on a path or file name. (And any code that assumes that a file name will round-trip, or that two different file names can't refer to the same file/directory entry, is also broken.)
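
To make the truncation point concrete, here is a minimal sketch, assuming the file name is valid UTF-8 in NFD form (as on macOS). The helper names `truncate_bytes`, `truncate_code_points` and `ends_mid_sequence` are hypothetical and are not part of any `path` class discussed in this thread.

```cpp
#include <cstddef>
#include <string>

// True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }

// Hypothetical helper: "truncate to n characters" read as n bytes.
std::string truncate_bytes(const std::string& s, std::size_t n) {
    return s.substr(0, n);
}

// Hypothetical helper: "truncate to n characters" read as n code points.
// Assumes s is valid UTF-8; every non-continuation byte starts a code point.
std::string truncate_code_points(const std::string& s, std::size_t n) {
    std::size_t i = 0, count = 0;
    while (i < s.size()) {
        if (!is_continuation(static_cast<unsigned char>(s[i]))) {
            if (count == n) break;  // the next code point would exceed n
            ++count;
        }
        ++i;
    }
    return s.substr(0, i);
}

// True if s stops partway through a multi-byte UTF-8 sequence.
// Assumes everything before the final sequence is valid UTF-8.
bool ends_mid_sequence(const std::string& s) {
    std::size_t i = s.size();
    // Step back over trailing continuation bytes to the last lead byte.
    while (i > 0 && is_continuation(static_cast<unsigned char>(s[i - 1]))) --i;
    if (i == 0) return !s.empty();  // nothing but continuation bytes
    const unsigned char lead = static_cast<unsigned char>(s[i - 1]);
    const std::size_t need = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    const std::size_t have = s.size() - (i - 1);
    return have < need;
}
```

Take `"re\xCC\x81sume\xCC\x81.txt"`, the NFD spelling of "résumé.txt": it is 14 bytes, 12 code points, and 10 user-perceived characters, so "10 characters" already has three candidate meanings. `truncate_bytes(name, 3)` leaves a dangling `0xCC` lead byte, which is not valid UTF-8. `truncate_code_points(name, 7)` returns valid UTF-8 but cuts the second combining accent off its base letter, so the result names "résume" rather than a prefix of what the user sees. None of this involves Windows or WTF-8.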