Gavin Lambert wrote:
> The main problem though is that once you start allowing transcoding of any kind, it's a slippery slope to other conversions that can make lossy changes (such as applying different canonicalisation formats, or adding/removing layout codepoints such as RTL markers).
There's no such slippery slope, no canonicalization, no adding or removing anything. You just WTF-8 encode whatever Windows gives you, and WTF-8 decode the path before passing it to Windows.
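A minimal Python sketch of that round trip, assuming a path arrives as a list of 16-bit UTF-16 code units (how those are fetched from Windows is out of scope). The helper names `wtf8_encode` and `wtf8_decode` are hypothetical. Well-formed surrogate pairs become one 4-byte sequence, unpaired surrogates pass through as 3-byte sequences, and nothing is normalized, added or dropped.

```python
def wtf8_encode(units: list[int]) -> bytes:
    """Hypothetical helper: UTF-16 code units (possibly ill-formed) -> WTF-8 bytes."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # A well-formed surrogate pair becomes one supplementary code point (4 bytes).
            cp = 0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)
            i += 2
        else:
            # A BMP code point, or an unpaired surrogate kept exactly as-is.
            cp = u
            i += 1
        # "surrogatepass" lets Python emit the 3-byte form of a lone surrogate.
        out += chr(cp).encode("utf-8", "surrogatepass")
    return bytes(out)


def wtf8_decode(data: bytes) -> list[int]:
    """Hypothetical helper: WTF-8 bytes -> UTF-16 code units."""
    units: list[int] = []
    for ch in data.decode("utf-8", "surrogatepass"):
        cp = ord(ch)
        if cp >= 0x10000:
            # Split a supplementary code point back into a surrogate pair.
            cp -= 0x10000
            units += [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
        else:
            units.append(cp)
    return units


# Hypothetical file name: "a", an unpaired lead surrogate, then U+1F600 as a pair.
name = [0x0061, 0xD800, 0xD83D, 0xDE00]
encoded = wtf8_encode(name)
print(encoded.hex(" "))  # 61 ed a0 80 f0 9f 98 80
assert wtf8_decode(encoded) == name  # the round trip is lossless
```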
> Also, if you read the WTF-8 spec, it notes that it is not legal to directly concatenate two WTF-8 strings (you either have to convert them back to UTF-16 first, or execute some special handling for the trailing characters of the first string), which immediately renders it a poor choice for a path storage format.
Do you have a specific example in which concatenation won't work for the use outlined above? Because I can't think of any.
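To make the concatenation edge case concrete, here is a hypothetical Python sketch (helpers redefined so the block stands alone; all names are illustrative). The WTF-8 spec singles out one case: the left string ends in an unpaired lead surrogate and the right string starts with an unpaired trail surrogate. Naive byte concatenation then yields two 3-byte sequences where well-formed WTF-8 requires one 4-byte sequence. The decoder below is deliberately lenient (an assumption, not the spec's behavior) and still recovers the same code units. A join that puts an ASCII separator between the parts cannot create that case at the boundary.

```python
def wtf8_encode(units: list[int]) -> bytes:
    """Hypothetical helper: UTF-16 code units (possibly ill-formed) -> WTF-8."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            cp = 0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)  # pair -> 4 bytes
            i += 2
        else:
            cp = u  # BMP code point or unpaired surrogate -> 1 to 3 bytes
            i += 1
        out += chr(cp).encode("utf-8", "surrogatepass")
    return bytes(out)


def wtf8_decode_lenient(data: bytes) -> list[int]:
    """Hypothetical helper: also accepts a pair split into two 3-byte sequences."""
    units: list[int] = []
    for ch in data.decode("utf-8", "surrogatepass"):
        cp = ord(ch)
        if cp >= 0x10000:
            cp -= 0x10000
            units += [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
        else:
            units.append(cp)
    return units


left = [0x0061, 0xD83D]   # "a" + unpaired lead surrogate
right = [0xDE00, 0x0062]  # unpaired trail surrogate + "b"

naive = wtf8_encode(left) + wtf8_encode(right)
proper = wtf8_encode(left + right)
print(naive.hex(" "))   # 61 ed a0 bd ed b8 80 62
print(proper.hex(" "))  # 61 f0 9f 98 80 62

# The bytes differ, so the naive result is not well-formed WTF-8...
assert naive != proper
# ...but the lenient decoder still yields the same UTF-16 code units.
assert wtf8_decode_lenient(naive) == wtf8_decode_lenient(proper) == left + right

# With an ASCII separator (a backslash here), no surrogates meet at the boundary.
sep = [0x005C]
assert wtf8_encode(left) + wtf8_encode(sep) + wtf8_encode(right) == wtf8_encode(left + sep + right)
```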