Let me rephrase.
The user gets a path from an external source and passes it to Nowide, for example, to nowide::fopen.
Nowide does no validation on POSIX and strict UTF-8 validation on Windows.
Why is the Windows-only strict validation a good thing?
What attacks are prevented by not accepting WTF-8 in nowide::fopen ONLY under Windows, and passing everything through unvalidated on POSIX?
Ok... This is a very good question. On Windows I need to convert for an obvious reason. Now the question is what I accept as valid, what is invalid, and where I draw the line.

As you have seen, there are many possible "non-standard" UTF-8 variants. What should I accept?

- Should I accept CESU-8 (non-BMP characters encoded as two 3-byte "UTF-8-like" sequences) in addition to UTF-8?
- Should I accept WTF-8 with only non-paired surrogates? And what if these surrogates can be combined into correct UTF-16, is that valid? (In fact, concatenation of WTF-8 strings is not a trivial operation: simple string + string does not work and leads to invalid WTF-16.)
- Should I accept Modified UTF-8, as was already asked, i.e. values encoded without the shortest sequence? For example, should I accept "x" encoded in two bytes? What about "."?
- How should I treat something like "\xFF.txt", which is invalid UTF-8? Should I convert it to L"\xFF.txt"? Should I convert it to some sort of WTF-16 to preserve the string?

Now, despite what it may look like from the discussion, WTF-8 is far from being a "standard" for representing invalid UTF-16. Some systems substitute with "?", others with U+FFFD, others just remove the offending unit. I actually tested some real "bad" file names on different systems, and each did something different. I don't think WTF-8 is some widely used industry standard; it is just one of many variants of extending UTF-8.

But there MUST be a clear line between what is accepted and what is not, and the safest and most standard line to draw is well-defined UTF-8 and UTF-16, which are (a) 100% convertible one to the other and (b) widely used, accepted standards. So that was my decision, based on safety and standards (and there is no such thing as non-strict UTF-8/16).

Does it fit everybody? Of course not! Is there some line that fits everybody? There is no such thing! Does it fit the common case for the vast majority of users/developers? IMHO yes. So as a policy I decided to use UTF-8 and UTF-16, since a selection of encoding for each side of widen/narrow is required.
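To make the chosen line concrete, here is a minimal sketch of a strict UTF-8 validity check in the spirit of the policy described above. It is not Nowide's actual implementation, just an illustration of the rule: reject overlong forms (so the Modified UTF-8 two-byte "x" fails), encoded surrogates (so WTF-8 and CESU-8 fail), code points beyond U+10FFFF, and stray bytes such as 0xFF.

    #include <cstddef>
    #include <cstdint>

    // Returns true only for well-formed UTF-8: shortest-form encodings of
    // scalar values U+0000..U+10FFFF, excluding the surrogate range.
    bool is_strict_utf8(const unsigned char* s, std::size_t n)
    {
        for (std::size_t i = 0; i < n;) {
            unsigned char b = s[i];
            if (b < 0x80) { ++i; continue; }               // ASCII
            std::size_t len;
            std::uint32_t cp, min;
            if ((b & 0xE0) == 0xC0)      { len = 2; cp = b & 0x1F; min = 0x80;    }
            else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800;   }
            else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
            else return false;                             // 0xFF, lone continuation byte, ...
            if (i + len > n) return false;                 // truncated sequence
            for (std::size_t j = 1; j < len; ++j) {
                if ((s[i + j] & 0xC0) != 0x80) return false; // bad continuation byte
                cp = (cp << 6) | (s[i + j] & 0x3F);
            }
            if (cp < min) return false;                     // overlong: rejects Modified UTF-8
            if (cp >= 0xD800 && cp <= 0xDFFF) return false; // surrogate: rejects WTF-8 and CESU-8
            if (cp > 0x10FFFF) return false;                // beyond the Unicode range
            i += len;
        }
        return true;
    }

Anything this check accepts converts losslessly to UTF-16 and back, which is exactly property (a) above.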
If a program originates on Windows and, as a result, comes to rely on Nowide's strict validation, and it is later ported to POSIX, aren't its users left with a false sense of security?
You have a valid point. But adding validation on POSIX systems would in general be wrong (as I noted in the Q&A), because there is no single encoding for a POSIX OS: the encoding is a runtime parameter, unlike the Windows API, which provides a wide UTF-16 API. Such validation on Linux/POSIX would likely cause more issues than it solves.

Thanks, Artyom
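A minimal sketch of this policy split, assuming nothing about Nowide's internals: my_fopen and widen_strict are hypothetical names, and MultiByteToWideChar with MB_ERR_INVALID_CHARS stands in for whatever strict UTF-8 to UTF-16 conversion is used. On Windows, invalid UTF-8 makes the call fail; on POSIX, the bytes are handed to fopen untouched because their encoding is only known at runtime.

    #include <cstdio>

    #ifdef _WIN32
    #include <cstring>
    #include <string>
    #include <windows.h>

    // Strict UTF-8 -> UTF-16: MB_ERR_INVALID_CHARS makes the Win32 converter
    // fail on invalid input instead of substituting a replacement character.
    static bool widen_strict(std::wstring& out, const char* utf8)
    {
        int len = static_cast<int>(std::strlen(utf8));
        if (len == 0) { out.clear(); return true; }
        int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8, len, nullptr, 0);
        if (n <= 0)
            return false;                             // not well-formed UTF-8
        out.resize(n);
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8, len, &out[0], n);
        return true;
    }

    FILE* my_fopen(const char* name, const char* mode)
    {
        std::wstring wname, wmode;
        if (!widen_strict(wname, name) || !widen_strict(wmode, mode))
            return nullptr;                           // strict validation: refuse invalid UTF-8
        return _wfopen(wname.c_str(), wmode.c_str()); // native wide API
    }
    #else
    FILE* my_fopen(const char* name, const char* mode)
    {
        // POSIX: the narrow encoding is a runtime (locale) parameter,
        // so the bytes are passed through without any validation.
        return std::fopen(name, mode);
    }
    #endif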