Re: [boost] [program_options] Proposal: self-contained, header-only port of Boost Program Options library

17 Sep 2019

On 17.09.19 08:32, Gavin Lambert via Boost wrote:
...
> * On Unixes, argv contains whatever byte sequence the shell/caller put
> there.  This might be the actual filename on disk (if they used tab
> completion) or it might be something subtly different (if they typed it
> themselves using some kind of IME), or even a binary blob.  In the first
> two cases, while it is fairly *likely* to be UTF-8 (especially in modern
> systems), it is not guaranteed to be -- the user could be running a
> non-UTF-8 locale, or be accessing a filesystem created by someone who
> was.

Or the user could be running a non-UTF-8 locale, but accessing a 
filesystem created by somebody who was using UTF-8 - in which case any 
filenames should be in UTF-8, even if the user's locale disagrees.

It is because of this last possibility that I recommend treating all 
command-line arguments as UTF-8 on Unix systems, even if running a 
non-UTF-8 locale, for all cases where treating them as binary blobs is 
impractical.  Unix filenames are binary blobs, but the de facto standard 
for interpreting these binary blobs as text is to use UTF-8.  How can 
two users, running two different locales, share a filesystem?  By using 
UTF-8 for all filenames, regardless of locale.  How should a program 
convert command-line arguments into UTF-8 filenames?  By assuming that 
they are already in UTF-8, because performing any kind of conversion 
will cause more problems than it will fix.
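
To make that concrete, here is a minimal C++ sketch of one way to read
the rule above.  It is an illustration only, not code from
Boost.Program_options or from the proposed port.  The names
is_valid_utf8, Argument and classify_argument are hypothetical, and the
validation step is my own assumption about how a program might decide
when it has to fall back to blob handling.  The point is that the bytes
from argv are kept exactly as received and are never converted through
the locale.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Returns true if `bytes` is well-formed UTF-8: no overlong encodings,
// no surrogate code points, nothing above U+10FFFF.
bool is_valid_utf8(std::string_view bytes) {
    // Smallest code point that legitimately needs a sequence of length 2, 3, 4.
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;             // stray continuation or invalid lead byte
        if (n - i < len) return false; // sequence truncated at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return false;  // overlong encoding
        if (cp > 0x10FFFF) return false;                // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // surrogate
        i += len;
    }
    return true;
}

// Hypothetical holder for one command-line argument.
struct Argument {
    std::string bytes; // exactly what argv held, never re-encoded
    bool is_text;      // true if `bytes` can be used as UTF-8 text
};

// Assume UTF-8 regardless of locale.  No locale-based conversion is
// attempted; arguments that are not well-formed UTF-8 are only flagged
// so the caller can treat them as opaque blobs.
Argument classify_argument(const char* arg) {
    std::string bytes(arg);
    const bool ok = is_valid_utf8(bytes);
    return Argument{std::move(bytes), ok};
}
```

A program would call classify_argument(argv[i]) for each argument and
pass .bytes straight to open() or std::ifstream either way, so that a
filename argument still names the same file on disk.  Only code that
needs to display or parse the argument as text has to look at .is_text.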

-- 
Rainer Deyke (rainerd@eldwood.com)