RE: [boost] RE: [program_options] Unicode support

Hi Volodya,
From: Vladimir Prus [mailto:ghost@cs.msu.su]
I would make program_options support an imbue() function that allows a locale to be specified (otherwise use the default locale)
Why bother? As I've indicated, internal processing will work just fine with UTF-8, and no interface of the library will expose a UTF-8 encoded string.
Yes - that's it. There can be a locale en_US.UTF-8, and then the application must accept it. If it does not, it has conversion support, imbuable or not. An application should not require the library to use a single encoding for char * in all environments; that's not STL style.
imbue() is a good thing for specifying what encoding the char** input has, but it's orthogonal to the rest of the library -- it's used only at the interface boundary.
Yes, that's also my opinion.
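To make the "only at the interface boundary" point concrete, here is a minimal sketch of what an imbue()-style hook could look like. It is an illustration only, not the program_options API: the names boundary_decoder, decode and decode_args are hypothetical. It assumes the imbued locale's standard std::codecvt<wchar_t, char, std::mbstate_t> facet describes the encoding of the incoming char strings, and it uses wchar_t as a stand-in for whatever representation the library keeps internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of program_options): decodes narrow
// input strings at the interface boundary using the imbued locale.
class boundary_decoder {
public:
    // Select the locale whose codecvt facet describes the char encoding.
    void imbue(const std::locale& loc) { loc_ = loc; }

    // Convert one externally encoded string to the wide representation.
    std::wstring decode(const std::string& in) const {
        typedef std::codecvt<wchar_t, char, std::mbstate_t> cvt_t;
        const cvt_t& cvt = std::use_facet<cvt_t>(loc_);

        std::wstring out;
        std::mbstate_t state = std::mbstate_t();
        const char* from = in.data();
        const char* const from_end = from + in.size();
        wchar_t buf[64];

        while (from != from_end) {
            const char* from_next = from;
            wchar_t* to_next = buf;
            const std::codecvt_base::result r =
                cvt.in(state, from, from_end, from_next, buf, buf + 64, to_next);

            if (r == std::codecvt_base::error)
                throw std::runtime_error("invalid byte sequence for this locale");
            if (r == std::codecvt_base::noconv) {
                // The facet performs no conversion: widen byte by byte.
                for (; from != from_end; ++from)
                    out.push_back(static_cast<unsigned char>(*from));
                break;
            }
            out.append(buf, static_cast<std::size_t>(to_next - buf));
            if (from_next == from)
                throw std::runtime_error("incomplete multibyte sequence");
            from = from_next;  // continue after the chunk just converted
        }
        return out;
    }

private:
    std::locale loc_;  // defaults to a copy of the global locale
};

// Decode argv[1..argc-1]; the program name in argv[0] is skipped.
std::vector<std::wstring> decode_args(const boundary_decoder& dec,
                                      int argc, const char* const* argv) {
    assert(argv != nullptr);
    std::vector<std::wstring> result;
    for (int i = 1; i < argc; ++i)
        result.push_back(dec.decode(argv[i]));
    return result;
}
```

A caller would construct a boundary_decoder, optionally call imbue() with something like std::locale("") to pick up the user's environment, and pass argc/argv to decode_args. Which named locales exist (en_US.UTF-8, a Shift-JIS locale, and so on) depends on the platform, so nothing here assumes a particular one is installed. Everything after the decoding step sees only the internal representation, which is why the choice of locale stays orthogonal to the rest of the processing.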
This provides much more flexibility than just supporting UTF-8. UTF-8 is a really impractical encoding for almost any locale where the majority of text is not ASCII-like, and the user may well prefer to encode text in Shift-JIS or other encodings.
Again, in my case the user does not use UTF-8 strings, so why would he care how the strings are encoded internally?
Yes again, now it seems to me we are trying to persuade each other of the same thing :-)
Ferda
- Volodya

Ferdinand Prantl wrote:
Why bother? As I've indicated, internal processing will work just fine with UTF-8, and no interface of the library will expose a UTF-8 encoded string.
Yes - that's it. There can be a locale en_US.UTF-8, and then the application must accept it. If it does not, it has conversion support, imbuable or not.
Did you mean "if it does", i.e. without "not"?
An application should not require the library to use a single encoding for char * in all environments; that's not STL style.
So, the encoding of char/string at the interface boundary is determined by the locale/codecvt?
This provides much more flexibility than just supporting UTF-8. UTF-8 is a really impractical encoding for almost any locale where the majority of text is not ASCII-like, and the user may well prefer to encode text in Shift-JIS or other encodings.
Again, in my case the user does not use UTF-8 strings, so why would he care how the strings are encoded internally?
Yes again, now it seems to me we are trying to persuade each other of the same thing :-)
Oh, that's good :-) Thanks, Volodya
participants (2): Ferdinand Prantl, Vladimir Prus