[boost] Re: Re: Re: [program_options] Unicode support

7 Apr 2004


      Pavol Droba wrote:
...
...
Ok, let me rephrase. You're writing boost::http_proxy library and want to
make it customizable via program_options. So you need to provide function
'get_options_descriptions'. What will the function return? If there's
only one options_descriptions class, there's no question. If there are
two versions, then which one do you return? No matter what you decide,
the main application might need to do conversions just because it either
needs unicode or does not need it.
Well, the http library has two options. Either it can be char_type
independent or it would simply accept only char* variants. Given the case
of the http library, the latter will probably be the case, because it is
quite a domain-specific library.
Who knows? If the http library allows making POST requests, then it needs to
accept a unicode string for the request data.
...
I see that we are generally arguing whether the program_options library
domain is generic enough to support char and wchar_t natively (and be
templated), or whether it is enough to provide an interface via conversions
and support only one encoding internally.
Actually, the question I'm trying to answer is somewhat different. Using two
versions will inevitably increase code size for the library or client
applications. It will also somewhat complicate implementation. Using one
version will decrease performance. I believe that the decrease in
performance won't be noticed by the users so single version is better, and
would like to know if there are issues I've missed.
...
I'm in favor of the first approach.
The library works with various sources of information and its purpose is
to restructure the information from these sources into something more
usable. I would assume for such a utility, that information passed on
input has the same encoding and format as the information on the output.
Sometimes, information on the output is not string, but just 'int', so it
has no encoding. In case where the information on the output is string, I
plan that it will have the same encoding as was passed on the input.
...
From the nature of the library it seems, that it might be possible to
avoid unnecessary conversion into some intermediate encoding.
Is there anything wrong about conversion, except for speed?
...
Another association might be a container. The library is a kind of container.
It parses the input and provides a container-like interface for the
information stored there. I find it natural, that the container uses the
same encoding for its internals as it provides in the external interface.
The problem with this analogy is that variables_map is a heterogeneous
container: it stores values of different types. So, if it can store values
of both std::string and std::wstring it appears that you need some
conversion.
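
To make the point concrete, here is a minimal sketch of a heterogeneous map
that holds both std::string and std::wstring values. It uses std::any
(C++17) as a stand-in for whatever variables_map stores internally; ValueMap,
widen_ascii and get_wide are hypothetical names for illustration only, and
the widening assumes plain ASCII input.

```cpp
#include <any>
#include <map>
#include <string>

// Hypothetical stand-in for a heterogeneous options container.
using ValueMap = std::map<std::string, std::any>;

// Naive widening: only correct for 7-bit ASCII input (an assumption made
// to keep the sketch short; real code needs a proper encoding conversion).
std::wstring widen_ascii(const std::string& s) {
    return std::wstring(s.begin(), s.end());
}

// Return the value as a wide string, whichever string type was stored.
std::wstring get_wide(const ValueMap& vm, const std::string& name) {
    const std::any& v = vm.at(name);
    if (const auto* w = std::any_cast<std::wstring>(&v)) {
        return *w;               // already wide, no conversion needed
    }
    if (const auto* n = std::any_cast<std::string>(&v)) {
        return widen_ascii(*n);  // stored narrow, conversion is unavoidable
    }
    throw std::bad_any_cast();   // neither string type (e.g. an int)
}
```

A caller that wants std::wstring for a value stored as std::string ends up
in the second branch, which is exactly the conversion in question.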
...
...
And why should an existing operator>> which works for istream only be
fixed to support wistream, if some other option needs unicode support?
I don't understand this point.
You have 'class Font' and an operator>> which works for istream only. However,
you try to declare an option of this type in options_description<wchar_t>.
This should cause instantiation of some code which extracts 'Font' from
wistream, and there's no suitable operator>>.
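
A rough sketch of the problem, assuming a templated library extracts values
roughly like this. parse_value is a hypothetical stand-in, not the actual
program_options code, and the members of Font are made up for the example.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical user type that only knows how to read itself from istream.
struct Font {
    std::string family;
    int size = 0;
};

std::istream& operator>>(std::istream& in, Font& f) {
    return in >> f.family >> f.size;
}

// Hypothetical helper: what a charT-templated library would instantiate
// to turn the text of an option into a value of type T.
template <class CharT, class T>
bool parse_value(const std::basic_string<CharT>& text, T& out) {
    std::basic_istringstream<CharT> in(text);
    in >> out;  // requires operator>>(std::basic_istream<CharT>&, T&)
    return !in.fail();
}

// parse_value(std::string("Courier 12"), font) compiles and works.
// parse_value(std::wstring(L"Courier 12"), font) does not compile:
// there is no operator>> taking std::wistream& and Font&.
```

So the wchar_t instantiation fails at compile time unless 'Font' gets a
second operator>>, or the library converts the text to narrow first.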
...
...
Some of the conversions are unavoidable. E.g. if you have unicode-enabled
library, you'd still need to accept ascii input (because you can't expect
that all input sources are unicode -- main in Linux is never unicode).
If you want to support legacy operator>> you'd need conversion to ascii.
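
For the narrow-input side, a minimal sketch of widening the arguments of
main using std::mbstowcs. The names to_wide and widen_args are hypothetical,
and the conversion depends on the current C locale (a real program would
normally call std::setlocale(LC_ALL, "") first).

```cpp
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Convert one narrow, locale-encoded string to a wide string.
std::wstring to_wide(const char* narrow) {
    // The wide result never has more characters than the input has bytes.
    std::vector<wchar_t> buf(std::strlen(narrow) + 1);
    std::size_t n = std::mbstowcs(buf.data(), narrow, buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("invalid multibyte sequence");
    }
    return std::wstring(buf.data(), n);
}

// Widen everything after the program name, as a unicode-enabled parser
// would have to do with the char** it receives from main.
std::vector<std::wstring> widen_args(int argc, const char* const* argv) {
    std::vector<std::wstring> result;
    for (int i = 1; i < argc; ++i) {
        result.push_back(to_wide(argv[i]));
    }
    return result;
}
```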
I'm not a linux expert. I'm mainly working on windows. If I decide to use
unicode, I have the whole API in unicode without any need for conversions.
Actually, in the project I'm working on now, I encountered a need for
conversion only once. I'm using the date_time library and there was no
support for wide strings at the time. Fortunately it is fixed now :)
In fact, I would not be surprised if char* functions in windows just convert
input into unicode and call wide versions ;-)

- Volodya

Vladimir Prus