
Hi,

On Tue, Apr 06, 2004 at 06:29:54PM +0400, Vladimir Prus wrote:
[snip]
This argument is quite questionable. IMHO you either stick with narrow or with wide characters in the whole application. Otherwise you are forced to make conversions on the border lines. I don't really see a point in the mixed type approach.
Ok, let me rephrase. You're writing a boost::http_proxy library and want to make it customizable via program_options. So you need to provide a function 'get_options_descriptions'. What will the function return? If there's only one options_descriptions class, there's no question. If there are two versions, then which one do you return? No matter what you decide, the main application might need to do conversions, just because it either needs unicode or does not need it.
Well, the http library has two options. Either it can be char_type independent, or it can simply accept only the char* variants. In the case of an http library, the latter will probably be the choice, because it is quite a domain-specific library.

I see that we are generally arguing about whether the program_options library's domain is generic enough to support char and wchar_t natively (and be templated), or whether it is enough to provide an interface via conversions and support only one encoding internally. I'm in favor of the first approach.

The library works with various sources of information, and its purpose is to restructure the information from these sources into something more usable. For such a utility I would assume that information passed on input has the same encoding and format as the information on the output. From the nature of the library it seems that it should be possible to avoid unnecessary conversions into some intermediate encoding.

Another association might be a container. The library is a kind of container: it parses the input and provides a container-like interface for the information stored there. I find it natural that a container uses the same encoding for its internals as it provides in its external interface.
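To make the idea a bit more concrete, here is a rough sketch of what a char_type independent interface could look like. The names used (basic_options_description, get_options_descriptions) are only illustrative -- they are not taken from the actual library:

    #include <map>
    #include <string>

    // Illustration only: a library whose interface is parameterized on the
    // character type, so input and output share one encoding.
    template <class CharT>
    struct basic_options_description
    {
        // option name -> description, both in the caller's chosen encoding
        std::map<std::basic_string<CharT>, std::basic_string<CharT> > options;

        void add(const std::basic_string<CharT>& name,
                 const std::basic_string<CharT>& description)
        {
            options[name] = description;
        }
    };

    typedef basic_options_description<char>    options_description;
    typedef basic_options_description<wchar_t> woptions_description;

    // A component such as an http proxy library could expose its options for
    // whichever character type the application uses, without conversions:
    template <class CharT>
    basic_options_description<CharT> get_options_descriptions()
    {
        basic_options_description<CharT> desc;
        // ... register the component's options here ...
        return desc;
    }

The application then instantiates whichever width it uses throughout, so the information keeps one encoding from parsing to retrieval.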
And why should an existing operator>> which works only with istream be fixed to support wistream, if some other option needs unicode support?
I don't understand this point. [snip]
I generally tend to ignore speed issues, since with linear time algorithms and contemporary processors it's not likely to be important. OTOH, code size *is* important. I've just compiled one of the library examples, with static linking and full optimization. It takes 152K.
Probably it's partly gcc's fault, or maybe it can be reduced, but for now that's how it is. An empty program takes several K. Now, if I tell someone "here's a good library for parsing the command line, but it will add 152K to the application size", that someone will say "thanks, I'll parse the command line by hand".
However, if the library is shared and available on every Linux installation, then the code size is not an issue.
I don't think that an overhead of 152kB is too big. We are living in a world of GBs; a few kB does not really change much. If an application uses some STL stuff, it won't be very small anyway. (Probably not the best example, but I compiled the following program with gcc 3.3.1 in cygwin with -O3 and stripped the debug info afterwards:

    #include <iostream>
    using namespace std;

    int main()
    {
        cout << "a test" << endl;
        return 0;
    }

The resulting binary is 200kB.)

I would strongly prefer simpler usage of the library over saving the 152kB of overhead.
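(For reference, the build was roughly the following; the file name is only an example:)

    g++ -O3 test.cpp -o test.exe
    strip test.exe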
If my application is unicode, and all the input I have is unicode, it is really annoying to convert everything to and fro when interfacing with a library like program_options.
You don't have to convert anything. Parsers will accept wstring, and for values where you need unicode you'll use wstring as well.
[snip]
Some of the conversions are unavoidable. E.g. if you have a unicode-enabled library, you'd still need to accept ascii input (because you can't expect all input sources to be unicode -- main on Linux is never unicode).
If you want to support the legacy operator>>, you'd need a conversion to ascii.
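(To make the quoted point concrete, the kind of narrow-to-wide conversion meant here could look roughly like the sketch below. It is only an illustration: it relies on the current locale via std::mbstowcs, and none of the names come from the library under discussion.)

    #include <cstdlib>   // std::mbstowcs
    #include <string>
    #include <vector>

    // Illustration only: convert narrow argv (as main() receives it) into
    // wide strings using the current locale.  Assumes the global locale has
    // been set with std::setlocale(LC_ALL, ""); error handling is minimal.
    std::vector<std::wstring> widen_args(int argc, char* argv[])
    {
        std::vector<std::wstring> wide;
        for (int i = 0; i < argc; ++i)
        {
            std::string narrow(argv[i]);
            std::vector<wchar_t> buffer(narrow.size() + 1);
            std::size_t n = std::mbstowcs(&buffer[0], narrow.c_str(), buffer.size());
            if (n == static_cast<std::size_t>(-1))
                n = 0;   // invalid multibyte sequence; keep an empty argument
            wide.push_back(std::wstring(&buffer[0], n));
        }
        return wide;
    }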
I'm not a Linux expert; I'm mainly working on Windows. If I decide to use unicode, I have the whole API in unicode without any need for conversions. Actually, in the project I'm working on now, I have encountered a need for a conversion only once: I'm using the date_time library, and there was no support for wide strings at the time. Fortunately it is fixed now :)

Regards,
Pavol