Re: [boost] [locale] review

18 Apr 2011

...
From: "Stewart, Robert" <Robert.Stewart@sig.com>
...
...
Use of strings rather than symbols:
there are a few places where the library
uses strings such as "UTF-8" or "de_DE.UTF8"
rather than symbols like utf_8 or de_DE_UTF8.
This use of strings can turn what could otherwise be
compile-time errors into run-time errors.
Mis-typing e.g. "UTF_8" or "de-DE" is too easy.
Perhaps in some cases the domain of the parameter
is not known at compile time, but even in those cases,
it should be possible to provide symbols for the most
common choices.
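To make the suggestion concrete, here is a minimal sketch of what such symbols might look like. The names charset and charset_name are hypothetical and are not part of Boost.Locale; the point is only that a mis-typed enumerator such as charset::utf8 would fail to compile, whereas a mis-typed string is only caught at run time.

```cpp
#include <cassert>
#include <string>

// Hypothetical symbols for a few encodings; the library itself takes strings.
enum class charset { utf_8, iso_8859_1, windows_1252 };

// Maps each symbol to the name a string-based API would expect.
inline const char* charset_name(charset c) {
    switch (c) {
        case charset::utf_8:        return "UTF-8";
        case charset::iso_8859_1:   return "ISO-8859-1";
        case charset::windows_1252: return "windows-1252";
    }
    return ""; // not reached for valid enumerators
}

// Usage sketch: the symbol is checked by the compiler, the string is derived.
inline void example_symbols() {
    assert(std::string(charset_name(charset::utf_8)) == "UTF-8");
}
```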
Which most common choices? There are a few dozen different
character encodings, there are even more locales, and what
is considered common?
Also, not all encodings are supported by all backends. For example,
iconv would not handle some Windows codepages and MultiByteTo..
would not handle some other encodings.
As for locales: is de_DE.UTF-8 common? Is he_IL.UTF-8 common?
Is zh_CN.UTF-8 common?
Also, the level of support by different backends may depend
on the actual OS configuration - if some locale is not configured
on Windows or Linux, then non-ICU backends would fall back to
the C/POSIX locale.
So should there be hard-coded constants for locales and encodings?
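The dependence on OS configuration can be illustrated with the standard library alone. This is only an analogy, not Boost.Locale code: the helper locale_or_classic is a hypothetical name, and it relies on std::locale throwing std::runtime_error for a name the system does not provide.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: try the requested locale name and fall back to the
// classic "C" locale when the system does not have it configured.
std::locale locale_or_classic(const std::string& name) {
    try {
        return std::locale(name.c_str());
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Usage sketch: "C" is always available, so it is returned unchanged.
inline void example_fallback() {
    assert(locale_or_classic("C").name() == "C");
}
```

Whether a name such as de_DE.UTF-8 is accepted here depends on the machine, which is why a compile-time constant for it could not guarantee anything.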
If there is any runtime cost associated with the string representation,
could you use a type to represent the encoding? The idea being that one
could instantiate the encoding object from a string and the constructor
could throw an exception to indicate an unsupported encoding.
Then, one can reuse the encoding object thereafter with no further runtime
cost.
Thus, APIs would expect an encoding object, not a string, but if the encoding
constructor is not explicit, the effect would be the same.
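A minimal sketch of that idea, under stated assumptions: the class encoding, the function describe, and the hard-coded list of supported names are all hypothetical and only stand in for whatever a real backend (iconv, ICU, the OS) would report.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical type, not part of Boost.Locale: validates an encoding name once.
class encoding {
public:
    // Deliberately non-explicit so call sites can still pass a string.
    encoding(const char* name) : name_(checked(name)) {}
    encoding(const std::string& name) : name_(checked(name)) {}

    const std::string& name() const { return name_; }

private:
    // The supported set is an assumption for illustration only.
    static std::string checked(const std::string& name) {
        static const char* const supported[] = {
            "UTF-8", "ISO-8859-1", "windows-1252"};
        for (const char* s : supported) {
            if (name == s) return name;
        }
        throw std::invalid_argument("unsupported encoding: " + name);
    }

    std::string name_;
};

// Hypothetical API that expects an encoding object rather than a string.
std::string describe(const encoding& enc) {
    return "converting via " + enc.name();
}

// Returns true if constructing an encoding from `name` throws.
bool rejects(const std::string& name) {
    try {
        encoding e(name);
        (void)e;
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}

// Usage sketch: validate once, then reuse the object.
inline void example_encoding() {
    encoding utf8("UTF-8");
    assert(describe(utf8) == "converting via UTF-8");
    assert(describe("ISO-8859-1") == "converting via ISO-8859-1"); // implicit conversion
    assert(rejects("UTF_8")); // the mis-typed spelling fails at construction
}
```

The mis-typed name is still a run-time error here, but it surfaces once, at construction, rather than at every call that takes the string.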
I'm not sure I fully understand you but...

Actually the encoding is constant for each locale object, and
the object usually knows how to handle it efficiently.

For example, in the ICU backend there is a special class that
handles conversions from the locale's encoding to UTF-16,
ICU's internal encoding.

In any case, the best practice is to use one encoding
(UTF-8) across your whole code base, and the library
is optimized especially for it.

Artyom