Re: [boost] [locale] Review results for Boost.Locale library

28 Apr 2011


On 28/04/2011 00:32, Jeremy Maitin-Shepard wrote:
...
...
User gives string and says what encoding it is in, the library converts
to the catalog encoding and looks it up, then returns the localized
string, converting again if needed.
Unlike what Artyom said earlier, converting a string does not
necessarily require dynamic memory allocation, and localization is not
particularly performance critical anyway.
It may often not be performance critical. In some cases, it might be
though. Consider the case of a web server, where the work done by the
web server machines themselves may essentially just consist of pasting
together strings from various sources. (There is possibly a separate
database server, etc.) This is also precisely the use case for which
Artyom designed the library, I think. In this setting it is fairly clear
why converting the messages once when loaded is better than doing it
when needed.
Converting between encodings without memory allocation could be even 
cheaper than concatenating strings.
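
To make the "no memory allocation" point concrete, here is a minimal sketch, assuming the source encoding is ISO-8859-1 and the target is UTF-8. The function name latin1_to_utf8 and its error convention are made up for illustration; this is not Boost.Locale's API. The caller supplies the output buffer, so the conversion itself never touches the heap.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Boost.Locale): converts ISO-8859-1 bytes
// to UTF-8 into a caller-supplied buffer, without any dynamic allocation.
// Returns the number of bytes written, or (size_t)-1 if `out` is too small.
std::size_t latin1_to_utf8(const char* in, std::size_t in_len,
                           char* out, std::size_t out_cap)
{
    assert(in != 0 && out != 0);
    const std::size_t too_small = static_cast<std::size_t>(-1);
    std::size_t n = 0;
    for (std::size_t i = 0; i < in_len; ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            // ASCII range: one byte, copied unchanged.
            if (n + 1 > out_cap) return too_small;
            out[n++] = static_cast<char>(c);
        } else {
            // U+0080..U+00FF: two-byte UTF-8 sequence.
            if (n + 2 > out_cap) return too_small;
            out[n++] = static_cast<char>(0xC0 | (c >> 6));
            out[n++] = static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return n;
}
```

An ISO-8859-1 input of N bytes needs at most 2*N bytes of UTF-8, so a stack buffer of that size is always enough for this particular pair of encodings. Other encoding pairs would need a different bound.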
...
...
If that runtime conversion is a concern, it's also possible to do that
at compile time, at least with C++0x (syntax is ugly in C++03).
Maybe it can be done, but I don't think it is a viable possibility.
It could work if you only need it for short strings and you can spend 
time at compile time to do that conversion.
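
As a rough illustration of the compile-time route, here is a sketch that converts a short ISO-8859-1 literal to UTF-8 during constant evaluation. It assumes the relaxed constexpr rules of C++14, which are later than the C++0x draft discussed here, and the names utf8_buffer, latin1_to_utf8_ct and greeting are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity result: an ISO-8859-1 literal of N chars
// (including its terminator) needs at most 2*N bytes of UTF-8.
template <std::size_t N>
struct utf8_buffer {
    char data[2 * N + 1];
    std::size_t size;
};

// Converts an ISO-8859-1 string literal to UTF-8 in a constant expression.
// Requires C++14 or later (loops and local mutation in constexpr functions).
template <std::size_t N>
constexpr utf8_buffer<N> latin1_to_utf8_ct(const char (&in)[N])
{
    utf8_buffer<N> r = {};  // zero-filled, so the result stays null-terminated
    for (std::size_t i = 0; i + 1 < N; ++i) {  // skip the literal's terminator
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            r.data[r.size++] = static_cast<char>(c);
        } else {
            r.data[r.size++] = static_cast<char>(0xC0 | (c >> 6));
            r.data[r.size++] = static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return r;
}

// The conversion happens while compiling; only the result is stored.
constexpr auto greeting = latin1_to_utf8_ct("caf\xE9");
static_assert(greeting.size == 5, "converted at compile time");

// Runtime access to the precomputed bytes.
inline const char* greeting_c_str()
{
    assert(greeting.data[greeting.size] == '\0');
    return greeting.data;
}
```

The cost is paid in compile time for every literal converted this way, which is why it only seems practical for short strings.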
...
It is unfortunate simply because it is not uniform, even though it is
possible to work around that, and furthermore, it is unfortunate because
UTF-32 is generally not wanted.
It is uniform since it's always Unicode (except on some platforms that 
very few people care about).
...
...
...
but in practice the same source
code containing L"" string literals can be used on both Windows and
Linux to reliably specify Unicode string literals (provided that care is
taken to ensure the compiler knows the source code encoding). The fact
that UTF-32 (which Linux tends to use for wchar_t) is space-inefficient
does in some ways render Linux a second-class citizen if a solution
based on wide string literals is used for portability, but using UTF-8
on MSVC is basically just impossible, rather than merely less efficient,
so there doesn't seem to be another option. (Assuming you are unwilling
to rely on the Windows "ANSI" narrow encodings.)
You can always use a macro USTRING("foo") that expands to u8"foo" or
u"foo" on systems with Unicode string literals and L"foo" elsewhere.
You can, but it adds complexity, etc...
How so? It solves exactly the problem you explained, i.e. it avoids
wasting memory with UTF-32 when you can.
If USTRING is too long, you can just use _U or something like that.
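
A minimal sketch of such a macro, under a few assumptions: it picks u"" (UTF-16) rather than u8"", because u8 literals changed type to char8_t in C++20; it uses __cplusplus as a stand-in for "Unicode string literals are available", which is only an approximation (MSVC, for example, does not report the real value unless asked to); and the names ustring_char and ustrlen are invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: __cplusplus >= 201103L approximates "the compiler supports
// Unicode string literals". A real project would likely need per-compiler checks.
#if __cplusplus >= 201103L
typedef char16_t ustring_char;   // hypothetical name: UTF-16 code unit
#define USTRING(s) u##s          // USTRING("foo") -> u"foo"
#else
typedef wchar_t ustring_char;    // fallback: wide literal
#define USTRING(s) L##s          // USTRING("foo") -> L"foo"
#endif

// Length in code units of a null-terminated USTRING literal.
inline std::size_t ustrlen(const ustring_char* s)
{
    assert(s != 0);
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}
```

Code written against ustring_char and USTRING("...") compiles unchanged in both configurations; only the width of the code unit differs.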

Mathias Gaunard