
On 28/04/2011 00:32, Jeremy Maitin-Shepard wrote:
User gives string and says what encoding it is in, the library converts to the catalog encoding and looks it up, then returns the localized string, converting again if needed.
Unlike what Artyom said earlier, converting a string does not necessarily require dynamic memory allocation, and localization is not particularly performance critical anyway.
It may often not be performance critical. In some cases, it might be though. Consider the case of a web server, where the work done by the web server machines themselves may essentially just consist of pasting together strings from various sources. (There is possibly a separate database server, etc.) This is also precisely the use case for which Artyom designed the library, I think. In this setting it is fairly clear why converting the messages once when loaded is better than doing it when needed.
Converting between encodings without memory allocation could be even cheaper than concatenating strings.
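A minimal sketch of what allocation-free conversion can look like, assuming UTF-16 input and a buffer supplied by the caller. The helper name utf16_to_utf8 and its error convention are made up for illustration and are not part of any library discussed here:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: converts n UTF-16 code units from src into UTF-8,
// writing at most cap bytes into the caller's buffer dst. No heap allocation.
// Returns the number of bytes written, or static_cast<std::size_t>(-1) if the
// input is malformed or the buffer is too small.
inline std::size_t utf16_to_utf8(const char16_t* src, std::size_t n,
                                 char* dst, std::size_t cap)
{
    assert(src != nullptr || n == 0);
    assert(dst != nullptr || cap == 0);
    const std::size_t fail = static_cast<std::size_t>(-1);
    std::size_t out = 0;
    for (std::size_t i = 0; i < n; ++i) {
        char32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 >= n) return fail;               // truncated pair
            const char32_t lo = src[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF) return fail;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return fail;                               // stray low surrogate
        }
        // Number of UTF-8 bytes needed for this code point.
        const std::size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2
                              : cp < 0x10000 ? 3 : 4;
        if (cap - out < len) return fail;              // buffer too small
        switch (len) {
        case 1:
            dst[out++] = static_cast<char>(cp);
            break;
        case 2:
            dst[out++] = static_cast<char>(0xC0 | (cp >> 6));
            dst[out++] = static_cast<char>(0x80 | (cp & 0x3F));
            break;
        case 3:
            dst[out++] = static_cast<char>(0xE0 | (cp >> 12));
            dst[out++] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = static_cast<char>(0x80 | (cp & 0x3F));
            break;
        default:
            dst[out++] = static_cast<char>(0xF0 | (cp >> 18));
            dst[out++] = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = static_cast<char>(0x80 | (cp & 0x3F));
            break;
        }
    }
    return out;
}
```

The caller can point dst at a stack array or directly at the tail of an output buffer that is already being filled, so the conversion does the same kind of work as the copy that a concatenation would do anyway. Whether it is actually cheaper in a given server would have to be measured.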
If that runtime conversion is a concern, it's also possible to do that at compile time, at least with C++0x (syntax is ugly in C++03).
Maybe it can be done, but I don't think it is a viable possibility.
It could work if you only need it for short strings and you can spend time at compile time to do that conversion.
It is unfortunate simply because it is not uniform, even though it is possible to work around that, and furthermore, it is unfortunate because UTF-32 is generally not wanted.
It is uniform since it's always Unicode (except on some platforms that very few people care about).
but in practice the same source code containing L"" string literals can be used on both Windows and Linux to reliably specify Unicode string literals (provided that care is taken to ensure the compiler knows the source code encoding). The fact that UTF-32 (which Linux tends to use for wchar_t) is space-inefficient does in some ways render Linux a second-class citizen if a solution based on wide string literals is used for portability, but using UTF-8 on MSVC is basically just impossible, rather than merely less efficient, so there doesn't seem to be another option. (Assuming you are unwilling to rely on the Windows "ANSI" narrow encodings.)
You can always use a macro USTRING("foo") that expands to u8"foo" or u"foo" on systems with Unicode string literals and L"foo" elsewhere.
You can, but it adds complexity, etc...
How so? It solves exactly the problem you explained, i.e. it avoids wasting memory on UTF-32 when you can. If USTRING is too long, you can just use _U or something like that.
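For illustration, a rough sketch of such a macro. The feature check and the ustring_char typedef are assumptions made for this example, not something taken from an existing library, and the sketch picks u"" (UTF-16) rather than u8"" on compilers that have Unicode literals:

```cpp
// Assumed feature check: C++11 or later, or MSVC 2015 and later (which may
// still report an old __cplusplus value unless /Zc:__cplusplus is given).
#if __cplusplus >= 201103L || (defined(_MSC_VER) && _MSC_VER >= 1900)
#  define USTRING(s) u##s            // UTF-16 literal, element type char16_t
typedef char16_t ustring_char;       // hypothetical name for the element type
#else
#  define USTRING(s) L##s            // fallback: wide literal, wchar_t
typedef wchar_t ustring_char;
#endif

// The same source line works on either branch of the check above.
static const ustring_char greeting[] = USTRING("hello");
```

The cost is that the element type differs between the two branches, so code handling these strings has to be written against ustring_char (or be templated on the character type) rather than against one fixed type.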