
From: "Stewart, Robert" <Robert.Stewart@sig.com>
Use of strings rather than symbols: there are a few places where the library uses strings such as "UTF-8" or "de_DE.UTF8" rather than symbols like utf_8 or de_DE_UTF8. Using strings turns what could be compile-time errors into run-time errors. Mistyping e.g. "UTF_8" or "de-DE" is too easy. Perhaps in some cases the domain of the parameter is not known at compile time, but even in those cases, it should be possible to provide symbols for the most common choices.
Which choices are the most common? There are a few dozen different character encodings and even more locales, so what would be considered common?
Also, not all encodings are supported by all backends. For example, iconv would not handle some Windows code pages, and MultiByteTo.. would not handle some other encodings.
As for locales: is de_DE.UTF-8 common? Is he_IL.UTF-8 common? Is zh_CN.UTF-8 common?
Also, the level of support in different backends may depend on the actual OS configuration: if some locale is not configured on Windows or Linux, then the non-ICU backends would fall back to the C/POSIX locale.
So should there be hard-coded constants for locales and encodings?
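To make the fallback point concrete, here is a minimal sketch using plain std::locale rather than any Boost.Locale backend. The helper name locale_or_classic and its fell_back parameter are hypothetical and only illustrate the pattern: whether a given locale name works is only known at run time, on the machine where the code runs.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of Boost.Locale: try to construct the
// named locale and fall back to the classic "C" locale when the OS
// does not have it configured.
std::locale locale_or_classic(const std::string& name, bool* fell_back = nullptr)
{
    try {
        std::locale loc(name.c_str());   // throws std::runtime_error if unknown
        if (fell_back) *fell_back = false;
        return loc;
    } catch (const std::runtime_error&) {
        if (fell_back) *fell_back = true;
        return std::locale::classic();   // the C/POSIX locale
    }
}
```

A compile-time constant for, say, he_IL.UTF-8 would not help here: the name can be spelled correctly and the construction can still fail on a system where that locale is not installed.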
If there is any runtime cost associated with the string representation, could you use a type to represent the encoding? The idea being that one could instantiate the encoding object from a string and the constructor could throw an exception to indicate an unsupported encoding. Then, one can reuse the encoding object thereafter with no further runtime cost. Thus, APIs would expect an encoding object, not a string, but if the encoding constructor is not explicit, the effect would be the same.
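A minimal sketch of what such an encoding type might look like, assuming a small hard-coded list of supported names. The class encoding, the list of names and the function describe are all hypothetical; none of this is Boost.Locale's actual interface.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical encoding type: validates the name once, at construction.
class encoding {
public:
    // Non-explicit on purpose, so APIs taking `const encoding&`
    // still accept a string at the call site.
    encoding(const std::string& name) : name_(name)
    {
        if (!is_supported(name_))
            throw std::invalid_argument("unsupported encoding: " + name_);
    }
    // Needed so a string literal converts in a single implicit step.
    encoding(const char* name) : encoding(std::string(name)) {}

    const std::string& name() const { return name_; }

private:
    // Assumed list for illustration; a real backend would query
    // iconv, ICU or the OS instead.
    static bool is_supported(const std::string& name)
    {
        static const char* const known[] = { "UTF-8", "ISO-8859-1", "windows-1252" };
        for (const char* k : known)
            if (name == k) return true;
        return false;
    }

    std::string name_;
};

// Hypothetical API that expects an encoding object, not a string.
std::string describe(const encoding& enc)
{
    return "converting from " + enc.name();
}
```

With this shape, describe("UTF-8") compiles and validates on each call, while constructing an encoding object once and passing it around pays for the validation only once. A mistyped "UTF_8" is still a run-time error, but it is raised at construction rather than deep inside a conversion.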
I'm not sure I fully understand you, but... Actually, the encoding is constant for each locale object, and the backend usually knows how to handle it efficiently. For example, in the ICU backend there is a special class that handles conversions from the locale's encoding to UTF-16, ICU's internal encoding. In any case, the best practice is to use one encoding (UTF-8) throughout your code base, and the library is optimized especially for it.
Artyom