
The most significant complaint seems to be that the translation interface is limited to ASCII (or perhaps UTF-8 is also supported; it isn't entirely clear). Even though various arguments have been made for using only ASCII text literals in the program, it seems that it would be relatively easy to support other languages. As someone else has mentioned, even if the text really is in English, ASCII may not be sufficient, since it may be desirable to include a special symbol (the copyright symbol, for instance), and having to deal with this by creating a translation from "ASCII English to appease the translation system" to "real English to display to users" would seem to be an unjustifiable additional burden. However, I don't think anyone is as familiar with the limitations of gettext-related tools as Artyom, so he is the best person to discuss exactly how this might be supported. He previously described a makeshift approach that required the use of a macro, which didn't seem like a satisfactory solution.

It seems that xgettext (at least version 0.18.1, which I tested on my machine) supports non-ASCII program source provided that the --from-code option is given, so the user could keep the source code in any character set/encoding and it would still work (xgettext simply converts the extracted strings to UTF-8). It also appears to extract strings specified with an L prefix, so that should not be a problem either. There is some question as to how well existing tools for translating the messages deal with non-ASCII, but since those tools can be improved fairly easily if necessary, I don't think this is a significant concern.

We can assume that the compiler knows the correct character set of the source code file, as trying to fool it would seem to be inherently error-prone. This seems to rule out char * literals containing UTF-8 encoded text on MSVC, until C++1x Unicode literals are supported. The biggest nuisance is that we need to know the compile-time character set/encoding (so that we know how to interpret "narrow" string literals), and there does not appear to be any standard way in which this is recorded (maybe I'm mistaken, though). However, it is easy enough for the user to specify this as a preprocessor define (the build system could add it to the compile flags, and it needs to be known anyway in order to invoke xgettext --- presumably it would just be based on the active locale at the time the compiler is invoked). If none is specified, it could default to UTF-8 (claiming UTF-8 is also a useful optimization when the compile-time encoding is not UTF-8 but the source happens to contain only ASCII messages, since ASCII is a subset of UTF-8). Once the compile-time character set is known, all ambiguity is removed.

The translation database can be assumed to be keyed on UTF-8, so to translate a message it needs to be converted to UTF-8. There should presumably be versions of the translation functions that take narrow strings, wide strings, and additional versions for the C++1x Unicode literals once compilers support them (which I expect to happen very soon, at least for some compilers). If a wide string is given, it will be assumed to be UTF-16 or UTF-32 depending on sizeof(wchar_t), and converted to UTF-8. UTF-32 is generally undesirable, I imagine, but in practice it should nonetheless work, and wide strings might be the best approach for code that needs to compile on both Windows and Linux.
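To make this concrete, here is a rough sketch of what such overloads might look like. All of the names (translate, to_utf8, lookup_utf8, SOURCE_NARROW_ENCODING) are invented for illustration and are not the library's actual interface; the conversion and lookup functions are only declared, since the point is the interface shape rather than the conversion itself.

    #include <string>

    // The build system would pass the compile-time narrow encoding, e.g.
    //   -DSOURCE_NARROW_ENCODING="\"ISO-8859-1\""
    // and we fall back to UTF-8 when nothing is specified.
    #ifndef SOURCE_NARROW_ENCODING
    #  define SOURCE_NARROW_ENCODING "UTF-8"
    #endif

    // Convert text from the given encoding to UTF-8 (via iconv, ICU, or similar).
    std::string to_utf8(const char *text, const char *encoding);

    // Convert a wide string (UTF-16 or UTF-32, depending on sizeof(wchar_t)) to UTF-8.
    std::string to_utf8(const wchar_t *text);

    // Look up a message by its UTF-8 key in the translation database.
    std::string lookup_utf8(const std::string &utf8_key);

    // Narrow overload: the literal is interpreted in the compile-time encoding.
    inline std::string translate(const char *msg)
    {
        return lookup_utf8(to_utf8(msg, SOURCE_NARROW_ENCODING));
    }

    // Wide overload: always converted, whether wchar_t is 16 or 32 bits.
    inline std::string translate(const wchar_t *msg)
    {
        return lookup_utf8(to_utf8(msg));
    }

Overloads for the C++1x u8, u and U literals could presumably be added alongside these in exactly the same way once compilers support them.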
For the narrow version, if the compile-time narrow encoding is UTF-8, the conversion is a no-op; otherwise the conversion has to be done. (The C++1x u8 literal version would naturally require no conversion either.) Note that in the common case of UTF-8 narrow literals, which is the only case currently supported, there would be no performance penalty at all. The documentation could explicitly warn that there is a performance penalty for not using UTF-8, but I think this penalty is likely to be acceptable in many cases. If normalization proves to be an issue, then the conversion to UTF-8 could also perform normalization (perhaps controlled by another preprocessor definition, as in the sketch below), and the output of xgettext could be normalized to match. Relative to the work required for the whole library, I imagine these changes would be quite trivial, and they might very well transform the library from completely unacceptable to acceptable for a number of the objectors on the list, while having essentially no impact on those who are happy to use the library as is.
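For what it's worth, the no-op fast path and the optional normalization step could look something like the following. Again, SOURCE_IS_UTF8, NORMALIZE_KEYS, convert_to_utf8 and nfc are names I have made up for illustration; in practice the build system, which already has to know the compile-time encoding in order to run xgettext, would be the one setting these defines.

    #include <string>

    #ifndef SOURCE_NARROW_ENCODING
    #  define SOURCE_NARROW_ENCODING "UTF-8"
    #endif

    std::string convert_to_utf8(const char *text, const char *encoding); // iconv/ICU, as before
    std::string nfc(const std::string &utf8);                            // e.g. ICU NFC normalization

    // Build the UTF-8 key used to look up a message in the catalogue.
    inline std::string utf8_key(const char *msg)
    {
    #ifdef SOURCE_IS_UTF8
        std::string key = msg;   // literal is already UTF-8: no conversion, no penalty
    #else
        std::string key = convert_to_utf8(msg, SOURCE_NARROW_ENCODING);
    #endif
    #ifdef NORMALIZE_KEYS
        key = nfc(key);          // only needed if the xgettext output is normalized too
    #endif
        return key;
    }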