
Andrey Semashev wrote:
UTF-8 is a variable-length character encoding, which complicates processing considerably.
It's trivial compared to the real Unicode work.
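To illustrate: the decode loop itself is only a handful of lines. A rough sketch (it deliberately skips the validation of overlong forms, surrogates and truncated input that real code needs):

#include <cstdint>
#include <cstddef>

// Minimal UTF-8 decoder sketch: returns the code point starting at s and
// advances s past it. Assumes well-formed input; production code must also
// reject overlong forms, surrogates and truncated sequences.
char32_t decode_utf8(const unsigned char*& s)
{
    char32_t cp;
    std::size_t extra;

    if (*s < 0x80)      { cp = *s;        extra = 0; }  // 1-byte (ASCII)
    else if (*s < 0xE0) { cp = *s & 0x1F; extra = 1; }  // 2-byte sequence
    else if (*s < 0xF0) { cp = *s & 0x0F; extra = 2; }  // 3-byte sequence
    else                { cp = *s & 0x07; extra = 3; }  // 4-byte sequence

    ++s;
    for (std::size_t i = 0; i < extra; ++i, ++s)
        cp = (cp << 6) | (*s & 0x3F);  // accumulate the continuation bytes

    return cp;
}

That part is mechanical; the hard work is everything that comes after you have the code points.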
I'd rather stick to UTF-16 if I had to use Unicode.
UTF-16 is a variable-length encoding too. But anyway, Unicode itself is a variable-length format, even with the UTF-32 encoding, simply because of grapheme clusters.
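For example, even with char32_t strings a single user-perceived character can occupy several code points. A small illustration:

#include <string>
#include <iostream>

int main()
{
    // One user-perceived character ("e" with acute accent), written as a
    // base code point plus a combining mark: even in UTF-32 this single
    // grapheme cluster occupies two code units.
    std::u32string decomposed  = U"e\u0301";  // U+0065 U+0301
    std::u32string precomposed = U"\u00E9";   // U+00E9

    std::cout << "decomposed code points:  " << decomposed.size()  << '\n';  // 2
    std::cout << "precomposed code points: " << precomposed.size() << '\n';  // 1
}

The two strings are canonically equivalent, yet they don't even have the same length in code points, which is why normalization and grapheme-cluster iteration are needed no matter which encoding you pick.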
I'm not saying that we don't need Unicode support. We do! I'm only saying that in many cases plain ASCII does its job perfectly well: logging, system messages, simple text formatting, texts in restricted character sets, like numbers, phone numbers, identifiers of all kinds, etc.
Identifiers of all kinds aren't text, they're just bytes. As for logging, I'm not too sure whether it should be localized or not, and I don't understand what you mean by system messages. I still don't understand why you want to work with other character sets: that just means duplicating the tables and algorithms needed to process the text correctly. See http://www.unicode.org/reports/tr10/ for an idea of the complexity of collation, which is what makes string comparison possible. As you can see, it has little to do with the encoding, yet the tables and algorithms require the Unicode character set, preferably in a canonical (normalized) form so that the processing can be efficient.
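As a rough illustration of the gap between comparing bytes and collating text, here is a sketch using the standard std::collate facet. The locale name "de_DE.UTF-8" is an assumption on my part; whether it is installed, and the exact order it produces, depend on the system's collation tables.

#include <locale>
#include <string>
#include <iostream>

int main()
{
    std::string a = "\xC3\xA4pple";  // "aepple" with a-umlaut, encoded in UTF-8
    std::string b = "zebra";

    // Byte-wise comparison: the UTF-8 lead byte 0xC3 is greater than 'z',
    // so the umlaut word sorts after "zebra".
    std::cout << "byte-wise: "
              << (a.compare(b) < 0 ? "a-umlaut word first" : "zebra first") << '\n';

    // Locale-aware collation: under a German locale, a-umlaut typically
    // sorts alongside 'a', so the umlaut word comes before "zebra".
    try {
        std::locale de("de_DE.UTF-8");  // assumed locale name, system-dependent
        const auto& coll = std::use_facet<std::collate<char>>(de);
        int r = coll.compare(a.data(), a.data() + a.size(),
                             b.data(), b.data() + b.size());
        std::cout << "collated:  "
                  << (r < 0 ? "a-umlaut word first" : "zebra first") << '\n';
    } catch (const std::runtime_error&) {
        std::cout << "de_DE.UTF-8 locale not installed on this system\n";
    }
}

And this only scratches the surface; real collation per UTS #10 also involves multi-level weights, contractions and tailorings, which is exactly the machinery you would have to duplicate for every extra character set.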
There are cases where i18n is not needed at all - mostly server-side apps with minimal UI.
Any application that processes or displays non-trivial text (meaning something other than options) should have internationalization.
Being forced to use Unicode internally in these cases means increased memory footprint and degraded performance due to encoding translation overhead.
What encoding translation are you talking about?