
Mathias Gaunard wrote:
> Andrey Semashev wrote:
>> I'd like to note that Unicode consumes more memory than narrow encodings.
> That's quite dependent on the encoding used. The most popular memory-saving Unicode encoding is UTF-8, though, which doubles the size needed for non-ASCII characters compared to ISO-8859-*, for example. It's not that problematic, though.
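For a concrete sense of that doubling, here is a minimal sketch; the escapes hard-code the same character, 'é' (U+00E9), in each encoding:

    #include <cstdio>
    #include <cstring>

    int main()
    {
        // U+00E9 ('é') is a single byte in ISO-8859-1...
        const char latin1[] = "\xE9";
        // ...but two bytes in UTF-8.
        const char utf8[] = "\xC3\xA9";
        std::printf("ISO-8859-1: %zu byte(s), UTF-8: %zu byte(s)\n",
                    std::strlen(latin1), std::strlen(utf8));
        return 0;
    }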
UTF-8 is a variable-length encoding, which complicates processing considerably. I'd rather stick to UTF-16 if I had to use Unicode, and even that is already twice as big as ASCII.
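To illustrate what the variable length costs, here is a minimal sketch of counting code points in a UTF-8 buffer; even finding where one character ends requires decoding the lead byte (continuation-byte validation is omitted for brevity):

    #include <cstddef>

    // Number of bytes in a UTF-8 sequence, derived from its lead byte.
    std::size_t utf8_sequence_length(unsigned char lead)
    {
        if (lead < 0x80) return 1;         // 0xxxxxxx: ASCII
        if ((lead >> 5) == 0x06) return 2; // 110xxxxx
        if ((lead >> 4) == 0x0E) return 3; // 1110xxxx
        if ((lead >> 3) == 0x1E) return 4; // 11110xxx
        return 0;                          // continuation or invalid byte
    }

    // Counting characters is a branchy O(n) scan; with a fixed-width
    // encoding it would be a constant-time size query.
    std::size_t utf8_code_points(const char* s, std::size_t n)
    {
        std::size_t count = 0;
        for (std::size_t i = 0; i < n; ++count)
        {
            std::size_t len = utf8_sequence_length(static_cast<unsigned char>(s[i]));
            i += len ? len : 1; // skip over invalid bytes one at a time
        }
        return count;
    }

A real implementation also has to validate continuation bytes and guard against truncated sequences, which adds more branches still.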
> Alternatives that use even less memory exist, but they have other disadvantages.
>> This may not be desirable in all cases, especially when the application is not intended to support multiple languages in the majority of its strings (which, in fact, is quite a common case).
> Algorithms to handle text boundaries, tailored grapheme clusters, collations (some of which are context-sensitive) and so on are needed to process any one language correctly. So you need Unicode anyway, and it is better to reuse the existing Unicode machinery than to build on top of a legacy encoding.
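As a minimal illustration of the collation point, even pure-ASCII English shows the gap between byte order and proper collation; the sketch below assumes an "en_US.UTF-8" locale is installed (locale names are platform-specific, and std::locale throws if the name is unknown):

    #include <iostream>
    #include <locale>
    #include <string>

    int main()
    {
        std::string a = "apple", b = "Zebra";

        // Byte comparison puts 'Z' (0x5A) before 'a' (0x61), so
        // "Zebra" sorts before "apple": wrong for a dictionary.
        std::cout << "bytes:   " << (a < b ? "apple first" : "Zebra first") << '\n';

        // Locale-aware comparison via the standard collate facet.
        std::locale loc("en_US.UTF-8"); // assumed to be installed
        const std::collate<char>& coll = std::use_facet<std::collate<char> >(loc);
        bool apple_first = coll.compare(a.data(), a.data() + a.size(),
                                        b.data(), b.data() + b.size()) < 0;
        std::cout << "collate: " << (apple_first ? "apple first" : "Zebra first") << '\n';
        return 0;
    }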
I'm not saying that we don't need Unicode support. We do! I'm only saying that in many cases plain ASCII does its job perfectly well: logging, system messages, simple text formatting, and text in restricted character sets, such as numbers, phone numbers, and identifiers of all kinds. There are cases where i18n is not needed at all, mostly server-side applications with minimal UI. Being forced to use Unicode internally in these cases means an increased memory footprint and degraded performance due to encoding translation overhead.
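To put a number on the footprint argument, a minimal sketch; the request line is a made-up log-style payload, stored once as a narrow string and once as UTF-16:

    #include <iostream>
    #include <string>

    int main()
    {
        // A purely ASCII payload, typical of log records and identifiers.
        std::string narrow = "GET /index.html HTTP/1.1";

        // The same text widened to UTF-16: every code unit is two bytes,
        // although no character here needs more than one.
        std::u16string wide(narrow.begin(), narrow.end());

        std::cout << "narrow: " << narrow.size() * sizeof(char)     << " bytes\n";
        std::cout << "UTF-16: " << wide.size()   * sizeof(char16_t) << " bytes\n";
        return 0;
    }

And that is only storage; every boundary with an ASCII-based protocol or API would additionally pay for a conversion pass in each direction.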