
Mathias Gaunard wrote:
> Andrey Semashev wrote:
>> I'd like to note that Unicode consumes more memory than narrow encodings.
> That's quite dependent on the encoding used. The most popular memory-saving Unicode encoding is UTF-8, though, which doubles the size needed for non-ASCII characters compared to ISO-8859-*, for example. It's not that problematic, though.
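For a concrete sense of that doubling, here is a minimal sketch; the escapes hard-code the same character, 'é' (U+00E9), in each encoding:

    #include <cstdio>
    #include <cstring>

    int main()
    {
        // U+00E9 ('é') is a single byte in ISO-8859-1...
        const char latin1[] = "\xE9";
        // ...but two bytes in UTF-8.
        const char utf8[] = "\xC3\xA9";
        std::printf("ISO-8859-1: %zu byte(s), UTF-8: %zu byte(s)\n",
                    std::strlen(latin1), std::strlen(utf8));
        return 0;
    }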
UTF-8 is a variable-length encoding, which complicates processing considerably. I'd rather stick to UTF-16 if I had to use Unicode, and even that is already twice as big as ASCII.
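To illustrate what the variable length costs, here is a minimal sketch of counting code points in a UTF-8 buffer; even finding where one character ends requires decoding the lead byte (continuation-byte validation is omitted for brevity):

    #include <cstddef>

    // Number of bytes in a UTF-8 sequence, derived from its lead byte.
    std::size_t utf8_sequence_length(unsigned char lead)
    {
        if (lead < 0x80) return 1;         // 0xxxxxxx: ASCII
        if ((lead >> 5) == 0x06) return 2; // 110xxxxx
        if ((lead >> 4) == 0x0E) return 3; // 1110xxxx
        if ((lead >> 3) == 0x1E) return 4; // 11110xxx
        return 0;                          // continuation or invalid byte
    }

    // Counting characters is a branchy O(n) scan; with a fixed-width
    // encoding it would be a constant-time size query.
    std::size_t utf8_code_points(const char* s, std::size_t n)
    {
        std::size_t count = 0;
        for (std::size_t i = 0; i < n; ++count)
        {
            std::size_t len = utf8_sequence_length(static_cast<unsigned char>(s[i]));
            i += len ? len : 1; // skip over invalid bytes one at a time
        }
        return count;
    }

A real implementation also has to validate continuation bytes and guard against truncated sequences, which adds more branches still.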
> Alternatives that use even less memory exist, but they have other disadvantages.
>> This may not be desirable in all cases, especially when the application is not intended to support multiple languages in the majority of its strings (which, in fact, is quite a common case).
> Algorithms to handle text boundaries, tailored grapheme clusters, collations (some of which are context-sensitive) and so on are needed to process any one language correctly. So you need Unicode anyway, and it is better to reuse the existing Unicode machinery than to build on top of a legacy encoding.
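As a minimal illustration of the collation point, even pure-ASCII English shows the gap between byte order and proper collation; the sketch below assumes an "en_US.UTF-8" locale is installed (locale names are platform-specific, and std::locale throws if the name is unknown):

    #include <iostream>
    #include <locale>
    #include <string>

    int main()
    {
        std::string a = "apple", b = "Zebra";

        // Byte comparison puts 'Z' (0x5A) before 'a' (0x61), so
        // "Zebra" sorts before "apple": wrong for a dictionary.
        std::cout << "bytes:   " << (a < b ? "apple first" : "Zebra first") << '\n';

        // Locale-aware comparison via the standard collate facet.
        std::locale loc("en_US.UTF-8"); // assumed to be installed
        const std::collate<char>& coll = std::use_facet<std::collate<char> >(loc);
        bool apple_first = coll.compare(a.data(), a.data() + a.size(),
                                        b.data(), b.data() + b.size()) < 0;
        std::cout << "collate: " << (apple_first ? "apple first" : "Zebra first") << '\n';
        return 0;
    }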
I'm not saying that we don't need Unicode support. We do! I'm only saying that in many cases plain ASCII does its job perfectly well: logging, system messages, simple text formatting, and text in restricted character sets, such as numbers, phone numbers, and identifiers of all kinds. There are cases where i18n is not needed at all, mostly server-side applications with minimal UI. Being forced to use Unicode internally in these cases means an increased memory footprint and degraded performance due to encoding translation overhead.
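To put a number on the footprint argument, a minimal sketch; the request line is a made-up log-style payload, stored once as a narrow string and once as UTF-16:

    #include <iostream>
    #include <string>

    int main()
    {
        // A purely ASCII payload, typical of log records and identifiers.
        std::string narrow = "GET /index.html HTTP/1.1";

        // The same text widened to UTF-16: every code unit is two bytes,
        // although no character here needs more than one.
        std::u16string wide(narrow.begin(), narrow.end());

        std::cout << "narrow: " << narrow.size() * sizeof(char)     << " bytes\n";
        std::cout << "UTF-16: " << wide.size()   * sizeof(char16_t) << " bytes\n";
        return 0;
    }

And that is only storage; every boundary with an ASCII-based protocol or API would additionally pay for a conversion pass in each direction.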