
Andrey Semashev wrote:
UTF-8 is a variable-length character encoding, which complicates processing considerably.
It's trivial compared to the real Unicode work.
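To illustrate: the decode loop itself is only a handful of lines. A rough sketch (it deliberately skips the validation of overlong forms, surrogates and truncated input that real code needs):

#include <cstdint>
#include <cstddef>

// Minimal UTF-8 decoder sketch: returns the code point starting at s and
// advances s past it. Assumes well-formed input; production code must also
// reject overlong forms, surrogates and truncated sequences.
char32_t decode_utf8(const unsigned char*& s)
{
    char32_t cp;
    std::size_t extra;

    if (*s < 0x80)      { cp = *s;        extra = 0; }  // 1-byte (ASCII)
    else if (*s < 0xE0) { cp = *s & 0x1F; extra = 1; }  // 2-byte sequence
    else if (*s < 0xF0) { cp = *s & 0x0F; extra = 2; }  // 3-byte sequence
    else                { cp = *s & 0x07; extra = 3; }  // 4-byte sequence

    ++s;
    for (std::size_t i = 0; i < extra; ++i, ++s)
        cp = (cp << 6) | (*s & 0x3F);  // accumulate the continuation bytes

    return cp;
}

That part is mechanical; the hard work is everything that comes after you have the code points.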
I'd rather stick to UTF-16 if I had to use Unicode.
UTF-16 is a variable-length encoding too. But anyway, Unicode itself is a variable-length format, even with the UTF-32 encoding, simply because of grapheme clusters.
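For example, even with char32_t strings a single user-perceived character can occupy several code points. A small illustration:

#include <string>
#include <iostream>

int main()
{
    // One user-perceived character ("e" with acute accent), written as a
    // base code point plus a combining mark: even in UTF-32 this single
    // grapheme cluster occupies two code units.
    std::u32string decomposed  = U"e\u0301";  // U+0065 U+0301
    std::u32string precomposed = U"\u00E9";   // U+00E9

    std::cout << "decomposed code points:  " << decomposed.size()  << '\n';  // 2
    std::cout << "precomposed code points: " << precomposed.size() << '\n';  // 1
}

The two strings are canonically equivalent, yet they don't even have the same length in code points, which is why normalization and grapheme-cluster iteration are needed no matter which encoding you pick.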
I'm not saying that we don't need Unicode support. We do! I'm only saying that in many cases plain ASCII does its job perfectly well: logging, system messages, simple text formatting, texts in restricted character sets, like numbers, phone numbers, identifiers of all kinds, etc.
Identifiers of all kinds aren't text, they're just bytes. As for logging, I'm not too sure whether it should be localized or not, and I don't understand what you mean by system messages. I still don't understand why you want to work with other character sets: that just means duplicating the tables and algorithms needed to process the text correctly. See http://www.unicode.org/reports/tr10/ for an idea of the complexity of collation, which is what makes string comparison possible. As you can see, it has little to do with the encoding, yet the tables and algorithms require the Unicode character set, preferably in a canonical (normalized) form so that the processing can be efficient.
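As a rough illustration of the gap between comparing bytes and collating text, here is a sketch using the standard std::collate facet. The locale name "de_DE.UTF-8" is an assumption on my part; whether it is installed, and the exact order it produces, depend on the system's collation tables.

#include <locale>
#include <string>
#include <iostream>

int main()
{
    std::string a = "\xC3\xA4pple";  // "aepple" with a-umlaut, encoded in UTF-8
    std::string b = "zebra";

    // Byte-wise comparison: the UTF-8 lead byte 0xC3 is greater than 'z',
    // so the umlaut word sorts after "zebra".
    std::cout << "byte-wise: "
              << (a.compare(b) < 0 ? "a-umlaut word first" : "zebra first") << '\n';

    // Locale-aware collation: under a German locale, a-umlaut typically
    // sorts alongside 'a', so the umlaut word comes before "zebra".
    try {
        std::locale de("de_DE.UTF-8");  // assumed locale name, system-dependent
        const auto& coll = std::use_facet<std::collate<char>>(de);
        int r = coll.compare(a.data(), a.data() + a.size(),
                             b.data(), b.data() + b.size());
        std::cout << "collated:  "
                  << (r < 0 ? "a-umlaut word first" : "zebra first") << '\n';
    } catch (const std::runtime_error&) {
        std::cout << "de_DE.UTF-8 locale not installed on this system\n";
    }
}

And this only scratches the surface; real collation per UTS #10 also involves multi-level weights, contractions and tailorings, which is exactly the machinery you would have to duplicate for every extra character set.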
There are cases where i18n is not needed at all - mostly server-side apps with minimal UI.
Any application that processes or displays non-trivial text (meaning something other than options) should have internationalization.
Being forced to use Unicode internally in these cases means increased memory footprint and degraded performance due to encoding translation overhead.
What encoding translation are you talking about?