
On 1/19/2011 11:33 AM, Peter Dimov wrote:
Edward Diener wrote:
Inevitably a Unicode standard will be adopted where every character of every language is represented by a single, fixed number of bits.
This was the prevailing thinking once. First this number of bits was 16, an incorrect assumption that claimed Microsoft and Java as victims; then it became 21 (or 22?). Eventually, people realized that this would never happen even if we allocate 32 bits per character, so here we are.
"Eventually, people realized..." . This is just rhetoric, where "people" is just whatever your own opinion is. I do not understand the technical reason for it never happening. Are human "alphabets" proliferating so fast that we can not fit the notion of a character in any alphabet into a fixed size character ? In that case neither are we ever going to have multi-byte characters representing all of the possible characters in any language. But it is absurd to believe that. "Eventually people realized that making a fixed size character representing every character in every language was doable and they just did it." That sounds fairly logical to me, aside from the practicality of getting diverse people from different nationalities/character-sets to agree on things. Of course you can argue that having a variable number of bytes representing each possible character in any language is better than having a single fixed size character and I am willing to listen to that technical argument. But from a programming point of view, aside from the "waste of space" issue, it does seem to me that having a fixed size character has the obvious advantage of being able to access a character via some offset in the character array, and that all the algorithms for finding/inserting/deleting/changing characters become much easier and quicker with a fixed size character, as well as displaying and inputting.