
On 1/19/2011 11:33 AM, Peter Dimov wrote:
Edward Diener wrote:
Inevitably a Unicode standard will be adopted where every character of every language is represented by a single, fixed number of bits.
This was the prevailing thinking once. First this number of bits was 16, an incorrect assumption that claimed Microsoft and Java as victims; then it became 21 (or 22?). Eventually, people realized that this would never happen even if we allocate 32 bits per character, so here we are.
"Eventually, people realized..." . This is just rhetoric, where "people" is just whatever your own opinion is. I do not understand the technical reason for it never happening. Are human "alphabets" proliferating so fast that we can not fit the notion of a character in any alphabet into a fixed size character ? In that case neither are we ever going to have multi-byte characters representing all of the possible characters in any language. But it is absurd to believe that. "Eventually people realized that making a fixed size character representing every character in every language was doable and they just did it." That sounds fairly logical to me, aside from the practicality of getting diverse people from different nationalities/character-sets to agree on things. Of course you can argue that having a variable number of bytes representing each possible character in any language is better than having a single fixed size character and I am willing to listen to that technical argument. But from a programming point of view, aside from the "waste of space" issue, it does seem to me that having a fixed size character has the obvious advantage of being able to access a character via some offset in the character array, and that all the algorithms for finding/inserting/deleting/changing characters become much easier and quicker with a fixed size character, as well as displaying and inputting.