Re: [boost] GSoC Unicode library: second preview

23 Jun 2009

...
Of course, the library works with UTF-8 and UTF-32 just as
well, it makes no difference to the generic algorithms
(which don't exist yet, but expect substring searching and
the like), it's up to you to choose what makes the most
sense to use for your situation (for example, you may choose
to use UTF-8 because you need to interact a lot with
programming interfaces expecting that format).
Ok, this is really good.
...
They should be fairly easy to find.
Either you're using the algorithm that does the task
correctly, or you're fiddling with the encoding by hand
which is likely to be wrong.
They are easy to find in Unicode-aware unit tests, but not
in real programs. I once ran a small test to see which Unicode-aware
programs support characters outside the BMP, i.e. I tested
a character that is encoded as a surrogate pair in UTF-16...

The results were a total disaster:

- Windows standard dialogs: displayed the character correctly, but
  every operation such as deletion treated it as two characters. For
  example, the file name dialog had problems.
- Same behavior in Notepad and in any standard text-area widget: they
  didn't work correctly.
- Qt3 didn't support surrogate pairs at all (in Qt4 most of this was
  fixed), displaying two square "glyphs".
- Opera web browser had similar problems with editing and displaying
  such characters.
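
To make the failure mode concrete, here is a minimal sketch of what such
a test exercises. It is not taken from any of the programs above or from
the proposed library; the function names are hypothetical and the input
is assumed to be valid. A code point above U+FFFF becomes two UTF-16 code
units, so a "delete last character" operation has to check for a
surrogate pair instead of dropping a single unit.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only.

inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Encode one code point as UTF-16. Assumes cp is a valid scalar value
// (not a surrogate, not above U+10FFFF).
std::u16string encode_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;  // 20 bits remain, split 10/10 across the pair
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}

// Count code points, treating a well-formed surrogate pair as one.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i]) && i + 1 < s.size() &&
            is_low_surrogate(s[i + 1])) {
            ++i;  // skip the low half of the pair
        }
        ++n;
    }
    return n;
}

// What a backspace should do: remove the last code point,
// which may be one or two code units.
void erase_last_code_point(std::u16string& s) {
    if (s.empty()) return;
    std::size_t units = 1;
    if (s.size() >= 2 && is_low_surrogate(s.back()) &&
        is_high_surrogate(s[s.size() - 2])) {
        units = 2;
    }
    s.erase(s.size() - units);
}
```

A program that simply calls pop_back() on such a string leaves a lone
high surrogate behind, which matches the kind of editing problem
described in the list above.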

So... there is a huge problem with this encoding, because such a simple
QA test shouldn't give such bad results for so many programs.

Also, all programs that used UTF-8 or UTF-32 internally passed these
tests very well.

So I really **do not** suggest recommending this encoding as the "best"
one for internal use.

Artyom