
Rogier van Dalen <R.C.van.Dalen@umail.leidenuniv.nl> writes:
[snip]
I would like to make my point slightly clearer than I did before. I don't think it would do for a Unicode string library to concentrate on code points. Yes, the raw Unicode data should be available somewhere, so it can be written to a file or sent to the OS's display routines. However, IMO it should use characters as its *only* interface for manipulation.
In practice, a character-only interface isn't very useful. The most common operations would probably be collation and substring matching (as well as regular expression handling, perhaps). None of these operations is defined directly in terms of grapheme clusters, for a variety of reasons. What it comes down to is that the string, not the individual character, will be the basic unit for most operations.
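To illustrate the substring-matching case, here is a rough sketch in Python rather than in any proposed library interface. It assumes matching under canonical equivalence is done by normalizing both whole strings to NFD and then comparing code point sequences; the helper name `contains` is made up for this example, and only the standard `unicodedata` module is used.

```python
import unicodedata


def contains(haystack: str, needle: str) -> bool:
    """Substring test under canonical equivalence (hypothetical helper).

    Both arguments are normalized to NFD as whole strings, and the
    comparison then runs over the resulting code point sequences.
    """
    def nfd(s: str) -> str:
        return unicodedata.normalize("NFD", s)

    return nfd(needle) in nfd(haystack)


precomposed = "caf\u00e9"   # e-acute as the single code point U+00E9
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# The raw code point sequences differ, so a naive comparison fails.
print(precomposed == decomposed)

# After whole-string normalization the two spellings match.
print(contains(precomposed, decomposed))
```

Note that this sketch would also report "cafe" as a substring of "café", since it does not reject a match that ends in the middle of a combining sequence. Handling that properly is again a check on the string around the match, not something a per-character interface provides by itself.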
[snip]
-- Jeremy Maitin-Shepard