
Dear Eric/Mathias,

> That's good, these are needed. Also needed are tables that store the
> various character properties, and (hopefully) some parsers that build the
> tables directly from the Unicode character database so we can easily rev it
> whenever the database changes.

A good reloadable character library is in the vault.

> And UnicodeGrapheme concept doesn't make sense to me. You say, "A model of
> UnicodeGrapheme is a range of Unicode code points that is a single grapheme
> cluster in Normalized Form C." A grapheme cluster != Unicode code point. It
> may be many code points representing a base character and many zero-width
> combining characters. So what exactly is being traversed by a
> UnicodeGrapheme range?

> It is thus important to be able to apply algorithms with graphemes as the
> unit rather than code points to deal with graphemes not representable by a
> single code point.

I think that a grapheme is more of an iterator concept than a data type concept. By specialising it you will unnecessarily complicate any library. Don't forget that, for example, the current grapheme may start as one character and then suddenly 'grab' the surrounding characters as it forms a combined glyph. I have never found a use case in practice where specialising the grapheme as anything other than a validated series of code points was helpful. The two cases where graphemes are important are display [which requires intermediate glyph conversion anyway, and works just as well on runs of code points, so code points are fine] and editing, where the grapheme-ness alters during typing.
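
To make the "iterator over a validated series of code points" idea concrete, here is a minimal sketch. It assumes the text is already a valid sequence of code points held in a std::u32string, and it treats only the Combining Diacritical Marks block (U+0300..U+036F) as extending characters. That range check, and the names is_extend, next_cluster_end and cluster_starts, are illustrative assumptions, not the vault library's API; a real implementation would look up the grapheme break property in tables generated from the Unicode character database.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumption for illustration only: just U+0300..U+036F counts as an
// extending (combining) character. Real code would use property tables.
bool is_extend(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Returns the index one past the cluster that starts at pos:
// one leading code point plus every extending code point that follows it.
std::size_t next_cluster_end(const std::u32string& text, std::size_t pos) {
    assert(pos < text.size());
    std::size_t end = pos + 1;
    while (end < text.size() && is_extend(text[end])) {
        ++end;
    }
    return end;
}

// Walks the code points and records where each cluster begins.
// The clusters are never stored as a separate type; they are only
// positions in the underlying code point sequence.
std::vector<std::size_t> cluster_starts(const std::u32string& text) {
    std::vector<std::size_t> starts;
    for (std::size_t pos = 0; pos < text.size();
         pos = next_cluster_end(text, pos)) {
        starts.push_back(pos);
    }
    return starts;
}
```

With this sketch, a string holding only 'e' has one cluster, and appending U+0301 to it still leaves one cluster: the existing grapheme has absorbed the new code point, which is the behaviour seen during typing.
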

> The Unicode standard also specifies various features such as a collation
> algorithm in Technical Standard #10 - Unicode Collation Algorithm for
> comparison and ordering of strings with a locale-specific criterion, as
> well as mechanisms to iterate over words, sentences and lines

Have a look at the character library that I posted in the vault - if you can do graphemes then you can do words, paragraphs etc., as they are all just attributes of the characters with simple rules. Graphemes come into their own for text display and editing, and you would need these as well to be able to support that. Don't forget that Windows GDI only supports point arithmetic, which means that you need to be able to locate word boundaries to display text well at different scales to work around the GDI scaling rounding [and GDI+ is not much better].
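
As a rough illustration of "attributes of the characters with simple rules", the sketch below assigns each code point one of three properties and applies two rules to find word boundaries. The property set is a drastic reduction made up for this example (ASCII letters only, plus U+0300..U+036F as extending characters), and WordProp, word_prop and word_boundaries are hypothetical names; the real rules and property values are much richer and would come from the Unicode character database.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily reduced property set for illustration.
enum class WordProp { Letter, Extend, Other };

// Assumption: only ASCII letters are letters and only U+0300..U+036F
// extends. A real table would be generated from the character database.
WordProp word_prop(char32_t c) {
    if ((c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z')) {
        return WordProp::Letter;
    }
    if (c >= 0x0300 && c <= 0x036F) {
        return WordProp::Extend;
    }
    return WordProp::Other;
}

// Returns the offsets at which a boundary falls, including 0 and
// text.size() for non-empty text.
std::vector<std::size_t> word_boundaries(const std::u32string& text) {
    std::vector<std::size_t> out;
    if (text.empty()) {
        return out;
    }
    out.push_back(0);
    WordProp prev = word_prop(text[0]);
    for (std::size_t i = 1; i < text.size(); ++i) {
        const WordProp cur = word_prop(text[i]);
        // Rule 1: never break before an extending character; it takes
        // on the property of whatever precedes it.
        if (cur == WordProp::Extend) {
            continue;
        }
        // Rule 2: do not break between two letters; break everywhere else.
        if (!(prev == WordProp::Letter && cur == WordProp::Letter)) {
            out.push_back(i);
        }
        prev = cur;
    }
    out.push_back(text.size());
    return out;
}
```

The same shape (a per-character property lookup plus a short list of "break here / do not break here" rules over neighbouring properties) carries over to sentence and line boundaries; only the tables and the rule list change.
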