
In article <87isg3skr6.fsf@jbms.ath.cx>, Jeremy Maitin-Shepard <jbms@attbi.com> wrote:
> Right, it will certainly be necessary to provide a grapheme_cluster_iterator (with value_type = the Unicode string type). ICU should help with this.
You are conflating abstract characters (which exist in the absence of a graphical representation) with graphemes (whose existence depends on the graphical representation), but I believe we are talking about the same thing.
> Nonetheless, it is useful to represent a single code point, for several reasons:
I agree; as I mentioned elsewhere, I believe that the Unicode string abstraction needs to support at least iteration by abstract characters, encoded characters, and encoding units.
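To make the distinction between the lower two levels concrete, here is a minimal sketch, assuming UTF-8 storage in a std::string. The decode_utf8 helper is hypothetical (it is not part of any proposed interface or of ICU), and it assumes well-formed input rather than validating it. Iteration by grapheme cluster is deliberately left out, since it needs the Unicode segmentation data that a library such as ICU would supply.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 string into its code points.
// Assumption: the input is well-formed UTF-8; nothing is validated
// beyond checking that a sequence does not run past the end.
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)             { len = 1; cp = lead; }         // 0xxxxxxx
        else if ((lead >> 5) == 0x6) { len = 2; cp = lead & 0x1F; }  // 110xxxxx
        else if ((lead >> 4) == 0xE) { len = 3; cp = lead & 0x0F; }  // 1110xxxx
        else                         { len = 4; cp = lead & 0x07; }  // 11110xxx
        assert(i + len <= s.size());
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For the string "e\xCC\x81" (LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT), s.size() reports three code units and decode_utf8 returns two code points, while a reader sees a single accented letter. That third count is what a grapheme_cluster_iterator would have to provide.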
> - For the purpose of string construction, the Unicode specification explicitly states that any sequence of code points is well formed, and so this provides the smallest unit by which guaranteed-well-formed strings can be formed.
Can you refer me to a specific point in the spec where this is stated?
> - It would be useful to provide functions for querying the Unicode properties of individual code points, and this code_point type would be the only suitable parameter type.
Absolutely.
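As an illustration only, such a type and a few queries might look like the sketch below. The code_point class and the function names is_surrogate, is_noncharacter, and plane are all hypothetical. The sketch covers only properties that follow from the numeric value alone; anything like general category or combining class would need the Unicode Character Database, for example through ICU.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical code_point type; the name and interface are assumptions
// made for this sketch, not an agreed design.
class code_point {
public:
    explicit code_point(std::uint32_t v) : value_(v) {
        assert(v <= 0x10FFFF);  // the Unicode codespace ends at U+10FFFF
    }
    std::uint32_t value() const { return value_; }
private:
    std::uint32_t value_;
};

// Surrogate code points occupy U+D800..U+DFFF.
inline bool is_surrogate(code_point cp) {
    return cp.value() >= 0xD800 && cp.value() <= 0xDFFF;
}

// Noncharacters are U+FDD0..U+FDEF plus the last two code points
// of every plane (U+nFFFE and U+nFFFF).
inline bool is_noncharacter(code_point cp) {
    std::uint32_t v = cp.value();
    return (v >= 0xFDD0 && v <= 0xFDEF) || (v & 0xFFFE) == 0xFFFE;
}

// Plane number: 0 is the Basic Multilingual Plane, 16 is the last plane.
inline unsigned plane(code_point cp) {
    return static_cast<unsigned>(cp.value() >> 16);
}
```

Taking code_point rather than a bare integer as the parameter type means the range check happens once, at construction, instead of in every query function.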
> I do agree, however, that for almost any output formatting, the locale-specific or user-specified fill text/symbols should be specified as strings, rather than as individual characters.
Yes.

meeroh

-- 
If this message helped you, consider buying an item from my wish list: <http://web.meeroh.org/wishlist>