Re: [boost] [gsoc] unicode tools and an unicode string type

30 Mar 2009

On Sun, Mar 29, 2009 at 9:40 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
...
> I plan to submit my Summer of Code proposal about Unicode during the
> week.
> I plan to provide:
> - iterator adaptors to iterate sequences of code units, code points and
>   graphemes, and eventually more, from a sequence in UTF-8, UTF-16, UCS-2 or
>   UTF-32/UCS-4.

What about conversion algorithms to conveniently generate these
sequences in the first place?

...
> - miscellaneous utilities, such as categorization of code points
> - normalization functions
> - comparisons but not collations
> - substring search algorithms
> - and finally, a Unicode string type
...
From prior discussions, it seemed to me that there was actually a need
for several Unicode string types:

* Specific UTF-8, UTF-16, UTF-* string classes to be used within an
  application, when a particular Unicode string type and internal
  representation is the optimal choice.

* A single utf_string that varies its internal representation at
  run-time. This is the choice for communication between third parties
  when not enough is known about the applications to choose a
  particular internal representation, or within an application that
  must cope with needs that change at run-time.
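
To make the second option concrete, here is a minimal sketch of a string whose internal representation is chosen at run-time. It is only an illustration under assumptions: the `encoding` enum, the member names, and the use of C++17 `std::variant` are all hypothetical and are not taken from the proposal or from any existing Boost library. It stores code units without validating them.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <variant>

// Hypothetical tag naming the representation currently held.
enum class encoding { utf8, utf16, utf32 };

// Hypothetical utf_string: holds exactly one of three code-unit sequences.
// No validation, conversion, or normalization is attempted in this sketch.
class utf_string {
public:
    explicit utf_string(std::string s) : data_(std::move(s)) {}     // UTF-8 code units
    explicit utf_string(std::u16string s) : data_(std::move(s)) {}  // UTF-16 code units
    explicit utf_string(std::u32string s) : data_(std::move(s)) {}  // UTF-32 code units

    // Report which representation is active right now.
    encoding current_encoding() const {
        switch (data_.index()) {
            case 0: return encoding::utf8;
            case 1: return encoding::utf16;
            default: return encoding::utf32;
        }
    }

    // Number of code units (not code points) in the active representation.
    std::size_t code_unit_count() const {
        return std::visit(
            [](const auto& s) -> std::size_t { return s.size(); }, data_);
    }

private:
    std::variant<std::string, std::u16string, std::u32string> data_;
};
```

The same code point, U+1F600, occupies four code units when held as UTF-8, two as UTF-16, and one as UTF-32, so `code_unit_count()` depends on the representation selected at run-time. The fixed-encoding classes of the first option would each correspond to a single alternative of this variant.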
...
> I am well aware that defining yet another new string type is quite
> controversial, but I believe it would be quite useful. A dedicated type
> would be able to maintain certain invariants, such as keeping its
> contents in a particular normalization form.
> Also, I believe it is possible to come up with a string design that
> allows easy integration with any other existing string type, such as the
> ones from the standard or Qt.
While this is an interesting proposal, it appears to me to be several
years' worth of work. How would you structure the first summer's work?
Would you aim at breadth (a prototype covering the whole) or depth
(production-quality work that concentrates on one aspect)?

--Beman

Beman Dawes