
From: "Erik Wien" <wien@start.no>
"Rogier van Dalen" <rogiervd@gmail.com> wrote in message news:e094f9eb0410200629617a4e01@mail.gmail.com...
On Wed, 20 Oct 2004 15:51:21 +0300, Peter Dimov <pdimov@mmltd.net> wrote:
Or maybe you are arguing that the string should always be kept in a particular normalized form?
That seems to be the only way of keeping comparison, search, etcetera, implementable in terms of char_traits<> functions --- and so, the only way of getting performance similar to std::basic_string<>'s.
Note that normalisation of any kind requires access to the Unicode Character Database, which may take some time, especially if the relevant parts happen not to be in the processor cache.
Comparing any Unicode data in different or unknown normalisation forms will therefore by definition be slow.
True. So what we basically need to determine is which is more critical: fast comparison of strings (strings always represented in a given NF), or fast general string handling (NF determined when needed)?
What if the class had the option, at least, to hold multiple forms, creating each on demand? Then, the operations you invoke would simply request the particular form they require. If that form is not currently available, it is generated.

That approach means you need a dirty flag set by mutating operations to know when to invalidate the secondary forms. I can envision thrashing as operations requiring a secondary form trigger mutations which invalidate the secondary form only to be needed immediately thereafter.

It might also be possible to mutate all currently available generated forms, but then the complexity guarantees are affected.

--
Rob Stewart                           stewart@sig.com
Software Engineer                     http://www.sig.com
Susquehanna International Group, LLP  using std::disclaimer;
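
To make the on-demand idea above concrete, here is a minimal sketch of a string that caches secondary forms and invalidates them with a dirty flag. Everything in it is an assumption for illustration: the names `multi_form_string`, `Form`, `Normalizer` and `toy_normalize` are hypothetical, and the toy normaliser only knows the single pair U+00E9 <-> 'e' + U+0301. A real implementation would consult the Unicode Character Database.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical form tags; a real library would also cover NFKC and NFKD.
enum class Form { NFC, NFD };

// Hypothetical normaliser signature: (text, target form) -> normalised text.
using Normalizer = std::function<std::u32string(const std::u32string&, Form)>;

constexpr char32_t kAcute  = 0x0301;  // COMBINING ACUTE ACCENT
constexpr char32_t kEAcute = 0x00E9;  // LATIN SMALL LETTER E WITH ACUTE

// Toy stand-in for real normalisation: handles exactly one character pair.
std::u32string toy_normalize(const std::u32string& in, Form f) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t c = in[i];
        if (f == Form::NFC && c == U'e' && i + 1 < in.size() && in[i + 1] == kAcute) {
            out += kEAcute;  // compose 'e' + accent into one code point
            ++i;             // skip the combining mark just consumed
        } else if (f == Form::NFD && c == kEAcute) {
            out += U'e';     // decompose into base letter + combining mark
            out += kAcute;
        } else {
            out += c;
        }
    }
    return out;
}

class multi_form_string {
public:
    multi_form_string(std::u32string text, Normalizer normalize)
        : primary_(std::move(text)), normalize_(std::move(normalize)) {
        assert(static_cast<bool>(normalize_));  // a callable must be supplied
    }

    // Mutating operation: edits the primary text and marks the caches stale.
    void append(const std::u32string& s) {
        primary_ += s;
        dirty_ = true;
    }

    // Returns the requested form, generating it only if it is not cached.
    const std::u32string& form(Form f) {
        if (dirty_) {        // a mutation happened since the last request
            cache_.clear();  // invalidate every secondary form
            dirty_ = false;
        }
        auto it = cache_.find(f);
        if (it == cache_.end()) {
            it = cache_.emplace(f, normalize_(primary_, f)).first;
            ++generations_;  // count how often we had to regenerate
        }
        return it->second;
    }

    std::size_t generations() const { return generations_; }

private:
    std::u32string primary_;                // text as supplied, unknown form
    Normalizer normalize_;
    std::map<Form, std::u32string> cache_;  // secondary forms, built lazily
    bool dirty_ = false;
    std::size_t generations_ = 0;
};
```

The `generations()` counter is there only to make the thrashing concern visible: alternating `append()` and `form()` calls regenerates the secondary form every time, whereas repeated `form()` calls without an intervening mutation reuse the cached copy.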