Re: [boost] [rfc] I/O Library Design

23 Jun 2007

Peter Bindels wrote:
...
On 23/06/07, Mathias Gaunard <mathias.gaunard@etu.u-bordeaux1.fr> wrote:
...
Peter Bindels wrote:
...
When searching ASCII text, it's equal;
Not if you handle grapheme clusters.
If your text is "abcfoôdef", with ô coded as o + combining accent, then
searching for "foo" should not match, since the match would cover only part of
a grapheme cluster, and something weird could happen if, for example, the
matched substring is then removed.
Neither combining accents nor, in fact, any accented characters were in
ASCII last time I checked.
Exactly what question is being discussed here?
I thought the question was how fast text search is on UTF-8 strings that happen
to contain only ASCII, compared with text search on ASCII strings.
Even if the UTF-8 strings happen to contain only ASCII,
the search algorithm may still have to check for combining characters,
since it cannot know in advance that none are present.
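To make that concrete, here is a rough C++ sketch (my own illustration, not code from any Boost proposal) of a byte-wise UTF-8 search that rejects a match when the next code point is a combining mark. The names decode_at, is_combining_mark and find_whole_clusters are hypothetical. is_combining_mark only recognises the dedicated combining-diacritics blocks, not the full Unicode Mark property, so this is a simplification of real grapheme cluster segmentation, and the decoder assumes well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode the code point that starts at byte offset pos of a UTF-8 string.
// Assumes well-formed UTF-8: continuation bytes are not validated.
char32_t decode_at(const std::string& s, std::size_t pos) {
    assert(pos < s.size());
    unsigned char lead = static_cast<unsigned char>(s[pos]);
    if (lead < 0x80) return lead;  // plain ASCII byte, the fast path
    // Number of continuation bytes implied by the lead byte.
    int extra = (lead >= 0xF0) ? 3 : (lead >= 0xE0) ? 2 : 1;
    char32_t cp = lead & (0x3F >> extra);  // payload bits of the lead byte
    for (int i = 1; i <= extra && pos + i < s.size(); ++i)
        cp = (cp << 6) | (static_cast<unsigned char>(s[pos + i]) & 0x3F);
    return cp;
}

// Simplified stand-in for the Unicode "Mark" property: only the
// combining-diacritics blocks are covered. A real implementation would
// consult the Unicode character database (e.g. through ICU).
bool is_combining_mark(char32_t cp) {
    return (cp >= 0x0300 && cp <= 0x036F) ||  // Combining Diacritical Marks
           (cp >= 0x1AB0 && cp <= 0x1AFF) ||  // ... Extended
           (cp >= 0x1DC0 && cp <= 0x1DFF) ||  // ... Supplement
           (cp >= 0x20D0 && cp <= 0x20FF) ||  // ... for Symbols
           (cp >= 0xFE20 && cp <= 0xFE2F);    // Combining Half Marks
}

// Byte-wise search for needle in haystack that skips any match immediately
// followed by a combining mark, because such a match ends in the middle of
// a grapheme cluster. Returns std::string::npos if no acceptable match.
std::size_t find_whole_clusters(const std::string& haystack,
                                const std::string& needle) {
    std::size_t pos = haystack.find(needle);
    while (pos != std::string::npos) {
        std::size_t end = pos + needle.size();
        if (end >= haystack.size() ||
            !is_combining_mark(decode_at(haystack, end)))
            return pos;  // match ends on a (simplified) cluster boundary
        pos = haystack.find(needle, pos + 1);  // keep looking
    }
    return std::string::npos;
}
```

With the example text encoded as "abcfoo\xCC\x82" "def" (the literal is split because "\x82def" would otherwise be read as a single hex escape), text.find("foo") returns 3, while find_whole_clusters(text, "foo") returns std::string::npos. On ASCII-only input the extra check costs only a few comparisons per match, but it is still work that a plain ASCII search never does.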

The wider question is: should people who currently use ASCII, care a lot about performance,
and don't care about i18n switch to UTF-8?

It would certainly simplify life if all strings were UTF-n.
I have had to deal with BSTR, CString, QString and others.
Having to deal with only a single string type would be a good thing.

--Johan Råde