
Peter Bindels wrote:
> On 23/06/07, Mathias Gaunard <mathias.gaunard@etu.u-bordeaux1.fr> wrote:
>> Peter Bindels wrote:
>>> When searching ASCII text, it's equal;
>>
>> Not if you handle grapheme clusters.
>> If your text is "abcfoôdef", with ô coded as o + combining accent, then searching for "foo" shouldn't work, since you would only find part of the grapheme cluster and possibly do weird things if, for example, the substring is removed.
>
> Neither combining accents nor, in fact, any accented character were in ASCII, last time I checked.

Exactly what question is being discussed here? I thought the question was, how fast is text search with UTF-8 strings that happen to contain ASCII only, compared with text search with ASCII strings.

Even if the UTF-8 strings happen to contain ASCII, the search algorithm may still have to check for combining characters.

The wider question is, should people who currently use ASCII and care a lot about performance and don't care about i18n switch to UTF-8?

It would certainly simplify life if all strings were UTF-n. I have had to deal with BSTR, CString, QString and others. Just having to deal with a single string type would be a good thing.

--Johan Råde
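
To make the grapheme-cluster point from the thread concrete, here is a minimal C++ sketch of a byte-wise UTF-8 search that rejects a match when the next code point is a combining mark. The helper names (starts_with_combining_mark, find_whole_clusters) are hypothetical. The sketch assumes that only U+0300..U+036F counts as combining, which is a deliberate simplification: real grapheme segmentation covers many more cases. It also only checks the end of the match, not the start.

```cpp
#include <cstddef>
#include <string>

// Returns true if the UTF-8 sequence starting at byte offset `pos` encodes a
// code point in U+0300..U+036F (Combining Diacritical Marks).
// Assumption: this single block stands in for "combining character"; a real
// implementation would follow the Unicode grapheme cluster rules.
bool starts_with_combining_mark(const std::string& s, std::size_t pos) {
    if (pos + 1 >= s.size()) return false;
    const unsigned char b0 = static_cast<unsigned char>(s[pos]);
    const unsigned char b1 = static_cast<unsigned char>(s[pos + 1]);
    // U+0300..U+033F encode as CC 80..BF; U+0340..U+036F as CD 80..AF.
    if (b0 == 0xCC) return b1 >= 0x80 && b1 <= 0xBF;
    if (b0 == 0xCD) return b1 >= 0x80 && b1 <= 0xAF;
    return false;
}

// Hypothetical helper: finds `needle` in `text` (both UTF-8), skipping any
// match that is immediately followed by a combining mark, because such a
// match would end in the middle of a grapheme cluster.
std::size_t find_whole_clusters(const std::string& text,
                                const std::string& needle) {
    std::size_t pos = text.find(needle);
    while (pos != std::string::npos) {
        // For pure-ASCII text this check never succeeds, but it is still
        // executed once per candidate match.
        if (!starts_with_combining_mark(text, pos + needle.size())) {
            return pos;
        }
        pos = text.find(needle, pos + 1);
    }
    return std::string::npos;
}
```

With the text "abcfoo" followed by the bytes CC 82 (U+0302, combining circumflex) and then "def", a search for "foo" is rejected, while the same search in plain "abcfoodef" succeeds at offset 3. For ASCII-only input the extra cost is one bounds check and at most two byte comparisons per candidate match, which suggests the overhead is small, though that would need measuring.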