Re: [boost] String Algorithms Library: Case insensitive compare UTF-8

27 Aug 2008

      Martin Lutken wrote:
...
Does anyone know how this could be made possible?
I suppose I need a locale facet like the std::ctype, but which works for 
UTF-8, and not just for ASCII a-z,A-Z. I guess the information in a table 
like this (http://www.unicode.org/Public/UNIDATA/CaseFolding.txt) 
could be used.
This might not work out-of-the-box. The StringAlgo lib is designed around sequences
of characters. Since UTF-8 is a variable-width character encoding, the algorithms
in the library would not work as expected.

To make it work, you will need a container with iterators that
iterate over meta-characters, not bytes.
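
To illustrate the difference between bytes and meta-characters (code points),
here is a minimal sketch that decodes a UTF-8 byte string into a sequence of
32-bit code points. The name decode_utf8 is hypothetical and is not part of
Boost or StringAlgo. It assumes a C++11 compiler, replaces malformed or
truncated sequences with U+FFFD, and does not reject overlong forms or
surrogates, so treat it as an illustration rather than a validating decoder.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
// Malformed or truncated sequences become U+FFFD. Overlong encodings
// and surrogate code points are NOT rejected in this sketch.
std::u32string decode_utf8(const std::string& bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(char32_t(0xFFFD));
            ++i;
            continue;
        }
        ++i;

        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= bytes.size()) { ok = false; break; }
            const unsigned char c = static_cast<unsigned char>(bytes[i]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
            ++i;
        }
        out.push_back(ok ? cp : char32_t(0xFFFD));
    }
    return out;
}
```

With this, the 7-byte UTF-8 string for "Lütken" becomes 6 code points, which is
the unit a character-oriented algorithm would need to see.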
...
If it's better/easier just to convert the string to UTF-32 before doing case 
insensitive compares and replaces, I could live with that.
If you meant UTS-32 and you have a corresponding locale implementation, then
this approach is a viable solution.
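
A rough sketch of what a case-insensitive compare over 32-bit code points
could look like, assuming the strings have already been converted to UTF-32.
The names fold_simple and iequals_utf32 are hypothetical, and the folding
covers only ASCII and the Latin-1 letters as a stand-in. A real implementation
would presumably take its mappings from a table such as CaseFolding.txt or
from a suitable locale facet, and full case folding can also change the
string length, which this sketch ignores.

```cpp
#include <cstddef>
#include <string>

// Hypothetical one-to-one case fold. Covers ONLY:
//   U+0041..U+005A (ASCII A-Z)
//   U+00C0..U+00DE (Latin-1 capitals), except U+00D7 (multiplication sign)
// Every other code point is returned unchanged.
char32_t fold_simple(char32_t c)
{
    if (c >= 0x0041 && c <= 0x005A)
        return static_cast<char32_t>(c + 0x20);
    if (c >= 0x00C0 && c <= 0x00DE && c != 0x00D7)
        return static_cast<char32_t>(c + 0x20);
    return c;
}

// Compare two UTF-32 strings code point by code point after folding.
// Assumes a one-to-one fold, so strings of different length never match.
bool iequals_utf32(const std::u32string& a, const std::u32string& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (fold_simple(a[i]) != fold_simple(b[i]))
            return false;
    }
    return true;
}
```

For example, iequals_utf32(U"L\u00DCTKEN", U"l\u00FCtken") compares equal
under this fold, because every element is a whole code point rather than a
fragment of a multi-byte sequence.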

Sorry, what is UTS-32? I tried to Google it: 351 results, with none of them 
looking related to character encoding.

I found this article on Wikipedia on UTF-32/UCS-4:
http://en.wikipedia.org/wiki/UTF-32

Is it not what I need?
I suspect that many people must have run into similar problems. Perhaps we should
add a 32-bit string class to Boost. And until I get a better understanding, I will 
keep calling it UTF-32 :-)

-Regards Martin Lütken