On Wed, Jul 22, 2009 at 2:25 PM, Robert Dailey
The problem with that is that std::string::length() no longer provides a meaningful value. It will count each byte as 1 character.
Instead of ICU, there's also http://utfcpp.sourceforge.net/ with its utf8::distance, which may be lighter weight. --DD

Quoting from that web page:

This function is used to find the length (in code points) of a UTF-8 encoded string. The reason it is called distance, rather than, say, length is mainly because developers are used that length is an O(1) function. Computing the length of an UTF-8 string is a linear operation, and it looked better to model it after std::distance algorithm.

In case of an invalid UTF-8 sequence, a utf8::invalid_utf8 exception is thrown. If last does not point to the past-of-end of a UTF-8 sequence, a utf8::not_enough_room exception is thrown.
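
To make the byte-versus-code-point difference concrete, here is a rough sketch in plain C++ with no library dependency. The helper name count_code_points is made up for illustration and is not part of utfcpp. It only counts bytes that are not UTF-8 continuation bytes, so unlike utf8::distance it does no validation and never throws.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of utfcpp or the standard library.
// Counts code points in a UTF-8 string by counting every byte that is
// NOT a continuation byte (continuation bytes look like 10xxxxxx).
// Assumes the input is already valid UTF-8; malformed input is not detected.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return n;
}
```

For the string "h\xC3\xA9llo" (the word "héllo", with é encoded as two bytes), std::string::length() returns 6 while the helper returns 5. Like utf8::distance, it is a linear scan rather than an O(1) lookup. With utfcpp itself the equivalent call would be utf8::distance(s.begin(), s.end()), which also rejects invalid sequences.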