
3. It can use std::string under the hood as its storage, which makes assigning a boost::string to a std::string very efficient when the std::string implementation is COW (almost all implementations are, except MSVC's).
COW implementations of std::string are not allowed anymore starting with C++0x.
A shame. I still have a little hope that N2668 will be reverted.
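A minimal sketch of point 3, assuming a hypothetical utf8_string wrapper (the name and members are illustrative, not the proposal's actual API). It keeps its bytes in a std::string. Copying them out through str() shares the buffer on a COW library but is a deep copy under the C++0x rules. The release() member, which moves the buffer out, is one assumed way to keep the hand-off cheap once COW is gone.

```cpp
#include <string>
#include <utility>

// Hypothetical wrapper: a UTF-8 string type whose storage is a std::string,
// so handing the bytes to std::string-based code needs no re-encoding.
class utf8_string {
public:
    utf8_string() {}
    explicit utf8_string(std::string const& bytes) : storage_(bytes) {}

    // Read-only access to the underlying storage. Copying from this reference
    // shares the buffer on a COW library and deep-copies it otherwise.
    std::string const& str() const { return storage_; }

    // C++11 alternative to COW: transfer the buffer instead of copying it.
    std::string release() {
        std::string out = std::move(storage_);
        storage_.clear();  // moved-from state is unspecified; make it empty
        return out;
    }

private:
    std::string storage_;
};
```

With this design the cheap path is explicit: callers that no longer need the wrapper call release(), while str() stays available for read-only use.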
4. It is fully Unicode-aware.
5. It pushes the "UTF-8" idea into standard C++.
6. You don't pay for what you do not need.
What am I paying for? I don't see how I gain anything.
You don't pay for UTF-8 validation, which matters because 99% of uses of the string are encoding-agnostic.
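To illustrate what "encoding-agnostic" means in practice, here is a sketch (the function name split_on_ascii is hypothetical) that splits UTF-8 text on an ASCII delimiter by working directly on bytes, with no validation or decoding. It is safe because every byte of a multi-byte UTF-8 sequence has its high bit set, so a byte below 0x80 can never be part of another character.

```cpp
#include <string>
#include <vector>

// Splits raw UTF-8 bytes on an ASCII delimiter. No decoding is needed:
// bytes inside multi-byte sequences are all >= 0x80, so they can never
// be mistaken for the delimiter.
std::vector<std::string> split_on_ascii(std::string const& text, char delim) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type pos = text.find(delim, start);
        if (pos == std::string::npos) {
            parts.push_back(text.substr(start));  // last field
            return parts;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
}
```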
#if __cplusplus >= 201103L // C++0x/C++11: char32_t is a built-in type
typedef char32_t const_code_point_type;
#else
typedef unsigned const_code_point_type;
#endif
Just define boost::char32 once (depending on BOOST_NO_CHAR32_T) and use that instead of putting #ifdefs everywhere. (That's what boost/cuchar.hpp does in my library.)
Good point
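A minimal sketch of the "define it once" approach. BOOST_NO_CHAR32_T is the Boost.Config macro named above (defined when the compiler lacks a built-in char32_t); the fallback type is an assumption, and this is not the actual contents of boost/cuchar.hpp.

```cpp
// One central typedef, selected once by the Boost.Config macro.
namespace boost {
#ifdef BOOST_NO_CHAR32_T
typedef unsigned long char32;  // assumed fallback: unsigned long has >= 32 bits
#else
typedef char32_t char32;       // built-in C++0x/C++11 type
#endif
} // namespace boost

// Every class then uses the typedef directly, with no #ifdef of its own.
typedef boost::char32 const_code_point_type;
```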
// UTF validation
bool is_valid_utf() const;
See, that's what makes the whole thing pointless.
Actually, no. Consider:
socket.read(my_string);
if(!my_string.is_valid_utf())
    ...
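A sketch of the kind of check an is_valid_utf() member would perform on bytes read from a socket, written as a free function with a hypothetical name. It follows the well-formed UTF-8 rules: it rejects stray continuation bytes, truncated sequences, overlong forms, UTF-16 surrogates (U+D800..U+DFFF) and values above U+10FFFF.

```cpp
#include <string>

bool is_valid_utf8(std::string const& s) {
    std::string::size_type i = 0;
    const std::string::size_type n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }  // ASCII byte

        int len = 0;
        unsigned long cp = 0;
        if (c >= 0xC2 && c <= 0xDF)      { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;  // continuation byte, C0/C1 (overlong) or F5..FF

        if (n - i < static_cast<std::string::size_type>(len))
            return false;   // sequence truncated at end of input
        for (int k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // expected 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len == 3 && cp < 0x800) return false;                       // overlong
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false;  // overlong / too large
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;                 // surrogate
        i += len;
    }
    return true;
}
```

Usage mirrors the snippet above: after socket.read(my_string), call is_valid_utf8(my_string) and reject the input if it returns false.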
Your type doesn't add any semantic value on top of std::string; it's just an agglomeration of free functions into a class. That's a terrible design. The only advantage that a specific type for Unicode strings would bring is that it could enforce certain useful invariants.
You don't need to enforce things you don't care about in 99% of cases.
Enforcing that the string is in a valid UTF encoding and is normalized in a specific normalization form can make most Unicode algorithms several orders of magnitude faster.
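One way a type can enforce such an invariant, sketched under assumptions: the class valid_utf8 and its members are hypothetical, and the sketch covers only UTF-8 validity, not normalization. Content can only be added as code points, so the stored bytes are valid UTF-8 by construction, and algorithms such as code point counting can skip all error handling.

```cpp
#include <stdexcept>
#include <string>

class valid_utf8 {
public:
    // The only mutator: rejects non-scalar values, then encodes as UTF-8.
    void append(char32_t cp) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp < 0x80) {
            bytes_ += static_cast<char>(cp);
        } else if (cp < 0x800) {
            bytes_ += static_cast<char>(0xC0 | (cp >> 6));
            bytes_ += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            bytes_ += static_cast<char>(0xE0 | (cp >> 12));
            bytes_ += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            bytes_ += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            bytes_ += static_cast<char>(0xF0 | (cp >> 18));
            bytes_ += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            bytes_ += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            bytes_ += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }

    // Relies on the invariant: every byte that is not a continuation byte
    // (10xxxxxx) starts exactly one code point, so no validation is needed.
    std::string::size_type code_points() const {
        std::string::size_type count = 0;
        for (std::string::size_type i = 0; i < bytes_.size(); ++i)
            if ((static_cast<unsigned char>(bytes_[i]) & 0xC0) != 0x80) ++count;
        return count;
    }

    std::string const& bytes() const { return bytes_; }

private:
    std::string bytes_;
};
```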
You do not always want to normalize text. It is the user's choice: you may have algorithms optimized for already-normalized strings, but that is not always the case. Also, which normalization form: NFC? NFKC?
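A worked illustration of why the choice of form matters, using hard-coded UTF-8 byte sequences (no normalization library is called here; the NFC and NFKC results in the comments are standard Unicode mappings). Canonically equivalent text compares unequal byte-wise, and NFC and NFKC differ on compatibility characters such as ligatures.

```cpp
#include <string>

// U+00E9 LATIN SMALL LETTER E WITH ACUTE: the NFC form of "e" + U+0301.
const std::string e_precomposed = "\xC3\xA9";
// U+0065 followed by U+0301 COMBINING ACUTE ACCENT: canonically
// equivalent to U+00E9, but different bytes.
const std::string e_decomposed = "e\xCC\x81";

// U+FB01 LATIN SMALL LIGATURE FI: NFC leaves it unchanged,
// while NFKC replaces it with the two letters "fi".
const std::string fi_ligature = "\xEF\xAC\x81";
const std::string fi_nfkc = "fi";
```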
All of this is trivial to implement quickly with my Unicode library.
No, it is not. Your Unicode library is locale-agnostic, which makes it quite useless in too many cases. Almost every added function was locale-sensitive:
- search
- collation
- case handling
and so on. This is a major drawback of your library: it cannot perform locale-sensitive algorithms, which are the vast majority of Unicode algorithms.

Artyom
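A toy illustration of why case handling needs a locale. The function below is hypothetical and is not a real case-mapping API. Unicode's SpecialCasing rules map U+0069 'i' to U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE for Turkish ("tr") and Azerbaijani ("az"), and to plain U+0049 'I' elsewhere, so a locale-agnostic to_upper cannot get both right.

```cpp
#include <string>

// Uppercase of U+0069 'i' as a function of the language tag.
char32_t upper_of_small_i(std::string const& language) {
    if (language == "tr" || language == "az")
        return 0x0130;  // LATIN CAPITAL LETTER I WITH DOT ABOVE
    return U'I';        // default mapping
}
```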