Re: [boost] [string] Realistic API proposal

28 Jan 2011

      ...
...
3. It allows std::string to be used under the hood as storage for now,
   giving high efficiency when assigning boost::string to std::string
   when the implementation is COW (almost all implementations, with the
   exception of MSVC).
COW implementations of std::string are not allowed anymore starting with
C++0x.
A shame. I still have a little hope that N2668 will be reverted.
...
...
4. It is fully Unicode aware.
5. It pushes the "UTF-8" idea to standard C++.
6. You don't pay for what you do not need.
What am I paying for? I don't see how I gain anything.
You don't pay for UTF-8 validation, which matters because 99% of the uses
of a string are encoding-agnostic.
...
...
#ifdef C++0x
    typedef char32_t const_code_point_type;
#else
    typedef unsigned const_code_point_type;
#endif
Just define boost::char32 once (depending on BOOST_NO_CHAR32_T) and use
that instead of putting ifdefs everywhere.
(That's what boost/cuchar.hpp does in my library.)
Good point
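A minimal sketch of the "define it once" suggestion, for illustration only. The namespace `sketch` and the alias names are hypothetical, and this is not the content of boost/cuchar.hpp, which is not shown in this thread. It assumes BOOST_NO_CHAR32_T has already been set (or left unset) by Boost.Config before this point.

```cpp
// Assumption: in real code <boost/config.hpp> is included first and defines
// BOOST_NO_CHAR32_T on compilers that lack char32_t. It is left out here so
// the sketch compiles on its own with any C++11 compiler.
namespace sketch {

#ifdef BOOST_NO_CHAR32_T
typedef unsigned int char32;   // fallback: assumes unsigned int is at least 32 bits
#else
typedef char32_t char32;       // native C++0x/C++11 character type
#endif

// Every other header can now use the alias and needs no #ifdef of its own.
typedef char32 const_code_point_type;

}  // namespace sketch
```

With this in place, the string class would spell its code point type through `sketch::char32` and the conditional compilation stays in one header.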
...
...
// UTF validation
bool is_valid_utf() const;
See, that's what makes the whole thing pointless.
Actually not, consider:

   socket.read(my_string);
   if(!my_string.is_valid_utf())
      ....
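To make the on-demand check concrete, here is a rough sketch of what such a validation could do for UTF-8, written as a free function over std::string. The name `is_valid_utf8` is hypothetical and this is not the proposed boost::string implementation. It rejects malformed lead or continuation bytes, truncated sequences, overlong encodings, surrogates, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
inline bool is_valid_utf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;               // stray continuation or invalid lead byte
        if (len > n - i) return false;   // sequence truncated at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return false;   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}
```

The cost is one linear pass over the bytes, paid only by callers that receive untrusted input, such as the socket read above.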
...
Your type doesn't add any semantic value on top of std::string;
it's just an agglomeration of free functions into a class. That's a terrible
design.
The only advantage that a specific type for Unicode strings would bring is
that it could enforce certain useful invariants.
You don't need to enforce things you don't care about in 99% of cases.
...
Enforcing that the string is in a valid UTF encoding and is normalized
in a specific normalization form can make most Unicode algorithms several
orders of magnitude faster.
You do not always want to normalize text. It is the user's choice. You
may have optimized algorithms for already-normalized strings,
but that is not always the case.

Also, what kind of normalization? NFC? NFKC?
...
All of this is trivial to implement quickly with my Unicode library.
No, it is not.

Your Unicode library is locale-agnostic, which makes it quite
useless in too many cases.

Almost every added function was locale sensitive:

- search
- collation
- case handling

And so on. This is a major drawback of your library:
it is not capable of doing locale-sensitive algorithms,
which are the vast majority of Unicode algorithms.
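As a small illustration of what "locale sensitive" means at the interface level, the sketch below sorts strings using only the standard library: std::locale is itself usable as a comparison predicate that delegates to the locale's std::collate facet. The helper name `sort_with_locale` is hypothetical, and neither library discussed in this thread is involved.

```cpp
#include <algorithm>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: sort `words` by the collation rules of `loc`.
// std::locale::operator() compares two strings through the locale's
// std::collate facet, so the resulting order depends on which locale
// the caller passes in.
inline void sort_with_locale(std::vector<std::string>& words,
                             const std::locale& loc) {
    std::sort(words.begin(), words.end(), loc);
}
```

Passing std::locale::classic() gives plain byte ordering; a named locale, if one is installed on the system, may order the same words differently, which is why the locale has to be a parameter of the operation.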

Artyom