Re: [boost] [string] proposal

26 Jan 2011


On Wed, Jan 26, 2011 at 10:37 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
...
Excuse my ignorance, but can someone explain to me why people are so keen on
immutable strings? Aren't they basically the same as 'shared_ptr<const
std::string>'?
I'm fairly neutral on the immutability issue. I do not oppose it if
someone shows why it is a superior design, provided it does not
break everything horribly (from the backward compatibility perspective).
...
I follow these discussions, and I must admit that I already use std::string
in my projects with utf8 encoding assumed by default. What matters for me is
the lack of a "standard" way to manipulate those strings. I.e.:
1) Convert them to and from other APIs' encoding:
   SetWindowTextW(to_utf16(my_string));
2) Iterate through the codepoints, characters, words, etc., like this:
   for(char32_t cp : codepoints(my_string))
       ...;
+1
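
For illustration only, the two wished-for helpers could look roughly like
the sketch below. The names codepoints() and to_utf16() are taken from the
examples above, but the signatures and behaviour are assumptions: neither
function exists in Boost or the standard library. The decoder replaces
malformed or truncated sequences with U+FFFD and does not reject overlong
forms or encoded surrogates, so it is not a validating implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into code points.
// Malformed input yields U+FFFD; overlong forms are not rejected.
std::u32string codepoints(const std::string& utf8)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray or invalid lead byte
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= utf8.size() ||
                (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) {
                ok = false;  // truncated sequence or missing continuation byte
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
            ++i;
        }
        if (!ok || cp > 0x10FFFF) cp = 0xFFFD;
        out.push_back(cp);
    }
    return out;
}

// Hypothetical helper: re-encode as UTF-16, using surrogate pairs
// for code points above U+FFFF.
std::u16string to_utf16(const std::string& utf8)
{
    std::u16string out;
    for (char32_t cp : codepoints(utf8)) {
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With this, "for(char32_t cp : codepoints(my_string))" compiles as written
above. Returning a std::u32string copies the whole string; a lazy iterator
adaptor would avoid that, at the cost of a longer example.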
...
The original proposal (in the other thread) was to use the type of the
string to ensure at compile time that the above code is valid. I understand
that it is needed in the current world where not everybody uses utf8. It's
fine for me. But why
On Fri, Jan 21, 2011 at 13:25, Matus Chochlik <chochlik@gmail.com> wrote:
...
create a class called boost::string that will have
all the properties that a string handling class in 2011+ A.D.
should have, basically what std::string should have been.
The original proposal was to keep the existing string but to
switch to UTF-8 as the default encoding. This is what still is
my long term goal. The whole discussion changed my opinion
on how to get there. I personally would not have any problem
with doing the instant switch, but many other people would,
and with good reasons.
...
?
What are those properties? Isn't std::string what it should have been?
Do you mean that you want to put in there every possible algorithm you can
imagine?
What I was talking about is basically adding some more convenience
member functions, many of which are currently implemented by the
string_algo Boost library, to the string's interface and, more importantly,
extending that interface with 'Unicode functionality', i.e. the ability
to traverse the string not just as a sequence of bytes but as a sequence
of Unicode code points and, if possible, even "logical characters".
...
IMO std::string is just a container of bytes with two useful convenience
methods (c_str() and substr()) and a utf8 encoding that should have been
assumed by default but unfortunately isn't. Everything else should be generic
algorithms that work with sequences of characters in some encoding. So,
maybe it's better to focus on designing something like boost::iterator_range
with an encoding associated with it and algorithms that work with these
ranges?
If that is to succeed, it has to be (backward) compatible with the existing
APIs, however borked they seem to us (me included). There are lots of string
implementations that are *cool* but unusable by anything except algorithms
specifically designed for them.
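
As a rough sketch of how the two points might meet: a non-owning range that
carries its encoding in the type can sit on top of an existing std::string,
so the old APIs keep working while the new algorithms get compile-time
encoding checks. All names below (utf8_tag, latin1_tag, encoded_range,
as_utf8, count_codepoints) are made up for illustration and are not part of
Boost. The caller asserts the encoding; nothing is validated.

```cpp
#include <cstddef>
#include <string>

// Hypothetical encoding tags.
struct utf8_tag {};
struct latin1_tag {};

// Non-owning [first, last) range of bytes with the encoding in the type.
template <class Encoding, class Iterator>
struct encoded_range {
    Iterator first;
    Iterator last;
    Iterator begin() const { return first; }
    Iterator end() const { return last; }
};

// View an existing std::string as UTF-8 without copying it.
inline encoded_range<utf8_tag, std::string::const_iterator>
as_utf8(const std::string& s)
{
    return { s.begin(), s.end() };
}

// Generic algorithm that only accepts UTF-8 ranges: counts code points
// by counting every byte that is not a continuation byte (10xxxxxx).
template <class Iterator>
std::size_t count_codepoints(encoded_range<utf8_tag, Iterator> r)
{
    std::size_t n = 0;
    for (Iterator it = r.begin(); it != r.end(); ++it)
        if ((static_cast<unsigned char>(*it) & 0xC0) != 0x80)
            ++n;
    return n;
}
```

Passing an encoded_range<latin1_tag, ...> to count_codepoints() fails to
compile, which is the compile-time check discussed earlier in the thread,
while the underlying std::string stays usable with c_str()-based APIs.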

Matus