
On Wed, Jan 26, 2011 at 12:42 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
On Wed, Jan 26, 2011 at 11:54, Matus Chochlik <chochlik@gmail.com> wrote: [snip/]
I'm fairly neutral on the immutability issue: I do not oppose it if someone shows why it is a superior design, provided it does not break everything horribly (from a backward-compatibility perspective).
Me too, but it definitely will break existing code:

    string.resize(91);
This is just one of the examples; append/prepend/etc. are others. The question is: do we allow them for the sake of backward compatibility and implement them using the immutable semantics? Even resize could be implemented this way. Another matter is whether that makes sense. [snip/]
My point is that 'Unicode functionality' should be separate from the string implementation. This code:

    for(char32_t cp : codepoints(my_string))
        ;

should work with any type of my_string whose encoding is known.
If you need just this, then why not use std::string as it is now for my_string and use any of the Unicode libraries around? What I would like is a string with which I *can* forget that there ever was anything like encodings other than Unicode, except for those cases where it is completely impossible. And even in those cases, like when calling an OS API function, I don't want to specify exactly what encoding I want, but just to say: give me a representation (or "view" if you like) of the string in the "native" encoding of the currently selected locale for the desired character type. Something like this:

    whatever_the_string_class_name_will_be cmd = init();
    system(cmd.native<char>().c_str());
    ShellExecute(..., cmd.native<TCHAR>().c_str(), ...);
    ShellExecuteW(..., cmd.native<wchar_t>().c_str(), ...);
    wxExecute(cmd.native<wxChar>());

or:

    whatever_the_string_class_name_will_be caption = get_non_ascii_string();
    new wxFrame(parent, wxID_ANY, caption.native<wxChar>(), ...);

In many cases the above could be a no-op, depending on the *internal* encoding used by this string class. It could be UTF-8 by default and maybe UTF-16 on Windows. Specifying *exactly* which encoding I want (like with iso_8859_2_cp_tag, utf32_cp_tag, ...) should be done only when absolutely necessary, and *not* every time I want to do something with the string.

Also, there should be iterators allowing you to do this, again without specifying exactly what encoding you want:

    // cp_begin returning a "code-point iterator"
    auto i = str.cp_begin(), e = str.cp_end();
    if(i != e && *i == code_point(0x0123))
        do_something();

or even (if this is possible):

    // cr_begin returning a character iterator
    auto i = str.cr_begin(), e = str.cr_end();
    // if the first character is A with acute ...
    if(i != e && *i == unicode_character({0x0041, 0x0301}))
        do_something();
I'm not against adding convenience functions to the string class; it makes the code more readable when you concatenate operations. However, it violates this: http://www.drdobbs.com/184401197
I do not want to overuse the "breaking" of encapsulation by adding new non-static or friend functions. If we can take advantage of it in the implementation of, say, trim(str), we may, but we don't have to. This is an implementation detail (if we decide that the usage is trim(str) and not str.trim()).
[snip/]
If that is to succeed, it has to be (backward) compatible with the existing APIs, however borked they seem to us (me included). There are lots of string implementations that are *cool* but unusable by anything except algorithms specifically designed for them.
I can't quite understand what has to be backward compatible with what... Can you please provide a few code snippets that mustn't break, so I could think about that?
Maybe you have something different in mind, but what I was talking about is that you cannot pass an iterator_range *directly* to a WinAPI call (or to any other OS API that I know of).

BR, Matus