
On Wed, Jan 26, 2011 at 12:42 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
On Wed, Jan 26, 2011 at 11:54, Matus Chochlik <chochlik@gmail.com> wrote: [snip/]
I'm fairly neutral on the immutability issue: I do not oppose it if someone shows why it is a superior design, provided it does not break everything horribly (from a backward-compatibility perspective).
Me too, but it definitely will break existing code:

    string.resize(91);
This is just one of the examples; append/prepend/etc. are others. The question is: do we allow them for the sake of backward compatibility and implement them using the immutable semantics? Even resize could be implemented this way. Another matter is whether that makes sense. [snip/]
My point is that 'Unicode functionality' should be separate from the string implementation. This code:

    for(char32_t cp : codepoints(my_string))
        ;

should work with any type of my_string whose encoding is known.
If you need just this, then why not use std::string as it is now for my_string and use any of the Unicode libraries around? What I would like is a string with which I *can* forget that there ever was anything like encodings other than Unicode, except for those cases where it is completely impossible. And even in those cases, like when calling an OS API function, I don't want to specify exactly what encoding I want, but just to say: give me a representation (or "view" if you like) of the string in the "native" encoding of the currently selected locale for the desired character type. Something like this:

    whatever_the_string_class_name_will_be cmd = init();
    system(cmd.native<char>().c_str());
    ShellExecute(..., cmd.native<TCHAR>().c_str(), ...);
    ShellExecuteW(..., cmd.native<wchar_t>().c_str(), ...);
    wxExecute(cmd.native<wxChar>());

or:

    whatever_the_string_class_name_will_be caption = get_non_ascii_string();
    new wxFrame(parent, wxID_ANY, caption.native<wxChar>(), ...);

In many cases the above could be a no-op, depending on the *internal* encoding used by this string class. It could be UTF-8 by default and maybe UTF-16 on Windows. Specifying *exactly* which encoding I want (like with iso_8859_2_cp_tag, utf32_cp_tag, ...) should be done only when absolutely necessary, and *not* every time I want to do something with the string.

Also, there should be iterators allowing you to do this, again without specifying exactly what encoding you want:

    // cp_begin returning a "code-point iterator"
    auto i = str.cp_begin(), e = str.cp_end();
    if(i != e && *i == code_point(0x0123))
        do_something();

or even (if this is possible):

    // cr_begin returning a character iterator
    auto i = str.cr_begin(), e = str.cr_end();
    // if the first character is A with acute ...
    if(i != e && *i == unicode_character({0x0041, 0x0301}))
        do_something();
I'm not against adding convenience functions to the string class; it makes the code more readable when you concatenate operations. However, it violates this: http://www.drdobbs.com/184401197
I do not want to overuse the "breaking" of encapsulation by adding new non-static or friend functions. If we can take advantage of it in the implementation of, say, trim(str), we may, but we don't have to. This is an implementation detail (if we decide that the usage is trim(str) and not str.trim()).
[snip/]
If that is to succeed, it has to be (backward) compatible with the existing APIs, however borked they seem to us (me included). There are lots of string implementations that are *cool* but unusable by anything except algorithms specifically designed for them.
I can't quite understand what has to be backward compatible with what... Can you please provide a few code snippets that mustn't break, so I could think about that?
Maybe you have something different in mind, but what I was talking about is that you cannot pass an iterator_range *directly* to a WinAPI call (or to any other OS API that I know of).

BR, Matus