
On Wed, Jan 26, 2011 at 3:06 PM, Yakov Galka <ybungalobill@gmail.com> wrote: [snip/]
Fine. If immutable strings with backward compatibility result in changing string.resize(91); to string = string.resize(91);
I don't see why this should be needed
I vote against immutability. Even if you throw compatibility away, no one has explained yet why immutable strings are better. To me it smells like "modern language here" influence.
[snip/]
If you need just this, then why not use std::string as it is now for my_string and use any of the Unicode libraries around? What I would like is a string with which I *can* forget that there ever was anything like other encodings than Unicode except for those cases where it is completely impossible.
And even in those cases, like when calling an OS API function, I don't want to specify exactly what encoding I want but just to say: Give me a representation (or "view" if you like) of the string that is in the "native" encoding of the currently selected locale for the desired character type.
Let me try to explain myself in other words. I propose the iterator_range idea as the means by which you (we) achieve our goal. It's more like a C++0x concept. By "my_string whose encoding is known" I meant that strings like u8string should map to string_ranges with typename encoding == utf_8 (for example). As a result you *won't* need to specify the exact encoding because it will be deduced from the context. The only place you will write the encoding explicitly is at the boundaries of your code and legacy APIs.
Look at the code you provided:
Something like this:
whatever_the_string_class_name_will_be cmd = init();
system(cmd.native<char>().c_str());
ShellExecute(..., cmd.native<TCHAR>().c_str(), ...);
ShellExecuteW(..., cmd.native<wchar_t>().c_str(), ...);
wxExecute(cmd.native<wxChar>());
or
whatever_the_string_class_name_will_be caption = get_non_ascii_string();
new wxFrame(parent, wxID_ANY, caption.native<wxChar>(), ...);
The ShellExecuteW, wxExecute and wxFrame calls are actually *more verbose than they have to be*. wxString is documented to be utf16 encoded, as is LPCWSTR on Windows. So, by providing a mapping from wxString to the string_range concept, you could write it as:
wxExecute(cmd); // creates utf16 wxString
new wxFrame(parent, wxID_ANY, caption, ...); // creates utf16 wxString
As a result *less* code will be affected when switching to utf8.
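To make the "mapping to the string_range concept" idea concrete, here is a rough sketch of how a type could advertise its encoding so that generic code deduces it instead of the caller spelling it out. Everything in it is an assumption made for illustration: string_range_traits, the utf_8 and utf_16 tags, and the two stand-in string structs are invented names, not part of Boost, the standard library or wxWidgets.

```cpp
#include <string>
#include <type_traits>

// Hypothetical encoding tags (illustrative names only).
struct utf_8 {};
struct utf_16 {};

// Hypothetical traits template. It is declared but not defined, so using a
// type that has no mapping fails at compile time.
template <class T>
struct string_range_traits;

// Stand-in for a string type that is known to hold UTF-8 (e.g. a u8string).
struct u8string_like { std::string bytes; };

// Stand-in for a string type that is known to hold UTF-16.
struct u16string_like { std::u16string units; };

// The mappings: each type states its encoding exactly once.
template <> struct string_range_traits<u8string_like>  { using encoding = utf_8; };
template <> struct string_range_traits<u16string_like> { using encoding = utf_16; };

// Generic code reads the encoding from the argument type; the caller never
// names it.
template <class S>
bool is_utf8(const S&) {
    using enc = typename string_range_traits<S>::encoding;
    return std::is_same<enc, utf_8>::value;
}
```

With this in place, is_utf8(u8string_like{"abc"}) needs no encoding argument, while is_utf8(std::string("abc")) does not compile because std::string has no mapping.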
OK, if this is doable in the context of Boost, then you certainly will not hear any complaining from me. [snip/]
This is what I meant.
// cp_begin returning a "code-point-iterator"
auto i = str.cp_begin(), e = str.cp_end();
if(i != e && *i == code_point(0x0123))
    do_something();
or even (if this is possible):
// cr_begin returning a character iterator
auto i = str.cr_begin(), e = str.cr_end();
// if the first character is A with acute ...
if(i != e && *i == unicode_character({0x0041, 0x0301}))
    do_something();
I prefer:
auto i = codepoints(str).begin(), e = codepoints(str).end();
auto i = characters(str).begin(), e = characters(str).end();
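For what it's worth, the free-function spelling codepoints(str) is easy to sketch. The version below is only an illustration under two stated assumptions: the std::string already holds well-formed UTF-8 (no validation is done), and the names codepoint_iterator and codepoint_range are made up here. A characters(str) range would additionally need grapheme segmentation and is not attempted.

```cpp
#include <cstddef>
#include <string>

// Hypothetical forward iterator that decodes UTF-8 on the fly.
// Assumes well-formed input; malformed sequences are not detected.
class codepoint_iterator {
public:
    explicit codepoint_iterator(const char* p) : p_(p) {}

    char32_t operator*() const {
        const unsigned char b0 = static_cast<unsigned char>(p_[0]);
        if (b0 < 0x80) return b0;                   // single-byte (ASCII)
        const std::size_t n = length(b0);
        // Keep the payload bits of the lead byte: 5, 4 or 3 bits.
        char32_t cp = b0 & (0xFF >> (n + 1));
        for (std::size_t k = 1; k < n; ++k)         // 6 bits per trailing byte
            cp = (cp << 6) | (static_cast<unsigned char>(p_[k]) & 0x3F);
        return cp;
    }

    codepoint_iterator& operator++() {
        p_ += length(static_cast<unsigned char>(*p_));
        return *this;
    }

    bool operator==(const codepoint_iterator& o) const { return p_ == o.p_; }
    bool operator!=(const codepoint_iterator& o) const { return p_ != o.p_; }

private:
    // Sequence length implied by the lead byte.
    static std::size_t length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead >> 5) == 0x06) return 2;          // 110xxxxx
        if ((lead >> 4) == 0x0E) return 3;          // 1110xxxx
        return 4;                                   // 11110xxx
    }
    const char* p_;
};

// Hypothetical non-owning range; it must not outlive the string it views.
struct codepoint_range {
    const char* first;
    const char* last;
    codepoint_iterator begin() const { return codepoint_iterator(first); }
    codepoint_iterator end() const { return codepoint_iterator(last); }
};

// Assumption: s holds UTF-8.
inline codepoint_range codepoints(const std::string& s) {
    return { s.data(), s.data() + s.size() };
}
```

Because the range only stores two pointers, calling codepoints(str) twice to get begin() and end() yields comparable iterators as long as str itself is an lvalue that stays alive.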
I really don't insist on cr_begin, etc. to be member functions (nor on calling them cr_begin, ..., for that matter).
So 1) we can extend the syntax uniformly to words, sentences, etc., and 2) str may be of any type that maps to the string_range concept, whether that is boost::string or (when a switch to utf8 occurs) std::string or a string literal.
If str is not mapped to string_range then the programmer must specify the encoding explicitly.
std::string str = "hi";
const char* str2 = exception.what();
auto i = codepoints(treat_as<utf_8>(str)).begin(); // no-copy, no-op, just a cast.
auto i = codepoints(treat_as<utf_8>(str2)).begin(); // works
auto i = codepoints(str).begin(); // error: string is of unknown encoding. Compiles in 20 years when everyone uses utf8.
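One possible shape for treat_as, sketched under assumptions: it returns a non-owning view that carries the encoding only in its type, so nothing is copied or converted. The names encoded_view and count_codepoints are invented for this example, and count_codepoints stands in for any algorithm that should accept only strings of known encoding.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

struct utf_8 {};  // hypothetical encoding tag

// Hypothetical non-owning view: two pointers plus a compile-time encoding.
template <class Encoding>
struct encoded_view {
    using encoding = Encoding;
    const char* first;
    const char* last;
};

// "Just a cast": no copy and no transcoding, the caller vouches for the encoding.
// The view must not outlive the string it refers to.
template <class Encoding>
encoded_view<Encoding> treat_as(const std::string& s) {
    return { s.data(), s.data() + s.size() };
}

// Same for NUL-terminated C strings such as the result of exception::what().
template <class Encoding>
encoded_view<Encoding> treat_as(const char* s) {
    return { s, s + std::strlen(s) };
}

// Accepts only views tagged utf_8. A bare std::string has no conversion to
// encoded_view<utf_8>, so passing one is a compile error.
inline std::size_t count_codepoints(encoded_view<utf_8> v) {
    std::size_t n = 0;
    for (const char* p = v.first; p != v.last; ++p) {
        // Every byte that is not a continuation byte (10xxxxxx) starts a code point.
        if ((static_cast<unsigned char>(*p) & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

Here count_codepoints(treat_as<utf_8>(str)) compiles for both std::string and const char*, while count_codepoints(str) is rejected, which is the "unknown encoding" error described above.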
boost::string (whatever name) will be just an std::string mapped to string_range in utf_8 encoding.
If we can wrap the treat_as<utf_8> into something that does not refer to any encoding whatsoever in cases you don't have to then *thumbs up*. OK Matus