Re: [boost] [string] proposal

26 Jan 2011

On Wed, Jan 26, 2011 at 3:06 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
[snip/]
...
Fine. If immutable strings with backward compatibility result in changing
string.resize(91);
to
string = string.resize(91);
I don't see why this should be needed.
...
I vote against immutability. Even if you throw compatibility away, no one has
explained yet why immutable strings are better. To me it smells like
"modern language here" influence.
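
For readers following along, here is a minimal sketch of the two call styles
being contrasted. The class immutable_string and both demo functions are
hypothetical names invented for illustration; nothing like them has been
proposed in this thread in this form.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical immutable-style string: every "modifying" operation
// returns a new value and leaves the original untouched.
class immutable_string {
public:
    explicit immutable_string(std::string s) : data_(std::move(s)) {}

    // Returns a resized copy; *this is not changed.
    immutable_string resize(std::size_t n, char fill = '\0') const {
        std::string copy = data_;
        copy.resize(n, fill);
        return immutable_string(std::move(copy));
    }

    std::size_t size() const { return data_.size(); }
    const std::string& str() const { return data_; }

private:
    std::string data_;
};

// Mutable style (std::string as it is today): resize in place.
inline std::size_t mutable_resize_demo() {
    std::string s = "abc";
    s.resize(91);
    return s.size();
}

// Immutable style: the result has to be assigned back.
inline std::size_t immutable_resize_demo() {
    immutable_string s("abc");
    s = s.resize(91);
    return s.size();
}
```

Both demo functions end up with a string of size 91; the only difference at
the call site is the extra assignment.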
[snip/]
...
...
If you need just this, then why not use std::string as
it is now for my_string and use any of the Unicode libraries
around. What I would like is a string with which I *can* forget
that there ever was anything like other encodings than Unicode
except for those cases where it is completely impossible.
And even in those cases, like when calling an OS API function,
I don't want to specify exactly what encoding I want but just
to say: Give me a representation (or "view" if you like) of the string
that is in the "native" encoding of the currently selected locale
for the desired character type.
Let me try to explain myself in other words. I propose the iterator_range
idea as the means by which you (we) achieve our goal. It's more like a C++0x
concept. By "my_string whose encoding is known" I meant that strings like
u8string should map to string_ranges with typename encoding == utf_8 (for
example). As a result you *won't* need to specify the exact encoding because
it will be deduced from the context. The only place you will write the
encoding explicitly is at the boundaries of your code and legacy APIs.
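
A rough sketch of what "deduced from the context" could look like, assuming
the encoding is carried as a nested typedef on the string type. All names here
(utf_8, utf_16, encoded_string, u8string_t, u16string_t, encoding_name) are
made up for the example and are not an existing Boost interface.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical encoding tags.
struct utf_8 {};
struct utf_16 {};

// Hypothetical string type whose encoding is part of its type.
template <class Encoding, class CharT>
struct encoded_string {
    using encoding = Encoding;
    std::basic_string<CharT> data;
};

using u8string_t  = encoded_string<utf_8, char>;
using u16string_t = encoded_string<utf_16, char16_t>;

// Generic code never names an encoding; it is read off the argument type.
template <class StringRange>
const char* encoding_name(const StringRange&) {
    using enc = typename StringRange::encoding;
    if (std::is_same<enc, utf_8>::value)  return "utf-8";
    if (std::is_same<enc, utf_16>::value) return "utf-16";
    return "unknown";
}
```

The caller of encoding_name() writes no encoding at all; only the definitions
of u8string_t and u16string_t (the "boundary") mention one.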
Look at the code you provided:
...
Something like this:
whatever_the_string_class_name_will_be cmd = init();
system(cmd.native<char>().c_str());
ShellExecute(..., cmd.native<TCHAR>().c_str(), ...);
ShellExecuteW(..., cmd.native<wchar_t>().c_str(), ...);
wxExecute(cmd.native<wxChar>());
or
whatever_the_string_class_name_will_be caption = get_non_ascii_string();
new wxFrame(parent, wxID_ANY, caption.native<wxChar>(), ...);
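
To make the native<CharT>() idea concrete, here is a minimal sketch under
loud assumptions: the class name text is hypothetical, it stores UTF-8
internally, the "native" narrow encoding is taken to be UTF-8 as well, the
wide case is shown with char16_t (UTF-16) instead of wchar_t/TCHAR/wxChar to
stay portable, and the input is assumed to be valid UTF-8. A real
implementation would consult the selected locale.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical string class that always stores UTF-8 internally.
class text {
public:
    explicit text(std::string utf8) : utf8_(std::move(utf8)) {}

    // Declared only; supported character types are explicit
    // specializations below.
    template <class CharT>
    std::basic_string<CharT> native() const;

private:
    std::string utf8_;
};

// Assumption: the narrow native encoding is UTF-8, so this is a plain copy.
template <>
inline std::basic_string<char> text::native<char>() const {
    return utf8_;
}

// UTF-8 -> UTF-16 transcoding; assumes well-formed input.
template <>
inline std::basic_string<char16_t> text::native<char16_t>() const {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8_.size()) {
        unsigned char b = static_cast<unsigned char>(utf8_[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        // Fold in the continuation bytes (6 payload bits each).
        for (std::size_t k = 1; k < len && i + k < utf8_.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8_[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With this, system(cmd.native<char>().c_str()) works as written above, since
the temporary std::string lives until the end of the full expression.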
The ShellExecuteW, wxExecute and wxFrame calls are actually *more verbose than
they have to be*. wxString is documented to be utf16 encoded, as is
LPCWSTR on Windows. So, given a mapping from wxString to the
string_range concept, you could write it as:
wxExecute(cmd);  // creates utf16 wxString
new wxFrame(parent, wxID_ANY, caption, ...); // creates utf16 wxString
As a result *less* code will be affected when switching to utf8.
OK, if this is doable in the context of Boost, then you certainly
will not hear any complaining from me.

[snip/]
...
This is what I meant.
...
// cp_begin returning a "code-point-iterator"
auto i = str.cp_begin(), e = str.cp_end();
if(i != e && *i == code_point(0x0123)) do_something();
or even (if this is possible):
// cr_begin returning a character iterator
auto i = str.cr_begin(), e = str.cr_end();
// if the first character is A with acute ...
if(i != e && *i == unicode_character({0x0041, 0x0301}))
   do_something();
I prefer:
auto i = codepoints(str).begin(), e = codepoints(str).end();
auto i = characters(str).begin(), e = characters(str).end();
I really don't insist on cr_begin, etc. to be member functions
(nor on calling them cr_begin, ..., for that matter).
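
As a sketch of the free-function spelling, here is one way codepoints(str)
could be written for a std::string that is assumed to hold UTF-8. The names
codepoint_range and its iterator are hypothetical, malformed input is only
clamped rather than diagnosed, and characters(str) is left out because it
needs Unicode segmentation data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical non-owning range that decodes UTF-8 into code points lazily.
// The referenced string must outlive the range and its iterators.
class codepoint_range {
public:
    class iterator {
    public:
        iterator(const std::string* s, std::size_t pos) : s_(s), pos_(pos) {}

        // Decode the code point starting at the current byte offset.
        char32_t operator*() const {
            const std::size_t len = length();
            const unsigned char b = byte(pos_);
            char32_t cp = (len == 1) ? b : (b & (0xFF >> (len + 1)));
            for (std::size_t k = 1; k < len && pos_ + k < s_->size(); ++k)
                cp = (cp << 6) | (byte(pos_ + k) & 0x3F);
            return cp;
        }

        // Advance by one encoded code point, never past the end.
        iterator& operator++() {
            pos_ = std::min(pos_ + length(), s_->size());
            return *this;
        }

        bool operator==(const iterator& o) const { return pos_ == o.pos_; }
        bool operator!=(const iterator& o) const { return pos_ != o.pos_; }

    private:
        unsigned char byte(std::size_t i) const {
            return static_cast<unsigned char>((*s_)[i]);
        }
        // Sequence length implied by the lead byte.
        std::size_t length() const {
            const unsigned char b = byte(pos_);
            if (b < 0x80) return 1;
            if (b < 0xE0) return 2;
            if (b < 0xF0) return 3;
            return 4;
        }

        const std::string* s_;
        std::size_t pos_;
    };

    explicit codepoint_range(const std::string& s) : s_(&s) {}
    iterator begin() const { return iterator(s_, 0); }
    iterator end() const { return iterator(s_, s_->size()); }

private:
    const std::string* s_;
};

inline codepoint_range codepoints(const std::string& s) {
    return codepoint_range(s);
}
```

Since the iterators only compare byte offsets, begin() and end() taken from
two separate codepoints(str) calls on the same str are comparable, which is
what the one-liner above relies on.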
...
So
1) we can extend the syntax uniformly to words, sentences etc...
2) str may be of any type that maps to the string_range concept, whether it be
boost::string or (when a switch to utf8 occurs) std::string or a string
literal.
If str is not mapped to string_range then the programmer must specify the
encoding explicitly.
std::string str = "hi";
const char* str2 = exception.what();
auto i = codepoints(treat_as<utf_8>(str)).begin(); // no-copy, no-op, just a cast.
auto i = codepoints(treat_as<utf_8>(str2)).begin(); // works
auto i = codepoints(str).begin(); // error: string is of unknown encoding.
// Compiles in 20 years when everyone uses utf8.
boost::string (whatever name) will be just an std::string mapped to
string_range in utf_8 encoding.
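
A minimal sketch of how treat_as could be a pure compile-time tag, assuming a
string_range is just a pointer pair plus an encoding typedef. Everything here
(string_range, treat_as, has_known_encoding, count_codepoints, make_void) is
illustrative naming, and count_codepoints stands in for a full codepoints()
range and handles UTF-8 only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

struct utf_8 {};  // hypothetical encoding tag

// Hypothetical non-owning view: pointer pair plus compile-time encoding.
template <class Encoding>
struct string_range {
    using encoding = Encoding;
    const char* first;
    const char* last;
};

// treat_as: no copy, no transcoding; it only attaches the tag.
template <class Encoding>
string_range<Encoding> treat_as(const std::string& s) {
    return string_range<Encoding>{s.data(), s.data() + s.size()};
}

template <class Encoding>
string_range<Encoding> treat_as(const char* s) {
    return string_range<Encoding>{s, s + std::strlen(s)};
}

// Detects whether a type advertises an encoding (C++11-friendly void_t).
template <class...> struct make_void { using type = void; };

template <class T, class = void>
struct has_known_encoding : std::false_type {};

template <class T>
struct has_known_encoding<T, typename make_void<typename T::encoding>::type>
    : std::true_type {};

// Counts code points by counting bytes that are not UTF-8 continuation
// bytes. Passing a plain std::string fails to compile.
template <class Range>
std::size_t count_codepoints(const Range& r) {
    static_assert(has_known_encoding<Range>::value,
                  "string is of unknown encoding; use treat_as<...>()");
    std::size_t n = 0;
    for (const char* p = r.first; p != r.last; ++p)
        if ((static_cast<unsigned char>(*p) & 0xC0) != 0x80) ++n;
    return n;
}
```

Here count_codepoints(treat_as<utf_8>(str)) compiles, while
count_codepoints(str) on a bare std::string is rejected by the static_assert,
mirroring the three lines above.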
If we can wrap the treat_as<utf_8> into something that
does not refer to any encoding whatsoever in cases
where you don't have to, then *thumbs up*.

OK

Matus

Matus Chochlik