Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter

12 Aug 2011

On Fri, Aug 12, 2011 at 12:00, Matus Chochlik <chochlik@gmail.com> wrote:
...
On Fri, Aug 12, 2011 at 9:57 AM, Daniel James <dnljms@gmail.com> wrote:
...
On 11 August 2011 12:57, Artyom Beilis <artyomtnk@yahoo.com> wrote:
...
...
There's a lot of existing code which is not based on that assumption -
we can't just wish it out of existence and boost should be compatible
with it.
Then cross platform, Unicode aware programming will always
(I'm sorry) suck with Boost :-)
That's it...
Unless a different solution can be found.
I see the old flam .. er discussion on text handling is back :)
...
From the previous debate(s) I now accept that it would
be a bad idea just to force the encoding of std::string to be utf8,
So a (nearly) ideal text handling class should IMO look like this
(see usage below):
[...]
// by default expect UTF8
 text(const std::string& str)
 {
    assert(is_utf8(str.begin(), str.end()));
    store(str);
 }
What you are doing is, in fact, forcing the assumed encoding of std::string
to UTF-8. You just said you think it's a bad idea.
...
[...]
text t1 = "blahblah"; // must be utf8
// whatever encoding the compiler uses for wide literals
text t2(L"blablablabl", textenc::compiler());
text t3(some_posix_function(), textenc::posix());
text t4(SomeWinapiFunc(), textenc::winapi());
text t5(SomeWinapiFuncW(), textenc::winapi());
How is it better than:
// use the system's default encoding for narrow strings
string t4 = from_narrow(SomeWinapiFuncA());
// wchar_t on Windows is always UTF-16
string t5 = from_wide(SomeWinapiFuncW());
...
text t6(pq_some_func(), textenc::libpq());
You don't need it. You're proposing a design that tries to solve a
non-existing problem. There is no such diversity of encodings in the
interfaces. I don't know what libpq is, but it either uses UTF-8, in which
case you write:

string t6 = pq_some_func();

or the default system encoding, in which case you write:

string t6 = from_narrow(pq_some_func());

As you start using more libraries with UTF-8 default encoding, you will use
from_* less frequently.
(It's possible to use a single to_utf8 instead of the from_narrow/from_wide
combination.)
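To make the from_wide idea concrete, here is a minimal sketch of one way such a helper could be written. The name from_wide comes from the snippets above; the body and the append_utf8 helper are illustrative assumptions, not an existing Boost or system API. It treats 16-bit wchar_t input as UTF-16 (combining surrogate pairs) and 32-bit wchar_t input as UTF-32. Unpaired surrogates are passed through unchecked, which a real implementation would have to handle.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append one code point to a UTF-8 string.
// The code point value itself is not validated.
inline void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical from_wide: wchar_t code units -> UTF-8 std::string.
inline std::string from_wide(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);
        // A high surrogate followed by a low surrogate forms one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t lo = static_cast<std::uint32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With something like this, `string t5 = from_wide(SomeWinapiFuncW());` needs no encoding tag at the call site: the wide encoding is fixed by the platform.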

[...]
...
SomeWinapiFunction(t8.str(textenc::winapi()).c_str());
SomeWinapiFunctionW(concat(t9, text::newline(),
t8).wstr(textenc::winapi()).c_str());
Same as above. 'text' as a distinct type doesn't play any role here. If t9
is std::string, this becomes:

// to the default narrow system encoding
SomeWinapiFunctionA(to_narrow(t8).c_str());
// the kind of newline expected is defined by the API, not the system
SomeWinapiFunctionW(to_wide(t9 + "\r\n" + t8).c_str());
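For the opposite direction, a rough sketch of what to_wide might look like follows. The name is taken from the call above; the body is an assumption for illustration only. It assumes its input is already well-formed UTF-8 and does no validation, so it is not suitable for untrusted data as written.

```cpp
#include <cstddef>
#include <string>

// Hypothetical to_wide: UTF-8 std::string -> wchar_t string.
// Assumes `in` is well-formed UTF-8; malformed input is not detected.
inline std::wstring to_wide(const std::string& in)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        unsigned long cp;
        std::size_t len;
        // The lead byte determines the sequence length and the initial bits.
        if (c < 0x80)      { cp = c;        len = 1; }
        else if (c < 0xE0) { cp = c & 0x1F; len = 2; }
        else if (c < 0xF0) { cp = c & 0x0F; len = 3; }
        else               { cp = c & 0x07; len = 4; }
        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (sizeof(wchar_t) == 2 && cp >= 0x10000) {
            // 16-bit wchar_t (as on Windows): emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```

Since the strings are plain std::string, the concatenation `t9 + "\r\n" + t8` happens in UTF-8 and a single conversion is done at the API boundary.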
...
[...]
i.e. besides the fact that the string "uses utf8" (there is already
a whole heap of such strings) it must also handle all the conversions
between utf8 and whatever the OS and the major libraries and
APIs expect and use; conveniently (and effectively).
Otherwise the effort is IMHO wasted.
Your 'text' doesn't do this in a transparent way. In fact you cannot do it
in a transparent way, because 'const char*' doesn't carry the necessary
semantic information. The burden of deciding what encoding to convert
to/from falls on the programmer *anyway*. You gain nothing from
defining yet another string type.

Boost libraries (at the very least those wrapping OS functionality)
...
should adopt this text class, and do the conversions, "just-in-time"
when making the OS API call.
In light of the above, your 'text' class won't catch bugs like:

char str[1024];
GetWindowTextA(hwnd, str, sizeof(str));
boost::function_with_text_parameter(str);

Therefore, I don't think we should adopt this text class.
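
To illustrate why such a bug slips through, here is a minimal sketch of a UTF-8 well-formedness check in the spirit of the is_utf8 assertion quoted earlier. The signature (taking a std::string rather than an iterator pair) and the body are assumptions for illustration. The check can only run at run time, and it accepts any pure-ASCII result from GetWindowTextA, so the mismatch shows up only when a non-ASCII window title happens to appear.

```cpp
#include <cstddef>
#include <string>

// Hypothetical is_utf8: returns true if s is well-formed UTF-8.
// Rejects overlong forms, surrogates and values above U+10FFFF.
inline bool is_utf8(const std::string& s)
{
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (c < 0x80) { ++i; continue; }                 // ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                               // invalid lead byte
        if (i + len > n) return false;                   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;       // not a continuation
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp[len]) return false;              // overlong encoding
        if (cp > 0x10FFFF) return false;                 // out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += len;
    }
    return true;
}
```

For example, the Windows-1252 bytes "Caf\xE9" fail this check, while "Cafe" in the same encoding passes, so the code above would appear to work until the first accented character arrives.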

-- 
Yakov

Yakov Galka