
On Fri, Aug 12, 2011 at 12:00, Matus Chochlik <chochlik@gmail.com> wrote:
On Fri, Aug 12, 2011 at 9:57 AM, Daniel James <dnljms@gmail.com> wrote:
On 11 August 2011 12:57, Artyom Beilis <artyomtnk@yahoo.com> wrote:
There's a lot of existing code which is not based on that assumption - we can't just wish it out of existence and boost should be compatible with it.
Then cross platform, Unicode aware programming will always (I'm sorry) suck with Boost :-)
That's it...
Unless a different solution can be found.
I see the old flam .. er discussion on text handling is back :)
From the previous debate(s) I now accept that it would be a bad idea just to force the encoding of std::string to be utf8. So a (nearly) ideal text handling class should IMO look like this (see usage below):
[...]
// by default expect UTF8
text(const std::string& str)
{
    assert(is_utf8(str.begin(), str.end()));
    store(str);
}
What you are doing is, in fact, forcing the assumed encoding of std::string to UTF-8. You just said you think it's a bad idea.
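A check like the is_utf8 call in the quoted constructor might be sketched as follows. The thread does not show an implementation, so everything here is an assumption: the iterator-based signature is inferred from the call site, and the std::string overload and checked_utf8 are hypothetical helpers added for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: true if [first, last) is well-formed UTF-8.
// Rejects overlong forms, surrogates (U+D800..U+DFFF) and values above U+10FFFF.
template <typename It>
bool is_utf8(It first, It last) {
    while (first != last) {
        const unsigned char lead = static_cast<unsigned char>(*first++);
        if (lead < 0x80) continue;          // plain ASCII byte

        int extra = 0;                      // continuation bytes expected
        char32_t min_cp = 0;                // smallest value legal for this length
        char32_t cp = 0;
        if ((lead & 0xE0) == 0xC0)      { extra = 1; min_cp = 0x80;    cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; min_cp = 0x800;   cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; min_cp = 0x10000; cp = lead & 0x07; }
        else return false;                  // stray continuation or invalid lead byte

        for (int i = 0; i < extra; ++i) {
            if (first == last) return false;                 // truncated sequence
            const unsigned char c = static_cast<unsigned char>(*first++);
            if ((c & 0xC0) != 0x80) return false;            // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                       // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;      // surrogate
        if (cp > 0x10FFFF) return false;                     // outside Unicode
    }
    return true;
}

// Convenience overload (an assumption, not part of the quoted code).
inline bool is_utf8(const std::string& s) { return is_utf8(s.begin(), s.end()); }

// Mirrors the quoted constructor: assert on the way in, then keep the bytes.
inline std::string checked_utf8(const std::string& s) {
    assert(is_utf8(s));
    return s;
}
```

Note that this only verifies well-formedness. Pure ASCII text in any ASCII-compatible encoding passes, and so can some non-UTF-8 byte sequences, so the check cannot prove that the caller actually meant UTF-8.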
[...]
text t1 = "blahblah"; // must be utf8

// whatever encoding the compiler uses for wide literals
text t2(L"blablablabl", textenc::compiler());

text t3(some_posix_function(), textenc::posix());

text t4(SomeWinapiFunc(), textenc::winapi());
text t5(SomeWinapiFuncW(), textenc::winapi());
How is it better than:

string t4 = from_narrow(SomeWinapiFuncA()); // use the default encoding used by system for narrow strings
string t5 = from_wide(SomeWinapiFuncW()); // wchar_t on windows is always utf16
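To make the from_wide idea concrete, here is a minimal portable sketch of a UTF-16 to UTF-8 conversion. It is not the from_wide from this thread, whose implementation is not shown: utf16_to_utf8 and append_utf8 are hypothetical names, and the sketch takes std::u16string rather than wchar_t because wchar_t is only 16 bits wide on some platforms. Replacing unpaired surrogates with U+FFFD is also an assumption about the desired error policy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of one Unicode scalar value to `out`.
inline void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical stand-in for from_wide: converts UTF-16 code units to UTF-8.
// Unpaired surrogates are replaced with U+FFFD (an assumed policy).
inline std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {                  // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;                                         // consume the low surrogate
            } else {
                cp = 0xFFFD;                                 // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                                     // unpaired low surrogate
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The caller still has to know that the input really is UTF-16. The function name records that decision at the call site, which is the argument being made for from_narrow/from_wide over a separate string type.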
text t6(pq_some_func(), textenc::libpq());
You don't need it. You're proposing a design that tries to solve a non-existing problem. There is no such diversity of encodings in the interfaces. I don't know what libpq is, but it either uses UTF-8, in which case you write:

string t6 = pq_some_func();

or the default system encoding, in which case you write:

string t6 = from_narrow(pq_some_func());

As you start using more libraries with UTF-8 default encoding, you will use from_* less frequently. (It's possible to use a single to_utf8 instead of the from_narrow/from_wide combination.)
[...]
SomeWinapiFunction(t8.str(textenc::winapi()).c_str());
SomeWinapiFunctionW(concat(t9, text::newline(), t8).wstr(textenc::winapi()).c_str());
Same as above. 'text' as a distinct type doesn't play any role here. If t9 is std::string, this becomes:

SomeWinapiFunctionA(to_narrow(t8).c_str()); // to the default narrow system-encoding.
SomeWinapiFunctionW(to_wide(t9 + "\r\n" + t8).c_str()); // what kind of newline is expected defined by the API, not the system.
[...] i.e. besides the fact that the string "uses utf8" (there is already a whole heap of such strings) it must also handle all the conversions between utf8 and whatever the OS and the major libraries and APIs expect and use; conveniently (and effectively). Otherwise the effort is IMHO wasted.
Your 'text' doesn't do this in a transparent way. In fact you cannot do it in a transparent way because 'const char*' doesn't carry the necessary semantic information. The burden of deciding what encoding to convert to/from falls on the programmer *anyway*. You gain nothing from defining yet another string type.
Boost libraries (at the very least those wrapping OS functionality) should adopt this text class, and do the conversions, "just-in-time" when making the OS API call.
In light of the above, your 'text' class won't catch bugs like:

char str[1024];
GetWindowTextA(hwnd, str, sizeof(str));
boost::function_with_text_parameter(str);

Therefore, I don't think we should adopt this text class.

--
Yakov