
On Fri, Aug 12, 2011 at 9:57 AM, Daniel James <dnljms@gmail.com> wrote:
On 11 August 2011 12:57, Artyom Beilis <artyomtnk@yahoo.com> wrote:
There's a lot of existing code which is not based on that assumption - we can't just wish it out of existence and boost should be compatible with it.
Then cross platform, Unicode aware programming will always (I'm sorry) suck with Boost :-)
That's it...
Unless a different solution can be found.
I see the old flam .. er discussion on text handling is back :)
From the previous debate(s) I now accept that it would be a bad idea just to force the encoding of std::string to be UTF-8. So a (nearly) ideal text handling class should IMO look like this (see usage below):
// text encoding tag types for conversion function dispatching
namespace /*or struct */ textenc {

    struct utf8 {};
    struct utf16 {};
    struct utf32 {};
    struct winapi {};
    struct posix {};
    struct stdlib {};
    struct sqlite {};
    struct libpq {};
    ...
    struct libxyz {};

#if WE_ARE_ON_WINDOWS
    typedef winapi os;
#elif WE_ARE_ON_POSIX
    typedef posix os;
#elif ...
#endif

    struct gcc {};
    struct msvc {};
    struct icc {};
    struct clang {};

#if COMPILING_WITH_GCC
    typedef gcc compiler;
#elif COMPILING_WITH_MSVC
    typedef msvc compiler;
#elif ...
#endif

};

class text
{
public:
    // *** construction ***

    // by default expect UTF8
    text(const char* cstr)
    {
        assert(is_utf8(cstr));
        store(cstr);
    }

    // by default expect UTF8
    text(const std::string& str)
    {
        assert(is_utf8(str.begin(), str.end()));
        store(str);
    }

    // otherwise use the tag type to
    // do any necessary conversions
    template <typename Char, typename EncodingTag>
    text(const Char* cstr, EncodingTag encoding)
    {
        // use an overload to convert from the encoding:
        // basically if the tag is textenc::winapi then use
        // the winapi-supplied functions and convert to utf8,
        // if it's posix look at the locale and convert with the posix function,
        // if the tag is textenc::msvc convert the msvc literal from
        // whatever crazy encoding it uses to utf8, ...etc.
        convert_and_store(cstr, encoding);
    }

    template <typename Char, typename EncodingTag>
    text(const std::basic_string<Char>& str, EncodingTag encoding)
    {
        convert_and_store(str.begin(), str.end(), encoding);
    }

    // *** conversion ***

    // by default output in utf8
    const char* c_str(void) const;

    // by default in utf8 (could be a friend fn instead of member)
    std::string str(void) const;

    // (could be a friend fn instead of member)
    template <typename EncodingTag>
    std::string str(EncodingTag encoding) const
    {
        return convert_from(encoding);
    }

    // wide char string output
    template <typename EncodingTag>
    std::wstring wstr(EncodingTag encoding) const
    {
        return wconvert_from(encoding);
    }

    // implement whatever functionality
    // making sense for utf8-encoded-text
};

// usage
text t1 = "blahblah"; // must be utf8

// whatever encoding the compiler uses for wide literals
text t2(L"blablablabl", textenc::compiler());
text t3(some_posix_function(), textenc::posix());
text t4(SomeWinapiFunc(), textenc::winapi());
text t5(SomeWinapiFuncW(), textenc::winapi());
text t6(pq_some_func(), textenc::libpq());

text t7 = concat(t1, t2, t3, t4, t5, t6);

std::ostream& out = get_outs();
out << t7; // output in utf8

text t8;
std::istream& in = get_ins();
in.read_line(t8);

text t9;
in.read(t9, 1024);

some_function_expecting_utf8(t9.c_str());
SomeWinapiFunction(t8.str(textenc::winapi()).c_str());
SomeWinapiFunctionW(concat(t9, text::newline(), t8).wstr(textenc::winapi()).c_str());
some_posix_function(transform(concat(t4, t7, t9)).str(textenc::posix()).c_str());
some_wrapped_os_function(str(t8, textenc::os()));
some_stdlib_function(str(head(substring_after(t9, t2), 10), textenc::stdlib()));

i.e. besides the fact that the string "uses utf8" (there is already a whole heap of such strings) it must also handle all the conversions between utf8 and whatever the OS and the major libraries and APIs expect and use; conveniently (and effectively). Otherwise the effort is IMHO wasted.
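To make the tag dispatching a bit more concrete, here is a minimal, compilable sketch in C++11 of how the converting constructor could select a conversion by overload. Everything in it is an assumption for illustration: the helper names append_utf8 and to_utf8 are made up, the textenc::latin1 tag is a hypothetical stand-in for a legacy encoding (it is not in the list above), and the platform tags such as winapi or posix are left out because they would need the respective OS conversion functions.

```cpp
#include <cassert>
#include <string>

// Tag types in the spirit of textenc above. latin1 is a hypothetical
// stand-in for a legacy encoding; it is not one of the tags in the post.
namespace textenc {
    struct utf8 {};
    struct utf32 {};
    struct latin1 {};
}

// Append one Unicode code point to a UTF-8 encoded std::string.
// The precondition is checked with assert, as in the constructors above.
inline void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// One overload per encoding tag; the tag argument only selects the overload.
inline std::string to_utf8(const std::string& in, textenc::utf8) {
    return in;  // already UTF-8 (not validated in this sketch)
}

inline std::string to_utf8(const std::string& in, textenc::latin1) {
    std::string out;
    // Latin-1 bytes map one-to-one onto code points U+0000..U+00FF.
    for (unsigned char c : in) append_utf8(out, c);
    return out;
}

inline std::string to_utf8(const std::u32string& in, textenc::utf32) {
    std::string out;
    for (char32_t cp : in) append_utf8(out, cp);
    return out;
}

// Stripped-down text class: stores UTF-8, converts on construction.
class text {
public:
    // by default expect UTF-8
    text(const std::string& s) : utf8_(s) {}

    // otherwise let the tag pick the conversion overload
    template <typename Char, typename EncodingTag>
    text(const std::basic_string<Char>& s, EncodingTag encoding)
        : utf8_(to_utf8(s, encoding)) {}

    const std::string& str() const { return utf8_; }

private:
    std::string utf8_;
};
```

With this, text(std::string("\xE9"), textenc::latin1()) would hold the two bytes 0xC3 0xA9, and supporting another encoding means adding one tag type and one to_utf8 overload, without touching the class itself. The output direction (str(EncodingTag), wstr(EncodingTag)) could be dispatched the same way.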
Boost libraries (at the very least those wrapping OS functionality) should adopt this text class, and do the conversions, "just-in-time" when making the OS API call.

My 0.02 Euro

Best,
Matus