
On Tue, 18 Jan 2011 14:50:51 -0600 Christian Holmquist <c.holmquist@gmail.com> wrote:
There are two ways this could go AFAICS: [...]
2. We establish some other type for UTF-8 and *it* becomes the lingua franca
If Boost abandons std::string in interfaces that expect UTF-8, does that mean that I, as a user, need to sprinkle boost::to_utf_8(my_std_string, ...) (in whatever form to_utf8 may take) all over my/our (quite gigantic) code base?
Only for functions that need to know the encoding of a string. As Artyom has rightly pointed out, most functions operate perfectly well by treating strings as opaque blocks of data, or as sequences of individual bytes. It's only things like Boost.Regex or some of the string-manipulation functions that might want to act differently in the face of multi-byte characters. Or, of course, newly-written functions in user code, outside of the Boost library.
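To make the opaque-bytes point concrete, here is a small C++ sketch. The function names are illustrative and do not come from Boost. Splitting a UTF-8 path on an ASCII separator needs no knowledge of the encoding, while counting characters does, because one code point can occupy several bytes. `count_code_points` assumes its input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 byte string by skipping
// continuation bytes (bit pattern 10xxxxxx). Assumes the input is
// already valid UTF-8; this is an illustration, not a validator.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Byte-oriented operations work on UTF-8 without knowing the encoding:
// an ASCII byte such as '/' (0x2F) never occurs inside a multi-byte
// UTF-8 sequence, so find() and substr() cannot split a character here.
void demo_opaque_bytes() {
    const std::string path = "caf\xC3\xA9/men\xC3\xBC";  // "café/menü"
    const std::size_t slash = path.find('/');
    const std::string dir = path.substr(0, slash);
    const std::string leaf = path.substr(slash + 1);

    assert(dir.size() == 5 && leaf.size() == 5);  // sizes in bytes
    assert(count_code_points(dir) == 4);          // c a f é
    assert(count_code_points(leaf) == 4);         // m e n ü
}
```

Only the second kind of operation, the one that cares where characters begin and end, is where an explicit encoding matters.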
Not doing so would, I assume, cause compilation errors, but for what gain? If some code was broken before, it will remain broken after I've injected all those to_utf8 calls. To solve actual problems I need to track the origin of my std::strings' content, which requires a traditional bug-hunting session anyway. No additional typed interface in the world will help me here, IMO.
Maybe. But having a function whose parameter or return types are explicitly utf8_t will tell you (and the compiler) exactly what kind of string it expects, right in the code, whereas one that takes or returns a std::string doesn't. If you have to look up that information in the documentation, you're a lot more likely to miss it.
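A minimal sketch of the idea, assuming a hypothetical `utf8_t` wrapper and `to_utf8` helper (neither is the API actually proposed on the list). Because the wrapper's constructor is `explicit`, passing a raw `std::string` where a `utf8_t` is expected fails to compile, so the compiler, rather than the documentation, points out every call site that needs a conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical strong type for illustration only; the real utf8_t
// proposed on the list may look quite different. The constructor is
// explicit, so a plain std::string never converts to utf8_t silently.
class utf8_t {
public:
    explicit utf8_t(std::string bytes) : bytes_(std::move(bytes)) {}
    const std::string& bytes() const { return bytes_; }

private:
    std::string bytes_;
};

// The compiler, not the documentation, enforces the boundary.
static_assert(!std::is_convertible<std::string, utf8_t>::value,
              "std::string must not convert to utf8_t implicitly");

// Hypothetical conversion point: the single place where a caller states
// "these bytes are UTF-8". A fuller version could validate them here.
utf8_t to_utf8(const std::string& s) { return utf8_t(s); }

// The parameter type documents the expected encoding in the signature.
std::size_t encoded_size(const utf8_t& s) { return s.bytes().size(); }

// An untyped interface: nothing says which encoding it expects.
std::size_t legacy_size(const std::string& s) { return s.size(); }

void demo_typed_interface() {
    const std::string raw = "na\xC3\xAFve";  // "naïve" as UTF-8, 6 bytes
    // encoded_size(raw);                    // error: no implicit conversion
    assert(encoded_size(to_utf8(raw)) == 6);
    assert(legacy_size(raw) == 6);           // compiles; encoding unstated
}
```

The trade-off raised above is visible here as well: every call site that used to pass a `std::string` now needs an explicit `to_utf8`, which is exactly the "sprinkling" cost, but each one is a place where the encoding assumption becomes visible in the code.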
[...] What would be helpful, if doable, is to build Boost with BOOST_TRACK_INVALID_UTF_8 defined, also for release builds. This would cause an exception to be thrown, or a user-defined function to be called, if Boost code stumbles upon bad strings.
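The runtime check described above could look roughly like the following sketch. `BOOST_TRACK_INVALID_UTF_8` was only a proposal, so every name below (`is_valid_utf8`, `check_utf8`, `set_invalid_utf8_handler`) is hypothetical; in a real build the call to `check_utf8` would presumably be compiled in only when such a macro is defined. The validator accepts only the well-formed byte sequences of RFC 3629, rejecting overlong forms, UTF-16 surrogates, and code points above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns true if s is well-formed UTF-8 per RFC 3629 (no overlong
// forms, no UTF-16 surrogates, nothing above U+10FFFF).
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }           // ASCII
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;       // allowed range of 2nd byte
        if (c >= 0xC2 && c <= 0xDF)      { len = 2; }
        else if (c == 0xE0)              { len = 3; lo = 0xA0; }  // no overlongs
        else if (c >= 0xE1 && c <= 0xEC) { len = 3; }
        else if (c == 0xED)              { len = 3; hi = 0x9F; }  // no surrogates
        else if (c >= 0xEE && c <= 0xEF) { len = 3; }
        else if (c == 0xF0)              { len = 4; lo = 0x90; }  // no overlongs
        else if (c >= 0xF1 && c <= 0xF3) { len = 4; }
        else if (c == 0xF4)              { len = 4; hi = 0x8F; }  // <= U+10FFFF
        else return false;                         // 0x80-0xC1, 0xF5-0xFF
        if (n - i < len) return false;             // truncated sequence
        const unsigned char c1 = static_cast<unsigned char>(s[i + 1]);
        if (c1 < lo || c1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {    // remaining continuation bytes
            const unsigned char ck = static_cast<unsigned char>(s[i + k]);
            if (ck < 0x80 || ck > 0xBF) return false;
        }
        i += len;
    }
    return true;
}

// Hypothetical hook, modelled on the proposed BOOST_TRACK_INVALID_UTF_8.
using invalid_utf8_handler = void (*)(const std::string&);

// Default policy: throw.
void throw_on_invalid_utf8(const std::string&) {
    throw std::invalid_argument("invalid UTF-8 in string argument");
}

invalid_utf8_handler& current_invalid_utf8_handler() {
    static invalid_utf8_handler handler = &throw_on_invalid_utf8;
    return handler;
}

// Lets the user swap in their own policy (log, assert, abort, ...).
void set_invalid_utf8_handler(invalid_utf8_handler h) {
    current_invalid_utf8_handler() = h;
}

// What a library function might call on entry when tracking is enabled.
// If the installed handler returns instead of throwing, so does this.
void check_utf8(const std::string& s) {
    if (!is_valid_utf8(s)) current_invalid_utf8_handler()(s);
}

void demo_tracking() {
    assert(is_valid_utf8("caf\xC3\xA9"));  // "café"
    assert(!is_valid_utf8("\xC0\xAF"));    // overlong encoding of '/'
}
```

Defaulting to a throw corresponds to the "exception" option; installing a handler that logs and returns corresponds to the "user-defined function" option. Either way, as the reply below notes, the error only surfaces when the bad string is actually encountered at runtime.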
Interesting idea, but it pushes the problem entirely to runtime. Having utf*_t types lets the compiler do at least some of the work for you.
--
Chad Nelson
Oak Circle Software, Inc.