Re: [boost] [General] Always treat std::strings as UTF-8

19 Jan 2011


On Tue, 18 Jan 2011 14:50:51 -0600
Christian Holmquist <c.holmquist@gmail.com> wrote:
...
...
There are two ways this could go AFAICS: [...]
2. We establish some other type for UTF-8 and *it* becomes the lingua
franca
If Boost abandons std::string in interfaces that expect UTF-8, does
that mean I as a user need to sprinkle
boost::to_utf_8(my_std_string,...) // in whatever form to_utf8 may be
all over my/our (quite gigantic) code base?
Only for functions that need to know the encoding of a string. As
Artyom has rightly pointed out, most functions operate perfectly well
by treating strings as opaque blocks of data, or as individual bytes.
It's only things like Boost.RegEx or some of the string-manipulation
functions that might want to act a bit differently in the face of
multi-byte characters. Or, of course, newly-written functions in user
code, outside of the Boost library.
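
To illustrate the distinction, here is a minimal sketch (the function
names are made up for this example, not Boost APIs). The first function
is byte-oriented and works on any std::string; the second only gives a
meaningful answer if the string really is well-formed UTF-8:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Byte-oriented: correct for any std::string, whatever its encoding.
std::size_t byte_length(const std::string& s) {
    return s.size();
}

// Encoding-aware (hypothetical helper): assumes s holds well-formed UTF-8
// and counts code points by skipping continuation bytes (10xxxxxx).
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For "caf\xC3\xA9" (the word "café" encoded as UTF-8), byte_length
reports 5 while utf8_code_points reports 4. Only the second function
cares what encoding the bytes are in.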
...
Not doing so, I assume, will cause compilation errors, but for what
gain? If some code was broken before, it will remain so after I've
injected all those to_utf8 calls as well.
To solve actual problems I need to track the origin of my
std::string's content, which requires a traditional bug-hunting
session anyway. No additional typed interface in the world will help
me here IMO.
Maybe. But having a function whose parameters or return type is
explicitly utf8_t will tell you (and the compiler) exactly what kind of
string it's expecting, right in the code, whereas something that takes
or returns an std::string doesn't. If you have to look up that
information in the documentation, you're a lot more likely to miss it.
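
A rough sketch of what that could look like. The name utf8_t is taken
from this discussion, but the class body and count_code_points are
assumptions made up for illustration, not an existing Boost interface.
Note that this wrapper only records the caller's claim that the bytes
are UTF-8; it does not validate them:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical wrapper type; not an actual Boost class.
class utf8_t {
public:
    // explicit: a plain std::string never converts to utf8_t silently,
    // so the caller has to state the encoding at the call site.
    explicit utf8_t(std::string bytes) : bytes_(std::move(bytes)) {}

    const std::string& bytes() const { return bytes_; }

private:
    std::string bytes_;
};

// Hypothetical function: the signature itself says UTF-8 is expected.
std::size_t count_code_points(const utf8_t& text) {
    std::size_t n = 0;
    for (unsigned char c : text.bytes()) {
        if ((c & 0xC0) != 0x80) {  // skip continuation bytes
            ++n;
        }
    }
    return n;
}

// count_code_points(std::string("abc"));          // does not compile
// count_code_points(utf8_t(std::string("abc")));  // compiles
```

The commented-out first call is the point: passing a bare std::string
is rejected by the compiler, so the encoding requirement cannot be
missed the way a note in the documentation can.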
...
[...] What would be helpful, if doable, is to build Boost with
BOOST_TRACK_INVALID_UTF_8, also for release builds.
This would cause an exception or a call to a user-defined function if
Boost code stumbles upon bad strings.
Interesting idea, but it pushes the problem entirely to runtime. Having
utf*_t types lets the compiler do at least some of the work for you.
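
For comparison, a minimal sketch of the runtime approach. Everything
here is hypothetical: BOOST_TRACK_INVALID_UTF_8 is only a proposal in
this thread, and is_valid_utf8, check_utf8, and the handler variable
are names invented for the example. In a real build the check would
presumably be compiled in or out by such a flag:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns true if s is well-formed UTF-8. Rejects truncated sequences,
// overlong forms, surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;              // stray continuation or bad lead byte
        if (len > n - i) return false;  // sequence runs past end of string
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len == 2 && cp < 0x80) return false;     // overlong
        if (len == 3 && cp < 0x800) return false;    // overlong
        if (len == 4 && cp < 0x10000) return false;  // overlong
        if (cp > 0x10FFFF) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += len;
    }
    return true;
}

// Hypothetical user-replaceable hook; the default throws.
typedef void (*invalid_utf8_handler)(const std::string&);

void throw_on_invalid_utf8(const std::string&) {
    throw std::invalid_argument("invalid UTF-8");
}

invalid_utf8_handler g_invalid_utf8_handler = &throw_on_invalid_utf8;

// Called wherever library code is about to interpret s as UTF-8.
void check_utf8(const std::string& s) {
    if (!is_valid_utf8(s)) {
        g_invalid_utf8_handler(s);
    }
}
```

This catches a bad string only when the offending call actually runs
with bad data, which is the limitation mentioned above.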
-- 
Chad Nelson
Oak Circle Software, Inc.