Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]

20 Jan 2011

On Wed, Jan 19, 2011 at 8:50 PM, Chad Nelson <chad.thecomfychair@gmail.com> wrote:
...
Do you see another way to provide those conversions, and automatic
verification of proper UTF coding? (Automatic verification is a very
good thing, without it someone won't use it or will forget to, and open
up their programs to exploitation.)
Yes: by implementing it in std::string in some future standard.
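
To make "verification of proper UTF coding" concrete, here is a minimal sketch of a well-formedness check for UTF-8. The function name is_valid_utf8 is hypothetical; it is not part of Boost, the standard library, or the utf*_t proposal discussed in this thread. It rejects truncated sequences, stray continuation bytes, overlong encodings, surrogates, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (an assumption for illustration, not an existing API).
// Returns true if `s` holds well-formed UTF-8.
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 < 0x80) {  // single-byte (ASCII) code point
            ++i;
            continue;
        }
        std::size_t len = 0;     // total bytes in this sequence
        std::uint32_t cp = 0;    // decoded code point
        std::uint32_t min = 0;   // smallest value allowed for this length
        if ((b0 & 0xE0) == 0xC0) {
            len = 2; cp = b0 & 0x1F; min = 0x80;
        } else if ((b0 & 0xF0) == 0xE0) {
            len = 3; cp = b0 & 0x0F; min = 0x800;
        } else if ((b0 & 0xF8) == 0xF0) {
            len = 4; cp = b0 & 0x07; min = 0x10000;
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (n - i < len) return false;  // sequence is truncated
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min) return false;                      // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}
```

A string class that ran a check like this in its constructor would give the automatic verification described above, at the cost of one pass over the bytes on construction.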
...
If Boost comes out with a version that breaks existing programs,
companies just won't upgrade to it. I can keep one of the companies
that mine works with upgrading, because the group that I work with is
the only one there using C++ and they listen to me, but most companies
have a lot more invested in the existing system. Believe me, any
breaking changes have to be eased in over many versions -- the "boiling
a frog" approach. :-)
Of course this is a valid point, and what we should do is evaluate the
potential damage. There have been breaking changes
in Boost before, and the end users finally accepted them (even if complaining
loudly). Boost is a cutting-edge library, and such changes should
be avoided if possible, but they should not be avoided completely.
This would require a lot of PR and announcing the changes well
in advance.
...
If they're already using UTF-8 strings, then we provide something like
BOOST_ALL_STD_STRINGS_ARE_UTF8 that they can define. The utf*_t classes
configure themselves to accept std::strings as UTF-8-encoded, and any
changes are completely transparent to those people. No punishment
involved.
OK, this could work.
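
One way such a switch might look is sketched below. Only the macro name BOOST_ALL_STD_STRINGS_ARE_UTF8 comes from this thread; the utf8_t layout, the bytes() accessor, and byte_length() are hypothetical. The behaviour without the macro (an explicit constructor) is also an assumption, chosen so that code which has not opted in must spell out the conversion.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Users whose std::strings already hold UTF-8 define this before including
// the library (macro name from the thread; everything else is a sketch).
#define BOOST_ALL_STD_STRINGS_ARE_UTF8

class utf8_t {
public:
#ifdef BOOST_ALL_STD_STRINGS_ARE_UTF8
    // Implicit: a std::string is accepted as UTF-8 wherever utf8_t is expected.
    utf8_t(std::string s) : bytes_(std::move(s)) {}
#else
    // Assumption: without the macro the caller must convert explicitly.
    explicit utf8_t(std::string s) : bytes_(std::move(s)) {}
#endif
    const std::string& bytes() const { return bytes_; }

private:
    std::string bytes_;
};

// Hypothetical library function that is not encoding-agnostic.
std::size_t byte_length(const utf8_t& text) { return text.bytes().size(); }
```

With the macro defined, existing calls such as byte_length(some_std_string) keep compiling unchanged, which is the "completely transparent" case described above.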
...
For everyone else, we introduce the utf*_t API alongside the
std::string one, for those classes and functions that are not
encoding-agnostic. The std::string one can be deprecated in future
versions if the library author desires. Again, no punishment involved.
I don't expect that the utf*_t classes will make it into the standard.
They definitely won't make it into the now-misnamed C++0x standard, and
it'll likely be another ten years before another one is hashed out --
by then, the UTF-8 conversion should be complete, so there will be no
need for it, except possibly to confirm that a string isn't malformed.
...
Besides the ugly name and the fact that it is a new class? No :)
If you can think of a more-acceptable-but-still-descriptive name for
it, I'm all ears. :-)
I have an idea: what about boost::string, which could possibly become
the next std::string in the future.
...
...
And the solution is long overdue. And creating utf8_t is just putting
the problem off, not really solving it.
I see it as merely easing the transition.
OK, if the long term plan is:

1) design and implement a UTF-8-based boost::string that provides
code-point iteration, character iteration, and convenience operations such as
starts-with, ends-with, replace, trim, etc., with as much backward
compatibility with std::string as possible without hindering progress

2) try really hard to push it to the standard

then I'm on board with that.
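
As a rough illustration of point 1, here is a minimal sketch of a UTF-8 string wrapper with code-point access and two of the convenience operations. It lives in a made-up namespace sketch rather than boost, since no such boost::string exists; the member names are assumptions, and the decoder assumes its input is already well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical namespace, not a real Boost component

class string {
public:
    string(std::string utf8) : bytes_(std::move(utf8)) {}

    // Decode the stored bytes into code points.
    // Assumes bytes_ is well-formed UTF-8; no validation is done here.
    std::vector<char32_t> code_points() const {
        std::vector<char32_t> out;
        for (std::size_t i = 0; i < bytes_.size();) {
            const unsigned char b0 = static_cast<unsigned char>(bytes_[i]);
            std::size_t len = 1;
            char32_t cp = b0;
            if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; }
            else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
            else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
            for (std::size_t k = 1; k < len && i + k < bytes_.size(); ++k) {
                const unsigned char b = static_cast<unsigned char>(bytes_[i + k]);
                cp = (cp << 6) | (b & 0x3F);
            }
            out.push_back(cp);
            i += len;
        }
        return out;
    }

    // Byte-wise comparison is enough here: a well-formed UTF-8 prefix or
    // suffix can only match on code-point boundaries.
    bool starts_with(const std::string& prefix) const {
        return bytes_.size() >= prefix.size() &&
               bytes_.compare(0, prefix.size(), prefix) == 0;
    }

    bool ends_with(const std::string& suffix) const {
        return bytes_.size() >= suffix.size() &&
               bytes_.compare(bytes_.size() - suffix.size(),
                              suffix.size(), suffix) == 0;
    }

    // Backward compatibility hook: expose the underlying std::string.
    const std::string& str() const { return bytes_; }

private:
    std::string bytes_;
};

}  // namespace sketch
```

Character (grapheme) iteration is deliberately left out of this sketch, because it needs Unicode segmentation data rather than just byte decoding.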

BR,

Matus

Matus Chochlik