
On Wed, Jan 19, 2011 at 8:50 PM, Chad Nelson <chad.thecomfychair@gmail.com> wrote:
Do you see another way to provide those conversions, and automatic verification of proper UTF coding? (Automatic verification is a very good thing; without it, people won't verify at all or will forget to, and that opens their programs up to exploitation.)
Yes, implementing it into std::string in some future standard.
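To make "verification of proper UTF coding" concrete, here is a minimal sketch of such a check for UTF-8. The function name is_valid_utf8 is hypothetical; it is not part of Boost, the standard library, or the utf*_t proposal. It only illustrates what a string class could run automatically on construction.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not an existing Boost or standard API.
// Returns true if `s` is well-formed UTF-8: rejects stray continuation
// bytes, truncated sequences, overlong forms, surrogates, and code
// points above U+10FFFF.
inline bool is_valid_utf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;              // stray continuation or invalid lead byte
        if (len > n - i) return false;  // sequence runs past the end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return false;   // overlong encoding
        if (cp > 0x10FFFF) return false;                 // outside Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        i += len;
    }
    return true;
}
```

Overlong forms such as "\xC0\xAF" (a two-byte encoding of '/') are the classic exploitation vector, which is why the check rejects them rather than decoding them.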
If Boost comes out with a version that breaks existing programs, companies just won't upgrade to it. I can keep one of the companies that mine works with upgrading, because the group that I work with is the only one there using C++ and they listen to me, but most companies have a lot more invested in the existing system. Believe me, any breaking changes have to be eased in over many versions -- the "boiling a frog" approach. :-)
Of course this is a valid point, and what we should do is evaluate the potential damage. There have been breaking changes in Boost before, and the end users finally accepted them (even if complaining loudly). Boost is a cutting-edge library; such changes should be avoided if possible, but they should not be avoided completely. This would require a lot of PR and announcing the changes well in advance.
If they're already using UTF-8 strings, then we provide something like BOOST_ALL_STD_STRINGS_ARE_UTF8 that they can define. The utf*_t classes configure themselves to accept std::strings as UTF-8-encoded, and any changes are completely transparent to those people. No punishment involved.
OK, this could work.
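One way such an opt-in could be wired up, as a rough sketch only: the macro name BOOST_ALL_STD_STRINGS_ARE_UTF8 comes from the proposal above, but the utf8_t class shown here, the UTF8_SKETCH_EXPLICIT helper macro, and the byte_length function are all assumptions made for illustration, not the actual utf*_t implementation. The idea is that the std::string constructor is implicit when the macro is defined and explicit otherwise.

```cpp
#include <cstddef>
#include <string>

// Normally the user would define this on the compiler command line;
// it is defined here only so the sketch is self-contained.
#define BOOST_ALL_STD_STRINGS_ARE_UTF8

// Hypothetical helper macro: conversion from std::string is implicit
// only when the user has declared that their std::strings are UTF-8.
#ifdef BOOST_ALL_STD_STRINGS_ARE_UTF8
#  define UTF8_SKETCH_EXPLICIT
#else
#  define UTF8_SKETCH_EXPLICIT explicit
#endif

// Hypothetical stand-in for the proposed utf8_t class.
class utf8_t {
public:
    // A real implementation would also validate the bytes here.
    UTF8_SKETCH_EXPLICIT utf8_t(const std::string& bytes) : bytes_(bytes) {}
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
};

// Hypothetical library function that is not encoding-agnostic and
// therefore takes utf8_t instead of std::string.
inline std::size_t byte_length(const utf8_t& s) { return s.bytes().size(); }
```

With the macro defined, existing code that calls byte_length(some_std_string) keeps compiling unchanged; without it, the caller has to write utf8_t(some_std_string) and so states the encoding explicitly.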
For everyone else, we introduce the utf*_t API alongside the std::string one, for those classes and functions that are not encoding-agnostic. The std::string one can be deprecated in future versions if the library author desires. Again, no punishment involved.
I don't expect that the utf*_t classes will make it into the standard. They definitely won't make it into the now-misnamed C++0x standard, and it'll likely be another ten years before another one is hashed out -- by then, the UTF-8 conversion should be complete, so there will be no need for it, except possibly to confirm that a string isn't malformed.
Besides the ugly name and the fact that it is a new class? No :)
If you can think of a more-acceptable-but-still-descriptive name for it, I'm all ears. :-)
I have an idea: what about boost::string, which could possibly become the next std::string in the future?
And the solution is long overdue. Creating utf8_t just postpones the problem; it does not really solve it.
I see it as merely easing the transition.
OK, if the long-term plan is:
1) design and implement boost::string using UTF-8, doing all the things like code-point iteration, character iteration, convenience stuff like starts-with, ends-with, replace, trim, etc., with as much backward compatibility with std::string as possible without hindering progress
2) try really hard to push it into the standard
then I'm on board with that.

BR, Matus
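To give a feel for the kind of interface meant in point 1, here is a very small sketch. Everything in it is an assumption for illustration: the class lives in a made-up namespace sketch rather than boost, the member names code_points, starts_with, and ends_with are hypothetical, and the decoder assumes the input is already well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {  // hypothetical namespace; not boost::string itself

class string {
public:
    // Assumes `utf8` holds well-formed UTF-8; a real class would verify.
    explicit string(const std::string& utf8) : bytes_(utf8) {}

    // Code-point "iteration", simplified to returning all code points.
    std::vector<char32_t> code_points() const {
        std::vector<char32_t> out;
        std::size_t i = 0;
        while (i < bytes_.size()) {
            const unsigned char b0 = static_cast<unsigned char>(bytes_[i]);
            // Sequence length is determined by the lead byte.
            const std::size_t len = b0 < 0x80 ? 1
                                  : (b0 & 0xE0) == 0xC0 ? 2
                                  : (b0 & 0xF0) == 0xE0 ? 3 : 4;
            // Keep only the payload bits of the lead byte.
            char32_t cp = (len == 1) ? b0 : (b0 & (0xFF >> (len + 1)));
            for (std::size_t k = 1; k < len && i + k < bytes_.size(); ++k) {
                const unsigned char b = static_cast<unsigned char>(bytes_[i + k]);
                cp = (cp << 6) | (b & 0x3F);
            }
            out.push_back(cp);
            i += len;
        }
        return out;
    }

    // For well-formed UTF-8, a byte-wise prefix or suffix match is also
    // a match on whole code points, so plain byte comparison suffices.
    bool starts_with(const string& prefix) const {
        return bytes_.compare(0, prefix.bytes_.size(), prefix.bytes_) == 0;
    }
    bool ends_with(const string& suffix) const {
        const std::size_t n = suffix.bytes_.size();
        return bytes_.size() >= n &&
               bytes_.compare(bytes_.size() - n, n, suffix.bytes_) == 0;
    }

    // Backward-compatible access to the underlying std::string.
    const std::string& str() const { return bytes_; }

private:
    std::string bytes_;
};

}  // namespace sketch
```

Storing the text as a plain std::string underneath is one way to keep the backward compatibility mentioned above: existing interfaces can still be handed str() without any conversion.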