
At Wed, 19 Jan 2011 20:03:59 +0100, Matus Chochlik wrote:
On Wed, Jan 19, 2011 at 7:39 PM, Dave Abrahams <dave@boostpro.com> wrote:
Our influence, if we introduce new library components, is very great, because they're on a de-facto fast track to standardization, and an improved string library is exactly the sort of thing that would be adopted upstream. If we simply agree to a programming convention, that will have some impact, but much less.
OK, I see. But is there any chance that the standard itself would be updated so that it would first recommend using UTF-8 with C++ strings?
Well, never say "never," but... never. Such recommendations are not part of the standard's mission. It doesn't do things like that.
After some period of time, all other encodings would be deprecated
By whom?
and using them would cause undefined behavior. Could Boost be the driving force here?
This doesn't seem like a very plausible scenario to me, based on my experience. Of course, others may disagree.
I really see all the obstacles that prevent us from just switching to UTF-8, but adding a new string class will not help for the same reasons adding wstring did not help.
I don't see the parallel at all. wstring is just another container of bytes, for all practical purposes. It doesn't imply any particular encoding, and does nothing to segregate the encoded from the raw.
As I already said elsewhere I think that this is a problem that has to be solved "organizationally".
Perhaps. The type system is one of our organizational tools, and Boost has an impact insofar as it produces components that people use, so if we aren't able to produce some flagship library components that help with the solution, we have little traction.
*Scenario E:* We add another string class and everyone adopts it
OK, I admit that this is possible. But let me ask: how did the C world make the transition without abandoning char?
The transition from what to what?
I meant that, for example, on POSIX OSes the POSIX C API did not have to be changed or extended with a new set of functions doing the same things but using a new character type when they switched from the old encodings to UTF-8.
...and people still have the problem that they lose track of what's "raw" and what's encoded as utf-8.
To compare two strings you can still use strcmp and not utf8strcmp; to collate strings you use strcoll and not utf8strcoll; etc.
Yeah... but surely POSIX's strcmp only tells you whether the two strings have the same sequence of code points, not whether they have the same characters, right? And if you inadvertently compare a "raw" string with an equivalent utf-8-encoded string, what happens? -- Dave Abrahams BoostPro Computing http://www.boostpro.com