
On Tue, 18 Jan 2011 05:35:17 -0800 (PST) Artyom <artyomtnk@yahoo.com> wrote:
From: Alexander Lamaison <awl03@doc.ic.ac.uk>
Yes, in principle. It isn't terribly necessary if everybody is operating in UTF-8 land though.
Which is exactly why it's necessary: everybody _isn't_ operating in UTF-8 land.
The problem is that you need to pick some encoding and UTF-8 is the most universal and useful.
I'll second that. Little wasted space, no byte-order problems, and very easy to work with (finding the first byte of a character, for instance, is child's play).
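To make the "finding the first byte" point concrete, here is a minimal sketch. It relies only on the UTF-8 rule that continuation bytes look like 10xxxxxx; the names is_continuation and char_start are made up for this illustration and are not from any library discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A UTF-8 continuation byte always has the bit pattern 10xxxxxx.
// (Hypothetical helper name, for illustration only.)
inline bool is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// Given any byte offset inside a UTF-8 string, step backward to the
// lead byte of the character that contains it.
// Assumes s holds valid UTF-8 and pos is a valid index.
inline std::size_t char_start(const std::string& s, std::size_t pos) {
    assert(pos < s.size());
    while (pos > 0 && is_continuation(static_cast<unsigned char>(s[pos]))) {
        --pos;
    }
    return pos;
}
```

No table lookup and no scan from the start of the string is needed; the loop runs at most three times on valid input.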
Otherwise you should:
1. Reinvent the string
Or at least wrap it. ;-)
2. Reinvent the standard library to use the new string
Not entirely necessary, for the same reason that very few changes to the standard library are needed when you switch from char strings to char16_t strings to char32_t strings -- the standard library, designed around the idea of iterators, is mostly type-agnostic. The utf*_t types provide fully functional iterators, so they'll work fine with most library functions, so long as those functions don't care that some characters are encoded as multiple bytes. It's just the ones that assume that a single byte represents all characters that you have to replace, and you'd have to replace those regardless of whether you're using a new string type or not, if you're using any multi-byte encoding.
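As a rough sketch of the iterator argument: the class below is a hypothetical, stripped-down code-point iterator over a UTF-8 std::string. It is not the utf*_t implementation mentioned above, it assumes the input is valid UTF-8, and it does no error handling. The point is only that once such an iterator exists, unmodified standard algorithms can consume it.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical read-only iterator that walks UTF-8 bytes as code points.
// Assumes valid UTF-8; a real implementation would validate its input.
class utf8_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    explicit utf8_iterator(const char* p) : p_(p) {}

    // Decode the code point that starts at the current position.
    char32_t operator*() const {
        unsigned char b0 = static_cast<unsigned char>(*p_);
        std::size_t n = length(b0);
        // Mask off the length-marker bits of the lead byte.
        char32_t cp = (n == 1) ? static_cast<char32_t>(b0)
                               : static_cast<char32_t>(b0 & (0x7F >> n));
        for (std::size_t i = 1; i < n; ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(p_[i]) & 0x3F);
        }
        return cp;
    }

    // Advance by one whole character, however many bytes it occupies.
    utf8_iterator& operator++() {
        p_ += length(static_cast<unsigned char>(*p_));
        return *this;
    }
    utf8_iterator operator++(int) {
        utf8_iterator old = *this;
        ++*this;
        return old;
    }

    bool operator==(const utf8_iterator& o) const { return p_ == o.p_; }
    bool operator!=(const utf8_iterator& o) const { return p_ != o.p_; }

private:
    // Number of bytes in the sequence introduced by lead byte b0.
    static std::size_t length(unsigned char b0) {
        if (b0 < 0x80) return 1;
        if ((b0 >> 5) == 0x6) return 2;
        if ((b0 >> 4) == 0xE) return 3;
        return 4;
    }
    const char* p_;
};

inline utf8_iterator utf8_begin(const std::string& s) {
    return utf8_iterator(s.data());
}
inline utf8_iterator utf8_end(const std::string& s) {
    return utf8_iterator(s.data() + s.size());
}
```

With that in place, std::distance(utf8_begin(s), utf8_end(s)) counts characters rather than bytes, and std::find or std::count can search for a char32_t value without either algorithm knowing anything about the encoding.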
3. Reinvent 1001 other libraries to use the new string.
Again, seldom necessary. Just use a type system that can translate between your internal coding and what the library wants, at the boundaries. If the other library you want to use can't handle multi-byte encodings, you'd have to modify or reinvent it anyway.
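A minimal sketch of what translating at the boundary could look like. Everything here is hypothetical: utf8_text and latin1_text are illustrative tag types, legacy_width stands in for some third-party function that only understands Latin-1, and the converters assume valid UTF-8 and substitute '?' for anything Latin-1 cannot represent.

```cpp
#include <cstddef>
#include <string>

// Hypothetical tag types: the storage is identical, but the type
// records which encoding the bytes hold.
struct utf8_text   { std::string bytes; };
struct latin1_text { std::string bytes; };

// Every Latin-1 byte maps to the code point with the same value.
inline utf8_text to_utf8(const latin1_text& in) {
    utf8_text out;
    for (char ch : in.bytes) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out.bytes.push_back(ch);
        } else {
            out.bytes.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.bytes.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Code points above U+00FF become '?'. Assumes valid UTF-8 input.
inline latin1_text to_latin1(const utf8_text& in) {
    latin1_text out;
    const std::string& s = in.bytes;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 < 0x80) {
            out.bytes.push_back(s[i]);
            i += 1;
        } else if ((b0 & 0xE0) == 0xC0 && i + 1 < s.size()) {
            unsigned cp = ((b0 & 0x1Fu) << 6)
                        | (static_cast<unsigned char>(s[i + 1]) & 0x3Fu);
            out.bytes.push_back(cp <= 0xFF ? static_cast<char>(cp) : '?');
            i += 2;
        } else {
            // Longer sequence: emit one '?' and skip its continuation bytes.
            out.bytes.push_back('?');
            ++i;
            while (i < s.size()
                   && (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80) {
                ++i;
            }
        }
    }
    return out;
}

// Stand-in for a library call that only accepts Latin-1 text.
inline std::size_t legacy_width(const latin1_text& t) {
    return t.bytes.size();
}
```

Calling legacy_width(to_latin1(u)) compiles, while passing a utf8_text directly does not, so the conversion cannot be forgotten at the boundary. The rest of the program keeps working in one encoding.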
It is just neither feasible nor necessary.
My code says it's perfectly feasible. ;-) Whether it's necessary or not is up to the individual developer, but the type-safety it offers is more in line with the design philosophy of C++ than using std::string for everything. I hate to harp on the same tired example, but why do you really need any pointer type other than void*? It's the same idea.
--
Chad Nelson
Oak Circle Software, Inc.