
On Fri, Jan 21, 2011 at 2:35 PM, Alexander Lamaison <awl03@doc.ic.ac.uk> wrote:
>> Why not boost::string (explicitly stating in the docs that it is UTF-8-based)? The name u8string suggests to me that it is meant for some special case of character encoding, and that the (encoding-agnostic/native) std::string is still the way to go.
>
> That was the idea, was it not? We should be encoding agnostic wherever possible.
If we can (globally) agree upon an encoding that can handle all imaginable writing systems, is robust, and so on, then we *will* end up being encoding agnostic. Today, what is called 'encoding agnostic' causes many problems. For example, you save a file whose name contains non-ASCII characters (even just Latin with some accents) on one version of Windows, you ship it to a machine running another version of Windows that uses a different encoding, and the name becomes garbled. The same thing happens with applications that use text files to exchange information. Either you pick a single encoding and stick to it, or you use whatever the current platform's native encoding happens to be and do encoding detection and transcoding on demand, usually losing some information in the process. In both cases you have to transcode the text explicitly. I don't see (besides support for legacy SW/HW) why so many people are saying that this is OK.
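Just to illustrate the explicit transcoding step I mean, here is a rough (untested) sketch using conversion functions along the lines of Boost.Locale's conv::to_utf/from_utf; the file name and the code pages are made-up examples and error-handling policy (skip vs. throw) is left out:

#include <boost/locale/encoding.hpp>
#include <iostream>
#include <string>

int main()
{
    // a file name typed on a Czech Windows box, stored in the
    // platform's narrow encoding (Windows-1250 in this example)
    std::string native_name = "p\xF8\xEDloha.txt"; // "příloha.txt" in CP1250

    // the explicit step: convert to UTF-8 before the name leaves
    // the machine (archive, network protocol, shared text file, ...)
    std::string utf8_name =
        boost::locale::conv::to_utf<char>(native_name, "windows-1250");

    // the receiving side converts from UTF-8 into whatever it uses;
    // if its code page cannot represent some characters (CP1252 has
    // no 'ř'), information is silently lost with the default policy
    std::string received =
        boost::locale::conv::from_utf(utf8_name, "windows-1252");

    std::cout << (received == native_name ? "round trip OK\n" : "name was changed\n");
}

With a single agreed-upon encoding the second conversion simply disappears.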
>> IMO we should send the message that UTF-8 is the "normal"/"(semi-)standard"/"de facto standard" encoding, and that the other encodings, like native_t (or even ansi_t, ibm_cp_xyz_t, string16_t, string32_t, ...), are the special cases and should be treated as such.
> Why? When a string doesn't need to be converted, why force it to be?
Already on many platforms you won't have to do any transcoding, precisely because those platforms have adopted a single encoding: UTF-8. I can't imagine why any new SW would choose anything besides Unicode for text representation, and to support legacy apps and/or hardware that accept commands or print output in a specific encoding there are tools like iconv.
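For instance, a simplified (untested) sketch of doing that with the POSIX iconv C API; the target code page and the sample text are just placeholders:

#include <iconv.h>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// convert 'text' from encoding 'from' to encoding 'to' with iconv
// (simplified: one-shot conversion, generous output buffer, no state flush)
std::string transcode(const std::string& text, const char* from, const char* to)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        throw std::runtime_error("unsupported conversion");

    std::vector<char> out(text.size() * 4 + 16);
    // note: some platforms declare iconv's second parameter as const char**
    char* in_ptr = const_cast<char*>(text.data());
    size_t in_left = text.size();
    char* out_ptr = &out[0];
    size_t out_left = out.size();

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    int err = errno;
    iconv_close(cd);
    if (rc == (size_t)-1)
        throw std::runtime_error(std::strerror(err));

    return std::string(&out[0], out.size() - out_left);
}

int main()
{
    // e.g. prepare UTF-8 program output for a legacy device that
    // only understands IBM code page 852
    std::string utf8_text = "p\xC5\x99\xC3\xADklad"; // "příklad" in UTF-8
    std::string legacy = transcode(utf8_text, "UTF-8", "CP852");
    return legacy.empty() ? 1 : 0;
}

Matus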