
On Fri, Jan 21, 2011 at 10:37 AM, Alexander Lamaison <awl03@doc.ic.ac.uk> wrote:
On Thu, 20 Jan 2011 23:26:35 -0800, Patrick Horgan wrote:
On 01/20/2011 07:43 AM, Alexander Lamaison wrote:
I imagine you wouldn't have UTF-16 and UTF-32 strings being passed about as a matter of course. For instance, a UTF-16 string should only be used just before making a Windows API call.
If this is the case, it makes sense to give the common case (UTF-8 string) a nice name like boost::string, while the others, which are used for special situations, can have something less snappy like boost::u16string and boost::u32string.
What would you use for a regular string where you just had, essentially, a vector of char, wchar_t, char8_t, char16_t, char32_t, or unsigned char, but didn't care about encoding? I want to differentiate between this case and the case where I know that there's a particular encoding. A lot of the time you just know you got a string from one system call and you're passing it to another, and you don't care about encoding. [..]
Good point! boost::u8string then?
Why not boost::string (explicitly stating in the docs that it is UTF-8-based)? The name u8string suggests to me that it is meant for some special case of character encoding and that the (encoding-agnostic/native) std::string is still the way to go. IMO we should send the message that UTF-8 is "normal"/"(semi-)standard"/"de-facto-standard" and that the other encodings like native_t (or even ansi_t, ibm_cp_xyz_t, string16_t, string32_t, ...) are the special cases and should be treated as such.

Matus
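
To make the distinction discussed in this thread concrete, here is a minimal sketch of how it might look in code. Everything in it is hypothetical: the namespace sketch, the class sketch::string and the function to_utf16 are illustrative names only and are not part of Boost or of any proposal quoted above. The sketch assumes well-formed UTF-8 input and does no validation. The encoding-agnostic case stays as plain std::string, the UTF-8 case gets its own type, and UTF-16 appears only at the point where a UTF-16 API would be called.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical names throughout; none of these are real Boost types.
namespace sketch {

// A string whose contents are known (by convention) to be UTF-8.
// A plain std::string remains the "bag of chars, encoding unknown" case.
class string {
public:
    explicit string(std::string utf8) : bytes_(std::move(utf8)) {}
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
};

// Convert to UTF-16 only at the boundary, e.g. just before a call that
// expects UTF-16. Assumes well-formed UTF-8; there is no error handling.
inline std::u16string to_utf16(const string& s) {
    std::u16string out;
    const std::string& b = s.bytes();
    std::size_t i = 0;
    while (i < b.size()) {
        unsigned char lead = static_cast<unsigned char>(b[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte determines the sequence length and its payload bits.
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len && i + k < b.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(b[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

} // namespace sketch
```

With types like these, a function taking sketch::string documents that it expects UTF-8, a function taking std::string makes no encoding promise, and the result of to_utf16 would normally live only long enough to be handed to the UTF-16 API.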