
On Fri, 14 Jan 2011 10:59:09 -0500 Dave Abrahams <dave@boostpro.com> wrote:
> At Fri, 14 Jan 2011 17:50:02 +0200, Peter Dimov wrote:
>> Unfortunately not. A library that requires its input paths to be UTF-8 always gets bug reports from users who are accustomed to using another encoding for their narrow strings. There is plenty of precedent they can use to justify their complaint.
>
> I don't see the problem you cited as an answer to my question. Let me try asking it differently: how do I program in an environment that has both "right" and "wrong" libraries?
>
> Also, is there any use in trying to get the difference into the type system, e.g. by using some kind of wrapper over std::string that gives it a distinct "utf-8" type?
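
To make the question concrete, here is a minimal sketch of what such a wrapper could look like. The names utf8_string, bytes(), path_byte_length and example_usage are hypothetical and chosen only for illustration; they are not taken from any existing library. The sketch does not validate the bytes, it only records the caller's claim that they are UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical wrapper: a std::string whose bytes are claimed to be UTF-8.
// No validation is performed here; the type only carries the claim.
class utf8_string {
public:
    // Explicit, so a plain std::string never converts silently.
    explicit utf8_string(std::string bytes) : bytes_(std::move(bytes)) {}

    // Read-only access to the underlying encoded bytes.
    const std::string& bytes() const { return bytes_; }

private:
    std::string bytes_;
};

// The compiler rejects passing a bare std::string where utf8_string is needed.
static_assert(!std::is_convertible<std::string, utf8_string>::value,
              "std::string must not convert implicitly to utf8_string");

// A function that needs UTF-8 says so in its signature (hypothetical example).
std::size_t path_byte_length(const utf8_string& path) {
    return path.bytes().size();
}

// Usage: the caller has to state the encoding at the call site.
std::size_t example_usage() {
    std::string narrow = "caf\xC3\xA9";  // 'c', 'a', 'f', then U+00E9 as two UTF-8 bytes
    utf8_string path(narrow);            // explicit, visible claim of UTF-8
    return path_byte_length(path);       // path_byte_length(narrow) would not compile
}
```

With a type like this, a string of unknown encoding cannot reach a UTF-8-only interface by accident; whether that is worth the extra conversions at every boundary is exactly the open question.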

The system I'm now using for my programs might interest you. I have four classes: ascii_t, utf8_t, utf16_t, and utf32_t. Assigning one type to another automatically converts it to the target type during the copy. (Converting to ascii_t will throw an exception if a resulting character won't fit into eight bits.)

Each type has an internal storage type as well, based on the character size (ascii_t and utf8_t use std::string, utf16_t uses 16-bit characters, etc). You can access the internal storage type using operator* or operator->. For a utf8_t variable 'v', for example, *v gives you the UTF-8-encoded string.

An std::string is assumed to be ASCII-encoded. If you really do have UTF-8-encoded data to get into the system, you either assign it to a utf8_t using operator*, or use a static function utf8_t::precoded. std::wstring is assumed to be utf16_t- or utf32_t-encoded already, depending on the underlying character width for the OS.

A function is simply declared with parameters of the type that it needs. You can call it with whichever type you've got, and it will be auto-converted to the needed type during the call, so for the most part you can ignore the different types and use whichever one makes the most sense for your application. I use utf8_t as the main internal string type for my programs.

For portable OS-interface functions, there's a typedef (os::native_t) to the type that the OS's API functions need. For Linux-based systems, it's utf8_t; for Windows, utf16_t. There's also a typedef (os::unicode_t) that is utf32_t on Linux and utf16_t on Windows, but I'm not sure there's a need for that.

There are some parts of the code that could use polishing, but I like the overall design, and I'm finding it pretty easy to work with. Anyone interested in seeing the code?

-- 
Chad Nelson
Oak Circle Software, Inc.

* * *