
I see what you mean. Still, fixed-width-encoded strings are a lot easier to code, and I think we should focus on them first just to get something working and to have a platform to test code conversion on, which in my opinion is the most important part. Without code conversion, it would be difficult to read in non-ASCII strings in the first place, since std::wfstream just converts ASCII to UTF-16.

Variable-width-encoded strings should be fairly straightforward when they are immutable, but will probably get hairy when they can be modified. Converting a VWE string would probably be no harder than a FWE string.

That said, I think a good (general) roadmap for this project would be:

1) Extend std::basic_string to store UCS-2 / UCS-4 (should be easy, though string constants may pose a problem)
2) Add code conversion to move between encodings, especially for I/O
3) Create VWE string class (fairly easy if immutable, hard if mutable)

- James

On 9/27/07, Sebastian Redl <sebastian.redl@getdesigned.at> wrote:
James Porter wrote:
For certain special purposes (like the one above), a variable-width string class would be useful, but I think we should focus on storing strings in fixed-width encodings and then converting them appropriately during I/O.

Actually, I disagree with this. The only general-purpose fixed-width encoding available is UTF-32, and hardly anyone actually uses it. For good reason: for English text, it wastes 75% of the used space. In general, it wastes about 10 bits (30%) in everything, because Unicode only has about, what, 2^21 code points?
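As a rough illustration of the storage argument, the sketch below compares the size of the same text in UTF-8 and UTF-32. It assumes C++11's `char32_t` and `std::u32string` as the fixed-width container; the function names are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <string>

// Bytes UTF-8 needs for one code point (1 to 4 bytes).
std::size_t utf8_size_of(char32_t cp) {
    if (cp < 0x80) return 1;     // ASCII range
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;                    // up to U+10FFFF
}

// Total UTF-8 size of a sequence of code points.
std::size_t utf8_size(const std::u32string& text) {
    std::size_t total = 0;
    for (char32_t cp : text) total += utf8_size_of(cp);
    return total;
}

// UTF-32 always spends 4 bytes per code point.
std::size_t utf32_size(const std::u32string& text) {
    return text.size() * 4;
}

// Fraction of the UTF-32 storage that UTF-8 would not have needed.
double utf32_overhead(const std::u32string& text) {
    if (text.empty()) return 0.0;
    double narrow = static_cast<double>(utf8_size(text));
    double wide = static_cast<double>(utf32_size(text));
    return 1.0 - narrow / wide;
}
```

For ASCII-only input such as `U"hello"`, `utf32_overhead` gives 0.75, which matches the 75% figure above. The overhead shrinks as the text contains more code points outside the ASCII range.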
[snip]

I think the problem of UTF-8 and UTF-16 strings is important and must be addressed.
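To give a sense of what handling a variable-width encoding involves, here is a minimal sketch of decoding UTF-8 bytes into fixed-width code points. It is an illustration under stated assumptions, not a proposed design: `decode_utf8` is a hypothetical name, C++11's `std::u32string` is assumed as the output type, and the decoder checks only the lead/continuation byte structure (it does not reject overlong forms or surrogate code points).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode a UTF-8 byte string into UTF-32 code points.
// Sketch only: validates byte structure, not overlong forms or surrogates.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        // Make sure the sequence is not cut off at the end of the input.
        if (extra > bytes.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

The point of the sketch is that the position of the n-th character in the input cannot be computed without scanning, whereas indexing the decoded `std::u32string` is a plain array access. That is the trade-off between the two approaches discussed in this thread.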
Sebastian Redl