
From: Soares Chen <crf@hypershell.org>
Hi all,
[snip]
I think there are several options that I can choose for my project:

1. Use Chad Nelson's code as a base, try to incorporate other ideas proposed on the mailing list, integrate with Boost.Locale, and make it Boost quality to submit for review. If this option is chosen, I hope that Chad Nelson can be my mentor.
2. Start a new code base, gather and compile the ideas suggested on the mailing list, with final design decisions made by me and my mentor rather than the community (to keep the project moving fast), make it Boost quality, and submit it for review.
3. Start the boost::string project, where another, better string is reinvented that fixes all the weaknesses of std::string.
4. Adopt a different proposal and improve an existing project such as Boost.Unicode [2] or Boost.Locale [3] so that it really solves the encoding-awareness problem.
5. Any other suggestions?
Hello,

I would like you to address several points.

It would be very hard to reach consensus on how to solve the problem. Probably the best, and the most wishful, solution is to assume that all strings are UTF-8, but that is not the reality.

The problem is actually not the string but rather the way you code. Even if you create a perfect UTF-8 string and then call fopen(your_perfect_string.c_str(),"r") under Windows, it will not work, because the narrow-character API there interprets the bytes in the local ANSI code page rather than as UTF-8 <sigh... damn Windows>.

As you can see from the multiple discussions, there are many contradictory requirements about what a string should look like and what it should provide.

If you want to provide better Unicode awareness in Boost, you don't need a new cool utf-XYZ string, you need a policy. I think boost::filesystem v3 is a big step forward: it allows you to use UTF-8 strings on Windows, which I think is a really good beginning.

This is my opinion. Boost.Locale and several of my other projects (CppCMS, CppDB) live happily with std::string. In the vast majority of cases you don't need an encoding-aware string, as so many of the operations you usually do on strings are encoding agnostic. But that is another story.

Bottom line: if you want to improve the Unicode awareness of Boost, I think you need to adopt a Boost.Filesystem v3-like policy all over the Boost code base:

1. Use the wide API as the native one in Boost everywhere under Windows.
2. Use the char * API as the native one in Boost everywhere on non-Windows platforms.
3. Use std::codecvt to handle this (after many tricks...).

The Unicode string/encoding-aware string is the last thing to do, not the first. Why?

1. Because you will never get consensus about what the "right thing" to do is (wide, narrow, UTF-8, UTF-16, etc.). Projects that are handled and directed by a single source or management, like Qt, GTK(mm), Java, C#, Python or others, may decide what the right thing is. This will never happen in Boost, as it is too pluralistic, even in cases where that does not always make sense, simply because of the way libraries are developed, reviewed and accepted: through public reviews that ultimately encourage diversity.
2. Because you are not likely to be able to force users to actually use your string, as Boost is more about collaboration than enforcement of a specific style.
3. Even the heavy discussions there have not reached any conclusion. So what would happen at the final review of your library?

My $0.02

Artyom
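
For illustration, the policy in points 1 and 2 above (wide API on Windows, char * API elsewhere) could look roughly like the sketch below. This is only a minimal sketch under assumptions: fopen_utf8 and utf8_to_wide are hypothetical names invented here, not part of Boost or of any library mentioned above. The conversion uses the Win32 function MultiByteToWideChar instead of std::codecvt, so it does not show point 3. On non-Windows platforms the bytes are passed through unchanged, which assumes the file system accepts UTF-8 names.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: convert UTF-8 to UTF-16 using the Win32 API.
// Returns an empty string if the input is empty or not valid UTF-8.
static std::wstring utf8_to_wide(const std::string& s) {
    if (s.empty()) return std::wstring();
    const int len = static_cast<int>(s.size());
    // First call: ask how many UTF-16 code units are needed.
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                s.data(), len, nullptr, 0);
    if (n <= 0) return std::wstring();
    std::wstring w(static_cast<std::size_t>(n), L'\0');
    // Second call: perform the conversion into the buffer.
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        s.data(), len, &w[0], n);
    return w;
}
#endif

// Hypothetical helper: open a file whose name is given as UTF-8.
// Windows: convert to UTF-16 and call the wide CRT function _wfopen.
// Elsewhere: hand the bytes to fopen unchanged (assumes UTF-8 file names).
std::FILE* fopen_utf8(const std::string& path_utf8, const char* mode) {
    assert(mode != nullptr);
#ifdef _WIN32
    const std::wstring wpath = utf8_to_wide(path_utf8);
    const std::wstring wmode = utf8_to_wide(mode);
    if (wpath.empty() || wmode.empty()) return nullptr;
    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    return std::fopen(path_utf8.c_str(), mode);
#endif
}
```

The caller keeps using std::string everywhere and only the boundary with the operating system knows about the encoding, e.g. fopen_utf8(your_perfect_string, "r") in place of the fopen call above.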