
On 4/9/04 3:54 AM, "Vladimir Prus" <ghost@cs.msu.su> wrote:
Daryle Walker wrote:
On 4/6/04 3:27 AM, "Vladimir Prus" <ghost@cs.msu.su> wrote:
it seems that Unicode support is the last issue that should be addressed before the library can be added to CVS. Since the issue is somewhat tricky, I'd appreciate some comments before I start coding.
[TRUNCATE]
What about:
* There's no guarantee that "char" is based on ASCII
* There's no guarantee that "wchar_t" is based on Unicode
Since other text-related parts of Boost don't really deal with Unicode issues, maybe you should address it after putting it in CVS.
It was specifically requested that some Unicode/wchar_t support be added before putting it in CVS.
That doesn't mean that you _have_ to do it. You can give the person who gave the request a (temporary) rejection notice.
Maybe after discussions on how Unicode can fit in Boost-wide. (Other posts in this thread have admitted that the problem is big and difficult. I don't think it's worth delaying the library over. Sometimes, cool-sounding ideas in the abstract turn out to be bad ones in practice.)
What 'cool-sounding idea' do you mean? What I proposed was that Unicode data is just passed through, without modification.
I read messages in this thread about doing full-blown Unicode handling, and I've read about doing nothing (being as Unicode-ignorant as other text-processing Boost libraries). I wouldn't mind adding "wchar_t" support, without necessarily assuming that it's Unicode. However, the Unicode "problem" is so big that it could take more time and effort than what you have done on program-options so far. _That_ is what I don't want to delay the library for. Also, a solution should be applicable for all of Boost's text libraries, not just this one.
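To make "wchar_t support without assuming Unicode" concrete, here is a minimal sketch of a character-type-generic helper. The name split_option is hypothetical and is not part of program-options; it only illustrates the pass-through idea. The one character the code has to recognize ('=') is obtained through the standard ctype facet's widen() rather than by assuming a code-point value, and everything else is copied untouched.

```cpp
#include <locale>
#include <string>
#include <utility>

// Hypothetical helper (not the program-options API): split "name=value"
// into its two halves for any character type, without interpreting any
// character other than '='.
template <class CharT>
std::pair<std::basic_string<CharT>, std::basic_string<CharT> >
split_option(const std::basic_string<CharT>& token)
{
    typedef std::basic_string<CharT> string_type;

    // Ask the classic locale for CharT's representation of '=' instead of
    // assuming that wchar_t shares code points with ASCII or Unicode.
    const CharT eq =
        std::use_facet<std::ctype<CharT> >(std::locale::classic()).widen('=');

    const typename string_type::size_type pos = token.find(eq);
    if (pos == string_type::npos)
        return std::make_pair(token, string_type());  // no value part

    // Both halves are passed through unmodified.
    return std::make_pair(token.substr(0, pos), token.substr(pos + 1));
}
```

The same template serves std::string and std::wstring, so nothing here depends on what encoding, if any, the wide characters carry.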
Even if you do come up with some grand Unicode plan, you would have to make sure your library works with platforms that don't use ASCII/Unicode.
Do you know of a specific case where wchar_t does not implicitly mean Unicode?
Not personally, but that's about as relevant as asking for a platform whose "char" isn't 8 bits. (I've heard platforms like that have existed.) Just because all the common platforms do it a certain way (and/or there are no counter-examples) doesn't mean you can portably assume that the common assumption is all that matters. The identities and code-points of the members of the (narrow and wide) character sets are implementation-defined. The C++ parser allows characters to be named by their ISO-Unicode number, but it's supposed to be mapped to the platform's code-point for that character, not necessarily maintained in Unicode.

-- 
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
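For anyone who wants to check these assumptions on a given platform instead of taking them for granted, a small sketch follows. The function names are made up for illustration. The macro __STDC_ISO_10646__ comes from C99 and was adopted by later C++ standards; an implementation defines it only when it promises that wchar_t values are ISO 10646 code points, so its absence proves nothing either way.

```cpp
#include <climits>  // CHAR_BIT

// Number of bits in a "char"; the standard guarantees at least 8, not exactly 8.
inline int bits_per_char() { return CHAR_BIT; }

// True only when the implementation itself states that wchar_t holds
// ISO 10646 (Unicode) code points. When the macro is absent, the wide
// encoding is simply unspecified from the program's point of view.
inline bool wchar_is_iso10646()
{
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}

// True when the narrow execution character set gives 'A' its ASCII value.
inline bool narrow_A_is_ascii() { return 'A' == 0x41; }

// A universal-character-name names the character by its ISO 10646 number;
// the value stored is whatever code point the platform uses for it.
inline bool wide_e_acute_keeps_unicode_value() { return L'\u00E9' == 0xE9; }
```

On a platform where wchar_is_iso10646() returns true, wide_e_acute_keeps_unicode_value() is expected to return true as well; elsewhere either answer is conforming.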