
On 9/26/07, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
Joseph Gauterin wrote:
[snip]
I've noticed that there are frequent requests/proposals for some sort of Boost Unicode/string encoding library. I've thought about the problem and it seems too big for one person to handle in their spare time
Let me say "part time" rather than "spare time"...
Sorry to jump into the discussion, but I've been watching it since the start, and I'm interested in this project too. Though I'm a little swamped with work right now, I do work on exactly one of the use cases raised in this thread: e-mail parsing (and related tasks). Tagging is exactly the right approach there. The hardest part is how to compare external strings with e-mail text. So far the safest approach I have found is to convert both to Unicode and then compare them (if they don't already share an encoding).
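To make the "convert to Unicode, then compare" idea concrete, here is a minimal sketch. It assumes only two encodings (ISO-8859-1 and UTF-8), and the names decode_latin1, decode_utf8 and same_text are hypothetical, not part of any existing Boost library. It compares raw code points only; it does no Unicode normalization, which a real e-mail comparison would probably need.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper names for illustration only.
typedef std::vector<std::uint32_t> code_points;

// ISO-8859-1 maps each byte directly to the code point of the same value.
inline code_points decode_latin1(const std::string& s) {
    code_points out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i)
        out.push_back(static_cast<unsigned char>(s[i]));
    return out;
}

// Minimal UTF-8 decoder: rejects bad lead bytes, bad continuation bytes and
// truncated sequences, but does not check for overlong forms or surrogates.
inline code_points decode_utf8(const std::string& s) {
    code_points out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::uint32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > s.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Compare a Latin-1 string with a UTF-8 string by code point sequence.
inline bool same_text(const std::string& latin1, const std::string& utf8) {
    return decode_latin1(latin1) == decode_utf8(utf8);
}
```

With tagged strings, the choice of decoder could be driven by the tag instead of by the function name.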
- perhaps a group of us should get together to discuss working on one? I'd be happy to participate.
I would definitely encourage breaking the work up into smaller chunks. IMHO "smaller is better" for Boost libraries; there have been a number of occasions when I've discovered that a feature I want is hidden as an internal component of a Boost library, and I've felt that it should have been a stand-alone public entity. So let's think about how this work can be split up:
This seems like a very good approach.
- A charset_trait class. I have started on this. The missing piece is a way to look up traits of character sets that are known at run-time; input would be appreciated.
- Compile-time and run-time tagged strings. The basics of this are straightforward and done.
Not as easy if a "universal string" class is to be achieved. But we can probably leave it out for now.
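For discussion, a compile-time tagged string driven by a traits class might look roughly like the sketch below. This is not Phil's code: utf8_tag, latin1_tag, charset_traits and tagged_string are hypothetical names, and the run-time lookup of traits (the missing piece mentioned above) is deliberately not covered.

```cpp
#include <string>

// Hypothetical tag types, one per character set known at compile time.
struct utf8_tag {};
struct latin1_tag {};

// Hypothetical traits template. The primary template is left undefined,
// so using an unknown tag is a compile-time error.
template <typename Tag> struct charset_traits;

template <> struct charset_traits<utf8_tag> {
    typedef char unit_type;                    // storage unit
    static const bool variable_width = true;   // 1 to 4 units per character
    static const char* name() { return "UTF-8"; }
};

template <> struct charset_traits<latin1_tag> {
    typedef char unit_type;
    static const bool variable_width = false;  // always 1 unit per character
    static const char* name() { return "ISO-8859-1"; }
};

// A string whose character set is part of its type.
template <typename Tag>
class tagged_string {
public:
    typedef charset_traits<Tag> traits;
    typedef std::basic_string<typename traits::unit_type> storage_type;

    tagged_string() {}
    explicit tagged_string(const storage_type& units) : units_(units) {}

    const storage_type& units() const { return units_; }
    static const char* charset_name() { return traits::name(); }

private:
    storage_type units_;
};

// Equality is only defined between strings carrying the same tag, so
// comparing a UTF-8 string with a Latin-1 string does not compile
// without an explicit conversion first.
template <typename Tag>
bool operator==(const tagged_string<Tag>& a, const tagged_string<Tag>& b) {
    return a.units() == b.units();
}
```

The point of the design is that mixing encodings by accident becomes a type error rather than a silent bug.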
- Conversions. My approach at present is to use iconv via a functor that I wrote a while ago. I believe iconv is widely available; however, some implementations may support only a small set of character sets. Alternatives would be interesting.
I use ICU extensively; I have never used iconv.
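As a rough illustration of what an iconv-based conversion functor could look like (this is not the functor Phil refers to), here is a minimal sketch. The class name iconv_converter is hypothetical. It assumes the POSIX signature of iconv() with a char** input pointer, as in glibc; some platforms declare it with const char** and would need a cast. It also assumes the requested character set names are supported by the local implementation.

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical functor wrapping a POSIX iconv conversion descriptor.
class iconv_converter {
public:
    iconv_converter(const char* from, const char* to)
        : cd_(iconv_open(to, from)) {   // note: iconv_open takes (to, from)
        if (cd_ == (iconv_t)-1)
            throw std::runtime_error("iconv_open: unsupported conversion");
    }
    ~iconv_converter() { iconv_close(cd_); }

    std::string operator()(const std::string& in) const {
        // Reset the descriptor to its initial shift state.
        iconv(cd_, nullptr, nullptr, nullptr, nullptr);

        std::string src(in);            // iconv wants a mutable buffer
        std::string out;
        char* inp = src.empty() ? nullptr : &src[0];
        std::size_t inleft = src.size();
        char buf[256];

        while (inleft > 0) {
            char* outp = buf;
            std::size_t outleft = sizeof buf;
            std::size_t r = iconv(cd_, &inp, &inleft, &outp, &outleft);
            int err = errno;            // save before anything can change it
            out.append(buf, static_cast<std::size_t>(outp - buf));
            // E2BIG only means the output buffer filled up; loop again.
            if (r == static_cast<std::size_t>(-1) && err != E2BIG)
                throw std::runtime_error("iconv: invalid or incomplete input");
        }

        // Flush any pending shift sequence for stateful encodings.
        char* outp = buf;
        std::size_t outleft = sizeof buf;
        iconv(cd_, nullptr, nullptr, &outp, &outleft);
        out.append(buf, static_cast<std::size_t>(outp - buf));
        return out;
    }

private:
    iconv_converter(const iconv_converter&);             // non-copyable
    iconv_converter& operator=(const iconv_converter&);
    iconv_t cd_;
};
```

Usage would be along the lines of `iconv_converter to_utf8("ISO-8859-1", "UTF-8"); std::string u = to_utf8(latin1_text);`. An ICU-based functor with the same call interface would be one possible alternative back end.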
- Variable width iterators, including the issue that you raised above.
Boost.Iterator makes this job quite easy.
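To show what a variable-width iterator has to do, here is a hand-rolled sketch that uses only the standard library; with Boost.Iterator, boost::iterator_facade would remove most of the boilerplate. The class name utf8_iterator is hypothetical, and the sketch assumes the underlying bytes are already well-formed UTF-8 (it does no validation).

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>

// Hypothetical read-only iterator that walks a UTF-8 byte sequence and
// yields one Unicode code point per step. Assumes well-formed input.
class utf8_iterator {
public:
    // Dereference returns by value, so this is tagged as an input iterator.
    typedef std::input_iterator_tag iterator_category;
    typedef std::uint32_t value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const std::uint32_t* pointer;
    typedef std::uint32_t reference;

    explicit utf8_iterator(std::string::const_iterator pos) : pos_(pos) {}

    // Decode the code point starting at the current byte position.
    std::uint32_t operator*() const {
        unsigned char lead = static_cast<unsigned char>(*pos_);
        std::size_t len = length(lead);
        // Keep the payload bits of the lead byte: 7, 5, 4 or 3 of them.
        std::uint32_t cp = (len == 1) ? lead : (lead & (0x7F >> len));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(*(pos_ + k)) & 0x3F);
        return cp;
    }

    // Advance by the width of the current character, not by one byte.
    utf8_iterator& operator++() {
        pos_ += length(static_cast<unsigned char>(*pos_));
        return *this;
    }
    utf8_iterator operator++(int) {
        utf8_iterator old(*this);
        ++*this;
        return old;
    }

    friend bool operator==(const utf8_iterator& a, const utf8_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_iterator& a, const utf8_iterator& b) {
        return !(a == b);
    }

private:
    // Sequence length in bytes, derived from the lead byte.
    static std::size_t length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        return 4;
    }

    std::string::const_iterator pos_;
};
```

Because the step size depends on the data, such an iterator cannot offer random access in constant time, which is one of the interface questions a string class built on it would have to answer.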
- Interaction with locales, internationalisation, and system APIs.
I'm not an IOStreams expert, but I'm very used to working with the Windows API.
and no doubt more. Thinking about the interfaces between these areas and the user would be a good place to start.
There were also some interesting discussions about Unicode in the past, though they didn't seem to reach any conclusion. They did, however, raise very important concerns w.r.t. internationalization.
Regards,
Phil.
Thanks Phil,
--
Felipe Magno de Almeida