status of Boost Unicode library/enhancements ?


Chris Pirazzi

28 Mar 2006

7:08 p.m.

Hello,

I just scanned about 300 boost-devel messages with the word "Unicode" and am very excited about the occasional mentions I see of a Boost Unicode library.

Is that project still alive? Is there a prototype or beta of any sort, or even a simple statement of goals I can look at for the proposed boost project?

I am about to embark on a large text processing (but _not_ display) project and could make use of such a library. (digression: part of it will even involve the processing of Thai text, which seems to be the #1 cited example of a weird language as far as i18n is concerned. Having myself typeset a 283-page bilingual Thai-English book, I have to agree :)

The last mentions I found were from late 2005, where Graham Barnett mentioned a Unicode library was under development:

http://thread.gmane.org/gmane.comp.lib.boost.devel/128403 http://thread.gmane.org/gmane.comp.lib.boost.devel/129807

I tried searching the vault for 'unicode' but no dice.

I have examined (and would use by default) ICU from IBM:

http://icu.sourceforge.net/userguide/intro.html

I would use its C++ UnicodeString, CharacterIterator, Locale-based codepage converters, Normalization support, Collation support, and regex matching (in particular with regex's that match character classes like "nonspacing mark").

How do the proposed Boost library's capabilities differ from those offered by ICU?

I've seen that there is ICU integration in Boost.Regex

http://www.boost.org/libs/regex/doc/unicode.html

And of course it is possible today to store UTF-16 data in a std::wstring and convert between UTF-8, UTF-16, and UTF-32 using various easily available routines. But as you can see above I need more capability than just that.

ICU is probably sufficient, but I thought it might be nice to use something that fits in with the rest of boost and STL more nicely. Something that used/extended existing string mechanisms, iteration mechanisms, and conversion mechanisms (e.g. those "code conversion facets" which I do not yet understand :). Consistent naming, error reporting, and coding conventions would be a superficial but nice added bonus.

I would hope that any such library would make some stabs at performance enhancements such as ICU's UnicodeString's ability to alias other strings to avoid copies, or store very small strings inline. Since ICU has since disabled some of those enhancements:

http://icu.sourceforge.net/userguide/strings.html#unistr_performance

perhaps that would provide the Boost library an opportunity to beat ICU's performance!

Thanks for all updates,

- Chris Pirazzi


Jeff Garland

29 Mar

2:46 a.m.

On Wed, 29 Mar 2006 02:08:24 +0700, Chris Pirazzi wrote

...

Hello,

I just scanned about 300 boost-devel messages with the word "Unicode" and am very excited about the occasional mentions I see of a Boost Unicode library.

Is that project still alive? Is there a prototype or beta of any sort, or even a simple statement of goals I can look at for the proposed boost project?

I believe all of these projects are dead and I don't recall seeing code posted. So unless someone is out there toiling quietly I'm afraid we are still looking to recruit someone to take this area on. Jeff

Keith MacDonald

6 p.m.

I'd be grateful for an elegant C++ wrapper to the ICU library (http://www-306.ibm.com/software/globalization/icu/index.jsp). A lot of resources go into developing and maintaining ICU, and it has an unrestricted license, so why try to compete with it? And to preempt the question, no, I don't do elegant, but I know it when I see it - which is why I use Boost. "Jeff Garland" <jeff@crystalclearsoftware.com> wrote in message news:20060329024321.M16322@crystalclearsoftware.com...

...

I believe all of these projects are dead and I don't recall seeing code posted. So unless someone is out there toiling quietly I'm afraid we are still looking to recruit someone to take this area on.

Jeff

Eric Niebler

6:53 p.m.

Keith MacDonald wrote:

...

I'd be grateful for an elegant C++ wrapper to the ICU library (http://www-306.ibm.com/software/globalization/icu/index.jsp). A lot of resources go into developing and maintaining ICU, and it has an unrestricted license, so why try to compete with it?

http://www.firebirdnews.org/?p=243 A code analysis tool recently run on the Firebird code base turned up lots of bugs -- in ICU. Doesn't mean a wrapper wouldn't have value, but it also might not be practical. I don't know. FWIW, I have the interest and the ability to write Boost.Unicode. What I lack is time. Anybody with a vested interest in C++ and Unicode should consider hiring Boost Consulting. *nudge, nudge* :-) -- Eric Niebler Boost Consulting www.boost-consulting.com

Thorsten Ottosen

31 Mar

12:02 p.m.

New subject: [boost.money] was: status of Boost Unicode library/enhancements ?

Eric Niebler wrote:

...

FWIW, I have the interest and the ability to write Boost.Unicode. What I lack is time. Anybody with a vested interest in C++ and Unicode should consider hiring Boost Consulting. *nudge, nudge* :-)

This is a big problem we have to do something about somehow. There are a lot of rather big libraries that take so much time to develop that it is unrealistic that people can do them in their spare time. (unicode, xml, database seem to be the most needed right now) OTOH, we have lots of gifted people that could take on development if given money. For the benefit of the whole C++ community, we should try to organize some kind of public money-gathering where companies can sign up to support the development of these very important libraries. I imagine that many companies would be willing to pay, say 100 USD, to support e.g. a unicode library. That is sufficiently low for me to be able to persuade my boss, for example. If we have some kind of estimate of how expensive it would be to develop the library, it might turn out that 100-200 willing companies would be enough to fully fund the initial development. The website could then show a bar indicating how close to funding we were. Any thoughts? -Thorsten

David Abrahams

2:30 p.m.

New subject: [boost.money] was: status of Boost Unicode library/enhancements ?

Thorsten Ottosen <thorsten.ottosen@dezide.com> writes:

...

If we have some kind of estimate of how expensive it would be to develop the library, it might turn out that 100-200 willing companies would be enough to fully fund the initial development.

The website could then show a bar indicating how close to funding we were.

Any thoughts?

Boost.org is not going to get into this area, at least not without undergoing a total transformation of the way we operate. There are just too many problems here, such as how to manage the funds and how to choose who they're given to, not to mention the fact that Boost then would have to become an organization with some legal standing. -- Dave Abrahams Boost Consulting www.boost-consulting.com

Anthony Williams

3:25 p.m.

New subject: [boost.money] was: status of Boost Unicode library/enhancements ?

David Abrahams <dave@boost-consulting.com> writes:

...

Thorsten Ottosen <thorsten.ottosen@dezide.com> writes:

...
If we have some kind of estimate of how expensive it would be to develop the library, it might turn out that 100-200 willing companies would be enough to fully fund the initial development.

The website could then show a bar indicating how close to funding we were.

Any thoughts?

Boost.org is not going to get into this area, at least not without undergoing a total transformation of the way we operate. There are just too many problems here, such as how to manage the funds and how to choose who they're given to, not to mention the fact that Boost then would have to become an organization with some legal standing.

At first look, I like Thorsten's idea. If we could find some way to allow companies to spend just a little amount, in support of a specific library, and we could find enough companies willing to make such a contribution, then we could make it work. As you say, the problem is deciding who does the work, and how much they get for it. Your rate might be double mine, but your work might be ten times the quality, or you might be done in a quarter of the time (or both!). Once Boost.org starts accepting payment, and paying people to do work, then it has to become a proper legal entity, with stricter guidelines on which of us are members, rather than just the random assortment of developers we are at the moment. Particular individuals from the Boost community could run such a scheme on their own, or a group could form a partnership to do so, but it couldn't be an "official" Boost thing. That said, if anyone wants to pay me to develop a library for Boost, or to discuss setting up such a partnership, I'm listening ;-) Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Thorsten Ottosen

7:31 p.m.

New subject: [boost.money] was: status of Boost Unicode library/enhancements ?

Anthony Williams wrote:

...

David Abrahams <dave@boost-consulting.com> writes:

...
Thorsten Ottosen <thorsten.ottosen@dezide.com> writes:

...
If we have some kind of estimate of how expensive it would be to develop the library, it might turn out that 100-200 willing companies would be enough to fully fund the initial development.

The website could then show a bar indicating how close to funding we were.

Any thoughts?

Boost.org is not going to get into this area, at least not without undergoing a total transformation of the way we operate. There are just too many problems here, such as how to manage the funds and how to choose who they're given to, not to mention the fact that Boost then would have to become an organization with some legal standing.

At first look, I like Thorsten's idea. If we could find some way to allow companies to spend just a little amount, in support of a specific library, and we could find enough companies willing to make such a contribution, then we could make it work.

I vividly remember many Amiga games were developed after a similar model. After presenting some demo and/or screenshot of the game in progress, the team would wait until they had confirmation that, say, 500 people would buy the game. I personally think, however, that that model was too insecure for the developers.

...

As you say, the problem is deciding who does the work, and how much they get for it. Your rate might be double mine, but your work might be ten times the quality, or you might be done in a quarter of the time (or both!).

The work should be done by whoever is willing to write a contract for the work. Boost would be a mediator giving trust to those paying and support to those developing. Those developing should be willing to spend some extra time on the effort, some of their spare time, just like anybody else not getting paid should.

...

Once Boost.org starts accepting payment, and paying people to do work, then it has to become a proper legal entity, with stricter guidelines on which of us are members, rather than just the random assortment of developers we are at the moment.

Right. I kinda imagined that Boost would be a mediator, ensuring quality, support and trust into the process.

...

That said, if anyone wants to pay me to develop a library for Boost, or to discuss setting up such a partnership, I'm listening ;-)

That's the thing: hardly any normal company would sponsor free software for other companies, so we would need to keep the donation small. -Thorsten

Eugene Talagrand

1 Apr

10:33 a.m.

New subject: [boost.money] was: status of Boost Unicode library/enhancements ?

...

...
At first look, I like Thorsten's idea. If we could find some way to allow companies to spend just a little amount, in support of a specific library, and we could find enough companies willing to make such a contribution, then we could make it work.

I vividly remember many Amiga games were developed after a similar model. After presenting some demo and/or screenshot of the game in progress, the team would wait until they had confirmation that, say, 500 people would buy the game.

I personally think, however, that that model was too insecure for the developers.

I remember seeing some online donation systems, where if the donation target was not met everyone got their money back. So there'd be no risk on either part. I can't seem to find the reference now though.

John Maddock

31 Mar

6:02 p.m.

New subject: [boost.money] was: status of Boost Unicode library/enhancements ?

...

This is a big problem we have to do something about somehow. There are a lot of rather big libraries that take so much time to develop that it is unrealistic that people can do them in their spare time. (unicode, xml, database seem to be the most needed right now)

Right, and some of those: certainly Unicode is going to be very time intensive, and require ongoing support as new Unicode versions are produced etc.

...

I imagine that many companies would be willing to pay, say 100 USD, to support e.g. a unicode library. That is sufficiently low for me to be able to persuade my boss, for example.

If we have some kind of estimate of how expensive it would be to develop the library, it might turn out that 100-200 willing companies would be enough to fully fund the initial development.

As Dave A. says, it creates problems if Boost.org becomes a legal entity accepting money etc. However, I note that OSDL have just started a fellowship fund for FOSS projects, although they're very tied to Linux-related projects. See http://www.osdl.org/lab_activities/fellowship_fund/ I also note that Sourceforge has a project-donation facility that we've never turned on. I guess one solution would be for individual users to start their own SF project, turn on the donation option and then request funds.... but it requires a fair amount of trust on all sides. John.

Thorsten Ottosen

7:38 p.m.

New subject: [boost.money] was: status of Boost Unicode library/enhancements ?

John Maddock wrote:

...

...
This is a big problem we have to do something about somehow. There are a lot of rather big libraries that take so much time to develop that it is unrealistic that people can do them in their spare time. (unicode, xml, database seem to be the most needed right now)

Right, and some of those: certainly Unicode is going to be very time intensive, and require ongoing support as new Unicode versions are produced etc.

New versions can be separate projects, or they may be easy enough to handle as normal maintenance.

...

...
I imagine that many companies would be willing to pay, say 100 USD, to support e.g. a unicode library. That is sufficiently low for me to be able to persuade my boss, for example.

If we have some kind of estimate of how expensive it would be to develop the library, it might turn out that 100-200 willing companies would be enough to fully fund the initial development.

As Dave A. says, it creates problems if Boost.org becomes a legal entity accepting money etc.

Ok. So the money doesn't go to Boost, but is kept by the one doing the work, or paid back if enough funds could not be raised. I wouldn't mind if Boost Consulting handled the money issues as a community service (and perhaps as a principal developer).

...

However, I note that OSDL have just started a fellowship fund for FOSS projects, although they're very tied to Linux-related projects. See http://www.osdl.org/lab_activities/fellowship_fund/

I also note that Sourceforge has a project-donation facility that we've never turned on. I guess one solution would be for individual users to start their own SF project, turn on the donation option and then request funds.... but it requires a fair amount of trust on all sides.

Right. For those paying, the money should be repaid if the library is not accepted into boost. For those developing, continuous discussions on the dev list should ensure high quality and thus a great chance of acceptance. -Thorsten

Anthony Williams

2:49 p.m.

"Eric Niebler" <eric@boost-consulting.com> writes:

...

FWIW, I have the interest and the ability to write Boost.Unicode. What I lack is time.

I expect there's quite a few of us in that boat.

...

Anybody with a vested interest in C++ and Unicode should consider hiring <snip>

.... someone to develop the library. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Rogier van Dalen

29 Mar

9:57 p.m.

On 3/28/06, Chris Pirazzi <cpirazzi@gmail.com> wrote:

...

Hello,

I just scanned about 300 boost-devel messages with the word "Unicode" and am very excited about the occasional mentions I see of a Boost Unicode library.

...

...

The last mentions I found were from late 2005, where Graham Barnett mentioned a Unicode library was under development:

http://thread.gmane.org/gmane.comp.lib.boost.devel/128403 http://thread.gmane.org/gmane.comp.lib.boost.devel/129807

Graham and I started on it, but I'm afraid the project stranded due to lack of time (as always). I'm sorry. If memory serves correctly, all we had that's reasonably finished is the codecvt facets. Still - some day I'd like to have a good Boost.Unicode library. Regards, Rogier

loufoque

31 Mar

8:28 a.m.

Chris Pirazzi wrote :

...

And of course it is possible today to store UTF-16 data in a std::wstring

Not really. std::wstring can only be used for UCS-2 or UCS-4/UTF-32. (UCS-2 is UTF-16 without surrogate pairs, limiting the range of representable Unicode characters to 0-65535)
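
To make the surrogate-pair point concrete, here is a minimal C++ sketch of how a code point above 65535 becomes two 16-bit code units in UTF-16, and why counting code units then differs from counting characters. The function names encode_utf16 and count_code_points are hypothetical helpers written for this illustration; they are not part of Boost, ICU, or the standard library.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Code points below 0x10000 fit in one unit (this is the UCS-2 range);
// anything above needs a high/low surrogate pair.
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");
    if (cp < 0x10000)
        return {static_cast<std::uint16_t>(cp)};
    cp -= 0x10000;  // leaves a 20-bit value, split 10 bits / 10 bits
    return {static_cast<std::uint16_t>(0xD800 + (cp >> 10)),
            static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF))};
}

// Hypothetical helper: count code points in a UTF-16 sequence by treating
// a high surrogate followed by a low surrogate as a single character.
std::size_t count_code_points(const std::vector<std::uint16_t>& units) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < units.size(); ++i) {
        const bool high = units[i] >= 0xD800 && units[i] <= 0xDBFF;
        const bool next_low = i + 1 < units.size() &&
                              units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF;
        if (high && next_low) ++i;  // skip the second half of the pair
        ++n;
    }
    return n;
}
```

With these helpers, encode_utf16(0x1D11E) yields the two units 0xD834 0xDD1E, while a container that assumes one element per character would report a length of 2 for that single character.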

...

ICU is probably sufficient, but I thought it might be nice to use something that fits in with the rest of boost and STL more nicely.

Have you tried Glib::ustring from glibmm? It is a UTF-8 implementation with the same interface as std::string. It should work with STL algorithms and the like.
