New subject: UTF8 library - second call for informal review

6 Dec 2006

Hello Rogier,

Thanks for your comments.

1) Iterators, or rather iterator adapters. I believe the iterators should be built on top of these functions. In fact, I am already developing them in version 2 of the library (see here for the latest snapshot: http://utfcpp.svn.sourceforge.net/viewvc/utfcpp/v2_0/source/ ). However, I have seen some other iterator implementations, and would rather start with these free functions until we decide on the best design for the iterators.
2) IO - currently it is out of the scope of this library. If enough people agree with you, this may change, but for now I have no plans for IO. Honestly, I dislike C++ standard IO and would love to avoid it if possible :)
3) Tables for data: As I replied to Hervé, in my test cases the table-based version ran slower (with two different compilers). I will investigate it further, though, since I agree it is not very logical.
4) A string type. There are way too many C++ string types out there already, and I wanted to provide a tool for making them work with UTF-8 encoding, rather than introducing yet another string class. It is probably the same philosophy as Boost String Algorithms: http://www.boost.org/doc/html/string_algo.html
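
To illustrate point 1, an iterator adapter layered on top of a decoding free function might look roughly like the sketch below. The names next_code_point and code_point_iterator are hypothetical and are not the library's API; the decoder is a simplified stand-in that checks for truncated input and bad continuation bytes but does not reject overlong forms or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a checked "decode one code point" free function.
// It advances `it` past the sequence it decodes and throws on malformed input.
template <typename OctetIterator>
std::uint32_t next_code_point(OctetIterator& it, OctetIterator end)
{
    if (it == end) throw std::out_of_range("no input left");
    const std::uint8_t lead = static_cast<std::uint8_t>(*it);
    ++it;
    std::uint32_t cp = 0;
    int trail = 0;
    if (lead < 0x80)              { cp = lead;        trail = 0; }
    else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; trail = 1; }
    else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; trail = 2; }
    else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; trail = 3; }
    else throw std::invalid_argument("invalid lead byte");
    for (; trail > 0; --trail) {
        if (it == end) throw std::out_of_range("truncated sequence");
        const std::uint8_t b = static_cast<std::uint8_t>(*it);
        if ((b >> 6) != 0x02) throw std::invalid_argument("invalid continuation byte");
        cp = (cp << 6) | (b & 0x3F);
        ++it;
    }
    return cp;
}

// Hypothetical adapter: walks an octet range one code point at a time,
// delegating all decoding to the free function above.
template <typename OctetIterator>
class code_point_iterator
{
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::uint32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const std::uint32_t*;
    using reference = std::uint32_t;

    code_point_iterator(OctetIterator pos, OctetIterator end) : pos_(pos), end_(end) {}

    // Decode from a copy so that dereferencing does not move the iterator.
    std::uint32_t operator*() const
    {
        OctetIterator tmp = pos_;
        return next_code_point(tmp, end_);
    }

    code_point_iterator& operator++()
    {
        assert(pos_ != end_ && "incrementing past the end");
        next_code_point(pos_, end_);
        return *this;
    }

    code_point_iterator operator++(int)
    {
        code_point_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const code_point_iterator& a, const code_point_iterator& b)
    {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const code_point_iterator& a, const code_point_iterator& b)
    {
        return !(a == b);
    }

private:
    OctetIterator pos_;
    OctetIterator end_;
};
```

With this layering, a pair such as code_point_iterator<std::string::const_iterator>(s.begin(), s.end()) and (s.end(), s.end()) delimits the code points of s, and the iterator design can change later without touching the decoding function.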

Best,

Nemanja Trifunovic

----- Original Message ----
From: Rogier van Dalen <rogiervd@gmail.com>
To: boost@lists.boost.org
Sent: Wednesday, December 6, 2006 5:11:17 AM
Subject: Re: [boost] UTF8 library - second call for informal review

Dear Nemanja,

On 12/5/06, Nemanja Trifunovic <nemanja_trifunovic@yahoo.com> wrote:
...
This is the second call for the informal review of the UTF8 library. It is based on version 1.02 of UTF8-CPP: http://utfcpp.sourceforge.net/ and you can find it at
I like the functions you provide, and the "unchecked" namespace.
Unlike Hervé, I do think exceptions are the way to go. I seem to miss
a couple of things though.
In a recent discussion on this list there seemed to be a preference
for using iterators, which can be composed, for example to perform
UTF-8->UTF-16 conversion, or conversions to other codepages. Iterators
can be much more flexible than these free functions.
Is there any particular reason why you do not include similar
functions for UTF-16?
One of the most important uses for UTF must be IO. Shouldn't a
utf_codecvt be part of the library?
Hervé is right: reading UTF-8 can be optimised a lot using tables with
data. I've got an implementation lying around that I'd be happy to
share. It took 30% less time than the straightforward implementation
and it did all the necessary checks.
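
To make the "tables with data" idea concrete, here is a minimal sketch of one common form of it: looking up the length of a UTF-8 sequence from its lead byte, next to the equivalent branching version. This is neither Rogier's implementation nor the library's code, all names are hypothetical, and the sketch makes no claim about which variant is faster on a given compiler.

```cpp
#include <cassert>
#include <cstdint>

// Sequence length indexed by the top five bits of the lead byte.
// 0 marks a byte that cannot start a sequence. Note that this table alone
// still accepts 0xC0, 0xC1 and 0xF5..0xF7, which a full validator must reject.
static const std::uint8_t sequence_length_table[32] = {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  // 0xxxxxxx: one octet
    0, 0, 0, 0, 0, 0, 0, 0,                          // 10xxxxxx: continuation byte
    2, 2, 2, 2,                                      // 110xxxxx: two octets
    3, 3,                                            // 1110xxxx: three octets
    4,                                               // 11110xxx: four octets
    0                                                // 11111xxx: never valid
};

// Table-driven variant: a single shift and an array load.
inline int sequence_length_from_table(std::uint8_t lead)
{
    return sequence_length_table[lead >> 3];
}

// Branching variant: the straightforward chain of comparisons.
inline int sequence_length_branching(std::uint8_t lead)
{
    if (lead < 0x80)         return 1;
    if ((lead >> 5) == 0x06) return 2;
    if ((lead >> 4) == 0x0E) return 3;
    if ((lead >> 3) == 0x1E) return 4;
    return 0;
}

// Self-check: both variants must agree for every possible byte value.
inline void check_agreement()
{
    for (int b = 0; b < 256; ++b) {
        const std::uint8_t lead = static_cast<std::uint8_t>(b);
        assert(sequence_length_from_table(lead) == sequence_length_branching(lead));
    }
}
```

Whether the lookup beats the branches depends on the compiler, the cache behaviour, and the input data, so timing both variants on real text is the only reliable way to decide.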

The final thing is, your functions try to maintain strings of
valid UTF-8. Why not provide a string type that maintains this
invariant?

Conclusion: in my opinion a lot of things are missing from the library
at the moment.

Regards,
Rogier